mmtfPyspark.datasets.customReportService module

customReportService.py

This module uses the RCSB PDB Tabular Report RESTful web services to retrieve metadata and annotations for all current entries in the Protein Data Bank (PDB).
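A tabular report has one header row naming the requested fields and one row per PDB entry. The minimal sketch below shows how such a report can be turned into records. It assumes the report arrives as comma-separated text with a single header row. `parse_tabular_report` is a hypothetical helper for illustration and is not part of mmtfPyspark; the module itself returns a Spark dataset rather than a list of dicts.

```python
import csv
import io
from typing import Dict, List


def parse_tabular_report(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV report text (header row + one row per entry) into dicts.

    Hypothetical helper: assumes comma-separated text with a single
    header row. It is not part of the mmtfPyspark API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each data row becomes {column_name: value}; empty cells stay as "".
    return [dict(row) for row in reader]
```

In Spark, the equivalent step for a CSV file with a header row is `spark.read.csv(path, header=True)`, which yields a dataset whose columns are named after the header fields.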

Examples

Retrieve the PubMed Central ID, PubMed ID, and deposition date for all current entries:

>>> from mmtfPyspark.datasets import customReportService
>>> ds = customReportService.get_dataset(["pmc", "pubmedId", "depositionDate"])
>>> ds.printSchema()
>>> ds.show(5)

get_dataset(columnNames)

Returns a dataset with the specified columns for all current PDB entries. See https://www.rcsb.org/pdb/results/reportField.do for a list of supported field names.

Parameters:

columnNames : str, list

name of a single column (str) or a list of column names to include in the dataset

Returns:

dataset

dataset with the specified columns
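Because `columnNames` accepts either a single string or a list, a caller (or the function itself) has to normalize it before building a request. The sketch below shows one way to do that. `normalize_column_names` and `column_query_value` are hypothetical helpers, not part of mmtfPyspark, and joining the names with commas for a URL query parameter is an assumption about how the field list is sent to the service.

```python
from typing import List, Union


def normalize_column_names(column_names: Union[str, List[str]]) -> List[str]:
    """Accept one column name or a list of names and return a list.

    Hypothetical helper mirroring the documented `columnNames : str, list`
    parameter; it is not part of the mmtfPyspark API.
    """
    if isinstance(column_names, str):
        names = [column_names]
    else:
        names = list(column_names)
    if not names:
        raise ValueError("at least one column name is required")
    # Strip surrounding whitespace so " pmc" and "pmc" are treated alike.
    return [name.strip() for name in names]


def column_query_value(column_names: Union[str, List[str]]) -> str:
    """Join normalized names into one comma-separated string (assumed
    format for passing a field list in a query parameter)."""
    return ",".join(normalize_column_names(column_names))
```

For example, `column_query_value(["pmc", "pubmedId", "depositionDate"])` returns `"pmc,pubmedId,depositionDate"`, and `column_query_value("pmc")` returns `"pmc"`.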