mmtfPyspark.io.mmtfReader module

mmtfReader.py: Methods for reading and downloading structures in MMTF file formats. The data are returned as a PythonRDD with the structure id (e.g. PDB ID) as the key and the structural data as the value.

Supported operations and file formats:

- Read a directory of MMTF-Hadoop sequence files in full and reduced representation
- Download MMTF full and reduced representations using the web service (mmtf.rcsb.org)
- Read a directory of MMTF files (.mmtf, .mmtf.gz)

download_full_mmtf_files(pdbIds)[source]

Downloads and reads the specified PDB entries in full MMTF format using the MMTF web services.

Parameters:

pdbIds : list

List of PDB IDs to download

Returns:

data

structure data as key/value pairs

download_mmtf_files(pdbIds, reduced=False)[source]

Downloads and reads the specified PDB entries in either full or reduced MMTF format using the MMTF web services.

Parameters:

pdbIds : list

List of PDB IDs to download

reduced : bool

if True, download the reduced representation; if False (the default), download the full representation

Returns:

data

structure data as key/value pairs

download_reduced_mmtf_files(pdbIds)[source]

Downloads and reads the specified PDB entries in reduced MMTF format using the MMTF web services.

Parameters:

pdbIds : list

List of PDB IDs to download

Returns:

data

structure data as key/value pairs
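The three download functions above differ only in which representation they request. The sketch below illustrates that relationship under stated assumptions: `download_sketch`, `download_full_sketch`, `download_reduced_sketch`, and `fake_fetch` are hypothetical names, the web-service call is replaced by a stub so the example runs offline, and a plain list stands in for the RDD. It is not the module's implementation.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for the web-service call: takes a PDB ID and the
# `reduced` flag and returns the structure data for that entry.
Fetcher = Callable[[str, bool], object]


def download_sketch(pdb_ids: Iterable[str], fetch: Fetcher,
                    reduced: bool = False) -> List[Tuple[str, object]]:
    """Return (structure id, structure data) pairs for the given PDB IDs."""
    return [(pdb_id, fetch(pdb_id, reduced)) for pdb_id in pdb_ids]


def download_full_sketch(pdb_ids: Iterable[str], fetch: Fetcher):
    # Same as the general function with reduced=False.
    return download_sketch(pdb_ids, fetch, reduced=False)


def download_reduced_sketch(pdb_ids: Iterable[str], fetch: Fetcher):
    # Same as the general function with reduced=True.
    return download_sketch(pdb_ids, fetch, reduced=True)


def fake_fetch(pdb_id: str, reduced: bool) -> str:
    # Placeholder payload so the sketch runs without network access.
    return f"{pdb_id}:{'reduced' if reduced else 'full'}"


pairs = download_reduced_sketch(["1AQ1", "4HHB"], fake_fetch)
```

Each element of `pairs` is keyed by the structure ID, mirroring the key/value layout described at the top of this module.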

get_mmtf_full_path()[source]

Returns the path to the full MMTF-Hadoop sequence file. It looks for the environment variable “MMTF_FULL”; if the variable is not set, an error message is shown.

Returns:

str

path to the mmtf_full directory

get_mmtf_reduced_path()[source]

Returns the path to the reduced MMTF-Hadoop sequence file. It looks for the environment variable “MMTF_REDUCED”; if the variable is not set, an error message is shown.

Returns:

str

path to the mmtf_reduced directory
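The lookup performed by get_mmtf_full_path() and get_mmtf_reduced_path() can be pictured roughly as follows. This is a minimal sketch, not the module's code: `resolve_mmtf_path` is a hypothetical helper, the directory shown is an example value, and raising an exception is this sketch's choice for reporting the error.

```python
import os


def resolve_mmtf_path(variable: str) -> str:
    """Return the directory named by `variable`, or raise if it is unset."""
    path = os.environ.get(variable)
    if not path:
        # Assumption: the sketch raises; the module may report the error differently.
        raise EnvironmentError(
            f"Environment variable {variable} is not set; "
            "point it at the MMTF-Hadoop sequence file directory."
        )
    return path


# Example value only; the real path depends on where the files were unpacked.
os.environ["MMTF_FULL"] = "/data/mmtf/full"
full_path = resolve_mmtf_path("MMTF_FULL")
```

In practice the two variables would typically be exported in the shell profile before starting Spark, so that the default-location readers can find the sequence files.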

read_full_sequence_file(pdbId=None, fraction=None, seed=123)[source]

Reads an MMTF-Hadoop sequence file from the default file location, which is determined by get_mmtf_full_path().

To download mmtf files: https://mmtf.rcsb.org/download.html

Parameters:

pdbId : list, optional

List of structures to read

fraction : float, optional

fraction of structures to read

seed : int, optional

random seed

read_mmtf_files(path)[source]

Reads the PDB entries from the MMTF files at the specified path.

Parameters:

path : str

Path to MMTF files

Returns:

data

structure data as key/value pairs
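As a rough illustration of reading a directory that mixes .mmtf and .mmtf.gz files, the sketch below collects the raw bytes of each file and decompresses the gzipped ones. It rests on assumptions: `read_mmtf_bytes` is a hypothetical helper, the file stem stands in for the structure ID key, and decoding the MMTF payload into structure data is left out entirely.

```python
import gzip
from pathlib import Path
from typing import List, Tuple


def read_mmtf_bytes(directory: str) -> List[Tuple[str, bytes]]:
    """Return (key, raw MMTF bytes) pairs for .mmtf and .mmtf.gz files."""
    pairs = []
    for file in sorted(Path(directory).iterdir()):
        if file.name.endswith(".mmtf.gz"):
            # Assumption: the file stem is used as the key in this sketch.
            key = file.name[: -len(".mmtf.gz")]
            data = gzip.decompress(file.read_bytes())
        elif file.name.endswith(".mmtf"):
            key = file.name[: -len(".mmtf")]
            data = file.read_bytes()
        else:
            continue  # ignore unrelated files
        pairs.append((key, data))
    return pairs
```

The returned pairs follow the same key/value layout as the reader functions, with undecoded bytes in place of structure data.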

read_reduced_sequence_file(pdbId=None, fraction=None, seed=123)[source]

Reads an MMTF-Hadoop sequence file from the default file location, which is determined by get_mmtf_reduced_path().

To download mmtf files: https://mmtf.rcsb.org/download.html

Parameters:

pdbId : list, optional

List of structures to read

fraction : float, optional

fraction of structures to read

seed : int, optional

random seed

read_sequence_file(path, pdbId=None, fraction=None, seed=123)[source]

Reads an MMTF-Hadoop sequence file. It can read all files from the path, randomly sample a fraction of them, or read a subset based on an input list. See http://mmtf.rcsb.org/download.html for file download information.

Parameters:

path : str

path to file directory

pdbId : list

List of structures to read

fraction : float

fraction of structures to read

seed : int

random seed

Raises:

Exception

file path does not exist
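The three read modes of read_sequence_file (everything, a subset by ID, or a seeded random fraction) can be sketched on an in-memory list of key/value pairs. This is an illustration under assumptions, not the module's implementation: `select_records` is a hypothetical helper, a plain list stands in for the RDD, treating `pdb_ids` and `fraction` as mutually exclusive is this sketch's choice, and the path check that raises the exception listed above is not modeled.

```python
import random
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, object]  # (structure id, structure data)


def select_records(records: Iterable[Record],
                   pdb_ids: Optional[List[str]] = None,
                   fraction: Optional[float] = None,
                   seed: int = 123) -> List[Record]:
    """Return all records, a subset by ID, or a seeded random sample."""
    if pdb_ids is not None and fraction is not None:
        # Assumption of this sketch: the two options are mutually exclusive.
        raise ValueError("pass either pdb_ids or fraction, not both")
    if pdb_ids is not None:
        wanted = set(pdb_ids)
        return [r for r in records if r[0] in wanted]
    if fraction is not None:
        # Keep each record independently with probability `fraction`.
        rng = random.Random(seed)
        return [r for r in records if rng.random() < fraction]
    return list(records)


records = [("1AQ1", "a"), ("4HHB", "b"), ("1STP", "c")]
subset = select_records(records, pdb_ids=["4HHB"])
sample_a = select_records(records, fraction=0.5, seed=123)
sample_b = select_records(records, fraction=0.5, seed=123)
```

With this kind of per-record sampling, `fraction` sets the expected share of structures rather than an exact count, and reusing the same seed reproduces the same sample.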