mmtfPyspark.datasets.secondaryStructureExtractor module

secondaryStructureExtractor.py

Creates a dataset of DSSP secondary structure assignments. The dataset includes protein sequence, the DSSP 3-state (Q3) and 8-state (Q8) assignments, and the fraction of alpha, beta, and coil within a chain. The input to this class must be a single protein chain.
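
The relationship between the Q8 string, the Q3 string, and the three fractions can be illustrated with a small standalone sketch. It assumes the conventional DSSP reduction (H, G, I to helix; E, B to strand; everything else to coil), which may differ in detail from what this module does internally. The names Q8_TO_Q3, q8_to_q3, and state_fractions are hypothetical helpers and are not part of mmtfPyspark.

```python
from collections import Counter

# Assumed conventional DSSP 8-state -> 3-state reduction (not taken from the
# mmtfPyspark source): helices -> "H", strands -> "E", the rest -> "C".
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10, and pi helix
    "E": "E", "B": "E",            # extended strand and isolated bridge
    "T": "C", "S": "C", "C": "C",  # turn, bend, and coil
}


def q8_to_q3(q8):
    """Reduce an 8-state string to 3 states; unknown codes are treated as coil."""
    return "".join(Q8_TO_Q3.get(code, "C") for code in q8)


def state_fractions(q3):
    """Return the fraction of helix, strand, and coil residues in a Q3 string."""
    if not q3:
        return {"alpha": 0.0, "beta": 0.0, "coil": 0.0}
    counts = Counter(q3)
    n = len(q3)
    return {
        "alpha": counts["H"] / n,
        "beta": counts["E"] / n,
        "coil": counts["C"] / n,
    }


q3 = q8_to_q3("CHHHHGGTEEEEBSCC")
print(q3)                   # CHHHHHHCEEEEECCC
print(state_fractions(q3))  # {'alpha': 0.375, 'beta': 0.3125, 'coil': 0.3125}
```

The three fractions always sum to 1 for a non-empty chain, because every residue falls into exactly one of the three states.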

Examples

Get a dataset of secondary structure assignments:

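>>> from mmtfPyspark.datasets import secondaryStructureExtractor
>>> from mmtfPyspark.filters import ContainsLProteinChain
>>> from mmtfPyspark.mappers import StructureToPolymerChains
>>> # pdb is assumed to be an RDD of MMTF structures loaded earlier
>>> pdb = pdb.flatMap(StructureToPolymerChains()).filter(ContainsLProteinChain())
>>> secStruct = secondaryStructureExtractor.get_dataset(pdb)
>>> secStruct.show(10)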

get_dataset(structure)

Returns a dataset with protein sequence and secondary structure assignments.

Parameters:

structure : mmtfStructure

single protein chain

Returns:

dataset

dataset with sequence and secondary structure assignments

get_python_rdd(structure)

Returns a pythonRDD of 3-state (Q3) secondary structure assignments.

Parameters:

structure : mmtfStructure