mmtfPyspark.datasets.secondaryStructureSegmentExtractor module

secondaryStructureSegmentExtractor.py:

This class creates a dataset of sequence segments of a specified length and the associated secondary structure information. Sequence and secondary structure strings are split into segments using a sliding window of the specified segment length. The dataset contains each sequence segment together with the DSSP Q8 and DSSP Q3 secondary structure annotation of its center residue. Because a segment has a single center residue only when its length is odd, the segment length must be an odd number.
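The sliding-window split described above can be sketched in plain Python. This is only an illustration of the idea, not the module's implementation: the helper name `sliding_segments` and the toy sequence and label strings are assumptions made for the example, and the real class operates on Spark data rather than on Python strings.

```python
def sliding_segments(sequence, labels, length):
    """Yield (segment, center_label) pairs for every full window.

    Hypothetical helper for illustration; not part of mmtfPyspark.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    half = length // 2  # offset of the center residue within a window
    for start in range(len(sequence) - length + 1):
        segment = sequence[start:start + length]
        # The annotation kept for a segment is the one at its center residue.
        yield segment, labels[start + half]


# Toy input: a 6-residue sequence with one made-up label per residue.
pairs = list(sliding_segments("MKVLAT", "CHHHEC", 3))
print(pairs)
# [('MKV', 'H'), ('KVL', 'H'), ('VLA', 'H'), ('LAT', 'E')]
```

A sequence of n residues yields n - length + 1 segments, so residues closer than half a window to either end never appear as a center residue.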

get_dataset(structureRDD, length)[source]

Returns a dataset of sequence segments of the specified length, together with the DSSP Q8 and Q3 codes of the center residue of each segment.

Parameters:

structureRDD : structure

length : int

segment length, must be an odd number

Returns:

dataset

dataset of segments

Raises:

Exception

Segment length must be an odd number
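A minimal sketch of the kind of check that would produce this error is shown below. The function name `check_segment_length` is hypothetical and the library's actual validation code may differ; only the error message and the exception type are taken from the description above.

```python
def check_segment_length(length):
    """Return the center index of a segment, rejecting even lengths.

    Hypothetical helper for illustration; not part of mmtfPyspark.
    """
    if length % 2 == 0:
        # An even-length window has no single center residue.
        raise Exception("Segment length must be an odd number")
    return length // 2


center = check_segment_length(11)  # center residue is at index 5

try:
    check_segment_length(10)
except Exception as err:
    message = str(err)
```

Validating the length before any segments are built means an invalid value fails immediately, rather than after the structures have been processed.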