Create Representative Set Demo

Creates an MMTF-Hadoop Sequence file for a PISCES representative set of protein chains.

References

Please cite the following in any work that uses lists provided by PISCES:

G. Wang and R. L. Dunbrack, Jr. PISCES: a protein sequence culling server. Bioinformatics, 19:1589-1591, 2003.

Imports

In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.io import mmtfReader, mmtfWriter
from mmtfPyspark.mappers import StructureToPolymerChains
from mmtfPyspark.filters import PolymerComposition
from mmtfPyspark.webfilters import Pisces

Configure Spark

In [2]:
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("CreateRepresentativeSetDemo")
sc = SparkContext(conf=conf)

Read in Hadoop Sequence Files

In [6]:
path = "../../resources/mmtf_full_sample/"

# read a sample of PDB structures in MMTF-Hadoop Sequence file format
pdb = mmtfReader.read_sequence_file(path, sc)

Filter by representative protein chains at 40% sequence identity and 2.0 Å resolution

In [7]:
sequenceIdentity = 40
resolution = 2.0

# apply the Pisces filter at the structure level, split each entry into
# single polymer chains, apply Pisces again at the chain level, and keep
# only chains composed entirely of the 20 standard amino acids
pdb = pdb.filter(Pisces(sequenceIdentity, resolution)) \
         .flatMap(StructureToPolymerChains()) \
         .filter(Pisces(sequenceIdentity, resolution)) \
         .filter(PolymerComposition(PolymerComposition.AMINO_ACIDS_20))
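The idea behind the final `PolymerComposition` filter can be sketched in plain Python: a chain passes only if every residue is one of the 20 standard amino acids. The `contains_only_standard_aa` helper below is illustrative only and not part of mmtfPyspark, which performs this check on the chain's residue codes internally:

```python
# The 20 standard amino acids (one-letter codes)
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def contains_only_standard_aa(sequence):
    """Return True if every residue is one of the 20 standard amino acids."""
    return bool(sequence) and set(sequence) <= STANDARD_AA

print(contains_only_standard_aa("MKTAYIAKQR"))  # True
print(contains_only_standard_aa("MKTXYIAKQR"))  # False: 'X' is non-standard
```

Chains containing modified or unknown residues are dropped rather than repaired, which keeps the representative set restricted to standard protein sequences.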

Show top 10 structures

In [8]:
pdb.top(10)
Out[8]:
[('1FYE.A', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef707b1a58>),
 ('1FXL.A', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef702727f0>),
 ('1FVI.A', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef701a12b0>),
 ('1FV1.F', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef5d39d0f0>),
 ('1FTR.D', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef5d341048>),
 ('1FT5.A', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef43e000b8>),
 ('1FSG.C', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef43caf0b8>),
 ('1FS1.C', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef43ad1438>),
 ('1FR3.L', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef43aa2c18>),
 ('1FPZ.C', <mmtf.api.mmtf_writer.MMTFEncoder at 0x7fef43a19358>)]
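Each key above has the form `PDBID.chainName`, produced when `StructureToPolymerChains` splits entries into single chains. A small sketch shows how such keys can be taken apart; the `split_chain_id` helper is hypothetical, not an mmtfPyspark function:

```python
def split_chain_id(structure_id):
    """Split a 'PDBID.chainName' key into its PDB ID and chain name."""
    pdb_id, chain_name = structure_id.split(".", 1)
    return pdb_id, chain_name

print(split_chain_id("1FYE.A"))  # ('1FYE', 'A')
```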

Save representative set

In [9]:
write_path = f'./pdb_representatives_{sequenceIdentity}'

mmtfWriter.write_sequence_file(write_path, sc, pdb)

Terminate Spark

In [3]:
sc.stop()