Filter Exclusively By L Protein Demo¶

Simple example of reading an MMTF Hadoop Sequence file, filtering the entries exclusively by LProtein, and counting the number of entries. This example shows how methods can be chained for a more concise syntax.

Imports¶

In [2]:

from pyspark import SparkConf, SparkContext
from mmtfPyspark.io import mmtfReader
from mmtfPyspark.filters import ContainsLProteinChain
from mmtfPyspark.structureViewer import view_structure

Configure Spark¶

In [3]:

conf = SparkConf().setMaster("local[*]") \
                      .setAppName("FilterExclusivelyByLProtein")
sc = SparkContext(conf = conf)

Read in MMTF Files, filter by L protein, and count the entries¶

In [4]:

path =  "../../resources/mmtf_reduced_sample/"

structures = mmtfReader.read_sequence_file(path, sc) \
                .filter(ContainsLProteinChain(exclusive = True))

print(f"Number of L-Proteins: {structures.count()}")

Number of L-Proteins: 4506

Visualize Structures¶

In [5]:

structure_names = structures.keys().collect()
view_structure(structure_names, style='sphere')

Out[5]:

<function mmtfPyspark.structureViewer.view_structure.<locals>.view3d>

Terminate Spark¶

In [6]:

sc.stop()

Download Notebook Download Script