Author Search Demo


AuthorSearch shows how to query PDB structures by metadata. This example queries the name fields in the audit_author and citation_author categories.

References

Each category represents a table, and each field represents a database column; see: Available tables and columns

Data are provided through Mine 2 SQL.

Queries can be designed with the interactive PDBj Mine 2 query service.
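To make the category-to-table mapping concrete, here is a minimal sketch that models the two categories as tables in an in-memory SQLite database (Python's standard `sqlite3` module) and runs the same UNION query used in this notebook. The reduced column sets, the `TOY` PDB IDs, and the author names are hypothetical placeholders, not Mine 2 data. SQLite's LIKE is case-insensitive for ASCII letters by default, so this toy database may not reproduce the Mine 2 service's matching exactly.

```python
import sqlite3

# Hypothetical toy rows: placeholder IDs and names, not real Mine 2 records.
AUDIT_AUTHOR_ROWS = [
    ("TOY1", "Doudna, J.A."),
    ("TOY2", "Smith, A.B."),
]
CITATION_AUTHOR_ROWS = [
    # (pdbid, citation_id, name)
    ("TOY1", "primary", "Doudna, J.A."),
    ("TOY3", "primary", "Doudna, J.A."),
    ("TOY4", "1", "Doudna, J.A."),  # not the primary citation, so excluded
]

QUERY = (
    "SELECT pdbid FROM audit_author "
    "WHERE name LIKE 'Doudna%J.A.%' "
    "UNION "
    "SELECT pdbid FROM citation_author "
    "WHERE citation_id = 'primary' AND name LIKE 'Doudna%J.A.%'"
)


def run_toy_query():
    """Load the toy tables into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Each category becomes a table; each field becomes a column.
        conn.execute("CREATE TABLE audit_author (pdbid TEXT, name TEXT)")
        conn.execute(
            "CREATE TABLE citation_author (pdbid TEXT, citation_id TEXT, name TEXT)"
        )
        conn.executemany("INSERT INTO audit_author VALUES (?, ?)", AUDIT_AUTHOR_ROWS)
        conn.executemany(
            "INSERT INTO citation_author VALUES (?, ?, ?)", CITATION_AUTHOR_ROWS
        )
        # UNION removes duplicates: TOY1 matches both tables but is returned once.
        return sorted(row[0] for row in conn.execute(QUERY))
    finally:
        conn.close()


print(run_toy_query())
```

Because UNION deduplicates, an entry where the author appears both as a deposition author and in the primary citation is counted only once.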

Imports

In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.webfilters import PdbjMineSearch
from mmtfPyspark.io import mmtfReader

Configure Spark Context

In [2]:
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("AuthorSearchDemo")
sc = SparkContext(conf=conf)

Query to find PDB entries that list Doudna, J.A. as a deposition (audit) author or as an author of the primary citation.

In [6]:
# Deposition (audit) authors UNION authors of the primary citation
sqlQuery = ("SELECT pdbid FROM audit_author "
            "WHERE name LIKE 'Doudna%J.A.%' "
            "UNION "
            "SELECT pdbid FROM citation_author "
            "WHERE citation_id = 'primary' AND name LIKE 'Doudna%J.A.%'")
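The LIKE pattern `'Doudna%J.A.%'` uses `%` as a wildcard for any run of characters, while `.` is matched literally. The sketch below translates a LIKE pattern into a Python regular expression to show which author-name spellings it accepts. It assumes standard case-sensitive LIKE semantics without an ESCAPE clause; whether the Mine 2 backend behaves exactly this way is an assumption, and `like_to_regex` and `like` are hypothetical helpers, not part of mmtfPyspark.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (no ESCAPE clause) into a compiled regex.

    '%' matches any sequence of characters, '_' matches exactly one character,
    and every other character (including '.') is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets the wildcards match newlines too, as LIKE wildcards do.
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Case-sensitive LIKE test: the whole value must match the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


PATTERN = "Doudna%J.A.%"
for name in ["Doudna, J.A.", "Doudna, Jennifer A.", "Doudna, J.", "doudna, j.a."]:
    print(f"{name!r}: {like(name, PATTERN)}")
```

Because `J.A.` must appear verbatim, a name stored with the given names written out in full would not match this pattern.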

Read PDB structures and filter by author

In [8]:
path = "../../resources/mmtf_reduced_sample/"

# Keep only the entries whose PDB IDs are returned by the Mine 2 SQL query
pdb = mmtfReader.read_sequence_file(path, sc) \
                .filter(PdbjMineSearch(sqlQuery))

print(f"Number of entries matching query: {pdb.count()}")
Number of entries matching query: 90
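The filter above is applied to an RDD of (PDB ID, structure) pairs. Conceptually, a query-based filter can run the SQL query once, collect the returned `pdbid` values into a set, and keep each record whose ID is in that set. The plain-Python sketch below illustrates that idea with a list standing in for the RDD; `make_id_filter`, the toy records, and the handling of chain-level IDs such as `TOY3.A` are assumptions for illustration, not `PdbjMineSearch`'s actual implementation.

```python
def make_id_filter(pdb_ids):
    """Return a predicate for (structure_id, structure) pairs.

    Hypothetical stand-in for a query-based filter: keep a record if its ID,
    or the part of the ID before a '.' chain suffix, is in pdb_ids.
    """
    ids = {i.upper() for i in pdb_ids}

    def keep(record):
        structure_id = record[0].upper()
        return structure_id in ids or structure_id.split(".")[0] in ids

    return keep


# Toy records standing in for (PDB ID, structure) pairs in an RDD.
records = [
    ("TOY1", "structure 1"),
    ("TOY2", "structure 2"),
    ("TOY3.A", "chain A of TOY3"),
]
keep = make_id_filter(["TOY1", "TOY3"])  # e.g. IDs returned by a query
matches = [r for r in records if keep(r)]
print(f"Number of entries matching query: {len(matches)}")
```

With Spark, the same predicate could be passed to `rdd.filter(keep)`; building the ID set once keeps each per-record check a constant-time lookup.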

Terminate Spark Context

In [9]:
sc.stop()