AuthorSearch shows how to query PDB structures by metadata. This example queries the name fields in the audit_author and citation_author categories.
Each category represents a table, and each field represents a database column; see: Available tables and columns
Data are provided through: Mine 2 SQL
Queries can be designed and tested with the interactive PDBj Mine 2 query service.
In [1]:
from pyspark import SparkConf, SparkContext
from mmtfPyspark.webfilters import PdbjMineSearch
from mmtfPyspark.io import mmtfReader
In [2]:
# Run Spark locally, using all available cores
conf = SparkConf().setMaster("local[*]") \
                  .setAppName("AuthorSearchDemo")
sc = SparkContext(conf=conf)
In [6]:
sqlQuery = "SELECT pdbid FROM audit_author " \
    + "WHERE name LIKE 'Doudna%J.A.%' " \
    + "UNION " \
    + "SELECT pdbid FROM citation_author " \
    + "WHERE citation_id = 'primary' AND name LIKE 'Doudna%J.A.%'"
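To see how the `LIKE` pattern and `UNION` in this query behave, the following is a minimal sketch that runs the same SQL against an in-memory SQLite database. It does not contact PDBj: the two tables contain only the columns used in the query, and every row (the `demo…` IDs and the author names) is invented for illustration. The `%` wildcard matches any run of characters, so `'Doudna%J.A.%'` accepts variations between the surname and the initials, and `UNION` returns each `pdbid` only once even if it appears in both tables.

```python
import sqlite3

# In-memory stand-in for the two Mine 2 tables (assumption: only the
# columns used by the query are modelled; all rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_author (pdbid TEXT, name TEXT);
CREATE TABLE citation_author (pdbid TEXT, citation_id TEXT, name TEXT);
INSERT INTO audit_author VALUES
    ('demo1', 'Doudna, J.A.'),
    ('demo2', 'Smith, J.'),
    ('demo3', 'Doudna Cate, J.A.');
INSERT INTO citation_author VALUES
    ('demo1', 'primary', 'Doudna, J.A.'),
    ('demo4', 'primary', 'Doudna, J.A.'),
    ('demo5', '1', 'Doudna, J.A.');
""")

sql_query = (
    "SELECT pdbid FROM audit_author "
    "WHERE name LIKE 'Doudna%J.A.%' "
    "UNION "
    "SELECT pdbid FROM citation_author "
    "WHERE citation_id = 'primary' AND name LIKE 'Doudna%J.A.%'"
)

# Collect the distinct IDs returned by the UNION query
matches = sorted(row[0] for row in conn.execute(sql_query))
print(matches)
conn.close()
```

In this toy data set, `demo2` is dropped because the name does not match, `demo5` is dropped because its citation is not the primary one, and `demo1` is reported once although it occurs in both tables.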
In [8]:
path = "../../resources/mmtf_reduced_sample/"
pdb = mmtfReader.read_sequence_file(path, sc) \
                .filter(PdbjMineSearch(sqlQuery))
print(f"Number of entries matching query: {pdb.count()}")
Number of entries matching query: 90
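The filter step can be pictured without Spark or a network connection. The sketch below is an assumption about the general pattern, not the actual implementation of `PdbjMineSearch`: it assumes the data set is a collection of `(id, structure)` pairs and that the filter is a callable that keeps a pair when its ID is in the set returned by the query. The class `IdSetFilter`, the `DEMO…` IDs, and the placeholder structure values are all hypothetical.

```python
from typing import Any, Iterable, Set, Tuple


class IdSetFilter:
    """Hypothetical stand-in for a web filter: keeps pairs whose key is in a fixed ID set."""

    def __init__(self, ids: Iterable[str]) -> None:
        # Normalise case so 'demo1' and 'DEMO1' compare equal
        self.ids: Set[str] = {i.upper() for i in ids}

    def __call__(self, pair: Tuple[str, Any]) -> bool:
        # pair[0] is assumed to be the entry ID, pair[1] the structure
        return pair[0].upper() in self.ids


# Made-up records standing in for (id, structure) pairs
records = [
    ("DEMO1", "structure-1"),
    ("DEMO2", "structure-2"),
    ("DEMO3", "structure-3"),
]

# IDs as they might come back from a metadata query (invented here)
keep = IdSetFilter(["demo1", "demo3"])

kept = list(filter(keep, records))
print(f"Number of entries matching query: {len(kept)}")
```

Because the filter is just a callable returning `True` or `False`, the same object could be passed to any `filter`-style method that accepts a predicate.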
In [9]:
sc.stop()