DrugBank Demo

DrugBank

This demo shows how to access the open DrugBank dataset. The dataset contains drug identifiers and names (DrugBank IDs, CAS numbers, UNIIs, synonyms, and Standard InChIKeys) that can be used to link DrugBank entries to other data resources.
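Linking to another resource usually comes down to a key-based join on a shared identifier such as the Standard InChIKey. The sketch below shows that idea in plain Python so it runs without Spark; the records, their keys, and the `target` field are hypothetical placeholders, not real DrugBank or third-party data. With Spark DataFrames, the analogous step would be `openDrugLinks.join(other_df, on="StandardInChIKey")`, where `other_df` is an assumed second DataFrame.

```python
# Hypothetical records: IDs, keys and the "target" field are placeholders,
# not real DrugBank or third-party data.
drugbank_rows = [
    {"DrugBankID": "DB_A", "Commonname": "Drug A", "StandardInChIKey": "KEY1"},
    {"DrugBankID": "DB_B", "Commonname": "Drug B", "StandardInChIKey": "KEY2"},
    {"DrugBankID": "DB_C", "Commonname": "Drug C", "StandardInChIKey": None},
]
other_rows = [
    {"StandardInChIKey": "KEY1", "target": "Target X"},
    {"StandardInChIKey": "KEY3", "target": "Target Y"},
]


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key`, skipping rows whose key is None."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        if row.get(key) is not None:
            index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        k = row.get(key)
        if k is None:
            continue  # rows without an identifier cannot be linked
        for match in index.get(k, []):
            joined.append({**row, **match})
    return joined


result = inner_join(drugbank_rows, other_rows, "StandardInChIKey")
print(result)
```

Rows without a key are dropped before matching, which mirrors why the notebook filters out drugs that lack an InChIKey.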

Reference

Wishart DS, et al., DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi:10.1093/nar/gkx1037.

Imports

In [2]:
from pyspark.sql import SparkSession
from mmtfPyspark.datasets import drugBankDataset

Configure Spark Session

In [4]:
# local[*] runs Spark locally with as many worker threads as there are cores
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("DrugBankDemo") \
    .getOrCreate()

Download open DrugBank dataset

In [8]:
openDrugLinks = drugBankDataset.get_open_drug_links()

openDrugLinks.columns
Out[8]:
['DrugBankID',
 'AccessionNumbers',
 'Commonname',
 'CAS',
 'UNII',
 'Synonyms',
 'StandardInChIKey']

Find all drugs with an InChIKey

In [9]:
openDrugLinks = openDrugLinks.filter("StandardInChIKey IS NOT NULL")
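The filter above only removes missing values; it does not check that the remaining strings look like InChIKeys. A standard InChIKey has a fixed 27-character layout: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one uppercase letter. The helper below is a minimal sketch of that layout check; `is_inchikey` is a hypothetical name, and it checks the format only, not whether the key corresponds to a real structure.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(value):
    """Return True if `value` is a string with the 27-character InChIKey layout."""
    return isinstance(value, str) and INCHIKEY_RE.fullmatch(value) is not None


# Placeholder key with the right shape (not a real compound).
print(is_inchikey("A" * 14 + "-" + "B" * 10 + "-" + "N"))
print(is_inchikey("OIRCOABEOLEUMC-GE..."))  # truncated display value
```

On the DataFrame itself, a similar check could be expressed with `Column.rlike`, for example `openDrugLinks.filter(openDrugLinks.StandardInChIKey.rlike("^[A-Z]{14}-[A-Z]{10}-[A-Z]$"))`.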

Show some sample data

In [10]:
openDrugLinks.select("DrugBankID","Commonname","CAS","StandardInChIKey").show()
+----------+--------------------+-----------+--------------------+
|DrugBankID|          Commonname|        CAS|    StandardInChIKey|
+----------+--------------------+-----------+--------------------+
|   DB00006|         Bivalirudin|128270-60-0|OIRCOABEOLEUMC-GE...|
|   DB00014|           Goserelin| 65807-02-5|BLCLNMBMMGCOAS-UR...|
|   DB00027|        Gramicidin D|  1405-97-6|NDAYQJDHGXTBJL-MW...|
|   DB00035|        Desmopressin| 16679-58-6|NFLWUMRGJYTJIN-NX...|
|   DB00050|          Cetrorelix|120287-85-6|SBNPWPIBESPSIF-MH...|
|   DB00080|          Daptomycin|103060-53-3|DOAKLVKFURWEDJ-RW...|
|   DB00091|        Cyclosporine| 59865-13-3|PMATZTZNYRCHOR-CG...|
|   DB00093|         Felypressin|    56-59-7|SFKQVVDKFKYTNA-DZ...|
|   DB00104|          Octreotide| 83150-76-9|DEQANNDTNATYII-OU...|
|   DB00106|            Abarelix|183552-38-7|AIWRTTMUVOZGPW-HS...|
|   DB00114| Pyridoxal Phosphate|    54-47-7|NGVDGCNFYWLIFO-UH...|
|   DB00115|      Cyanocobalamin|    68-19-9|RMRCNWBMXRMIRW-WZ...|
|   DB00116|Tetrahydrofolic acid|   135-16-0|MSTNYGQPCMXVAQ-KI...|
|   DB00117|           Histidine|    71-00-1|HNDVDQJCIGZPNO-YF...|
|   DB00118|        Ademetionine| 29908-03-0|MEFKEPWMEQBLKI-AI...|
|   DB00119|        Pyruvic acid|   127-17-3|LCTONWCANYUPML-UH...|
|   DB00120|     L-Phenylalanine|    63-91-2|COLNVLDHVKWLRT-QM...|
|   DB00121|              Biotin|    58-85-5|YBJHBAHKTGYVGT-ZK...|
|   DB00122|             Choline|    62-49-7|OEYIOHPDSNJKLS-UH...|
|   DB00123|            L-Lysine|    56-87-1|KDXKERNSBIXSRK-YF...|
+----------+--------------------+-----------+--------------------+
only showing top 20 rows
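The CAS column holds CAS Registry Numbers, whose last digit is a checksum: each digit of the first two groups is multiplied by its position counted from the right (1, 2, 3, ...), and the sum modulo 10 must equal the check digit. The sketch below applies that rule; `cas_is_valid` is a hypothetical helper, not part of mmtfPyspark, and it treats missing values (None) as invalid.

```python
def cas_is_valid(cas):
    """Check a CAS Registry Number such as '71-00-1' against its check digit."""
    if not isinstance(cas, str):
        return False  # e.g. None for drugs without a CAS number
    parts = cas.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    body = first + second
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


# CAS numbers taken from the sample table above.
for cas in ["128270-60-0", "65807-02-5", "71-00-1", "58-85-5"]:
    print(cas, cas_is_valid(cas))
```

To apply it to the DataFrame, one option is wrapping it with `pyspark.sql.functions.udf(cas_is_valid, BooleanType())`, at the cost of running Python row by row.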

Download DrugBank dataset for approved drugs

The password-protected DrugBank datasets contain more information. To access them, you need to create a DrugBank account and supply your username and password.

Supply DrugBank account credentials

In [13]:
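# Replace the placeholders with your own DrugBank account credentials
username = "<your DrugBank account username>"
password = "<your DrugBank account password>"
# Download links for drugs in the "APPROVED" group
drugLinks = drugBankDataset.get_drug_links("APPROVED", username, password)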
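Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. One alternative is reading the credentials from environment variables. In this sketch, the variable names `DRUGBANK_USERNAME` and `DRUGBANK_PASSWORD` and the helper `drugbank_credentials` are hypothetical choices, not something mmtfPyspark or DrugBank defines.

```python
import os


def drugbank_credentials(env=os.environ):
    """Read DrugBank credentials from hypothetical environment variables."""
    username = env.get("DRUGBANK_USERNAME")
    password = env.get("DRUGBANK_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set DRUGBANK_USERNAME and DRUGBANK_PASSWORD first")
    return username, password
```

Usage, assuming the variables are set before the notebook starts: `username, password = drugbank_credentials()` followed by `drugLinks = drugBankDataset.get_drug_links("APPROVED", username, password)`.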

Terminate Spark

In [7]:
spark.stop()