# DrugBank Demo

![DrugBank](./figures/drugbank.jpg)

This demo shows how to access the open DrugBank dataset, which contains identifiers and names for integration with other data resources.

## Reference
 
Wishart DS, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi:10.1093/nar/gkx1037.

## Imports

In [2]:
from pyspark.sql import SparkSession
from mmtfPyspark.datasets import drugBankDataset

## Configure Spark Session

In [4]:
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("DrugBankDemo") \
    .getOrCreate()

## Download open DrugBank dataset

In [8]:
openDrugLinks = drugBankDataset.get_open_drug_links()

openDrugLinks.columns

['DrugBankID',
 'AccessionNumbers',
 'Commonname',
 'CAS',
 'UNII',
 'Synonyms',
 'StandardInChIKey']
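Among the columns listed above, `CAS` holds CAS Registry Numbers, whose last digit is a check digit. When these identifiers are used to link to other resources, it can help to reject malformed values first. The following is a minimal sketch in plain Python; the helper name `is_valid_cas` is hypothetical and not part of mmtfPyspark.

```python
import re

# CAS Registry Number layout: 2 to 7 digits, 2 digits, 1 check digit.
_CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def is_valid_cas(cas):
    """Return True if `cas` is well formed and its check digit matches."""
    match = _CAS_PATTERN.fullmatch(cas) if isinstance(cas, str) else None
    if match is None:
        return False
    # Weight the digits before the check digit 1, 2, 3, ... from right to left.
    digits = match.group(1) + match.group(2)
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(digits), start=1))
    # The weighted sum modulo 10 must equal the check digit.
    return total % 10 == int(match.group(3))


print(is_valid_cas("54-47-7"))
```

The function accepts `None` and returns `False` for it, so it could be applied to a column that contains nulls.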

## Find all drugs with an InChIKey

In [9]:
openDrugLinks = openDrugLinks.filter("StandardInChIKey IS NOT NULL")
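The filter above keeps every row whose `StandardInChIKey` is not null, without checking the value itself. If a stricter check is wanted, the shape of a standard InChIKey can be tested with a regular expression. This is a sketch only; the helper name `is_standard_inchikey` is an assumption for illustration and not part of mmtfPyspark.

```python
import re

# Standard InChIKey layout (27 characters):
#   14 letters - 8 letters + "S" (standard flag) + 1 version letter - 1 letter
_STANDARD_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{8}S[A-Z]-[A-Z]")


def is_standard_inchikey(value):
    """Return True if `value` has the shape of a standard InChIKey."""
    return (isinstance(value, str)
            and _STANDARD_INCHIKEY.fullmatch(value) is not None)


# Aspirin's InChIKey as a well-formed example.
print(is_standard_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Keys that `show()` has shortened for display (ending in `...`) do not pass this check, so it should be run on the column values rather than on printed output.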

## Show some sample data

In [10]:
openDrugLinks.select("DrugBankID","Commonname","CAS","StandardInChIKey").show()

+----------+--------------------+-----------+--------------------+
|DrugBankID|          Commonname|        CAS|    StandardInChIKey|
+----------+--------------------+-----------+--------------------+
|   DB00006|         Bivalirudin|128270-60-0|OIRCOABEOLEUMC-GE...|
|   DB00014|           Goserelin| 65807-02-5|BLCLNMBMMGCOAS-UR...|
|   DB00027|        Gramicidin D|  1405-97-6|NDAYQJDHGXTBJL-MW...|
|   DB00035|        Desmopressin| 16679-58-6|NFLWUMRGJYTJIN-NX...|
|   DB00050|          Cetrorelix|120287-85-6|SBNPWPIBESPSIF-MH...|
|   DB00080|          Daptomycin|103060-53-3|DOAKLVKFURWEDJ-RW...|
|   DB00091|        Cyclosporine| 59865-13-3|PMATZTZNYRCHOR-CG...|
|   DB00093|         Felypressin|    56-59-7|SFKQVVDKFKYTNA-DZ...|
|   DB00104|          Octreotide| 83150-76-9|DEQANNDTNATYII-OU...|
|   DB00106|            Abarelix|183552-38-7|AIWRTTMUVOZGPW-HS...|
|   DB00114| Pyridoxal Phosphate|    54-47-7|NGVDGCNFYWLIFO-UH...|
|   DB00115|      Cyanocobalamin|    68-19-9|RMRCNWBMXRMIRW-WZ...|
|   DB00116|Tetrahydrofolic acid|   135-16-0|MSTNYGQPCMXVAQ-KI...|
|   DB00117|           Histidine|    71-00-1|HNDVDQJCIGZPNO-YF...|
| DB00118| Ademetionine

## Download DrugBank dataset for approved drugs

The password-protected DrugBank datasets contain more information.
You need to create a DrugBank account and supply your username and password to access these datasets.

[Create DrugBank account](https://www.drugbank.ca/public_users/sign_up)

In [13]:
# Fill in your DrugBank account credentials
username = ""
password = ""
# Request the dataset for approved drugs
drugLinks = drugBankDataset.get_drug_links("APPROVED", username, password)
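Typing credentials directly into a notebook makes them easy to leak when the notebook is shared. One possible alternative, sketched here under the assumption that the credentials are stored in environment variables, is to read them at run time. The variable names `DRUGBANK_USERNAME` and `DRUGBANK_PASSWORD` and the helper `read_drugbank_credentials` are hypothetical and not part of mmtfPyspark.

```python
import os


def read_drugbank_credentials(env=os.environ):
    """Read DrugBank credentials from a mapping (default: the environment).

    The variable names are an assumption made for this sketch.
    """
    username = env.get("DRUGBANK_USERNAME", "")
    password = env.get("DRUGBANK_PASSWORD", "")
    if not username or not password:
        # Fail early with a clear message instead of sending empty credentials.
        raise ValueError(
            "Set DRUGBANK_USERNAME and DRUGBANK_PASSWORD before running this cell."
        )
    return username, password
```

With both variables set, `username, password = read_drugbank_credentials()` could replace the two empty strings in the cell above before calling `get_drug_links`.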

## Show some sample data from DrugLinks

In [21]:
drugLinks.select("DrugBankID", "Name", "CASNumber", "Formula", "PubChemCompoundID",
                 "PubChemSubstanceID", "ChEBIID", "ChemSpiderID").show()

+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+
|DrugBankID|               Name|  CASNumber|        Formula|PubChemCompoundID|PubChemSubstanceID|ChEBIID|ChemSpiderID|
+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+
|   DB00006|        Bivalirudin|128270-60-0|  C98H138N24O33|         16129704|          46507415|  59173|    10482069|
|   DB00014|          Goserelin| 65807-02-5|   C59H84N18O14|          5311128|          46507336|   5523|     4470656|
|   DB00027|       Gramicidin D|  1405-97-6|  C96H135N19O16|         45267103|          46507412|   null|    24623445|
|   DB00035|       Desmopressin| 16679-58-6| C46H64N14O12S2|         16051933|          46507014|   4450|    10481973|
|   DB00050|         Cetrorelix|120287-85-6| C70H92ClN17O14|         25074887|          46505494|  59224|    10482082|
|   DB00067|        Vasopressin| 11000-17-2|           null|             null|          46505933|   null|        null|
|   DB00080|         Daptomycin|103060-53-3|  C72H101N17O26|         16134395|          46504551| 600103|    10482098|
| DB00091| Cyclosporine| 59865-13-3| C62H111N11O12| 5284373| 46508198| 40

## Terminate Spark

In [7]:
spark.stop()