{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Drug Bank Demo\n",
"\n",
"![DrugBank](./figures/drugbank.jpg)\n",
"\n",
"This demo demonstrates how to access the open DrugBank dataset. This dataset contains identifiers and names for integration with other data resources.\n",
"\n",
"## Reference\n",
" \n",
"Wishart DS, et al., DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8.\n",
"\n",
"doi:10.1093/nar/gkx1037.\n",
"\n",
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.sql import SparkSession\n",
"from mmtfPyspark.datasets import drugBankDataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure Spark Session"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"spark = SparkSession.builder\\\n",
" .master(\"local[*]\")\\\n",
" .appName(\"DrugBankDemo\") \\\n",
" .getOrCreate()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download open DrugBank dataset"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['DrugBankID',\n",
" 'AccessionNumbers',\n",
" 'Commonname',\n",
" 'CAS',\n",
" 'UNII',\n",
" 'Synonyms',\n",
" 'StandardInChIKey']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"openDrugLinks = drugBankDataset.get_open_drug_links()\n",
"\n",
"openDrugLinks.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Find all drugs with an InChIKey"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"openDrugLinks = openDrugLinks.filter(\"StandardInChIKey IS NOT NULL\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Show some sample data"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+----------+--------------------+-----------+--------------------+\n",
"|DrugBankID| Commonname| CAS| StandardInChIKey|\n",
"+----------+--------------------+-----------+--------------------+\n",
"| DB00006| Bivalirudin|128270-60-0|OIRCOABEOLEUMC-GE...|\n",
"| DB00014| Goserelin| 65807-02-5|BLCLNMBMMGCOAS-UR...|\n",
"| DB00027| Gramicidin D| 1405-97-6|NDAYQJDHGXTBJL-MW...|\n",
"| DB00035| Desmopressin| 16679-58-6|NFLWUMRGJYTJIN-NX...|\n",
"| DB00050| Cetrorelix|120287-85-6|SBNPWPIBESPSIF-MH...|\n",
"| DB00080| Daptomycin|103060-53-3|DOAKLVKFURWEDJ-RW...|\n",
"| DB00091| Cyclosporine| 59865-13-3|PMATZTZNYRCHOR-CG...|\n",
"| DB00093| Felypressin| 56-59-7|SFKQVVDKFKYTNA-DZ...|\n",
"| DB00104| Octreotide| 83150-76-9|DEQANNDTNATYII-OU...|\n",
"| DB00106| Abarelix|183552-38-7|AIWRTTMUVOZGPW-HS...|\n",
"| DB00114| Pyridoxal Phosphate| 54-47-7|NGVDGCNFYWLIFO-UH...|\n",
"| DB00115| Cyanocobalamin| 68-19-9|RMRCNWBMXRMIRW-WZ...|\n",
"| DB00116|Tetrahydrofolic acid| 135-16-0|MSTNYGQPCMXVAQ-KI...|\n",
"| DB00117| Histidine| 71-00-1|HNDVDQJCIGZPNO-YF...|\n",
"| DB00118| Ademetionine| 29908-03-0|MEFKEPWMEQBLKI-AI...|\n",
"| DB00119| Pyruvic acid| 127-17-3|LCTONWCANYUPML-UH...|\n",
"| DB00120| L-Phenylalanine| 63-91-2|COLNVLDHVKWLRT-QM...|\n",
"| DB00121| Biotin| 58-85-5|YBJHBAHKTGYVGT-ZK...|\n",
"| DB00122| Choline| 62-49-7|OEYIOHPDSNJKLS-UH...|\n",
"| DB00123| L-Lysine| 56-87-1|KDXKERNSBIXSRK-YF...|\n",
"+----------+--------------------+-----------+--------------------+\n",
"only showing top 20 rows\n",
"\n"
]
}
],
"source": [
"openDrugLinks.select(\"DrugBankID\",\"Commonname\",\"CAS\",\"StandardInChIKey\").show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download DrugBank dataset for approved drugs\n",
"\n",
"The DrugBank password protected datasets contain more information.\n",
"YOu need to create a DrugBank account and supply username/passwork to access these datasets.\n",
"\n",
"[Create DrugBank account](https://www.drugbank.ca/public_users/sign_up)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"username = \"\"\n",
"password = \"\"\n",
"drugLinks = drugBankDataset.get_drug_links(\"APPROVED\", username,password)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Show some sample data from DrugLinks"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n",
"|DrugBankID| Name| CASNumber| Formula|PubChemCompoundID|PubChemSubstanceID|ChEBIID|ChemSpiderID|\n",
"+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n",
"| DB00006| Bivalirudin|128270-60-0| C98H138N24O33| 16129704| 46507415| 59173| 10482069|\n",
"| DB00014| Goserelin| 65807-02-5| C59H84N18O14| 5311128| 46507336| 5523| 4470656|\n",
"| DB00027| Gramicidin D| 1405-97-6| C96H135N19O16| 45267103| 46507412| null| 24623445|\n",
"| DB00035| Desmopressin| 16679-58-6| C46H64N14O12S2| 16051933| 46507014| 4450| 10481973|\n",
"| DB00050| Cetrorelix|120287-85-6| C70H92ClN17O14| 25074887| 46505494| 59224| 10482082|\n",
"| DB00067| Vasopressin| 11000-17-2| null| null| 46505933| null| null|\n",
"| DB00080| Daptomycin|103060-53-3| C72H101N17O26| 16134395| 46504551| 600103| 10482098|\n",
"| DB00091| Cyclosporine| 59865-13-3| C62H111N11O12| 5284373| 46508198| 4031| 4447449|\n",
"| DB00093| Felypressin| 56-59-7| C46H65N13O11S2| 14257662| 46507522| 60564| 16736539|\n",
"| DB00104| Octreotide| 83150-76-9| C49H66N10O10S2| 448601| 46504600| null| 395352|\n",
"| DB00106| Abarelix|183552-38-7| C72H95ClN14O14| 16131215| 46508237| 337298| 10482301|\n",
"| DB00114|Pyridoxal Phosphate| 54-47-7| C8H10NO6P| 1051| 46506428| 18405| 1022|\n",
"| DB00115| Cyanocobalamin| 68-19-9|C63H88CoN14O14P| 70678590| 46509031| 17439| 21864832|\n",
"| DB00117| Histidine| 71-00-1| C6H9N3O2| 6274| 46507001| 15971| 6038|\n",
"| DB00118| Ademetionine| 29908-03-0| C15H22N6O5S| 34755| 46505280| 67040| 31982|\n",
"| DB00119| Pyruvic acid| 127-17-3| C3H4O3| 1060| 46505692| 32816| 1031|\n",
"| DB00120| L-Phenylalanine| 63-91-2| C9H11NO2| 6140| 46505708| 17295| 5910|\n",
"| DB00121| Biotin| 58-85-5| C10H16N2O3S| 171548| 46508694| 15956| 149962|\n",
"| DB00122| Choline| 62-49-7| C5H14NO| 305| 46508132| 15354| 299|\n",
"| DB00123| L-Lysine| 56-87-1| C6H14N2O2| 5962| 46504770| 18019| 5747|\n",
"+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n",
"only showing top 20 rows\n",
"\n"
]
}
],
"source": [
"drugLinks.select(\"DrugBankID\",\"Name\",\"CASNumber\",\"Formula\",\"PubChemCompoundID\",\\\n",
" \"PubChemSubstanceID\",\"ChEBIID\",\"ChemSpiderID\").show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Terminate Spark"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"sc.stop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}