{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DrugBank Demo\n", "\n", "![DrugBank](./figures/drugbank.jpg)\n", "\n", "This demo shows how to access the open DrugBank dataset, which contains drug identifiers and names that can be used to link DrugBank with other data resources.\n", "\n", "## Reference\n", "\n", "Wishart DS, et al., DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8.\n", "\n", "doi:10.1093/nar/gkx1037.\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "from mmtfPyspark.datasets import drugBankDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure Spark Session" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "spark = SparkSession.builder\\\n", "    .master(\"local[*]\")\\\n", "    .appName(\"DrugBankDemo\")\\\n", "    .getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download open DrugBank dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['DrugBankID',\n", " 'AccessionNumbers',\n", " 'Commonname',\n", " 'CAS',\n", " 'UNII',\n", " 'Synonyms',\n", " 'StandardInChIKey']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "openDrugLinks = drugBankDataset.get_open_drug_links()\n", "\n", "openDrugLinks.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find all drugs with an InChIKey" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "openDrugLinks = openDrugLinks.filter(\"StandardInChIKey IS NOT NULL\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Show some sample data" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "+----------+--------------------+-----------+--------------------+\n", "|DrugBankID|          Commonname|        CAS|    StandardInChIKey|\n", "+----------+--------------------+-----------+--------------------+\n", "|   DB00006|         Bivalirudin|128270-60-0|OIRCOABEOLEUMC-GE...|\n", "|   DB00014|           Goserelin| 65807-02-5|BLCLNMBMMGCOAS-UR...|\n", "|   DB00027|        Gramicidin D|  1405-97-6|NDAYQJDHGXTBJL-MW...|\n", "|   DB00035|        Desmopressin| 16679-58-6|NFLWUMRGJYTJIN-NX...|\n", "|   DB00050|          Cetrorelix|120287-85-6|SBNPWPIBESPSIF-MH...|\n", "|   DB00080|          Daptomycin|103060-53-3|DOAKLVKFURWEDJ-RW...|\n", "|   DB00091|        Cyclosporine| 59865-13-3|PMATZTZNYRCHOR-CG...|\n", "|   DB00093|         Felypressin|    56-59-7|SFKQVVDKFKYTNA-DZ...|\n", "|   DB00104|          Octreotide| 83150-76-9|DEQANNDTNATYII-OU...|\n", "|   DB00106|            Abarelix|183552-38-7|AIWRTTMUVOZGPW-HS...|\n", "|   DB00114| Pyridoxal Phosphate|    54-47-7|NGVDGCNFYWLIFO-UH...|\n", "|   DB00115|      Cyanocobalamin|    68-19-9|RMRCNWBMXRMIRW-WZ...|\n", "|   DB00116|Tetrahydrofolic acid|   135-16-0|MSTNYGQPCMXVAQ-KI...|\n", "|   DB00117|           Histidine|    71-00-1|HNDVDQJCIGZPNO-YF...|\n", "|   DB00118|        Ademetionine| 29908-03-0|MEFKEPWMEQBLKI-AI...|\n", "|   DB00119|        Pyruvic acid|   127-17-3|LCTONWCANYUPML-UH...|\n", "|   DB00120|     L-Phenylalanine|    63-91-2|COLNVLDHVKWLRT-QM...|\n", "|   DB00121|              Biotin|    58-85-5|YBJHBAHKTGYVGT-ZK...|\n", "|   DB00122|             Choline|    62-49-7|OEYIOHPDSNJKLS-UH...|\n", "|   DB00123|            L-Lysine|    56-87-1|KDXKERNSBIXSRK-YF...|\n", "+----------+--------------------+-----------+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "openDrugLinks.select(\"DrugBankID\",\"Commonname\",\"CAS\",\"StandardInChIKey\").show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download DrugBank dataset for approved drugs\n", "\n", "The password-protected DrugBank datasets contain more information.\n", "You need to create a DrugBank account and supply your username and password to access these datasets.\n", "\n", "[Create DrugBank account](https://www.drugbank.ca/public_users/sign_up)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "username = \"\"  # your DrugBank username\n", "password = \"\"  # your DrugBank password\n", "drugLinks = drugBankDataset.get_drug_links(\"APPROVED\", username, password)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Show some sample data from DrugLinks" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n", "|DrugBankID|               Name|  CASNumber|        Formula|PubChemCompoundID|PubChemSubstanceID|ChEBIID|ChemSpiderID|\n", "+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n", "|   DB00006|        Bivalirudin|128270-60-0|  C98H138N24O33|         16129704|          46507415|  59173|    10482069|\n", "|   DB00014|          Goserelin| 65807-02-5|   C59H84N18O14|          5311128|          46507336|   5523|     4470656|\n", "|   DB00027|       Gramicidin D|  1405-97-6|  C96H135N19O16|         45267103|          46507412|   null|    24623445|\n", "|   DB00035|       Desmopressin| 16679-58-6| C46H64N14O12S2|         16051933|          46507014|   4450|    10481973|\n", "|   DB00050|         Cetrorelix|120287-85-6| C70H92ClN17O14|         25074887|          46505494|  59224|    10482082|\n", "|   DB00067|        Vasopressin| 11000-17-2|           null|             null|          46505933|   null|        null|\n", "|   DB00080|         Daptomycin|103060-53-3|  C72H101N17O26|         16134395|          46504551| 600103|    10482098|\n", "|   DB00091|       Cyclosporine| 59865-13-3|  C62H111N11O12|          5284373|          46508198|   4031|     4447449|\n", "|   DB00093|        Felypressin|    56-59-7| C46H65N13O11S2|         14257662|          46507522|  60564|    16736539|\n", "|   DB00104|         Octreotide| 83150-76-9| C49H66N10O10S2|           448601|          46504600|   null|      395352|\n", "|   DB00106|           Abarelix|183552-38-7| C72H95ClN14O14|         16131215|          46508237| 337298|    10482301|\n", "|   DB00114|Pyridoxal Phosphate|    54-47-7|      C8H10NO6P|             1051|          46506428|  18405|        1022|\n", "|   DB00115|     Cyanocobalamin|    68-19-9|C63H88CoN14O14P|         70678590|          46509031|  17439|    21864832|\n", "|   DB00117|          Histidine|    71-00-1|       C6H9N3O2|             6274|          46507001|  15971|        6038|\n", "|   DB00118|       Ademetionine| 29908-03-0|    C15H22N6O5S|            34755|          46505280|  67040|       31982|\n", "|   DB00119|       Pyruvic acid|   127-17-3|         C3H4O3|             1060|          46505692|  32816|        1031|\n", "|   DB00120|    L-Phenylalanine|    63-91-2|       C9H11NO2|             6140|          46505708|  17295|        5910|\n", "|   DB00121|             Biotin|    58-85-5|    C10H16N2O3S|           171548|          46508694|  15956|      149962|\n", "|   DB00122|            Choline|    62-49-7|        C5H14NO|              305|          46508132|  15354|         299|\n", "|   DB00123|           L-Lysine|    56-87-1|      C6H14N2O2|             5962|          46504770|  18019|        5747|\n", "+----------+-------------------+-----------+---------------+-----------------+------------------+-------+------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "drugLinks.select(\"DrugBankID\",\"Name\",\"CASNumber\",\"Formula\",\"PubChemCompoundID\",\n", "                 \"PubChemSubstanceID\",\"ChEBIID\",\"ChemSpiderID\").show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Terminate Spark" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "spark.stop()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }