{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Swiss Model Dataset\n", "\n", "This demo shows how to access metadata for SWISS-MODEL homology models.\n", "\n", "## Reference\n", " \n", "Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L, Schwede T (2017). The SWISS-MODEL Repository - new features and functionality, Nucleic Acids Res. 45(D1):D313-D319.\n", " * https://dx.doi.org/10.1093/nar/gkw1132\n", " \n", "Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T(2014). The SWISS-MODEL Repository - modelling protein tertiary and quaternary structure using evolutionary information, Nucleic Acids Res. 42(W1):W252–W258.\n", " * https://doi.org/10.1093/nar/gku340\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "from mmtfPyspark.datasets import swissModelDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure Spark Session" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "spark = SparkSession.builder\\\n", " .master(\"local[*]\")\\\n", " .appName(\"SwissModelDatasetDemo\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download metadata for Swiss-Model homology" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# list of uniProtIds to be retrived from Swiss-Model\n", "uniProtIds = ['P36575','P24539','O00244']\n", "\n", "ds = swissModelDataset.get_swiss_models(uniProtIds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Show results" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
acsequencefromtoqmeanqmean_normgmqecoverageoligo-statemethodtemplateidentitysimilaritycoordinatesmd5md5
0P36575MSKVFKKTSSNGKLSIYLGKRDFVDHVDTVEPIDGVVLVDPEYLKC...2371-3.2063830.6580180.7570.953608monomerHomology1suj.1.A68.6648480.504633https://swissmodel.expasy.org/repository/unipr...1b2ec664c28f6cde36c416b6a66fc5911b2ec664c28f6cde36c416b6a66fc591
1P24539MLSRVVLSAAATAAPSLKNAAFLGPGVLQATRTFHTGQPHLVPVPP...76249-2.5436230.6698410.6560.679688monomerHomology5ara.1.S84.4827580.547889https://swissmodel.expasy.org/repository/unipr...138e5aeaf02a8fa2e9c52264e5383033138e5aeaf02a8fa2e9c52264e5383033
2O00244MPKHEFSVDMTCGGCAEAVSRVLNKLGGVKYDIDLPNKKVCIESEH...1681.0471340.8423320.9871.000000homo-2-merHomology1fe4.1.B100.0000000.606865https://swissmodel.expasy.org/repository/unipr...34f221f64be3395aa958786b84dfc0da34f221f64be3395aa958786b84dfc0da
\n", "
" ], "text/plain": [ " ac sequence from to \\\n", "0 P36575 MSKVFKKTSSNGKLSIYLGKRDFVDHVDTVEPIDGVVLVDPEYLKC... 2 371 \n", "1 P24539 MLSRVVLSAAATAAPSLKNAAFLGPGVLQATRTFHTGQPHLVPVPP... 76 249 \n", "2 O00244 MPKHEFSVDMTCGGCAEAVSRVLNKLGGVKYDIDLPNKKVCIESEH... 1 68 \n", "\n", " qmean qmean_norm gmqe coverage oligo-state method template \\\n", "0 -3.206383 0.658018 0.757 0.953608 monomer Homology 1suj.1.A \n", "1 -2.543623 0.669841 0.656 0.679688 monomer Homology 5ara.1.S \n", "2 1.047134 0.842332 0.987 1.000000 homo-2-mer Homology 1fe4.1.B \n", "\n", " identity similarity coordinates \\\n", "0 68.664848 0.504633 https://swissmodel.expasy.org/repository/unipr... \n", "1 84.482758 0.547889 https://swissmodel.expasy.org/repository/unipr... \n", "2 100.000000 0.606865 https://swissmodel.expasy.org/repository/unipr... \n", "\n", " md5 md5 \n", "0 1b2ec664c28f6cde36c416b6a66fc591 1b2ec664c28f6cde36c416b6a66fc591 \n", "1 138e5aeaf02a8fa2e9c52264e5383033 138e5aeaf02a8fa2e9c52264e5383033 \n", "2 34f221f64be3395aa958786b84dfc0da 34f221f64be3395aa958786b84dfc0da " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = ds.toPandas()\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Terminate Spark" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sc.stop()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }