{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SWISS-MODEL Dataset\n", "\n", "This demo shows how to access metadata for SWISS-MODEL homology models.\n", "\n", "## References\n", " \n", "Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L, Schwede T (2017). The SWISS-MODEL Repository - new features and functionality, Nucleic Acids Res. 45(D1):D313-D319.\n", " * https://dx.doi.org/10.1093/nar/gkw1132\n", " \n", "Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information, Nucleic Acids Res. 42(W1):W252-W258.\n", " * https://doi.org/10.1093/nar/gku340\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "from mmtfPyspark.datasets import swissModelDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure Spark Session" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "spark = SparkSession.builder\\\n", " .master(\"local[*]\")\\\n", " .appName(\"SwissModelDatasetDemo\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download metadata for SWISS-MODEL homology models" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# List of UniProt IDs to retrieve from SWISS-MODEL\n", "uniProtIds = ['P36575','P24539','O00244']\n", "\n", "ds = swissModelDataset.get_swiss_models(uniProtIds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Show results" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | ac | sequence | from | to | qmean | qmean_norm | gmqe | coverage | oligo-state | method | template | identity | similarity | coordinates | md5 | md5 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | P36575 | MSKVFKKTSSNGKLSIYLGKRDFVDHVDTVEPIDGVVLVDPEYLKC... | 2 | 371 | -3.206383 | 0.658018 | 0.757 | 0.953608 | monomer | Homology | 1suj.1.A | 68.664848 | 0.504633 | https://swissmodel.expasy.org/repository/unipr... | 1b2ec664c28f6cde36c416b6a66fc591 | 1b2ec664c28f6cde36c416b6a66fc591 |\n",
"| 1 | P24539 | MLSRVVLSAAATAAPSLKNAAFLGPGVLQATRTFHTGQPHLVPVPP... | 76 | 249 | -2.543623 | 0.669841 | 0.656 | 0.679688 | monomer | Homology | 5ara.1.S | 84.482758 | 0.547889 | https://swissmodel.expasy.org/repository/unipr... | 138e5aeaf02a8fa2e9c52264e5383033 | 138e5aeaf02a8fa2e9c52264e5383033 |\n",
"| 2 | O00244 | MPKHEFSVDMTCGGCAEAVSRVLNKLGGVKYDIDLPNKKVCIESEH... | 1 | 68 | 1.047134 | 0.842332 | 0.987 | 1.000000 | homo-2-mer | Homology | 1fe4.1.B | 100.000000 | 0.606865 | https://swissmodel.expasy.org/repository/unipr... | 34f221f64be3395aa958786b84dfc0da | 34f221f64be3395aa958786b84dfc0da |\n", "