Download the Python 3.7 Anaconda installer and install Anaconda.
The Git version control system is used to download repositories from GitHub.
To check whether Git is installed, type the following line in your terminal:
git --version
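The check above can also be scripted so that a missing Git installation is reported clearly. This is a minimal sketch, not part of mmtf-pyspark; the helper name `have_cmd` is a hypothetical name chosen for illustration, and it relies only on the standard `command -v` shell builtin.

```shell
# Hypothetical helper: succeed if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd git; then
    # Git is available: print its version, as in the manual check.
    git --version
else
    echo "git not found; install Git before cloning the repository" >&2
fi
```

The same helper can be reused to check for `conda` before creating the environment.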
A conda environment is a directory that contains a specific collection of conda packages that you have installed. If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them.
git clone https://github.com/sbl-sdsc/mmtf-pyspark.git
cd mmtf-pyspark
conda env create -f binder/environment.yml
conda activate mmtf-pyspark
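To confirm which environment is active after running the commands above, one option is to inspect the `CONDA_DEFAULT_ENV` variable, which conda sets on activation. The sketch below assumes a POSIX shell; the function name `active_env` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: print the name of the active conda environment,
# or "none" if no environment is active.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "active environment: $(active_env)"

# If conda is on PATH, also list all environments.
if command -v conda >/dev/null 2>&1; then
    conda env list
fi
```

After `conda activate mmtf-pyspark`, the first line of output should name `mmtf-pyspark`; after `conda deactivate`, it should name whichever environment was active before.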
python test_mmtfPyspark.py
If the metadata for 1AQ1 are printed, you have successfully installed mmtf-pyspark.
jupyter notebook
In Jupyter Notebook, open DataAnalysisExample.ipynb and run it.
Notebooks that demonstrate the use of the mmtf-pyspark API are available in the demos directory.
conda deactivate
Activate the environment again if you want to use mmtf-pyspark.
To permanently remove the environment, type:
conda remove -n mmtf-pyspark --all
The entire PDB can be downloaded as an MMTF Hadoop sequence file, and the required environment variables set, by running the following commands:
curl https://raw.githubusercontent.com/sbl-sdsc/mmtf-pyspark/master/bin/download_mmtf_files.sh -o download_mmtf_files.sh
. ./download_mmtf_files.sh
The default download location is the user’s home directory. To specify another directory, use the -o flag:
curl https://raw.githubusercontent.com/sbl-sdsc/mmtf-pyspark/master/bin/download_mmtf_files.sh -o download_mmtf_files.sh
. ./download_mmtf_files.sh -o {YOUR_LOCAL_DIRECTORY}
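The leading `. ` in the commands above sources the script in the current shell instead of running it as a child process, which is why the environment variables it sets are still available afterwards. The sketch below illustrates the difference with a throwaway script; the variable name `DEMO_MMTF_DIR` and the path `/tmp/demo-mmtf` are hypothetical stand-ins, not names used by download_mmtf_files.sh.

```shell
# Start from a clean state so the comparison is meaningful.
unset DEMO_MMTF_DIR

# Create a throwaway script that exports a variable
# (hypothetical stand-in for download_mmtf_files.sh).
tmp_script=$(mktemp)
echo 'export DEMO_MMTF_DIR=/tmp/demo-mmtf' > "$tmp_script"

# Executed as a child process: the export is lost when the child exits.
sh "$tmp_script"
after_exec="${DEMO_MMTF_DIR:-unset}"

# Sourced with ".": the export happens in the current shell and persists.
. "$tmp_script"
after_source="${DEMO_MMTF_DIR:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$tmp_script"
```

Variables set this way last only for the current terminal session. Whether the download script also records them in a shell startup file is not stated here, so check the script itself if they need to persist.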