uniProt.py
This module downloads UniProt sequence files in FASTA format, reads them, and converts them to datasets. It supports the following files:
- SWISS_PROT
- TREMBL
- UNIREF50
- UNIREF90
- UNIREF100
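The module's own parsing code is not shown here. As a rough illustration of what reading a FASTA file and converting it to records involves, here is a minimal plain-Python sketch (no Spark). It splits FASTA text into header/sequence pairs, joining wrapped sequence lines. It then breaks a UniProtKB-style header (`db|Accession|EntryName Description`) into fields. The names `FastaRecord`, `parse_fasta`, and `split_uniprotkb_header` are hypothetical and are not part of this module's API, and the sample records are invented. The sketch assumes the SWISS_PROT/TREMBL header layout; UniRef headers (`UniRef50_...`) use a different format and are not handled.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional


class FastaRecord(NamedTuple):
    """One FASTA entry: the header line (without '>') and the joined sequence."""
    header: str
    sequence: str


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield FastaRecord objects from an iterable of FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield FastaRecord(header, "".join(chunks))  # flush the last record


def split_uniprotkb_header(header: str) -> Dict[str, str]:
    """Split a hypothetical UniProtKB-style header into its leading fields.

    Assumes 'db|Accession|EntryName Description...'; UniRef headers differ.
    """
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|", 2)
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# Invented sample input, for illustration only.
sample = [
    ">sp|P00000|TOY1_HUMAN Made-up protein one OS=Homo sapiens",
    "MKTAYIAKQR",
    "QISFVKSHFS",
    ">tr|A0A000|TOY2_MOUSE Made-up protein two",
    "MVLSPADKT",
]

for rec in parse_fasta(sample):
    print(split_uniprotkb_header(rec.header)["accession"], rec.sequence)
# prints:
# P00000 MKTAYIAKQRQISFVKSHFS
# A0A000 MVLSPADKT
```

Turning these records into a Spark dataset is the step this sketch deliberately leaves out.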
Download, read, and save the SWISS_PROT dataset:
>>> ds = uniProt.get_dataset(UniProtDataset.SWISS_PROT)
>>> ds.printSchema()
>>> ds.show(5)
>>> ds.write.mode("overwrite").format("parquet").save(fileName)