mmtfPyspark.datasets.g2sDataset module

g2sDataset.py

This module maps positions of human genetic variations to positions in PDB structures. Genomic positions must be specified against the hgvs-grch37 reference genome using the HGVS sequence variant nomenclature.

Examples

>>> variantIds = ["chr7:g.140449098T>C", "chr7:g.140449100T>C"]
>>> ds = g2sDataset.get_position_dataset(variantIds, "3TV4", "A")
>>> ds.show()
+-----------+-------+-----------+------------+-----------+-------------------+
|structureId|chainId|pdbPosition|pdbAminoAcid|  refGenome|        variationId|
+-----------+-------+-----------+------------+-----------+-------------------+
|       3TV4|      A|        661|           N|hgvs-grch37|chr7:g.140449098T>C|
|       3TV4|      A|        660|           N|hgvs-grch37|chr7:g.140449100T>C|
+-----------+-------+-----------+------------+-----------+-------------------+
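
The variation ids in the example above all follow one pattern: a chr-prefixed chromosome, a genomic position, and a single-nucleotide substitution. As a minimal sketch, ids of that shape could be checked before a request is sent. The helper parse_snv and its regular expression are hypothetical and not part of mmtfPyspark. They cover only this substitution form, not the full HGVS nomenclature.

```python
import re

# Assumption: only the "chr<N>:g.<position><ref>><alt>" substitution form
# seen in the examples is accepted; this is not a full HGVS parser.
_SNV_PATTERN = re.compile(
    r"^chr(?P<chrom>[0-9]{1,2}|X|Y|MT?):g\."
    r"(?P<pos>[1-9][0-9]*)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_snv(variation_id):
    """Return (chromosome, position, ref, alt), or None if the id does not match."""
    match = _SNV_PATTERN.match(variation_id)
    if match is None:
        return None
    return (
        match.group("chrom"),
        int(match.group("pos")),
        match.group("ref"),
        match.group("alt"),
    )


# Keep only the ids that match the expected form.
candidate_ids = ["chr7:g.140449098T>C", "chr7:g.140449100T>C", "7:140449098T>C"]
valid_ids = [v for v in candidate_ids if parse_snv(v) is not None]
```

Filtering on the client side like this only catches obviously malformed ids. Whether a well-formed id actually maps to a PDB residue is still decided by the download itself.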
get_full_dataset(variationIds, structureId=None, chainId=None)

Downloads PDB residue mappings and alignment information for a list of genomic variations.

Parameters:

variationIds : list

Genomic variation ids in HGVS notation, e.g. chr7:g.140449103A>C

structureId : str

PDB ID of a specific structure to use for the mapping (optional, default: None)

chainId : str

ID of a specific chain to use for the mapping (optional, default: None)

Returns:

dataset

Dataset with PDB mapping and alignment information

get_position_dataset(variationIds, structureId=None, chainId=None)

Downloads PDB residue mappings for a list of genomic variations.

Parameters:

variationIds : list

Genomic variation ids in HGVS notation, e.g. chr7:g.140449103A>C

structureId : str

PDB ID of a specific structure to use for the mapping (optional, default: None)

chainId : str

ID of a specific chain to use for the mapping (optional, default: None)

Returns:

dataset

dataset with PDB mapping information
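
Both functions take the same arguments, so a thin wrapper can switch between them. The sketch below is illustrative only. map_variants, with_alignment, and the g2s parameter are hypothetical names and not part of mmtfPyspark. The import path is taken from the module name at the top of this page.

```python
def map_variants(variation_ids, structure_id=None, chain_id=None,
                 with_alignment=False, g2s=None):
    """Call get_full_dataset or get_position_dataset with the same arguments.

    g2s is a hypothetical hook for passing in the module (or a stand-in
    object); when omitted, the real module is imported.
    """
    if g2s is None:
        # Imported lazily so the helper can be defined without mmtfPyspark.
        from mmtfPyspark.datasets import g2sDataset as g2s

    # with_alignment=True selects the variant that also returns alignment information.
    fetch = g2s.get_full_dataset if with_alignment else g2s.get_position_dataset
    return fetch(variation_ids, structure_id, chain_id)


# Example call (requires mmtfPyspark and network access):
# ds = map_variants(["chr7:g.140449098T>C"], "3TV4", "A")
# ds.show()
```

Leaving structure_id and chain_id at None passes the documented defaults straight through to the underlying function.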