containsSequenceRegex.py:
This filter returns true if the polymer sequence motif matches the specified regular expression. Sequence motifs support the following one-letter codes:
- 20 standard amino acids,
- O for Pyrrolysine,
- U for Selenocysteine,
- X for non-standard amino acid
Runs of variable residues are specified by the {n} notation applied to '.' (any residue), where n is the exact number of variable residues. To query a motif with seven variable residues between W and G and twenty variable residues between G and L, use the following notation:
W.{7}G.{20}L
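The fixed-count notation can be tried out directly with Python's standard `re` module. The sketch below assumes the filter applies the expression with search semantics (a match anywhere in the sequence); the helper `contains_motif` and both test sequences are hypothetical and made up for illustration.

```python
import re

# Hypothetical helper for illustration; not the filter's own implementation.
def contains_motif(sequence: str, pattern: str) -> bool:
    """Return True if the pattern occurs anywhere in the sequence."""
    return re.search(pattern, sequence) is not None

motif = r"W.{7}G.{20}L"

# Synthetic sequences: exactly 7 residues between W and G, then 20 between G and L.
seq_hit = "MK" + "W" + "A" * 7 + "G" + "S" * 20 + "L" + "KK"
# Same layout but only 6 residues between W and G, so the motif is absent.
seq_miss = "MK" + "W" + "A" * 6 + "G" + "S" * 20 + "L" + "KK"

print(contains_motif(seq_hit, motif))
print(contains_motif(seq_miss, motif))
```

Because {n} is an exact count, shortening either spacer by a single residue is enough to break the match.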
Variable-length ranges are expressed by the {n,m} notation, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as:
C.{2,4}C.{12}H.{3,5}H
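As a rough illustration of the {n,m} notation, the following sketch locates zinc finger motifs with Python's `re` module. The function `find_zinc_fingers` and the sequences are hypothetical; the sequences are synthetic and only chosen to satisfy (or narrowly miss) the spacing rules.

```python
import re

ZINC_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

# Hypothetical helper for illustration; not part of the filter.
def find_zinc_fingers(sequence: str) -> list:
    """Return (start index, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(sequence)]

# Synthetic motif: C, 2 residues, C, 12 residues, H, 3 residues, H.
finger = "C" + "PE" + "C" + "GKAFSRSDELTR" + "H" + "IRT" + "H"
sequence = "AA" + finger + "TG"

# Same motif but with 6 residues between the two histidines (outside 3..5).
too_wide = "AA" + "C" + "PE" + "C" + "GKAFSRSDELTR" + "H" + "IRTAAA" + "H" + "TG"

print(find_zinc_fingers(sequence))
print(find_zinc_fingers(too_wide))
```

The second sequence yields no hits because the spacer between the histidines falls outside the allowed range of three to five residues.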
The ‘^’ operator anchors the motif at the beginning of a protein sequence. The following two equivalent queries find sequences with N-terminal Histidine tags:
^HHHHHH or ^H{6}
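The equivalence of the two queries, and the effect of the anchor, can be checked with a short sketch. It assumes search semantics as in Python's `re.search`; the sequences are invented for illustration.

```python
import re

patterns = [r"^HHHHHH", r"^H{6}"]

# Synthetic sequences: tag at the very first residue vs. tag further inside.
n_terminal = "HHHHHHSSGLVPR"
internal = "MGSSHHHHHHSSGLVPR"

# For each pattern: (matches N-terminal tag, matches internal tag)
results = {
    p: (bool(re.search(p, n_terminal)), bool(re.search(p, internal)))
    for p in patterns
}

for pattern, outcome in results.items():
    print(pattern, outcome)
```

Both patterns behave identically. Note that the anchor is strict: a sequence whose tag follows a leading residue (for example an initial M) is not matched by either query.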
Square brackets specify alternative residues at a particular position. The Walker (P loop) motif that binds ATP or GTP can be expressed as:
[AG].{4}GK[ST]
A or G is followed by 4 variable residues, then G and K, and finally S or T.
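A minimal sketch of the square-bracket notation using Python's `re` module follows. The function name `find_walker` and the sequences are hypothetical and synthetic; they are not taken from a real protein.

```python
import re

WALKER_A = re.compile(r"[AG].{4}GK[ST]")

# Hypothetical helper for illustration; not part of the filter.
def find_walker(sequence: str):
    """Return (start index, matched text) of the first hit, or None."""
    m = WALKER_A.search(sequence)
    return (m.start(), m.group()) if m else None

# Synthetic sequence containing G, 4 variable residues, G, K, T.
with_motif = "MSLV" + "G" + "PTNS" + "GKT" + "LLA"
# Same sequence with K replaced by R, which the motif does not allow.
without_motif = "MSLV" + "G" + "PTNS" + "GRT" + "LLA"

print(find_walker(with_motif))
print(find_walker(without_motif))
```

Each bracketed set matches exactly one residue, so [AG] and [ST] each consume a single position while the unbracketed G and K must match literally.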