How Tos

Here are some examples of how to use CyPRUS:

  1. Click on "Search and Visualize Features" from the menu tab
  2. Enter a list of protein names, UniProt accessions, UniProt entry names, or protein names with amino acid positions in the "Inputs" field. Inputs can be comma-separated or entered one per line, and they are case-sensitive. For an amino acid position with a mutation, use the format <Protein_Name>:<Original_Residue><Position><Mutated_Residue> (for example, MTOR:A8S).
  3. Select an organism from the select menu. The protein identifiers must match the selected organism.
  4. Click the "Submit" button
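
The input format described in step 2 can be sketched in Python as follows. This is only an illustration of the format, not the code CyPRUS runs: the names `ParsedInput` and `parse_inputs` are hypothetical, and the sketch assumes single-letter residue codes, as in the MTOR:A8S example.

```python
import re
from typing import List, NamedTuple, Optional


class ParsedInput(NamedTuple):
    """One entry from the "Inputs" field (hypothetical structure)."""
    protein: str
    original: Optional[str]   # original residue, if a mutation was given
    position: Optional[int]   # 1-based amino acid position, if given
    mutated: Optional[str]    # mutated residue, if given


# <Protein_Name>:<Original_Residue><Position><Mutated_Residue>, e.g. MTOR:A8S
# Assumption: residues are single letters.
_MUTATION = re.compile(r"^([^:\s]+):([A-Za-z])(\d+)([A-Za-z])$")


def parse_inputs(text: str) -> List[ParsedInput]:
    """Split comma-separated or multi-line input and parse each entry."""
    entries = []
    for token in re.split(r"[,\n]", text):
        token = token.strip()
        if not token:
            continue
        match = _MUTATION.match(token)
        if match:
            protein, original, position, mutated = match.groups()
            entries.append(ParsedInput(protein, original, int(position), mutated))
        elif ":" in token:
            # A colon without a valid mutation suffix is treated as an error.
            raise ValueError(f"Unrecognized mutation format: {token!r}")
        else:
            # Plain protein name, accession, or entry name; case is kept as typed.
            entries.append(ParsedInput(token, None, None, None))
    return entries
```

For example, `parse_inputs("MTOR:A8S, P42345")` yields one entry with a mutation at position 8 and one plain identifier.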

Once submitted, an Excel-like table and an image of the first protein in the table are generated on a new page.

Click on an isoform identifier to view an isoform and its features.

Disrupted features are highlighted in red and marked as "disrupted" in the "Status" column.

Click on the "Variants" category to view the impacts of the variants. Variant information is obtained from the UniProt feature API (https://www.ebi.ac.uk/uniprot/services/restful/features).

  1. Click on "Customize Features" from the menu tab
  2. Enter a protein name in the "Protein Name" field and select an organism from the select menu. If the "Show Isoforms" radio button is set to "Yes", a list of isoform IDs is displayed in a drop-down menu based on the selected protein and organism. By default, "canonical" is selected. Users can then enter the list of protein features they want to display in the viewer, in <coordinate>:<feature_name> format.
  3. Click on the "Submit" button

Once submitted, a protein feature viewer is generated at the bottom of the page.

Users can view their custom features under the "Custom Features" category.
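
As a rough sketch of the <coordinate>:<feature_name> format, the Python snippet below parses one custom feature per line. The page does not specify what a coordinate looks like, so this assumes either a single position (`120`) or a `start-end` range (`10-50`); the function name `parse_custom_features` is hypothetical and not part of CyPRUS.

```python
from typing import List, Tuple


def parse_custom_features(text: str) -> List[Tuple[int, int, str]]:
    """Parse lines of <coordinate>:<feature_name> into (start, end, name).

    Assumption: a coordinate is a single position or a "start-end" range.
    """
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so names may contain colons.
        coordinate, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"Expected <coordinate>:<feature_name>, got {line!r}")
        start_text, dash, end_text = coordinate.partition("-")
        start = int(start_text)
        end = int(end_text) if dash else start
        if start < 1 or end < start:
            raise ValueError(f"Invalid coordinate in {line!r}")
        features.append((start, end, name))
    return features
```

A single position is stored as a range whose start and end are equal, which keeps later overlap checks uniform.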

Supported sequence annotations

This tool extracts sequence features from human UniProt based on gene(s) or gene position(s). For the "Search and Visualize Features" option, the input takes gene symbol(s) or gene position(s) as a list or in CSV format. Users can select the feature type(s) they want to extract. If no feature type is selected, the tool returns all sequence features for that gene, or all features affected by that position. Gene positions take the form <Gene_symbol>:<Original_Residue><Position><Mutated_Residue> (for example, MTOR:A8S). In that case the tool returns all features, or only the selected ones, that are affected by the mutation(s).
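
The selection behaviour described above can be sketched in Python. This is an assumption-laden illustration, not the tool's implementation: features are represented as dictionaries with hypothetical `type`, `begin`, and `end` keys, and a feature is taken to be "affected" when the mutated position falls inside its range.

```python
from typing import Dict, Iterable, List, Optional


def affected_features(
    features: Iterable[Dict],
    position: int,
    selected_types: Optional[Iterable[str]] = None,
) -> List[Dict]:
    """Return features whose range contains `position`.

    If `selected_types` is empty or None, every feature type is considered,
    mirroring the "no feature type selected" case.
    """
    wanted = set(selected_types) if selected_types else None
    return [
        feature
        for feature in features
        if feature["begin"] <= position <= feature["end"]
        and (wanted is None or feature["type"] in wanted)
    ]


# Toy data with made-up coordinates, for illustration only.
example_features = [
    {"type": "domain", "begin": 1, "end": 20},
    {"type": "helix", "begin": 5, "end": 12},
    {"type": "chain", "begin": 30, "end": 90},
]
```

With this toy data, a mutation at position 8 touches the `domain` and `helix` entries, and passing `selected_types=["helix"]` narrows the result to the helix alone.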

Currently, the following features are supported by this tool (the list follows http://www.uniprot.org/help/sequence_annotation):

Molecule processing

  • chain - Extent of a polypeptide chain in the mature protein
  • peptide - Extent of an active peptide in the mature protein
  • signal - Sequence targeting proteins to the secretory pathway or periplasmic space
  • transit - Extent of a transit peptide for organelle targeting
  • init_met - Cleavage of the initiator methionine
  • propep - Part of a protein that is cleaved during maturation or activation

Regions

  • region - Region of interest in the sequence
  • domain - Position and type of each modular protein domain
  • repeat - Positions of repeated sequence motifs or repeated domains
  • zn_fing - Position(s) and type(s) of zinc fingers within the protein
  • motif - Short (up to 20 amino acids) sequence motif of biological interest
  • compbias - Region of compositional bias in the protein
  • topo_dom - Location of non-membrane regions of membrane-spanning proteins
  • np_bind - Nucleotide phosphate binding region
  • transmem - Extent of a membrane-spanning region
  • dna_bind - Position and type of a DNA-binding domain
  • ca_bind - Position(s) of calcium binding region(s) within the protein
  • coiled - Positions of regions of coiled coil within the protein
  • lipid - Covalently attached lipid group(s)
  • intramem - Extent of a region located in a membrane without crossing it

Amino acid modifications

  • mod_res - Modified residues excluding lipids, glycans and protein cross-links
  • carbohyd - Covalently attached glycan group(s)
  • non_std - Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence
  • disulfide - Cysteine residues participating in disulfide bonds
  • crosslnk - Residues participating in covalent linkage(s) between proteins

Natural variants

  • variant - Description of a natural variant of the protein

Natural variants - substitution

  • substitution - When the original sequence length is > 1 and the variant sequence length is > 1

Natural variants - insertion

  • insertion - When the original sequence length is == 1 or 0 and the variant sequence length is > 1

Natural variants - deletion

  • deletion - When the original sequence length is > 1 or 0 and the variant sequence length is == 0
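
One literal reading of the three length rules above is sketched below in Python. The function name `classify_variant` is hypothetical, and the handling of cases the rules do not mention (for example, a single-residue change such as A to S) is an assumption: they fall back to the generic `variant` type.

```python
def classify_variant(original: str, variant: str) -> str:
    """Classify a natural variant by comparing sequence lengths.

    Implements the listed rules literally; anything they do not cover
    is returned as the generic "variant" type (an assumption).
    """
    original_len = len(original)
    variant_len = len(variant)

    # substitution: original length > 1 and variant length > 1
    if original_len > 1 and variant_len > 1:
        return "substitution"
    # insertion: original length is 1 or 0 and variant length > 1
    if original_len in (0, 1) and variant_len > 1:
        return "insertion"
    # deletion: original length is > 1 or 0 and variant length is 0
    if (original_len > 1 or original_len == 0) and variant_len == 0:
        return "deletion"
    return "variant"
```

Checking the rules in this fixed order keeps the categories mutually exclusive even where the written conditions look like they could overlap.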

Experimental information

  • conflict - Description of sequence discrepancies of unknown origin
  • mutagen - Site which has been experimentally altered by mutagenesis
  • unsure - Regions of uncertainty in the sequence
  • non-cons - Indicates that two residues in a sequence are not consecutive
  • non-ter - The sequence is incomplete. Indicates that a residue is not the terminal residue of the complete protein

Secondary structure

  • helix - Helical regions within the experimentally determined protein structure
  • turn - Turns within the experimentally determined protein structure
  • strand - Beta strand regions within the experimentally determined protein structure

Sites

  • site - Any interesting single amino acid site on the sequence
  • act_site - Amino acid(s) directly involved in the activity of an enzyme
  • binding - Binding site for any chemical group (co-enzyme, prosthetic group, etc.)
  • metal - Binding site for a metal ion

Variants

  • user_variant - User-inputted variants

User Inputted Feature

  • user_feature - User-inputted feature

UniProt version and statistics