MUTAFY

Get models of human proteins by inputting gene names

Tutorial



1. How to run a basic job
2. Input details
        2.1 Genes of interest
        2.2 Variants in VCF format
        2.3 Advanced settings
            2.3.1 Number of top structures
            2.3.2 Minimum HSP coverage
            2.3.3 Minimum relative sequence length
            2.3.4 Minimum relative wild type length
3. Results page
        3.1 Result tables
            3.1.1 SAV
            3.1.2 Unique Sequences
            3.1.3 Any Mutation
            3.1.4 Wildtype
            3.1.5 AlphaFold WT Predictions
        3.2 Structure Visualization
        3.3 Additional information
            3.3.1 Structure info
            3.3.2 Blastp info
            3.3.3 Clinvar info



1. How to run a basic job

To run a job please follow the steps below (Figure 1):
1. Go to the homepage (https://mutafy.er.kcl.ac.uk/);
2. Provide the genes of interest;
3. Alternatively, load the example;
4. Click “Submit”.

Figure 1



2. Input details

For a brief description of a field, hover the cursor over it (Figure 2).

Figure 2

2.1 Genes of interest

When one or more genes (HGNC symbols) are selected, Mutafy retrieves the best available mutant and wild-type protein structures for them (Figure 3).

Figure 3

2.2 Variants in VCF format

Activating the 'Variants' option within the VCF (Variant Call Format) selection in the 'Choose' mode enables the visualization of missense variants extracted from a VCF file (Figure 4). The visualization module accepts VCF files only, and the file size must not exceed 1 gigabyte (1 GB). Mutafy uses the Genome Reference Consortium human genome build 38 (GRCh38) as the reference assembly.
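The input constraints above can be checked locally before uploading. The following is a minimal sketch, not part of Mutafy: the function name precheck_vcf is hypothetical, the file is assumed to be uncompressed plain text, and 1 GB is assumed to mean 10^9 bytes. It only checks the file size and the two header lines that the VCF format requires.

```python
import os

# Assumption: the 1 GB limit is interpreted as 10**9 bytes.
MAX_BYTES = 10**9


def precheck_vcf(path: str) -> list[str]:
    """Return a list of problems found in an uncompressed VCF file (empty if none)."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 1 GB")

    saw_fileformat = False
    saw_column_header = False
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if not line.startswith("#"):
                break  # first data record reached, the header is over
            # VCF requires '##fileformat=VCFv...' as the very first line.
            if index == 0 and line.startswith("##fileformat=VCF"):
                saw_fileformat = True
            # The column header line starts with these tab-separated fields.
            if line.startswith("#CHROM\tPOS\tID\tREF\tALT"):
                saw_column_header = True

    if not saw_fileformat:
        problems.append("first line is not a '##fileformat=VCF...' line")
    if not saw_column_header:
        problems.append("missing '#CHROM POS ID REF ALT ...' column header")
    return problems
```

An empty list means the file passed these basic checks. The sketch does not verify that coordinates are on GRCh38, which has to be confirmed from how the VCF file was produced.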

Figure 4


2.3 Advanced settings

If the ‘Advanced settings’ option is selected, additional options become available (Figure 5).

Figure 5

2.3.1 Number of top structures

This setting specifies how many top structures per sequence or variant are included in the result tables.

2.3.2 Minimum HSP coverage

The HSP (high-scoring segment pair, as reported by BLAST) determines the length of the sequence segment that matches the reference sequence. In Mutafy, the HSP value is set to 90% of the RSL (see 2.3.3) by default.

2.3.3 Minimum relative sequence length

Structures in the PDB can contain multiple protein chains or separate peptides.
If a PDB structure is associated with the input gene, each protein sequence in the PDB file (corresponding to a unique chain / peptide in the structure) is compared to the canonical protein sequence of the input gene (reference sequence from UniProt).
The relative sequence length (RSL) filter allows users to filter out chains/peptides/sequences which are shorter than a given percentage of the canonical sequence.
Example: The RNA-binding protein FUS, which is encoded by the FUS gene, consists of 526 amino acids (canonical protein sequence). Setting the RSL filter to 10% excludes all sequences shorter than 10% of the canonical sequence (52.6 residues), i.e. sequences of 52 or fewer amino acids.
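The arithmetic of the FUS example can be sketched in a few lines. This is an illustration only, not Mutafy's code: the function name passes_rsl is hypothetical, and it assumes that a sequence exactly at the threshold is kept, since the filter is described as removing sequences that are shorter than the given percentage.

```python
def passes_rsl(chain_length: int, canonical_length: int, rsl_percent: float) -> bool:
    """Return True if the chain is at least rsl_percent of the canonical length."""
    # Compare chain_length / canonical_length >= rsl_percent / 100
    # without dividing, to avoid rounding surprises.
    return chain_length * 100 >= rsl_percent * canonical_length


FUS_LENGTH = 526  # canonical FUS protein sequence length

print(passes_rsl(52, FUS_LENGTH, 10))  # False: 52 < 52.6
print(passes_rsl(53, FUS_LENGTH, 10))  # True: 53 >= 52.6
```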

2.3.4 Minimum relative wild type length

The relative wild type length filter allows users to filter out wild-type structures whose sequences are shorter than a given percentage of the canonical sequence.


3. Results page

After submitting a job, results should be available within minutes. However, this can vary substantially as the time needed to complete a job depends on the number of input genes and size of the protein structures of those genes.

3.1 Result tables

The results page is divided into 5 tabs (SAV, Unique Sequences, Any Mutation, Wildtype and AlphaFold WT Predictions).
In each tab the corresponding highest quality experimentally solved protein structures are listed for all the input genes.
The search bar allows users to filter data in the table, e.g. only show data for one specific gene of interest (Figure 6).

Figure 6

3.1.1 SAV

Lists the best structure per single amino acid variant (SAV).
Mutafy identifies all protein structures associated with the input genes with a single amino acid variant (SAV), i.e. structures which have only one amino acid substitution in their sequence compared to the canonical protein sequence (WT), and will list the best available structures for each SAV available in the PDB.
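The SAV definition above can be illustrated with a short sketch. It is a simplification rather than Mutafy's method: the function names are hypothetical, and the two sequences are assumed to be already aligned, gap-free and of equal length, which real PDB chains often are not.

```python
def find_substitutions(canonical: str, observed: str) -> list[str]:
    """List substitutions such as 'N4K' between two equal-length sequences."""
    if len(canonical) != len(observed):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{wt}{position}{mut}"
        for position, (wt, mut) in enumerate(zip(canonical, observed), start=1)
        if wt != mut
    ]


def is_sav(canonical: str, observed: str) -> bool:
    """True if the observed sequence differs from the canonical one at exactly one residue."""
    return len(find_substitutions(canonical, observed)) == 1


# Toy sequences for illustration only (not real protein data).
print(find_substitutions("MASNDYTQ", "MASKDYTQ"))  # ['N4K']
print(is_sav("MASNDYTQ", "MASKDYTQ"))              # True
```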

3.1.2 Unique Sequences

Lists the best structure per unique sequence / combination of mutations.
Mutafy identifies all protein structures associated with the input genes and will list the best available structure for each unique sequence in the PDB, including WT and SAV structures if available.

3.1.3 Any Mutation

Lists the best structure per any identified mutation (irrespective of other mutations).
Mutafy identifies all protein structures associated with the input genes and will list the best available structure for each mutated residue found in any of the PDB structures (regardless of other mutations in the structure), including SAV structures if available.
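The difference between the Unique Sequences and Any Mutation tabs is a difference in grouping, which the sketch below illustrates on made-up records. The structure IDs, mutations and resolution values are hypothetical, and ranking by lowest resolution is an assumption made for this example only; this tutorial does not state how Mutafy decides which structure is best.

```python
# Hypothetical records: (structure ID, mutations vs. canonical sequence, resolution in Å).
structures = [
    ("S1", ("A10V",), 2.1),
    ("S2", ("A10V", "G20D"), 1.8),
    ("S3", ("A10V", "G20D"), 2.5),
    ("S4", (), 1.5),  # wild type: no mutations
]


def best_per_unique_sequence(records):
    """Group by the full set of mutations; keep the lowest-resolution-value structure."""
    best = {}
    for structure_id, mutations, resolution in records:
        key = tuple(sorted(mutations))
        if key not in best or resolution < best[key][1]:
            best[key] = (structure_id, resolution)
    return {key: value[0] for key, value in best.items()}


def best_per_mutation(records):
    """Group by each individual mutation, regardless of other mutations present."""
    best = {}
    for structure_id, mutations, resolution in records:
        for mutation in mutations:
            if mutation not in best or resolution < best[mutation][1]:
                best[mutation] = (structure_id, resolution)
    return {mutation: value[0] for mutation, value in best.items()}


print(best_per_unique_sequence(structures))
# {('A10V',): 'S1', ('A10V', 'G20D'): 'S2', (): 'S4'}
print(best_per_mutation(structures))
# {'A10V': 'S2', 'G20D': 'S2'}
```

In this toy data, the mutation A10V is represented by S1 when grouping by unique sequence, but by S2 when grouping by mutation, because S2 also carries A10V and has the better (lower) resolution value.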

3.1.4 Wildtype

Lists all available wildtype structures.
Mutafy identifies all available WT structures associated with the input genes and will list all available WT structures with the best structures at the top of the table.

3.1.5 AlphaFold WT Predictions

Lists all available wildtype structures predicted by AlphaFold.
Mutafy provides external links to the predicted protein structures on the AlphaFold website.

3.2 Structure Visualization

Clicking a row in the table displays the selected protein structure in the right panel of the page, with the mutation highlighted to facilitate inspection (Figure 7). The view can be adjusted using the panel below it.

Figure 7

3.3 Additional information

The additional information section is divided into the following three tables:

3.3.1 Structure info

The Structure info table provides structural information on the selected protein, using data from the Protein Data Bank.

3.3.2 Blastp info

The Blastp info table provides the blastp information generated for the selected protein during the BLAST search.

3.3.3 Clinvar info

The Clinvar info table provides ClinVar information on the selected protein. For most proteins, this table is empty.