MutationDistiller: Documentation

MutationDistiller makes the analysis of Next-Generation Sequencing data simple. This documentation takes you through the crucial steps.

Citing MutationDistiller

If you like MutationDistiller, please cite us:

MutationDistiller: user-driven identification of pathogenic DNA variants.
Hombach D, Schuelke M, Knierim E, Ehmke N, Schwarz JM, Fischer-Zirnsak B, Seelow D.
Nucleic Acids Res. 2019 May 20. pii: gkz330. doi: 10.1093/nar/gkz330.

Read the paper!

HPO score update

We have recently updated the HPO scoring system to be based on a term's information content as described in our paper. We decided to take this step because we found in internal tests that information-content based scoring is slightly better than our previous system. Observant users might note that the scores are now much lower than before - please do not be alarmed by this.

Documentation - Input

User Modes

Depending on your background and interests, MutationDistiller offers various modes with optimised settings. Of course, you can always pick and choose your own settings, but if you just wish to run a quick analysis, sticking to our pre-selected options might be the most convenient way. Clicking on one of the user modes leads you to MutationDistiller's main page with specific sections displayed or hidden. You can always display hidden sections with a simple click, so don't worry: the choice of user mode does not limit your search in any way. The different modes are simply there to show you the settings which we consider most relevant for you.

Clinical: HPO

This is the perfect setting for you if you want to enter a patient's phenotype using the Human Phenotype Ontology (HPO), OrphaNet or OMIM.

Clinical: Gene Panels

If you are interested in gene panels or already have some candidate genes for your case, then click here. MutationDistiller includes Genomics England's PanelApp and in-house panels from the Charité Neuropaediatrics research group, and also allows you to enter your own candidate genes.

Gene Function

This allows you to base your analysis on gene function via Gene Ontology (GO), Reactome or WikiPathways.

Basic Search

Here, you can just have a quick look at the most severe alterations in your VCF file without entering any additional information.

Main Page Sections

On the User Mode page, you selected which sections are displayed. Below, we introduce every section in turn. Don't worry, they are pretty self-explanatory.

Project

This is the one essential section for MutationDistiller. It is displayed in all user modes. Here, you upload your VCF or enter a MutationTaster ID; without one or the other, MutationDistiller cannot run an analysis. When you first upload a VCF to MutationDistiller, please note down the project ID you are given. You will then be able to access and re-analyse your project whenever you like by simply entering this ID in the text field. You never have to upload a VCF more than once unless you forget your ID! We also email the ID to you, just to be safe.

Upload VCF

If you don't already have a project ID, click on 'upload vcf'. This leads you to our upload page, where we'll ask you to select your project's Variant Call Format (VCF) file on your computer. Please note the format constraints and the genome version: all coordinates must refer to GRCh37 (hg19), and only VCF files containing variants from a single sample will be accepted. If MutationDistiller detects a problem with either of these requirements, it will abort and send you an error message.
Optionally, you can also give your project a name and enter your email address.
We'll only use your email address to notify you when your analysis is done... and it's a good reminder of your project ID, of course :)
In the Analysis settings section, you can customise certain parameters for your analysis. We recommend changing these only if you're really sure what you're doing; the easiest option is to stick with our default parameters. After uploading, you will have to wait a couple of minutes. Once your analysis is done, you will be forwarded to a statistics page with all sorts of information on the data you submitted. Just click "Query MutationDistiller" to return to MutationDistiller's main page with your project ID pre-entered. If you have given us your email address, you will receive two emails: the first confirms that the file has been uploaded and contains your project ID; the second contains a link to your data. This link opens the same statistics page as mentioned above and allows you to move on to MutationDistiller's main page.
Once you're safely back on MutationDistiller's main page, you can enter any additional information the program might need to find the most likely disease-causing mutation(s).
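Before uploading, you may want to check the two format requirements locally. The following Python snippet is a minimal sketch and not part of MutationDistiller: it counts the sample columns in the "#CHROM" header line and, as a heuristic only, guesses the assembly from the optional "##contig" length of chromosome 1 (249,250,621 bp in GRCh37 versus 248,956,422 bp in GRCh38). The function name precheck_vcf_header is hypothetical, and a missing "##contig" line simply yields None for the assembly.

```python
import re
from typing import Iterable

# Length of chromosome 1 in each assembly; used only as a heuristic.
CHR1_LENGTH = {249250621: "GRCh37", 248956422: "GRCh38"}
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def precheck_vcf_header(lines: Iterable[str]) -> dict:
    """Report the number of samples and a guessed assembly from a VCF header."""
    assembly = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            # Optional header line, e.g. ##contig=<ID=1,length=249250621>
            match = re.search(r"ID=(?:chr)?1,.*?length=(\d+)", line)
            if match:
                assembly = CHR1_LENGTH.get(int(match.group(1)), "unknown")
        elif line.startswith("#CHROM"):
            fields = line.split("\t")
            if fields[:8] != FIXED_COLUMNS:
                raise ValueError("malformed #CHROM header line")
            # Every column after FORMAT (the 9th column) is one sample.
            n_samples = max(len(fields) - 9, 0)
            return {"samples": n_samples, "assembly": assembly}
    raise ValueError("no #CHROM header line found")
```

Because the function stops at the "#CHROM" line, it only reads the header, e.g. precheck_vcf_header(open("patient.vcf")) for a plain-text file or gzip.open(path, "rt") for a compressed one. A result with more than one sample, or an assembly other than GRCh37, would be rejected by the upload.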

Please note that for reasons of data protection, we delete all projects at regular intervals. If you try to access an older project and it doesn't work, just upload the VCF again. You might notice that it uploads much faster than before! MutationDistiller will then assign a new project ID which you can use for further analyses.

Project - additional settings

In the 'Project' section, you can not only enter your project but also specify some relevant information, such as the expected mode of inheritance (MoI) or how many genes you want to see in the output. For the MoI, the options are dominant, recessive, and a 'strict' setting for each of them. If you set the MoI to 'dominant' or 'recessive', MutationDistiller will award bonus points to every gene which is listed in the Human Phenotype Ontology (HPO, see below) as following your MoI of interest. In the strict settings, all genes which do not follow your desired inheritance pattern will be excluded from the analysis.
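The difference between the normal and the strict inheritance settings can be pictured as follows. This Python sketch is purely illustrative: the Gene class, the MOI_BONUS value and the scores are hypothetical, and MutationDistiller's actual bonus is not documented here. It also assumes that genes without any MoI annotation count as not following the selected pattern.

```python
from dataclasses import dataclass, field
from typing import List, Set

MOI_BONUS = 1.0  # hypothetical bonus; the real value is not specified here


@dataclass
class Gene:
    symbol: str
    score: float
    # Modes of inheritance annotated for this gene in the HPO (assumed input).
    hpo_moi: Set[str] = field(default_factory=set)


def apply_moi(genes: List[Gene], moi: str, strict: bool = False) -> List[Gene]:
    """Reward genes matching the MoI; in strict mode, drop all others."""
    ranked = []
    for gene in genes:
        follows = moi in gene.hpo_moi
        if strict and not follows:
            continue  # strict setting: exclude genes with another (or no) MoI
        bonus = MOI_BONUS if follows else 0.0
        ranked.append(Gene(gene.symbol, gene.score + bonus, gene.hpo_moi))
    # Highest score first, as in a ranked output list.
    return sorted(ranked, key=lambda g: g.score, reverse=True)
```

With the non-strict setting, a gene that does not match the MoI can still rank highly if its other evidence is strong; the strict setting removes it altogether.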

Variant Selection

This section determines which types of variants will be included in your results. It is never displayed by default, but feel free to activate it whenever you wish to change the settings. There are five variant categories. ClinVar/nonsense is the most severe category and includes known disease mutations listed in ClinVar as well as alterations classified as nonsense by MutationTaster. Non-synonymous DM includes variants which MutationTaster predicted to be disease mutations (DM) and which cause an amino acid exchange. These two categories are included in the search by default. If you wish to use less strict criteria, simply tick the respective boxes. MutationDistiller's less severe variant classes are as follows: DMs near splice sites includes all disease mutations (as predicted by MutationTaster) located within 10 bp of a splice site. All DMs includes all disease mutations, whereas all variants includes every variant, regardless of its predicted pathogenicity. To prevent you from being swamped by data (and also to protect our servers), we only allow this setting if you have activated an additional filter, for example via candidate genes or regions. In general, we do not recommend including all variants! If you are looking for less strict settings, it makes more sense to include benign non-synonymous alterations (excluding known polymorphisms) by ticking the checkbox below your selection. Your initial variant selection criteria still apply: with our default categories, for instance, you will now see non-synonymous alterations which MutationTaster predicted to be benign in addition to the disease mutations.

Compound heterozygosity

Another option in this section is to select a different variant class severity for compound heterozygous cases. To do this, simply click on Enter second variant (compound heterozygosity). This opens a view similar to the one for the first variant and allows you to select a less severe setting for the second variant. Your results will then include genes which carry one severe alteration (a known disease mutation, for example) together with a less severe one. In cases of compound heterozygosity, this might help you identify the causative alterations. NOTE: Please keep in mind that without knowing the phase, MutationDistiller cannot determine whether a 'compound heterozygote' actually carries one variant on the paternal chromosome and the other on the maternal one. Both alterations might in fact lie on the same chromosome. Whenever MutationDistiller reports compound heterozygosity, you will still have to check which chromosome copy each variant lies on!
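If your VCF contains phased genotypes (alleles separated by "|" rather than "/"), you can check the phase of a candidate pair yourself. This Python sketch assumes diploid genotype strings from the GT field and that both variants belong to the same phase set (PS); genotypes from different phase blocks cannot be compared this way. The helper names are hypothetical.

```python
from typing import Optional, Set


def alt_haplotypes(gt: str) -> Optional[Set[int]]:
    """Return the haplotype indices (0 or 1) carrying a non-reference allele.

    Returns None for unphased ('/') or non-diploid genotypes.
    """
    if "|" not in gt:
        return None
    alleles = gt.split("|")
    if len(alleles) != 2:
        return None
    return {i for i, allele in enumerate(alleles) if allele not in ("0", ".")}


def pair_phase(gt1: str, gt2: str) -> str:
    """Classify two heterozygous variants in one gene as 'cis' or 'trans'."""
    h1, h2 = alt_haplotypes(gt1), alt_haplotypes(gt2)
    if h1 is None or h2 is None:
        return "unknown"  # unphased: phase must be determined another way
    if len(h1) != 1 or len(h2) != 1:
        return "not both heterozygous"
    # Alternative alleles on different haplotypes = true compound heterozygosity.
    return "trans" if h1 != h2 else "cis"
```

For example, "0|1" together with "1|0" is reported as trans, whereas "0|1" together with "0|1" is cis and would not explain a recessive phenotype on its own.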

Candidate Genes, Regions, or Panels

This section is visible by default in the panel-based user mode. Here, you can specify candidate genes or regions either by entering them manually into the text field, or by selecting a gene panel from our list. For convenience, Common Gene Panels lists the most common panels. Genomics England PanelApp contains the gene panels included in PanelApp, Genomics England's awesome crowdsourcing initiative to bring some structure into the mess that gene panels often are. We have ordered the panels alphabetically by disease category. If you are looking for a specific panel, you can also find it by searching the PanelApp field with your browser's search function (CTRL+F). To find out more about PanelApp, visit their website or click on the link in the headline of the PanelApp field. We also provide panels generated by our in-house clinicians at the Charité Neuropaediatrics group. To access those, just click on the little 'show' button.

Phenotype

By default, the Phenotype section is displayed in the HPO-based view. It contains an autocompletion field which allows you to find entries from the HPO, Orphanet and OMIM. Select which of the three sources you wish to search (feel free to search all three) and simply start typing. After at least four letters, the autocompletion will start to do its magic and allow you to pick your entries from the list. Selected terms are added to the text field below automatically. If you already have a list of identifiers, you can also paste this list (comma-, semicolon- or tab-delimited) into the text field.
MutationDistiller allows you to enter HPO terms which you wish to exclude from your search. You will find another autocompletion field after clicking on Enter HPO-terms which you do NOT wish to include in your search.
If you want your output page to be all fancy, MutationDistiller has another option for you: Clicking on the Highlighting Options button opens a section where you can enter phenotype-related terms you wish to be highlighted in the output page. Maybe this helps you identify the best matching candidates more quickly?
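If you prepare identifier lists outside MutationDistiller, a few lines of Python can tidy them up before pasting. This sketch splits on commas, semicolons and tabs (the delimiters the text field accepts) and additionally on line breaks, which is our own assumption; it drops duplicates while keeping the original order. The function names are hypothetical, and the format check only covers HPO identifiers ("HP:" followed by seven digits).

```python
import re
from typing import List


def parse_term_list(text: str) -> List[str]:
    """Split a comma-, semicolon- or tab-delimited list of identifiers."""
    # Line breaks are treated as delimiters too (an assumption).
    parts = re.split(r"[,;\t\n]+", text)
    terms: List[str] = []
    for part in parts:
        term = part.strip()
        if term and term not in terms:  # skip blanks and duplicates
            terms.append(term)
    return terms


def is_hpo_id(term: str) -> bool:
    """HPO identifiers consist of 'HP:' followed by seven digits."""
    return re.fullmatch(r"HP:\d{7}", term) is not None
```

For instance, parse_term_list("HP:0001250, HP:0001263;HP:0001250") returns ['HP:0001250', 'HP:0001263'], which can be joined with commas and pasted into the text field.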

A note on HPO-based searches: As the HPO-gene annotation is based on OMIM, it is impossible to find disease genes via HPO symptoms if the gene is not listed in OMIM. Thus, if you cannot find a matching mutation in an OMIM-listed gene and start looking for new disease genes, we urge you to drop the HPO-based search entirely and focus on the Gene Ontology, OrphaNet, or metabolic or signalling pathways (see below). If you combine HPO symptoms and GO terms in your search, many other genes will carry (more or less) matching HPO annotations, giving them a higher score than the real causal gene. Therefore, the only way to find genes without any OMIM annotation is to leave out the HPO and rely on the gene function options described below.

Gene Function

The Gene Function section is visible by default in the Gene Function view. As in the Phenotype section, you'll find an autocompletion field here. This one allows you to find entries from the Gene Ontology (GO), Reactome and WikiPathways. Nothing more to say here, just try it out.

Gene Expression

This section is hidden by default, but you can always activate it when you feel like it. Assume you don't really have a candidate gene, but you have good reason to expect your most likely candidate gene to be expressed in the brain: then this section is for you! Here, MutationDistiller provides data from various experiments obtained from ExpressionAtlas. You can select the developmental stage and the experimental method you're interested in, and decide whether you wish to show only genes which fulfil your criteria ('filter') and whether you wish to see the expression levels ('show expression'). By default, MutationDistiller will filter out non-matching genes and display expression levels, but feel free to play around with it. Then, just pick your tissue(s) of interest and give it a go.

Note:
If you choose to exclude genes which are not expressed in a tissue you are interested in, please note the following: a gene will only be included in the result list if at least one of your selected data sources clearly indicates that it is expressed in the given tissue. For instance, if one data source reports that a gene is not expressed in a tissue and none of the other data sources you selected has any data on it, the gene will be removed. Please take this into account when filtering by expression; for a more inclusive search, do not use the filter function.
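The filter rule described in this note boils down to a single condition. The following Python sketch assumes that each selected data source reports, for one tissue, either True (expressed), False (not expressed) or None (no data); the function names and source labels are illustrative, not MutationDistiller's internals.

```python
from typing import Dict, List, Optional

# Per gene: selected data source -> True (expressed), False (not expressed), None (no data)
ExpressionCalls = Dict[str, Optional[bool]]


def passes_expression_filter(calls: ExpressionCalls) -> bool:
    """Keep a gene only if at least one source clearly reports expression."""
    return any(call is True for call in calls.values())


def filter_genes(genes: Dict[str, ExpressionCalls]) -> List[str]:
    """Return the gene symbols that survive the expression filter."""
    return [symbol for symbol, calls in genes.items() if passes_expression_filter(calls)]
```

In the example genes = {"GENE_A": {"GTEx": False, "HPA": None}, "GENE_B": {"GTEx": False, "HPA": True}, "GENE_C": {"GTEx": None, "HPA": None}}, only GENE_B is kept: GENE_A is dropped on the strength of a single negative source, and GENE_C because no source has data at all. Leaving the filter switched off keeps both.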

The ExpressionAtlas experiments included in MutationDistiller are:

ENCODE - RNAseq (ExpressionAtlas Accession-Number E-MTAB-4344)
FANTOM5 - RNA-CAGE adult and fetal (E-MTAB-3358)
GTEX - RNAseq (E-MTAB-5214)
Human Protein Atlas (HPA) - protein expression (E-PROT-3) and RNAseq (E-MTAB-2836)
ILLUMINA - RNAseq (E-MTAB-513)
PRIDE - protein expression adult and fetal (E-PROT-1)

For each selected tissue and experimental method, all data will be displayed, together with sub-tissues. Moreover, MutationDistiller marks genes which are expressed particularly highly as Q90 (above the 90th percentile).
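To illustrate what Q90 means, this Python sketch marks the genes whose expression level lies above the 90th percentile of all values given. Which set of values MutationDistiller uses as the reference distribution (per tissue, per experiment, or otherwise) is an assumption here, and so is the percentile method: we use the standard library's statistics.quantiles (Python 3.8 or later, at least two values), whose default interpolation may differ from the one MutationDistiller applies.

```python
import statistics
from typing import Dict, List


def q90_genes(levels: Dict[str, float]) -> List[str]:
    """Return genes expressed above the 90th percentile of `levels`."""
    # quantiles(n=10) returns the nine decile cut points; the last is the 90th percentile.
    cutoff = statistics.quantiles(levels.values(), n=10)[-1]
    return sorted(gene for gene, level in levels.items() if level > cutoff)
```

With ten genes at levels 1 to 10, only the gene at level 10 exceeds the cut-off and would be marked Q90.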

HINT: If a candidate gene is mentioned in more than one data source, there might be contradictory information! Sometimes, a gene is expressed in a particular tissue according to one data source but not according to another (e.g. one using a different method). Moreover, for some tissues there might be data available in one source but not in another. If a gene is explicitly NOT expressed according to one data source while there is no information on it in any other source, this gene will be excluded from the display list if you tick 'filter'. If you wish to see genes like that, just uncheck the 'filter' checkbox.

Display Options

This section is hidden by default, too. It allows you to customise your output page by adding some sections. Just click on the checkboxes to see what happens.

Submit

Once you have entered all that you want MutationDistiller to know, just hit the big submit button at the bottom of the page. The only absolutely required entry is a VCF file (or the project ID of a VCF you uploaded earlier). All other data just serves to filter out the best-matching candidates for your case. Once you submit your entry, you will be redirected to the Output page, but you'll always be able to return to the Input to change your search parameters.

Documentation - Output

Overview Table

MutationDistiller's output page compiles all information in one spot. First of all, you will find a summary table listing all the genes found in your analysis. This list is combined with data on those genes, such as known diseases. In addition, MutationDistiller lists the variants found in each gene, their frequencies in the ExAC, 1000G and dbSNP databases, and whether your patient is homozygous or heterozygous. We also indicate compound heterozygosity; however, please note that it is up to you to check the phase!

In this summary table, we also provide some links:

A click on the gene symbol leads you to all the information MutationDistiller could gather on the gene - this is described in further detail below.
The variant link leads to MutationTaster and allows you to inspect your variant and MutationTaster's prediction in detail.
Clicking on the dbSNP, 1000G and ExAC links will take you to each of their pages.
In the reported diseases and mutations section, you can hover over the OMIM and OrphaNet hits to be taken to the relevant pages.
If you have the Integrative Genomics Viewer (IGV) running on your computer, you can inspect the variant by clicking on the IGV link.

Download

Below the summary table, you can find a link to download the most crucial information on the top alterations as a .csv file. As .csv files are common in bioinformatics and easy to parse, this table can easily be used in downstream applications.

Hint: You can open the downloaded file in any standard text editor. LibreOffice Calc handles .csv files well and should display the file in a human-readable way automatically. Some of our users have experienced problems when opening the file in Microsoft Excel, with all data appearing in one single column rather than in the separate columns needed for easy data retrieval. If this is the case, the following steps might help:

Select the first column (which holds all your data).
In Excel's header menu, click on 'Data'.
In 'Data Tools', click on 'Text to Columns'.
A pop-up dialog will open. Under 'Original data type', select 'delimited', then click 'next'.
Lastly, set a tick at the 'Comma' delimiter and click 'finish'.
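If you prefer scripting to spreadsheets, Python's standard csv module reads the download directly. This sketch assumes a comma-delimited file with a header row; it does not rely on any particular column names, since these are not listed here. The "utf-8-sig" encoding is an assumption that also tolerates a byte-order mark, and both function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse comma-separated lines into one dict per row, keyed by the header."""
    return list(csv.DictReader(lines))


def load_results(path: str) -> List[Dict[str, str]]:
    """Read a downloaded results file from disk."""
    # newline="" lets the csv module handle quoted line breaks correctly.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return read_rows(handle)
```

After rows = load_results(...) with the path of your download, printing rows[0].keys() shows which columns the file actually contains.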

Detailed Gene Information

Clicking on a gene symbol takes you further down the page, where MutationDistiller holds all the information it could find for the gene, together with a whole bunch of links:
Here, you'll find basic information on the gene, such as its location, function, alternative names and mutations that have been reported for the gene. Moreover, MutationDistiller tells you why it ranked the gene the way it did. You'll find links to common databases and other sources. Most of them are pretty self-explanatory, but the next few paragraphs will introduce some of the more commonly used sections.

HPO

In the 'HPO' section, you will see which HPO terms are linked with the gene you're looking at. Moreover, MutationDistiller shows which of those terms matched your search by displaying them in bold, and it also shows the score given to each term. MutationDistiller counts direct matches, but also matches of ancestors or descendants in the ontology. The score is determined by the specificity of the term: specific terms which stem from deep within the HPO and are linked with only a small number of genes get a higher score. If you decide on second thought that a certain term does not really fit your patient's phenotype, you can exclude it by clicking on the little 'exclude' link. Similarly, if you see a term in that list which you hadn't thought of but which actually fits really well, you can include it with a single click and quickly re-run your analysis.
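The specificity-based score can be illustrated with the common textbook definition of information content (IC): the negative logarithm of the fraction of annotated genes that are linked to a term or to any of its descendants, so rare, specific terms score higher. This Python sketch uses a tiny made-up ontology with hypothetical term and gene names; the exact formula and the handling of ancestor and descendant matches in MutationDistiller are described in our paper and may differ.

```python
import math
from typing import Dict, Set


def genes_per_term(parents: Dict[str, Set[str]],
                   annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Propagate gene annotations from each term up to all its ancestors."""
    result: Dict[str, Set[str]] = {}
    for term, genes in annotations.items():
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            result.setdefault(current, set()).update(genes)
            stack.extend(parents.get(current, ()))
    return result


def information_content(parents: Dict[str, Set[str]],
                        annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(t) = -log(genes annotated to t or its descendants / all annotated genes)."""
    all_genes = set().union(*annotations.values())
    if not all_genes:
        raise ValueError("no annotated genes")
    per_term = genes_per_term(parents, annotations)
    return {term: -math.log(len(genes) / len(all_genes))
            for term, genes in per_term.items()}
```

In a toy ontology where "FocalSeizure" is a child of "Seizure", which in turn is a child of the root "Abnormality", and four genes are annotated in total, the root gets an IC of 0 while "FocalSeizure" (one gene) scores higher than "Seizure" (two genes).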

OMIM

The 'OMIM' section displays the OMIM terms linked with the current gene. OMIM uses a number of categories and describes the clinical relevance of a gene and its links to known genetic disorders. MutationDistiller shows the phenotypes with known molecular basis first, followed by the other categories. Clicking on the OMIM identifier will forward you directly to the entry. For more information on OMIM, please refer to their website.

OrphaNet

OrphaNet is a repository of Mendelian disorders and rare diseases in humans. In this section, you will find the OrphaNet entries listed for the gene in question. Moreover, you can directly access each OrphaNet entry via the link. Please refer to their website for more information on OrphaNet.

Other Sections

In the other sections, you find information from various sources and links to the respective websites to help you determine the most relevant alteration in your case. The following additional data sources are available via MutationDistiller:

WikiPathways: Repository of molecular pathways.
Pfam: Collection of protein families, represented by multiple sequence alignments and hidden Markov models (HMMs).
InterPro Domains: Database of protein domains.
GeneRIFs: Gene References Into Function, functional annotation of genes.
MGD: Mouse Genome Database.
STRING: Functional protein association networks.
GeneOntology (GO): Computable knowledge on gene and gene product function.