In a novel approach, we optimised these weights on a set of known disease mutations from ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) linked with HPO terms. We obtained all pathogenic ClinVar entries with at least two HPO terms, a total of 188 cases linked with 142 different genes. Please refer to our web site for this test set. We spiked these mutations into the HG00377 exome from the 1000 Genomes Project (http://www.internationalgenome.org/) and submitted them, together with the associated HPO terms, to MutationDistiller. We then iterated through 245 combinations of weights for direct, ancestor and descendant matches and compared the results. Where the disease mutation was found, we examined the distribution of the ranks assigned to the genes containing it across all weight combinations. We considered only the first 100 ranks and treated any case ranked beyond that as not found. Genes with exactly the same score were given the same rank.
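The ranking rule described above (shared ranks for identical scores, and a cut-off at rank 100) can be sketched as follows. This is a minimal illustration, not MutationDistiller's own code: the function names and the gene labels are hypothetical, and the text does not say whether the rank after a tie is skipped, so the sketch assumes competition ranking (1, 2, 2, 4).

```python
from typing import Dict, Optional

MAX_RANK = 100  # ranks beyond this are treated as "not found"


def rank_genes(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank genes by descending score; identical scores share a rank."""
    ranks: Dict[str, int] = {}
    previous_score = None
    current_rank = 0
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for position, (gene, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # Assumption: competition ranking, so a tie consumes positions.
            current_rank = position
            previous_score = score
        ranks[gene] = current_rank
    return ranks


def rank_of_disease_gene(scores: Dict[str, float], disease_gene: str) -> Optional[int]:
    """Return the rank of the disease gene, or None if absent or beyond MAX_RANK."""
    rank = rank_genes(scores).get(disease_gene)
    if rank is None or rank > MAX_RANK:
        return None
    return rank


if __name__ == "__main__":
    # Hypothetical gene labels and scores, for illustration only.
    example = {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.7, "GENE_D": 0.1}
    print(rank_genes(example))
    # {'GENE_A': 1, 'GENE_B': 2, 'GENE_C': 2, 'GENE_D': 4}
```

If ties were instead meant to leave no gaps (1, 2, 2, 3), only the assignment to `current_rank` would change.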
A VCF file containing all the ClinVar variants we used, together with their HPO symptoms, can be found on our web site.
The results for all 245 weight combinations for direct, ancestor and descendant HPO term matches can be accessed here.
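The iteration over weight combinations can be sketched as a simple grid sweep. The sketch below is an illustration under stated assumptions: `rank_case` is a hypothetical callable standing in for a MutationDistiller run that returns the rank of the disease gene (or `None` if it is not within the first 100 ranks), the summary statistics are our own choice, and the weight values in the usage example are toy values, not the 245 combinations actually tested.

```python
from itertools import product
from statistics import median
from typing import Callable, Dict, Optional, Sequence, Tuple

Weights = Tuple[float, float, float]  # (direct, ancestor, descendant)


def sweep_weights(
    cases: Sequence[dict],
    rank_case: Callable[[dict, Weights], Optional[int]],
    direct: Sequence[float],
    ancestor: Sequence[float],
    descendant: Sequence[float],
) -> Dict[Weights, dict]:
    """Summarise disease-gene ranks for every weight combination."""
    summary: Dict[Weights, dict] = {}
    for weights in product(direct, ancestor, descendant):
        # rank_case is hypothetical: one prioritisation run per case.
        ranks = [rank_case(case, weights) for case in cases]
        found = [r for r in ranks if r is not None]
        summary[weights] = {
            "found": len(found),
            "not_found": len(ranks) - len(found),
            "top1": sum(1 for r in found if r == 1),
            "median_rank": median(found) if found else None,
        }
    return summary


if __name__ == "__main__":
    # Toy stand-in for a real run: the case is "found" only if the
    # direct-match weight reaches a case-specific threshold.
    def toy_rank(case: dict, weights: Weights) -> Optional[int]:
        return case["rank"] if weights[0] >= case["needs"] else None

    toy_cases = [{"rank": 1, "needs": 1.0}, {"rank": 5, "needs": 2.0}]
    for w, stats in sweep_weights(toy_cases, toy_rank, [1.0, 2.0], [0.5], [0.25, 0.5]).items():
        print(w, stats)
```

Keeping the full rank distribution per combination, rather than a single score, makes it possible to compare combinations on several criteria afterwards.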
To validate MutationDistiller's HPO-based prioritisations, we compared it to other tools with similar properties. We included widely used, freely available and functional state-of-the-art tools which do not require any software installation or user login, can work with single-patient VCF files and offer HPO-based prioritisation. We found three algorithms fulfilling these criteria: eXtasy (http://extasy.esat.kuleuven.be/) and the PhenIX (http://compbio.charite.de/PhenIX/) and HiPhive algorithms incorporated into Exomiser (https://www.sanger.ac.uk/science/tools/exomiser). We used Exomiser version exomiser-cli-10.0.1 and eXtasy version 2013-07-04. For our analyses, we used default settings for all algorithms, as an untrained user would be expected to do. For each of the algorithms, we had to rely on locally installed versions because the online tools were not working reliably or fast enough for our purposes. We tested the software on a set of 101 solved patient cases from the Charité Berlin. These cases of rare, early-onset Mendelian disorders were provided by clinicians and researchers working in the Department of Neuropaediatrics and the Institute of Medical Genetics and Human Genetics. We used newly found disease mutations which were not yet included in ClinVar, together with the HPO symptoms assigned to the patient and information on the expected mode of inheritance (if available). The set included a range of disorders and various types of mutations as well as compound heterozygous cases. To protect patient data, we spiked the known causative variant for each case into the same 1000G exome VCF file used for the optimisation of MutationDistiller (HG00377). As the eXtasy algorithm cannot work with all HPO terms, we removed the terms not found in eXtasy's database from our set for this tool. This limited our set for the eXtasy analysis to 88 cases. Moreover, eXtasy accepts at most 10 HPO symptoms per case.
In the 7 cases with more than 10 HPO terms, we thus randomly removed symptoms until only 10 terms remained. We then submitted the resulting VCF files, the HPO identifiers and the mode of inheritance information provided by the clinicians to the different tools. For MutationDistiller, we used the HPO weight settings determined in the optimisation procedure described above. The tools included in this comparison do not provide a score for known pathogenic variants, which is why we did not take MutationDistiller's ClinVar score into account at this stage.
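The preparation of HPO terms for eXtasy described above (dropping unsupported terms, then randomly reducing to 10) can be sketched as follows. This is an illustration only: the function name is hypothetical, the set of supported terms would have to be taken from eXtasy's database, and we assume that a case left without any supported term is excluded, which the text does not state explicitly.

```python
import random
from typing import List, Optional, Sequence, Set

MAX_TERMS = 10  # eXtasy's limit on HPO symptoms per case, as stated in the text


def prepare_terms(
    case_terms: Sequence[str],
    supported: Set[str],
    rng: random.Random,
) -> Optional[List[str]]:
    """Filter a case's HPO terms for eXtasy and cap them at MAX_TERMS."""
    # Keep only terms known to the tool, preserving the original order.
    kept = [term for term in case_terms if term in supported]
    if not kept:
        # Assumption: cases without any supported term are excluded.
        return None
    if len(kept) > MAX_TERMS:
        # Random removal of surplus terms; sampling without replacement.
        kept = rng.sample(kept, MAX_TERMS)
    return kept


if __name__ == "__main__":
    # Placeholder identifiers in HPO format, not real patient annotations.
    supported_terms = {f"HP:{i:07d}" for i in range(1, 16)}
    case = [f"HP:{i:07d}" for i in range(1, 13)] + ["HP:9999999"]
    rng = random.Random(42)  # fixed seed so the random removal is reproducible
    print(prepare_terms(case, supported_terms, rng))
```

Passing an explicitly seeded `random.Random` instance keeps the random removal reproducible across runs.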