What is wKinMut-2?

wKinMut-2 is a unified one-stop shop to facilitate the exploration and understanding of the consequences of protein kinase variants and the underlying biological mechanisms by which they contribute to disease, and in particular, cancer.

What is the scope of wKinMut-2?

Provided that focus is set on kinase variants involved in cancer and other human diseases, the scope of wKinMut-2 lays within the human kinome.

How do I submit variants for analysis?

Please, provide one or more variants in the 'variants' box. Optionally, a descriptive name for the experiment can be provided. The wKinMut-2 submission format includes the Uniprot/Swissprot accession number, the wild type residue, the position and the mutated residue. For instance, a variants from Glycine to Alanine in position 719 of the Epidermal Growth factor receptor, will be encoded P00533 G719A. Other identifier formats are aceptable in addition to UniProt/SwissProt accessions, such as gene symbols or Ensembl Protein IDs, which will be automatically translated. Multiple variants can be submitted at a time.

Is there any sample dataset I can use to try wKinMut?

Yes, there is. In the framework's main page, press 'Try example' to load a toy dataset.

How do I get extended information about the variants?

In the results table, click on the pathogenic character of the variants (red/green buttons). This will redirect you to a new page containing information including the values of the features using for classification, protein-protein interaction information, mentions in the literature of the variants and existing records of the variants in other databases. This additional information is intended to provide the basic background to help to understand and interpret the consequences of the variants.

Where does the drug and chemical compound information come from?

wKinMut-2 provides information about US FDA approved protein kinase inhibitors, according to Janne et al., 2009. The server also includes an update compiled by Robert Roskoski Jr. from the Blue Ridge Institute for Medical Research, North Carolina USA, available at http://www.brimr.org/PKI/PKIs.htm

In addition, wKinMut-2 reports chemical compounds and screening data for Kinases, extracted from Kinase SARfari an integrated chemogenomics workbench developed by the ChEMBL team at EMBL-EBI. Kinase SARfari is available at https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari

What information does Structure-PPi provide?

Structure-PPi reports protein features (e.g. functional domains, known somatic variants in different types of cancer, UniProt annotations from missense variants, ligand binding residues, catalytic sites) that overlap the variants ("direct matches") or that are in close physical proximity ("neighbours"). When variants affect the interfaces of protein complexes, Structure-PPi also reports the partner Ensembl Protein ID, and the protein features for partner residues.

Where does the variant pathogenicity information come from?

wKinMut-2 provides predictions from 9 different methodologies. Of these methods, 8 are directly extracted from dbNSFP and include likelihood scores from SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, FATHMM, VEST3 and CADD. You may access the pathogenicity information of variants used for training for disease associated variants here, and for non-disease associated variants here. In addition to these, wKinMut-2 implements a Random Forest classifier specific to variants in the human kinome. A number of sequence-derived features that characterize variants affecting human protein kinases at different levels:

  • gene level, including membership to a Kinbase group and Gene Ontology categories
  • domain level, using PFAM domains
  • at the residue level, involved amino acids types, changes in biochemical properties, functional annotations from Uniprot, PhosphoELM and FireDB.

The focus on the protein kinase superfamily enabled the choice of features unique to this superfamily. These kinase-specific annotations greatly increase the accuracy of the classification of variants. Each prediction is accompanied by a reliability score, and the annotations giving rise to the predictions are also displayed to facilitate the biological interpretation.

What is the performance of the Random Forest? Is there any benchmark?

Our approach outperforms available methods in a cross-validation (k=10) experiment on the 3689 kinase variants for which a characterization of their pathogenicity in Uniprot exists. The evaluation followed established good practices in the field obtaining satisfactory results. The following table includes the results for the classifiers according to the values provided by dbNSFP. You may access the data used for training and validating the model here. The data splits used in our cross-validation assessment are available here.

Method Accuracy Precision Recall F-score MCC
MutationTaster 0.56 0.38 0.96 0.55 0.36
SIFT 0.68 0.45 0.81 0.58 0.39
Polyphen2:HDIV 0.66 0.44 0.90 0.59 0.42
LRT 0.65 0.45 0.87 0.59 0.39
MutationAssessor 0.76 0.55 0.66 0.60 0.43
CADD 0.76 0.54 0.77 0.64 0.48
Polyphen2:HVAR 0.64 0.53 0.85 0.65 0.50
FATHMM 0.82 0.69 0.63 0.66 0.54
VEST3 0.87 0.74 0.82 0.78 0.69
KinMut RF 0.88 0.82 0.75 0.78 0.68

Where does the Protein-Protein Interaction information come from?

Protein-Protein Interactions (PPI) information comes from iHOP and String. iHOP is a powerful text mining system to automatically extract protein protein interactions from PubMed abstracts. In addition, the original paper sentences are also provided as a means to relate the interaction information with its context. STRING is a resource that stores known and predicted protein interactions from different sources including Genomic context, High-throughput experiments, coexpression and text-mining of the literature. Hence, String interactions include direct (physical) and indirect (functional) associations (more at http://string-db.org/)

Where does the literature information come from?

The sentences providing contextual information about the variants come directly from the literature. We obtain this information using SNP2L. In brief, SNP2L is a literature mining pipeline for the automatic extraction and disambiguation of single-point variants mentions from both abstracts as well as full text articles, followed by a sequence validation check to link variants to their corresponding kinase protein sequences (more at Pubmed: 19758464)

Which databases are revised in order to gather information about the variants?

Currently, four different databases are revised: Uniprot Variation Pages, COSMIC, Kin-Driver and KinMutBase. These databases were selected in order to cover different aspects of human protein kinase variant.

Can I access wKinMut-2 programmatically?

Yes. The predictions from wKinMut-2 can be downloaded programmatically using a REST service. An example can be found here

Is the source code available?

Yes. The source code for this application is released under the GPL version 3 license at https://github.com/Rbbt-Workflows/kin_mut2

I have a question about wKinMut-2 that is NOT in this help section. May I contact you?

Of course you can. Feel free to email us (txema-at-cbs-dot-dtu-dot-dk or miguel-dot-vazquez-at-cnio-dot-es) and we will try to answer your questions.