FAQs


1. How often will the data in DGLinker be updated?
2. What is the difference between UniProt and IntAct protein-protein interaction databases?
3. What gene nomenclature can be used on DGLinker?
4. What terms does DGLinker accept?
5. What phenotype/disease terms are available on DGLinker?
6. Why do I see only full pies in the Model performance tab or even in the Results table if I selected several datasets?
7. Given that some databases used in the Enrichment analysis might take part in the predictions, isn’t this circular reasoning?
8. What’s the difference between the “normal” and “expanded” versions of Gene Ontology?




1. How often will the data in DGLinker be updated?

We aim to update all databases in DGLinker every 3 months, although we may occasionally fall behind schedule. If you notice that any of our databases is out of date, please get in touch (dglinker.service@gmail.com) and we will try to update it as soon as possible.


2. What is the difference between UniProt and IntAct protein-protein interaction databases?

IntAct is the latest release of the whole IntAct database, while UniProt (https://www.uniprot.org/help/binary_interactions_import) consists of a subset of binary interactions from IntAct. This subset is selected using a simple scoring system developed by IntAct and a score threshold that has been deliberately chosen to exclude interactions supported by only one experimental observation. Details of how interactions are scored can be found on the IntAct website (http://www.ebi.ac.uk/intact/pages/faq/faq.xhtml#4).


3. What gene nomenclature can be used on DGLinker?

Only HGNC gene symbols (https://www.genenames.org) are recognised by the web server.


4. What terms does DGLinker accept?

1) Disease terms. For example, 'Alzheimer disease'
2) Phenotype terms. For example, 'headache, fatigue'
3) HGNC gene symbols


5. What phenotype/disease terms are available on DGLinker?

All phenotype/disease terms from OMIM, ClinVar, DisGeNET and HPO are available on DGLinker. Please use the query box in the input sections 'Select phenotype(s)' or 'Select phenotype(s) associated genes' to check whether your target term is present. If you are still unsure, please feel free to contact dglinker.service@gmail.com.


6. Why do I see only full pies in the Model performance tab or even in the Results table if I selected several datasets?

A full pie can represent one of two scenarios: 1) the user selected only one type of data to build the model, either a single database or several databases of the same type, e.g. only disease-gene databases; or 2) the predictive features selected by the machine learning process all belong to one type of data. Both scenarios are common and should not alarm the user.


7. Given that some databases used in the Enrichment analysis might take part in the predictions, isn’t this circular reasoning?

Depending on the input options, some of the databases in the Enrichment analysis tab of the results might also be used to build the model. This is not necessarily a problem, and the enrichment analysis can still add value. However, if overlooked, it might lead to biased interpretations of the results. We discuss this in the tutorial (section 4.4). In such a scenario, we encourage users to take this into consideration and to use the pie chart in the Model performance tab to check which of the databases selected at submission were actually used for the predictions.


8. What’s the difference between the “normal” and “expanded” versions of Gene Ontology?

The basic GO data are the raw associations from the GO Consortium. The GO (expanded) datasets are created by DGLinker using the hierarchical relationships between GO terms in the ontology to propagate gene-term associations up the hierarchy to their parents. Specifically, this is done using the go-basic.obo ontology, which is guaranteed to be acyclic. The machine learning process therefore sees both the specific annotations and the broader context of these terms, potentially allowing it to learn broader associations.
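
The propagation idea can be illustrated with a minimal Python sketch. This is not DGLinker's actual code: the term IDs, gene names, the PARENTS map and the function names are all hypothetical, and a real pipeline would read the parent relationships from go-basic.obo rather than hard-coding them.

```python
# Toy hierarchy (hypothetical IDs): child term -> set of direct parent terms.
# In practice these relationships would be parsed from go-basic.obo.
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}


def ancestors(term, parents):
    """Return every ancestor of `term` (excluding itself).

    Assumes the hierarchy is acyclic, as go-basic.obo is stated to be.
    """
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def expand_annotations(gene_terms, parents):
    """Add all ancestor terms to each gene's set of annotated terms."""
    expanded = {}
    for gene, terms in gene_terms.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


# Hypothetical raw ("basic") annotations: gene -> directly annotated terms.
annotations = {"GENE1": {"GO:C"}, "GENE2": {"GO:D"}}
expanded = expand_annotations(annotations, PARENTS)

for gene in sorted(expanded):
    print(gene, sorted(expanded[gene]))
```

In this toy example GENE1 and GENE2 share no term in the raw annotations, but after expansion both are associated with the common ancestor GO:A, which is the kind of broader association the expanded datasets make visible to the model.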