Using Networks and Prior Knowledge to Uncover Novel Rare Disease Phenotypes

Blue Sheep and Daniela Saderi

Published: July 31, 2025
DOI: 10.5072/zenodo.302714
License: CC BY 4.0

This review is the result of a virtual, collaborative Live Review organized and hosted by the PREreview team as part of an ongoing collaboration with the Rare As One Network on July 17, 2025. The discussion was joined by 7 people: 2 facilitators from the PREreview Team, 2 members of the Rare As One Network team, and 3 Live Review participants. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this preprint.

Summary

This manuscript presents an innovative network-based approach developed by the MAGNET team, which includes the authors of the preprint, to identify novel phenotypes associated with rare diseases in the context of the 2023 Xcelerate RARE challenge. Leveraging a multilayer network that integrates patient- and caregiver-reported symptoms from the Xcelerate dataset with curated data from two established knowledge repositories, Orphanet and the Human Phenotype Ontology (HPO), the authors used a Random Walk with Restart algorithm (MultiXrank) to prioritize phenotypic terms associated with specific rare diseases. The approach aimed to address key challenges in rare disease characterization, particularly the mismatch between standardized phenotypic vocabularies and the varied language used in clinical documentation. Live Review participants acknowledged the innovative aspect of the approach and its potential for developing a scalable, data-driven strategy for uncovering underrecognized phenotype-disease associations in rare diseases. However, they also highlighted some important limitations that should be discussed more clearly in the manuscript. Below we summarize the main strengths, as well as the major and minor concerns raised during the Live Review, and hope this feedback will help the authors improve the manuscript in its next version.
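
For readers less familiar with the technique, the core idea of a Random Walk with Restart can be sketched as follows. This is a minimal illustration on a single-layer toy graph, not the authors' pipeline: it does not use MultiXrank or the Xcelerate, Orphanet, or HPO data, and every node name, edge, and parameter value below is a hypothetical placeholder chosen for the example.

```python
# Toy, single-layer illustration of Random Walk with Restart (RWR).
# All nodes and edges are hypothetical; this is NOT the MultiXrank implementation.
EDGES = [
    ("disease:D1", "phenotype:P1"),  # stands in for a curated association
    ("disease:D1", "phenotype:P2"),
    ("disease:D2", "phenotype:P2"),
    ("disease:D2", "phenotype:P3"),
    ("patient:A", "disease:D1"),     # stands in for a patient record
    ("patient:A", "phenotype:P3"),   # stands in for reported symptoms
    ("patient:A", "phenotype:P4"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    Assumes every node has at least one neighbour, so probability mass is conserved.
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node spreads its non-restarting mass evenly over its neighbours.
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_phenotypes(scores, adjacency, seed):
    """Rank phenotype nodes not already linked to the seed, highest score first."""
    candidates = [
        n for n in scores
        if n.startswith("phenotype:") and n not in adjacency[seed]
    ]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)


adjacency = build_adjacency(EDGES)
scores = random_walk_with_restart(adjacency, {"disease:D1"})
ranking = rank_phenotypes(scores, adjacency, "disease:D1")
```

In this toy graph, phenotype:P3 ranks above phenotype:P4 for the seed disease:D1 because it is reachable through two routes (a second disease and a patient record) rather than one. The same function can be seeded with a phenotype node instead of a disease node, which is one simple way to picture the "reverse" query raised in the major concerns.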

Main strengths

The approach is novel in its integration of datasets (Xcelerate, Orphanet, HPO) that have not previously been linked.
The solution appears well-deserving of its innovation prize due to its creative methodology, particularly for rare diseases that are traditionally understudied.
Use of publicly available datasets and provision of code via GitHub is a strength for transparency and reproducibility.

List of major concerns and feedback

Lack of clinical validation and controls: A recurring concern among Live Review participants was the lack of a clear validation strategy to test whether the methodology works as described, specifically whether it uncovers new, clinically accurate rare disease phenotypes. Some form of clinical back-testing involving clinicians and patients could help clarify how this method can lead to practical applications and what its limitations are. Live Review participants also wondered whether it would be possible or useful to run the model in reverse, i.e., to start from outlier phenotypes and move backwards towards the starting point.
Overreliance on a single dataset (Xcelerate): While it’s clear that this study was done in the context of a competition in which the dataset was provided to challenge participants, a concern was raised about how generalizable this methodology truly is. It is unclear how the model would perform on other datasets or with more than two network layers. The authors may consider addressing this issue more directly in the limitations/discussion section of the manuscript.
The use of the word “novel” in the title was questioned due to the lack of comprehensive dataset coverage and validation. The title and discussion should be nuanced to reflect that the methodology did not actually uncover any “novel phenotypes”. The apparent novelty may reflect the scientific or clinical communities’ unfamiliarity with these phenotypes, which may nonetheless be commonly experienced by the patient community. Making such a claim requires clinical validation of the findings (see the first major concern above).
Ethical and positional considerations:
- While the authors clearly state that their team performed this study in the context of a competition, it is unclear how this context may have shaped or biased how the study was conducted. It would be helpful to know more about how the innovation prize was awarded and who judged it, and to add a permanent link to a page where readers can learn more about the challenge.
- The manuscript would benefit from a positionality statement from the authors clearly acknowledging potential biases in design or interpretation related to the experiences and background of the authors. An example of a positionality statement that was effective can be found in this preprint (https://osf.io/preprints/metaarxiv/7djhq_v1).
- More transparency and information about the demographics of the patients in the dataset would help readers assess the generalizability of the study.
- Live Review participants believe the preprint is missing an acknowledgment of the contributions of patients and caregivers who shared sensitive data. As with the positionality statement, this is a practice that should be more broadly adopted, particularly in studies involving communities whose data is openly shared, often without clear consent for its use in the particular study. Again, we believe the preprint linked above does a good job of highlighting these issues.

List of minor concerns and feedback

Missing comparison with existing tools: It is unclear whether existing tools attempt similar tasks and, if so, where they fall short. Such a comparison would help justify the need for this new model.
Lack of clarity around disease selection: It is presently unclear what the rationale for disease selection was (only 15 of 27 diseases in the dataset were used for the model). Authors are encouraged to clarify exclusion criteria.
Limited applicability: A concern raised early in the Live Review discussion is that many rare diseases don’t have an Orphanet code. The authors briefly mention this in their discussion, but it would be helpful to highlight it further as an upstream limitation: the model only works for rare diseases that are in the datasets used (e.g., those with Orphanet codes).
Reproducibility of the study: Although the link to the code is provided, it’s unclear how reusable the pipeline is or how dependent it is on the specific 2023 challenge data. Participants of this Live Review did not test the code on GitHub so the true reproducibility of the code remains unverified.
Presentation & Figures:
- Table 1 would benefit from additional context, e.g., disease prevalence, significance of findings, annotations and references to external resources for diseases and symptoms.
- Missing or unclear figure legends (particularly for Figures 1 and 2); unclear use of color and visual encoding. The use of percentages would be appreciated for ease of interpretation.
- Figures could be improved for clarity and emphasis, e.g., clearer edge weights, linking related data columns.
Clarity, discoverability and accessibility of the findings:
- The language used to describe the methods and findings and the abbreviations assume technical familiarity with these kinds of approaches. The manuscript would benefit from clearer explanation and reduced jargon.
- From a discoverability perspective, it was suggested that the authors consider adding keywords to the abstract such as: XYZ.
- The manuscript also contains some terminology inconsistencies that make it hard to follow. For example, it seems to use terms such as “intellectual disability” and “cognitive impairment” interchangeably. Consistent terminology would increase clarity. Authors should consider adding HPO terms every time they describe a phenotype or symptom.

Concluding remarks

We thank the authors of the preprint for posting their work openly, which allowed us to discuss it in the context of a Live Review and to share our feedback openly. We also thank all participants of the Live Review call for their time and for engaging in the lively discussion that generated this review.

Competing interests

The authors declare that they have no competing interests.
