Development, implementation, and validation of an open-source Federated Learning platform to accelerate innovation and boost personalized medicine in rare and ultra-rare haematological diseases: an initiative by GenoMed4All Consortium
Published in medRxiv, 2025
Background. Rare haematological diseases (RHD) pose significant clinical challenges due to their heterogeneity, limited patient populations, and fragmented datasets. To overcome these limitations and improve access to, and use of, real-world multimodal data for scientific and clinical purposes, the GenoMed4All Consortium developed an open-source Federated Learning (FL) platform. This platform enables collaborative, privacy-preserving artificial intelligence (AI) model training without the need to centralize sensitive patient information. Methods. The FL platform was deployed within EuroBloodNet, the European Reference Network for RHD, across multiple use cases, including myelodysplastic syndromes (MDS), acute myeloid leukemia (AML), chronic myelomonocytic leukemia (CMML), and multiple myeloma (MM). Multimodal datasets (including clinical and genomic information together with extracted histopathological and radiological features) were utilized. Predictive models (DeepSurv and SAVAE) and generative AI algorithms (CTGAN, Bayesian Networks, and VAE-BGM) were trained using a federated approach. A dedicated data harmonization pipeline based on the FHIR standard ensured consistency across participating centers. Findings. Federated models achieved performance comparable to centralized approaches, with the greatest benefit for institutions with smaller datasets. The platform enabled integration of multimodal data, demonstrating flexibility across diverse data types and clinical endpoints. The inclusion of multimodal information improved predictive accuracy over currently available prognostic schemes. Generative models successfully created synthetic datasets that preserved both clinical and statistical fidelity while ensuring patient privacy; this allows insights to be extracted from real-world data and used beyond the boundaries of FL, as a resource for accelerating the conduct of clinical trials.
A preliminary implementation within the EuroBloodNet clinical network demonstrated feasibility for broader scale-up. Interpretation. This study validates FL as a robust, privacy-compliant approach to enable AI-driven precision medicine in RHD. The platform facilitates real-world data integration and model scalability, providing a foundation for multicenter collaboration, regulatory-grade evidence generation, and innovative trial designs in rare diseases. Funding. European Union’s Horizon 2020 research and innovation programme.
Recommended citation: Asti, G., Apellániz, P. A., Carota, L., Casadei, F., Piscia, D., Delleani, M., ... & Álvarez Garcia, F. (2025). Development, implementation, and validation of an open-source Federated Learning platform to accelerate innovation and boost personalized medicine in rare and ultra-rare haematological diseases: an initiative by GenoMed4All Consortium. medRxiv, 2025-08. /files/2025-08-11-fedlearning-rarediseases.pdf
