TR-24-01: Federated Learning Over Electronic Health Record Data


“Federated learning” is a relatively new concept that has emerged from the fields of machine learning and networking, in which a centralized machine learning model is applied to private, decentralized data that remain at their source. As part of the Biomedical Data Translator (“Translator”) program, we have developed three regulatory-compliant, open-source services for exposing insights derived from the electronic health records of three healthcare systems.

Herein, we propose a two-phase approach for extending our existing services to support federated learning. In the first phase, we propose a proof-of-concept demonstration that leverages the current Translator architecture and a new Translator service termed BioPack, which comprises a general workflow manager, a centralized server, and a subgraph retrieval service. BioPack will be used to run sequential queries across our three Translator clinical endpoints, thus mimicking federated machine learning. Each sequential query will apply a simple mathematical algorithm that performs a calculation across the results returned by the three Translator clinical endpoints. In the second phase, we propose to extend the phase-one effort by using a secure version of BioPack, or a separate secure server, for centralized queries that directly target the private clinical databases supporting the Translator clinical endpoints. We will prioritize privacy and security during both phases of the work. Collectively, the proposed work will allow us to generate new clinical insights for contribution back to our institutions and the Translator program.
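
As an illustration of the phase-one idea, the sketch below shows how a central coordinator might combine aggregate results returned by three endpoints without ever seeing row-level records. It is a minimal sketch under stated assumptions, not the BioPack implementation: the endpoint names, the query string, the SiteSummary type, and the choice of a pooled mean as the "simple mathematical algorithm" are all hypothetical, and the endpoints are simulated in memory with synthetic values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SiteSummary:
    """Aggregate statistics a site is assumed to return (hypothetical type)."""
    n: int        # number of records that matched the query at this site
    total: float  # sum of the queried value over those records


def make_endpoint(values: List[float]) -> Callable[[str], SiteSummary]:
    """Simulate one clinical endpoint; only aggregates leave the site."""
    def query(_request: str) -> SiteSummary:
        return SiteSummary(n=len(values), total=float(sum(values)))
    return query


def pooled_mean(endpoints: Dict[str, Callable[[str], SiteSummary]],
                request: str) -> float:
    """Query each endpoint in turn and combine the aggregates centrally."""
    n = 0
    total = 0.0
    for _name, query in endpoints.items():  # sequential, one site at a time
        summary = query(request)
        n += summary.n
        total += summary.total
    if n == 0:
        raise ValueError("no records returned by any endpoint")
    return total / n


# Synthetic data standing in for three sites (assumption, not real EHR data).
endpoints = {
    "site_a": make_endpoint([1.0, 2.0, 3.0]),
    "site_b": make_endpoint([4.0, 5.0]),
    "site_c": make_endpoint([6.0]),
}
result = pooled_mean(endpoints, "mean_of_feature_x")
print(result)  # 3.5
```

Because each site returns only a count and a sum, the pooled mean equals the mean that would be computed over the combined records, while the records themselves stay at their source. An iterative algorithm would repeat this pattern, sending the combined result back to the sites with each subsequent query.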