How To Start A Data Federation: Common FAQs for Patient-Focused Organizations

September 13, 2023

Excited about what a data federation could do for the disease area you focus on? Unsure about how it can all come together? We’ve compiled answers to common questions we hear from patient-focused organizations looking to develop their own data federation with Array. 

Before diving in, remember we see you, the non-profit, as the leaders of the data federations we create. Additionally, we don’t pigeonhole you into using only data we have. Quite the opposite; we focus on your research goals and give you all the tools and support to build your patient-benefitting data asset. Think of us as your agnostic technology and services provider. Let’s get to it:

  1. What is the hospital/health system’s approval process to gain use of their data for the research we’d like to perform?
    • We have seasoned project managers and an approval playbook to lead you through the process. Once you provide the clinical champion within the hospital/health system, we will work through the hospital’s/health system’s IRB, IT, Information Security, Legal, and any other needed committees to get approval.
    • Our advanced data privacy technology is also part of our secret sauce for a fast hospital/health system approval process because zero-trust is required; only your mission-driven research will occur on the data and we can technologically guarantee it. Generally, going from first conversation to signed contract takes about 4 months.
  2. What is the biggest challenge in getting hospitals/health systems onboard and providing data?
    • The biggest challenge in setting up a data federation (with or without Array) arises when hospital IT time is required for extracting, deidentifying, and mapping the hospital’s data to your data federation’s data model. If it is not required, it is simple for the data provider to load their data into their own cryptographically-contained enclave.
    • To mitigate that challenge, Array can provide and manage a third party to do this work. The cost per hospital is usually anywhere from $5K to $25K, and the work takes 6-8 weeks to complete.
    • If relying solely on the hospital’s IT team, it can take 2 to 8 months for them to extract, deidentify, and map the data to your data federation’s data model.
  3. How do you identify hospitals, health systems, community health providers, or other clinical data partners to participate in a data federation I’d like to start?
    • The non-profits we work with usually have a network of clinical researchers who are affiliated with a hospital or health system. Additionally, we use our and the non-profit’s network to identify and recruit others as needed to meet the research goals.
  4. How does the agreement structure work for launching a data federation? Do you have template agreements?
    • There are two main agreements. First is a Technology Service Agreement between Array and your institution to establish us as your agnostic technology and service provider for your data federation. Second is a Data Federation Agreement between your institution and each partner providing data (e.g. hospital, health system, etc.).
    • We have a template agreement for each type of data federation model (see question 6 for the different models) as well as recommended affordable law firms with expertise in multi-institution health data research agreements.
  5. Each hospital/health system has different ways of formatting their data. What is the process for normalizing the data for research?
    • Part of the onboarding process is creating a common data model (e.g. what data fields to include and how that data is formatted). We have a software tool to help you and your scientists create this data model collaboratively. Additionally, our project manager and data scientists will support you in developing your data model. We have examples we can provide under a confidentiality agreement.
  6. How does the governance of our data federation work? How are research projects reviewed and approved for using our data federation?
    • Currently there are two main models and we have template agreements you can use for each.
      • First is a data federation that supports one or more known research projects, with the ability to add future research projects for the clinical data partners to review and approve under the same agreement.
      • Second, your non-profit creates a governing body, usually composed of your leadership and PIs from each hospital providing data in your data federation. This body reviews and approves research projects under overarching research goals and policies. The clinical data partners agree to cede approval to your governing body because the approval policies and mission-driven goals are aligned with their goals.
  7. How do my researchers get access to my data federation?
    • Once the research project is approved, you tell us to add that researcher to your data federation, and we provide that researcher the credentials to use the Array web portal for their project.
  8. Can we add new data fields or providers in the future? Can we have the hospitals/health systems refresh the data and, if so, how often?
    • As part of our Technology Service Agreement with you, we offer at no additional cost two updates to your data model (e.g. adding new data fields) per year and two data refreshes per year.
  9. What types of data and datasets do you support?
    • All data we support is de-identified. The data can come from clinical care (e.g. EHR data), research/study (e.g. REDCap), your own internal data, public data (e.g. CDC’s SDOH), patient registries, and many others. We currently support data that is structured (think rows and columns). Additionally, we have the capability to process unstructured data (e.g. notes) into structured data and the ability to extract data from media files like PDFs. We will have the ability to load and perform research on medical image data in Q4 2024.
  10. Can you connect the clinical data provided by the hospital/health system to our patient registry data?
    • Yes, we have a partner who can, at an affordable price, connect a patient’s medical records to that patient’s survey data while maintaining deidentification.
  11. What is the time commitment needed by us, the non-profit?
    • There are biweekly meetings with Array, meetings with your data partners for recruitment, and 1-2 hours of work during the onboarding and launch.
  12. What types of research analysis can be performed?
    • We have a researcher portal that supports all common statistical and advanced data science analyses (e.g. paired t-test, XGBoost, etc.) and can easily add others. There is also a cohort builder and various graphical visualizations of results.
  13. Can researchers using my data federation download the data?
    • We can provide tooling for researchers to download the data and use it on their local servers if the contracts with the data providers are set up for that. A key part of our speedy approval by hospitals/health systems is that our product has the novel ability to enable data science research without their hospital data needing to be moved, downloaded, or copied. With our product, zero-trust is required: only the data uses approved by the hospital can occur on its data. If the data is downloaded and leaves the Array platform, then the hospital has no guarantee beyond the contract that the data isn’t accessed, emailed out, or used in an unapproved way (even if it is just an accident). For our non-profits who want to enable these large data science projects but cannot financially or reputationally afford any risk of data misuse, our advanced data security methods provide great peace of mind. We worked with hospital data science centers to ensure our product includes the needed analytic tooling, so downloading the data to use it with analytic tools is not necessary. Additionally, adding new analytic tooling is very simple for our team to do.
  14. How do you recommend starting a data federation?
    • Start with your research goals and work with your science community to determine the first research project, or set of research projects, that requires linking multiple datasets, along with the PI(s). See our case studies for more examples.
  15. How are you different from other data science companies?
    • You are the star of your data federation. We’re the tech and services provider to help you develop your own data research asset, leveraging your current clinical network and expanding to new health systems or research institutions. We are not data profiteers seeking other uses of your patients’ data.
    • We do not pigeonhole you into only using pre-baked generic data.
    • Our advanced data security means faster hospital approval and bolstering a patient-centric reputation.
  16. How does IRB approval occur?
    • Because we only use deidentified data, the research projects are not human-subject research and are therefore IRB exempt. Your hospital/health system data partners will either go through their IRB to receive the exemption or sign a reciprocity agreement with an institution.

Array Insights is ready to be your agnostic technology and services partner as you explore deploying your own data federation. Patient-focused organizations looking to learn more should reach out to our team to set up an initial chat or email us at 

We look forward to hearing from you!