
Solution architecture and implementation of a data warehouse that fulfils German healthcare regulations, including an ETL pipeline for the medical data of the world's leading fasting clinic, to provide personalised, app-based suggestions that improve treatment.
Case Study: Modernizing Data Infrastructure for Buchinger Wilhelmi Fasting Clinic
Client: Buchinger Wilhelmi
Project Overview:
Buchinger Wilhelmi, the world’s leading fasting clinic, sought to enhance their treatment personalization and research capabilities through advanced data management. The goal was to establish a robust data warehouse that complies with stringent German healthcare regulations, specifically the Landeskrankenhausgesetz (state hospital act), while enabling efficient data processing and personalized, app-based suggestions for patients.
Objective:
To architect and implement a secure, scalable data warehouse solution with an efficient ETL pipeline, ensuring compliance with German privacy laws and enabling the clinic to deliver personalized treatment advice and facilitate comprehensive research analytics.
Solution Design Process:
Requirement Analysis:
- Engaged with Buchinger Wilhelmi’s medical and IT teams to understand their specific needs, regulatory constraints, and data processing requirements.
- Identified the necessity for a secure, on-premises solution due to the restrictions on using public cloud providers.
Technology Selection:
- Chose Kubernetes for its robust container orchestration capabilities, enabling autoscaling, load balancing, and efficient resource management on bare-metal servers.
- Selected Prefect as the ETL scheduler for its efficiency and flexibility over traditional tools like Airflow.
- Decided on Minio for S3-compatible object storage, facilitating seamless data staging between ETL steps.
- Opted for PostgreSQL as the final data repository for its reliability and compatibility with analytical tools.
Architecture Design:
- Designed a Kubernetes cluster on bare-metal servers to ensure compliance with German healthcare regulations and provide scalability.
- Implemented a comprehensive ETL pipeline managed by Prefect, orchestrating the ingestion, processing, and transformation of medical data.
- Configured Minio S3 buckets for intermediate data storage, ensuring smooth data flow between ETL stages.
- Established PostgreSQL databases to store the final processed data, enabling easy access for app-based personalized recommendations and research analytics.
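The orchestrated pipeline described above might be sketched roughly as follows. Plain Python functions stand in for Prefect tasks so the example stays self-contained; the record layout, field names, and unit conversion are illustrative assumptions, not the clinic's actual schema.

```python
# Sketch of the extract -> transform -> load flow orchestrated by Prefect.
# Plain functions stand in for Prefect's @task/@flow decorators; all names
# and the record layout are illustrative.

def extract(raw_records):
    """Pull usable rows from the source system (here: drop rows without an ID)."""
    return [r for r in raw_records if r.get("patient_id") is not None]

def transform(records):
    """Normalise units into a warehouse-friendly format."""
    out = []
    for r in records:
        out.append({
            "patient_id": r["patient_id"],
            # mg/dL -> mmol/L (divide by 18.016, glucose's molar mass / 10)
            "glucose_mmol_l": round(r["glucose_mg_dl"] / 18.016, 2),
        })
    return out

def load(records, warehouse):
    """Append processed rows to the warehouse (a list stands in for PostgreSQL)."""
    warehouse.extend(records)
    return len(records)

def etl_flow(raw_records, warehouse):
    """The flow: each step's output feeds the next, as Prefect would chain tasks."""
    return load(transform(extract(raw_records)), warehouse)
```

In the real setup each step also reads from and writes to Minio staging buckets, so a failed run can resume from the last completed step instead of restarting from scratch.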
Implementation:
Infrastructure Setup:
- Deployed a Kubernetes cluster on bare-metal servers, ensuring high availability, autoscaling, and load balancing to handle varying workloads.
- Configured Prefect as the ETL scheduler, defining and orchestrating the necessary data processing workflows.
ETL Pipeline Development:
- Developed ETL processes to extract medical data, transform it into useful formats, and load it into the data warehouse.
- Utilized Minio S3 buckets for staging data at different ETL steps, ensuring seamless transitions and data integrity.
- Finalized the processed data in PostgreSQL databases, making it available for further analysis and application use.
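The staging pattern between ETL steps can be illustrated with a small sketch. An in-memory class stands in for a Minio S3 bucket so the example runs anywhere; the key layout (`etl/<run_id>/<step>.json`) is an assumption for illustration, not the project's actual naming scheme.

```python
import json

class StagingBucket:
    """In-memory stand-in for a Minio S3 bucket used to stage data between ETL steps."""
    def __init__(self):
        self._objects = {}

    def put(self, key, obj):
        # Objects are stored as JSON bytes, as they would be in S3.
        self._objects[key] = json.dumps(obj).encode()

    def get(self, key):
        return json.loads(self._objects[key].decode())

def stage_between_steps(bucket, run_id, step, payload):
    """Write one step's output under a deterministic key, so the next step
    (or a restarted run) can pick it up without re-running earlier steps."""
    key = f"etl/{run_id}/{step}.json"
    bucket.put(key, payload)
    return key
```

A usage example: `stage_between_steps(bucket, "run-001", "transform", rows)` returns the key under which the transform output was staged, which the load step then reads back.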
Data Utilization:
- Integrated the processed data with a personalized app, providing users with tailored fasting recommendations based on their medical data.
- Developed analytical dashboards for the research department and doctors, enabling data-driven improvements in treatment protocols.
Results:
- Regulatory Compliance: Successfully implemented a secure, on-premises data warehouse solution that complies with the Landeskrankenhausgesetz.
- Enhanced Personalization: Enabled personalized fasting recommendations through a user-friendly app, enhancing patient treatment experiences.
- Improved Research Capabilities: Provided the research department and medical professionals with comprehensive analytical dashboards, facilitating data-driven insights and treatment improvements.
- Efficient Data Processing: Achieved efficient data processing and transformation through the use of Prefect and Minio, ensuring timely and accurate data availability.
Conclusion:
The project resulted in a state-of-the-art data warehouse solution for Buchinger Wilhelmi, leveraging advanced technologies to comply with German healthcare regulations and significantly enhance patient treatment personalization and research capabilities. By implementing a secure, on-premises infrastructure with efficient ETL processes, the clinic can now offer personalized recommendations and drive continuous improvements in fasting treatments.
Want to Transform Your Healthcare Data Management? Contact us today to explore how we can help you build a compliant, scalable, and efficient data warehouse solution tailored to your needs!
Full transcript
No industry has more data leaks than the healthcare industry. This is often because the technology is really old: databases are still from the 60s and 70s, and they are not protected, because technology is of course not the main interest of a clinic. But luckily, different solutions are evolving for healthcare providers and clinics: we can build a data warehouse with medical data in a safe and transparent way as well. This is similar to the data warehouse I built for the finance and banking industry (check the Atrovia video for that). In the case of Buchinger Wilhelmi, we implemented a safe, non-public data warehouse, which means we are not hosting it in AWS or any other public cloud, but rather host it locally.
And as I said, we could use Hadoop, Trino, and so on for this. But especially for smaller databases, PostgreSQL or TimescaleDB is a good choice: we can still have the benefits of big data, like distributed machines and distributed computing, but with a normal SQL database. It really depends on the case whether Trino, the big, Hadoop-style engine, makes sense, or a well-tuned PostgreSQL. In this case, it was PostgreSQL. And something we did to ensure data safety is separating sensitive data from less sensitive data. And I mean, in the medical context, basically everything is sensitive.
But what I mean is that we can mark different columns in different tables as sensitive, which, for example, only the research department is allowed to see. Other people, say someone accessing this database as a marketing user, will see an anonymized view of these columns. So programmatically, without coding it and without having different databases, we can say that someone from marketing can only see the column first name in an anonymized way, like patient ABC324, while someone from research sees the full name. And in many cases, it’s not even necessary to see the full name, especially in Europe, where GDPR and HIPAA are really important. We have different solutions to work around these things.
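The masked-view idea can be shown in a few lines. This minimal sketch uses SQLite so it is self-contained; the production setup used PostgreSQL views plus role grants (research reads the base table, marketing is only granted the view), and the table and column names here are purely illustrative.

```python
import sqlite3

# One base table with the sensitive column, one view that masks it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, first_name TEXT, glucose REAL)")
conn.execute("INSERT INTO patients VALUES (324, 'Anna', 5.1)")

# Marketing-facing view: the real name is replaced by a pseudonym built
# from the row ID, without duplicating any data.
conn.execute("""
    CREATE VIEW patients_marketing AS
    SELECT 'patient ABC' || id AS first_name, glucose FROM patients
""")

full = conn.execute("SELECT first_name FROM patients").fetchone()[0]
masked = conn.execute("SELECT first_name FROM patients_marketing").fetchone()[0]
```

Here `full` is `'Anna'` while `masked` is `'patient ABC324'`: same database, two access paths, and in PostgreSQL the grants decide which path a given role can use.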
And oftentimes, especially if you’re building research analytics, names, addresses, and similar things are not too important. Therefore, I would call it a smart medical data warehouse, where we implement different technical ways to anonymize and pseudonymize data and store it in different locations. We could even split the database into the sensitive part, which is hosted in the clinic, locked down, with encryption enabled, and the less sensitive data, like blood measurements or DNA data, being in the cloud. If the cloud only holds values with an ID, say “this is the DNA data of patient number three”, we can’t draw any conclusions about who that person is, especially if we kick out addresses, ethnicity data, and so on. But in case we still need this combination, we can query our smart data warehouse and make the connection with the local sensitive data.
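The split described above can be sketched as a pseudonymisation step at the boundary: the cloud only ever sees a keyed hash of the patient ID, while the key, and with it the ability to re-link, stays on the locked-down local side. The key, field names, and record shape are illustrative assumptions.

```python
import hashlib
import hmac

# This key never leaves the clinic; without it, cloud-side pseudonyms
# cannot be linked back to a person.
SECRET_KEY = b"kept-on-premises-only"

def pseudonym(patient_id: str) -> str:
    """Stable keyed hash: the same patient always maps to the same pseudonym."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:12]

def split_record(record):
    """Separate one record into a local sensitive part and a cloud-safe part."""
    pid = pseudonym(record["patient_id"])
    local = {"patient_id": record["patient_id"], "name": record["name"], "pseudonym": pid}
    cloud = {"pseudonym": pid, "blood_glucose": record["blood_glucose"]}
    return local, cloud
```

Because the pseudonym is deterministic, new cloud measurements for the same patient line up under the same pseudonym, and the local mapping table is only consulted when a query genuinely needs the identity.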
And especially if you are, for example, offering an app, you might need the public cloud as a way to scale. Let’s say there’s one specific timeframe when a lot of patients are in your clinic; the cloud can easily scale up and down to manage the demand. But the connection with the sensitive data might not need to consume many cloud credits. We could even cache it in the cloud with a timeout of a day or six hours, for example, and only make the connection with the sensitive database when it is really required. Anyway, if you’re interested in something like this, or in handling medical data in general, you can have a look at my website, because I’m providing several tutorials about it, all for free. You can download free material. And if you want, we can also have a conversation. Just head over to my website, and see you soon.
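That caching idea, look up sensitive data once, then serve it from the cloud side until a timeout expires, is a plain TTL cache. A minimal sketch, assuming a fetch callback that stands in for the query against the local sensitive database; the six-hour default mirrors the timeout mentioned above, and all names are illustrative.

```python
import time

class TTLCache:
    """Cache re-linked sensitive lookups in the cloud with a time-to-live,
    so the clinic's local database is only queried when an entry expired."""

    def __init__(self, fetch, ttl_seconds=6 * 3600):
        self._fetch = fetch      # callback that queries the local sensitive DB
        self._ttl = ttl_seconds
        self._store = {}         # key -> (value, expiry timestamp)

    def get(self, key):
        value, expires = self._store.get(key, (None, 0.0))
        if time.monotonic() < expires:
            return value         # still fresh: no round-trip to the clinic
        value = self._fetch(key) # expired or missing: query and re-cache
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

With this in place, repeated app requests for the same patient within the window hit the cache, and only the first request per window crosses over to the locked-down sensitive database.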