Danish researchers in a wide range of fields can obtain direct access to data owned by the national bureau Statistics Denmark. A new application programming interface (API) connects Statistics Denmark’s Data Window to the country’s High-Performance Computing (HPC) facilities.
The solution is the result of a collaboration between Statistics Denmark, universities, and DeiC, the national research and education network (NREN). The project was initiated at the request of the Coordinating Body for Register Research (KOR).
“It’s about moving the calculations to where the skills and resources are – without compromising on security and control. Our most important principle is that data should never get out of our control. Therefore, the entire solution has been built around the fact that all data transfers take place via the Danish Data Window, and that we keep track of every single movement,” explains Michael Specht, Project Manager at Statistics Denmark.
Growing demand for smoother access
For 175 years, Statistics Denmark has collected data about Denmark and Danes, and since 1988, researchers at Danish research institutions have been able to work with this data in closed computer environments under schemes controlled by Statistics Denmark. Here, data is made available to certified users at authorized institutions through Denmark’s Data Window. All data is pseudonymized before researchers get access.
However, with the growing use of supercomputing and the emergence of scientific fields such as genomics, which involve vast amounts of data, demand has grown for a smoother way of organizing access.
With the new API, pseudonymized data can be processed at an approved HPC facility where the researcher has obtained computing time. The transfer is based on a so-called pull architecture, in which the HPC centers retrieve the data themselves. As a result, Statistics Denmark does not need to establish technical connections to the various facilities, which increases security and makes the solution easier to maintain and expand.
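The pull principle can be illustrated with a small in-memory sketch. This is not the actual API, whose interface the article does not describe: the class names `DataWindow` and `HpcFacility`, the method names, and the audit-log fields are all hypothetical, and the network and authentication layers are left out. It only shows the general idea that the facility initiates each transfer while the data owner checks approval and records every request.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataWindow:
    """Hypothetical stand-in for the data owner's side of a pull API."""

    datasets: dict[str, list[dict]]
    approved_facilities: set[str]
    audit_log: list[dict] = field(default_factory=list)

    def handle_pull(self, facility: str, dataset_id: str) -> list[dict]:
        allowed = (
            facility in self.approved_facilities
            and dataset_id in self.datasets
        )
        # Every request is recorded, whether or not it is allowed.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "facility": facility,
                "dataset": dataset_id,
                "allowed": allowed,
            }
        )
        if not allowed:
            raise PermissionError(f"{facility} may not retrieve {dataset_id}")
        # Return copies so the caller cannot modify the owner's records.
        return [dict(row) for row in self.datasets[dataset_id]]


@dataclass
class HpcFacility:
    """Hypothetical HPC center that fetches data on its own initiative."""

    name: str

    def pull(self, window: DataWindow, dataset_id: str) -> list[dict]:
        # The facility starts the transfer; the owner never connects outward.
        return window.handle_pull(self.name, dataset_id)


if __name__ == "__main__":
    window = DataWindow(
        datasets={"demo": [{"pseudonym": "a1", "value": 3}]},
        approved_facilities={"hpc-a"},
    )
    rows = HpcFacility("hpc-a").pull(window, "demo")
    print(len(rows), "row(s) retrieved;", len(window.audit_log), "log entry")
```

In this sketch the data owner holds no list of facility addresses and opens no outbound connections, which is the property the pull design is meant to provide.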
First agreement signed for genomics
The project began in 2023. Statistics Denmark has been responsible for project management and for developing secure access to the data, while DeiC has played a key role in the technical collaboration with the HPC centers and in developing the connection between the API and the HPC centers. In addition, DeiC has developed and tested proof-of-concept code, which the HPC centers have adapted to their local systems.
The first agreement with Statistics Denmark enabling use of the new solution has been signed by the genomics research organization GenomeDK and DTU Computerome (an HPC facility operated by the Technical University of Denmark).
“We look forward to the solution going live and see opportunities for other types of organizations than HPC facilities, such as sector research organizations, to potentially benefit from the solution in the long term,” says Rune Gamborg Ørum, Project Manager at DeiC.
The text is inspired by the article “Researchers can now analyze data from Statistics Denmark on national HPC facilities” by Anne Rahbek-Damm on the DeiC website.
