The data challenge
Experiments at the European XFEL will produce an enormous amount of data, all of which must be stored and made available for analysis. Chris Youngman's group at European XFEL has taken up that challenge.
Before scientists can leverage their experiments at the European XFEL to generate new insights, they will have to dig through an enormous amount of data. Take one of the two-dimensional pixel detectors: each will deliver 10 to 40 gigabytes of data every second, enough to fill more than seven DVDs.
Operating all six instruments will, according to current estimates, produce 10 million gigabytes (10 petabytes) of data in the first years of operation, increasing to over 50 million gigabytes per year as a result of detector upgrades. In comparison, the four experiments at the Large Hadron Collider produce about 13 million gigabytes per year.
“To picture this: Storing 50 million gigabytes of data would require 10 million DVDs, which, if stacked on top of one another, would be 12 kilometres high,” Chris Youngman says. The British physicist leads the Data Acquisition and Controls group, which is in charge of handling this vast amount of data.
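The DVD comparisons above can be checked with a few lines of arithmetic. This sketch assumes a single-layer DVD capacity of 4.7 gigabytes and a disc thickness of 1.2 millimetres, both standard values not stated in the article:

```python
# Rough check of the article's data-volume comparisons.
# Assumed constants (not from the article): 4.7 GB per single-layer DVD,
# 1.2 mm thickness per disc.
DVD_CAPACITY_GB = 4.7
DVD_THICKNESS_MM = 1.2

def dvds_needed(gigabytes):
    """Number of single-layer DVDs needed to hold the given data volume."""
    return gigabytes / DVD_CAPACITY_GB

def stack_height_km(gigabytes):
    """Height in kilometres of the stack of DVDs holding that volume."""
    return dvds_needed(gigabytes) * DVD_THICKNESS_MM / 1e6

print(round(dvds_needed(40)))             # one detector-second at 40 GB/s: ~9 DVDs
print(round(dvds_needed(50e6) / 1e6, 1))  # 50 million GB: ~10.6 million DVDs
print(round(stack_height_km(50e6), 1))    # stack height: ~12.8 km
```

The result agrees with the quoted figures of roughly 10 million DVDs and a stack about 12 kilometres high.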
Author: Dirk Rathje
Some features of the envisaged data handling system at the European XFEL:
- Initial size of the storage system will be 10 million gigabytes, increasing over time to 50 million gigabytes or more.
- Lossless data compression will be applied on the fly whenever possible. For single small biological molecules, the data can be compressed to five percent of its original size. Solids, liquids, and gases do not allow such extreme compression rates.
- Disks will be used to store raw data as well as results from scientific analysis for about one year. After that, all raw data are moved to a tape archive for long-term storage.
- Computing clusters close to the data archive will be used to analyse the data. Estimates indicate that 2 000 processor cores per petabyte of stored data will be needed to perform scientific analysis. For 10 million gigabytes, this corresponds to 20 000 cores in total, or roughly 2 000 desktop machines or 200 large server machines.
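The cluster sizing in the last bullet follows from simple scaling. The per-machine core counts below (10 cores per desktop, 100 per large server) are assumptions chosen to reproduce the article's figures, not values stated in the text:

```python
# Scale the estimate of 2 000 cores per petabyte to the initial 10 PB store.
CORES_PER_PB = 2_000
INITIAL_STORE_PB = 10  # 10 million gigabytes

total_cores = CORES_PER_PB * INITIAL_STORE_PB

# Assumed machine sizes (not from the article):
CORES_PER_DESKTOP = 10
CORES_PER_SERVER = 100

print(total_cores)                       # 20000 cores in total
print(total_cores // CORES_PER_DESKTOP)  # 2000 desktop machines
print(total_cores // CORES_PER_SERVER)   # 200 large server machines
```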