[Nexus] NeXus - a solution to what is not the real problem ?

Wuttke, Joachim j.wuttke at fz-juelich.de
Tue Mar 9 13:37:29 GMT 2010


Dear colleagues,

I am currently preparing a deliberately provocative memo with
working title »Why don't we have better data processing software
for quasielastic neutron scattering ?«. One section in this paper
will deal with data storage, and in its present form, it is quite an
attack on NeXus. To play fair, I am posting it here and look forward to
your comments. Maybe you will convince me that I am mistaken.

Looking forward to a sound discussion - Joachim


Though all raw data produced by QENS instruments have basically the
same structure, many different storage formats are in use.
Therefore, porting data processing software from one instrument
to another is generally not possible without
adapting at least a read-in routine or providing a raw-data conversion tool.
This is a severe nuisance for users,
and an obstacle for code sharing and collaborative software development.
For these reasons,
it is a popular idea that efforts to improve the software environment
should start with the adoption of a \textsl{common raw data format} ---
I shall call this strategy \textsl{data format first}.

If any format becomes the common raw data format of our time, it will be NeXus.
Under development for more than 15 years,
NeXus~\cite{qda3} addresses neutron as well as X-ray scattering.
It enjoys strong political backing,
as evidenced by an International Advisory Committee
with delegates from all major facilities.
A growing number of new spectrometers actually use NeXus,
be it by choice or forced by site policy;
on the other hand, only a few existing instruments have migrated so far.

When writing the instrument software for SPHERES,
I consciously opted against NeXus,
in favor of a less rigid self-defined format
that is easier for a human to read,
thereby facilitating the debugging of data acquisition and
raw data processing software.
Maybe my wishes could have been accommodated within NeXus,
had I communicated more intensely with the project team.
However, I have more fundamental objections ---
not against NeXus itself,
but against unrealistic promises,
against overestimating data formats,
against the flawed strategy \textsl{data format first}.

Unifying data formats reminds me of church history:
attempts to (re)unify $n$ different denominations regularly
result in $n+1$ denominations being around:
the new, unified church, plus all the groups that split off
to preserve the good old faith of their own.
When migrating an existing spectrometer towards NeXus,
the instrument scientist must either support, for a long time,
read-in routines for both the old and the new data format,
or provide routines that achieve lossless conversion from the old
into the new format.
Choosing NeXus as raw data format is not sufficient to guarantee
that data from different instruments can be read by the same software.
For instance, at SPHERES,
energy calibration is done at acquisition time,
and energy transfers $\hbar\omega$ are part of the raw data set.
At the ILL backscattering spectrometers,
only a few hardware parameters are stored from which
the downstream software must construct the energy scale.
Translating the current output format into something looking like NeXus
would not make the raw data files mutually legible.
Unifying raw data formats is not possible without unifying
data acquisition programs ---
which will rarely be feasible
because in most cases the hardware is too different.

Some time ago,
NeXus may have been attractive for developers
because its rich application programming interface (API)
relieved them from implementing write-out and read-in routines.
However, this advantage has vanished because
modern generic data formats like YAML \cite{qda5}
allow one to store and retrieve
complex data, composed of scalars, hashes, and arrays
in arbitrary tree-like structures,
at zero cost through a much simpler API.
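
As a minimal sketch of this point, the following Python fragment
round-trips a nested record of scalars, hashes, and arrays through a
generic serializer. It uses the json module from the Python standard
library as a stand-in for YAML, only so that the example runs without
extra packages; a YAML library would offer an analogous pair of dump and
load calls. All field names and values are invented for illustration and
do not describe the actual SPHERES format.

```python
import json

# Hypothetical raw-data record: scalars, a hash (dict), and arrays (lists),
# nested in a tree. Names and values are illustrative only.
record = {
    "instrument": "example",
    "scan": {"duration_s": 600, "temperature_K": 295.5},
    "energy_ueV": [-1.0, 0.0, 1.0],
    "counts": [[3, 10, 4], [2, 12, 5]],  # one row per detector
}

text = json.dumps(record, indent=2)  # write-out: a single call
restored = json.loads(text)          # read-in: a single call

# The tree structure and all values survive the round trip.
print(restored == record)
```

The serialized text is also plain enough to be inspected by eye, which is
the property I valued when debugging acquisition software.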

Most fundamentally,
I think that efforts to unify the raw data format
are addressing the wrong interface:
most users do not want to see raw data at all.
What users want is a calibrated, normalized, reasonably binned
scattering law $S(q,\omega)$.
What should be standardized is the procedure to obtain such $S(q,\omega)$.
While most of this procedure can be implemented in quite a generic way,
it will remain the instrument scientist's responsibility
to plug in a low-level routine that reads in and calibrates the
raw data from his instrument.
Only he has the technical knowledge required to do it correctly,
and hardly anybody else needs to care about the raw data and their format.
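
The division of labour proposed here might be sketched as follows:
a generic routine performs the instrument-independent steps, and each
instrument contributes only a small reader that returns data in an
agreed in-memory form. Everything in this sketch is an assumption made
for illustration: the names Spectrum, reduce_spectrum,
read_with_stored_energies and read_with_constructed_energies, the
dictionary keys, and in particular the equally spaced energy scale,
which is a placeholder and not the calibration of any real instrument.
The monitor normalization likewise stands in for whatever a standardized
procedure would actually prescribe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Spectrum:
    """Agreed in-memory form handed from a reader to the generic code."""
    energy: List[float]  # energy transfer per channel
    counts: List[float]  # counts per channel
    monitor: float       # monitor counts used for normalization


# A reader is any function that turns one raw record into a Spectrum.
Reader = Callable[[Dict], Spectrum]


def read_with_stored_energies(raw: Dict) -> Spectrum:
    # Hypothetical instrument A: the energy scale is part of the raw data.
    return Spectrum(list(raw["energy"]), list(raw["counts"]), raw["monitor"])


def read_with_constructed_energies(raw: Dict) -> Spectrum:
    # Hypothetical instrument B: only hardware parameters are stored.
    # Placeholder calibration: equally spaced channels from e_min to e_max
    # (requires at least two channels).
    n = len(raw["counts"])
    step = (raw["e_max"] - raw["e_min"]) / (n - 1)
    energy = [raw["e_min"] + i * step for i in range(n)]
    return Spectrum(energy, list(raw["counts"]), raw["monitor"])


def reduce_spectrum(raw: Dict, reader: Reader) -> Spectrum:
    """Generic part: identical for all instruments."""
    s = reader(raw)
    normalized = [c / s.monitor for c in s.counts]
    return Spectrum(s.energy, normalized, 1.0)


raw_a = {"energy": [-1.0, 0.0, 1.0], "counts": [2, 8, 4], "monitor": 2.0}
raw_b = {"e_min": -1.0, "e_max": 1.0, "counts": [2, 8, 4], "monitor": 2.0}

result_a = reduce_spectrum(raw_a, read_with_stored_energies)
result_b = reduce_spectrum(raw_b, read_with_constructed_energies)

# Different raw contents, same result once each reader has done its job.
print(result_a == result_b)
```

In such a design the raw file layout never appears in the generic code;
only the reader, written by the instrument scientist, needs to know it.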


