[Nexus] NeXus - a solution to what is not the real problem ?

Pete Jemian prjemian at gmail.com
Tue Mar 9 14:39:12 GMT 2010


Joachim:

You've almost got hold of the point; just a couple more steps (IMHO).
So many scientists (experimentalists) want the raw data carefully
retained that they become uncomfortable when it is not.  That must
have been an early driving force for NeXus.  Practical experience shows
that the real common denominator for data analysis is data that has
been reduced to some common form (common, as decided by the science
underlying that data analysis).  So, when you introduce yet another
data format (with good aims, for sure), does it progress toward the
goal of being adopted by more than one facility?  Be careful there.
There is no such thing as temporary software.  Recently, there was a
workshop at the ESRF to discuss the suitability of HDF as a common
underlying file format for multispectral data.  At this workshop, some
raised points similar to yours.  I'm sure you could get a copy of the
workshop summary from V. Armando Solé <sole at esrf.fr> (it does not
appear to be easily found by Google today).

     http://www.esrf.eu/events/conferences/hdf5/workshop-agenda

Recently, NeXus has begun to broaden its view of data from raw data to
the description of processed data, such as the reduced data for a
specific technique.  One of the barriers has been documenting what
should be in such files.  The NIAC is just about ready to introduce the
NeXus Definition Language (NXDL), engineered by instrument scientists,
to document what is needed for a specific technique such as powder
diffraction or SAS.  Yours truly has been working on the manual to help
those new to NeXus learn how to use this resource.  Using an NXDL
specification is not a requirement when writing data, but it can help
to codify what an analysis program or scientific technique might
require for processing.  Here are links to the draft manual in PDF and
HTML forms:

     http://download.nexusformat.org/doc/NeXusManual.pdf
     http://download.nexusformat.org/doc/NeXusManual.html
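
The core idea of a technique definition is a checkable list of what a
file must contain for a given kind of analysis.  As a toy illustration
only: the sketch below is Python, not NXDL (real NXDL definitions are
XML documents that also describe types, units and optional groups).  It
assumes a file modelled as nested dictionaries, and the required paths
are a hypothetical choice for a reduced 1-D SAS entry.

```python
from typing import Any, Dict, List

# Hypothetical required paths for a reduced 1-D SAS entry (illustration
# only; not taken from any actual NXDL definition).
REQUIRED_PATHS = ["entry/title", "entry/data/Q", "entry/data/I"]


def lookup(tree: Dict[str, Any], path: str) -> bool:
    """Return True if the slash-separated path exists in the nested dict."""
    node: Any = tree
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def missing_fields(tree: Dict[str, Any], required: List[str]) -> List[str]:
    """List the required paths that the file-like tree does not provide."""
    return [p for p in required if not lookup(tree, p)]


# A hypothetical reduced data set, modelled as nested dictionaries rather
# than an actual HDF5 or XML file.
example = {
    "entry": {
        "title": "sample A",
        "data": {"Q": [0.01, 0.02, 0.03], "I": [120.0, 85.5, 60.2]},
    }
}

print(missing_fields(example, REQUIRED_PATHS))  # prints []
print(missing_fields({"entry": {"data": {"Q": []}}}, REQUIRED_PATHS))
# prints ['entry/title', 'entry/data/I']
```

A writer that follows the definition produces files the checker accepts;
an analysis program can run the same check before trusting a file.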

Another response by the NIAC to the community's expressed desire for
human-readable data is XML as an alternative (to HDF) for the underlying
file format.  The current NeXus API now supports writing and reading a
"NeXus" file in XML.  For some, this is great news, since data sets such
as 1-D SAS are easily expressed by a few columns of numbers and rarely
exceed a few hundred, let alone a few thousand, rows.  Other techniques,
such as 2-D SAS or, at the extreme, tomography and protein
crystallography, cannot tolerate the performance penalties of being
written to a text (ASCII or UTF-8) file such as XML.  For these, HDF is
the best choice for many.
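
The size side of that trade-off is easy to see with a toy comparison:
the same columns written as decimal text inside XML versus packed 8-byte
floats.  This is a Python sketch with made-up data and element names
chosen for illustration; it is not the XML layout the NeXus API writes,
and it measures only storage size, not parse or write time.

```python
import array
import random
import xml.etree.ElementTree as ET


def make_sas_columns(n_rows, seed=0):
    """Made-up 1-D SAS data set: Q, I and Idev columns of n_rows points."""
    rng = random.Random(seed)
    q = [0.001 + 0.0005 * k + 1e-6 * rng.random() for k in range(n_rows)]
    i = [1000.0 * rng.random() for _ in range(n_rows)]
    idev = [0.05 * v for v in i]
    return {"Q": q, "I": i, "Idev": idev}


def to_xml_bytes(columns):
    """Store each column as whitespace-separated decimal text in an XML element."""
    root = ET.Element("entry")
    data = ET.SubElement(root, "data")
    for name, values in columns.items():
        ET.SubElement(data, name).text = " ".join(repr(v) for v in values)
    return ET.tostring(root, encoding="utf-8")


def to_binary_bytes(columns):
    """Store the same values as packed C doubles, one block per column."""
    return b"".join(array.array("d", v).tobytes() for v in columns.values())


def read_xml_column(xml_bytes, name):
    """Parse one column back; repr() round-trips, so the values are exact."""
    root = ET.fromstring(xml_bytes)
    return [float(tok) for tok in root.find("data/" + name).text.split()]


cols = make_sas_columns(1000)
xml_bytes = to_xml_bytes(cols)
bin_bytes = to_binary_bytes(cols)
print(len(xml_bytes), len(bin_bytes))  # the text form is the larger one
```

Both forms are lossless here; the text form simply costs more than twice
the bytes per value, which matters little for a few thousand rows and a
great deal for 2-D detector images or tomography stacks.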

So, in summary:
  * The NIAC has been listening and is trying to meet the community needs.
  * I believe you are describing the need to communicate not raw data but
     processed data as input for common analysis routines.
  * Communities using techniques other than yours have also expressed this need.
  * NeXus is capable of handling these needs.
  * NeXus is a 100% volunteer effort and is always looking for more helpers.
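
The second point, and the proposal in the memo quoted below, amount to
an architecture in which instrument-specific code turns raw data into a
common processed form that shared analysis routines consume.  A minimal
Python sketch under that assumption follows; every class and function
name is hypothetical and none of it is NeXus API.  The two readers
mirror the memo's contrast between raw data that already carry energy
transfers and raw data from which the energy scale must be built from
hardware parameters (modelled here, as an assumption, as a linear axis).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ReducedSpectrum:
    """Hypothetical common processed form: S(q, omega) on an explicit grid."""
    q: List[float]        # momentum transfer of each detector group
    omega: List[float]    # energy-transfer axis shared by all groups
    s: List[List[float]]  # s[i][j] is S(q[i], omega[j])


class InstrumentReader(ABC):
    """Instrument-specific plug-in, written by the instrument scientist."""

    @abstractmethod
    def reduce(self, raw: Dict[str, Any]) -> ReducedSpectrum:
        ...


class StoredEnergyReader(InstrumentReader):
    """Raw data already contain energy transfers (calibrated at acquisition)."""

    def reduce(self, raw: Dict[str, Any]) -> ReducedSpectrum:
        return ReducedSpectrum(raw["q"], raw["omega"], raw["counts"])


class ComputedEnergyReader(InstrumentReader):
    """Energy axis built from hardware parameters; linear axis is an assumption."""

    def reduce(self, raw: Dict[str, Any]) -> ReducedSpectrum:
        n = len(raw["counts"][0])
        omega = [raw["omega_min"] + k * raw["omega_step"] for k in range(n)]
        return ReducedSpectrum(raw["q"], omega, raw["counts"])


def integrated_intensity(spec: ReducedSpectrum) -> List[float]:
    """Generic analysis step: trapezoidal integral over omega for each q."""
    result = []
    for row in spec.s:
        total = 0.0
        for j in range(1, len(spec.omega)):
            width = spec.omega[j] - spec.omega[j - 1]
            total += 0.5 * (row[j] + row[j - 1]) * width
        result.append(total)
    return result


a = StoredEnergyReader().reduce(
    {"q": [0.5], "omega": [-1.0, 0.0, 1.0], "counts": [[1.0, 2.0, 1.0]]})
b = ComputedEnergyReader().reduce(
    {"q": [0.5], "omega_min": -1.0, "omega_step": 1.0,
     "counts": [[1.0, 2.0, 1.0]]})
print(integrated_intensity(a), integrated_intensity(b))  # [3.0] [3.0]
```

Only the `reduce` methods know anything about raw data; the analysis
function sees the same `ReducedSpectrum` regardless of the instrument.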

I welcome your input here.

Regards,
    Pete



On 3/9/10 7:37 AM, Wuttke, Joachim wrote:
> Dear colleagues,
>
> I am currently preparing a deliberately provocative memo with
> working title »Why don't we have better data processing software
> for quasielastic neutron scattering ?«. One section in this paper
> will deal with data storage, and in its present form, it is quite an
> attack on NeXus. To play fair, I post it here, looking forward to
> your comments. Maybe you will convince me that I am mistaken.
>
> Looking forward to a sound discussion - Joachim
>
>
> Though all raw data produced by QENS instruments have basically the
> same structure, many different storage formats are in use.
> Therefore, porting data processing software from one instrument
> to another is generally not possible without
> adapting at least a read-in routine or providing a raw-data conversion tool.
> This is a severe nuisance for users,
> and an obstacle for code sharing and collaborative software development.
> For these reasons,
> it is a popular idea that efforts to improve the software environment
> should start with the adoption of a \textsl{common raw data format} ---
> I shall call this strategy \textsl{data format first}.
>
> The common raw data format of our time will be NeXus, if any.
> Under development for more than 15 years,
> NeXus~\cite{qda3} addresses neutron as well as X-ray scattering.
> It enjoys strong political backing,
> as evidenced by an International Advisory Committee
> with delegates from all major facilities.
> A growing number of new spectrometers actually use NeXus,
> be it by choice or forced by site policy;
> on the other hand, so far only a few existing instruments have migrated.
>
> When writing the instrument software for SPHERES,
> I consciously opted against NeXus,
> in favor of a less rigid self-defined format
> that is easier to read by a human,
> thereby facilitating the debugging of data acquisition and
> raw data processing software.
> Maybe my wishes could have been accommodated within NeXus,
> had I communicated more intensely with the project team.
> However, I have more fundamental objections ---
> not against NeXus itself,
> but against unrealistic promises,
> against overestimating data formats,
> against the flawed strategy \textsl{data format first}.
>
> Unifying data formats reminds me of church history:
> attempts to (re)unify $n$ different denominations regularly
> result in $n+1$ denominations being around:
> the new, unified church, plus all the groups that split off
> to preserve the good old faith of their own.
> When migrating an existing spectrometer towards NeXus,
> the instrument scientist must either support, for a long time,
> read-in routines for both the old and the new data format,
> or provide routines that achieve lossless conversion from the old
> into the new format.
> Choosing NeXus as raw data format is not sufficient to guarantee
> that data from different instruments can be read by the same software.
> For instance, at SPHERES,
> energy calibration is done at acquisition time,
> and energy transfers $\hbar\omega$ are part of the raw data set.
> At the ILL backscattering spectrometers,
> only a few hardware parameters are stored from which
> the downstream software must construct the energy scale.
> Translating the current output format into something looking like NeXus
> would not make the raw data files mutually legible.
> Unifying raw data formats is not possible without unifying
> data acquisition programs ---
> which will rarely be feasible
> because in most cases the hardware is too different.
>
> Some time ago,
> NeXus may have been attractive for developers
> because its rich application programming interface (API)
> relieved them from implementing write-out and read-in routines.
> However, this advantage has vanished because
> modern generic data formats like YAML \cite{qda5}
> allow one to store and retrieve
> complex data, composed of scalars, hashes, arrays
> in arbitrary tree-like structures,
> at zero cost through a much simpler API.
>
> Most fundamentally,
> I think that efforts to unify the raw data format
> are addressing the wrong interface:
> most users do not want to see raw data at all.
> What users want is a calibrated, normalized, reasonably binned
> scattering law $S(q,\omega)$.
> What should be standardized is the procedure to obtain such $S(q,\omega)$.
> While most of this procedure can be implemented in quite a generic way,
> it will remain the instrument scientist's responsibility
> to plug in a low-level routine that reads in and calibrates the
> raw data from his instrument.
> Only he has the technical knowledge required to do it correctly,
> and hardly anybody else needs to care about the raw data and their format.
>
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> _______________________________________________
> NeXus mailing list
> NeXus at nexusformat.org
> http://lists.nexusformat.org/mailman/listinfo/nexus
>    

-- 


----------------------------------------------------------
  Pete R. Jemian, Ph.D.<jemian at anl.gov>
  Beam line Controls and Data Acquisition, Group Leader
  Advanced Photon Source,   Argonne National Laboratory
  Argonne, IL  60439                   630 - 252 - 3189
-----------------------------------------------------------
     Education is the one thing for which people
        are willing to pay yet not receive.
-----------------------------------------------------------


