[Nexus] HDF5 as NeXus file format

Osborn, Raymond rosborn at anl.gov
Thu Apr 7 20:42:50 BST 2022


Hi everyone,
NeXus was originally both a standard for defining how data should be organized in files and a unified API for reading and writing those files, irrespective of the physical file format, HDF4, HDF5, or XML. Thanks mainly to the work that Mark Koennecke put into the C-API, it achieved both goals seamlessly. I think it was a remarkable technical achievement. To some extent, the two components were independent of each other, but not entirely. For example, HDF4 did not allow group attributes, so for a long time, they were not part of the definitions. Now that HDF4 is obsolete, the NeXus definitions have evolved to use more and more of the features that HDF5 provides, many of which have become an essential part of the standard.

I think there are therefore two questions that need to be clarified, preferably before we next meet. The first is whether the proposed alternative physical file formats will allow the same type of hierarchical data organization and features that have now become part of the standard. If they didn’t, this would probably be a deal-breaker, unless there is a way of interfacing, say, regular HDF5 NeXus files to the new file formats, in much the same way that HDF5 handles external links.

The second is whether there is a need to revive the API to handle the new formats, so that we are not faced with the situation that Tobias warned about, with a large fraction of the user community unable to read such files. The reason the API became deprecated is, in my view, that we have depended too much on one or two people (i.e., Mark in the case of the C-API) to take all the responsibility for maintaining it, so that supporting new HDF5 features, such as virtual datasets or even variable-length strings, fell entirely on them. While an increasing number of facilities have become dependent on NeXus, very few of them, if any, have committed resources to maintaining it, so it has become an activity that many of us squeeze in when we have free time from our other responsibilities. If facilities like Alba, and presumably others, feel that they are being held back by technical limitations of HDF5, then it might be necessary to rethink the NeXus support model, so that API development is revived by being integrated into other facility software development projects, such as Mantid or DIALS. It would probably save the facilities money in the long term, but it would require NeXus to be less of a part-time activity. Or we need to encourage more people to contribute, in the way the more successful open-source projects are run. I would certainly welcome additional support for the Python API for the same reasons.

With best regards,
Ray
--
Ray Osborn, Senior Scientist
Materials Science Division
Argonne National Laboratory
Argonne, IL 60439, USA
Phone: +1 (630) 252-9011
Email: ROsborn at anl.gov


On Apr 7, 2022, at 12:16 PM, Watts Benjamin (PSI) via NeXus <nexus at shadow.nd.rl.ac.uk> wrote:

Hi Tobias,
   I agree that we have spent the past decade recommending HDF5, but at the same time we have been saying that NeXus is about the organisation of data within the container, not the container file format itself. I can see that you feel surprised by the sudden discussion of other file formats, just as I was surprised to hear the community talking about it at the recent NFDI workshop. I am only advocating for discussion and I would like to hear your input. I agree that multiple backends will introduce higher support requirements that NeXus doesn't currently have resources for. Your preference for NeXus referring to a (small) defined set of file formats is a valid viewpoint and I agree that there are practicalities to consider. I view the next telco as just the restart of an old discussion, and nothing can really change until a proposal is brought to a NIAC meeting (and accepted).

Cheers,
Ben
________________________________
From: Tobias Richter <Tobias.Richter at ess.eu>
Sent: Thursday, 7 April 2022 6:23:43 PM
To: Discussion forum for the NeXus data format; nexus at nexusformat.org
Cc: Watts Benjamin (PSI); Alexander Debus; Nicolas Soler; Franz Pöschel; Emilio Centeno Ortiz
Subject: Re: [Nexus] HDF5 as NeXus file format

Hi all,

According to this discussion from a decade ago, HDF5 is singled out as the only preferred physical file format: https://www.nexusformat.org/NIAC2012


  *   NeXus guiding statements:
     *   The main focus of the NeXus community is to further develop the dictionaries, base classes and application definitions.
     *   The NIAC is a forum for resolving issues.
     *   The NIAC acts as a custodian for NeXus: definitions, examples, documentation, reference implementations.
     *   NeXus can be mapped to different physical file formats:
        *   HDF5 is the preferred physical file format.
        *   NeXus-XML is the currently supported ASCII file format.


Technically you can map into different backends. Yes. XML is still sort of supported. At the time it was requested that other options would get the official blessing (YAML being specifically asked for) but for practical exchange between facilities the consensus was to stick with HDF5.

There is no longer a recommended or fully maintained abstraction layer (like NAPI) to do an on-the-fly translation between backends. Who would be in charge of defining what the “official” mapping into SQLite or whatever would look like? How many backends can the community commit to support in the long run?
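
To make the mapping question concrete, here is a minimal sketch of one possible way a NeXus-style hierarchy could be stored in SQLite, using only Python's standard sqlite3 module. It is not an official or proposed mapping. The two-table schema, the "@" prefix for attributes, and the names write_tree and nx_class are all assumptions made up for this illustration, and dataset values are stored as plain text, so it says nothing about slicing or performance.

```python
import sqlite3

# Hypothetical schema: one table for nodes (groups and datasets),
# one for attributes attached to a node. Not an official NeXus mapping.
SCHEMA = """
CREATE TABLE node (
    path  TEXT PRIMARY KEY,  -- e.g. /entry/data/counts
    kind  TEXT NOT NULL,     -- 'group' or 'dataset'
    value TEXT               -- dataset payload (text only in this sketch)
);
CREATE TABLE attribute (
    path  TEXT NOT NULL REFERENCES node(path),
    name  TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (path, name)
);
"""


def write_tree(conn, tree, path=""):
    """Recursively store a nested dict as groups, datasets and attributes."""
    for name, item in tree.items():
        if name.startswith("@"):
            # Assumed convention: keys starting with '@' are attributes
            # of the enclosing group.
            conn.execute(
                "INSERT INTO attribute VALUES (?, ?, ?)",
                (path or "/", name[1:], str(item)),
            )
        elif isinstance(item, dict):
            child = f"{path}/{name}"
            conn.execute("INSERT INTO node VALUES (?, 'group', NULL)", (child,))
            write_tree(conn, item, child)
        else:
            conn.execute(
                "INSERT INTO node VALUES (?, 'dataset', ?)",
                (f"{path}/{name}", str(item)),
            )


def nx_class(conn, path):
    """Return the NX_class attribute of a group, or None if it has none."""
    row = conn.execute(
        "SELECT value FROM attribute WHERE path = ? AND name = 'NX_class'",
        (path,),
    ).fetchone()
    return row[0] if row else None


# A tiny NXentry/NXdata hierarchy, written as a nested dict.
tree = {
    "entry": {
        "@NX_class": "NXentry",
        "data": {
            "@NX_class": "NXdata",
            "@signal": "counts",
            "counts": "1 2 3",
        },
    },
}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO node VALUES ('/', 'group', NULL)")  # root group
write_tree(conn, tree)
print(nx_class(conn, "/entry/data"))
```

Even in this toy case, every choice (how paths are encoded, where group attributes live, how array data is serialised) is arbitrary, so two facilities could easily produce incompatible "NeXus in SQLite" files unless someone defines and maintains the mapping.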

When tools that are currently developed/supported/maintained to “read” NeXus/HDF5 fail to work with what gets handed out, we are in a worse situation than we are now. Note: What facilities do internally for performance optimisations or other reasons could be different, if it stays internal. But I am clearly a lot less open minded about producing non-HDF5 files with a “NeXus” label than Ben. Maybe I missed some decisions that were taken lately in this direction or we’re no longer interested in being able to read each other’s files. Should either be the case, I’ll be quiet.

Best wishes,
Tobias


From: NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of "Watts Benjamin (PSI) via NeXus" <nexus at shadow.nd.rl.ac.uk>
Reply to: Discussion forum for the NeXus data format <nexus at shadow.nd.rl.ac.uk>
Date: Thursday, 7 April 2022 at 17:10
To: nexus at nexusformat.org
Cc: Benjamin Watts <benjamin.watts at psi.ch>, Alexander Debus <a.debus at hzdr.de>, Nicolas Soler <nsoler at cells.es>, Franz Pöschel <f.poeschel at hzdr.de>, Emilio Centeno Ortiz <ecenteno at cells.es>
Subject: Re: [Nexus] HDF5 as NeXus file format

Hi Gabriel,
   NeXus is officially not dependent on HDF5 and we are definitely open minded about implementing the NeXus data format on file formats other than HDF5. We plan to discuss such issues at our next teleconference<https://www.nexusformat.org/Telco_20220426.html> on April 26th and I invite you to join us. Are there specific container file formats that you are interested in?

Cheers,
Ben

________________________________
From: NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of Gabriel Jover Manas via NeXus <nexus at shadow.nd.rl.ac.uk>
Sent: Thursday, 7 April 2022 4:47 PM
To: nexus at nexusformat.org
Cc: Gabriel Jover Manas; Nicolas Soler; Emilio Centeno Ortiz
Subject: [Nexus] HDF5 as NeXus file format

Dear NeXus Users Community,
The last NFDI NeXus Workshop was a great opportunity to meet the community and learn from the experience of other scientists and institutions.
Here at ALBA we are working on the integration of NeXus files in our data analysis workflows.
In this context, we are interested in investigating alternatives to HDF5 as the NeXus file format, in terms of slice read/write performance, read-write-many capabilities and convenient reading.
Would the community be open to decoupling the data format (NeXus) from the file format (HDF5)?
Is there already any effort in the community in this direction?
Is anyone else also interested?

Best regards,
Gabriel

--
Gabriel Jover-Mañas
Scientific Data Management
Computing Division

ALBA SYNCHROTRON LIGHT SOURCE
Carrer de la Llum 2-26 | 08290 | Cerdanyola del Vallès | Barcelona | Spain <http://www.albasynchrotron.es/en/about/coming-to-alba>
(+34) 93 592 4471
www.albasynchrotron.es | Gabriel.Jover at cells.es | legal notice <https://www.albasynchrotron.es/en/about/legal-notice>

