[Nexus] HDF5 as NeXus file format

Andrew GOETZ andy.gotz at esrf.fr
Tue Apr 12 12:43:39 BST 2022


Hi All,


I have followed this discussion with interest and agree with all the 
replies urging caution about adding more file formats to those 
officially supported by NeXus. It would be a classic case of throwing 
the baby out with the bathwater! In this case HDF5 is the baby and the 
bathwater is the infrastructure (file system, network, etc.). 
NeXus/HDF5 can be seen as the reference implementation, i.e. the 
official (validated) implementation of NeXus. Other implementations 
(e.g. based on Zarr) are alternatives which could be shared as 
contributions derived from the reference implementation.


Gabriel, I understand the need for very high performance or for 
different file formats, but these should be supported locally and 
mapped to NeXus definitions, if necessary, as a local exercise. For 
example, at the ESRF we have done this for the ICAT database by mapping 
NeXus definitions to ICAT parameters. We also support other file 
formats when we have to, e.g. JPEG 2000. The community is there to 
share such developments. I would not, however, burden the NeXus 
committee with having to officially support other file formats, and I 
would definitely not develop our own API, which is challenging in its 
own right and also a long-term maintenance issue. The current version 
of NeXus relies strongly on HDF5, so moving to another format would be 
difficult. It would also confuse users who are in the process of 
moving to NeXus/HDF5 if we now told them to adopt another file format; 
it is challenging enough for them to change their codes to read 
NeXus/HDF5. Most of our problems can still be solved with HDF5, so the 
best approach is to ask others whether they have already solved the 
same problem, and how.
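
A minimal sketch of such a local mapping, in Python, might look like 
the following. It is not the actual ESRF/ICAT mapping: the NeXus paths, 
the parameter names and the helper functions (flatten, map_to_local) 
are all hypothetical, and a plain nested dict stands in for data read 
from whatever local file format is in use.

```python
# Hypothetical table: NeXus-style path -> local catalogue parameter name.
# These names are illustrative only and are not taken from any real mapping.
NEXUS_TO_LOCAL = {
    "/entry/title": "scan_title",
    "/entry/instrument/monochromator/energy": "mono_energy",
    "/entry/sample/name": "sample_name",
}


def flatten(tree, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for name, node in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(node, dict):
            # A dict plays the role of a group: descend into it.
            yield from flatten(node, path)
        else:
            # Anything else plays the role of a field (dataset).
            yield path, node


def map_to_local(tree, mapping=NEXUS_TO_LOCAL):
    """Return {local_name: value} for the paths listed in the mapping."""
    return {mapping[p]: v for p, v in flatten(tree) if p in mapping}


# Stand-in for a hierarchy loaded from a local, non-HDF5 file.
example = {
    "entry": {
        "title": "test scan",
        "instrument": {"monochromator": {"energy": 12.4}},
        "sample": {"name": "LaB6", "temperature": 295.0},
    }
}

params = map_to_local(example)
```

Fields without an entry in the table (here the sample temperature) are 
simply skipped, so the table itself documents what the local mapping 
covers.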


I recommend that you (and all European HDF5 users) attend the next HDF 
User Group workshop in Europe 
(https://www.hdfgroup.org/hug/europeanhug22/), which will take place 
in the south of France, not far from ALBA. This is an opportunity to 
discuss your problems, and how to solve them, with the HDF5 developers 
and other users.


Kind regards


Andy


On 12/04/2022 12:50, Watts Benjamin (PSI) via NeXus wrote:
>
> Hi Everyone,
>
>    My understanding is that the core NeXus mission has only concerned 
> the organisation of the data. The container file format was only 
> within our scope for the practical reasons of demonstrating how to 
> implement the NeXus structures. We recommended HDF5 because we did not 
> see viable alternatives for a long time.
>
>
> I think that there are both practical and ideological aspects to the 
> discussion about file formats for NeXus and it is a bit confronting 
> for some of us because we see conflict between these aspects. On the 
> one hand, we should encourage people using file formats other than 
> HDF5 and XML to organise their data using NeXus. On the other hand, we 
> don't want to encourage a "wild west" in file formats where reading 
> archived files becomes a hunt for obscure libraries. On this point - 
> what is the role that NeXus should play? Certainly, NeXus should only 
> promote file formats that are capable of expressing the NeXus 
> hierarchies. NeXus should also document how the data hierarchies are 
> expressed in each file format (I think we can do this in a more formal 
> way than we currently do). If NeXus is to discriminate between file 
> formats at all, I would like us to come up with a set of criteria 
> for file formats to meet in order to gain NeXus approval. 
> NeXus does not currently have the resources to support further file 
> formats in the NeXus libraries (NAPI) and perhaps community support 
> could be one of the criteria for approval.
>
>
> Cheers,
>
> Ben
>
>
> ------------------------------------------------------------------------
> *From:* Osborn, Raymond <rosborn at anl.gov>
> *Sent:* Thursday, 7 April 2022 9:42:50 PM
> *To:* Discussion forum for the NeXus data format
> *Cc:* Tobias Richter; nexus at nexusformat.org; Watts Benjamin (PSI); 
> Alexander Debus; Nicolas Soler; Franz Pöschel; Emilio Centeno Ortiz
> *Subject:* Re: [Nexus] HDF5 as NeXus file format
> Hi everyone,
> NeXus was originally both a standard for defining how data should be 
> organized in files and a unified API for reading and writing those 
> files, irrespective of the physical file format, HDF4, HDF5, or XML. 
> Thanks mainly to the work that Mark Koennecke put into the C-API, it 
> achieved both goals seamlessly. I think it was a remarkable technical 
> achievement. To some extent, the two components were independent of 
> each other, but not entirely. For example, HDF4 did not allow group 
> attributes, so for a long time, they were not part of the definitions. 
> Now that HDF4 is obsolete, the NeXus definitions have evolved to use 
> more and more of the features that HDF5 provides, many of which have 
> become an essential part of the standard.
>
> I think there are therefore two questions that need to be clarified, 
> preferably before we next meet. The first is whether the proposed 
> alternative physical file formats will allow the same type of 
> hierarchical data organization and features that have now become part 
> of the standard. If they didn’t, this would probably be a 
> deal-breaker, unless there is a way of interfacing, say, regular HDF5 
> NeXus files to the new file formats, in much the same way that HDF5 
> handles external links.
>
> The second is whether there is a need to revive the API to handle the 
> new formats, so that we are not faced with the situation that Tobias 
> warned about with a large fraction of the user community unable to 
> read such files. The reason the API became deprecated is, in my view, 
> that we have depended too much on one or two people (i.e., Mark in the 
> case of the C-API) to take all the responsibility for maintaining it, 
> so that supporting new HDF5 features, such as virtual data sets or 
> even variable-length strings, fell entirely on them. While an 
> increasing number of facilities have become dependent on NeXus, very 
> few of them, if any, have committed resources to maintaining it, so it 
> has become an activity that many of us squeeze in when we have free 
> time from our other responsibilities. If facilities like Alba, and 
> presumably others, feel that they are being held back by technical 
> limitations of HDF5, then it might be necessary to rethink the NeXus 
> support model, so that API development is revived by being integrated 
> into other facility software development projects, such as Mantid or 
> Dials. It would probably save the facilities money in the long term, 
> but it would require NeXus to be less of a part-time activity. Or we 
> need to encourage more people to contribute, in the way the more 
> successful open-source projects are run. I would certainly welcome 
> additional support for the Python API for the same reasons.
>
> With best regards,
> Ray
> -- 
> Ray Osborn, Senior Scientist
> Materials Science Division
> Argonne National Laboratory
> Argonne, IL 60439, USA
> Phone: +1 (630) 252-9011
> Email: ROsborn at anl.gov <mailto:ROsborn at anl.gov>
>
>
>> On Apr 7, 2022, at 12:16 PM, Watts Benjamin (PSI) via NeXus 
>> <nexus at shadow.nd.rl.ac.uk> wrote:
>>
>> Hi Tobias,
>>    I agree that we have spent the past decade recommending HDF5, but 
>> at the same time we have been saying that NeXus is about the 
>> organisation of data within the container, not the container file 
>> format itself. I can see that you feel surprised by the sudden 
>> discussion of other file formats, just as I was surprised to hear the 
>> community talking about it at the recent NFDI workshop. I am only 
>> advocating for discussion and I would like to hear your input. I agree 
>> that multiple backends will introduce higher support requirements 
>> that NeXus doesn't currently have resources for. Your preference for 
>> NeXus referring to a (small) defined set of file formats is a valid 
>> view point and I agree that there are practicalities to consider. I 
>> view the next telco as just the restart of an old discussion and 
>> nothing can really change until a proposal is brought to a NIAC 
>> meeting (and accepted).
>>
>> Cheers,
>> Ben
>> ------------------------------------------------------------------------
>> *From:* Tobias Richter <Tobias.Richter at ess.eu>
>> *Sent:* Thursday, 7 April 2022 6:23:43 PM
>> *To:* Discussion forum for the NeXus data format; nexus at nexusformat.org
>> *Cc:* Watts Benjamin (PSI); Alexander Debus; Nicolas Soler; Franz 
>> Pöschel; Emilio Centeno Ortiz
>> *Subject:* Re: [Nexus] HDF5 as NeXus file format
>> Hi all,
>> According to this discussion from a decade ago HDF5 is singled out as 
>> the only preferred physical file 
>> format: https://www.nexusformat.org/NIAC2012
>>
>>   * NeXus guiding statements:
>>       o The main focus of the NeXus community is to further develop
>>         the dictionaries, base classes and application definitions.
>>       o The NIAC is a forum for resolving issues.
>>       o The NIAC acts as a custodian for NeXus: definitions,
>>         examples, documentation, reference implementations.
>>       o NeXus can be mapped to different physical file formats:
>>           + HDF5 is the preferred physical file format.
>>           + NeXus-XML is the currently supported ASCII file format.
>>
>> Technically you can map into different backends. Yes. XML is still 
>> sort of supported. At the time it was requested that other options 
>> would get the official blessing (YAML being specifically asked for) 
>> but for practical exchange between facilities the consensus was to 
>> stick with HDF5.
>> There no longer is a recommended or fully maintained abstraction 
>> layer (like NAPI) to do an on the fly translation between backends. 
>> Who would be in charge of defining what the “official” mapping into 
>> SQLite or whatever would look like? How many backends can the 
>> community commit to support in the long run?
>> When tools that are currently developed/supported/maintained to 
>> “read” NeXus/HDF5 fail to work with what gets handed out, we are in a 
>> worse situation than we are now. Note: What facilities do internally 
>> for performance optimisations or other reasons could be different, if 
>> it stays internal. But I am clearly a lot less open minded about 
>> producing non-HDF5 files with a “NeXus” label than Ben. Maybe I 
>> missed some decisions that were taken lately in this direction or 
>> we’re no longer interested in being able to read each other’s files. 
>> Should either be the case, I’ll be quiet.
>> Best wishes,
>> Tobias
>> *From:* NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of "Watts 
>> Benjamin (PSI) via NeXus" <nexus at shadow.nd.rl.ac.uk>
>> *Reply to:* Discussion forum for the NeXus data format 
>> <nexus at shadow.nd.rl.ac.uk>
>> *Date:* Thursday, 7 April 2022 at 17:10
>> *To:* "nexus at nexusformat.org" <nexus at nexusformat.org>
>> *Cc:* Benjamin Watts <benjamin.watts at psi.ch>, Alexander Debus 
>> <a.debus at hzdr.de>, Nicolas Soler <nsoler at cells.es>, Franz Pöschel 
>> <f.poeschel at hzdr.de>, Emilio Centeno Ortiz <ecenteno at cells.es>
>> *Subject:* Re: [Nexus] HDF5 as NeXus file format
>> Hi Gabriel,
>>    NeXus is officially not dependent on HDF5 and we are definitely 
>> open minded about implementing the NeXus /data format/ on /file 
>> formats/ other than HDF5. We plan to discuss such issues at our next 
>> teleconference <https://www.nexusformat.org/Telco_20220426.html> on 
>> April 26th and I invite you to join us. Are there specific container 
>> file formats that you are interested in?
>> Cheers,
>> Ben
>>
>> ------------------------------------------------------------------------
>> *From:* NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of Gabriel 
>> Jover Manas via NeXus <nexus at shadow.nd.rl.ac.uk>
>> *Sent:* Thursday, 7 April 2022 4:47 PM
>> *To:* nexus at nexusformat.org
>> *Cc:* Gabriel Jover Manas; Nicolas Soler; Emilio Centeno Ortiz
>> *Subject:* [Nexus] HDF5 as NeXus file format
>> Dear NeXus Users Community,
>> The last NFDI NeXus Workshop was a great opportunity to meet the 
>> community and learn from the experience of other scientists and 
>> institutions.
>> Here at ALBA we are working on the integration of NeXus files in our 
>> data analysis workflows.
>> In this context we are interested in investigating alternatives to 
>> HDF5 as the NeXus file format, in terms of slice read/write 
>> performance, read-write-many capabilities and convenient reading.
>> Would the community be open to decoupling the data format (NeXus) 
>> from the file format (HDF5)?
>> Is there already any effort in the community in this direction?
>> Is anyone else also interested?
>> Best regards,
>> Gabriel
>> --
>> *Gabriel Jover-Mañas*
>> Scientific Data Management
>> Computing Division
>> *ALBA SYNCHROTRON LIGHT SOURCE*
>> Carrer de la Llum 2-26 | 08290 | Cerdanyola del Vallès | Barcelona | 
>> Spain <http://www.albasynchrotron.es/en/about/coming-to-alba>
>> (+34) 93 592 4471
>> www.albasynchrotron.es <http://www.albasynchrotron.es/> | 
>> Gabriel.Jover at cells.es | legal notice 
>> <https://www.albasynchrotron.es/en/about/legal-notice>
>>
>> _______________________________________________
>> NeXus mailing list
>> NeXus at nexusformat.org
>> https://lists.nexusformat.org/mailman/listinfo/nexus
>
>