[Nexus] HDF5 as NeXus file format

Gabriel Jover Manas gjover at cells.es
Tue Apr 12 10:51:24 BST 2022


Hi everyone,

As Benjamin commented, at the recent NFDI workshop I heard about 
alternative file formats that offer some features hdf5 may lack, such as 
slice read/write performance, read-write-many capabilities and convenient 
reading.

We don't have a lot of experience with hdf5 and it might be that some of 
these features can be covered with an hdf5 virtual layer or other means.

I have the impression that the NeXus file format will become the standard 
in the PaNOSC/ExPaNDS European projects, and we are working on the 
integration of NeXus files in our data analysis workflows.

Our question to the community is precisely about your experience with 
hdf5 and its limitations, and whether there is interest in a joint effort 
to investigate how hdf5 compares with other file formats such as Zarr, 
ADIOS or even Exdir.

Thank you for the invitation to the next meeting on April 26th. I would 
have joined, but I have another appointment at that time. Anyhow, someone 
else from ALBA will join.

Kind regards,

Gabriel



On 7/4/22 21:42, Osborn, Raymond via NeXus wrote:
> Hi everyone,
> NeXus was originally both a standard for defining how data should be 
> organized in files and a unified API for reading and writing those 
> files, irrespective of the physical file format, HDF4, HDF5, or XML. 
> Thanks mainly to the work that Mark Koennecke put into the C-API, it 
> achieved both goals seamlessly. I think it was a remarkable technical 
> achievement. To some extent, the two components were independent of 
> each other, but not entirely. For example, HDF4 did not allow group 
> attributes, so for a long time, they were not part of the definitions. 
> Now that HDF4 is obsolete, the NeXus definitions have evolved to use 
> more and more of the features that HDF5 provides, many of which have 
> become an essential part of the standard.
>
> I think there are therefore two questions that need to be clarified, 
> preferably before we next meet. The first is whether the proposed 
> alternative physical file formats will allow the same type of 
> hierarchical data organization and features that have now become part 
> of the standard. If they didn’t, this would probably be a 
> deal-breaker, unless there is a way of interfacing, say, regular HDF5 
> NeXus files to the new file formats, in much the same way that HDF5 
> handles external links.
>
> The second is whether there is a need to revive the API to handle the 
> new formats, so that we are not faced with the situation that Tobias 
> warned about with a large fraction of the user community unable to 
> read such files. The reason the API became deprecated is, in my view, 
> that we have depended too much on one or two people (i.e., Mark in the 
> case of the C-API) to take all the responsibility for maintaining it, 
> so that supporting new HDF5 features, such as virtual data sets or 
> even variable-length-strings, fell entirely on them. While an 
> increasing number of facilities have become dependent on NeXus, very 
> few of them, if any, have committed resources to maintaining it, so it 
> has become an activity that many of us squeeze in when we have free 
> time from our other responsibilities. If facilities like Alba, and 
> presumably others, feel that they are being held back by technical 
> limitations of HDF5, then it might be necessary to rethink the NeXus 
> support model, so that API development is revived by being integrated 
> into other facility software development projects, such as Mantid or 
> Dials. It would probably save the facilities money in the long term, 
> but it would require NeXus to be less of a part-time activity. Or we 
> need to encourage more people to contribute, in the way the more 
> successful open-source projects are run. I would certainly welcome 
> additional support for the Python API for the same reasons.
>
> With best regards,
> Ray
> -- 
> Ray Osborn, Senior Scientist
> Materials Science Division
> Argonne National Laboratory
> Argonne, IL 60439, USA
> Phone: +1 (630) 252-9011
> Email: ROsborn at anl.gov
>
>
>> On Apr 7, 2022, at 12:16 PM, Watts Benjamin (PSI) via NeXus 
>> <nexus at shadow.nd.rl.ac.uk> wrote:
>>
>> Hi Tobias,
>>    I agree that we have spent the past decade recommending HDF5, but 
>> at the same time we have been saying that NeXus is about the 
>> organisation of data within the container, not the container file 
>> format itself. I can see that you feel surprised by the sudden 
>> discussion of other file formats, just as I was surprised to hear the 
>> community talking about it at the recent NFDI workshop. I am only 
>> advocating for discussion and I would like to hear your input. I agree 
>> that multiple backends will introduce higher support requirements 
>> that NeXus doesn't currently have resources for. Your preference for 
>> NeXus referring to a (small) defined set of file formats is a valid 
>> view point and I agree that there are practicalities to consider. I 
>> view the next telco as just the restart of an old discussion and 
>> nothing can really change until a proposal is brought to a NIAC 
>> meeting (and accepted).
>>
>> Cheers,
>> Ben
>> ------------------------------------------------------------------------
>> From: Tobias Richter <Tobias.Richter at ess.eu>
>> Sent: Thursday, 7 April 2022 6:23:43 PM
>> To: Discussion forum for the NeXus data format; nexus at nexusformat.org
>> Cc: Watts Benjamin (PSI); Alexander Debus; Nicolas Soler; Franz 
>> Pöschel; Emilio Centeno Ortiz
>> Subject: Re: [Nexus] HDF5 as NeXus file format
>> Hi all,
>> According to this discussion from a decade ago, HDF5 is singled out as 
>> the only preferred physical file format: 
>> https://www.nexusformat.org/NIAC2012
>>
>>   * NeXus guiding statements:
>>       o The main focus of the NeXus community is to further develop
>>         the dictionaries, base classes and application definitions.
>>       o The NIAC is a forum for resolving issues.
>>       o The NIAC acts as a custodian for NeXus: definitions,
>>         examples, documentation, reference implementations.
>>       o NeXus can be mapped to different physical file formats:
>>           + HDF5 is the preferred physical file format.
>>           + NeXus-XML is the currently supported ASCII file format.
>>
>> Technically you can map into different backends. Yes. XML is still 
>> sort of supported. At the time it was requested that other options 
>> would get the official blessing (YAML being specifically asked for) 
>> but for practical exchange between facilities the consensus was to 
>> stick with HDF5.
>> There no longer is a recommended or fully maintained abstraction 
>> layer (like NAPI) to do an on the fly translation between backends. 
>> Who would be in charge of defining what the “official” mapping into 
>> SQLite or whatever would look like? How many backends can the 
>> community commit to support in the long run?
>> When tools that are currently developed/supported/maintained to 
>> “read” NeXus/HDF5 fail to work with what gets handed out, we are in a 
>> worse situation than we are now. Note: What facilities do internally 
>> for performance optimisations or other reasons could be different, if 
>> it stays internal. But I am clearly a lot less open minded about 
>> producing non-HDF5 files with a “NeXus” label than Ben. Maybe I 
>> missed some decisions that were taken lately in this direction or 
>> we’re no longer interested in being able to read each other’s files. 
>> Should either be the case, I’ll be quiet.
>> Best wishes,
>> Tobias
>> From: NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of "Watts 
>> Benjamin (PSI) via NeXus" <nexus at shadow.nd.rl.ac.uk>
>> Reply to: Discussion forum for the NeXus data format 
>> <nexus at shadow.nd.rl.ac.uk>
>> Date: Thursday, 7 April 2022 at 17:10
>> To: "nexus at nexusformat.org" <nexus at nexusformat.org>
>> Cc: Benjamin Watts <benjamin.watts at psi.ch>, Alexander Debus 
>> <a.debus at hzdr.de>, Nicolas Soler <nsoler at cells.es>, Franz Pöschel 
>> <f.poeschel at hzdr.de>, Emilio Centeno Ortiz <ecenteno at cells.es>
>> Subject: Re: [Nexus] HDF5 as NeXus file format
>> Hi Gabriel,
>>    NeXus is officially not dependent on HDF5 and we are definitely 
>> open-minded about implementing the NeXus data format on file 
>> formats other than HDF5. We plan to discuss such issues at our next 
>> teleconference <https://www.nexusformat.org/Telco_20220426.html> on 
>> April 26th and I invite you to join us. Are there specific container 
>> file formats that you are interested in?
>> Cheers,
>> Ben
>>
>> ------------------------------------------------------------------------
>> From: NeXus <nexus-bounces at shadow.nd.rl.ac.uk> on behalf of Gabriel 
>> Jover Manas via NeXus <nexus at shadow.nd.rl.ac.uk>
>> Sent: Thursday, 7 April 2022 4:47 PM
>> To: nexus at nexusformat.org
>> Cc: Gabriel Jover Manas; Nicolas Soler; Emilio Centeno Ortiz
>> Subject: [Nexus] HDF5 as NeXus file format
>> Dear NeXus Users Community,
>> The last NFDI NeXus Workshop was a great opportunity to meet the 
>> community and learn from the experience of other scientists and 
>> institutions.
>> Here at ALBA we are working on the integration of NeXus files in our 
>> data analysis workflows.
>> In this context we are interested in investigating alternatives to hdf5 
>> as the NeXus file format, in terms of slice read/write performance, 
>> read-write-many capabilities and convenient reading.
>> Would the community be open to decoupling the data format (NeXus) 
>> from the file format (HDF5)?
>> Is there already any effort in the community in this direction?
>> Is anyone else also interested?
>> Best regards,
>> Gabriel
>> --
>> ALBA Synchrotron <http://www.albasynchrotron.es/>
>>
>> Gabriel Jover-Mañas
>> Scientific Data Management
>> Computing Division
>> ALBA SYNCHROTRON LIGHT SOURCE
>> Carrer de la Llum 2-26 | 08290 | Cerdanyola del Vallès | Barcelona | 
>> Spain <http://www.albasynchrotron.es/en/about/coming-to-alba>
>> (+34) 93 592 4471
>> www.albasynchrotron.es <http://www.albasynchrotron.es/> | 
>> Gabriel.Jover at cells.es | legal notice 
>> <https://www.albasynchrotron.es/en/about/legal-notice>
>> If you have received this e-mail in error, please note that it may 
>> contain confidential and private information, therefore, the use of 
>> this information is strictly forbidden. Please inform the sender of 
>> the error and delete the information received. Thank you.
>>
>> _______________________________________________
>> NeXus mailing list
>> NeXus at nexusformat.org
>> https://lists.nexusformat.org/mailman/listinfo/nexus