[NeXus-committee] Example of links

Raymond Osborn rayosborn at mac.com
Thu Jan 30 21:03:06 GMT 2025


Hi Paul,
Thanks for the follow-up questions. I will try to answer them below.

From: NeXus-committee <nexus-committee-bounces at shadow.nd.rl.ac.uk> on behalf of Paul Millar via NeXus-committee <nexus-committee at shadow.nd.rl.ac.uk>
Date: Thursday, January 30, 2025 at 12:07 PM
To: nexus-committee at nexusformat.org <nexus-committee at nexusformat.org>
Subject: Re: [NeXus-committee] Example of links

> Hi Ray,
>  
> Thanks for sharing these examples, and for talking about the "target" attribute.
>  
> For me, this is very interesting.
>  
> I took the opportunity to read through the description of groups and 
> links in the HDF5 manual.  I've a background in storage and filesystem 
> programming, so the concepts in HDF5 make perfect sense to me: it's 
> (more or less) just the standard POSIX filesystem's namespace.  HDF5 
> even reuses some of the POSIX vocabulary.
>  
> What confuses me is the "target" attribute in NeXus.
>  
> As the NeXus Design page itself describes, hard links (i.e., the same 
> object being linked to under multiple groups) are symmetric. There is no 
> sense of source and destination.  Instead, hard links are simply being 
> able to refer to the same object via two (or more) paths.  Under HDF5, 
> these paths are equivalent: neither path is more important.
>  
> From what I see, the NeXus "target" attribute seeks to break this 
> symmetry.  The "target" attribute's value is the absolute path of one of 
> these paths.  This makes the "target" path a preferred way of referring to 
> the object.
>  
> What I'm missing is why having a preferred path is necessary in NeXus.

If the reason for using links is to save space (e.g., adding the same sample information to multiple entries), then it probably doesn’t matter which is the parent and which the child. The purpose of the link could also be to ensure that, e.g., the sample lattice parameter is updated in every entry when it is changed in one of them. Again, none of the objects is obviously the parent.
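As a rough illustration of this symmetric case, here is a minimal sketch assuming h5py and NumPy with an in-memory file. The entry layout and the `unit_cell` values are hypothetical, not taken from a real NeXus file. A sample group is hard-linked into a second entry, and a change made through one path is seen through the other, because both paths name the same object.

```python
import h5py
import numpy as np

# In-memory HDF5 file; the entry layout and the 'unit_cell' values are illustrative only.
f = h5py.File('sample_links.h5', 'w', driver='core', backing_store=False)
f.create_group('/entry1/sample')
f.create_group('/entry2')
f['/entry1/sample/unit_cell'] = np.array([4.0, 4.0, 4.0, 90.0, 90.0, 90.0])

# Hard-link the whole sample group into the second entry. No 'target' attribute is
# written: for this purpose neither path needs to be treated as the parent.
f['/entry2/sample'] = f['/entry1/sample']

# Change the lattice parameter a through entry2...
f['/entry2/sample/unit_cell'][0] = 4.1
# ...and read it back through entry1: both paths reach the same dataset.
print(f['/entry1/sample/unit_cell'][0])
```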

However, there are important structural reasons for adding links with one of the objects as the parent. The most common use of links is in the NXdata group, where the axes are stored elsewhere. Here’s a shortened version of chopper.nxs, for example. 

>>> print(chopper.tree)
chopper:NXroot
    entry:NXentry
        data:NXdata
            @axes = ['polar_angle', 'time_of_flight']
            @signal = 'data'
            data = int32(148x750)
            polar_angle -> /entry/instrument/detector/polar_angle
            time_of_flight -> /entry/instrument/detector/time_of_flight
        instrument:NXinstrument
            detector:NXdetector
                distance = float32(148)
                polar_angle = float32(148)
                time_of_flight = float32(751)
                type = 'He3 gas cylinder'

Here the main NXdata group plots the data against polar angle and time-of-flight, both of which are properties of the detector and so are stored in ‘/entry/instrument/detector’. If someone plotting the data wants to know about other detector properties, such as the sample-to-detector distance, those are also in the NXdetector group, and the target attribute shows the user where to look. There could be multiple NXdetector groups, but the link's target attribute identifies the right one. Because HDF5 hard links are symmetric, the field in the NXdata group cannot by itself say which NXdetector group it belongs to; the target attribute records that, so it provides important functionality. A data reduction script that converts time-of-flight to energy transfer must know which group holds the relevant distance fields. That is only possible by making the object in the NXdetector group the parent and using the ‘target’ attribute to point to it.
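A minimal sketch of this pattern, assuming h5py and NumPy and an in-memory file. It mirrors the shapes in the chopper.nxs tree, but the numeric values are placeholders and the helpers `make_group`, `link_with_target` and `owning_group` are hypothetical names, not part of any NeXus API. Following the convention described in this thread, the `target` attribute is written on the shared object, so it is visible from both the NXdetector and the NXdata paths; the reduction step then recovers the detector group from the plotted axis.

```python
import posixpath

import h5py
import numpy as np


def make_group(parent, name, nx_class):
    """Create a subgroup and tag it with its NeXus base class."""
    group = parent.create_group(name)
    group.attrs['NX_class'] = nx_class
    return group


def link_with_target(f, source_path, link_path):
    """Hard-link the object at source_path to link_path; record source_path as 'target'.

    The attribute lives on the single shared object, so both paths see it.
    """
    f[link_path] = f[source_path]
    f[source_path].attrs['target'] = source_path


def owning_group(dataset):
    """Return the group containing the path stored in the dataset's 'target' attribute."""
    target = dataset.attrs['target']
    if isinstance(target, bytes):  # tolerate attributes stored as bytes
        target = target.decode()
    return dataset.file[posixpath.dirname(target)]


# In-memory file (core driver, no backing store), so nothing touches the disk.
f = h5py.File('chopper_sketch.nxs', 'w', driver='core', backing_store=False)
entry = make_group(f, 'entry', 'NXentry')
instrument = make_group(entry, 'instrument', 'NXinstrument')
detector = make_group(instrument, 'detector', 'NXdetector')
# Placeholder values with the chopper.nxs shapes; not real data.
detector['distance'] = np.full(148, 6.0, dtype=np.float32)
detector['polar_angle'] = np.linspace(3.0, 135.0, 148, dtype=np.float32)
detector['time_of_flight'] = np.linspace(1000.0, 16000.0, 751, dtype=np.float32)

data = make_group(entry, 'data', 'NXdata')
data.attrs['signal'] = 'data'
data['data'] = np.zeros((148, 750), dtype=np.int32)
for name in ('polar_angle', 'time_of_flight'):
    link_with_target(f, f'/entry/instrument/detector/{name}', f'/entry/data/{name}')

# Starting from the plotted axis, a reduction script can locate the right detector group
# and read the distances it needs for a time-of-flight to energy conversion.
det = owning_group(f['/entry/data/time_of_flight'])
print(det.name, det['distance'].shape)
```

Without the `target` attribute, `owning_group` would have nothing to go on: the two hard links to `time_of_flight` are indistinguishable.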

Ironically, I think this functional purpose is what led the FAIRmat group to propose the ‘target’ attribute, so the original reasoning was sound, if now forgotten.
 
> The NeXus Design page is somewhat coy about saying why a "target" 
> attribute is needed.  There's some vague mention of people getting 
> confused when using a particular tool, but nothing concrete.  If people 
> are confused, isn't this rather a problem with that tool or with how 
> NeXus is organising data?

The importance of links was crystal-clear to the original developers of NeXus twenty years ago for the reasons I described above. I hadn’t realized that this aspect of the standard was no longer understood. I guess we did a bad job of documenting it at the time.

> The page also includes some rather confusing use of terminology. The 
> page seemingly confuses "links" (all objects are accessible through at 
> least one link; if not, they are garbage collected) with "hard linking" 
> (a common term for creating a new reference to some existing objects).

If documentation of NeXus links is intermingled with discussions of garbage collection, then it should be changed. 
>  
> The NeXus Design page also talks about the "original dataset" . This is 
> arguably wrong.  There is no "original dataset" since all hard links 
> refer to the same, single dataset. One might talk about the "original 
> path".  However, given two paths, what is it that makes one path "original"?

This may be clumsy wording, but I think the meaning in the above example is that ‘/entry/instrument/detector/time_of_flight’ is the “original dataset.” It is linked into the NXdata group to make plotting more convenient.
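Read this way, the “original” path is simply the one named by the ‘target’ attribute. A minimal sketch, assuming h5py and NumPy with an in-memory file; `is_target_path` is a hypothetical helper, and the values are placeholders.

```python
import h5py
import numpy as np


def is_target_path(f, path):
    """Hypothetical helper: is `path` the one named by the object's 'target' attribute?

    Objects without a 'target' attribute are treated as living at `path` itself.
    `path` must be an absolute, normalised HDF5 path.
    """
    target = f[path].attrs.get('target', path)
    if isinstance(target, bytes):  # tolerate attributes stored as bytes
        target = target.decode()
    return target == path


f = h5py.File('target_paths.nxs', 'w', driver='core', backing_store=False)
f.create_group('/entry/instrument/detector')
f.create_group('/entry/data')
f['/entry/instrument/detector/time_of_flight'] = np.linspace(1000.0, 16000.0, 751)
# Hard link into NXdata, then mark the detector path as the preferred one.
f['/entry/data/time_of_flight'] = f['/entry/instrument/detector/time_of_flight']
f['/entry/instrument/detector/time_of_flight'].attrs['target'] = (
    '/entry/instrument/detector/time_of_flight')
f['/entry/data/data'] = np.zeros((148, 750), dtype=np.int32)

for path in ('/entry/instrument/detector/time_of_flight', '/entry/data/time_of_flight'):
    print(path, is_target_path(f, path))
```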
>  
> As a counter example using the "Linking in a NeXus file" diagram from 
> the NeXus Design page, with HDF5 semantics I could create the dataset in 
> one group (that happens to be NXdata) and then create a link to that 
> dataset under a different group (which happens to be 
> NXinstrument/NXdetector). In temporal order, the "original dataset" (or 
> original path, if you prefer) would be under the NXdata group, which 
> isn't what is shown on the NeXus Design page and (I suspect) not what is 
> intended.

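The temporal order when writing the file is irrelevant. What matters is the path recorded in the ‘target’ attribute, which in this example points to the NXdetector group.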
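A sketch of that counter-example, assuming h5py and NumPy with an in-memory file and placeholder values: the dataset is created under NXdata first and only afterwards hard-linked into the detector group. The two hard links themselves are indistinguishable, so it is the ‘target’ attribute, set here through the NXdata path since both paths name one object, that marks the detector path as preferred.

```python
import h5py
import numpy as np

DETECTOR_TOF = '/entry/instrument/detector/time_of_flight'
DATA_TOF = '/entry/data/time_of_flight'

# In-memory file; paths mirror the chopper example but the values are placeholders.
f = h5py.File('reverse_order.nxs', 'w', driver='core', backing_store=False)
f.create_group('/entry/data')
f.create_group('/entry/instrument/detector')

# Counter-example order: write the dataset under NXdata first...
f[DATA_TOF] = np.linspace(1000.0, 16000.0, 751)
# ...and only then hard-link it into the NXdetector group.
f[DETECTOR_TOF] = f[DATA_TOF]

# The preferred path is whatever 'target' records, independent of creation order.
f[DATA_TOF].attrs['target'] = DETECTOR_TOF

for path in (DATA_TOF, DETECTOR_TOF):
    print(path, '->', f[path].attrs['target'])
```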

All your complaints about the documentation seem justified, so we should probably revise it, but I still believe the target attribute is valuable.

I hope this helps.

With best regards,
Ray
 -- 
Ray Osborn, Senior Scientist
Materials Science Division
Argonne National Laboratory
Lemont, IL 60439, USA
Phone: +1 (630) 252-9011
Email: ROsborn at anl.gov
