[NeXus-committee] Example of links

Raymond Osborn rayosborn at mac.com
Fri Jan 31 14:43:55 GMT 2025


It is possible to query a soft link in HDF5, but I don’t believe there is any way to find the other paths to a hard-linked object without walking through the entire file and comparing object IDs. And, of course, there is no way of telling which path is the parent.
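
A minimal h5py sketch of that distinction, offered as an illustration rather than a recommended implementation. The helper names (link_kind, hard_link_paths, make_demo_file) and the in-memory demo layout are assumptions made up for this example, loosely modelled on the chopper file quoted below. It relies on h5py's Group.get(name, getlink=True) to inspect a link without following it, and on h5py objects comparing equal when they refer to the same HDF5 object.

```python
import h5py
import numpy as np


def link_kind(group, name):
    """Classify the link `name` in `group` without following it."""
    link = group.get(name, getlink=True)
    if isinstance(link, h5py.SoftLink):
        return "soft", link.path      # a soft link stores its target path
    if isinstance(link, h5py.ExternalLink):
        return "external", link.path
    return "hard", None               # a hard link carries no path at all


def hard_link_paths(group, target, prefix="", seen=None):
    """Walk hard links below `group`, collecting paths that reach `target`.

    Each group is descended into only once, so paths that pass through a
    second hard link to an already visited group are not reported.
    """
    seen = set() if seen is None else seen
    if group in seen:
        return []
    seen.add(group)
    found = []
    for name in group:
        if not isinstance(group.get(name, getlink=True), h5py.HardLink):
            continue                  # ignore soft and external links
        child = group[name]
        path = f"{prefix}/{name}"
        if child == target:           # equal means same underlying object
            found.append(path)
        if isinstance(child, h5py.Group):
            found.extend(hard_link_paths(child, target, path, seen))
    return found


def make_demo_file():
    """Build a small in-memory file (hypothetical layout, nothing on disk)."""
    f = h5py.File("demo.h5", "w", driver="core", backing_store=False)
    det = f.create_group("entry/instrument/detector")
    det.create_dataset("polar_angle", data=np.zeros(148, dtype="float32"))
    det.create_dataset("time_of_flight", data=np.zeros(751, dtype="float32"))
    data = f.create_group("entry/data")
    data["polar_angle"] = det["polar_angle"]          # hard link
    data["tof_link"] = h5py.SoftLink(
        "/entry/instrument/detector/time_of_flight")  # soft link
    return f
```

With the demo file, link_kind reports the stored path for the soft link but nothing for the hard link, and hard_link_paths returns both paths to polar_angle with no indication of which one was intended as the parent.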

Ray

> On Jan 30, 2025, at 3:50 PM, Aaron Brewster <asbrewster at lbl.gov> wrote:
> 
> In h5py, I had thought you could query a group or field to see if it's a soft link and get its original location.  I don't know how to do the same for a hard link but I presume it's possible.  Therefore the target attribute would appear to be redundant.
> 
> However, to me, the most important reason to have @target is to not be tied to HDF5.  It's useful to have it from a specification point of view.
> -Aaron
> 
> On Thu, Jan 30, 2025 at 1:28 PM Raymond Osborn via NeXus-committee <nexus-committee at shadow.nd.rl.ac.uk> wrote:
>> Hi Paul,
>> Thanks for the follow-up questions. I will try to answer them below.
>> 
>> From: NeXus-committee <nexus-committee-bounces at shadow.nd.rl.ac.uk> on behalf of Paul Millar via NeXus-committee <nexus-committee at shadow.nd.rl.ac.uk>
>> Date: Thursday, January 30, 2025 at 12:07 PM
>> To: nexus-committee at nexusformat.org
>> Subject: Re: [NeXus-committee] Example of links
>> 
>>> Hi Ray,
>>>  
>>> Thanks for sharing these examples, and for talking about the "target" attribute.
>>>  
>>> For me, this is very interesting.
>>>  
>>> I took the opportunity to read through the description of groups and 
>>> links in the HDF5 manual.  I've a background in storage and filesystem 
>>> programming, so the concepts in HDF5 make perfect sense to me: it's 
>>> (more or less) just the standard POSIX filesystem's namespace.  HDF5 
>>> even reuses some of the POSIX vocabulary.
>>>  
>>> What confuses me is the "target" attribute in NeXus.
>>>  
>>> As the NeXus Design page itself describes, hard links (i.e., the same 
>>> object being linked to under multiple groups) are symmetric. There is no 
>>> sense of source and destination.  Instead, hard links simply allow the 
>>> same object to be referred to via two (or more) paths.  Under HDF5, 
>>> these paths are equivalent: neither path is more important.
>>>  
>>> From what I see, the NeXus "target" attribute seeks to break this 
>>> symmetry.  The "target" attribute's value is the absolute form of one 
>>> of these paths.  This makes the "target" path a preferred way of 
>>> referring to the object.
>>>  
>>> What I'm missing is why having a preferred path is necessary in NeXus.
>> 
>> If the reason for using links is to save space (e.g., adding the same sample information to multiple entries), then it probably doesn’t matter which is the parent and which the child. The purpose of the link could also be to ensure that, e.g., the sample lattice parameter is updated in every entry when it is changed in one of them. Again, none of the objects is obviously the parent.
>> 
>> However, there are important structural reasons for adding links with one of the objects as the parent. The most common use of links is in the NXdata group, where the axes are stored elsewhere. Here’s a shortened version of chopper.nxs, for example. 
>> 
>> >>> print(chopper.tree)
>> chopper:NXroot
>>     entry:NXentry
>>         data:NXdata
>>             @axes = ['polar_angle', 'time_of_flight']
>>             @signal = 'data'
>>             data = int32(148x750)
>>             polar_angle -> /entry/instrument/detector/polar_angle
>>             time_of_flight -> /entry/instrument/detector/time_of_flight
>>         instrument:NXinstrument
>>             detector:NXdetector
>>                 distance = float32(148)
>>                 polar_angle = float32(148)
>>                 time_of_flight = float32(751)
>>                 type = 'He3 gas cylinder'
>> 
>> Here the main NXdata group plots the data against polar angle and time-of-flight, both of which are properties of the detector and so are stored in ‘/entry/instrument/detector’. If someone plotting the data wants to know about other detector properties, such as the sample-to-detector distance, those are also in the NXdetector group, and the target attribute shows the user where to look. There could be multiple NXdetector groups, but the link identifies the right one. So the target attribute provides important functionality. A data reduction script that converts from time-of-flight to energy transfer needs to know in which group the relevant distance fields are stored. That is only possible by making the object in the NXdetector group the parent and using the ‘target’ attribute to point to it.
>> 
>> Ironically, I think this functional purpose is what led the Fairmat group to propose the ‘target’ attribute, so the original reasoning was sound, if now forgotten.
>>  
>>> The NeXus Design page is somewhat coy about saying why a "target" 
>>> attribute is needed.  There's some vague mention of people getting 
>>> confused when using a particular tool, but nothing concrete.  If people 
>>> are confused, isn't this rather a problem with that tool or with how 
>>> NeXus is organising data?
>> 
>> The importance of links was crystal-clear to the original developers of NeXus twenty years ago for the reasons I described above. I hadn’t realized that this aspect of the standard was no longer understood. I guess we did a bad job of documenting it at the time.
>> 
>>> The page also includes some rather confusing use of terminology. The 
>>> page seemingly confuses "links" (all objects are accessible through at 
>>> least one link, if not they are garbage collected) with "hard linking" 
>>> (a common term for creating a new reference to some existing objects).
>> 
>> If documentation of NeXus links is intermingled with discussions of garbage collection, then it should be changed. 
>>>  
>>> The NeXus Design page also talks about the "original dataset". This is 
>>> arguably wrong.  There is no "original dataset" since all hard links 
>>> refer to the same, single dataset. One might talk about the "original 
>>> path".  However, given two paths, what is it that makes one path "original"?
>> 
>> This may be clumsy wording, but I think the meaning in the above example is that ‘/entry/instrument/detector/time_of_flight’ is the “original dataset.” It is reproduced in the NXdata group to make plotting more convenient.
>>>  
>>> As a counterexample using the "Linking in a NeXus file" diagram from 
>>> the NeXus Design page, with HDF5 semantics I could create the dataset in 
>>> one group (that happens to be NXdata) and then create a link to that 
>>> dataset under a different group (which happens to be 
>>> NXinstrument/NXdetector). In temporal order, the "original dataset" (or 
>>> original path, if you prefer) would be under the NXdata group, which 
>>> isn't what is shown on the NeXus Design page and (I suspect) not what is 
>>> intended.
>> 
>> The temporal order when writing the file is irrelevant. 
>> 
>> All your complaints about the documentation seem justified, so we should probably revise it, but the value of using the target attribute is still, I believe, valid.
>> 
>> I hope this helps.
>> 
>> With best regards,
>> Ray
>>  -- 
>> Ray Osborn, Senior Scientist
>> Materials Science Division
>> Argonne National Laboratory
>> Lemont, IL 60439, USA
>> Phone: +1 (630) 252-9011
>> Email: ROsborn at anl.gov