[NeXus-committee] Example of links

Raymond Osborn rayosborn at mac.com
Mon Feb 3 17:45:12 GMT 2025


Hi Sandor,
Thanks for taking the time to summarize your thinking. It is good to know that your reasoning is similar to numerous discussions we had 20 or more years ago, which was then incorporated by Mark Koennecke into the NAPI (RIP) and then by Paul Kienzle and me into the Python API. Now that NIAC doesn’t support an API, I think the documentation of links is much more important than it was when we provided the APIs, because it is very easy to make mistakes when writing directly with the HDF5 libraries.

Here are a few comments on your notes.

> On Jan 31, 2025, at 10:58 AM, Brockhauser Sandor <sandor.brockhauser at physik.hu-berlin.de> wrote:
> 
> In fact, it does not know anymore, if /g3/g32 was supposed to point to /g1/g12 (e.g. nice_instrument/nice_detector) and not to /g2/g22 (e.g. bad_instrument/bad_detector), because it does not point to a path(!), but to the physical object. 
> 
> This is a big difference between hard links and soft links in hdf5! In case of a soft link, the link is actually a path and it is resolved in runtime. Just like linux symbolic links, these can be broken and can point to different things if the targeted object is changed or replaced. 

I agree that the HDF5 object pointed to by a soft link could in principle be replaced by 


> Additionally, the so called external links can even point you to a path in a different file. Obviously, if you change the content of this file, such links can easily point to a different physical object.

We have to have a completely separate discussion about external links. They are not the same as internal links. For example, you cannot add a 'target' attribute to an external link because the physical object is in a different file with a completely different tree. If it were added to the external file object, then the path would apply to the external file, not the local file, and would, indeed, be meaningless in the local file, which has no way of knowing the tree structure of the external file. 

> The reason why we need a concept of a "target" attribute, so we can register for any group or dataset this attribute is attached to that this object was actually derived from here and there. Please note the difference, that we do not assume that the data object here would be the same as the referenced one (e.g. the one here may contain only the relevant section what a monitor was measuring during the experiment, or the one here is converted to a different uint compared to the referenced one). This is a big difference compared to a simple hdf5 link (or even a soft link). We argue, that in some cases the community using NeXus would like to know where the data was originated from.
> Hence, additionally to the data (which is either a new dataset, a hard/soft/external link, or even a virtual dataset which one it is just an hdf5 implementation details when NeXus is used on top of hdf5) we would like to allow attaching an attribute telling where it is coming from.

Although there are reasons to criticize the documentation, this is precisely what is described in https://manual.nexusformat.org/design.html#links. The diagram explicitly shows a link from the two-theta axis in the NXdata group to the same array in the detector group. However, I think your description is clearer, because unfortunately, the text above talks about avoiding replication of the data between the two groups, which may have made people think the link was to save space. That has never been the main reason for needing links.

> - @napimount: doc says that it is a group attribute, but is not it a linkType attribute? Note that the provided link for further explanation (http://manual.nexusformat.org/_static/NeXusIntern.pdf) is not valid.

The ’napimount’ attribute was a programmatic mechanism for the NAPI to handle external links. It is not part of the standard, and is not used by the Python API at all. You cannot explicitly add a napimount attribute within the local file for the same reason you can’t add a target attribute.

> - @target: doc says that it is added only because of hdf5, but we believe that its usefulness is independent of the backend if it is hdf5 or something else.

The concept of links using the target attribute was introduced when NeXus supported HDF4 and XML files. In fact, if we had only supported HDF5, we could have used soft links instead. So it was limitations particularly in XML that made the target attribute necessary. The documentation must have been written after dropping support for XML and was probably explaining why the attribute was necessary when using hard links.

> - in the example @target is added to /entry/data/polar_angle which corresponds to the explanation, but it is also added to /entry/instrument/detector/polar_angle. It is not explained why is it needed there. It is because this is not derived from anywhere else? Why is not it then simply a "." which convention is used throughout NeXus? 

They are the same object, so of course the target attribute appears in both groups. 
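
As an illustration only, here is a minimal h5py sketch of a NeXus-style internal link: a hard link plus a 'target' attribute on the shared object. The file name, the array values and the 'units' attribute are assumptions made for the example, and the file lives only in memory.

```python
import h5py

# In-memory HDF5 file (core driver, nothing is written to disk).
with h5py.File("links_sketch.h5", "w", driver="core", backing_store=False) as f:
    source_path = "/entry/instrument/detector/polar_angle"
    polar = f.create_dataset(source_path, data=[10.0, 20.0, 30.0])
    # The target attribute records the path of the original (parent) object.
    polar.attrs["target"] = source_path

    # Hard link: a second name in NXdata for the very same HDF5 object.
    data_group = f.require_group("/entry/data")
    data_group["polar_angle"] = polar

    linked = f["/entry/data/polar_angle"]
    # Attributes belong to the object, so both paths report the same target.
    target_at_source = f[source_path].attrs["target"]
    target_at_link = linked.attrs["target"]

    # An attribute written through the linked name is visible at the source.
    linked.attrs["units"] = "degree"
    units_at_source = f[source_path].attrs["units"]

print(target_at_source, target_at_link, units_at_source)
```

Because the attribute is stored on the object rather than on the link, there is no way to give the two names different 'target' values.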

In the Python API, when an object has a target attribute, the API checks whether the target is the same as the object path. If it’s the same, the dataset is read in as a NXfield object. If it’s different, the dataset is read in as a NXlink object. The NXlink object can still be used as if it were a field (e.g., you can check its dtype or shape), but it is in fact a sub-class of both NXlink and NXfield. Structurally, this works well. The user is alerted to the fact that it is a link, and can recover the link target using the 'nxlink' attribute, but it can be used in most contexts in which a NXfield is used. However, changes made directly to the NXlink are forbidden. If the user wants to change a link value, they need to explicitly change the parent’s value (even though they are the same object). Of course, this is the same behavior as soft links. 
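
The rule described above can be sketched in a few lines of plain Python. This is a simplified model, not the code of the Python API itself: the function name `classify` and the plain dictionary of attributes are hypothetical stand-ins.

```python
def classify(path, attrs):
    """Return 'field' or 'link' for an object read at `path`.

    `attrs` is a plain dict standing in for the object's attributes.
    """
    target = attrs.get("target")
    # No target, or a target equal to the object's own path: the original.
    if target is None or target == path:
        return "field"
    # A target pointing elsewhere: this name is a link to that path.
    return "link"


source = "/entry/instrument/detector/polar_angle"
attrs = {"target": source}

print(classify(source, attrs))
print(classify("/entry/data/polar_angle", attrs))
```

The same attribute dictionary gives a different answer depending only on the path at which the object is read, which is why the attribute can sit on both names of a single object.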

> - If these two datasets would actually be the same physical objects (e.g. both occurrences would be hdf5 hard links to the same object), this would explain this example, but as pointed out above, we foresee other usecases, too.

I think this could be dangerous. If you make the NXlink object a physically different object to its target, then any assumption that the user might make about their equivalence could be invalid. Whether we use hard or soft links, I believe that the two objects have to be identical. 

> - according to the documentation @target must be an absolute path (although validTargetName suggests its future extension to relative paths, even including 'parent' relationship although we have seen that in case of hdf5 hard links in the tree, the interpretation of 'parent' can become tricky)

The interpretation of the parent may be tricky in HDF5, but it’s not in the NeXus standard, where it is explicitly defined by the target attribute.

> - note that the example at validTargetName is not at all an absolute path what is explained at linkType. Instead of an absolute (hdf5) path, here a class_path is used, like "NXentry/NXinstrument/analyser:NXcrystal/ef". This is not at all pointing to a unique location in a NeXus file (e.g. if we have 2 entries with their respective instruments and analyser-s defined) resulting in ambiguity when link targets are tried to be resolved.

As you know, the use of classes in validTargetName rather than names is because NeXus allows different names, so a validator would only be able to check that the actual target, which has to use names, contains the right chain of classes. 
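
A rough sketch of such a check, in plain Python. The way each element of the class path is interpreted here (a bare NX class, a name:class pair, or a bare field name) is an assumption made for the example, and `matches_class_path` and the `nx_classes` dictionary are hypothetical names, not part of any NeXus tool.

```python
def matches_class_path(target, class_path, nx_classes):
    """Check that the named path `target` follows the chain in `class_path`.

    `nx_classes` maps each group path to its NX_class attribute.
    """
    names = target.strip("/").split("/")
    specs = class_path.strip("/").split("/")
    if len(names) != len(specs):
        return False
    current = ""
    for name, spec in zip(names, specs):
        current += "/" + name
        if ":" in spec:
            want_name, want_class = spec.split(":", 1)   # name:NXclass
        elif spec.startswith("NX"):
            want_name, want_class = None, spec           # any name, fixed class
        else:
            want_name, want_class = spec, None           # fixed name (a field)
        if want_name is not None and name != want_name:
            return False
        if want_class is not None and nx_classes.get(current) != want_class:
            return False
    return True


class_path = "NXentry/NXinstrument/analyser:NXcrystal/ef"
nx_classes = {
    "/entry": "NXentry",
    "/entry/instrument": "NXinstrument",
    "/entry/instrument/analyser": "NXcrystal",
    "/scan2": "NXentry",
    "/scan2/spectrometer": "NXinstrument",
    "/scan2/spectrometer/analyser": "NXcrystal",
}

print(matches_class_path("/entry/instrument/analyser/ef", class_path, nx_classes))
print(matches_class_path("/scan2/spectrometer/analyser/ef", class_path, nx_classes))
```

Both targets pass, which shows the division of labour: the class path only constrains the chain of classes, while the target attribute in the file names one specific location.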

> - https://manual.nexusformat.org/datarules.html#index-3 explains the use of NXdata. In explaining signal, it says that it shall point to a Field (field or link) with such name. This either suggests 
>    (1) that an NXdata group shall have the referenced dataset child inside either as a fieldType or as a linkType implemented. Note that ; or
>    (2) NXdata needs a dataset inside as a child which is either a field (aka hdf5 dataset) or a link (aka hdf5 link), but both are actually fieldType from NeXus point of view. 

Personally, I think it’s safer for the NXDL files not to specify whether a NXfield or NXgroup should be a link or not, i.e., NXDL should only refer to fields and groups, with the understanding that a user could choose to make some of them links at runtime. Those links would have to conform to what is described in the NeXus Design web page. Is this equivalent to your option 2?

> - how to implement and use a linkType object in hdf5 nexus file? It is actually stated (https://manual.nexusformat.org/design.html#index-17) that NeXus links are hdf5 hard links to objects having a @target attribute inside. This statement alone makes linkType unusable in practice since data collected in many facilities (e.g. EuXFEL) are in multiple (huge) hdf5 files, so one cannot just create hard links between them.

As I wrote above, external links cannot be handled the same way as internal links. Even if we think it’s a limitation, there is simply no practical way of treating them the same. I tried within the Python API and I couldn’t make it work. 

I don’t actually think this is a problem because I am fairly sure that external links do not serve the same structural purpose as internal links in the NeXus standard. We too make extensive use of external links to point to raw data, which is really big (at least to me - 36GB) but which we never want to touch. But it’s only the data - never the axes - and it’s the axes that contain metadata that we might want to associate with other metadata, e.g., associating 'time-of-flight' with 'distance'. These are small arrays so there is no reason not to keep them in the local file. If you want to store all the metadata in an external file, then I think you will find that there is no way to make the 'target' attribute work. Please let me know if I’m wrong.

With best regards,
Ray
-- 
Ray Osborn, Senior Scientist
Materials Science Division
Argonne National Laboratory
Lemont, IL 60439, USA
Phone: +1 (630) 252-9011
Email: ROsborn at anl.gov
