<html><head><style>pre,code,address {
margin: 0px;
}
h1,h2,h3,h4,h5,h6 {
margin-top: 0.2em;
margin-bottom: 0.2em;
}
ol,ul {
margin-top: 0em;
margin-bottom: 0em;
}
blockquote {
margin-top: 0em;
margin-bottom: 0em;
}
</style></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div>Dear all,</div><div><br></div><div>Internally, hdf5 has a graph data structure, not a tree to which separately marked, so-called links between nodes may optionally be added. The tree view we generally see is just how most software presents this graph, but each parent-child relationship (created when adding a subgroup or a dataset to a group) is actually just a hard link, an edge of the graph, just like any other hard link we may set at a later stage while creating the file. These links are registered at the object being targeted, and approaching an object via any of these hard links is basically the same from the hdf5 perspective; I am not sure whether h5py would be able to tell you if you are coming from the direction of the "original" link or not. Although /g1/g12 and /g2/g22 are actually the same physical object in the example below, the parent reported for that object depends on where you are coming from:</div><div><font color="#1a5fb4">>>> import h5py</font></div><div><font color="#1a5fb4">>>> f=h5py.File('htest.h5','w')</font></div><div><font color="#1a5fb4">>>> g1 = f.create_group("g1")</font></div><div><font color="#1a5fb4">>>> g12 = f['g1'].create_group("g12")</font></div><div><font color="#1a5fb4">>>> g2 = f.create_group("g2")</font></div><div><font color="#1a5fb4">>>> f['g2']['g22']=g12</font></div><div><font color="#1a5fb4">>>> f['g2']['g22'].parent</font></div><div><font color="#1a5fb4">&lt;HDF5 group "/g2" (1 members)&gt;</font></div><div><font color="#1a5fb4">>>> f['g1']['g12'].parent</font></div><div><font color="#1a5fb4">&lt;HDF5 group "/g1" (1 members)&gt;</font></div><div><font color="#1a5fb4">>>> f['g2']['g22']==f['g1']['g12']</font></div><div><font color="#1a5fb4">True</font></div><div><font color="#1a5fb4">>>> f.close()</font></div><div>In fact, the created links are ordered according to their creation, so one could work out some chronology. 
This is how h5dump does it:</div><div><font color="#1a5fb4">HDF5 "htest.h5" {</font></div><div><font color="#1a5fb4">GROUP "/" {</font></div><div><font color="#1a5fb4">GROUP "g1" {</font></div><div><font color="#1a5fb4">GROUP "g12" {</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">GROUP "g2" {</font></div><div><font color="#1a5fb4">GROUP "g22" {</font></div><div><font color="#1a5fb4">HARDLINK "/g1/g12"</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div>But please note(!) that this does not record the "original" assignment, as shown below by extending the test a bit:</div><div><div><font color="#1a5fb4">>>> import h5py</font></div><div><font color="#1a5fb4">>>> f=h5py.File('htest.h5','r+')</font></div><div><font color="#1a5fb4">>>> g3 = f.create_group("g3")</font></div></div><div><font color="#1a5fb4">>>> f['g3']['g32']=f['g2']['g22']</font></div><div><font color="#1a5fb4">>>> f.close()</font></div><div>Here, one would naively expect the hard link to point to /g2/g22, but have a look at the h5dump output:</div><div><font color="#1a5fb4">HDF5 "htest.h5" {</font></div><div><font color="#1a5fb4">GROUP "/" {</font></div><div><font color="#1a5fb4">GROUP "g1" {</font></div><div><font color="#1a5fb4">GROUP "g12" {</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">GROUP "g2" {</font></div><div><font color="#1a5fb4">GROUP "g22" {</font></div><div><font color="#1a5fb4">HARDLINK "/g1/g12"</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">GROUP "g3" {</font></div><div><font color="#1a5fb4">GROUP "g32" {</font></div><div><font color="#1a5fb4">HARDLINK "/g1/g12" <span class="Apple-converted-space"> </span></font><font color="#e01b24"> 
<--!!!</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div><font color="#1a5fb4">}</font></div><div>In fact, the file no longer knows whether /g3/g32 was supposed to point to /g1/g12 (e.g. nice_instrument/nice_detector) or to /g2/g22 (e.g. bad_instrument/bad_detector), because a hard link does not point to a path(!), but to the physical object. </div><div><br></div><div>This is a big difference between hard links and soft links in hdf5! In the case of a soft link, the link is actually a path, and it is resolved at runtime. Just like Linux symbolic links, soft links can be broken and can point to different things if the targeted object is changed or replaced. </div><div>Additionally, so-called external links can even point to a path in a different file. Obviously, if you change the content of that file, such links can easily point to a different physical object.</div><div><br></div><div>================</div><div>Up to now, it was all about hdf5. In NeXus, we use these hdf5 features a lot, and even more, like virtual datasets (where a dataset behaves like a NeXus Field, but its content is not actually a plain binary block; instead, it is assembled on the fly by the hdf5 library from multiple datasets referenced separately, e.g. via external links, so we can concatenate, crop, slice, etc. on the fly).</div><div><br></div><div>This is why we need the concept of a "target" attribute: so that, for any group or dataset this attribute is attached to, we can register that the object was actually <b>derived from</b> here or there. Please note the difference: we do not assume that the data object here is the same as the referenced one (e.g. 
the one here may contain only the relevant section of what a monitor was measuring during the experiment, or the one here may be converted to a different unit compared to the referenced one). This is a big difference compared to a simple hdf5 link (or even a soft link). We argue that in some cases the community using NeXus would like to know where the data originated from.</div><div>Hence, in addition to the data (which is either a new dataset, a hard/soft/external link, or even a virtual dataset; which one it is, is just an hdf5 implementation detail when NeXus is used on top of hdf5), we would like to allow attaching an attribute telling where it is coming from.</div><div><br></div><div>Indeed, the documented linkType has a very similar purpose: with its target attribute, it can deliver the information where a given object is coming from. There are, however, some problems with its documentation (<a href="https://manual.nexusformat.org/nxdl_desc.html#linktype">https://manual.nexusformat.org/nxdl_desc.html#linktype</a>), which pushed us to propose something (indeed) similar:</div><div><br></div><div>- linkType says that it can be defined under definition, group, or field, but the documentation of fieldType (contrary to the documentation of definition and groupType) does not list it as a possible child.</div><div><br></div><div>- @napimount: the doc says that it is a group attribute, but is it not a linkType attribute? 
Note that the link provided for further explanation (<a class="reference external" href="http://manual.nexusformat.org/_static/NeXusIntern.pdf">http://manual.nexusformat.org/_static/NeXusIntern.pdf</a>) is not valid.</div><div><br></div><div>- @target: the doc says that it is added only because of hdf5, but we believe that it is useful regardless of whether the backend is hdf5 or something else.</div><div><br></div><div>- in the example, @target is added to /entry/data/polar_angle, which corresponds to the explanation, but it is also added to /entry/instrument/detector/polar_angle. It is not explained why it is needed there. Is it because this one is not derived from anywhere else? Why is it then not simply a ".", the convention used throughout NeXus? </div><div><br></div><div>- If these two datasets were actually the same physical object (e.g. both occurrences were hdf5 hard links to the same object), this would explain the example, but as pointed out above, we foresee other use cases, too.</div><div><br></div><div>- according to the documentation, @target must be an absolute path (although validTargetName suggests a future extension to relative paths, even including a 'parent' relationship, although we have seen that with hdf5 hard links in the tree, the interpretation of 'parent' can become tricky)</div><div><br></div><div>- note that the example at validTargetName is not at all an absolute path as explained at linkType. Instead of an absolute (hdf5) path, a class_path is used here, like "NXentry/NXinstrument/analyser:NXcrystal/ef". This does not point to a unique location in a NeXus file (e.g. 
if we have 2 entries with their respective instruments and analysers defined), resulting in ambiguity when link targets are resolved.</div><div><br></div><div>- <a href="https://manual.nexusformat.org/datarules.html#index-3">https://manual.nexusformat.org/datarules.html#index-3</a> explains the use of NXdata. In explaining signal, it says that it shall point to a Field (field or link) with that name. This suggests either </div><div> (1) that an NXdata group shall contain the referenced dataset as a child implemented either as a fieldType or as a linkType; or</div><div> (2) that NXdata needs a dataset inside as a child which is either a field (aka hdf5 dataset) or a link (aka hdf5 link), but both are actually fieldType from the NeXus point of view. </div><div>Interpretation (1) contradicts the NeXus documentation in several places. E.g. in the NXdata definition (<a href="https://github.com/nexusformat/definitions/blob/main/base_classes/NXdata.nxdl.xml">https://github.com/nexusformat/definitions/blob/main/base_classes/NXdata.nxdl.xml</a>), NXdata/DATA is actually defined as a fieldType: &lt;<b>field</b> name="DATA" type="NX_NUMBER" nameType="any"&gt;. The NXDL syntax (<a href="https://github.com/nexusformat/definitions/blob/main/nxdl.xsd">https://github.com/nexusformat/definitions/blob/main/nxdl.xsd</a>) treats fieldType and linkType as separate, non-interchangeable terms that are used separately in definitions. Hence, NXDL supports defining a field or a link. Note that linkType is rarely used in NeXus. An example is NXxas (<a href="https://github.com/nexusformat/definitions/blob/main/applications/NXxas.nxdl.xml">https://github.com/nexusformat/definitions/blob/main/applications/NXxas.nxdl.xml</a>), where 'energy' is not a fieldType but a linkType: &lt;<b>link</b> name="energy" target="/NXentry/NXinstrument/monochromator:NXmonochromator/energy"/&gt;. 
Based on this, "energy" cannot be referenced in NXdata under @signal as an NXdata/DATA (which is a fieldType) or under @axes as an NXdata/AXISNAME (which is also a fieldType).</div><div>Interpretation (2) tries to resolve the problem by simply saying that a linkType is a kind of fieldType, but this is not at all made clear in the NeXus documentation.</div><div><br></div><div>- how should a linkType object be implemented and used in an hdf5 NeXus file? It is actually stated (<a href="https://manual.nexusformat.org/design.html#index-17">https://manual.nexusformat.org/design.html#index-17</a>) that NeXus links are hdf5 hard links to objects having a @target attribute. This statement alone makes linkType unusable in practice, since data collected at many facilities (e.g. EuXFEL) are spread over multiple (huge) hdf5 files, so one cannot just create hard links between them.</div><div><br></div><div>====================</div><div>Hence, a clarification in the documentation would be nice:</div><div>- Groups can have Groups and Fields inside. (In hdf5, they can either be created directly as children /groups, datasets, or virtual datasets/ or referenced via links /hard, soft, or external links/)</div><div>- A @target (or @origin? or @reference?) attribute can be added to any Group or Field to declare where the data is coming from.<br> + If a new Group or Field is created here, its 'origin' attribute can be set to the other object where its data is coming from.</div><div> + If we use a link here, the origin attribute can be set in the referenced object. <br> + Note that multiple linking loses the intermediate connections: e.g. in the case of a -> b -> c where c@origin=c, resolving 'a@origin' will tell that its origin is 'c' and not the direct parent 'b'. 
This is not necessarily a problem, because the data is actually coming from there.</div><div> + In case the data has only been referenced (and potentially altered) somewhere in the chain and not linked, this will also be resolved correctly, e.g. a (being different from b, but having a@origin=b) with b -> c (being different from d, but having c@origin=d) and d -> e where e@origin=e. The full chain of dependency would be readable from the attributes properly: a@origin: b; b@origin: d; [also c@origin: d]; d@origin: e</div><div>- An application definition may require the presence of this attribute, so that it can find out where the data came from and what the corresponding data objects are.</div><div><br></div><div>Thanks,</div><div>Sandor</div><br class="Apple-interchange-newline"><div><br></div><div>On Fri, 2025-01-31 at 08:43 -0600, Raymond Osborn via NeXus-committee wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>It is possible to query a soft link in HDF5, but I don’t believe there is any way to query a hard link, without walking through the entire file checking for object IDs. And, of course, there is no way of telling which is the parent. </div><div><br></div><div>Ray<br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>On Jan 30, 2025, at 3:50 PM, Aaron Brewster &lt;asbrewster@lbl.gov&gt; wrote:</div><div><br class="Apple-interchange-newline"></div><div><div dir="ltr"><div>In h5py, I had thought you could query a group or field to see if it's a soft link and get its original location. I don't know how to do the same for a hard link but I presume it's possible. 
Therefore the target attribute would appear to be redundant.</div><div><br></div><div>However, to me, the most important reason why to have @target is to not be tied to HDF5. It's useful to have it from a specification point of view.<br></div><div>-Aaron</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Jan 30, 2025 at 1:28 PM Raymond Osborn via NeXus-committee <<a href="mailto:nexus-committee@shadow.nd.rl.ac.uk">nexus-committee@shadow.nd.rl.ac.uk</a>> wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><div style="font-size:13px">Hi Paul,</div><div style="font-size:13px">Thanks for the follow-up questions. I will try to answer them below.</div><div style="font-size:13px"><br></div><div style="font-size:13px"><b>From</b>: NeXus-committee <<a href="mailto:nexus-committee-bounces@shadow.nd.rl.ac.uk" target="_blank">nexus-committee-bounces@shadow.nd.rl.ac.uk</a>> on behalf of Paul Millar via NeXus-committee <<a href="mailto:nexus-committee@shadow.nd.rl.ac.uk" target="_blank">nexus-committee@shadow.nd.rl.ac.uk</a>></div><div style="font-size:13px"><b>Date</b>: Thursday, January 30, 2025 at 12:07 PM</div><div style="font-size:13px"><b>To</b>: <a href="mailto:nexus-committee@nexusformat.org" target="_blank">nexus-committee@nexusformat.org</a> <<a href="mailto:nexus-committee@nexusformat.org" target="_blank">nexus-committee@nexusformat.org</a>></div><div style="font-size:13px"><b>Subject</b>: Re: [NeXus-committee] Example of links</div><div style="font-size:13px"><br></div><div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>Hi Ray,</div><div> </div><div>Thanks for sharing these examples, for talking about the "target" attribute.</div><div> </div><div>For me, this is very interesting.</div><div> </div><div>I took the opportunity to read through the description of groups and </div><div>links in the 
HDF5 manual. I've a background in storage and filesystem </div><div>programming, so the concepts in HDF5 make perfect sense to me: it's </div><div>(more or less) just the standard POSIX filesystem's namespace. HDF5 </div><div>even reuses some of the POSIX vocabulary.</div><div> </div><div>What confuses me is the "target" attribute in NeXus.</div><div> </div><div>As the NeXus Design page itself describes, hard links (i.e., the same </div><div>object being linked to under multiple groups) are symmetric. There is no </div><div>sense of source and destination. Instead, hard links are simply being </div><div>able to refer to the same object via two (or more) paths. Under HDF5, </div><div>these paths are equivalent: neither path is more important.</div><div> </div><div> From what I see, the NeXus "target" attribute seeks to break this </div><div>symmetry. The "target" attribute's value is the absolute path of these </div><div>paths. This makes the "target" path a preferred way of referring to the </div><div>object.</div><div> </div><div>What I'm missing is why having a preferred path is necessary in NeXus.</div></blockquote><div style="font-size:13px"><br></div><div style="font-size:13px">If the reason for using links is to save space (e.g., adding the same sample information to multiple entries), then it probably doesn’t matter which is the parent and which the child. The purpose of the link could also be to ensure that, e.g., the sample lattice parameter is updated in every entry when it is changed in one of them. Again, none of the objects is obviously the parent.</div><div style="font-size:13px"><br></div><div style="font-size:13px">However, there are important structural reasons for adding links with one of the objects as the parent. The most common use of links is in the NXdata group, where the axes are stored elsewhere. Here’s a shortened version of chopper.nxs, for example. 
</div><div style="font-size:13px"><br></div><div style="font-size:13px"><font face="Courier New">>>> print(chopper.tree)</font></div><div style="font-size:13px"><div><font face="Courier New">chopper:NXroot</font></div><div><span style="font-family:"Courier New""> entry:NXentry</span></div><div><span style="font-family:"Courier New""> data:NXdata</span></div><div><font face="Courier New"> @axes = ['polar_angle', 'time_of_flight']</font></div><div><font face="Courier New"> @signal = 'data'</font></div><div><font face="Courier New"> data = int32(148x750)</font></div><div><span style="font-family:"Courier New""> polar_angle -> /entry/instrument/detector/polar_angle</span></div><div><font face="Courier New"> time_of_flight -> /entry/instrument/detector/time_of_flight</font></div><div><span style="font-family:"Courier New""> instrument:NXinstrument</span></div><div><font face="Courier New"> detector:NXdetector</font></div><div><font face="Courier New"> distance = float32(148)</font></div><div><span style="font-family:"Courier New""> polar_angle = float32(148)</span></div><div><span style="font-family:"Courier New""> time_of_flight = float32(751)</span></div><div><span style="font-family:"Courier New""> type = 'He3 gas cylinder'</span></div></div><div style="font-size:13px"><br></div><div style="font-size:13px">Here the main NXdata group plots the data against polar angle and time-of-flight, both of which are properties of the detector and so are stored in ‘entry/instrument/detector’. If someone plotting the data wants to know about other detector properties, such as the sample-to-detector distance, those are also in the NXdetector group and the target attribute shows the user where to look. There could be multiple NXdetector groups, but the link identifies the right one. So the target attribute provides important functionality. 
In a data reduction script that wants to convert from time-of-flight to energy transfer, it is essential they know in which group the relevant distance fields are stored. That is only possible by making the object in the NXdetector group the parent and using the ’target’ attribute to point to it.</div><div style="font-size:13px"><br></div><div style="font-size:13px">Ironically, I think this functional purpose is what led the Fairmat group to propose the ’target’ attribute, so the original reasoning was sound, if now forgotten.</div><div style="font-size:13px"> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>The NeXus Design page is somewhat coy about saying why a "target" </div><div>attribute is needed. There's some vague mention of people getting </div><div>confused when using a particular tool, but nothing concrete. If people </div><div>are confused, isn't this rather a problem with that tool or with how </div><div>NeXus is organising data?</div></blockquote><div style="font-size:13px"><br></div><div style="font-size:13px">The importance of links was crystal-clear to the original developers of NeXus twenty years ago for the reasons I described above. I hadn’t realized that this aspect of the standard was no longer understood. I guess we did a bad job of documenting it at the time.</div><div style="font-size:13px"><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>The page also includes some rather confusing use of terminology. 
The </div><div>page seemingly confuses "links" (all objects are accessible through at </div><div>least one link, if not they are garbage collected) with "hard linking" </div><div>(a common term for creating a new reference to some existing objects).</div></blockquote><div style="font-size:13px"><br></div><span style="font-size:13px">If documentation of NeXus links is intermingled with discussions of garbage collection, then it should be changed. </span><br style="font-size:13px"><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>&nbsp;</div><div>The NeXus Design page also talks about the "original dataset". This is </div><div>arguably wrong. There is no "original dataset" since all hard links </div><div>refer to the same, single dataset. One might talk about the "original </div><div>path". However, given two paths, what is it that makes one path "original"?</div></blockquote><div style="font-size:13px"><br></div><div style="font-size:13px">This may be clumsy wording, but I think the meaning in the above example is that ‘/entry/instrument/detector/time_of_flight’ is the “original dataset.” It is reproduced in the NXdata group to make plotting more convenient.</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>&nbsp;</div><div>As a counter example using the "Linking in a NeXus file" diagram from </div><div>the NeXus Design page, with HDF5 semantics I could create the dataset in </div><div>one group (that happens to be NXdata) and then create a link to that </div><div>dataset under a different group (which happens to be </div><div>NXinstrument/NXdetector). 
In temporal order, the "original dataset" (or </div><div>original path, if you prefer) would be under the NXdata group, which </div><div>isn't what is shown on the NeXus Design page and (I suspect) not what is </div><div>intended.</div></blockquote><div style="font-size:13px"><br></div><div style="font-size:13px">The temporal order when writing the file is irrelevant. </div><div style="font-size:13px"><br></div><div style="font-size:13px"><div>All your complaints about the documentation seem justified, so we should probably revise it, but the value of using the target attribute is still, I believe, valid.</div><div><br></div><div>I hope this helps.</div><div><br></div><div>With best regards,</div><div>Ray</div><div> -- </div><div>Ray Osborn, Senior Scientist</div><div>Materials Science Division</div><div>Argonne National Laboratory</div><div>Lemont, IL 60439, USA</div><div>Phone: +1 (630) 252-9011</div><div>Email: <a href="mailto:ROsborn@anl.gov" target="_blank">ROsborn@anl.gov</a></div></div><br></div></div><div>_______________________________________________<br>NeXus-committee mailing list<br></div><div><a href="mailto:NeXus-committee@nexusformat.org" target="_blank">NeXus-committee@nexusformat.org</a><br></div><div><a href="https://lists.nexusformat.org/mailman/listinfo/nexus-committee" rel="noreferrer" target="_blank">https://lists.nexusformat.org/mailman/listinfo/nexus-committee</a><br></div></blockquote></div></div></blockquote></div><br></div><pre>_______________________________________________</pre><pre>NeXus-committee mailing list</pre><pre><a href="mailto:NeXus-committee@nexusformat.org">NeXus-committee@nexusformat.org</a></pre><pre><a href="https://lists.nexusformat.org/mailman/listinfo/nexus-committee">https://lists.nexusformat.org/mailman/listinfo/nexus-committee</a></pre></blockquote><div><br></div><div><span></span></div></body></html>