[NeXus-committee] questions about NeXus data structures as per Figs 1-3 of draft paper

Osborn, Raymond rosborn at anl.gov
Tue Aug 19 18:21:04 BST 2014


Sorry I haven’t sent comments about the manuscript yet, but I hope to do so within the next day or so. Thanks for all the work. It looks very good in my initial read-through.

On Joachim’s question, though, I think he is right to raise these issues. Personally, I think it is a mistake to say anything is required unless it is part of a specific application definition. Obviously, it is impossible to do any data analysis if there isn’t any data, so a NeXus file without an NXdata group is of limited use, but it may not be useless. For example, there may be times when you want to create a NeXus file containing just metadata as part of a workflow where externally linked data files are added at a later stage. In fact, that is one of the options we are considering in our design of a single crystal x-ray diffuse scattering workflow. I don’t think it is in our interest to say that such preliminary files are not valid - they can serve a useful purpose. 
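
A minimal sketch of that staged workflow, assuming h5py is used to write the files. The file names, group names, and the two helper functions are hypothetical, chosen only for illustration; they are not part of any NeXus definition or of the workflow mentioned above.

```python
import os
import tempfile

import h5py
import numpy as np


def create_metadata_only_file(path, sample_name):
    """Write a NeXus file holding an NXentry with metadata but no NXdata."""
    with h5py.File(path, "w") as f:
        entry = f.create_group("entry")
        entry.attrs["NX_class"] = "NXentry"
        sample = entry.create_group("sample")
        sample.attrs["NX_class"] = "NXsample"
        sample.create_dataset("name", data=sample_name)


def attach_external_data(path, data_file, data_path):
    """Later stage: add an NXdata group whose field links to another file."""
    with h5py.File(path, "a") as f:
        nxdata = f["entry"].create_group("data")
        nxdata.attrs["NX_class"] = "NXdata"
        # The link stores only the target file name and the path inside it.
        nxdata["data"] = h5py.ExternalLink(data_file, data_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    raw = os.path.join(workdir, "frames.h5")   # hypothetical detector output
    meta = os.path.join(workdir, "scan.nxs")   # hypothetical NeXus file

    # Stage 1: only metadata exists.
    create_metadata_only_file(meta, "hypothetical crystal")

    # Stage 2: the measured data arrives in a separate file.
    with h5py.File(raw, "w") as f:
        frames = f.create_dataset("frames", data=np.arange(12).reshape(3, 4))
        frames.attrs["signal"] = 1  # attribute lives on the linked dataset

    attach_external_data(meta, raw, "/frames")
    with h5py.File(meta, "r") as f:
        print(f["entry/data/data"].shape)
```

Between the two stages the file has no NXdata group at all, yet it is still a readable HDF5 file with NeXus class attributes, which is the kind of preliminary file described above.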

I think some of the discussion is based on the assumption that all NeXus files are put in long-term repositories, and so the only valid NeXus files are those that contain all the information necessary for future generations to analyze the data. That is a valid goal for a long-term repository, but NeXus files can be useful in intermediate stages of any data analysis and don’t have to be so complete. I don’t think we should discourage such usage - in fact, it was one of the original motivations for developing NeXus, and it is the way I use NeXus most of the time.

Ray

On Aug 18, 2014, at 10:33 AM, Pete R Jemian <prjemian at gmail.com> wrote:

> Joachim:
> 
> Your questions are most excellent.  These are all points that members of the NeXus Technical Committee should understand.
> 
> On 08/18/2014 08:32 AM, Joachim Wuttke wrote:
>> Dear colleagues:
>> 
>> In the draft manuscript (v6), Figs 1 and 3 show the
>> common structure of raw-data and processed-data
>> files, respectively.
>> 
>> Are these structures also described in the docs? Where?
> 
> Not as clearly as these figures.  They will become part of the manual.
> 
> In previous versions of the manual, there was a table that was used to describe the instrument definition hierarchy.  This was set aside as the instrument definitions (described with meta-DTD and documented on the wiki) were refactored into the NXDL we have now.  Now you see that documentation reappearing.  The attempt here is to describe what is needed to demonstrate the point without describing all the possibilities.  (Such possibilities are terribly distracting.  For example, they lead some people to think that all possibilities are required.)
> 
> One additional figure might "drive the point home" to describe the absolute minimum required structure of a NeXus data file.  That is:
> 
> -----------------------------------
> | NXroot                          |
> |   -------------------------------
> |   | NXentry            required |
> |   |   ---------------------------
> |   |   | NXdata         required |
> |   |   |   data:NX_NUMBER        |
> |   |   |     @signal=1  required |
> -----------------------------------
> 
> And, even in this simple example, the name "data" is not required since the attribute signal=1 labels this for NeXus as the default data to be visualized.
> 
> This is the absolute minimum structure (virtually no metadata) and we have an example that shows this:
> http://download.nexusformat.org/doc/html/examples/h5py/index.html
> 
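
The minimum structure in the box above can be sketched with h5py roughly as follows. This is an illustration under assumptions, not the example from the linked page: the function names and the group names "entry" and "data" are hypothetical, and the reader only looks one level below each NXentry.

```python
import h5py
import numpy as np


def write_minimal_nexus(path, values, name="data"):
    """Write NXentry/NXdata/<name> with signal=1 on the dataset."""
    with h5py.File(path, "w") as f:
        entry = f.create_group("entry")
        entry.attrs["NX_class"] = "NXentry"
        nxdata = entry.create_group("data")
        nxdata.attrs["NX_class"] = "NXdata"
        dataset = nxdata.create_dataset(name, data=np.asarray(values))
        dataset.attrs["signal"] = 1  # marks the default data to plot


def _nx_class(obj):
    """Return the NX_class attribute as str ('' when it is absent)."""
    value = obj.attrs.get("NX_class", "")
    return value.decode() if isinstance(value, bytes) else str(value)


def find_default_signal(path):
    """Return the HDF5 path of the first dataset with signal=1, or None."""
    with h5py.File(path, "r") as f:
        for entry in f.values():
            if not isinstance(entry, h5py.Group) or _nx_class(entry) != "NXentry":
                continue
            for group in entry.values():
                if not isinstance(group, h5py.Group) or _nx_class(group) != "NXdata":
                    continue
                for field in group.values():
                    if (isinstance(field, h5py.Dataset)
                            and int(field.attrs.get("signal", 0)) == 1):
                        return field.name
    return None
```

Because the lookup goes through the class attributes and the signal attribute, the dataset can be given any name, for example `write_minimal_nexus("min.nxs", [1, 2, 3], name="counts")`.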
> However, since so much raw data is acquired with knowledge of much more metadata, Figure 1 of the manuscript is a great suggestion for the structure of a prototypical NeXus data file.
> 
> 
>> Some groups are marked as "required". Is this specified
>> in the docs? Where?
> 
> http://download.nexusformat.org/doc/html/introduction.html#important-classes
> 
> Perhaps this should be content in the Design chapter?
> http://download.nexusformat.org/doc/html/design.html
> 
> 
>> If some groups at second or third level are required,
>> then the first-level group "NXroot" must also be
>> required, right?
> 
> At the root of the HDF5 data file, there has been no requirement to have an attribute @NX_class="NXroot" on the root.  It was suggested at a NIAC meeting some years ago that we should start adding this attribute as "good practice".  NIAC did not go so far as requiring it so as to maintain compatibility with common use and the NAPI implementation.
> 
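
As a small sketch of what this means in practice, assuming h5py: a writer may add the root attribute as good practice, and a reader should not fail when it is missing. Both function names are hypothetical.

```python
import h5py


def label_root(path):
    """Add the optional NX_class="NXroot" attribute to the file root."""
    with h5py.File(path, "a") as f:
        f.attrs["NX_class"] = "NXroot"


def root_class(path):
    """Return the root NX_class attribute, or None when it is absent."""
    with h5py.File(path, "r") as f:
        value = f.attrs.get("NX_class")
    if isinstance(value, bytes):
        value = value.decode()
    return value
```

A reader written this way treats `None` as acceptable rather than as an error, in line with the attribute not being required.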
> 
>> Why should we require NXdata? If scientists at a certain
>> instrument have no interest in using generic default
>> plotting tools, and don't like the extra complexity of
>> symbolic links in their raw data files, we should allow
>> them to use NeXus without NXdata.
> 
> http://download.nexusformat.org/doc/html/motivations.html#index-0
> 
> 
>> Why is NXinstrument required for raw-data files? Is there
>> an application definition without an NXinstrument group?
> 
> NXinstrument is not required.
> 
> What led you to conclude that NXinstrument is required for raw-data files?
> We need to adjust the manuscript to make sure others do not form that opinion.
> 
> BUT, pursuant to http://download.nexusformat.org/doc/html/motivations.html#defineddictionary, NXinstrument provides the place to store agreed-upon terms such as wavelength.
> 
> Another example use of NXinstrument is in the figure of this section:
> http://download.nexusformat.org/doc/html/design.html#links
> 
> 
>> It seems though that there are some raw-data application
>> definitions without NXsample, and many without NXuser.
>> Is there a rationale why some instrument types would
>> require these metadata, and others not? Or do the
>> different application definitions just reflect different
>> personal preferences of different authors?
> 
> 
> Your last sentence is a good description of what I think is the reason.
> 
>> 
>> For multi-method instruments, some entries move from
>> NXentry into NXentry/NXsubentry. What if one day an
>> established multi-method instrument gets embedded into
>> a yet more powerful instrument: will we then have
>> NXentry/NXentry/NXsubentry or NXentry/NXsubentry/NXsubentry
>> or NXentry/NXsubentry/NXsubsubentry? In my humble
>> opinion, NXsubentry should never have been invented;
>> why not use the power of recursion and allow NXentry/NXentry?
> 
> 
> Certainly a good topic for NIAC debate.
> 
> Can you describe (with more specifics) such an instrument or concept that could not already be documented with Figure 2 in the manuscript?
> 
> In my view, for a single-technique small-angle scattering I(Q) dataset,
> it seems much easier to place the SAS data at
>   /NXentry/NXdata/I(Q)
> than
>   /NXentry/NXsubentry/NXdata/I(Q)
> 
> But, knowing that our users and scientists have boundless creativity, I accede to the structure of Figure 2 in the manuscript as that will cover conceivable future variety.
> 
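
For reading software the two layouts need not differ much. As a sketch, assuming h5py, a reader can walk the whole tree and collect every NXdata group whatever its nesting depth; the function name is hypothetical.

```python
import h5py


def list_nxdata_groups(path):
    """Return the HDF5 paths of all NXdata groups, at any depth."""
    found = []

    def visit(name, obj):
        # visititems passes the path relative to the root, without a leading "/".
        if isinstance(obj, h5py.Group):
            nx_class = obj.attrs.get("NX_class", "")
            if isinstance(nx_class, bytes):
                nx_class = nx_class.decode()
            if nx_class == "NXdata":
                found.append("/" + name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(found)
```

The same call finds the data under /NXentry/NXdata and under /NXentry/NXsubentry/NXdata, so the extra level mainly costs longer paths.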
> Pete
> 

-- 
Ray Osborn, Senior Scientist
Materials Science Division
Argonne National Laboratory
Argonne, IL 60439, USA
Phone: +1 (630) 252-9011
Email: ROsborn at anl.gov




