Storing repeating structures in NeXus files

C.M.Moreton-Smith at rl.ac.uk
Wed Nov 3 16:35:22 GMT 1999


I've spent some time talking to Freddie about this problem and have come up
with two proposals.  As Ray points out, this should be posted to the NeXus
list, but I'd rather the NAPI list had a chance to comment before I post it.

The main aim here is to offer a single convention for handling repeating
structures, rather than leave everyone to implement them in slightly
different ways.  In the case of ISIS files, there are several examples of
needing this sort of structure: one NXEntry per period is a likely one, or
possibly a two-dimensional ordering of NXentries based on periods in one
dimension and time regimes in the other.  Any common parameters are simply
linked back to the first entry, so the structure remains fairly efficient.
The data group would, of course, be different in each entry.

A smaller-scale and more ISIS-specific example is that of sample environment
parameter blocks, which are already stored as an array of similar
heterogeneous structures, one for each piece of sample environment equipment
on the CAMAC control system.  Writing these as separate arrays is
inconvenient, if only because some would have to be arrays of strings (sample
block names).


Proposal 1
==========

We reserve a special class name, "NXgroup_array" and give the names
"dimensions" and numeric names (e.g. "1", "2", ..."nnn") special meaning
within this group. This class is used to denote a special group for
repeating structures.

We can then simulate arrays of structures in the following standard format:

 NXgroup (class = NXgroup_array)
   "dimensions"        : SDS (integer) - holds the dimension sizes
   "1"                 : NXgroup       - element numbering starts at 1
   "2"                 : NXgroup
   ...
   "d1 x d2 x ... dn"  : NXgroup       - stored flat (d1..dn are the dimensions)
 end

The repeating groups are simply added with NXmakeGroup() and named with
their index number.  Although this cannot be enforced, all the NXgroups
should contain the same fields.  Retrieving groups is simply a case of
opening the top group, reading the dimensions SDS first, and then scanning
through with NXOpengroup() as the data is read.  The user is responsible for
multiplying out the dimensions to find the correct element if the array is
multi-dimensional.  For example, the last element of a 2 x 5 array, (2,5),
would be accessed as the group named "10"; in general, with the first
dimension varying fastest, element (i,j) of a d1 x d2 array is the group
named i + (j-1) x d1.
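A minimal sketch of that multiplication in C, assuming 1-based indices and
that the first dimension varies fastest (the order suggested under
"Dimensions" below).  The names nx_flat_index and nx_group_name are
hypothetical helpers, not part of the NeXus API; the string produced is the
name of the group to open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the NeXus API): converts 1-based
 * indices idx[0..rank-1] into the 1-based element number used as the
 * group name.  dims[0] is assumed to be the most rapidly varying
 * dimension.  Returns 0 if any index is out of range. */
long nx_flat_index(int rank, const int dims[], const int idx[])
{
    long flat = 0;    /* zero-based offset accumulated so far */
    long stride = 1;  /* elements spanned by one step in dimension k */

    assert(rank > 0 && dims != NULL && idx != NULL);
    for (int k = 0; k < rank; k++) {
        if (idx[k] < 1 || idx[k] > dims[k])
            return 0;
        flat += (long)(idx[k] - 1) * stride;
        stride *= dims[k];
    }
    return flat + 1;  /* group names start at "1" */
}

/* Formats the group name (e.g. "10") into buf.  Returns the number of
 * characters written, as snprintf does, or -1 if an index is out of range. */
int nx_group_name(char *buf, size_t len, int rank,
                  const int dims[], const int idx[])
{
    long n = nx_flat_index(rank, dims, idx);

    if (n == 0)
        return -1;
    return snprintf(buf, len, "%ld", n);
}
```

With dimensions {2, 5}, index (2,5) gives "10" and (1,2) gives "3".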

Any standard NeXus browser software can still display the array without
special knowledge of this format, and the array indexing is still
self-descriptive and visually intuitive.  By keeping the array "flat" and
storing the dimensions separately, programming is also simplified.

Some notes:

Dimensions
----------
Once again, in terms of the standard, we need to specify the order in which
the dimensions are stored.  It is suggested that the most rapidly varying
dimension (that of adjacent elements) is stored first in the dimensions SDS.
Higher dimensions could then be added later with ease, and existing code
looking at the first dimension would only ever underscan (not overscan) the
data.

First element
-------------
And the number of the first element?  We have assumed that "1" is more
natural from the scientific point of view; also, multiplying the dimensions
(e.g. 2 x 3 = 6) gives the name of the last element exactly, without
having to remember to add or subtract one.
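For reference, the numbering implied by these two choices can be written out
for a rank-k array with dimensions d_1 (most rapidly varying) to d_k and
1-based indices i_1 ... i_k.  This is a derivation from the text above rather
than part of the original proposal; for the last element the sum telescopes,
so its name is exactly d_1 d_2 ... d_k (for a 2 x 3 array,
1 + (2-1)*1 + (3-1)*2 = 6).

```latex
n(i_1,\dots,i_k) = 1 + \sum_{m=1}^{k} (i_m - 1) \prod_{l=1}^{m-1} d_l

n(d_1,\dots,d_k) = 1 + \sum_{m=1}^{k} \Big( \prod_{l=1}^{m} d_l - \prod_{l=1}^{m-1} d_l \Big)
                 = \prod_{l=1}^{k} d_l
```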

Modified API functions (not required)
-------------------------------------
This convention can be used without any need for new API functions.
However, it seems sensible to consider extending functions such as
NXGetInfo() to return the dimensions and rank of this special group as if
it were an SDS of NXgroups.  As the class name is reserved, this can be
done unambiguously.  NXGetNextEntry() could also be modified to return only
the numbered groups, in sequential order, and to ignore the dimensions entry.
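The suggested NXGetNextEntry() change amounts to filtering and ordering the
names in the group.  The sketch below models that on a plain array of name
strings rather than on the real API; nx_array_elements and its helpers are
hypothetical.  Ordering numerically matters because a plain string sort
would put "10" before "2".

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if s is a non-empty string of decimal digits (an element
 * name such as "1" or "10"), 0 otherwise (e.g. "dimensions"). */
static int is_element_name(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}

/* qsort comparator: orders element names by numeric value. */
static int by_number(const void *a, const void *b)
{
    long x = strtol(*(const char *const *)a, NULL, 10);
    long y = strtol(*(const char *const *)b, NULL, 10);
    return (x > y) - (x < y);
}

/* Hypothetical sketch of the proposed behaviour: copies pointers to the
 * element names in names[0..count-1] into out[] in numeric order, skipping
 * "dimensions" and any other non-numeric name.  out must have room for
 * count entries.  Returns the number of element names copied. */
size_t nx_array_elements(const char *names[], size_t count, const char *out[])
{
    size_t n = 0;

    assert(names != NULL && out != NULL);
    for (size_t i = 0; i < count; i++)
        if (is_element_name(names[i]))
            out[n++] = names[i];
    qsort(out, n, sizeof out[0], by_number);
    return n;
}
```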



Proposal 2
==========
This is an ancillary proposal which gives a means of reading a NeXus file
with several entries by number or, alternatively, of viewing the file as an
array of NXentries.  It also provides a standard group name, "file", for
reading the whole file at once.

proposal
--------
Currently it is assumed that the names of NXEntry groups will be something
like "entry1", "entry2" etc.  The proposal is that we stipulate that entries
are named even more simply, as "1", "2", ..., "n".

By doing this, it is possible to add an NXgroup_array (see Proposal 1) with
its entries as links to the top-level NXentry groups of the NeXus file.  If
this is done, the group array should be named "file" and its elements should
be links to the "NXEntry" groups in the file.
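Building the "file" group only works if the top-level entries really are
named "1" to "n".  A hypothetical check (nx_file_array_size, not a NeXus API
call) that also yields the value to store in the "file" group's
"dimensions" SDS might look like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical check for Proposal 2: the "file" NXgroup_array can only be
 * built if the top-level NXentry groups are named exactly "1" .. "n".
 * Returns n (the single value for the "file" group's "dimensions" SDS),
 * or -1 if the names do not form that sequence. */
long nx_file_array_size(const char *entry_names[], size_t count)
{
    unsigned char *seen;
    long result = (long)count;

    assert(entry_names != NULL);
    if (count == 0)
        return -1;
    seen = calloc(count + 1, 1);      /* seen[k] != 0 once "k" is found */
    if (seen == NULL)
        return -1;

    for (size_t i = 0; i < count && result > 0; i++) {
        char *end;
        long k = strtol(entry_names[i], &end, 10);

        /* reject non-numeric names, out-of-range numbers and duplicates */
        if (end == entry_names[i] || *end != '\0' ||
            k < 1 || k > (long)count || seen[k])
            result = -1;
        else
            seen[k] = 1;
    }
    free(seen);
    return result;
}
```

Old-style names such as "entry1" are rejected, as is any gap or duplicate,
since count distinct numbers in the range 1..count must be exactly 1..count.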

note
----
The renaming of NXentries is needed only because the underlying HDF does not
allow symbolic links, i.e. if a group is linked to as "1", the linked-to
group must also be named "1".

============================================================================

> -----Original Message-----
> From: Mark.Koennecke [mailto:koenneck at psi.ch]
> Sent: 02 November 1999 07:27
> To: NAPI at isise.rl.ac.uk; C.M.Moreton-Smith at RL.AC.UK
> Cc: napi at anl.gov
> Subject: Re: Storing repeating structures
> 
> On Mon, 1 Nov 1999 C.M.Moreton-Smith at rl.ac.uk wrote:
> 
> > 
> > #include <string.h>
> > 
> > int main(void)
> > {
> > 	/* Define the record structure */
> > 	typedef struct element_struct
> > 	{
> > 		int a;
> > 		float b;
> > 		char c[10];	/* Allocated in the struct */
> > 	} elem;
> > 
> > 	elem my_array[100];
> > 
> > 	my_array[2].a = 1;
> > 	my_array[55].b = 99.5f;
> > 	strcpy(my_array[33].c, "a string");	/* 8 chars + NUL fits in c[10] */
> > 
> > 	/* How do we now store "my_array" in a NeXus file? */
> > 	return 0;
> > }
> > 
> > The obvious way is to simply do what we do with the "Entry_" Vgroups and
> > just append the index of the array element, creating a set of similar but
> > differently named Vgroups my_array_1, my_array_2, etc.  But what if the
> > array is declared as a two-dimensional array (e.g. in C, my_array[25][4])?
> > Would we then write out my_array_23_2 or my_array_2_23?
> > 
> > Also, for a person reading this file, there is no rank information
> > available, and hence without searching for each one they will not even be
> > able to tell how many records of this type there are in the file to read.
> > Tricky if you are dynamically allocating memory.
> > 
> > Has anyone else come across this problem yet?  Is there a legal method of
> > storing an array of Vgroups in the underlying HDF which we should be
> > exposing via the next version of the API?
> > 
> 
>   Chris,
> 
>   I did indeed hit this problem. We have an instrument which will collect
>   a series of PSD frames when the detector electronics becomes available
>   (sigh...). At each frame a couple of values have to be stored. In C you
>   would use a struct.
> 
>   To my knowledge there is no way to have arrays of vGroups either in HDF4
>   or in HDF5. I'm not so sure about HDF5, though.
> 
>   The options available were to have F77-style arrays for each single value
>   to be stored, or to have a vGroup per frame.
> 
>   I chose the vGroup option for two reasons: each frame represents a single
>   measurement, and the structure of NeXus files requires some data to be
>   stored in different vGroups, i.e. PSD data in NXdata, NXinstrument,
>   NXdetector, sample data in NXsample etc. The vGroup option allowed me to
>   preserve this structure. I also feel that the addition of an artificial
>   dimension (frame number) is a bit dodgy, especially if we consider
>   automatic data analysis.
> 
>   I have a counter for the number of frames available as a data item in
>   the first vGroup.
> 
>   No, I did not solve the two-dimensional problem. But what is that really?
>   Do you have an example for that case?
> 
> 
>                                        Mark