NXDICT, compression, four circle, robustness

c.m.moreton-smith at rl.ac.uk
Thu Apr 2 11:54:25 BST 1998


Just a few comments,

My feeling is that Przemek's approach has the advantage of keeping the
data from a single "experiment" in one place, whereas Mark's seems
roughly equivalent to using multiple NXentries (linking to the same
sample, instrument and user where necessary to avoid redundancy).

Which approach to take must surely depend on where the scanned variable
(e.g. temp) is being stored.  If we have one temperature per NXentry
then the layout below might be most logical.

      entry1, NXentry
	      TRICS, NXinstrument 	/* linked */
		      ............
	      sample, NXsample		/* linked */
	      frame, NXdata
		temp, NXdata   		/* run at 10K */

      entry2, NXentry
	      TRICS, NXinstrument  	/* linked */
		      ............
	      sample, NXsample 		/* linked */
	      frame, NXdata
		temp, NXdata   		/* run at 20K */

If all the temp values are stored in a single array, i.e. temp = (10K
20K), then it seems we are getting closer to Przemek's representation.
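
For what it's worth, here is a rough C sketch of how I imagine the
linked two-NXentry layout above would be written through the NeXus API.
I'm going from memory of the API, so treat the calls, the file name and
the error-free happy path as illustrative rather than gospel:

#include "napi.h"

/* Sketch: two NXentry groups for the 10K and 20K runs, both pointing at
 * the same NXsample group so the sample information is stored once.
 * Error checking omitted; names follow the layout above. */
int main(void)
{
    NXhandle file;
    NXlink   sample_link;
    float    temp10 = 10.0f;
    int      one = 1;

    NXopen("trics_scan.hdf", NXACC_CREATE, &file);

    /* first entry: holds the "real" sample group */
    NXmakegroup(file, "entry1", "NXentry");
    NXopengroup(file, "entry1", "NXentry");

    NXmakegroup(file, "sample", "NXsample");
    NXopengroup(file, "sample", "NXsample");
    NXgetgroupID(file, &sample_link);     /* remember it for linking */
    NXclosegroup(file);

    NXmakegroup(file, "frame", "NXdata");
    NXopengroup(file, "frame", "NXdata");
    NXmakedata(file, "temp", NX_FLOAT32, 1, &one);
    NXopendata(file, "temp");
    NXputdata(file, &temp10);             /* run at 10K */
    NXclosedata(file);
    NXclosegroup(file);
    NXclosegroup(file);

    /* second entry: link to the sample instead of duplicating it */
    NXmakegroup(file, "entry2", "NXentry");
    NXopengroup(file, "entry2", "NXentry");
    NXmakelink(file, &sample_link);
    /* ... frame/temp for the 20K run, exactly as above ... */
    NXclosegroup(file);

    NXclose(&file);
    return 0;
}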

Following on from Przemek, emulating a Vdata means that you can get very
close to storing the physical n-tuples for each 'point' or 'measurement'
(one per row).

I interpret Przemek's schema (with some invented data!) as below

	phi	theta	omega	signal
	=======================================
	0.1	1.5	2.3	( 423 123 ... nhist )
	0.1	1.5	2.4	( 134 456 ... nhist )
	0.1	1.5	2.5	( 894 789 ... nhist )
	0.1	1.5	2.6	( 120 101 ... nhist )
	...

Does the dimension "nhist" relate to an implicit scanned variable, e.g.

	temp ( 10K 20K ... nhist ) ?

If so, is it also in NXdata, or lurking somewhere in the sample
parameters?
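
Just to make the n-tuple idea concrete, here is how I picture one row of
that table as a C record. NHIST, the field names and the trailing values
are all invented for the sake of the example; the repeating group (the
signal sub-array) is exactly the bit relational theory objects to below:

#include <stdio.h>

#define NHIST 3   /* invented: length of the per-point histogram / temp scan */

/* One "measurement" n-tuple: the setting angles plus the sub-array that
 * makes the table a repeating group. */
struct measurement {
    float phi;
    float theta;
    float omega;
    int   signal[NHIST];   /* one count per nhist bin */
};

int main(void)
{
    struct measurement rows[] = {
        { 0.1f, 1.5f, 2.3f, { 423, 123, 0 } },   /* invented data, as above */
        { 0.1f, 1.5f, 2.4f, { 134, 456, 0 } },
    };
    size_t i, j;

    for (i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%.1f %.1f %.1f :", rows[i].phi, rows[i].theta, rows[i].omega);
        for (j = 0; j < NHIST; j++)
            printf(" %d", rows[i].signal[j]);
        printf("\n");
    }
    return 0;
}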

Going back to Przemek's question about problems with other data,

I can't see any fundamental problems, even with ISIS data, unless nhist =
n, when it might be possible to read the 2-D array along the wrong axis!

Having said this, I think there is a lot of mileage in going further.
Relational database theory (which is designed purely to make the
automatic processing of data more effective) defines a set of fairly
simple tests called normal forms, which say that the better normalised
the data (the higher the normal form), the easier it is to manipulate
and store.

For example, the table I've drawn from Przemek's information contains
repeating groups (the sub-arrays) and would fail the criteria for 1st
normal form. By re-jigging slightly to store the same data in the form
below (I'm assuming temp was the missing variable for the sake of
argument), we end up with a table in at least 3rd normal form.

	phi	theta	omega	signal	temp
	=======================================
	0.1	1.5	2.3	423	10K
	0.1	1.5	2.3	123	20K
	...				... (nth temp)
	0.1	1.5	2.4	134	10K
	0.1	1.5	2.4	456	20K
	...				... (nth temp)
	0.1	1.5	2.5	894	10K
	0.1	1.5	2.5	789	20K
	...				... (nth temp)

i.e.

      entry1, NXentry
	      TRICS, NXinstrument 
		      ............
	      sample, NXsample

	      data, NXdata
		 phi    SDS (vector, n points)
		 theta  SDS (vector, n points)
		 omega  SDS (vector, n points)
		 signal SDS (vector, n points)
		 temp	  SDS (vector, n points)
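
As a rough illustration (again going from memory of the API, so don't
take the calls or the file name literally), writing that normalised
NXdata group through the NeXus API might look something like this:

#include "napi.h"

#define N 6   /* invented number of points (rows of the normalised table) */

/* Sketch: the 3rd-normal-form layout as five parallel 1-D SDSs of
 * length N inside one NXdata group.  Error checking omitted. */
static void put_vector(NXhandle file, const char *name, float *values, int n)
{
    NXmakedata(file, name, NX_FLOAT32, 1, &n);
    NXopendata(file, name);
    NXputdata(file, values);
    NXclosedata(file);
}

int main(void)
{
    NXhandle file;
    float phi[N]    = { 0.1f, 0.1f, 0.1f, 0.1f, 0.1f, 0.1f };
    float theta[N]  = { 1.5f, 1.5f, 1.5f, 1.5f, 1.5f, 1.5f };
    float omega[N]  = { 2.3f, 2.3f, 2.4f, 2.4f, 2.5f, 2.5f };
    float signal[N] = { 423,  123,  134,  456,  894,  789  };
    float temp[N]   = { 10,   20,   10,   20,   10,   20   };

    NXopen("trics_normalised.hdf", NXACC_CREATE, &file);
    NXmakegroup(file, "entry1", "NXentry");
    NXopengroup(file, "entry1", "NXentry");
    NXmakegroup(file, "data", "NXdata");
    NXopengroup(file, "data", "NXdata");

    put_vector(file, "phi",    phi,    N);
    put_vector(file, "theta",  theta,  N);
    put_vector(file, "omega",  omega,  N);
    put_vector(file, "signal", signal, N);
    put_vector(file, "temp",   temp,   N);

    NXclosegroup(file);   /* data   */
    NXclosegroup(file);   /* entry1 */
    NXclose(&file);
    return 0;
}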

Having reached this point, there are three interesting conclusions that
can be drawn.

CONCLUSIONS
===========

	1) Data in this form is ALWAYS representable in a single Vdata,
	   i.e. ONE Vdata instead of 24 * 5 Vdatas (already used by HDF
	   to make up the SDSs!!!) - see the sketch after these
	   conclusions.

	2) As we already treat data in the NXdata group as special and
	   we control the implementation via the API, would it not make
	   sense to make the NXdata group into a Vdata?

	3) Almost by definition, doing this forced me (as it did with
	   Przemek's layout) to include extra data which would surely be
	   needed as part of the analysis.
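
To make conclusion 1 concrete, here is a rough sketch of the same
normalised table packed into ONE Vdata using the raw HDF Vdata
interface (again from memory, with invented file and field names); this
is roughly what the API could hide from us if NXdata became a Vdata:

#include "hdf.h"

#define N_RECORDS 6   /* invented number of rows in the normalised table */

/* Sketch: the whole phi/theta/omega/signal/temp table as ONE Vdata with
 * five fields and N_RECORDS records, written fully interlaced (one
 * record after another).  Error checking omitted. */
int main(void)
{
    /* one record of the normalised table, matching the fields below */
    struct record { float32 phi, theta, omega, signal, temp; };

    struct record table[N_RECORDS] = {
        { 0.1f, 1.5f, 2.3f, 423.0f, 10.0f },
        { 0.1f, 1.5f, 2.3f, 123.0f, 20.0f },
        { 0.1f, 1.5f, 2.4f, 134.0f, 10.0f },
        { 0.1f, 1.5f, 2.4f, 456.0f, 20.0f },
        { 0.1f, 1.5f, 2.5f, 894.0f, 10.0f },
        { 0.1f, 1.5f, 2.5f, 789.0f, 20.0f },
    };
    int32 file_id, vdata_id;

    file_id = Hopen("trics_vdata.hdf", DFACC_CREATE, 0);
    Vstart(file_id);

    vdata_id = VSattach(file_id, -1, "w");   /* -1: create a new Vdata */
    VSsetname(vdata_id, "data");

    VSfdefine(vdata_id, "phi",    DFNT_FLOAT32, 1);
    VSfdefine(vdata_id, "theta",  DFNT_FLOAT32, 1);
    VSfdefine(vdata_id, "omega",  DFNT_FLOAT32, 1);
    VSfdefine(vdata_id, "signal", DFNT_FLOAT32, 1);
    VSfdefine(vdata_id, "temp",   DFNT_FLOAT32, 1);
    VSsetfields(vdata_id, "phi,theta,omega,signal,temp");

    VSwrite(vdata_id, (uint8 *)table, N_RECORDS, FULL_INTERLACE);

    VSdetach(vdata_id);
    Vend(file_id);
    Hclose(file_id);
    return 0;
}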


I'm hoping I haven't made this too difficult to follow but please shout
AARRGH! back at me if I have!

	Chris


