NXDICT, compression, four circle, robustness
c.m.moreton-smith at rl.ac.uk
Thu Apr 2 11:54:25 BST 1998
Just a few comments,
My feeling is that Przemek's approach has the advantage of keeping the
data from a single "experiment" in one place, whereas Mark's seems
roughly equivalent to using multiple NXentries (linking to the same sample,
instrument and user if necessary to avoid redundancy).
Which approach to take must surely depend on where the scanned variable
(e.g. temp) is being stored. If we have one temperature per NXentry
then the layout below might be most logical.
entry1, NXentry
    TRICS, NXinstrument      /* linked */
    ............
    sample, NXsample         /* linked */
    frame, NXdata
    temp, NXdata             /* run at 10K */
entry2, NXentry
    TRICS, NXinstrument      /* linked */
    ............
    sample, NXsample         /* linked */
    frame, NXdata
    temp, NXdata             /* run at 20K */
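For what it's worth, here is a minimal sketch of how that two-entry layout
might be written with the NAPI. The file name and the "temperature" field
name are made up, I've left out the TRICS/sample groups and the links to
keep it short, and I'm assuming the usual NXopen/NXmakegroup/NXmakedata
call signatures:

    #include "napi.h"

    int main(void)
    {
        /* one run per NXentry; names and values are only illustrative */
        const char *entries[2] = { "entry1", "entry2" };
        float       temps[2]   = { 10.0f, 20.0f };
        int         one = 1, i;
        NXhandle    file;

        NXopen("trics.nxs", NXACC_CREATE, &file);

        for (i = 0; i < 2; i++) {
            NXmakegroup(file, entries[i], "NXentry");
            NXopengroup(file, entries[i], "NXentry");

            /* TRICS/NXinstrument, sample/NXsample and the links omitted */

            NXmakegroup(file, "temp", "NXdata");
            NXopengroup(file, "temp", "NXdata");
            NXmakedata(file, "temperature", NX_FLOAT32, 1, &one);
            NXopendata(file, "temperature");
            NXputdata(file, &temps[i]);
            NXclosedata(file);
            NXclosegroup(file);      /* temp */

            NXclosegroup(file);      /* entry */
        }

        NXclose(&file);
        return 0;
    }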
If all the temp values are stored in a single array, i.e. temp = (10K
20K), it seems that we are getting closer to Przemek's representation.
Following on from Przemek, emulating a Vdata means that you can get very
close to storing the physical n-tuples for each 'point' or 'measurement'
(one per row).
I interpret Przemek's schema (with some invented data!) as below:

phi    theta  omega  signal
=======================================
0.1    1.5    2.3    ( 423 123 ... nhist )
0.1    1.5    2.4    ( 134 456 ... nhist )
0.1    1.5    2.5    ( 894 789 ... nhist )
0.1    1.5    2.6    ( 120 101 ... nhist )
...
Does the dimension "nhist" relate to an implicit scanned variable, e.g.
temp ( 10K 20K ... nhist )? If so, is it also in NXdata or lurking
somewhere in the sample parameters?
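To make my reading concrete, here is a rough NAPI sketch of that n-tuple
layout. The point count, histogram length and the numbers are all invented,
and I'm again assuming the usual call signatures:

    #include "napi.h"

    #define NPT   4    /* number of (phi,theta,omega) settings   */
    #define NHIST 2    /* length of each histogram / sub-array   */

    int main(void)
    {
        NXhandle file;
        float phi[NPT]   = { 0.1f, 0.1f, 0.1f, 0.1f };
        float theta[NPT] = { 1.5f, 1.5f, 1.5f, 1.5f };
        float omega[NPT] = { 2.3f, 2.4f, 2.5f, 2.6f };
        int   signal[NPT][NHIST] = { {423,123}, {134,456}, {894,789}, {120,101} };
        int   npt = NPT, dims2[2] = { NPT, NHIST };

        NXopen("przemek.nxs", NXACC_CREATE, &file);
        NXmakegroup(file, "entry1", "NXentry");
        NXopengroup(file, "entry1", "NXentry");
        NXmakegroup(file, "data", "NXdata");
        NXopengroup(file, "data", "NXdata");

        /* one value per measured point */
        NXmakedata(file, "phi",   NX_FLOAT32, 1, &npt);
        NXopendata(file, "phi");   NXputdata(file, phi);   NXclosedata(file);
        NXmakedata(file, "theta", NX_FLOAT32, 1, &npt);
        NXopendata(file, "theta"); NXputdata(file, theta); NXclosedata(file);
        NXmakedata(file, "omega", NX_FLOAT32, 1, &npt);
        NXopendata(file, "omega"); NXputdata(file, omega); NXclosedata(file);

        /* the repeating group: one sub-array (nhist long) per point */
        NXmakedata(file, "signal", NX_INT32, 2, dims2);
        NXopendata(file, "signal"); NXputdata(file, signal); NXclosedata(file);

        NXclosegroup(file);
        NXclosegroup(file);
        NXclose(&file);
        return 0;
    }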
Going back to Przemek's question about problems with other data,
I can't see any fundamental problems, even with ISIS data! The one catch
is if nhist = n, when it might be possible to read the 2-D array along
the wrong axis!
Having said this, I think there is a lot of mileage in going further.
Relational database theory (which is designed purely to make the
automatic processing of data more effective) defines a set of fairly
simple tests called normal forms: the better normalised the data (the
higher the normal form), the easier it is to manipulate and store.
For example, the table I've drawn from Przemek's information does
contain repeating groups (the sub-arrays) and would fail the criteria
for 1st normal form. By re-jigging slightly to store the same data in
the form below (I'm assuming temp was the missing variable for the sake
of argument), we end up with a table in at least 3rd normal form:
phi    theta  omega  signal  temp
=======================================
0.1    1.5    2.3    423     10K
0.1    1.5    2.3    123     20K
...                  ...     (nth temp)
0.1    1.5    2.4    134     10K
0.1    1.5    2.4    456     20K
...                  ...     (nth temp)
0.1    1.5    2.5    894     10K
0.1    1.5    2.5    789     20K
...                  ...     (nth temp)
(Each sub-array from the previous table expands into nhist rows, one per
temperature, with phi, theta and omega repeated on every row.)
i.e.
entry1, NXentry
    TRICS, NXinstrument
    ............
    sample, NXsample
    data, NXdata
        phi     SDS (vector, n points)
        theta   SDS (vector, n points)
        omega   SDS (vector, n points)
        signal  SDS (vector, n points)
        temp    SDS (vector, n points)
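As a sketch (again with invented numbers, and assuming the usual NAPI
calls), writing the normalised table really is just five flat vectors of
the same length:

    #include "napi.h"

    #define NROWS 6    /* n points * nhist temperatures, one row each */

    int main(void)
    {
        NXhandle file;
        int   n = NROWS;
        float phi[NROWS]    = { 0.1f, 0.1f, 0.1f, 0.1f, 0.1f, 0.1f };
        float theta[NROWS]  = { 1.5f, 1.5f, 1.5f, 1.5f, 1.5f, 1.5f };
        float omega[NROWS]  = { 2.3f, 2.3f, 2.4f, 2.4f, 2.5f, 2.5f };
        int   signal[NROWS] = { 423,  123,  134,  456,  894,  789  };
        float temp[NROWS]   = { 10.0f,20.0f,10.0f,20.0f,10.0f,20.0f };

        NXopen("trics_normalised.nxs", NXACC_CREATE, &file);
        NXmakegroup(file, "entry1", "NXentry");
        NXopengroup(file, "entry1", "NXentry");
        NXmakegroup(file, "data", "NXdata");
        NXopengroup(file, "data", "NXdata");

        /* every column is a plain vector: no repeating groups left */
        NXmakedata(file, "phi",    NX_FLOAT32, 1, &n);
        NXopendata(file, "phi");    NXputdata(file, phi);    NXclosedata(file);
        NXmakedata(file, "theta",  NX_FLOAT32, 1, &n);
        NXopendata(file, "theta");  NXputdata(file, theta);  NXclosedata(file);
        NXmakedata(file, "omega",  NX_FLOAT32, 1, &n);
        NXopendata(file, "omega");  NXputdata(file, omega);  NXclosedata(file);
        NXmakedata(file, "signal", NX_INT32,   1, &n);
        NXopendata(file, "signal"); NXputdata(file, signal); NXclosedata(file);
        NXmakedata(file, "temp",   NX_FLOAT32, 1, &n);
        NXopendata(file, "temp");   NXputdata(file, temp);   NXclosedata(file);

        NXclosegroup(file);
        NXclosegroup(file);
        NXclose(&file);
        return 0;
    }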
Having reached this point, there are three interesting conclusions that
can be drawn.
CONCLUSIONS
===========
1) Data in this form is ALWAYS representable in a single Vdata,
i.e. ONE Vdata instead of 24 * 5 Vdatas (already used by HDF
to make up the SDSs!!!).
2) As we already treat data in the NXdata group as special, and we
control the implementation via the API, would it not make sense to
make the NXdata group into a Vdata? (There is a rough sketch of what
that could look like after these conclusions.)
3) Almost by definition, normalising the data (as I had to do with
Przemek's layout) forced me to include extra data which would surely be
needed as part of the analysis.
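To illustrate conclusion 2, here is a rough HDF Vdata sketch which stores
the whole normalised table as one Vdata with five fields. The file and
Vdata names are mine, the data are the invented numbers from above, and
I'm assuming the standard VS interface calls (Hopen, Vstart, VSattach,
VSfdefine, VSsetfields, VSwrite):

    #include "hdf.h"

    /* one record per row of the normalised table; every field is 4 bytes,
       so a plain struct with no padding can stand in for the interleaved
       record buffer in this sketch */
    typedef struct { float32 phi, theta, omega; int32 signal; float32 temp; } Rec;

    int main(void)
    {
        Rec   buf[2] = { { 0.1f, 1.5f, 2.3f, 423, 10.0f },
                         { 0.1f, 1.5f, 2.3f, 123, 20.0f } };
        int32 file_id, vdata_id;

        file_id = Hopen("trics_vdata.hdf", DFACC_CREATE, 0);
        Vstart(file_id);

        vdata_id = VSattach(file_id, -1, "w");
        VSsetname(vdata_id, "data");
        VSsetclass(vdata_id, "NXdata");

        /* one field per column of the normalised table */
        VSfdefine(vdata_id, "phi",    DFNT_FLOAT32, 1);
        VSfdefine(vdata_id, "theta",  DFNT_FLOAT32, 1);
        VSfdefine(vdata_id, "omega",  DFNT_FLOAT32, 1);
        VSfdefine(vdata_id, "signal", DFNT_INT32,   1);
        VSfdefine(vdata_id, "temp",   DFNT_FLOAT32, 1);
        VSsetfields(vdata_id, "phi,theta,omega,signal,temp");

        /* write all records of the single Vdata in one go, row by row */
        VSwrite(vdata_id, (uint8 *)buf, 2, FULL_INTERLACE);

        VSdetach(vdata_id);
        Vend(file_id);
        Hclose(file_id);
        return 0;
    }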
I'm hoping I haven't made this too difficult to follow but please shout
AARRGH! back at me if I have!
Chris