Levels of NeXus compliance & More compression

C.M.Moreton-Smith at rl.ac.uk
Fri Jan 21 15:14:17 GMT 2000


I'm just off on holiday for a couple of weeks but thought it might be worth
stimulating discussion on two topics which have been occupying me on and off
for the last couple of months.

Levels of NeXus compliance
==========================
In looking at moving existing native format data (in our case ISIS data
files) to NeXus, there appear to be several evolutionary stages.  We haven't
really addressed how we define a valid NeXus file but these stages have
suggested at least a broad brush way of looking at NeXus compliance.

Level 0 NeXus Files
-------------------
At ISIS, we can now create what I'm calling a "Level 0" or ".nx0" NeXus
file from any ISIS raw file using an automatic conversion program.  Level 0
specifies the minimum level of NeXus compliance: simply that the file is
written using only the NeXus API, with no dictionary or structure
required.

Even at this level, NeXus is very valuable.  It insulates us from the
complexities of HDF, it provides a unified set of code for reading and
writing and, since compression is part of the standard, it now allows us to
create smaller files just by re-writing them!

Level 1 NeXus Files
-------------------
These, I think, are what we are currently discussing as NeXus files.
Informally, we aim to provide the normally expected NeXus groups,
appropriate attributes for axes etc., but we are fairly flexible about what
has to be there and, in fact, we can't really tell a "valid" data file from
an invalid one.  Extra fields can be added and most dictionary fields are
optional.

Level 2 NeXus Files
-------------------
When we start describing specific file formats for, say, reflectometry, it
becomes more important to be sure that the file is a valid file for a
particular group of users.  At this point we could really do with being able
to define the sort of data in the data group and the specific elements in
the instrument configuration which must be there and, importantly, to
validate the file automatically against a definition.  Once we have a
definition and some form of automatic validation, I think we cross from a
Level 1 to a Level 2 file.
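
To make the idea of validating against a definition concrete, here is a
minimal sketch in Python.  It is only an illustration: the file is modelled
as nested dictionaries rather than read through the NeXus API, and the group
and field names in REQUIRED are hypothetical, not taken from any agreed
NeXus definition.

```python
# Hypothetical definition: the groups and fields a reflectometry file must
# contain.  A value of None marks a required field; a dict marks a required
# group with its own required contents.  All names are illustrative only.
REQUIRED = {
    "entry": {
        "data": {"counts": None, "time_of_flight": None},
        "instrument": {"name": None},
    }
}


def missing_items(tree, definition, path=""):
    """Return the paths required by `definition` that are absent from `tree`."""
    missing = []
    for name, sub in definition.items():
        here = f"{path}/{name}"
        if not isinstance(tree, dict) or name not in tree:
            missing.append(here)
        elif sub is not None:
            # Required group: check its required contents recursively.
            missing.extend(missing_items(tree[name], sub, here))
    return missing


def is_level2(tree, definition=REQUIRED):
    """A file passes if nothing required is missing; extra items are allowed."""
    return not missing_items(tree, definition)
```

Extra fields are deliberately ignored, so a file that passes this check
would still be an acceptable Level 1 file with additions.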


the second topic...

Compression ++
==============
Currently a de-motivator to storing our data in NeXus is that the
compression is not as good as we can currently get with our native format
files.  We use two simple FORTRAN routines which compress/decompress our
integer signal data based on the assumption that the difference between two
adjacent data points can usually be stored as a relative offset in a single
byte rather than as a longword integer value.
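
The byte-offset idea described above might be sketched as follows.  This is
not the ISIS FORTRAN code, which is not shown here.  In particular, the
escape convention (a marker byte of -128 followed by the full 4-byte value
when the difference does not fit in one byte) and the little-endian layout
are assumptions made for the illustration.

```python
import struct

ESCAPE = -128  # assumed marker byte: "a full 4-byte value follows"


def compress(values):
    """Pack 32-bit integers as one-byte differences where possible."""
    out = bytearray()
    previous = 0
    for value in values:
        delta = value - previous
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)           # 1 byte: relative offset
        else:
            out += struct.pack("<bi", ESCAPE, value)  # 5 bytes: marker + value
        previous = value
    return bytes(out)


def decompress(data):
    """Invert compress(), returning the original list of integers."""
    values = []
    previous = 0
    pos = 0
    while pos < len(data):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == ESCAPE:
            (previous,) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            previous += delta
        values.append(previous)
    return values
```

For a spectrum such as [1000, 1003, 1001, 998, 5000, 5002], this sketch
needs 14 bytes instead of the 24 bytes taken by six 4-byte integers, because
only the two large jumps need the escape form.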

The scheme is also very fast, and good further compression is still
possible afterwards with LZW.  The question is, could we add this scheme as
a compression option to the NeXus API?  Obviously, if we don't add it to the
API and we continue to use it, our files would not automatically be
browseable with a NeXus browser.

On the plus side, the scheme is very likely to work for most forms of
spectra-based data and would certainly be of general benefit to NeXus (i.e.
not just ISIS).  On the minus side, we would have to implement the
compression in the NeXus API and not in the generic HDF beneath.  This would
mean that such files could only be read through the NeXus API and NeXus
browsers, not with generic HDF tools.  Are we sure enough of the benefits of
NeXus to take a step like this and extend NeXus beyond the underlying HDF?

In our case we can get up to 30% higher compression and, with expected data
rates of 1-2GB a day from some instruments, this becomes very significant.
The traditional argument that disk space etc. is cheap is not very
convincing for data rates like this.


What do others think?

best regards


Chris


P.S.  Attached is a Level 0 conversion of an ISIS raw file.  It is complete,
and this is probably the first time that we've been able to export a raw file
in a format which anyone can read easily without having to use our own
software!

--
Chris Moreton-Smith, Software Development Manager
ISIS Science Instrumentation, CCLRC, Chilton, Didcot, OXON OX11 0QX
Telephone: +44 (0) 1235 446544, Fax: +44 (0) 1235 445720
Email: C.Moreton-Smith at rl.ac.uk


-------------- next part --------------
A non-text attachment was scrubbed...
Name: irs12836.nx0
Type: application/octet-stream
Size: 143296 bytes
Desc: not available
Url : http://lists.nexusformat.org/pipermail/nexus/attachments/20000121/39cc3720/attachment.obj 

