Levels of NeXus compliance & More compression

C.M.Moreton-Smith at rl.ac.uk
Wed Feb 16 10:12:18 GMT 2000


I've done a bit more research into better compression for NeXus files and
thought it would be good to pass on what I've discovered.

Chasing around the web I've come across two lossless compression codes which
(on ISIS files at least) can give approximately 40-50% better compression
than LZW (level 9).  We currently have a real problem with data files of
50-100MB each, which can now be produced as rapidly as every 2 minutes.  With
this in mind, an improvement like this in the compression of NeXus files
could save us 400GB of storage, which is quite a cost saving.

Contrary to my earlier belief, the codes work best on raw (uncompressed)
files and go well beyond the two-stage compression we are currently using
for archiving existing ISIS (.RAW) files.
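
As a rough illustration of the byte-difference idea behind the first stage
of the existing ISIS scheme (described in the quoted message below), here is
a minimal Python sketch.  It is not the FORTRAN code itself: the escape
marker, the little-endian 32-bit fallback and the function names are all
assumptions made for the example.

```python
import struct

# Hypothetical marker byte meaning "a full 32-bit value follows".
ESCAPE = -128


def delta_encode(counts):
    """Store each count as a one-byte difference from the previous count.

    Differences outside -127..127 are written as the ESCAPE byte followed
    by the absolute value as a 32-bit integer (values must fit in int32).
    """
    out = bytearray()
    previous = 0
    for value in counts:
        diff = value - previous
        if -127 <= diff <= 127:
            out += struct.pack("b", diff)      # small step: one signed byte
        else:
            out += struct.pack("b", ESCAPE)    # large step: marker byte ...
            out += struct.pack("<i", value)    # ... then the absolute value
        previous = value
    return bytes(out)


def delta_decode(data):
    """Invert delta_encode and return the original list of counts."""
    counts = []
    previous = 0
    pos = 0
    while pos < len(data):
        (diff,) = struct.unpack_from("b", data, pos)
        pos += 1
        if diff == ESCAPE:
            (previous,) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            previous += diff
        counts.append(previous)
    return counts


counts = [0, 3, 5, 4, 1000, 1002, 998]
packed = delta_encode(counts)
print(len(packed), "bytes instead of", 4 * len(counts))
```

With these seven sample values the packed form takes 11 bytes against 28
bytes as longwords; only the jump to 1000 needs the five-byte escape.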

I've attached a table giving the compression ratios achieved with the
different codes, first on a NeXus Level 0 translation of a 54MB GEM data
file and second on a file containing just the integer count data (CNT1)
from the same file.
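
For anyone wanting to run a similar comparison on their own data, a ratio
table can be produced along the following lines.  This is only a sketch: it
uses the compressors in the Python standard library (zlib, bz2, lzma), which
are not the codes in the attached table, and the random-walk data is a
made-up stand-in for real count data.  The ratio here is taken to be
compressed size divided by original size.

```python
import bz2
import lzma
import random
import struct
import zlib


def synthetic_counts(n, seed=0):
    """Hypothetical stand-in for detector counts: a slowly varying walk."""
    rng = random.Random(seed)
    value = 100
    counts = []
    for _ in range(n):
        value = max(0, value + rng.randint(-3, 3))
        counts.append(value)
    return counts


def compression_ratios(raw):
    """Return compressed size / original size for each compressor."""
    compressors = {
        "zlib (level 9)": lambda b: zlib.compress(b, 9),
        "bz2 (level 9)": lambda b: bz2.compress(b, 9),
        "lzma": lzma.compress,
    }
    return {name: len(fn(raw)) / len(raw) for name, fn in compressors.items()}


# Pack the counts as little-endian 32-bit integers, like a raw count block.
values = synthetic_counts(20000)
raw = struct.pack(f"<{len(values)}i", *values)

for name, ratio in compression_ratios(raw).items():
    print(f"{name:15s} {ratio:.3f}")
```

To test real data, replace `raw` with the bytes read from the file of
interest.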

Tantalizingly, the source for the best codes is not currently available,
which makes them risky to rely on.  Surprisingly, the best widely available
compressor is the Microsoft CAB file kit!

One thing that is clear is that, with a good compression code, multi-stage
compression is less satisfactory, and that there is still a good case for
writing uncompressed NeXus files first and then compressing the whole file
afterwards.
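
Compressing the whole file after it has been written needs very little code.
The sketch below streams a finished file through LZMA using the Python
standard library.  LZMA is just a readily available stand-in, not one of the
codes in the attached table, and the function names are hypothetical.

```python
import lzma
import shutil


def compress_whole_file(source_path, target_path):
    """Stream an already-written file through LZMA.

    The file is copied in chunks, so it never has to fit in memory.
    """
    with open(source_path, "rb") as source, \
            lzma.open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)


def decompress_whole_file(source_path, target_path):
    """Reverse compress_whole_file, again copying in chunks."""
    with lzma.open(source_path, "rb") as source, \
            open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
```

The writer of the NeXus file then needs no knowledge of the compressor, so
the compressor can be swapped later without touching the file-writing code.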

Chris

> -----Original Message-----
> From: Mark Koennecke [mailto:Mark.Koennecke at psi.ch]
> Sent: 27 January 2000 07:15
> To: NEXUS at anpns1.pns.anl.gov; C.M.Moreton-Smith at RL.AC.UK
> Subject: Re: Levels of NeXus compliance & More compression
> 
> 
> 
>   Sorry for replying late to this....
> 
> 
> On Fri, 21 Jan 2000 C.M.Moreton-Smith at rl.ac.uk wrote:
> 
> > Level 0 NeXus Files
> > -------------------
> > At ISIS, we can now automatically create what I'm calling a "Level 0" or
> > ".nx0" NeXus file from any ISIS raw file using an automatic conversion
> > program.  Level 0 specifies the minimum level of NeXus compliance:
> > simply that the file is written using only the NeXus API, nothing else, no
> > dictionary or structure.
> > 
> > Even at this level, NeXus is very valuable: it insulates us from the
> > complexities of HDF, it provides a unified set of code for reading and
> > writing and, since compression is part of the standard, it now allows us
> > to create smaller files just by re-writing them!
> > 
> > Level 1 NeXus Files
> > -------------------
> > These I think are what we are discussing currently as NeXus files;
> > informally, we aim to provide the normally expected NeXus groups,
> > appropriate attributes for axes etc. but we are fairly flexible about what
> > has to be there - and in fact, we can't really tell whether a data file is
> > "valid" or not.  Extra fields can be added and most dictionary
> > fields are optional.
> > 
> > Level 2 NeXus Files
> > -------------------
> > When we start describing specific file formats for, say, reflectometry, it
> > becomes more important to be sure that the file is a valid file for a
> > particular group of users.  At this point we could really do with being
> > able to define the sort of data in the data group, specific elements in
> > the instrument configuration which must be there and, importantly, be able
> > to validate the file automatically against a definition.  At the point of
> > a definition and some form of automatic validation I think we cross from a
> > Level 1 to a Level 2 file.
> 
>   I think this is a nice way of looking at it. In this scheme we are
>   currently creating Level 1 NeXus files. Level 2 is ill-defined because
>   our glossary is ill-defined and we do not have a scheme for defining and
>   validating an instrument description. For a fully general reading
>   routine for a specific instrument type, for example a powder
>   diffractometer, it would be nice if structures and names matched up. This
>   part of NeXus still needs development.
> 
>   Semi-consciously I was aware of this already after my first data file
>   definition, for the powder diffractometer. Partly this was the reason
>   to devise NXDICT, which allows me to change names and placements of data
>   items in the file by editing the template file.
>      
> > Compression ++
> > ==============
> > Currently a de-motivator to storing our data in NeXus is that the
> > compression is not as good as we can currently get with our native format
> > files.  We use two simple FORTRAN routines which compress/decompress our
> > integer signal data based on the assumption that the difference between
> > two adjacent data points can usually be stored as a relative offset in a
> > single byte rather than as a longword integer value.
> > 
>   I would follow Ray in this respect: I'm reluctant to give up
>   general HDF reading tools for this. And I think the case should be
>   proven more carefully. Tim Mooney made extensive tests with compression
>   and achieved surprisingly good results with the HDF compression schemes.
>   One would need to prove that the ISIS scheme + LZW or whatever is
>   really substantially better than the HDF schemes alone for a couple of
>   dozen data files from different instruments.
> 
>                    Regards,
> 
>                             Mark
>   
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Raw Compression values.htm
Type: application/octet-stream
Size: 17892 bytes
Desc: not available
Url : http://lists.nexusformat.org/pipermail/nexus/attachments/20000216/b6ee2828/attachment.obj 

