Levels of NeXus compliance & More compression

Ray Osborn ROsborn at anl.gov
Wed Feb 16 14:30:23 GMT 2000


On 2000/02/16 4:12 AM, C.M.Moreton-Smith at rl.ac.uk wrote:

> I've done a bit more research into better compression for NeXus files and
> thought it would be good to pass on what I've discovered.
> 
> Chasing around the web I've come across two lossless compression codes which
> (on ISIS files at least) can give approximately 40-50% better compression
> than LZW (level 9).  We have a real problem currently with data files of
> 50-100 MB each, which can now be produced as rapidly as every 2 minutes! With
> this in mind, an improvement like this in compression of NeXus files could
> save us 400 GB of storage, which is quite a cost saving.
> 

I find your results very encouraging.  It suggests that the standard NeXus
compression works very well, giving you reduction factors of nearly 10.  It
is, of course, possible to do better if you compress the whole HDF file
because then you are compressing the HDF headers, address blocks, small
SDSs (scientific data sets), etc.  If that extra space saving matters more
than immediate access to the
data, then do both.  Use NeXus compression, and then compress the whole
file.
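The "do both" route can be compared against whole-file compression alone with a few lines of Python. This is a minimal sketch under stated assumptions: zlib's deflate stands in for the per-dataset NeXus/HDF compression, LZMA stands in for a stronger whole-file compressor (it is not one of the codes in the attached table), and the container format and count data are hypothetical toys rather than the HDF format or real GEM data.

```python
import lzma
import random
import struct
import zlib


def make_counts(n_bins: int, seed: int = 0) -> bytes:
    """Hypothetical integer count data (a stand-in for an array like CNT1):
    small non-negative counts packed as little-endian 32-bit ints."""
    rng = random.Random(seed)
    counts = [int(rng.expovariate(0.2)) for _ in range(n_bins)]
    return struct.pack(f"<{n_bins}i", *counts)


def pack(datasets: dict[str, bytes]) -> bytes:
    """Toy container of length-prefixed (name, payload) records.
    Not the HDF format, just enough structure to compare strategies."""
    out = bytearray()
    for name, payload in datasets.items():
        encoded = name.encode("utf-8")
        out += struct.pack("<H", len(encoded)) + encoded
        out += struct.pack("<I", len(payload)) + payload
    return bytes(out)


def unpack(blob: bytes) -> dict[str, bytes]:
    """Inverse of pack(): read the records back into a dict."""
    datasets, pos = {}, 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        datasets[name] = blob[pos:pos + size]
        pos += size
    return datasets


datasets = {
    "run_title": b"hypothetical run header " * 40,
    "counts": make_counts(200_000),
}

raw_file = pack(datasets)
# Stage 1: compress each dataset on its own (deflate, level 9).
nexus_file = pack({k: zlib.compress(v, 9) for k, v in datasets.items()})
# Stage 2: compress the whole container, headers included (LZMA preset 6, the default).
two_stage = lzma.compress(nexus_file, preset=6)
# Alternative: skip stage 1 and compress the uncompressed container directly.
whole_file = lzma.compress(raw_file, preset=6)

for label, blob in [("raw", raw_file), ("per-dataset", nexus_file),
                    ("two-stage", two_stage), ("whole-file", whole_file)]:
    print(f"{label:12s} {len(blob):>9d} bytes  ratio {len(raw_file) / len(blob):5.1f}")
```

Run on real files instead of the toy data, the same four numbers show how much the second stage still gains once the datasets are already deflated, and how that compares with compressing the uncompressed file directly.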

> Contrary to my earlier belief, the codes work best on raw (uncompressed)
> files and go well beyond the two-stage compression we are currently using
> for archiving existing ISIS (.RAW) files.
> 
> I've attached a table giving the compression ratios achieved with the
> different codes, firstly on a NeXus level 0 translation of a 54 MB GEM data
> file and secondly on a file containing just the integer count data alone
> (CNT1) from the same file.
> 
> Tantalizingly, the source for the best codes is not currently available,
> which makes them risky to rely on.  Surprisingly, the best widely available
> compressor is the Microsoft CAB file kit!
> 

I agree that it is risky using proprietary codes.  Someone recently
described this as giving ownership of your files to the company that
controls the data format.  In effect, they decide whether you can have
access to your own data.

> One thing that is clear is that with a good compression code, multi-stage
> compression is less satisfactory and that there is still a good case for
> writing uncompressed NeXus files first then compressing the whole file
> afterwards.
> 

I think the advantage of using NeXus compression before overall file
compression is that you don't take such a big disk space hit when you
decompress the file.  I'm sure you've had experience of users deciding to
restore a dozen files at once.  With the double compression scheme, this
will not be a problem; the user will still have efficiently compressed files
that they can access as simply as if decompressed.

One issue that is only hinted at in the attached file is the speed of each
of these schemes.  Can you say how long it took for NeXus to compress the
50 MB files?  I suspect that this will be the deciding factor for most people
in choosing compression strategies.
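The speed question can be answered empirically by timing each compressor on the same data. A minimal sketch, assuming Python's standard-library codecs (deflate, bzip2, LZMA) as stand-ins for the codes in the attached table, which are not reproduced here; the count data and the chosen compression levels are hypothetical.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def make_counts(n_bins: int, seed: int = 0) -> bytes:
    """Hypothetical count data: small ints packed as little-endian int32."""
    rng = random.Random(seed)
    return struct.pack(f"<{n_bins}i", *(int(rng.expovariate(0.2)) for _ in range(n_bins)))


# Each scheme maps bytes -> bytes; the levels are illustrative choices.
SCHEMES = {
    "deflate-9": lambda b: zlib.compress(b, 9),
    "bzip2-9": lambda b: bz2.compress(b, compresslevel=9),
    "lzma-6": lambda b: lzma.compress(b, preset=6),
}


def time_scheme(compress, data: bytes, repeats: int = 3) -> tuple[float, int]:
    """Return (best wall-clock seconds over several runs, compressed size)."""
    best, size = float("inf"), 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = compress(data)
        best = min(best, time.perf_counter() - start)
        size = len(out)
    return best, size


data = make_counts(250_000)  # about 1 MB of hypothetical counts
results = {name: time_scheme(fn, data) for name, fn in SCHEMES.items()}
for name, (seconds, size) in results.items():
    mb_per_s = len(data) / seconds / 1e6
    print(f"{name:10s} {seconds:7.3f} s  {mb_per_s:7.1f} MB/s  ratio {len(data) / size:5.1f}")
```

Taking the best of several runs reduces noise from other activity on the machine. Decompression could be timed the same way, since that cost is paid every time a user reads the data.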

Ray
-- 
Dr Ray Osborn                Tel: +1 (630) 252-9011
Materials Science Division   Fax: +1 (630) 252-7777
Argonne National Laboratory  E-mail: ROsborn at anl.gov
Argonne, IL 60439-4845