Improving NAPI storage efficiency

Ray Osborn ROsborn at anl.gov
Mon Aug 25 20:48:36 BST 1997


Thanks to Freddie for the suggested modifications to the web page, which I
think I have now made.  I wanted to draw attention to another important
issue that we haven't discussed for a while.  There are real concerns about
HDF performance, particularly at synchrotron sources, where data rates will
be much higher.  Therefore, we have to pay attention to
methods of improving I/O efficiency in the NeXus API.  There is a section
in the HDF user guide specifically discussing how to improve performance.
One of its recommendations covers reducing the size of fake dimension data.

If I understand correctly, in the old HDF, fake dimension scales were
automatically created, consisting of a Vdata with as many records as the
length of the dimension.  The new version is meant to store only one record
stating the size of the dimension, but when I dumped the contents of my
LRMECS NeXus file, it appears that we are still storing both (!) types of
fake dimensions.   I thought that the documentation stated that the second
was the default, but this does not appear to be the case.  In order to
eliminate the old dimension scales, we need to add a call to
SDsetdimval_comp to remove backward compatibility.  Are there any reasons
why we should not do this?  If not, could it please be inserted into NAPI.C?
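
To make the point concrete, here is a rough sketch in plain HDF C calls,
not the actual NAPI code, of where such a call would go when an SDS is
created; the data set name "counts" and dimension name "time_of_flight"
are just invented for illustration:

    #include "mfhdf.h"

    /* Sketch only: create an SDS and switch off backward-compatible
     * dimension scales, so that HDF stores a single record giving the
     * dimension size rather than an old-style Vdata with one record
     * per element. */
    void create_sds_without_old_dim_scales(const char *filename)
    {
        int32 dims[1] = {1000};

        int32 sd_id  = SDstart(filename, DFACC_CREATE);
        int32 sds_id = SDcreate(sd_id, "counts", DFNT_INT32, 1, dims);

        int32 dim_id = SDgetdimid(sds_id, 0);
        SDsetdimname(dim_id, "time_of_flight");
        SDsetdimval_comp(dim_id, SD_DIMVAL_BW_INCOMP);

        /* ... SDwritedata() etc. ... */

        SDendaccess(sds_id);
        SDend(sd_id);
    }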

The other issue is that when we are storing scalar values in SDSs, we will
also be storing these same dimension scales.  We discussed this at
SoftNeSS'96, and concluded that we still wanted to use SDSs rather than
Vdatas for these scalars because it would make the API simpler, and Vdatas
could not take attributes.   This last point is no longer true.  In the
latest versions of HDF, Vdatas can have attributes.
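
Purely as an illustration (this is a sketch, not proposed NAPI code, and
the Vdata name, field name, and attribute value are invented), a scalar
stored as a one-record Vdata with a units attribute would look something
like this using the VS interface:

    #include "hdf.h"
    #include <string.h>

    /* Sketch only: one float stored as a single-record Vdata, with a
     * "units" attribute attached to the Vdata as a whole (field index
     * _HDF_VDATA, i.e. -1, rather than a particular field). */
    void write_scalar_as_vdata(const char *filename, float32 value)
    {
        int32 file_id = Hopen(filename, DFACC_CREATE, 0);
        Vstart(file_id);

        int32 vdata_id = VSattach(file_id, -1, "w");  /* -1 = new Vdata */
        VSsetname(vdata_id, "monitor_sum");
        VSfdefine(vdata_id, "value", DFNT_FLOAT32, 1);
        VSsetfields(vdata_id, "value");
        VSwrite(vdata_id, (uint8 *)&value, 1, FULL_INTERLACE);

        VSsetattr(vdata_id, _HDF_VDATA, "units", DFNT_CHAR8,
                  (int32)strlen("counts"), "counts");

        VSdetach(vdata_id);
        Vend(file_id);
        Hclose(file_id);
    }

No dimension scales would be written at all in that case, which is what
prompts the question below.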

Now that you have some experience in coding the API, would it be worth
considering storing single-valued SDSs as Vdatas, or would this make the
API too convoluted?  I'm not proposing that we do this; I'm just asking if
it is simple enough to consider.  One reason for considering it is that HDF
files do get clogged up with a lot of extraneous information when we use
SDSs, and third-party browsers don't always display that extra information
very clearly.

Finally, I recall that the I/O efficiency could be improved by increasing
the size of the HDF header block.  Is this being done?  Does anyone know
how to do this?

Regards,
Ray

------------------------------------------------------
Dr Ray Osborn                  Tel: +1 (630) 252-9011
Materials Science Division     Fax: +1 (630) 252-7777
Argonne National Laboratory    E-mail: ROsborn at anl.gov
Argonne, IL 60439-4845

