[Nexus-developers] Handling of C-string null characters

Nick Maliszewskyj nickm at nist.gov
Wed Jul 6 14:31:20 BST 2005

On the matter of the actual length of a stored string, I'll throw my
two cents in with Freddie and Ray: store only the non-null characters
of the string in the file, so that the length of the object looks the
same to all languages. C API users will assume the burden of
allocating enough storage and terminating strings with \0.

Akeroyd, FA (Freddie) wrote:

>NXmalloc() should allocate length+1 bytes (where length is what
>NXgetinfo() returns) and then set element "length+1" to NULL. When
>NXgetdata() is called, even though it doesn't add a NULL byte itself,
>there will then be a NULL present at the end of the string as
>NXgetdata() will only write "length" characters. If the user instead
>uses malloc() he needs to remember to allocate "length+1" bytes and then
>add the NULL himself.

Let's not mess with malloc(). If we're only writing length characters, then
we'll only transfer length characters from the string variables on a call to
the API. For the small benefit we'd get for C strings, we could run into
problems for non-string data. If we really want to make life easier for the
C programmers, we should provide a utility function for string handling;
otherwise, assume they know what they're doing.
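
A rough sketch of what such a string utility might look like is given
below. It is only an illustration under stated assumptions, not part of
the NeXus API: the name nx_dup_cstring is hypothetical, and the caller
is assumed to already hold the "length" stored characters (with no
terminating \0) in a buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a NeXus API call): copy `length` stored
 * characters, which carry no terminator, into a newly allocated buffer
 * of length + 1 bytes and append the terminating '\0'.
 * Returns NULL if allocation fails; the caller frees the result. */
char *nx_dup_cstring(const char *stored, size_t length)
{
    assert(stored != NULL || length == 0);

    char *s = malloc(length + 1);      /* room for the terminator */
    if (s == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(s, stored, length);     /* copy exactly `length` chars */
    }
    s[length] = '\0';                  /* valid indices are 0..length */
    return s;
}
```

Keeping the termination in a separate helper like this leaves the
allocation and transfer of non-string data untouched, which is the
concern raised above.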

>The other question is the stripping of spaces in strings. Currently the
>API, unless you open the file with the new NXACC_NOSTRIP option, will
>strip both leading and trailing spaces and also collapse/merge multiple
>spaces between words to a single space e.g.
>"  nexus       data    "    ->   "nexus data"
>I think stripping leading + trailing spaces is probably reasonable, but
>what about embedded spaces - is it reasonable to always reduce them to a
>single space? Note that "space" here means anything recognised as a
>space by the isspace() C function i.e. tabs and newline characters will
>also get removed/turned into a single space. I think we need another
>option to control the merging of spaces between words in addition to
>stripping leading and trailing spaces - embedded spaces/tabs/newlines
>may be important for formatting purposes if a text data/log file has
>been included in a NeXus file. I would propose that the default be to
>strip leading/trailing "spaces" but to preserve embedded "spaces".

Yes, strip leading and trailing whitespace but preserve whitespace in
the middle. I would presume that, for hand-generated XML, leading
and trailing whitespace could be an artifact of editing, while embedded
whitespace could be significant (e.g., tabular information in a note whose
formatting is effected entirely by whitespace).
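
The behaviour agreed on here (strip at both ends, leave the middle
alone) can be sketched in C as follows. This is a minimal illustration,
not the API's actual stripping code: the function name nx_trim is
hypothetical, and "whitespace" is taken to mean whatever isspace()
recognises, as in the quoted description.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove leading and trailing whitespace from a
 * null-terminated string in place. Embedded spaces, tabs and newlines
 * are preserved exactly as they are. Returns its argument. */
char *nx_trim(char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);
    size_t start = 0;
    size_t end = len;

    /* Skip leading whitespace. */
    while (start < len && isspace((unsigned char)s[start])) {
        start++;
    }
    /* Back up over trailing whitespace. */
    while (end > start && isspace((unsigned char)s[end - 1])) {
        end--;
    }

    /* Shift the kept characters to the front and terminate. */
    memmove(s, s + start, end - start);
    s[end - start] = '\0';
    return s;
}
```

With this approach a string made up entirely of whitespace becomes the
empty string, and runs of embedded whitespace are never merged.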


o Dr. Nicholas C. Maliszewskyj
o Center for Neutron Research
o National Institute of Standards & Technology
o 100 Bureau Drive, Stop 8562
o Gaithersburg MD 20899-8562
o nickm at nist.gov     Phone: (301)975-3171    Fax: (301)921-9847
