[Nexus-developers] Handling of C-string null characters

Akeroyd, FA (Freddie) F.A.Akeroyd at rl.ac.uk
Wed Jul 6 14:03:51 BST 2005


NXmalloc() should allocate length+1 bytes (where length is what
NXgetinfo() returns) and then set the last of those bytes (index
"length", counting from zero) to NULL. When NXgetdata() is called, even
though it doesn't add a NULL byte itself, there will then be a NULL
present at the end of the string, as NXgetdata() will only write
"length" characters. If the user instead uses malloc() he needs to
remember to allocate "length+1" bytes and then add the NULL himself.

The other question is the stripping of spaces in strings. Currently the
API, unless you open the file with the new NXACC_NOSTRIP option, will
strip both leading and trailing spaces and also collapse/merge multiple
spaces between words to a single space e.g.

"  nexus       data    "    ->   "nexus data"

I think stripping leading + trailing spaces is probably reasonable, but
what about embedded spaces - is it reasonable to always reduce them to a
single space? Note that "space" here means anything recognised as a
space by the isspace() C function, so tabs and newline characters will
also get removed/turned into a single space. I think we need another
option to control the merging of spaces between words in addition to
stripping leading and trailing spaces - embedded spaces/tabs/newlines
may be important for formatting purposes if a text data/log file has
been included in a NeXus file. I would propose that the default be to
strip leading/trailing "spaces" but to preserve embedded "spaces".

Freddie

> -----Original Message-----
> From: nexus-developers-bounces at anl.gov [mailto:nexus-developers-
> bounces at anl.gov] On Behalf Of Ray Osborn
> Sent: 05 July 2005 18:20
> To: Nexus-Developers at anl.gov
> Subject: [Nexus-developers] Handling of C-string null characters
> 
> There is one urgent thing that we need to clear up before we release
> NAPI v3.0, and that concerns how we handle string lengths.  Following
> problems with the XML API, Mark has now changed NXgetinfo so that it
> returns the length of the string in the Fortran API but adds one to the
> length in the C API to accommodate the NULL character.  I think this is
> the wrong way to approach this problem, and I think Freddie agreed with
> me when he wrote to confirm what the API now does.  We need to resolve
> this quickly, so other opinions are welcome.
> 
> So I'm raising the old question - how long is a string?
> 
> Current Behaviour:
> 
> NXgetinfo and NXmalloc add the extra byte to the length of character
> strings when called in C, but it is removed in the Fortran API.  The
> length of "neutron" is 8 in C but 7 in Fortran (and presumably in other
> APIs such as Python).  NXgetdata will return "neutron\0" in C, but
> "neutron" in Fortran.
> 
> Proposal (my view, and I believe Freddie's):
> 
> The length of a character string returned by NXgetinfo should be the
> number of characters excluding the NULL character, and NXgetdata should
> return exactly those characters.  The documentation should warn the
> C-programmer to add one byte to the allocation, if they use malloc
> directly, and to add the NULL character to the string returned by
> NXgetdata to make a C-string.  NXmalloc will automatically add the
> extra byte when allocating memory.
> 
> This ensures that the length does not depend on the language used to
> read the NeXus file.  C-programmers are used to dealing with this issue
> and don't need to be spoon-fed.  The average non-programming user
> would, however, be confused as to why "neutron" is 8 characters long
> according to NXbrowse and most other generic file readers, but only
> seven according to the Fortran API.  This proposal will prevent such
> confusion in a well-documented way.
> 
> We may need to put this to a vote, but we should settle it before
> Friday if Nick's timetable is to be kept.
> 
> Regards,
> Ray
> --
> Dr Ray Osborn                Tel: +1 (630) 252-9011
> Materials Science Division   Fax: +1 (630) 252-7777
> Argonne National Laboratory  E-mail: ROsborn at anl.gov
> Argonne, IL 60439-4845
> 
> 
> 
> _______________________________________________
> NeXus-developers mailing list
> NeXus-developers at anl.gov
> http://www.neutron.anl.gov/mailman/listinfo/nexus-developers




