[Nexus-developers] NeXus 2D Strings

tieman tieman at aps.anl.gov
Fri Dec 15 15:49:16 GMT 2006


Mark,

Freddie suggested:

>> I think the problem is due to an error in the way the API tries to strip
>> whitespace on strings - try opening the file with the flags
>> NXACC_READ|NXACC_NOSTRIP

This does indeed work for reading the HDF4 files.  To get writes working 
the way I'm used to, though, I had to hack NXmakedata in napi.c to remove 
the check on multi-dimensional character arrays, which was blocking 
writes of 2D data.

For the most part, my 2D char arrays are in a sort of electronic log we 
generate for each sample.  The "experiment file", as we refer to it, is a 
quasi-complete log of all experimental parameters (beamline settings, 
detector settings, etc.) as well as a processing history of the data 
(acquired data file names, acquired white/dark file names, processing 
algorithms used, cluster machines used to process, etc.).  The 
experiment file contains all the data that would be redundant to put 
into each data file itself.

The only place I use 2D char arrays is for lists of file names which, in 
my case, are a fixed size for a given list.  The file names are not 
terminated nor are there embedded escape characters.  On read, I know 
how long a file name is and how many there are simply by looking at dims[].
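
For illustration only, the layout described above can be sketched in 
Python.  This is not NAPI code: pack_names, unpack_names, and the file 
names are made up for the example, and the flat buffer simply stands in 
for what would be stored as a 2D char array.

```python
def pack_names(names):
    """Pack equal-length names into one flat buffer plus dims [count, width]."""
    if not names:
        return b"", [0, 0]
    width = len(names[0])
    if any(len(n) != width for n in names):
        raise ValueError("all names in one list must have the same length")
    # No terminators and no separators: the names are simply concatenated.
    flat = b"".join(n.encode("ascii") for n in names)
    return flat, [len(names), width]


def unpack_names(flat, dims):
    """Slice the flat buffer back into names using nothing but dims."""
    count, width = dims
    if len(flat) != count * width:
        raise ValueError("buffer size does not match dims")
    return [flat[i * width:(i + 1) * width].decode("ascii") for i in range(count)]


# Hypothetical file names, all the same length.
names = ["scan_0001.hdf", "scan_0002.hdf", "scan_0003.hdf"]
flat, dims = pack_names(names)
restored = unpack_names(flat, dims)
```

Here dims plays the role of dims[] on read: dims[0] is the number of 
names and dims[1] is the length of each one.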

I'd like to continue to be able to do this with HDF4 and HDF5 if possible.

I don't care much about XML, but I would almost argue for treating strings 
in XML the same way HDF does--that is, a '\n' is a single character.  Sure, 
the XML will look funny in a text editor and one will need to be careful 
about how those files are copied about, but I think XML will handle it OK 
if you don't try to strip the unprintable characters.  And, as you 
mentioned, there is no need to support multi-dimensional char arrays in 
the NeXus spec.  Some of us just like NAPI as an API and only loosely 
adhere to the NeXus spec, though...
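
As a quick check of the claim that XML copes with an embedded newline, 
here is a small sketch using Python's standard xml.etree.ElementTree 
rather than the NeXus XML backend, so it only says something about XML 
in general.  The element name and log text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical log text containing an embedded newline.
log = "line one\nline two"

elem = ET.Element("experiment_log")
elem.text = log

# Serialise, then parse the result back in.
xml_bytes = ET.tostring(elem)
round_tripped = ET.fromstring(xml_bytes).text
```

The newline is written as a literal character in the element text and 
comes back unchanged.  Carriage returns are a different matter: XML 
parsers normalise "\r\n" and a lone "\r" to "\n", which is one reason to 
be careful about how such files are copied between systems.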

...my $0.02 worth...

Brian

Mark Koennecke wrote:
> Hi,
>
> 2D string arrays should work in HDF-4. We never supported them in
> HDF-5 because the NeXus standard nowhere requires 2D strings and we
> were lazy. It is possible to support string arrays in HDF-5. As
> Freddie rightly mentioned, there is a problem writing 2D string arrays
> in XML. The obvious solution is to make a new line for each run in the
> array. However, this falls over when newlines are in the data. This
> can be solved by escaping newlines in the data. But this causes
> trouble for those who solved the current NeXus 2D string problem by
> formatting their string arrays as a newline-separated long string.
> This may be solved by escaping newlines only when the dimensionality
> is higher than 1.
>
> This raises the question of dimensionality: is 2D sufficient, or do we
> have to go for the most general case of up to 32-dimensional string
> arrays?
>
> Then there is the issue of ragged string arrays. Usually strings are
> of different length in a string array. Currently this is solved by
> padding arrays to the longest string in the set.
>
> This gets even more complicated if we start to think about Unicode...
>
> Summing it up, before we can implement 2D string arrays we need to
> find some consensus on:
> - Padding strings to match arrays
> - Formatting string arrays in XML
> - Whether 2D is enough or whether we wish to support the more general
> case, which is also more work.
>
> Finally, I wish to point out that storing the strings in an array of
> NX_UINT8 might be a feasible workaround. This is just ugly to look at
> when printed by a program which does not know about this.
>
>                     Best Regards,
>
>                                   Mark Koennecke
>
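
To make the escaping idea in Mark's message concrete, here is a rough 
Python sketch of one possible scheme, not anything NAPI implements: a 
plain string (rank 1) is written unchanged, while a list of strings 
(rank 2) is written one row per line with newlines and backslashes 
escaped.  All function names are hypothetical.

```python
def escape_row(row):
    # Escape the backslash first so the encoding stays reversible.
    return row.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_row(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # "\n" (backslash, n) becomes a newline; "\\" becomes one backslash.
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode(value):
    """Rank 1 (a str) is left alone; rank 2 (a list of str) gets one escaped row per line."""
    if isinstance(value, str):
        return value
    return "\n".join(escape_row(row) for row in value)


def decode(text, rank):
    """Inverse of encode. A real implementation would also take the row
    count from the stored dimensions instead of inferring it from the text."""
    if rank == 1:
        return text
    return [unescape_row(line) for line in text.split("\n")]


rows = ["first\nsecond", "plain", "back\\slash"]
encoded = encode(rows)
decoded = decode(encoded, 2)
```

Because rank 1 strings pass through untouched, existing files that pack a 
list into one newline-separated long string would keep reading as before.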


