[Nexus-developers] NeXus 2D Strings
tieman
tieman at aps.anl.gov
Fri Dec 15 15:49:16 GMT 2006
Mark,
Freddie suggested:
>> I think the problem is due to an error in the way the API tries to strip
>> whitespace on strings - try opening the file with the flags
>> NXACC_READ|NXACC_NOSTRIP
This does, indeed, work for reading the HDF4 files. To get writes
working the way I'm used to, though, I had to hack NXmakedata in napi.c
to remove the check on multi-dimensional character arrays that was
blocking writes of 2D data.
For the most part, my 2D char arrays are in a sort of electronic log we
generate for each sample. The "experiment file", as we refer to it, is a
quasi-complete log of all experimental parameters (beamline settings,
detector settings, etc.) as well as a processing history of the data
(acquired data file names, acquired white/dark file names, processing
algorithms used, cluster machines used to process, etc.). The
experiment file contains all the data that would be redundant to put
into each data file itself.
The only place I use 2D char arrays is for lists of file names which, in
my case, are a fixed size for a given list. The file names are not
terminated nor are there embedded escape characters. On read, I know
how long a file name is and how many there are simply by looking at dims[].
I'd like to continue to be able to do this with HDF4 and HDF5 if possible.
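A minimal sketch of that layout, written in plain Python rather than through NAPI. The helper names and the file names are made up for illustration; the only thing taken from the description above is that rows are fixed-width, unterminated, and recovered purely from dims.

```python
def pack_names(names, width):
    """Pack names into one flat buffer of fixed-width, unterminated rows."""
    rows = []
    for name in names:
        raw = name.encode("ascii")
        if len(raw) != width:
            raise ValueError(f"{name!r} is not exactly {width} characters")
        rows.append(raw)
    dims = (len(rows), width)  # (number of names, characters per name)
    return b"".join(rows), dims


def unpack_names(buffer, dims):
    """Recover the names from the flat buffer using only dims."""
    count, width = dims
    if len(buffer) != count * width:
        raise ValueError("buffer size does not match dims")
    return [buffer[i * width:(i + 1) * width].decode("ascii")
            for i in range(count)]


# Hypothetical file names, all exactly 14 characters long.
names = ["white_0001.hdf", "white_0002.hdf", "dark__0001.hdf"]
buffer, dims = pack_names(names, width=14)
print(dims)
print(unpack_names(buffer, dims))
```

Because no terminator or escape character is stored, the reader needs nothing beyond dims to split the buffer back into names.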
I don't care much about XML, but I would almost argue for treating
strings in XML the same way HDF does; that is, a '\n' is a single
character. Sure, the XML will look funny in a text editor and one will
need to be careful about how those files are copied about, but I think
XML will handle it OK if you don't try to strip the unprintable characters.
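A quick way to check the '\n' part of that argument, using Python's standard xml.etree module as a stand-in (this is not the NeXus XML backend, and the element name is hypothetical):

```python
import xml.etree.ElementTree as ET

element = ET.Element("file_list")         # hypothetical element name
element.text = "first line\nsecond line"  # '\n' stored as a single character

serialized = ET.tostring(element)         # the newline is written literally
restored = ET.fromstring(serialized).text

print(restored == element.text)
```

This only covers '\n'. XML parsers normalize carriage returns to '\n' on read, and most other control characters are not legal in XML 1.0 at all, so "unprintable characters" in general would still need some care.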
And, as you mentioned, there is no need for supporting multi-dimensional
char arrays in the NeXus spec. Some of us just like NAPI as an API and
only loosely adhere to the NeXus spec, though...
...my $0.02 worth...
Brian
Mark Koennecke wrote:
> Hi,
>
> 2D string arrays should work in HDF-4. We never supported them in
> HDF-5 because the NeXus standard nowhere requires 2D strings and we
> were lazy. It is possible to support string arrays in HDF-5. As
> Freddie rightly mentioned, there is a problem writing 2D string arrays
> in XML. The obvious solution is to start a new line for each row in
> the array. However, this falls over when newlines are in the data.
> This can be solved by escaping newlines in the data. But this causes
> trouble for those who solved the current NeXus 2D string problem by
> formatting their string arrays as one long newline-separated string.
> This may be solved by escaping newlines only when the dimensionality
> is higher than 1.
>
> This raises the question of dimensionality: is 2D sufficient, or do we
> have to go for the most general case of up to 32-dimensional string
> arrays?
>
> Then there is the issue of ragged string arrays. Usually strings are
> of different length in a string array. Currently this is solved by
> padding arrays to the longest string in the set.
>
> This gets even more complicated if we start to think about Unicode...
>
> To sum up, before we can implement 2D string arrays we need to find
> some consensus on:
> - Padding strings to match arrays
> - Formatting string arrays in XML
> - Whether 2D is enough or whether we wish to support the more general
>   case, which is also more work
>
> Finally, I wish to point out that storing the strings in an array of
> NX_UINT8 might be a feasible workaround. It is just ugly to look at
> when printed by a program which does not know about this.
>
> Best Regards,
>
> Mark Koennecke
>
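The padding and NX_UINT8 ideas from Mark's message can be sketched in plain Python, outside NAPI. The function names and sample strings below are hypothetical, and padding with spaces is an assumption; the thread does not say which fill character is used.

```python
def pad_strings(strings, fill=b" "):
    """Pad every string to the length of the longest one in the set."""
    encoded = [s.encode("ascii") for s in strings]
    width = max(len(e) for e in encoded)
    rows = [e.ljust(width, fill) for e in encoded]
    return rows, (len(rows), width)


def as_uint8(rows):
    """Represent the padded rows as a 2D array of unsigned 8-bit values."""
    return [list(row) for row in rows]


def from_uint8(matrix, fill=b" "):
    """Turn the integer rows back into strings, dropping trailing padding.

    Stripping also removes trailing fill characters that were genuinely
    part of the data, which is the same hazard as the whitespace
    stripping discussed in this thread.
    """
    return [bytes(row).rstrip(fill).decode("ascii") for row in matrix]


rows, dims = pad_strings(["alpha", "be", "gamma_ray"])
matrix = as_uint8(rows)
print(dims)
print(matrix[1])
print(from_uint8(matrix))
```

Printing matrix[1] shows why the workaround is ugly to look at: a tool that does not know the values are characters only sees a row of small integers.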