[Nexus-developers] RE: NeXus-API XML in alpha
Mark Koennecke
Mark.Koennecke at psi.ch
Wed Oct 6 09:05:24 BST 2004
Peter,
On Tue, 5 Oct 2004, Peterson, Peter F. wrote:
> Mark,
>
> Thank you for the hard work. Please see my comments below.
>
> The NeXus-API for XML passes the standard NeXus-API self test which at
> least promotes it to alpha. Contrary to my previous e-mail, I decided
> to base the XML part of the API on mini-XML, a tree API for XML. The
> mxml homepage:
> http://www.easysw.com/~mike/mxml
> I worked with the author of mxml, Michael Sweet, to extend mxml in such
> a way that we can store NeXus datasets in the tree efficiently. I
> very much hope that my changes will make it into the standard
> distribution of mxml, so that we do not have to maintain that
> library as well.
>
> At first I was curious why you didn't use something more widespread, like libxml. Then I discovered that this provides a library for reading and writing. I did not find this in the documentation: does it only parse using DOM (no SAX support)? The reason I ask is that utilities that use XML would be simplest if only one XML library is used, and libxml is installed with the system on Red Hat installations (v>9.0).
There is a good reason for this. I looked at libxml and other xml
libraries. I realized that all of them were good for storing text. What
we needed (lots of numbers) would have been stored as big lumps of text.
This is both difficult for access and hugely inefficient. The XML-NeXus
API is certainly not the right tool to use for very large data sets. But
I learned that users do not care about such things, and that's why I
wanted to handle modestly large datasets gracefully. This means the XML
library of choice would have to be hacked. libxml is very large: it
includes DOM and SAX parsers, HTTP client code, XML-RPC code, etc. I
wanted a smaller, more focused library which we may even
be able to maintain if the original author does not support us. The
choice was mxml.
>
> For links I took the liberty to store them as elements of type NXlink
> with an attribute target which is the path to the linked item. NXopengroup,
> NXopendata, etc follow these links. The implementation of the
> NeXus-API is next to complete, but two issues remain:
> - I did not implement unlimited dimensions. This would be complicated,
> and I am not convinced that we need it.
> - Currently, the XML-API rejects non-text array-type attributes and
> silently converts single-number attributes to text. Reading attributes
> returns only text attributes. If this is not acceptable, we need to
> store type information with the attribute. I would suggest a scheme
> like some_attribute="NX_INT32[10]: 1 2 3 4 5 6 7 8 9 10". What does
> everybody else think?
> The XML NeXus API can be enabled through the define: NXXML.
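
To illustrate the link scheme described above, here is a minimal Python sketch of following an NXlink element to its target. Only the element type NXlink and its target attribute come from the message; the surrounding document layout (a name attribute on every element, the sample group names) and the helper functions are assumptions made for the example, not the actual XML-API implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: every item carries a "name" attribute, and a link
# is an <NXlink> element whose "target" attribute is the path to the real item.
DOC = """
<NXroot>
  <NXentry name="entry1">
    <NXdata name="data1">
      <counts name="counts">1 2 3</counts>
    </NXdata>
    <NXlink name="counts_link" target="/entry1/data1/counts"/>
  </NXentry>
</NXroot>
"""


def find_by_path(root, path):
    """Walk a '/'-separated path of name attributes, starting at the root."""
    node = root
    for part in path.strip("/").split("/"):
        node = next((child for child in node if child.get("name") == part), None)
        if node is None:
            raise KeyError(path)
    return node


def open_item(root, path, max_hops=8):
    """Open an item, following NXlink elements until a real item is reached."""
    node = find_by_path(root, path)
    hops = 0
    while node.tag == "NXlink":
        if hops == max_hops:  # guard against links that point at each other
            raise RuntimeError("too many link hops: " + path)
        node = find_by_path(root, node.get("target"))
        hops += 1
    return node


root = ET.fromstring(DOC)
print(open_item(root, "/entry1/counts_link").text)
```

Opening the link and opening the target path directly return the same element, which is the behaviour one would expect from calls that follow links transparently.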
>
> Unlimited dimensions are not allowed by the API already. There is a limit of <25 dimensions. I do not know whether there is an error check for it; if not, there will be a buffer overflow.
>
Unlimited dimensions means that you can extend the dataset along the
first dimension. You give it a dimension of -1 and throw data at it
with NXputslab. This is supported by both HDF APIs.
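
The idea can be sketched in a few lines of Python. This is a toy model only: the class GrowableDataset and its put_slab method are hypothetical names chosen for the example and are not part of the NeXus API. It shows a 2-D dataset whose first dimension is declared as -1 and which grows as slabs are written past its current end.

```python
class GrowableDataset:
    """Toy 2-D dataset; a first dimension of -1 means 'unlimited'."""

    def __init__(self, dims):
        nrows, ncols = dims
        if ncols <= 0:
            raise ValueError("only the first dimension may be unlimited")
        self.unlimited = (nrows == -1)
        self.ncols = ncols
        # An unlimited dataset starts empty; a fixed one is zero-filled.
        self.rows = [] if self.unlimited else [[0] * ncols for _ in range(nrows)]

    def put_slab(self, rows, start):
        """Write complete rows beginning at row index `start`."""
        for row in rows:
            if len(row) != self.ncols:
                raise ValueError("row length does not match dataset width")
        end = start + len(rows)
        if end > len(self.rows):
            if not self.unlimited:
                raise IndexError("slab exceeds fixed first dimension")
            # Extend along the first dimension with zero-filled rows.
            self.rows.extend([0] * self.ncols for _ in range(end - len(self.rows)))
        for offset, row in enumerate(rows):
            self.rows[start + offset] = list(row)


ds = GrowableDataset([-1, 3])
ds.put_slab([[1, 2, 3]], start=0)
ds.put_slab([[4, 5, 6], [7, 8, 9]], start=1)
print(len(ds.rows))
```

A dataset created with a fixed first dimension rejects the same out-of-range write, which is the difference between the two cases.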
> For attribute types, NXtranslate tries to convert them to ints, then floats, and finally character arrays. There is no concept of int arrays as attributes. Your solution of including the type and length in the value is much more elegant. I would add one special case: if there is no type (the first three characters are not "NX_") then the value is a character array.
>
Actually, NeXus attributes were originally foreseen to support at least 1D
arrays. We may drop this, of course. I will see what others say; perhaps I
will change the attribute handling to my suggestion.
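
For discussion, here is a minimal Python sketch of how a reader might decode the proposed attribute encoding, including the special case that a value not starting with "NX_" is plain text. The function name parse_attribute, the returned (type, value) pair, and the choice to treat a malformed "NX_..." prefix as plain text are assumptions for the example; nothing here is the agreed format.

```python
import re

# "NX_INT32[10]: 1 2 3" or "NX_FLOAT32: 2.5"; the length part is optional.
_TYPED = re.compile(r"^(NX_[A-Z0-9]+)(?:\[(\d+)\])?:\s*(.*)$", re.S)
_INT_TYPES = {"NX_INT8", "NX_UINT8", "NX_INT16", "NX_UINT16", "NX_INT32", "NX_UINT32"}
_FLOAT_TYPES = {"NX_FLOAT32", "NX_FLOAT64"}


def parse_attribute(raw):
    """Return (type_name, value) for an attribute string in the proposed scheme."""
    match = _TYPED.match(raw) if raw.startswith("NX_") else None
    if match is None:
        # No type prefix: the value is a character array (assumed default).
        return ("NX_CHAR", raw)
    type_name, length, body = match.groups()
    if type_name == "NX_CHAR":
        return ("NX_CHAR", body)
    if type_name in _INT_TYPES:
        convert = int
    elif type_name in _FLOAT_TYPES:
        convert = float
    else:
        raise ValueError("unknown type: " + type_name)
    values = [convert(token) for token in body.split()]
    if length is None:
        if len(values) != 1:
            raise ValueError("scalar attribute needs exactly one value")
        return (type_name, values[0])
    if int(length) != len(values):
        raise ValueError("declared length does not match number of values")
    return (type_name, values)


print(parse_attribute("NX_INT32[10]: 1 2 3 4 5 6 7 8 9 10"))
print(parse_attribute("some plain text"))
```

Checking the declared length against the number of values catches truncated or hand-edited attributes early, at the cost of rejecting files a more lenient reader would accept.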
> Peter, long ago, reported an issue to me about performance problems
> when reading multi dimensional arrays from the Java-API. I looked
> into this problem and I am afraid this can only be fixed by
> introducing another performance issue, this time a memory problem:
> Currently, the Java-API already needs two copies of the data for
> conversions from NeXus to Java number formats. If we fix the
> performance problem we need three:
> - The byte array coming from the C-API
> - An intermediate one dimensional array
> - The final multi dimensional array.
> I am not yet ready to do this. I think the implementation in
> HDFArray.java which converts rows of multi dimensional data sets is
> the best we can do in the general case. Has anybody got a better idea?
>
> There need only be two copies: one for C and a multi-dimensional version for Java. Java reads in slabs, supplying subarrays of what you allocated to napi. A working example can be found in NXvalid, in the file gov.anl.util.NXutil, in the function starting at line 159.
I will look into this.
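
The slab-by-slab approach described in the quoted paragraph can be sketched as follows. This is an illustration in Python rather than Java, and it does not reproduce the NXutil code: the fake data source, the function names, and the little-endian int32 format are all assumptions. The point is that the final multi-dimensional array is allocated once and filled one row slab at a time, so no intermediate one-dimensional copy of the whole dataset is needed.

```python
import struct


def make_fake_source(values, ncols):
    """Stand-in for the C layer: returns the raw bytes of one row per call."""
    def get_row_slab(row):
        chunk = values[row * ncols:(row + 1) * ncols]
        return struct.pack("<%di" % ncols, *chunk)  # little-endian int32
    return get_row_slab


def read_2d(get_row_slab, nrows, ncols):
    """Fill a preallocated 2-D result by reading one row slab at a time."""
    result = [None] * nrows              # the final multi-dimensional array
    row_format = "<%di" % ncols
    for row in range(nrows):
        raw = get_row_slab(row)          # only one row of raw bytes is held
        result[row] = list(struct.unpack(row_format, raw))
    return result


source = make_fake_source(list(range(6)), ncols=3)
print(read_2d(source, nrows=2, ncols=3))
```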
>
> Before we release a new version I would like to have the C++ API written. A thread on the wiki exists for what functions should be included at <http://www.neutron.anl.gov.:8080/NeXus/84>
I put my comments about this on the Swiki page. We can discuss the C++
API and when to release a new version at the NIAC meeting.
Are you aware that Elena Pourmal from the HDF team is a
guest at the NIAC meeting? She offered to give a short
(15 min) presentation about where HDF is headed, and I think we should go
for it.
See you soon,
Mark Koennecke