[Nexus-developers] RE: NeXus-API XML in alpha

Peterson, Peter F. petersonpf at ornl.gov
Wed Oct 6 15:05:56 BST 2004


Mark,

I just realized that v2.0 was never released to the public. Only people
running out of CVS have it, and there are no links to it from the NeXus
web site. As far as I recall, we were still in a "testing" stage of the
API, waiting for people to report back that it worked fine on their
systems.

P^2

-----Original Message-----
From: Mark Koennecke [mailto:Mark.Koennecke at psi.ch] 
Sent: Wednesday, October 06, 2004 4:05 AM
To: Peterson, Peter F.
Cc: nexus-developers at anl.gov
Subject: RE: NeXus-API XML in alpha



Peter,

On Tue, 5 Oct 2004, Peterson, Peter F. wrote:

> Mark,
> 
> Thank you for the hard work. Please see my comments below.
> 
>   The NeXus-API for XML passes the standard NeXus-API self test, which
>   at least promotes it to alpha. Contrary to my previous e-mail, I
>   decided to base the XML part of the API on mini-XML, a tree API for
>   XML. The mxml homepage:
>             http://www.easysw.com/~mike/mxml
>   I worked with the author of mxml, Michael Sweet, to extend mxml in
>   such a way that we can store NeXus datasets in the tree efficiently.
>   I very much hope that my changes will make it into the standard
>   distribution of mxml, so that we do not have to maintain that
>   library as well.
> 
> At first I was curious why you didn't use something more widespread,
> like libxml. Then I discovered that this provides a library for
> reading and writing. I did not find this in the documentation: does
> it only parse using DOM (no SAX support)? The reason I ask is that
> utilities that use XML would be simplest if only one XML library were
> used, and libxml comes with the system on Red Hat installations
> (v>9.0).

  There is a good reason for this. I looked at libxml and other XML
  libraries. I realized that all of them were good for storing text.
  What we needed (lots of numbers) would have been stored as big lumps
  of text. This is both difficult to access and hugely inefficient. The
  XML-NeXus API is certainly not the right tool to use for very large
  data sets. But I learned that users do not care about such things, and
  that's why I wanted to handle modestly large datasets gracefully. This
  means the XML library of choice would have to be hacked. libxml is
  very large: it includes DOM and SAX parsers, HTTP client code, XML-RPC
  code, etc. I wanted a smaller, more focused library which we may
  even be able to maintain if the original author does not support us.
  The choice was mxml.
> 
>   For links I took the liberty to store them as elements of type
>   NXlink with an attribute target which is the path to the linked
>   item. NXopengroup, NXopendata, etc. follow these links. The
>   implementation of the NeXus-API is next to complete, but two issues
>   remain:
>   - I did not implement unlimited dimensions. This would be
>     complicated, and I am not convinced that we need it.
>   - Currently, the XML-API rejects non-text array-type attributes and
>     converts single-number attributes silently to text. Reading
>     attributes returns only text attributes. If this is not
>     acceptable, we need to store type information with the attribute.
>     I would suggest a scheme like
>     some_attribute="NX_INT32[10]: 1 2 3 4 5 6 7 8 9 10". What does
>     everybody else think?
>   The XML NeXus API can be enabled through the define: NXXML.
> 
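
[Editorial illustration] The NXlink idea described above (an element
whose target attribute holds the path to the linked item) can be sketched
roughly as follows. This is not the NeXus-API code. The XML layout (groups
tagged by class with a "name" attribute, datasets tagged by their own
name), the absolute-path syntax, and the helper names find_child, resolve
and follow are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout, assumed for illustration only.
DOC = """
<NXroot>
  <NXentry name="entry">
    <NXsample name="sample">
      <temperature>300.0</temperature>
    </NXsample>
    <NXdata name="data">
      <NXlink target="/entry/sample/temperature"/>
    </NXdata>
  </NXentry>
</NXroot>
"""


def find_child(parent, name):
    """Return the child group (by name attribute) or dataset (by tag)."""
    for child in parent:
        if child.get("name") == name or child.tag == name:
            return child
    return None


def resolve(root, path):
    """Walk an absolute, slash-separated path starting at the root."""
    node = root
    for part in path.strip("/").split("/"):
        node = find_child(node, part)
        if node is None:
            raise KeyError(path)
    return node


def follow(root, element):
    """If element is an NXlink, return the item its target points at."""
    while element.tag == "NXlink":
        element = resolve(root, element.get("target"))
    return element


root = ET.fromstring(DOC)
link = root.find("./NXentry/NXdata/NXlink")
item = follow(root, link)
print(item.tag, item.text)  # temperature 300.0
```

An open call that follows links would, under these assumptions, simply
pass whatever element it finds through follow() before using it.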
> Unlimited dimensions are already not allowed by the API. There is a
> limit of <25 dimensions. I do not know if there is an error check for
> it, but exceeding it will cause a buffer overflow.
> 
  Unlimited dimensions means that you can extend the dataset along the
  first dimension. You give it a dimension of -1 and throw data at it
  with NXputslab. This is supported by both HDF APIs.
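
[Editorial illustration] A toy model of the behaviour described here: a
first dimension declared as -1 grows as slabs are written. It does not
call the NeXus-API; the class GrowableDataset and its put_slab method are
hypothetical stand-ins, and restricting the data to two dimensions is an
assumption made to keep the sketch short.

```python
class GrowableDataset:
    """Toy 2D dataset whose first dimension may be -1 (unlimited)."""

    def __init__(self, dims):
        self.unlimited = dims[0] == -1
        self.ncols = dims[1]
        nrows = 0 if self.unlimited else dims[0]
        self.rows = [[0] * self.ncols for _ in range(nrows)]

    def put_slab(self, data, start, size):
        """Write size[0] rows of size[1] values beginning at start."""
        r0, c0 = start
        nr, nc = size
        if c0 + nc > self.ncols:
            raise IndexError("slab exceeds the fixed second dimension")
        if r0 + nr > len(self.rows):
            if not self.unlimited:
                raise IndexError("slab exceeds the fixed first dimension")
            # Grow along the first dimension until the slab fits.
            while len(self.rows) < r0 + nr:
                self.rows.append([0] * self.ncols)
        for i in range(nr):
            if len(data[i]) != nc:
                raise ValueError("row length does not match slab size")
            self.rows[r0 + i][c0:c0 + nc] = data[i]


ds = GrowableDataset((-1, 3))
for step in range(4):
    ds.put_slab([[step, step + 1, step + 2]], start=(step, 0), size=(1, 3))
print(len(ds.rows))  # 4
```

A dataset created with a fixed first dimension, e.g.
GrowableDataset((2, 3)), rejects the same writes once the slab falls
outside the declared size.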

> For attribute types, NXtranslate tries to convert them to ints, then 
> floats, and finally character arrays. There is no concept of int 
> arrays as attributes. Your solution of including the type and length 
> in the value is much more elegant. I would add one special case: if 
> there is no type (the first three characters are not "NX_") then the 
> value is a character array.
> 
  Actually, NeXus attributes were originally foreseen to support at
  least 1D arrays. We may drop this, of course. I will see what others
  say; perhaps I will change the attribute handling to my suggestion.
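
[Editorial illustration] The attribute scheme proposed in this thread
("NX_INT32[10]: 1 2 3 ..."), together with the special case that a value
not starting with "NX_" is a character array, could be decoded roughly as
below. The function names, the regular expression, and the choice to raise
an error on a malformed "NX_" value are assumptions of this sketch, not
part of the proposal.

```python
import re

# NeXus number type names; anything without an "NX_" prefix is text.
INT_TYPES = {"NX_INT8", "NX_UINT8", "NX_INT16", "NX_UINT16",
             "NX_INT32", "NX_UINT32"}
FLOAT_TYPES = {"NX_FLOAT32", "NX_FLOAT64"}

# Assumed grammar: TYPE[LENGTH]: whitespace-separated values
TYPED = re.compile(r"^(NX_[A-Z0-9]+)\[(\d+)\]:\s*(.*)$", re.DOTALL)


def decode_attribute(value):
    """Return (type_name, data) for an attribute string."""
    if not value.startswith("NX_"):
        return "NX_CHAR", value  # special case: plain text
    match = TYPED.match(value)
    if match is None:
        raise ValueError("malformed typed attribute: %r" % value)
    type_name, length, body = match.group(1), int(match.group(2)), match.group(3)
    if type_name in INT_TYPES:
        convert = int
    elif type_name in FLOAT_TYPES:
        convert = float
    else:
        raise ValueError("unsupported type: %s" % type_name)
    items = [convert(token) for token in body.split()]
    if len(items) != length:
        raise ValueError("expected %d values, got %d" % (length, len(items)))
    return type_name, items


def encode_attribute(type_name, items):
    """Build the attribute string for a list of numbers."""
    return "%s[%d]: %s" % (type_name, len(items), " ".join(str(x) for x in items))


print(decode_attribute("NX_INT32[10]: 1 2 3 4 5 6 7 8 9 10"))
print(decode_attribute("counts"))  # ('NX_CHAR', 'counts')
```

One open point the sketch makes visible: a genuine text attribute that
happens to start with "NX_" would be ambiguous under this scheme.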

>   Peter, long ago, reported an issue to me about performance problems
>   when reading multi-dimensional arrays from the Java-API. I looked
>   into this problem and I am afraid this can only be fixed by
>   introducing another performance issue, this time a memory problem:
>   Currently, the Java-API already needs two copies of the data for
>   conversions from NeXus to Java number formats. If we fix the
>   performance problem we need three:
>   - The byte array coming from the C-API
>   - An intermediate one-dimensional array
>   - The final multi-dimensional array
>   I am not yet ready to do this. I think the implementation in
>   HDFArray.java, which converts rows of multi-dimensional data sets,
>   is the best we can do in the general case. Has anybody got a better
>   idea?
> 
> There only need to be two copies: one for C and a multi-dimensional
> version for Java. Java reads in slabs, supplying subarrays from what
> you allocated to napi. A working example is found in NXvalid, in the
> file gov.anl.util.NXutil, the function starting at line 159.
  
  I will look into this.
>   
> Before we release a new version I would like to have the C++ API 
> written. A thread on the wiki exists for what functions should be 
> included at <http://www.neutron.anl.gov.:8080/NeXus/84>

  I put my comments about this onto the Swiki page. We can discuss the
  C++ API and when to release a new version at the NIAC meeting.

  Are you aware that Elena Pourmal from the HDF team is a guest at the
  NIAC meeting? She offered to give a short (15 min) presentation about
  where HDF is headed and I think we should go for it.
 
                    See you soon,
 
 		       Mark Koennecke
 





