[Nexus-developers] NeXus-XML-API

Mark Koennecke Mark.Koennecke at psi.ch
Thu Aug 19 13:34:14 BST 2004


  Hi,

  as one of my other projects went better than expected, I may have time
  to do something about extending the NeXus API to support XML. Thus
  I started thinking about a path to implementing this. I looked at
  the XML libraries available for ANSI C. XML parsing is usually done in
  one of two ways: event based parsing (also called SAX), where you
  register callback functions which the parser triggers whenever it
  encounters an XML element, or DOM based systems, which operate on a tree
  in memory. As we wish to insert new groups and data at arbitrary places
  in the NeXus hierarchy, a tree based system seems to be the most
  advantageous. Unfortunately, all DOM based libraries I have seen
  so far store data in the tree as formatted text. As data in NeXus can
  become quite large, this does not seem to be a good idea. Data formatted
  as text also makes it even more difficult to implement NXgetslab and
  NXputslab.

  After these considerations I propose the following strategy for
  implementing XML support in the NeXus API:

  I would start by building my own tree structure which holds all data
  in memory as binary data. This structure would be formatted into an
  XML file on NXclose when writing.
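
  To make the idea concrete, here is a minimal sketch of such a tree in
  ANSI C. Nothing in it is existing NeXus code: the NXnode type, the
  nx_* function names, the restriction to double data and the XML layout
  (one element per group or dataset, values separated by blanks) are all
  assumptions made for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: a group owns children, a dataset owns binary data. */
typedef struct NXnode {
    char name[64];
    int is_group;
    int npoints;           /* number of values (datasets only) */
    double *data;          /* kept as binary; only doubles in this sketch */
    struct NXnode *child;  /* first child (groups only) */
    struct NXnode *next;   /* next sibling */
} NXnode;

NXnode *nx_new(const char *name, int is_group) {
    NXnode *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    strncpy(n->name, name, sizeof n->name - 1);
    n->is_group = is_group;
    return n;
}

/* Append child at the end of parent's child list. */
void nx_add(NXnode *parent, NXnode *child) {
    NXnode **p = &parent->child;
    while (*p != NULL) p = &(*p)->next;
    *p = child;
}

/* Copy count values into the node; returns 1 on success, 0 on failure. */
int nx_set_data(NXnode *n, const double *values, int count) {
    double *copy;
    if (count <= 0) return 0;
    copy = malloc((size_t)count * sizeof *copy);
    if (copy == NULL) return 0;
    memcpy(copy, values, (size_t)count * sizeof *copy);
    free(n->data);
    n->data = copy;
    n->npoints = count;
    return 1;
}

/* What NXclose could do: format the tree (n and its siblings) as XML. */
void nx_write(const NXnode *n, FILE *out, int depth) {
    int i;
    for (; n != NULL; n = n->next) {
        fprintf(out, "%*s<%s>", depth * 2, "", n->name);
        if (n->is_group) {
            fputc('\n', out);
            nx_write(n->child, out, depth + 1);
            fprintf(out, "%*s", depth * 2, "");
        } else {
            for (i = 0; i < n->npoints; i++)
                fprintf(out, i ? " %g" : "%g", n->data[i]);
        }
        fprintf(out, "</%s>\n", n->name);
    }
}
```

  A caller would build the tree with nx_new, nx_add and nx_set_data and
  call nx_write(root, file, 0) once at close time. Numbers are turned
  into text only at that point, so slab access can work on the binary
  array in memory.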

  For parsing I would want to use an event style parser which would
  build the memory tree on a call to NXopen of an existing file. For
  reasons of familiarity and library size I would want to use the
  expat parser for this. See http://expat.sourceforge.net for more
  details. I am using expat in another project of mine and it has not
  caused any problems. I also considered libxml2 from the Gnome project,
  but that library includes a SAX parser, a DOM system, code for reading
  HTML, for doing HTTP, FTP, XML-RPC etc., and I find it too heavyweight
  for our purpose.
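
  As a rough sketch of how an event style parser can build the memory
  tree, consider the three callbacks below. They have the same argument
  lists as expat's start element, end element and character data
  handlers (with XML_Char being char), but the Node and Builder types
  and the on_* names are hypothetical, and the fixed size text buffer
  is only there to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char name[64];
    char text[256];   /* raw character data, fixed size for brevity */
    struct Node *parent, *child, *next;
} Node;

typedef struct Builder {
    Node *root;       /* top of the tree built so far */
    Node *current;    /* element that is currently open */
    int failed;       /* set when an allocation failed */
} Builder;

/* Start of an element: create a node below the currently open one. */
void on_start(void *userData, const char *name, const char **atts) {
    Builder *b = userData;
    Node *n;
    (void)atts;       /* attributes are ignored in this sketch */
    if (b->failed) return;
    n = calloc(1, sizeof *n);
    if (n == NULL) { b->failed = 1; return; }
    strncpy(n->name, name, sizeof n->name - 1);
    n->parent = b->current;
    if (b->current == NULL) {
        b->root = n;
    } else {
        Node **p = &b->current->child;
        while (*p != NULL) p = &(*p)->next;
        *p = n;
    }
    b->current = n;
}

/* End of an element: step back up to its parent. */
void on_end(void *userData, const char *name) {
    Builder *b = userData;
    (void)name;
    if (b->failed || b->current == NULL) return;
    b->current = b->current->parent;
}

/* Character data: s is NOT NUL terminated and may arrive in pieces. */
void on_text(void *userData, const char *s, int len) {
    Builder *b = userData;
    size_t used, room;
    if (b->failed || b->current == NULL || len <= 0) return;
    used = strlen(b->current->text);
    room = sizeof b->current->text - 1 - used;
    if ((size_t)len > room) len = (int)room;   /* truncate, sketch only */
    memcpy(b->current->text + used, s, (size_t)len);
    b->current->text[used + (size_t)len] = '\0';
}
```

  With expat these would be registered through XML_SetUserData,
  XML_SetElementHandler and XML_SetCharacterDataHandler, and XML_Parse
  would then drive them while reading the file. A real implementation
  would convert the collected text into binary data when the element
  ends, instead of keeping it as a string.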

  I think this would work. The tree stuff I would need for this could 
  grow to another API for dealing with NeXus files in the long run. A
  disadvantage is that all data has to be in memory. This could become a
  problem with large data sets. But these are not the best candidates for
  XML anyway. Moreover, we can choose to put large datasets into temporary
  files on disk. Another drawback is that you have nothing if your program
  crashes before the XML-NeXus file is written. But HDF files which were
  not closed properly usually ended up corrupted and useless as well.
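
  The option of moving large datasets to temporary files could look
  roughly like the sketch below. The Blob type, the blob_* functions and
  the threshold of 1024 bytes are invented for this example; only
  tmpfile() and the other stdio calls are standard C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SPILL_THRESHOLD 1024   /* bytes; arbitrary value for the sketch */

typedef struct {
    size_t nbytes;
    void *mem;     /* set when the data is held in memory */
    FILE *spill;   /* set when the data lives in a temporary file */
} Blob;

/* Store a copy of data, in memory if small, on disk otherwise. */
int blob_store(Blob *b, const void *data, size_t nbytes) {
    memset(b, 0, sizeof *b);
    b->nbytes = nbytes;
    if (nbytes <= SPILL_THRESHOLD) {
        b->mem = malloc(nbytes ? nbytes : 1);
        if (b->mem == NULL) return 0;
        if (nbytes) memcpy(b->mem, data, nbytes);
        return 1;
    }
    b->spill = tmpfile();   /* removed automatically when closed */
    if (b->spill == NULL) return 0;
    if (fwrite(data, 1, nbytes, b->spill) != nbytes) {
        fclose(b->spill);
        b->spill = NULL;
        return 0;
    }
    return 1;
}

/* Copy the stored bytes into out, which must hold b->nbytes bytes. */
int blob_load(Blob *b, void *out) {
    if (b->mem != NULL) {
        if (b->nbytes) memcpy(out, b->mem, b->nbytes);
        return 1;
    }
    if (b->spill == NULL) return 0;
    rewind(b->spill);
    return fread(out, 1, b->nbytes, b->spill) == b->nbytes;
}

void blob_free(Blob *b) {
    free(b->mem);
    if (b->spill != NULL) fclose(b->spill);
    memset(b, 0, sizeof *b);
}
```

  Each dataset node in the tree would then hold such a blob instead of a
  plain pointer, and the code writing the XML file would not need to know
  where the bytes are kept.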

  Another issue is how to format numbers in the XML file. I suggest using
  reasonable defaults, but giving the user an opportunity to define the
  format. This would require a new API function:

        NXnumberformat(NXhandle, type, format-string)

  which does something for XML and becomes a no-op for HDF.
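
  One possible shape for this, purely as a sketch: the handle keeps one
  printf style format string per number type, initialised to defaults,
  and the new function only replaces an entry when the file is XML. The
  Handle struct, the T_* type codes, the "%d" and "%g" defaults and the
  function names are all assumptions; the real NXhandle and NeXus type
  constants would take their place.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes, standing in for the NeXus number types. */
enum { T_INT32, T_FLOAT32, T_FLOAT64, T_COUNT };

typedef struct {
    int is_xml;              /* 1 for an XML file, 0 for HDF */
    char fmt[T_COUNT][16];   /* one printf format per number type */
} Handle;

void handle_init(Handle *h, int is_xml) {
    h->is_xml = is_xml;
    strcpy(h->fmt[T_INT32], "%d");
    strcpy(h->fmt[T_FLOAT32], "%g");
    strcpy(h->fmt[T_FLOAT64], "%g");
}

/* Sketch of NXnumberformat: returns 1 on success, 0 on a bad argument.
   The format is NOT checked against the type, so the caller must pass
   a conversion that matches it. */
int number_format(Handle *h, int type, const char *format) {
    if (!h->is_xml) return 1;   /* no-op for HDF */
    if (type < 0 || type >= T_COUNT) return 0;
    if (strlen(format) >= sizeof h->fmt[0]) return 0;
    strcpy(h->fmt[type], format);
    return 1;
}

/* Used by the XML writer to format one double value. */
int format_float64(const Handle *h, double v, char *out, size_t n) {
    return snprintf(out, n, h->fmt[T_FLOAT64], v);
}
```

  A call such as number_format(&h, T_FLOAT64, "%12.4f") would then
  change how every following double is written, while the same call on
  an HDF handle leaves everything untouched.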

  Not a single line of code has been written yet, so please send comments
  and suggestions now!


			Best Regards,

				Mark Koennecke
  





