[Nexus-developers] NeXus-XML-API
Mark Koennecke
Mark.Koennecke at psi.ch
Thu Aug 19 13:34:14 BST 2004
Hi,
as one of my other projects went better than expected, I may have time
to do something about the extension of the NeXus API to support XML. Thus
I started thinking about a path to implementing this. I looked at
the XML libraries available for ANSI-C. XML parsing is usually done in
one of two ways: event-based parsing (also called SAX), where you
specify callback functions which the parser triggers whenever it
encounters an XML element or character data, or DOM-based systems,
which operate on a tree in memory. As we wish to insert new groups and
data at random into the NeXus hierarchy, a tree-based system seems to
be the most advantageous. Unfortunately, all DOM-based libraries I have
seen
so far store data in the tree as formatted text. As data in NeXus can
become quite large, this does not seem to be a good idea. Data formatted
as text also makes it even more difficult to implement NXgetslab and
NXputslab.
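To illustrate why binary storage helps with slab access, here is a minimal sketch. It assumes a 2-D dataset held as a row-major binary array; get_slab_2d is a hypothetical helper, not part of the NeXus API, and its start/size arguments only mimic those of NXgetslab. With binary data, a slab is a handful of memcpy calls; with formatted text, every number would have to be re-parsed first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the NeXus API: copy a 2-D slab out of
 * a row-major binary array. start[] and size[] mimic the NXgetslab
 * arguments; elem_size is the size of one element in bytes. */
static void get_slab_2d(const void *data, const size_t dims[2],
                        size_t elem_size, const size_t start[2],
                        const size_t size[2], void *out)
{
    const unsigned char *src = data;
    unsigned char *dst = out;
    size_t row;

    /* The requested slab must lie inside the dataset. */
    assert(start[0] + size[0] <= dims[0]);
    assert(start[1] + size[1] <= dims[1]);

    for (row = 0; row < size[0]; row++) {
        /* Byte offset of the first requested element in this row. */
        size_t offset = ((start[0] + row) * dims[1] + start[1]) * elem_size;
        memcpy(dst + row * size[1] * elem_size, src + offset,
               size[1] * elem_size);
    }
}
```

Calling it with start = {1, 1} and size = {2, 2} on a 3x4 array copies the 2x2 block beginning at row 1, column 1 into out.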
After these considerations I propose the following strategy for
adding XML support to the NeXus API:
I would start by building our own tree structure which holds all data
in memory as binary data. When a file is open for writing, this
structure would be formatted into an XML file on NXclose.
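A rough sketch of what such a tree could look like follows. All names (Node, node_new, node_add, node_set_data, node_write_xml) are hypothetical, and the element layout is invented for illustration rather than being a proposed NeXus XML format. The point is only that values stay binary in memory and become text at the moment of writing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; none are part of the NeXus API. */
typedef enum { NODE_GROUP, NODE_DATA } NodeKind;

typedef struct Node {
    NodeKind kind;
    char name[64];
    double *values;      /* binary payload, used by NODE_DATA only */
    size_t count;        /* number of values */
    struct Node *child;  /* first child, used by NODE_GROUP only */
    struct Node *next;   /* next sibling */
} Node;

static Node *node_new(NodeKind kind, const char *name)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    strncpy(n->name, name, sizeof n->name - 1);
    return n;
}

/* Append child to the end of parent's child list and return it. */
static Node *node_add(Node *parent, Node *child)
{
    Node **slot = &parent->child;
    while (*slot != NULL)
        slot = &(*slot)->next;
    *slot = child;
    return child;
}

/* Store a private binary copy of the values in a data node. */
static void node_set_data(Node *n, const double *values, size_t count)
{
    assert(count > 0);
    free(n->values);
    n->values = malloc(count * sizeof *values);
    assert(n->values != NULL);
    memcpy(n->values, values, count * sizeof *values);
    n->count = count;
}

/* Format n and its siblings as XML. Only here does the binary data
 * become text. Names are written unescaped in this sketch. */
static void node_write_xml(const Node *n, FILE *out, int depth)
{
    size_t i;
    for (; n != NULL; n = n->next) {
        fprintf(out, "%*s<%s>", depth * 2, "", n->name);
        if (n->kind == NODE_GROUP) {
            fputc('\n', out);
            node_write_xml(n->child, out, depth + 1);
            fprintf(out, "%*s", depth * 2, "");
        } else {
            for (i = 0; i < n->count; i++)
                fprintf(out, "%s%g", i ? " " : "", n->values[i]);
        }
        fprintf(out, "</%s>\n", n->name);
    }
}
```

In this picture NXclose would call something like node_write_xml(root, file, 0) once, and NXputdata would reduce to node_set_data on the currently open node.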
For parsing I would want to use an event style parser which would
build the memory tree on a call to NXopen of an existing file. For
reasons of familiarity and library size I would want to use the
expat parser for this. See http://expat.sourceforge.net for more
details. I am using expat in another project of mine and it did not
cause any problems. I also considered libxml2 from the Gnome project,
but that library includes a SAX parser, a DOM system, code for reading
html, for doing http, ftp, XML-RPC etc. and I find it too heavyweight
for our purpose.
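The following sketch shows how event callbacks could build the memory tree. The handlers have the same shape as expat's start-element, end-element and character-data handlers, but no expat call is made here, so the sketch compiles without the library. TreeNode, BuildState and the on_* functions are hypothetical names, and text is kept as text; conversion to binary is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef struct TreeNode {
    char name[64];
    char text[128];           /* character data, kept as text here */
    struct TreeNode *parent;
    struct TreeNode *child;   /* first child */
    struct TreeNode *next;    /* next sibling */
} TreeNode;

typedef struct {
    TreeNode *root;
    TreeNode *current;        /* element that is currently open */
} BuildState;

/* Start of an element: create a node below the currently open one. */
static void on_start(void *user_data, const char *name, const char **atts)
{
    BuildState *st = user_data;
    TreeNode *n = calloc(1, sizeof *n);
    TreeNode **slot;

    (void)atts;               /* attributes are ignored in this sketch */
    assert(n != NULL);
    strncpy(n->name, name, sizeof n->name - 1);
    n->parent = st->current;
    if (st->current == NULL) {
        st->root = n;
    } else {
        slot = &st->current->child;
        while (*slot != NULL)
            slot = &(*slot)->next;
        *slot = n;
    }
    st->current = n;
}

/* End of an element: step back up to the parent. */
static void on_end(void *user_data, const char *name)
{
    BuildState *st = user_data;
    assert(st->current != NULL && strcmp(st->current->name, name) == 0);
    st->current = st->current->parent;
}

/* Character data may arrive in several chunks and is not
 * NUL-terminated, so append it and truncate if the buffer is full. */
static void on_text(void *user_data, const char *s, int len)
{
    BuildState *st = user_data;
    size_t used, room;

    if (st->current == NULL || len <= 0)
        return;
    used = strlen(st->current->text);
    room = sizeof st->current->text - 1 - used;
    if ((size_t)len > room)
        len = (int)room;
    memcpy(st->current->text + used, s, (size_t)len);
    st->current->text[used + (size_t)len] = '\0';
}
```

With expat itself, the handlers would additionally carry the XMLCALL marker and be registered through XML_SetUserData, XML_SetElementHandler and XML_SetCharacterDataHandler before the file is fed to XML_Parse.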
I think this would work. The tree stuff I would need for this could
grow to another API for dealing with NeXus files in the long run. A
disadvantage is that all data has to be in memory. This could become a
problem with large data sets. But these are not the best candidates for
XML anyway. Moreover we can choose to put large datasets into temporary
files on disk. Another drawback is that you have nothing if your program
crashes before the XML-NeXus file is written. But HDF files which were
not closed properly are usually corrupted and useless as well.
Another issue is how to format numbers in the XML file. I suggest using
reasonable defaults, but giving the user an opportunity to define the
format. This would require a new API function:
NXnumberformat(NXhandle, type, format-string)
which does something for XML and becomes a no-op for HDF.
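As a sketch of how this could behave: the handle keeps one printf-style format string per number type, and the setter silently succeeds for non-XML handles. XmlHandle, the FMT_* codes, the default formats and the function names are all assumptions made for illustration; the real function would take an NXhandle and the NeXus type codes. snprintf is C99, not strict ANSI C.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes and handle, not the NeXus definitions. */
enum { FMT_FLOAT32, FMT_FLOAT64, FMT_NTYPES };

typedef struct {
    int is_xml;                   /* 1 for an XML file, 0 for HDF */
    char format[FMT_NTYPES][16];  /* one printf format per type */
} XmlHandle;

static void handle_init(XmlHandle *h, int is_xml)
{
    h->is_xml = is_xml;
    /* Assumed defaults, chosen only for this sketch. */
    strcpy(h->format[FMT_FLOAT32], "%g");
    strcpy(h->format[FMT_FLOAT64], "%.15g");
}

/* Sketch of NXnumberformat: returns 1 on success, 0 on a bad argument.
 * For a non-XML handle it does nothing and reports success. */
static int number_format(XmlHandle *h, int type, const char *format)
{
    if (!h->is_xml)
        return 1;                 /* no-op for HDF */
    if (type < 0 || type >= FMT_NTYPES)
        return 0;
    if (strlen(format) >= sizeof h->format[type])
        return 0;                 /* format string too long to store */
    strcpy(h->format[type], format);
    return 1;
}

/* Format one value with the format currently set for its type.
 * Returns the snprintf result, or -1 for an unknown type. */
static int format_value(const XmlHandle *h, int type, double v,
                        char *buf, size_t size)
{
    if (type < 0 || type >= FMT_NTYPES)
        return -1;
    return snprintf(buf, size, h->format[type], v);
}
```

A user-supplied format string is passed to snprintf unchecked here; a real implementation would need to validate it, since a format that does not match the value type leads to undefined behaviour.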
Not a single line of code has been written yet, so please send comments
and suggestions now!
Best Regards,
Mark Koennecke