[Nexus-developers] NXconvert-NXtranslate

Wed Nov 19 14:46:16 GMT 2003

Hi,

On Tue, 18 Nov 2003, Ray Osborn wrote:

> Both schemes now use a single XML translation file, based on the NeXus
> metaDTD format, to define how to copy the existing source data to a target
> file in a new NeXus format.  However, Mark is suggesting that we make the
> translation file by replacing the data of a NeXus XML file with scripting
> commands while Peter is suggesting that we add attributes to the data tag
> that points to missing data.  Although this may seem a fairly minor
> difference, I think it profoundly affects the versatility and ease of use of
> this translation process.
> 
......... some deleted here
> 
> I believe that the general user will find it much easier to write Peter's
> translation files.  Of course, someone needs to do the hard work of
> providing the data reading libraries, and the various wrappers that will
> interface those libraries to the translation utility.   The complexity will
> be similar in both schemes; you have to write a set of C-wrappers, either to
> interface the scripting language (Mark's) or the translation utility
> (Peter's) with the source library.  However, it only needs to be done once
> for each source library.
> 
> Now, if the user wants to produce their own NeXus file using these
> libraries, we need to make it easy for them to customize the translation
> file.  Mark's scheme requires that a user learn a particular scripting
> language and all the scripts that are written in it, e.g. copyFromNeXus.
> There would have to be a new set of scripts written for every type of source
> file, with their own sets of arguments.
> 
  Well, yes, you have to learn a scripting language. But you have to learn
  something to use NXtranslate. Moreover, a scripting language would be
  able to handle any conversion from ASCII without extensions written in
  ANSI C. I am already thinking about making it easy to exchange the
  scripting language, i.e. to provide the helper functions through SWIG and
  encapsulate scripting-language specifics in separate source files.

> 2) Versatility
> 
> In the example above, the "name" tag had the value SEPD.  Assume that this
> was not in the source file but needs to be added to the target file.  This
> is easy to do in Peter's scheme.  Missing data are just put in as they would
> appear in the final XML version of the data file.  In Mark's scheme, we have
> to use a script command (writeText SEPD) for every data item because every
> tag would have to be parsed for a possible scripting command.   This has
> another consequence that has implications for using a NeXus file to access
> live data.
> 
  I think this is a major point which I failed to make sufficiently
  clear in my first posting. I actually think that scripting is more
  versatile, at the expense of a little overhead for the SEPD case. I
  fail to see how Peter's scheme handles the case when we have to compute
  values before we write to a file. This is not negligible: I reckon I
  have to redo distances in more than 80000 files! Or you may wish to
  convert to our coordinate system. The latter would probably involve
  calculations with several angles and values from the source file. How
  would this be reasonably expressed in Peter's tag scheme:
    <distance tag="nexus:/entry1/FOCUS/bank1/distance * -1"
           mime-type="nxtranslate-script"/>
  or something?? Note that I need some scripting language here as well to
  parse the expression. Or our own expression parser.
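
  As a rough sketch of the kind of computation meant here (Python is
  used only for illustration; the thread does not fix a scripting
  language). The SOURCE dictionary, both function names and the angle
  convention are hypothetical and not part of NXtranslate:

```python
import math

# Hypothetical source values, keyed by the path used in the example above.
SOURCE = {"/entry1/FOCUS/bank1/distance": 2.5}


def negate_distance(source, path):
    """Return the value stored at `path` with its sign flipped (distance * -1)."""
    return -source[path]


def spherical_to_cartesian(distance, polar_deg, azimuth_deg):
    """Convert (distance, polar angle, azimuth) to (x, y, z).

    Angles are in degrees. The axis convention is an assumption made
    for this sketch only.
    """
    polar = math.radians(polar_deg)
    azimuth = math.radians(azimuth_deg)
    x = distance * math.sin(polar) * math.cos(azimuth)
    y = distance * math.sin(polar) * math.sin(azimuth)
    z = distance * math.cos(polar)
    return x, y, z
```

  A translation script would call such helpers on values read from the
  source file and write the results to the target.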

  Now consider a file with very large datasets. No neutrons, probably
  synchrotron data. I might want to transfer a large data array in
  chunks and not in one block in order to reduce the memory requirements
  of NXtranslate. With the scripting approach and full access to the
  NeXus-API, this is no problem. How do we do this in Peter's approach?
  Automatically chunk if dataset > 16MB? What if the user wants control
  over the chunking? Remember, a well-chosen chunk size can have a
  significant impact on I/O performance.
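
  A minimal sketch of user-controlled chunking, again in Python for
  illustration only. The read_slab and write_slab callables are
  hypothetical stand-ins for whatever slab read and write routines the
  API exposes to the script; they are not real NeXus-API names:

```python
def slab_ranges(total, chunk):
    """Yield (start, size) pairs that cover `total` items in pieces of at most `chunk`."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    for start in range(0, total, chunk):
        yield start, min(chunk, total - start)


def copy_in_chunks(read_slab, write_slab, total, chunk):
    """Copy `total` items slab by slab so only one chunk is held in memory.

    read_slab(start, size) returns a sequence of `size` items;
    write_slab(start, data) stores them at offset `start`.
    """
    for start, size in slab_ranges(total, chunk):
        write_slab(start, read_slab(start, size))
```

  The chunk size is an ordinary argument, so the user can tune it per
  file or per dataset.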

  Now consider an ISIS-NeXus 1.0 file. I might need to reorder the
  histogram array in order to separate detector banks, and to do this
  differently for each instrument type. With a scripting
  language I can do that. How to do that in Peter's scheme?
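
  For illustration, a small Python sketch of such a reordering. The
  layout (one spectrum per detector plus a per-spectrum bank label) is
  an assumption made for this example, not the actual ISIS-NeXus 1.0
  layout, and split_by_bank is a hypothetical helper:

```python
def split_by_bank(histogram, bank_of_spectrum):
    """Group spectra by bank label, preserving their original order within each bank.

    histogram: list of spectra (each a list of counts).
    bank_of_spectrum: one bank label per spectrum, in the same order.
    """
    if len(histogram) != len(bank_of_spectrum):
        raise ValueError("need exactly one bank label per spectrum")
    banks = {}
    for spectrum, bank in zip(histogram, bank_of_spectrum):
        banks.setdefault(bank, []).append(spectrum)
    return banks
```

  Each instrument type would supply its own bank labels, which is the
  part that differs from instrument to instrument.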

  Next example: an ILL ASCII file. They look like dumps of float and int
  arrays. You might need to write array2[13] to
  /entry1/d19/chopper/speed. Easy in a scripting language, this is just
  an array or list reference, but what does this look like in Peter's
  scheme?
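
  A hedged sketch of the array-reference idea in Python. The input
  format assumed here (whitespace-separated numbers in blocks divided
  by blank lines) is a simplification and not the real ILL format;
  parse_blocks and translate are hypothetical names, and "array2" is
  taken to mean the second block (index 1):

```python
def parse_blocks(text):
    """Split text into blank-line-separated blocks; each block becomes a list of floats."""
    blocks = []
    current = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            current.extend(float(field) for field in fields)
        elif current:
            # A blank line closes the block collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def translate(text, mapping):
    """Map target paths to values, given {path: (block_index, item_index)}."""
    blocks = parse_blocks(text)
    return {path: blocks[b][i] for path, (b, i) in mapping.items()}
```

  With this, the chopper speed case is a single mapping entry:
  {"/entry1/d19/chopper/speed": (1, 13)}.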

> It is often the case that an instrument scientist would like to treat
> archived data and live data with the same software.  Before a run starts, it
> is often possible to construct the entire NeXus file, but with the data
> itself missing.  Once the run starts, it would be nice to use that file for
> all the meta-information, but have a way to access the live data as well.
> Peter's scheme allows this.  In this scenario, the translation file is just
> a regular NeXus file; it can even be a binary HDF file.  However, the data
> tag, instead of containing data, contains attributes that point to the data
> and the library used to read it.
> 
> This is not possible in Mark's scheme because every data item has to contain
> a scripting command.  I don't think there is any way to parse a data item
> and automatically tell whether it contains data or scripts, unless every
> script is enclosed by some special delimiters.   That would make the files
> look even more complex, and we would have to ensure that the delimiters
> never appear in real data.

  Access to live data through the NeXus-API is as yet an unimplemented
  feature. It is not yet even planned or agreed upon. I also fail to
  see how Peter's scheme facilitates this. Please enlighten me! One
  conceivable scheme I could see is that an empty dataset of appropriate
  size is created, with attributes stating how to get the data. I can
  create this with scripting.

> 
> Incidentally, Peter's method also becomes a scheme for referencing external
> data within a NeXus file.  I'm not a great fan of doing this, but I could
> see that it might be nice to have a link to some publicly accessible
> document as part of the file's self-documentation.
> 
> e.g.
> 
> <NXinstrument name="lrmecs">
>    <picture source="http://www.pns.anl.gov/lrmecs/lrmecs.jpg"
>             mime_type="image/jpeg" />
> ...
> </NXinstrument> 
> 
> Of course, we would have to put something in the API to handle data requests
> for such external links gracefully, but that's something for the future.
> 
  I always thought one of our goals was to have everything in one
  file............. Freddy made us agree to NX_BINARY in order to store
  pictures like that.   

> Eventually, I think we should extend the API to have a thin GetData layer
> that interfaces to external libraries, or returns sensible error messages if
> the libraries are not available.  Open Genie has a generalized data access
> library; something similar could be adapted for NeXus.  Initially, it would
> be part of the translation utility, but eventually, it could be part of the
> standard API.
> 
  I wonder if this is a good idea.... Perhaps a good translation utility
  is all we ever want.

  Summing up, a scripting language gives us more power when we have
  to compute data before it goes into a file. A scripting language
  would also handle the common case of conversion from ASCII formats
  without any further extensions to NXtranslate. In Peter's scheme I would
  have to write a handler for each ASCII format. Even extensions for
  binary formats are easier: I need to wrap my read routines with some
  well documented wrapper for the scripting language, build a shared
  library and load it into NXtranslate's scripting interpreter without
  touching NXtranslate's source code. In Peter's scheme I would have to
  implement the dynamic loading myself (in a platform independent
  way, of course). Or relink NXtranslate with the new handler.

  What does everybody else think?

			Best Regards,

				Mark