[Nexus-developers] NXconvert-NXtranslate
Mark Koennecke
Mark.Koennecke at psi.ch
Wed Nov 19 14:46:16 GMT 2003
Hi,
On Tue, 18 Nov 2003, Ray Osborn wrote:
> Both schemes now use a single XML translation file, based on the NeXus
> metaDTD format, to define how to copy the existing source data to a target
> file in a new NeXus format. However, Mark is suggesting that we make the
> translation file by replacing the data of a NeXus XML file with scripting
> commands while Peter is suggesting that we add attributes to the data tag
> that points to missing data. Although this may seem a fairly minor
> difference, I think it profoundly affects the versatility and ease of use of
> this translation process.
>
......... some deleted here
>
> I believe that the general user will find it much easier to write Peter's
> translation files. Of course, someone needs to do the hard work of
> providing the data reading libraries, and the various wrappers that will
> interface those libraries to the translation utility. The complexity will
> be similar in both schemes; you have to write a set of C-wrappers, either to
> interface the scripting language (Mark's) or the translation utility
> (Peter's) with the source library. However, it only needs to be done once
> for each source library.
>
> Now, if the user wants to produce their own NeXus file using these
> libraries, we need to make it easy for them to customize the translation
> file. Mark's scheme requires that a user learn a particular scripting
> language and all the scripts that are written in it, e.g. copyFromNeXus.
> There would have to be a new set of scripts written for every type of source
> file, with their own sets of arguments.
>
Well, yes, you have to learn a scripting language. But you have to learn
something to use NXtranslate. Moreover, a scripting language would be
able to handle any conversion from ASCII without extensions written in
ANSI C. I am already thinking about making it easy to exchange the
scripting language, i.e. provide the helper functions through SWIG and
encapsulate scripting language specifics in separate source files.
> 2) Versatility
>
> In the example above, the "name" tag had the value SEPD. Assume that this
> was not in the source file but needs to be added to the target file. This
> is easy to do in Peter's scheme. Missing data are just put in as they would
> appear in the final XML version of the data file. In Mark's scheme, we have
> to use a script command (writeText SEPD) for every data item because every
> tag would have to be parsed for a possible scripting command. This has
> another consequence that has implications for using a NeXus file to access
> live data.
>
I think this is a major point which I failed to make sufficiently
clear in my first posting. I actually think that scripting is more
versatile, at the expense of a little overhead for the SEPD case. I
fail to see how Peter's scheme handles the case where we have to compute
values before we write them to a file. This is not negligible: I reckon I
have to redo distances in 80000+ files! Or suppose you wish to convert to
our coordinate system. The latter would probably involve calculations
involving several angles and values from the source file. How
would this be reasonably expressed in Peter's tag scheme:
<distance tag="nexus:/entry1/FOCUS/bank1/distance * -1"
mime-type="nxtranslate-script"/>
or something?? Note that I need some scripting language here as well to
parse the expression. Or my own expression parser.
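
To make the computed-value case concrete, here is a minimal sketch in Python. Python is used purely for illustration; this thread does not settle on a scripting language. The dictionaries standing in for the source and target files, the function name flipped_distance and the value 2.5 are all hypothetical. Only the path is taken from the example above.

```python
# Hypothetical stand-in for values read from a source file. The path is the
# one from the example above; the number is made up for illustration.
PATH = "/entry1/FOCUS/bank1/distance"
source = {PATH: 2.5}


def flipped_distance(src, path):
    """Return the distance stored at `path` with its sign inverted."""
    return -1 * src[path]


# The target "file" receives the computed value, not a copy of the original.
target = {PATH: flipped_distance(source, PATH)}
```

A coordinate transformation involving several angles would follow the same pattern, with a longer function body reading more than one source value.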
Now consider a file with very large datasets. No neutrons, probably
synchrotron data. I might want to transfer a large data array in
chunks and not in one block in order to reduce the memory requirements
of NXtranslate. With the scripting approach and full access to the
NeXus-API, this is no problem. How do we do this in Peter's approach?
Automatically chunk if dataset > 16MB? What if the user wants control
over the chunking? Remember, a well-chosen chunk size can have a
significant impact on I/O performance when this is needed.
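
A rough sketch of user-controlled chunked transfer, again in Python for illustration only. The callables read_slab and write_slab are hypothetical placeholders for whatever slab-wise read and write routines the source library and the NeXus-API wrapper would provide; plain lists stand in for the files so the example runs on its own.

```python
def copy_in_chunks(read_slab, write_slab, total, chunk_size):
    """Copy `total` elements, holding at most `chunk_size` of them at once."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total:
        count = min(chunk_size, total - start)  # last chunk may be shorter
        write_slab(start, read_slab(start, count))
        start += count


# In-memory stand-ins for a source and a target dataset.
source = list(range(10))
target = [None] * len(source)
chunk_sizes = []  # records how many elements each write received


def read_slab(start, count):
    return source[start:start + count]


def write_slab(start, values):
    chunk_sizes.append(len(values))
    target[start:start + len(values)] = values


# The chunk size is an ordinary argument, so the user decides it.
copy_in_chunks(read_slab, write_slab, len(source), 4)
```

Because the chunk size is just a parameter of the script, tuning it for I/O performance needs no change to the translation utility itself.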
Now consider an ISIS-NeXus 1.0 file. I might need to reorder the
histogram array in order to separate detector banks, and to do this
differently for each instrument type. With a scripting
language I can do that. How would I do that in Peter's scheme?
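
As an illustration of such a reordering, the following Python sketch groups spectra by detector bank. The data layout is an assumption made for the example (one list of counts per spectrum plus a parallel list of bank numbers) and is not the actual ISIS-NeXus 1.0 layout; the function name split_by_bank is hypothetical.

```python
def split_by_bank(spectra, bank_of_spectrum):
    """Group spectra by bank number, preserving their original order."""
    if len(spectra) != len(bank_of_spectrum):
        raise ValueError("need exactly one bank number per spectrum")
    banks = {}
    for spectrum, bank in zip(spectra, bank_of_spectrum):
        banks.setdefault(bank, []).append(spectrum)
    return banks


# Four made-up spectra, interleaved between two banks in the source.
spectra = [[1, 2], [3, 4], [5, 6], [7, 8]]
bank_of_spectrum = [1, 2, 1, 2]

banks = split_by_bank(spectra, bank_of_spectrum)
```

The instrument-specific part is confined to the bank mapping, so a different instrument type would only need a different bank_of_spectrum list or a different grouping function.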
Next example: an ILL ASCII file. These look like dumps of float and int
arrays. You might need to write array2[13] to
/entry1/d19/chopper/speed. This is easy in a scripting language, where it
is just an array or list reference, but what does this look like in Peter's
scheme?
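
A small Python sketch of the array-reference case. The text block below is invented for the example and is not the real ILL file layout; it only assumes that a block of whitespace-separated numbers has already been isolated from the file. The target path and the index 13 are the ones mentioned above.

```python
def parse_numbers(text):
    """Turn a whitespace-separated dump of numbers into a list of floats."""
    return [float(token) for token in text.split()]


# Made-up contents of the second numeric block: 0, 10, 20, ..., 190.
block2 = " ".join(str(i * 10) for i in range(20))
array2 = parse_numbers(block2)

# Writing one element to the target is a plain list reference.
target = {"/entry1/d19/chopper/speed": array2[13]}
```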
> It is often the case that an instrument scientist would like to treat
> archived data and live data with the same software. Before a run starts, it
> is often possible to construct the entire NeXus file, but with the data
> itself missing. Once the run starts, it would be nice to use that file for
> all the meta-information, but have a way to access the live data as well.
> Peter's scheme allows this. In this scenario, the translation file is just
> a regular NeXus file; it can even be a binary HDF file. However, the data
> tag, instead of containing data, contains attributes that point to the data
> and the library used to read it.
>
> This is not possible in Mark's scheme because every data item has to contain
> a scripting command. I don't think there is any way to parse a data item
> and automatically tell whether it contains data or scripts, unless every
> script is enclosed by some special delimiters. That would make the files
> look even more complex, and we would have to ensure that the delimiters
> never appear in real data.
Access to live data through the NeXus-API is as yet an unimplemented
feature. It is not even planned or agreed upon yet. I also fail to
see how Peter's scheme facilitates this. Please enlighten me! One
conceivable scheme I could see is that an empty dataset of appropriate
size is created, with attributes stating how to get the data. I can
create this with scripting.
>
> Incidentally, Peter's method also becomes a scheme for referencing external
> data within a NeXus file. I'm not a great fan of doing this, but I could
> see that it might be nice to have a link to some publicly accessible
> document as part of the file's self-documentation.
>
> e.g.
>
> <NXinstrument name="lrmecs">
> <picture source="http://www.pns.anl.gov/lrmecs/lrmecs.jpg"
> mime_type="image/jpeg" />
> ...
> </NXinstrument>
>
> Of course, we would have to put something in the API to handle data requests
> for such external links gracefully, but that's something for the future.
>
I always thought one of our goals was to have everything in one
file............. Freddy made us agree to NX_BINARY in order to store
pictures like that.
> Eventually, I think we should extend the API to have a thin GetData layer
> that interfaces to external libraries, or returns sensible error messages if
> the libraries are not available. Open Genie has a generalized data access
> library; something similar could be adapted for NeXus. Initially, it would
> be part of the translation utility, but eventually, it could be part of the
> standard API.
>
I wonder if this is a good idea.... Perhaps a good translation utility
is all we ever want.
Summing up, a scripting language gives us more power when we have
to compute data before it goes into a file. A scripting language
would also handle the common case of conversion from ASCII formats
without any further extensions to NXtranslate. In Peter's scheme I would
have to write a handler for each ASCII format. Even extensions for
binary formats are easier: I need to wrap my read routines with some
well-documented wrapper for the scripting language, build a shared
library and load it into NXtranslate's scripting interpreter without
touching NXtranslate's source code. In Peter's scheme I would have to
implement the dynamic loading stuff myself (in a platform-independent
way, of course). Or relink NXtranslate with the new handler.
What does everybody else think?
Best Regards,
Mark