[Nexus-developers] NXtranslate

Peterson, Peter F. petersonpf at ornl.gov
Mon Dec 1 19:40:39 GMT 2003


Mark et al,

*WARNING: This is long and dense, so please refill your coffee/tea
before reading*

NXtranslate was originally designed to make a framework for doing the
*majority* of file conversions in a simple way that allowed distributing
work among developers and users. For cases that are not covered by
NXtranslate, one can write a single purpose code using the napi
directly. After talking with various people at various facilities, doing
some *very* random reading and looking at code for other programs that I
found useful (but not at all applicable) the following requirements were
made:
 - Simple: Do one thing and do it well. If there is a complicated case,
a general purpose tool is not going to be sufficient anyhow. The person
who wants to do more than NXtranslate can always use the napi.
 - Flexible: it must be able to deal with writing files in a standard
that has not been fully formalized.
 - Extensible: people want to write NeXus files from a variety of
formats, we should make it easy to do. This means that we give them a
straightforward framework to plug into and their code will work without
messing with our internals. 
 - Straightforward: end users that want to convert from their old
favorite format to their new favorite format (NeXus) can edit an
existing XML translation file and run, without having to read any
documentation.
 - Small: This goes back to being simple, extensible, and
straightforward; the less it does, and the fewer knobs, the easier it
will be to document, maintain, and use.
 - Starting format independent: It will not have a favorite language.
While reading with napi is obvious, it should be no easier than reading
from ISIS raw files or NCNR(ASCII) files.
 - Portable: Must be able to run on the major platforms for facilities
(Unix/Linux, Windows, and MacOS X). This restricts the implementation
language, but not much.

The specific cases for NXtranslate to deal with are (sorry for being
US-centric):
 MLNSC - They have *lots* of files (second only to PSI?) written using
the napi, but done so before the NIAC started to formalize anything, or
was even formed. They need to convert the noncompliant NeXus files to
compliant NeXus files. This means changing structure, but not changing
information (time_of_flight axis will still be stored as an int array
with units=100*nano*second).
 SNS - They will write out their raw data using the napi, but it will
only consist of location/detector id (an integer identifier),
time_of_flight, and pulse_number. They need something to group together
this information with logs (e.g. temperature, pressure, etc.), and
instrument geometry. Since the geometry is fixed it can be hard coded in
the translation file for a given set of runs.
 NCNR - They are committed to the move to NeXus for their archival
format, but want to do so in a way that doesn't break existing analysis
code or require rewriting their current data writing code. The intent
is to have an "on demand" way of converting files from their legacy
(ASCII) format to NeXus when a user wants it. Eventually they will only
provide NeXus files, but not until their code has been adapted and
tested.
 IPNS - Similar to NCNR, they want to give users greater choice and
easier access to their data. Similar to SNS, they want to combine
information from a couple of sources, specifically the "IPNS runfile", a
"SDDS" log file, and from the user database (MS access). This is not
really a new case, but another group of people to talk with and three
more libraries.
Things outside of these four cases (such as live data) are a nice side
effect of the design, but not necessary.

A bit more information on how NXtranslate would work: the idea of
"plugins" is taken from GKrellM <http://www.gkrellm.net>, but is seen in
many other places. The idea is that a shared library in a known location
(for GKrellM on linux it is in /usr/local/lib/gkrellm/plugins/ and
${HOME}/.gkrellm/plugins/) has a C-struct with information about
how to interact with the plugin, specifically some strings for
saving/loading configuration and function pointers for initialization,
interaction, and cleanup. In NXtranslate this would be a
structure like (forgive me for not getting c-syntax quite right):
<code>
/* Return types and the struct name were not specified; an int status
   and the name nxt_plugin are placeholders. */
typedef struct {
  const char *mime_type;
  int (*initialize)(void);
  int (*destroy)(void);
  /* determine if the source file has the information needed */
  int (*has_location)(const char *location);
  int (*get_type)(const char *location, int *type);           /* type of data */
  int (*get_rank)(const char *location, int *rank);           /* rank of dimension array */
  int (*get_dimension)(const char *location, int *dimension); /* dimension array */
  int (*get_data)(const char *location, void *data);          /* fill in the void pointer */
  /* determine if the attribute exists */
  int (*has_attribute)(const char *location, const char *name);
  /* find length of attribute */
  int (*get_attribute_length)(const char *location, const char *name, int *length);
  /* get an attribute with a specific name */
  int (*get_attribute)(const char *location, const char *name, char *value);
} nxt_plugin;
</code>
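As a rough sketch of how a plugin could be picked by its mime type and then
called through the struct: everything named here (the nxt_plugin name, the
demo_* functions, the "text/x-demo" mime type, the int return convention and
the static registry table) is an assumption for illustration, not a settled
interface. Only three of the struct entries are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down plugin descriptor (three entries only). */
typedef struct {
    const char *mime_type;
    int (*has_location)(const char *location);        /* 1 if present, else 0 */
    int (*get_rank)(const char *location, int *rank); /* 0 on success, -1 on failure */
} nxt_plugin;

/* A toy plugin that knows about a single location. */
static int demo_has_location(const char *location) {
    return strcmp(location, "/entry/data") == 0;
}

static int demo_get_rank(const char *location, int *rank) {
    assert(rank != NULL);
    if (!demo_has_location(location)) return -1;
    *rank = 2;
    return 0;
}

static const nxt_plugin demo_plugin = {
    "text/x-demo", demo_has_location, demo_get_rank
};

/* Assumption: a fixed table stands in for scanning a plugin directory. */
static const nxt_plugin *registry[] = { &demo_plugin };

/* Return the plugin registered for mime_type, or NULL if there is none. */
const nxt_plugin *find_plugin(const char *mime_type) {
    size_t i;
    assert(mime_type != NULL);
    for (i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i]->mime_type, mime_type) == 0) return registry[i];
    }
    return NULL;
}
```

A caller would look up the plugin with find_plugin("text/x-demo"), check
has_location for the path named in the translation file, and only then ask
for rank, dimensions and data through the returned pointer.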
There will also need to be a way to get a "slab" of data, rather than
the whole thing at once.
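
One possible shape for such a slab call is sketched below. The function name
get_slab_2d, its argument order, and the restriction to a rank-2 row-major
int array are all assumptions made to keep the example short; a real entry
in the plugin struct would need to handle any rank and type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical slab reader for a rank-2, row-major int array.
 *   dims  = full extent of the source, {rows, cols}
 *   start = first element of the slab in each dimension
 *   size  = number of elements to copy in each dimension
 * Returns 0 on success, -1 if the slab falls outside the source.
 */
int get_slab_2d(const int *source, const int dims[2],
                const int start[2], const int size[2], int *out)
{
    int r, c;
    assert(source != NULL && out != NULL);
    if (start[0] < 0 || start[1] < 0 || size[0] < 0 || size[1] < 0 ||
        start[0] + size[0] > dims[0] || start[1] + size[1] > dims[1]) {
        return -1;
    }
    for (r = 0; r < size[0]; r++) {
        for (c = 0; c < size[1]; c++) {
            /* out is packed: size[0] rows of size[1] values each */
            out[r * size[1] + c] =
                source[(start[0] + r) * dims[1] + (start[1] + c)];
        }
    }
    return 0;
}
```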

NXtranslate puts the responsibility for reading data on the person who
wants it read in. It allows for a simple way to construct a NeXus file
using any library that NXtranslate has been made aware of.
Shortcomings that have been pointed out are:
  Q: Each ASCII format has to have its own library
  A: How similar is 3-column ASCII to GSAS powder raw file to SPEC?
While all could be dealt with by a full scripting system, this requires
the end-user to write in it. By having separate libraries for a
set of ASCII files the developer (who will do a better job) takes care
of the majority of the hard work.
  Q: Cannot convert from int[] to float[] or 100*nano*second to
micro*second
  A: This is true. However, an equivalent unit like "10^-7*second" could
be written directly into the translation file and copied into the final
NeXus file.
  Q: Cannot move field to attribute
  A: This is true, but you can still write the resulting value into the
translation file. This can be done a bit more dynamically using macros.
  Q: Cannot move attribute to field
  A: This is false.
  Q: Must hold all data in memory at once.
  A: This is false. The entire translation file will be read in using
DOM, then the resulting tree will be parsed and written, node by node.
Nodes that need to import information using the libraries will do so
just before writing. This means that the 25-dimension limit set by the
napi must be observed.
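
The node-by-node write described in the last answer could look roughly like
the following. The node struct, load and write_node are hypothetical
stand-ins for the DOM tree, a plugin's get_data and the napi write; the
counters exist only to show that each node's data is imported just before it
is written and released right after, so at most one node's data is held at a
time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one element of the parsed translation file. */
typedef struct node {
    const char *name;
    size_t n_values;            /* 0 means a group with no data of its own */
    struct node *first_child;
    struct node *next_sibling;
} node;

size_t live_buffers = 0;        /* data buffers currently allocated */
size_t peak_buffers = 0;        /* most buffers alive at any one time */

/* Stand-in for a plugin's get_data: allocate and fill the node's values. */
static int *load(const node *n) {
    size_t i;
    int *data = malloc(n->n_values * sizeof *data);
    if (data == NULL) return NULL;
    for (i = 0; i < n->n_values; i++) data[i] = (int)i;
    live_buffers++;
    if (live_buffers > peak_buffers) peak_buffers = live_buffers;
    return data;
}

/* Stand-in for the napi write: here it only prints a summary line. */
static void write_node(const node *n, const int *data, int depth) {
    (void)data;
    printf("%*s%s (%lu values)\n", depth * 2, "", n->name,
           (unsigned long)n->n_values);
}

/* Depth-first walk: import, write, release, then descend.
   Returns the number of nodes written, or -1 on allocation failure. */
int translate(const node *n, int depth) {
    int written = 0;
    assert(depth >= 0);
    for (; n != NULL; n = n->next_sibling) {
        int *data = NULL;
        int below;
        if (n->n_values > 0) {
            data = load(n);
            if (data == NULL) return -1;
        }
        write_node(n, data, depth);
        if (data != NULL) { free(data); live_buffers--; }
        below = translate(n->first_child, depth + 1);
        if (below < 0) return -1;
        written += 1 + below;
    }
    return written;
}
```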

My question for you (everybody on the list): Are you in a situation not
covered by this design? The question is not whether you can think of a
case that isn't covered, but whether you actually know of one.

Peter Peterson
P.S. I'm back from vacation.
--
Spallation Neutron Source
Oak Ridge National Laboratory
Tel: 630-252-8397  Fax: 630-252-7777




