[Nexus-developers] nxconvert
Mark Koennecke
Mark.Koennecke at psi.ch
Mon Nov 10 15:24:09 GMT 2003
Hi,
At the NIAC meeting I promised a whitepaper on implementing a NeXus
file conversion utility. Well, here we go: it is in the attachment.
I am looking forward to your comments.
Best Regards,
Mark Koennecke
-------------- next part --------------
NXCONVERT Whitepaper
NXconvert is going to be a program for converting NeXus files from
one format to another. This includes not only conversion between
different HDF versions but also the transformation of the file
structure. This tool is necessary to convert existing NeXus files
to the new formats standardized through the NIAC, the NeXus
International Advisory Committee.
NXCONVERT Tasks
The following problems have to be addressed by an automatic conversion
tool for NeXus files:
- It should be possible to select the type of the output file (HDF4,
HDF5, soon XML).
- Data must be copied from a node in the source file structure to
another node in the target file structure. This covers the biggest
part of the job.
- The tool must be capable of processing multiple entries in a file.
- Some data must be copied from attributes in the source file to
SDS in the target file.
- Links have to be created in the target file.
- It must be possible to supply missing data.
- Some values may need to be recalculated, for example when converting
distances in old files to the new scheme.
In addition the NeXus file converter should preferably be a command
line utility in order to facilitate batch conversions of large numbers
of files.
NXCONVERT Implementation Strategy
A suitable NeXus file converter can be implemented by combining a
dictionary based conversion loop with script processing for more
complex operations.
Dictionary Conversion
In order to perform NeXus file conversions it is necessary to be able
to address nodes in both the source and the target file. This addressing
problem can be overcome by the use of the existing NXdict library.
NXdict maps descriptions of the position of a data item in a NeXus file
and its properties (definition strings) to short names, called aliases.
It also supports text replacement facilities within definition strings.
In order to use NXdict for conversion the user would have to supply two
NXdict dictionary files: one for the source file and one for the target
file. In order to map a node in the source file to a node in the target
file both nodes are required to have the same alias.
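The alias matching described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
dictionaries are plain Python dicts rather than NXdict files, the paths
are made up, and the "$(entry)" placeholder is an assumed stand-in for
NXdict's text replacement, not necessarily its real syntax. The helper
names expand, shared_aliases and node_mapping are hypothetical.

```python
# Hypothetical dictionaries: alias -> definition string (path part only).
# "$(entry)" is an assumed placeholder syntax for text replacement.
source_dict = {
    "counts": "/$(entry),NXentry/data1,NXdata/SDS counts",
    "distance": "/$(entry),NXentry/detector,NXdetector/SDS dist",
    "old_only": "/$(entry),NXentry/SDS legacy",
}
target_dict = {
    "counts": "/$(entry),NXentry/data,NXdata/SDS data",
    "distance": "/$(entry),NXentry/instrument,NXinstrument/SDS distance",
    "new_only": "/$(entry),NXentry/SDS definition",
}


def expand(definition, **values):
    """Replace each $(name) placeholder with the supplied value."""
    for name, value in values.items():
        definition = definition.replace("$(" + name + ")", value)
    return definition


def shared_aliases(source, target):
    """Aliases present in both dictionaries, in source order."""
    return [alias for alias in source if alias in target]


def node_mapping(source, target, entry):
    """Map each source node to its target node for one entry."""
    return {
        expand(source[alias], entry=entry): expand(target[alias], entry=entry)
        for alias in shared_aliases(source, target)
    }
```

Aliases that occur in only one of the two dictionaries (old_only and
new_only here) are simply skipped, which matches the rule that both
nodes must carry the same alias to be mapped.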
With suitable dictionaries provided, the NeXus conversion utility would
execute the following pseudocode:
  FOR all entries in the source NeXus file
    Fix up entry name in both source and target dictionaries
    FOR all aliases in source dictionary
      IF source alias exists in target dictionary
        copyDataItem
      ENDIF
    ENDFOR
  ENDFOR
copyDataItem would use dataset attributes as specified in the target
dictionary. Dimensions and number types, however, must be taken from
the existing data in the source file. Otherwise the conversion of files
with variable dimensions, for example time-of-flight, would fail. Datasets
above a certain size will automatically be written compressed.
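As a rough sketch of the loop and of copyDataItem, the following Python
models a NeXus file as nested dicts instead of using the NeXus API. All
names here (DataItem, copy_data_item, convert, COMPRESS_THRESHOLD) are
hypothetical, and the threshold value is an assumption since no size is
given above. Paths are kept relative to the entry, so the entry name
fix-up step reduces to using the entry name as the outer key.

```python
from dataclasses import dataclass, field
from math import prod

# Assumed element count above which data is written compressed.
COMPRESS_THRESHOLD = 1024


@dataclass
class DataItem:
    values: list
    dims: tuple
    number_type: str
    attributes: dict = field(default_factory=dict)
    compressed: bool = False


def copy_data_item(source_item, target_attributes):
    """Dimensions and number type come from the source data,
    attributes come from the target dictionary."""
    return DataItem(
        values=list(source_item.values),
        dims=source_item.dims,
        number_type=source_item.number_type,
        attributes=dict(target_attributes),
        compressed=prod(source_item.dims) > COMPRESS_THRESHOLD,
    )


def convert(source_file, source_dict, target_dict):
    """source_file maps entry name -> {path: DataItem}.
    Both dictionaries map alias -> (path, attributes)."""
    target_file = {}
    for entry, items in source_file.items():
        target_items = target_file.setdefault(entry, {})
        for alias, (source_path, _unused) in source_dict.items():
            if alias not in target_dict:
                continue  # no matching alias in the target dictionary
            if source_path not in items:
                continue  # alias defined, but node absent in this file
            target_path, target_attrs = target_dict[alias]
            target_items[target_path] = copy_data_item(
                items[source_path], target_attrs
            )
    return target_file
```

Because the dimensions are read from the source item and not from the
target dictionary, a time-of-flight array of any length is copied
unchanged.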
The current dictionary file format used by NXdict appears to be unpopular.
NXdict should be augmented in such a way that it understands dictionaries
in XML format. This XML format should follow the format of the NeXus-DTD
as closely as possible.
Script Conversion
The dictionary scheme mentioned above covers many cases. Cases not
easily covered in the dictionary scheme include:
- link generation
- calculations to be performed on data items.
- attribute to SDS conversions.
Such conversions can best be performed through a script. Rather than
implementing a special purpose scripting language of our own, it would
be advisable to use one of the many freely available embeddable
scripting languages. Besides being embeddable, such a scripting language
must be extensible in C, so that some converter specific functions can
be added:
- storeAtAlias alias someValue
    stores someValue at the node described by alias in the target file
- getSourceAttribute attributeName
    retrieves the global attribute attributeName from the source file
- makeLink sourceAlias targetAlias
    creates a link from sourceAlias at targetAlias
Users would implement some functions in the scripting language which
would then be called by the main application at appropriate times:
- forFile
    performs per file conversions
- forEntry entryName
    performs operations per entry. The entry name is specified as entryName
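To illustrate how the exported functions and the user hooks could fit
together, here is a minimal sketch in Python. Python is used only as a
stand-in for the embedded scripting language. ConversionContext and
run_script are hypothetical names, the alias names in the example
script are made up, and the calling order (forFile once, then forEntry
for each entry) is an assumption.

```python
class ConversionContext:
    """Stand-in for the functions the converter would export to scripts."""

    def __init__(self, source_attributes):
        self.source_attributes = source_attributes  # global source attributes
        self.stored = {}  # alias -> value written to the target file
        self.links = []   # (source alias, target alias) pairs

    def store_at_alias(self, alias, value):
        self.stored[alias] = value

    def get_source_attribute(self, name):
        return self.source_attributes[name]

    def make_link(self, source_alias, target_alias):
        self.links.append((source_alias, target_alias))


def run_script(script, context, entry_names):
    """Call the user's hooks if they are defined: forFile once,
    then forEntry for every entry (assumed order)."""
    for_file = script.get("forFile")
    if for_file is not None:
        for_file(context)
    for_entry = script.get("forEntry")
    if for_entry is not None:
        for name in entry_names:
            for_entry(context, name)


# Example user script: one attribute to SDS conversion per file
# and one link per entry. The alias names are hypothetical.
def for_file(ctx):
    ctx.store_at_alias("title", ctx.get_source_attribute("title"))


def for_entry(ctx, entry_name):
    ctx.make_link(entry_name + "_counts", entry_name + "_data")


user_script = {"forFile": for_file, "forEntry": for_entry}
```

A real implementation would write to the target file instead of
recording the calls in memory.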
Inclusion of the SWIG generated wrapper functions for the NeXus API
and the provision of handles to both the source and target file would
allow for very sophisticated transformations to be performed.
Many scripting languages could be chosen as nxconvert's internal
scripting language. For reasons of familiarity I suggest Tcl, but this
is open for discussion.
Command Line
The resulting command line for the NeXus conversion utility would look
like:
  nxconvert sourcedictionary targetdictionary scriptfile
            -target HDF4 | HDF5 | XML infile outfile
sourcedictionary and targetdictionary are the dictionaries in XML format,
scriptfile is the file with additional scripted transformations. -target
specifies the output format of the conversion. infile and outfile are
the filenames of the NeXus files to process.
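The argument layout above could be parsed as in the following Python
sketch, which uses the standard argparse module. It is an illustration
only: build_parser is a hypothetical name, and making -target mandatory
is an assumption, since the text does not say whether a default output
format exists.

```python
import argparse


def build_parser():
    """Parser for the proposed nxconvert command line."""
    parser = argparse.ArgumentParser(prog="nxconvert")
    parser.add_argument("sourcedictionary", help="source dictionary (XML)")
    parser.add_argument("targetdictionary", help="target dictionary (XML)")
    parser.add_argument("scriptfile", help="additional scripted transformations")
    # Assumption: -target is mandatory; no default format is stated.
    parser.add_argument("-target", choices=["HDF4", "HDF5", "XML"], required=True)
    parser.add_argument("infile", help="NeXus file to read")
    parser.add_argument("outfile", help="NeXus file to write")
    return parser
```

For example, build_parser().parse_args(["src.xml", "tgt.xml",
"conv.tcl", "-target", "HDF5", "in.nxs", "out.nxs"]) yields a namespace
whose target field is "HDF5". The file names are placeholders.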
Alternative Implementation: Scripts only
Another possibility would be to build the dictionaries and to perform
the transformation through a scripting language. However, this requires
more typing and ignores the fact that basic dictionaries can either be
generated from existing files through the NXtoDTD tool or be derived
from the DTD containing the instrument definition.