[Nexus-developers] nxconvert

Mark Koennecke Mark.Koennecke at psi.ch
Mon Nov 10 15:24:09 GMT 2003


  Hi,

  At the NIAC meeting I promised a whitepaper on implementing a NeXus
  file conversion utility. Well, here we go: it is in the attachment.
  I am looking forward to your comments.



				Best Regards,

					Mark Koennecke

-------------- next part --------------

    NXCONVERT Whitepaper

    NXconvert is going to be a program for converting NeXus files from
    one format to another. This includes not only conversion between
    different HDF versions but also transformation of the file
    structure. This tool is necessary to convert existing NeXus files
    to the new formats standardized through the NIAC, the NeXus
    International Advisory Committee.

    
    NXCONVERT Tasks
    
    The following problems have to be addressed by an automatic conversion
    tool for NeXus files:
    - It should be possible to select the type of the output file (HDF4,
      HDF5, soon XML).
    - Data must be copied from a node in the source file structure to
      another node in the target file structure. This covers the biggest
      part of the job.
    - The tool must be capable of processing multiple entries in a file.
    - Some data must be copied from attributes in the source file to
      SDS in the target file. 
    - Links have to be created in the target file.
    - It must be possible to supply missing data. 
    - Some values may need to be recalculated, for example when converting
      distances in old files to the new scheme.
    In addition the NeXus file converter should preferably be a command
    line utility in order to facilitate batch conversions of large numbers
    of files.

    
    NXCONVERT Implementation Strategy

    A suitable NeXus file converter can be implemented by combining a
    dictionary-based conversion loop with script processing for more
    complex operations.


    Dictionary Conversion

    In order to perform NeXus file conversions it is necessary to be able
    to address nodes in both the source and the target file. This addressing
    problem can be overcome by the use of the existing NXdict library.
    NXdict maps descriptions of the position of a data item in a NeXus file
    and its properties (definition strings) to short names, called aliases.
    It also supports text replacement facilities within definition strings.
   
    To use NXdict for conversion, the user would have to supply two
    NXdict dictionary files: one for the source file and one for the target
    file. A node in the source file is mapped to a node in the target file
    by giving both nodes the same alias.
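
    The alias matching described above can be sketched in a few lines of
    Python. This is only an illustration: the dictionaries are plain Python
    dicts, the node paths are invented, and the "$entry" placeholder uses
    Python's string.Template as a stand-in for NXdict's own text
    replacement. None of this is the NXdict file format or API.

```python
from string import Template

# Hypothetical dictionaries: alias -> definition string. The "$entry"
# placeholder is a stand-in for NXdict text replacement; the paths are
# invented for illustration only.
source_dict = {
    "counts": "/$entry/histogram/counts",
    "wavelength": "/$entry/instrument/lambda",
    "old_only": "/$entry/obsolete/flag",
}
target_dict = {
    "counts": "/$entry/data/counts",
    "wavelength": "/$entry/instrument/monochromator/wavelength",
    "new_only": "/$entry/sample/name",
}


def map_nodes(source, target, entry_name):
    """Pair source and target node paths for every alias present in both."""
    pairs = {}
    for alias, source_def in source.items():
        if alias in target:
            pairs[alias] = (
                Template(source_def).substitute(entry=entry_name),
                Template(target[alias]).substitute(entry=entry_name),
            )
    return pairs


for alias, (src, dst) in map_nodes(source_dict, target_dict, "entry1").items():
    print(f"{alias}: {src} -> {dst}")
```

    Aliases that appear in only one of the two dictionaries are simply
    skipped, which is how a dictionary pair can drop obsolete items or
    leave new items to be supplied by other means.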

    With suitable dictionaries provided, the NeXus conversion utility would
    execute the following pseudocode:
    
    FOR all entries in the source NeXus file
       Fix up entry name in both source and target dictionaries
       FOR all aliases in source dictionary
          IF source alias exists in target dictionary
             copyDataItem
          ENDIF
       ENDFOR
    ENDFOR     
    
    copyDataItem would use dataset attributes as specified in the target
    dictionary. Dimensions and number types, however, must be taken from
    the existing data in the source file. Otherwise the conversion of files
    with variable dimensions, for example time-of-flight, would fail. Datasets
    above a certain size will automatically be written compressed. 
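
    As a rough sketch of the loop and of the copyDataItem rules, the
    following Python models a NeXus file as nested dicts held in memory.
    The DataItem class, the dictionary layout (alias -> path template plus
    attributes), the "$entry" placeholder and the compression threshold are
    all assumptions made for this example; a real implementation would go
    through the NeXus API and NXdict instead.

```python
from dataclasses import dataclass, field
from math import prod
from string import Template

# Assumed threshold (number of elements); the whitepaper does not fix a value.
COMPRESSION_THRESHOLD = 1000


@dataclass
class DataItem:
    """Hypothetical in-memory stand-in for one dataset."""
    values: list
    number_type: str
    dims: tuple
    attributes: dict = field(default_factory=dict)
    compressed: bool = False


def copy_data_item(source_item, target_attributes):
    """Attributes come from the target dictionary; dimensions and number
    type come from the data actually found in the source file."""
    return DataItem(
        values=list(source_item.values),
        number_type=source_item.number_type,
        dims=source_item.dims,
        attributes=dict(target_attributes),
        compressed=prod(source_item.dims) > COMPRESSION_THRESHOLD,
    )


def convert(source_file, source_dict, target_dict):
    """source_file maps entry name -> {node path: DataItem}.
    Each dictionary maps alias -> (path template, attributes)."""
    target_file = {}
    for entry_name, nodes in source_file.items():
        target_nodes = target_file.setdefault(entry_name, {})
        for alias, (source_template, _) in source_dict.items():
            if alias not in target_dict:
                continue  # no counterpart in the target structure
            target_template, target_attributes = target_dict[alias]
            # "Fix up" the entry name in both definitions.
            source_path = Template(source_template).substitute(entry=entry_name)
            target_path = Template(target_template).substitute(entry=entry_name)
            if source_path in nodes:
                target_nodes[target_path] = copy_data_item(
                    nodes[source_path], target_attributes
                )
    return target_file
```

    Because the dimensions are read from each source item, two entries
    whose datasets have different lengths convert without any change to
    the dictionaries.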
    
    The current dictionary file format used by NXdict appears to be unpopular.
    NXdict should be augmented in such a way that it understands dictionaries
    in XML format. This XML format should follow the format of the NeXus-DTD
    as closely as possible.   

    
    Script Conversion

    The dictionary scheme described above covers many cases. Cases not
    easily covered by the dictionary scheme include:
    - link generation
    - calculations to be performed on data items
    - attribute to SDS conversions
    Such conversions are best performed through a script. Rather than
    implementing a special-purpose scripting language of our own, it would
    be advisable to use one of the many freely available embeddable
    scripting languages. Besides being embeddable, such a scripting language
    must be extensible in C so that some custom functions can be provided:
    - storeAtAlias alias someValue
      stores someValue at the node described by alias in the target file
    - getSourceAttribute attributeName
      retrieves the global attribute attributeName from the source file.
    - makeLink sourceAlias targetAlias
      creates a link from sourceAlias at targetAlias
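
    To make the intended behaviour of these three functions concrete, here
    is a minimal Python model of them. It is not the proposed C extension:
    the ConversionContext class and its in-memory storage are hypothetical,
    and it assumes that makeLink resolves both aliases in the target
    dictionary and places, at targetAlias, a link pointing to the node at
    sourceAlias.

```python
class ConversionContext:
    """Hypothetical in-memory stand-in for the source and target files."""

    def __init__(self, source_attributes, target_dict):
        self.source_attributes = source_attributes  # global attributes of the source file
        self.target_dict = target_dict              # alias -> node path in the target file
        self.target_nodes = {}                      # node path -> stored value
        self.links = {}                             # link path -> path of the linked node

    def store_at_alias(self, alias, some_value):
        """storeAtAlias: store a value at the target node named by alias."""
        self.target_nodes[self.target_dict[alias]] = some_value

    def get_source_attribute(self, attribute_name):
        """getSourceAttribute: read a global attribute of the source file."""
        return self.source_attributes[attribute_name]

    def make_link(self, source_alias, target_alias):
        """makeLink: at target_alias, create a link to the node at source_alias
        (assumed interpretation; both aliases are target-file aliases)."""
        linked_path = self.target_dict[source_alias]
        if linked_path not in self.target_nodes:
            raise KeyError(f"nothing stored at {linked_path} yet")
        self.links[self.target_dict[target_alias]] = linked_path


context = ConversionContext(
    source_attributes={"owner": "PSI"},
    target_dict={
        "owner": "/entry1/user/name",
        "counts": "/entry1/instrument/detector/counts",
        "data_counts": "/entry1/data/counts",
    },
)
# Attribute to SDS conversion: a global attribute becomes a data item.
context.store_at_alias("owner", context.get_source_attribute("owner"))
context.store_at_alias("counts", [1, 2, 3])
context.make_link("counts", "data_counts")
```

    The first call shows how an attribute to SDS conversion would combine
    two of the functions; the last one covers link generation.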
    
    Users would implement some functions in the scripting language which
    would then be called by the main application at appropriate times:
    - forFile
      performs per-file conversions
    - forEntry entryName
      performs per-entry operations; the name of the entry is passed as
      entryName
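
    How the main application might call these user functions can be
    sketched as follows, with Python standing in for the embedded
    scripting language. The run_script_hooks driver, the example script
    and the calling order (forFile once, then forEntry for each entry) are
    assumptions for illustration, not part of the proposal.

```python
# A user script, here in Python as a stand-in for the embedded language.
USER_SCRIPT = '''
log = []

def forFile():
    log.append("file")

def forEntry(entryName):
    log.append("entry:" + entryName)
'''


def run_script_hooks(script_text, entry_names):
    """Load the user script and call its hooks if they are defined."""
    namespace = {}
    exec(script_text, namespace)  # evaluate the user's definitions

    for_file = namespace.get("forFile")
    if callable(for_file):
        for_file()  # assumed: called once per file

    for_entry = namespace.get("forEntry")
    if callable(for_entry):
        for name in entry_names:
            for_entry(name)  # called once per entry, with its name
    return namespace


result = run_script_hooks(USER_SCRIPT, ["entry1", "entry2"])
print(result["log"])
```

    Both hooks are optional in this sketch, so a script that only needs
    per-entry work does not have to define forFile.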
     
    Inclusion of the SWIG generated wrapper functions for the NeXus API
    and the provision of handles to both the source and target file would
    allow for very sophisticated transformations to be performed.

    Many scripting languages could be chosen as nxconvert's internal
    scripting language. For reasons of familiarity I suggest Tcl, but this
    is open for discussion.

    Command Line
  
    The resulting command line for the NeXus conversion utility would look
    like:
     
       nxconvert  sourcedictionary  targetdictionary  scriptfile 
                 -target HDF4 | HDF5 | XML infile outfile

    sourcedictionary and targetdictionary are the dictionaries in XML
    format, and scriptfile is the file with additional scripted
    transformations. -target specifies the output format of the
    conversion. infile and outfile are the file names of the NeXus files
    to process.
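
    For illustration, the argument layout above can be parsed with
    Python's standard argparse module as shown below. This only mirrors
    the proposed command line; treating -target as mandatory is an
    assumption, and the file names in the example call are made up.

```python
import argparse


def build_parser():
    """Parser mirroring the proposed nxconvert command line."""
    parser = argparse.ArgumentParser(
        prog="nxconvert",
        description="Convert a NeXus file using two dictionaries and a script.",
    )
    parser.add_argument("sourcedictionary", help="dictionary for the source file")
    parser.add_argument("targetdictionary", help="dictionary for the target file")
    parser.add_argument("scriptfile", help="file with scripted transformations")
    parser.add_argument(
        "-target",
        choices=["HDF4", "HDF5", "XML"],
        required=True,  # assumption: the whitepaper does not name a default
        help="output format of the conversion",
    )
    parser.add_argument("infile", help="NeXus file to read")
    parser.add_argument("outfile", help="NeXus file to write")
    return parser


# Example invocation with made-up file names.
args = build_parser().parse_args(
    ["source.xml", "target.xml", "convert.tcl", "-target", "HDF5", "old.hdf", "new.h5"]
)
print(args.target, args.infile, args.outfile)
```

    Restricting -target to a fixed set of choices lets the parser reject
    unsupported output formats before any file is opened.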
 

    Alternative Implementation: Scripts only

    Another possibility would be to build the dictionaries and perform
    the transformation entirely through a scripting language. However, this
    requires more typing and ignores the fact that basic dictionaries can be
    generated either through the NXtoDTD tool from existing files or derived
    from the DTD containing the instrument definition.




