[Nexus] NeXus - a solution to what is not the real problem ?

Ray Osborn rosborn at anl.gov
Tue Mar 9 20:57:01 GMT 2010


Hi everyone,
Thanks for getting this vital debate going again. I confess I haven't  
had time to read all the submissions completely and I will be late for  
a meeting if I do, but I would like to mention the development of a  
Python interface that I think could significantly impact how many  
scientists view and use NeXus and is relevant to this discussion. As  
Freddie said, the original idea of NeXus was as an interchange format,  
but most people now think of it as an archive format. That was a  
historical evolution that resulted from conflicting driving forces in  
pushing the format forward. The initial driver was the need for  
scientists to cope with data from multiple sources. Unfortunately,  
although Mark Koennecke and others produced a very elegant API fairly  
quickly (well over ten years ago), it happened at a time when  
scientists stopped programming in compiled languages and became  
dependent on proprietary scripting languages like Matlab and IDL, so  
more recent developments have been driven by facility needs to archive  
large complex datasets. This is why we are now so obsessed with  
precise file definitions, even though the NeXus file structure was  
designed to be much more flexible.

In my opinion, the way to get back to the original NeXus roots, for  
scientists at least, is by improving the scripting interface so that  
scientists can take control of their own analyses again. A while ago,  
Paul Kienzle produced a beautiful mapping of the basic API onto Python  
using Numpy, and a higher-level interface that reads the whole NeXus  
file into a Python tree structure. We have been building on that base  
for a few months now and should try to release something to the  
contributed section fairly soon. Python, with its relaxed approach to  
object-oriented programming, is ideal for coping with the arbitrary  
data that might be contained within a NeXus file. Then, perhaps, we  
won't be so hung up on the precise definition of what is contained in  
each file. NeXus is self-describing so you can interrogate the file to  
discover its contents and act accordingly.
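As a rough illustration of interrogating a self-describing file, here is a minimal Python sketch. It does not use the NeXus API or any HDF5 library: a nested dict stands in for the file, loosely shaped like the tree printed in the session below, and the helper names `walk` and `find_plottable` are hypothetical. It relies on the NXdata convention visible in that tree, where the plottable dataset carries `@signal = 1` and names its axes in `@axes`. Splitting `@axes` on colons or commas for multi-axis data is an assumption.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple

# A nested-dict stand-in for a NeXus file (NOT a real HDF5 reader).
# Groups carry "NX_class"; datasets carry "data" and an "attrs" dict.
Node = Dict[str, Any]


def walk(node: Node, path: str = "") -> Iterator[Tuple[str, str, Node]]:
    """Yield (path, NX class or 'dataset', node) for everything below `node`."""
    for name, child in node.get("children", {}).items():
        child_path = f"{path}/{name}"
        if "NX_class" in child:
            yield child_path, child["NX_class"], child
            yield from walk(child, child_path)
        else:
            yield child_path, "dataset", child


def find_plottable(group: Node) -> Tuple[str, List[str]]:
    """Return the dataset flagged @signal=1 and the axis names in its @axes."""
    for name, child in group.get("children", {}).items():
        attrs = child.get("attrs", {})
        if attrs.get("signal") == 1:
            # Assumption: several axis names are separated by ':' or ','.
            axes = [a for a in re.split(r"[:,\s]+", attrs.get("axes", "")) if a]
            return name, axes
    raise LookupError("no dataset with @signal=1 in this group")


# Toy contents shaped like the tree in the session (values are made up).
nxfile: Node = {
    "NX_class": "NXroot",
    "children": {
        "entry": {
            "NX_class": "NXentry",
            "children": {
                "data": {
                    "NX_class": "NXdata",
                    "children": {
                        "signal": {"data": [0.0, 0.5, 1.0],
                                   "attrs": {"signal": 1, "axes": "x"}},
                        "x": {"data": [0.0, 0.1, 0.2], "attrs": {}},
                    },
                },
            },
        },
    },
}

# Discover what the "file" contains, then act on every NXdata group found.
for path, kind, node in walk(nxfile):
    print(path, kind)
    if kind == "NXdata":
        signal, axes = find_plottable(node)
        print(f"  plot {signal} against {', '.join(axes)}")
```

The point of the sketch is that nothing about the file layout is hard-coded: the reader discovers groups by class and finds the plottable data from attributes, which is what lets a script cope with arbitrary NeXus contents.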

Basically, the Python interface makes the reading, writing, and  
analyses of NeXus data extremely simple at the command-line, with  
simplified constructors for NXdata objects that become workspaces for  
data analysis, including all the usual arithmetic operations, slicing  
and dicing, automatic plotting, and the ability to save to files.  
Constructors for other group classes also exist. Here is a simple  
session:

 >>> x=linspace(0,2*pi,101)
 >>> a=NXdata(sin(x),x)
 >>> print a.signal
[  0.00000000e+00   6.27905195e-02   1.25333234e-01 ...,  -1.25333234e-01
   -6.27905195e-02  -2.44929360e-16]
 >>> a.nxsave('temp.nxs')
 >>> b=nexus.load('temp.nxs')
 >>> b.nxtree()
root:NXroot
   @HDF5_Version = 1.8.2
   @NeXus_version = 4.2.0
   @file_name = temp.nxs
   @file_time = 2010-03-09T14:36:58-06:00
   entry:NXentry
     data:NXdata
       signal = float64(101)
         @axes = x
         @signal = 1
       x = float64(101)
 >>> print 2*b.entry.data.signal
[  0.00000000e+00   1.25581039e-01   2.50666467e-01 ...,  -2.50666467e-01
   -1.25581039e-01  -4.89858720e-16]
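To make the idea of an NXdata "workspace" concrete without depending on the actual Python module, here is a hypothetical, stdlib-only sketch. `ToyNXdata` is an invented name, not part of any NeXus interface. It only shows the design choice implied by the session: arithmetic acts on the signal while the axis travels with it, and slicing cuts signal and axis together so they stay aligned.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToyNXdata:
    """Hypothetical stand-in for an NXdata workspace: a 1-D signal plus its axis."""

    signal: List[float]
    x: List[float]

    def __post_init__(self) -> None:
        if len(self.signal) != len(self.x):
            raise ValueError("signal and axis must have the same length")

    def __mul__(self, factor: float) -> "ToyNXdata":
        # Scale the signal only; the axis is copied over unchanged.
        return ToyNXdata([factor * s for s in self.signal], list(self.x))

    __rmul__ = __mul__  # so that both 2 * a and a * 2 work

    def __add__(self, other: "ToyNXdata") -> "ToyNXdata":
        # Adding two workspaces only makes sense on a shared axis.
        if self.x != other.x:
            raise ValueError("axes differ; rebin before adding")
        return ToyNXdata([p + q for p, q in zip(self.signal, other.signal)], list(self.x))

    def __getitem__(self, index: slice) -> "ToyNXdata":
        # Expects a slice: cut signal and axis together so they stay aligned.
        return ToyNXdata(self.signal[index], self.x[index])


# Usage, loosely following the session above.
x = [2 * math.pi * i / 100 for i in range(101)]
a = ToyNXdata([math.sin(v) for v in x], x)
b = 2 * a       # signal doubled, axis untouched
part = b[:3]    # first three points of both signal and axis
print(part.x)
print(part.signal)
```

A real workspace would use NumPy arrays and support multidimensional signals with several axes; this sketch handles a single 1-D axis only.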

I hope that the ability for scientists to produce their own scripts  
will then be combined with the ability for computer experts to package  
these scripts into GUI applications once we know what to do with the  
data. Both can be layered on the Python interface.

Of course, I'm not suggesting that everyone should use Python, but I  
do think we need to provide simple tools to make it obvious why NeXus  
is worth using. The ability to read, plot, and manipulate arbitrary  
datasets from any type of neutron and x-ray instrument, without  
needing to ask a programmer to do it for me, has always been my dream,  
and I think it is now a reality thanks to the work of Paul and others.

With best regards,
Ray

On Mar 9, 2010, at 1:03 PM, Andy Gotz wrote:

> Hi Gerd + Joachim,
>
> I can hear lots of frustration in Joachim's email, so I thought I
> would share some of my thoughts:
>
> I fully agree with Gerd's conclusion that defining a new format is  
> not an option for me. Nexus is an example of how painful it gets  
> when defining new data formats. I don't want to go down this road  
> again and alone. One of the problems is acceptance by the community.  
> If we stick to Nexus it will eventually be adopted I am sure -  
> perseverance is a powerful ally. Until it is adopted widely I  
> understand Brian's frustration - imagine 10 years of trying to get a
> format accepted and still no success!
>
> I think that some of the reasons why Nexus has not been adopted
> are: (1) there has been too much emphasis on storing raw data;
> it would have been much more successful IMHO if Nexus had
> concentrated on analysed data. This is what the user sees and wants.
> It would have proved the added value of Nexus much more quickly. (2) lack
> of manpower working on Nexus (maybe this will change with the
> Pandata networking activity getting some funding ...). (3) lack of
> responsiveness of the Nexus developers, partly related to (2) I think
> but also (1). Recently at the Hyperspectral workshop at the ESRF we
> requested that data dimensions be added as attributes to the Nexus
> data definition. This way a program can easily identify the spectra,
> images and 3d volumes in a Nexus file. We have not had any feedback
> from the Nexus community on how long it will take to get something as
> simple and fundamental as this accepted.
>
> Of course it is easy to criticise because I was not there at the
> beginning of Nexus. But I think we need some changes in the way
> the Nexus committee works today to make it more responsive. I have
> just been replaced on the NIAC committee, so again it is easy to make
> such a suggestion!
>
> In Joachim's example I find it strange to be discussing his file
> format, which essentially comes down to storing 2 columns of data. I agree
> that in this case Nexus can be seen as overkill. But if Nexus were
> used, one of the standard Nexus tools could extract the data
> to ASCII to give immediate feedback. It also points to
> the lack of Nexus viewers. I don't know of any which will refresh
> the contents of the displayed file automatically. But the case of 2
> columns of ASCII data is not what most of us are dealing with. Most
> of us have to deal with tens of thousands to millions of images and
> data volumes (cf. Brian's 100 TB). ASCII is NOT an option. We are
> in fact solving a new problem. When Nexus was started this was not
> such a critical issue. Today it is. The hyperspectral workshop was
> proof of this. Again, what matters most is agreeing on how to
> store large volumes of data for analysis programs. Would we be
> better off adopting some other 3D data format? But what about 2D
> and 1D and the experimental / data analysis context?
>
> I would say if you want to use YAML for your raw data, that is your
> choice. But why not join the larger community in helping to define
> how to store analysed data? Then we can use your data analysis
> routines and vice versa.
>
> I think the time has never been riper for Nexus to be adopted and to
> become a real standard, thanks to the new institutes standardising
> on Nexus and some of the old ones trying to do the same. But there
> is no guarantee that Nexus will succeed. Joachim, your email is
> proof of this.
>
> The bottom line is I think we need to improve the current situation  
> together and not simply go our own way. The diversity of the  
> scientific techniques of the communities we are serving makes this a  
> non-trivial problem.
>
> Andy
>
> Gerd Wellenreuther wrote:
>> Hi Joachim,
>>
>> Wuttke, Joachim wrote:
>>> What I am attacking is not the serious work you are doing for
>>> really complicated data sets, but the idea, popular at management
>>> level, that NeXus should ultimately be used at _all_ neutron and
>>> X-ray instruments. All I want to say is: for certain types of
>>> instruments, migrating to NeXus would be a considerable effort,
>>> would rather increase than reduce the diversity of data formats,
>>> and would not improve the messy state of software seen by the users.
>> There are a lot of issues which can be tackled by the use of a common
>> data format, not only simple data exchange, but also software
>> development, data archiving, etc. For this reason, the aim to
>> define and spread the use of such a common data format is good,
>> IMHO. But when it comes to *implementing* this data format at
>> individual beamlines / instruments, the corresponding work should
>> *not* be done by single beamline scientists, or even single
>> institutes - this would again result in different flavours of what
>> was supposed to be the common data format (as can be seen from the
>> struggle of facilities like Soleil and Diamond, both writing NeXus
>> files but not being able to use the software(s) developed on the
>> other side of the Channel). So, if we do it right, the single
>> beamline scientist should not be required to bear the biggest part
>> of the workload - the opposite should be the case. In the long run,
>> we want to save time, right?
>>
>> So the goal has to be to define a common data format *ALONG* with a  
>> high-level API (much higher than the present NeXus-API in my  
>> opinion) which assists IT-guys and scientists from different  
>> facilities to exploit the capabilities of NeXus, *AND* further  
>> tools for the scientists to do their job. Of course, if the  
>> management thinks this is a good idea, they have to fund people to  
>> implement it :).
>>
>> Sure, NeXus is not the perfect candidate for such a common data  
>> format. Fact is also: There is no other candidate. (At this point  
>> you might notice: I am not considering starting from scratch as an  
>> option :) .)
>>
>> Cheers, Gerd
>> _______________________________________________
>> NeXus mailing list
>> NeXus at nexusformat.org
>> http://lists.nexusformat.org/mailman/listinfo/nexus
>

-- 
Ray Osborn
Materials Science Division
Argonne National Laboratory
Argonne, IL 60439, USA
Phone: +1 (630) 252-9011
Email: ROsborn at anl.gov




