[Nexus] Switching to a debate about NeXus + Python

Wellenreuther, Gerd gerd.wellenreuther at desy.de
Wed Mar 10 07:48:36 GMT 2010


Hi Ray,

I would like to talk about that module Paul Kienzle produced in more 
detail. The reason: a lot of scientists consider Python a very good 
environment for scripting and programming, and having a top-level 
Python module supporting NeXus would be a very good thing.

Recently, I have tried phynx, written by Darren Dale, which already 
provides support for NeXus. I know of h5py and PyTables, which are 
capable of writing HDF5 files directly, but I would like to use a module 
which itself does most of the tedious and repetitive setting of the 
NeXus internals like NXclass, signals, axes etc.

I personally would also like to separate two tasks: defining where each 
piece of data has to go in the NeXus file at acquisition time, and 
actually writing the file. E.g. I built another module on top of 
phynx (which is itself built on top of h5py; Darren, please correct me). 
The user or the data acquisition program just hands over the data along 
with the name of the device that produced it. The module looks up where 
this data should go in the NeXus tree, and caches it until it is told to 
write it.

*This* is an example of how I think a scientist would like to write and 
read data today: just give the data to an instance which looks up 
the standard way to read/write this data, and which keeps track of 
all the small details. Maybe it could also add some logbook information 
about when that happened etc. ...
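
To make the hand-over-and-cache idea above a bit more concrete, here is a 
minimal sketch in plain Python. It is not the actual module: the class 
name NexusRouter, the methods put() and flush(), the device names and the 
paths are all hypothetical, and a nested dict stands in for the NeXus 
file so that the example does not depend on phynx or h5py.

```python
from datetime import datetime, timezone


class NexusRouter:
    """Cache data per device and file it under a NeXus-style path on flush.

    Hypothetical illustration only: names and paths are assumptions,
    not the API of phynx, h5py or any existing module.
    """

    def __init__(self, layout):
        # layout maps a device name to the path its data gets in the tree
        self.layout = dict(layout)
        self.cache = []    # (path, data) pairs waiting to be written
        self.logbook = []  # (timestamp, device, path) entries

    def put(self, device, data):
        """Accept data from a device, look up its destination, cache it."""
        if device not in self.layout:
            raise KeyError("no NeXus location configured for %r" % device)
        path = self.layout[device]
        self.cache.append((path, data))
        # Record when the data arrived (the "logbook information")
        stamp = datetime.now(timezone.utc).isoformat()
        self.logbook.append((stamp, device, path))

    def flush(self, tree=None):
        """Write all cached data into a nested dict standing in for the file."""
        tree = {} if tree is None else tree
        for path, data in self.cache:
            *groups, name = path.strip("/").split("/")
            node = tree
            for group in groups:
                node = node.setdefault(group, {})
            node.setdefault(name, []).append(data)
        self.cache.clear()
        return tree


# Assumed example layout: which device goes where in the tree
layout = {
    "ion_chamber": "/entry/instrument/monitor/data",
    "fluo_detector": "/entry/instrument/detector/data",
}
router = NexusRouter(layout)
router.put("ion_chamber", 1.02e5)
router.put("fluo_detector", [3, 7, 11])
router.put("ion_chamber", 1.01e5)
tree = router.flush()  # nothing is "written" before this call
```

The point of the design is that the acquisition code only ever calls 
put(device, data); the layout table is the single place that knows the 
tree structure, so it could be shared between beamlines. In a real 
module flush() would write through phynx or h5py and set NXclass, signal 
and axes attributes instead of filling a dict.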

How far advanced is that module you mentioned, and what is available for 
it currently?

Cheers, Gerd

Ray Osborn wrote:
>  A while ago,
> Paul Kienzle produced a beautiful mapping of the basic API onto Python 
> using Numpy, and a higher-level interface that reads the whole NeXus 
> file into a Python tree structure. We have been building on that base 
> for a few months now and should try to release something to the 
> contributed section fairly soon. Python, with its relaxed approach to 
> object-oriented programming, is ideal for coping with the arbitrary data 
> that might be contained within a NeXus file. Then, perhaps, we won't be 
> so hung up on the precise definition of what is contained in each file. 
> NeXus is self-describing so you can interrogate the file to discover its 
> contents and act accordingly.
> 
> Basically, the Python interface makes the reading, writing, and analysis 
> of NeXus data extremely simple at the command-line, with simplified 
> constructors for NXdata objects that become workspaces for data 
> analysis, including all the usual arithmetic operations, slicing and 
> dicing, automatic plotting, and the ability to save to files. 
> Constructors for other group classes also exist. Here is a simple session:
> 
>  >>> x=linspace(0,2*pi,101)
>  >>> a=NXdata(sin(x),x)
>  >>> print a.signal
> [  0.00000000e+00   6.27905195e-02   1.25333234e-01 ...,  -1.25333234e-01
>   -6.27905195e-02  -2.44929360e-16]
>  >>> a.nxsave('temp.nxs')
>  >>> b=nexus.load('temp.nxs')
>  >>> b.nxtree()
> root:NXroot
>   @HDF5_Version = 1.8.2
>   @NeXus_version = 4.2.0
>   @file_name = temp.nxs
>   @file_time = 2010-03-09T14:36:58-06:00
>   entry:NXentry
>     data:NXdata
>       signal = float64(101)
>         @axes = x
>         @signal = 1
>       x = float64(101)
>  >>> print 2*b.entry.data.signal
> [  0.00000000e+00   1.25581039e-01   2.50666467e-01 ...,  -2.50666467e-01
>   -1.25581039e-01  -4.89858720e-16]
> 
> I hope that the ability for scientists to produce their own scripts will 
> then be combined with the ability for computer experts to package these 
> scripts into GUI applications once we know what to do with the data. 
> Both can be layered on the Python interface.
> 
> Of course, I'm not suggesting that everyone should use Python, but I do 
> think we need to provide simple tools to make it obvious why NeXus is 
> worth using. The ability to read, plot, and manipulate arbitrary 
> datasets from any type of neutron and x-ray instrument, without needing 
> to ask a programmer to do it for me, has always been my dream, and I 
> think is now a reality thanks to the work of Paul and others.
> 
> With best regards,
> Ray
> 
> On Mar 9, 2010, at 1:03 PM, Andy Gotz wrote:
> 
>> Hi Gerd + Joachim,
>>
>> I can hear lots of frustration in Joachim's email so I thought I would 
>> share some of my thoughts :
>>
>> I fully agree with Gerd's conclusion that defining a new format is not 
>> an option for me. Nexus is an example of how painful it gets when 
>> defining new data formats. I don't want to go down this road again and 
>> alone. One of the problems is acceptance by the community. If we stick 
>> to Nexus it will eventually be adopted I am sure - perseverance is a 
>> powerful ally. Until it is adopted widely I understand Brian's 
>> frustration - imagine 10 years of trying to get a format accepted and 
>> still no success !
>>
>> I think that some of the reasons why Nexus has not been adopted are : 
>> (1) there has been too much emphasis on storing of raw data, it would 
>> have been much more successful IMHO if Nexus had concentrated on 
>> analysed data. This is what the user sees and wants. It would have 
>> proved the added value of Nexus much quicker. (2) lack of manpower 
>> working on Nexus (maybe this will change with the Pandata networking 
>> activity getting some funding ...). (3) lack of reactivity of the 
>> Nexus developers, partly related to (2) I think but also (1). Recently 
>> at the Hyperspectral workshop at the ESRF we requested that data 
>> dimensions be added as attributes to the Nexus data definition. This 
>> way a program can easily identify the spectra, images and 3d volumes 
>> in a Nexus file. We do not have any feedback from the Nexus community 
>> how long it will take to get something as simple and fundamental as 
>> this to be accepted.
>>
>> Of course it is easy to criticise because I was not there in the 
>> beginning of the Nexus. But I think we need some changes in the way 
>> the Nexus committee works today to make it more reactive. I have just 
>> been replaced on the NIAC committee so again it is easy to make such a 
>> suggestion !
>>
>> In Joachim's example I find it strange to be discussing his file 
>> format, which is essentially how to store 2 columns of data. I agree 
>> that in this case Nexus can be seen as overkill. But if Nexus were 
>> used, one could also use one of the standard Nexus tools to extract 
>> the data to ASCII to get the immediate feedback. It also points a finger to the 
>> lack of Nexus viewers. I don't know of any which will refresh the 
>> contents of the displayed file automatically. But the case of 2 
>> columns of ascii data is not what most of us are dealing with. Most of 
>> us have to deal with tens of thousands and millions of images and data 
>> volumes (cf. Brian's 100 TB's). ASCII is NOT an option. We are solving 
>> a new problem in fact. When Nexus was started this was not such a 
>> critical issue. Today it is. The hyperspectral workshop was proof of 
>> this. Again it is mainly interesting to agree on how to store large 
>> volumes of data for analysis programs. Would we be better off adopting 
>> some other 3D data format ? But what about 2D and 1D and the 
>> experimental / data analysis context ?
>>
>> I would say if you want to use YAML for your raw data that is your 
>> choice. But why not join the larger community to help define how 
>> to store analysed data. Then we can use your data analysis routines 
>> and vice versa.
>>
>> I think the time has never been riper for Nexus to be adopted and to 
>> become a real standard. Thanks to the new institutes standardising on 
>> Nexus and some of the old ones trying to do the same. But there is no 
>> guarantee that Nexus will succeed. Joachim, your email is proof of this.
>>
>> The bottom line is I think we need to improve the current situation 
>> together and not simply go our own way. The diversity of the 
>> scientific techniques of the communities we are serving makes this a 
>> non-trivial problem.
>>
>> Andy
>>
>> Gerd Wellenreuther wrote:
>>> Hi Joachim,
>>>
>>> Wuttke, Joachim schrieb:
>>>> What I am attacking is not the
>>>> serious work you are doing for really complicated data set, but the 
>>>> idea, popular
>>>> at management level, that NeXus should ultimately be used at _all_ 
>>>> neutron and
>>>> X-ray instruments. All I want to say is: for certain types of 
>>>> instruments, migrating
>>>> to NeXus would be considerable effort, would rather increase than 
>>>> reduce the
>>>> diversity of data formats, would not improve the messy state of 
>>>> software seen by
>>>> the users.
>>> There are a lot of issues which can be tackled by the use of a common 
>>> data format, not only simple data exchange, but also software 
>>> development, data archiving etc. For this reason, the aim to define 
>>> and spread the use of such a common data format is good, IMHO. But 
>>> when it comes to *implementing* this data format at individual 
>>> beamlines / instruments, the corresponding work should *not* be done 
>>> by single beamline scientists, or even single institutes - this would 
>>> again result in different flavours of what was supposed to be the 
>>> common data format (as can be seen by the struggle of facilities like 
>>> Soleil and Diamond, both writing NeXus-files but not being able to 
>>> use the software(s) developed at the other side of the channel). So, 
>>> if we do it right, the single beamline scientist should not be 
>>> required to bear the biggest part of the workload - the opposite 
>>> should be the case. In the long run, we want to save time, right?
>>>
>>> So the goal has to be to define a common data format *ALONG* with a 
>>> high-level API (much higher than the present NeXus-API in my opinion) 
>>> which assists IT-guys and scientists from different facilities to 
>>> exploit the capabilities of NeXus, *AND* further tools for the 
>>> scientists to do their job. Of course, if the management thinks this 
>>> is a good idea, they have to fund people to implement it :).
>>>
>>> Sure, NeXus is not the perfect candidate for such a common data 
>>> format. Fact is also: There is no other candidate. (At this point you 
>>> might notice: I am not considering starting from scratch as an option 
>>> :) .)
>>>
>>> Cheers, Gerd
>>> _______________________________________________
>>> NeXus mailing list
>>> NeXus at nexusformat.org
>>> http://lists.nexusformat.org/mailman/listinfo/nexus
>>
> 

-- 
Dr. Gerd Wellenreuther
beamline scientist P06 "Hard X-Ray Micro/Nano-Probe"
Petra III project
HASYLAB at DESY
Notkestr. 85
22603 Hamburg

Tel.: + 49 40 8998 5701

