[Nexus-developers] Python Tree API

Wed Aug 29 08:59:59 BST 2012

******************
Sorry if this message is duplicated, the list reported a delivery error and I 
haven't seen the message appear afterwards, so I am resending.
******************

Hi,
I am replying to this old thread because I have only now had a chance to play
with the tree api (using the NeXus 4.3 release).

First of all, I love the tree api. It makes life *so* much easier than the
napi...

Here are some comments/questions/suggestions I have so far:

1- The docstring of the nxs module should mention the napi and the tree api,
possibly with a summary of both that states their differences and purposes.
This way a user doing help(nxs) would learn where to start looking for help. It
could also give a simple example of how to open a NeXus file using each API.
And it should at least mention the relevant classes whose docstrings may help
the user in understanding the tree api (e.g., it should recommend reading the
NXgroup and NXfield docstrings).

2- I think that even more stress should be put on recommending the dictionary
approach to accessing data (over the member approach) for anything other than
interactive use. Although this is said several times, many of the examples
present both methods as "equivalent", or even use only the member access.
This invites users to rely on member access in scripts and then get annoyed
when it fails them.
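
To make the risk concrete, here is a minimal sketch of how a name clash can
arise. The Group class below is a hypothetical stand-in written only for this
illustration (it is not the real nxs.NXgroup, and the actual implementation
may well differ). It just assumes, as Ray's description further down suggests,
that children live in an 'entries' dictionary and that attribute access falls
back to that dictionary.

```python
class Group(object):
    """Hypothetical stand-in for a tree-API group (NOT the real nxs.NXgroup)."""

    def __init__(self, name):
        # Bypass our own __setattr__ so these become real Python attributes.
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "entries", {})

    def __getitem__(self, key):
        return self.entries[key]

    def __setitem__(self, key, value):
        self.entries[key] = value

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self.entries[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        # Member-style assignment stores a child entry.
        self.entries[key] = value


sample = Group("sample")
sample.temperature = 40.0      # convenient, and works as expected
sample.name = "my crystal"     # stored as a child called 'name' ...

by_member = sample.name        # ... but this returns the internal attribute
by_key = sample["name"]        # dictionary access is unambiguous
```

In this sketch by_member is 'sample' while by_key is 'my crystal', which is
exactly the kind of surprise that dictionary access avoids in scripts.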

3- I find the information in the docstrings of the NXgroup.save (and .write)
methods insufficient. For example, if I want to save to a file incrementally, I
could start by (e.g.) calling save on an empty NXentry, then insert
groups/fields into that NXentry, and then call .write() on the NXentry when I
want to make the changes permanent. But it is not clear to me from the
docstrings whether this will waste time rewriting already written groups/data.

4- How do I make a named link? I would expect the NXgroup.makelink() method to
accept a keyword argument for this.

5- Is the "name" keyword of the NXgroup.insert() method actually honored? 
Consider the following code:

++++++++++++++++
import nxs
e = nxs.NXentry()
s = nxs.NXsample(name='some_name')
e.insert(s, name='other_name')
print e.tree  

#this outputs:
#
#entry:NXentry
#  some_name:NXsample
++++++++++++++++

I would expect the NXsample group to be called 'other_name'. This is
what I understand from the following passage in the NXgroup docstring:
> insert(self, NXobject, name='unknown'):
>     Insert a valid NXobject (NXfield or NXgroup) into the group.
>     If NXobject has a 'name' attribute and the 'name' keyword is not given,
>     then the object is inserted with the NXobject name.
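
The behaviour I would expect from that docstring can be sketched as follows.
SketchNode and SketchGroup are hypothetical classes invented for this
illustration, not the nxs classes; the point is only that an explicit 'name'
keyword should win over the object's own name.

```python
class SketchNode(object):
    """Hypothetical named object (stand-in for an NXfield or NXgroup)."""

    def __init__(self, name=None):
        self.name = name


class SketchGroup(SketchNode):
    """Hypothetical group that keeps its children in an 'entries' dict."""

    def __init__(self, name=None):
        SketchNode.__init__(self, name)
        self.entries = {}

    def insert(self, obj, name=None):
        # An explicit keyword wins; otherwise fall back to the object's name.
        if name is None:
            name = getattr(obj, "name", None)
        if name is None:
            raise ValueError("no name given and the object has none")
        obj.name = name
        self.entries[name] = obj
        return obj


entry = SketchGroup("entry")
sample = SketchNode("some_name")
entry.insert(sample, name="other_name")   # keyword given: it is honored

other = SketchNode("kept_name")
entry.insert(other)                       # no keyword: object's name is used
```

After these calls the sketch holds the children 'other_name' and 'kept_name',
and never 'some_name'.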

6- I am missing an example of how to write slabs of data, and some comment
about how to do it efficiently. For example, what would be the most efficient
way of implementing the following napi code using the tree api? (Specifically,
how can I keep the handles for both fields open using the "with" statement?):

+++++++++++++++++++++
import nxs, numpy
xarray = numpy.arange(5)
yarray = xarray**2
# create the file and the entry/data group hierarchy
f = nxs.open('test.nex', 'w5')
f.makegroup('entry', 'NXentry')
f.opengroup('entry')
f.makegroup('data', 'NXdata')
f.opengroup('data')
# two extensible 1-D fields
f.makedata('x',dtype='int64', shape=[nxs.UNLIMITED])
f.makedata('y',dtype='int64', shape=[nxs.UNLIMITED])
# write one element per iteration, reopening each field every time
for i,(sx,sy) in enumerate(zip(xarray, yarray)):
    f.opendata('x')
    f.putslab([sx], i, [1])
    f.closedata()
    f.opendata('y')
    f.putslab([sy], i, [1])
    f.closedata()
f.close()
+++++++++++++++++++++

7- Related to the previous question: it would be very useful if the docstrings
of NXfield.put and .get provided a code example. Also, what does the "refresh"
keyword argument do in the put method?

8- Some methods (such as NXfield.put or NXgroup.makelink) seem to require that
the group/field has already been saved to a file. This should be explained in
the documentation of those methods.

9- Finally, a suggestion: if the matplotlib plot support is to be kept (I
remember having read somewhere that it might be dropped?), then I suggest
restricting the dependency to just the matplotlib.pyplot submodule instead of
using the whole pylab module (I am not sure, but I think that everything the
tree api currently uses from matplotlib actually comes from the pyplot
submodule). That way it could easily be made compatible with the guiqwt
library as well, which provides its own compatible implementation of pyplot
(guiqwt.pyplot):
http://packages.python.org/guiqwt/reference.html
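
As a rough sketch of the idea (the function name load_pyplot and the fallback
order are my own assumptions, not anything in the current tree api), the
plotting code could look up a pyplot-compatible module at run time instead of
importing pylab:

```python
import importlib


def load_pyplot(candidates=("matplotlib.pyplot", "guiqwt.pyplot")):
    """Return the first module in 'candidates' that can be imported.

    Hypothetical helper: the name and the default order are assumptions
    made for this illustration only.
    """
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            # Not installed (or failed to import): try the next candidate.
            continue
    raise ImportError("none of %s could be imported" % (candidates,))
```

The plotting code would then do something like plt = load_pyplot() and
restrict itself to functions that both modules provide.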

On Wed 9 November 2011 21:29:56 Ray Osborn wrote:
> Dear colleagues,
> I have been working extensively on the Python tree API following the recent
> NeXus code camp at Argonne. The work involves some major structural changes
> that I think make the code more robust and convenient, so it is time to
> get some feedback from other NeXus developers. There are two issues - one
> is whether these changes are the right way to go and the other is how
> backwardly compatible this needs to be. The Python port of NAPI has not
> been changed, so I am only talking about tree.py, which I suspect has not
> been used that much since it is still not officially a part of the NeXus
> distribution. Python is flexible enough that it may be possible to map
> much of the old API onto the new one, but I only want to do that if it is
> really necessary. However, because the changes are so substantial, I have
> not loaded this on to the Subversion server. If I get enough encouragement
> to do this, I am happy to do so. In the meantime, I attach a tar file with
> the nexus subdirectory. I think that if NEXUSLIB is defined then 'import
> nexus' should work if you put the package in your site-packages directory.
> 
> I don't know the schedule for the next NeXus release, but I suspect that we
> won't be able to get this in, but I would really appreciate it if you can
> find the time to check this out, and send some feedback. We may need to
> check for some backward compatibility issues, although Paul and I thought
> it unlikely that many people have been using the tree part of the API. The
> napi part I haven't touched.
> 
> I have tried to document the changes in the docstrings, so the simplest
> thing is probably to read what I have written particularly for the NXfield
> (formerly SDS) and NXgroup classes. I would be interested how the
> documentation looks in doxygen.
> 
> To summarize, the main change is to explicitly place the NeXus items
> (NXfields or NXgroups) within each group into a dictionary named
> 'entries', but to use the __getattr__ and __setattr__ methods to preserve
> the convenience of the direct syntax.
> 
> i.e., in the example below, after assigning the NXgroup, the following three
> NeXus object assignments are all equivalent:
> >>> entry.sample = NXsample()
> >>> entry.sample.entries['temperature'] = NXfield(40.0,name='temperature')
> >>> entry.sample.temperature = NXfield(40.0)
> >>> entry.sample.temperature = 40.0
> 
> There are a few internal Python attributes that have special meaning, e.g.,
> 'name', 'group', 'path', so if a NeXus NXfield needs to be called 'group',
> then it has to be entered into the 'entries' dictionary explicitly (as in the
> first example above) or using the insert method:
> >>> entry.sample.insert(NXfield(40.0,name='temperature'))
> 
> This saves typing the name twice. The number of such attributes is pretty
> small - at the moment, they are 'name', 'group', 'entries', 'attrs',
> 'dtype','shape', 'link', 'path', and 'head'. I don't think I have seen any
> of those defined within a NeXus NDL file (I'm not sure about 'shape'), but
> it gives us a way of coping with them if there is a name clash.
> 
> This means that we can relax the requirement that all the methods start
> with 'nx'. Now, we have the 'tree' method, instead of 'nxtree'. Only
> 'nxdata' and 'nxaxes' are kept, because they are both common names used in
> NeXus files.
> 
> In particular, the NXfield attributes can have the same name as the
> standard Numpy attributes (e.g., 'reshape', not 'nxreshape').
> 
> Furthermore, we can make it so that all the Numpy ndarray attributes work
> as well, using the __getattr__ again. So now, we can use them like this.
> 
> >>> x=NXfield(np.array([1.0,2.0,3.0,4.0]),name='x')
> >>> x.size
>        4
> >>> x.sum()
>        10.0
> >>> x.max()
>        4.0
> >>> x.mean()
>        2.5
> 
> Paul also showed me how to cast NXfields as ndarrays, using the __array__
> method, so we can also do other operations.
> 
> >>> np.sin(x)
> 
>        array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ])
> 
> I think this gets as close as possible to making NXfields true subclasses
> of ndarrays, without major surgery.
> 
> I have done the same trick with NeXus attributes, which are stored in the
> 'attrs' dictionary, with class NXattr. This makes NXgroup attributes work
> the same way as NXfields. In both cases, I think we could recommend that
> anyone writing scripts should use the wordier dictionary assignments to be
> completely safe. The shorter versions are to make it more convenient and
> intuitive for people in an interactive session, and are important, in my
> opinion, to people finding this attractive to use.
> 
> The docstrings are quite extensive, although they are not complete. Let me
> know if there are things that really need further explanation.
> 
> Thanks,
> Ray

-- 
+----------------------------------------------------+
 Carlos Pascual Izarra
 Scientific Software Coordinator
 Computing Division
 Cells / Alba Synchrotron  [http:/www.cells.es]
 Carretera BP 1413 de Cerdanyola-Sant Cugat, Km. 3.3
 E-08290 Cerdanyola del Valles (Barcelona), Spain
 E-mail: cpascual at cells.es
 Phone: +34 93 592 4428
+----------------------------------------------------+