[Nexus-developers] Memory use during putslab and getslab

Mark Koennecke Mark.Koennecke at psi.ch
Thu Mar 17 07:31:56 GMT 2011


Ray,

Ray Osborn wrote:
> We are trying to use the putslab and getslab modules in Paul Kienzle's Python interface in order to avoid memory consumption, but we observe memory spikes during the operation itself. I'm afraid that I can't tell at this stage whether it's caused by the Python interface or the underlying HDF5 C-interface, so I would welcome some comments.
>
> Basically, I am allocating an SDS with the right type and shape but no values and saving it as a NeXus file. This seems to work - the file is still small and there are no memory spikes.
>
>   
> >>> entry=SDS(name='data', dtype='float32', shape=[200000000])
> >>> entry.nxsave('tmp.nxs')
> >>> entry.data.nxget(1,2)
> array([ 0.,  0.], dtype=float32)
>
> If you are familiar with the Python interface, you will know that a.nxdata is always None, i.e., the data array is never allocated, but the nxget module, which just calls the C getslab routine, returns an array filled with zeros, presumably the HDF5 default.
>
> Then I try to write a slab.
>
>   
> >>> slab=np.array((1.0,2.0),dtype='float32')
> >>> entry.data.nxput(slab,1)
> >>> entry.data.nxget(1,2)
> array([ 1.,  2.], dtype=float32)
>
> This time, both the nxput and nxget calls produce a memory spike of over 800MB, the size of the allocated data, before returning to normal.
>
> Does anybody know why the whole array gets allocated temporarily, and is there a way of avoiding this? Obviously, if we are talking about a 50GB array, then this is likely to produce a memory error (although I haven't tested that yet). 
>
>   
I am missing one important morsel of information: when creating the
array, what do you specify as the chunk size?
The recommendation is to set the chunk size to approximately the size of
the slabs you are going to read and write.
If the whole dataset is just one chunk, this would explain your
observations.
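
To put rough numbers on this, here is a small back-of-the-envelope sketch. It
is only a model: it assumes that every chunk overlapping a slab is brought
into memory whole, and the helper slab_chunk_bytes is a hypothetical name made
up for illustration, not part of the NeXus API or of HDF5.

```python
# Rough model only: assumes each chunk that overlaps the requested slab is
# held in memory as a whole. slab_chunk_bytes is a hypothetical helper,
# not a NeXus or HDF5 function.

ITEMSIZE = 4                # bytes per float32 element
TOTAL_LEN = 200_000_000     # dataset length from the example above


def slab_chunk_bytes(start, count, chunk_len, itemsize=ITEMSIZE):
    """Bytes of all chunks overlapping the 1-D slab [start, start + count)."""
    if count <= 0:
        return 0
    first_chunk = start // chunk_len
    last_chunk = (start + count - 1) // chunk_len
    n_chunks = last_chunk - first_chunk + 1
    return n_chunks * chunk_len * itemsize


# Whole dataset stored as a single chunk: a 2-element slab touches everything.
one_chunk = slab_chunk_bytes(start=1, count=2, chunk_len=TOTAL_LEN)

# Chunk length close to the slab size (1024 elements is an arbitrary choice).
small_chunk = slab_chunk_bytes(start=1, count=2, chunk_len=1024)

print("single chunk:", one_chunk, "bytes")
print("1024-element chunks:", small_chunk, "bytes")
```

Under that assumption, the single-chunk case comes out at 800,000,000 bytes
for a two-element slab, which is consistent with the spike you report, while
1024-element chunks would need only 4096 bytes for the same slab.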

> Also, is there a way to make the undefined data get returned as NaNs? Presumably, HDF5 is returning a default value when it finds no stored data, so can that default value be changed, and could this be an option in the NeXus API?
>
>   
We have to look into this...

Mark

> Any comments welcome.
>
> Thanks,
> Ray
>   