[Nexus] [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
dmh at ucar.edu
Thu Apr 21 23:02:20 BST 2016
If you have hdf5 files that should be readable, then I will undertake to
look at them and see what the problem is.
WRT old files: we could produce a utility that would redef the file
and insert the _NCProperties attribute. This would allow someone to
mark old files wholesale.
=Dennis Heimbigner
Unidata
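
A minimal sketch of how a reader might pick apart the _NCProperties value once it has been read from the root group, assuming the "key=value|key=value" form quoted below stays as it is. The helper name ncprops_get is hypothetical and is not part of the netCDF API; reading the attribute itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the netCDF API: look up one key in a
 * "key=value|key=value" string and copy its value into out.
 * Returns 1 if the key was found and the value fit, 0 otherwise. */
int ncprops_get(const char *props, const char *key, char *out, size_t outlen)
{
    assert(props != NULL && key != NULL && out != NULL && outlen > 0);
    size_t keylen = strlen(key);
    const char *p = props;

    while (*p != '\0') {
        /* Each field runs up to the next '|' or to the end of the string. */
        const char *end = strchr(p, '|');
        size_t fieldlen = end ? (size_t)(end - p) : strlen(p);

        /* The field matches only if it is exactly "key=" plus a value. */
        if (fieldlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vallen = fieldlen - keylen - 1;
            if (vallen >= outlen)
                return 0;               /* value would not fit in out */
            memcpy(out, p + keylen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        if (end == NULL)
            break;                      /* last field checked */
        p = end + 1;
    }
    return 0;
}
```

Calling ncprops_get(value, "hdflibversion", buf, sizeof buf) on a value such as "version=1|netcdflibversion=4.4.1-rc1|hdflibversion=1.8.18" would leave "1.8.18" in buf. The "version=1" part of that sample is an assumption; only the two library version strings come from the thread.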
On 4/21/2016 2:17 PM, Pedro Vicente wrote:
> Dennis
>
>> I am in the process of adding a global attribute in the root group
>> that captures both the netcdf library version and the hdf5 library
>> version whenever a netcdf file is created. The current form is
>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>
>
> ok, good to know, thank you
>
>
>> 1. I am open to suggestions about changing the format or adding
>> info to it.
>
>
> I personally don't care, anything that uniquely identifies a netCDF
> file (HDF5 based) as such will work
>
>
>> 2. Of course this attribute will not exist in files written using
>> older versions of the netcdf library, but at least the process will
>> have begun.
>
> yes
>
>
>> 3. This technically does not address the original issue because there
>> exist
>> hdf5 files not written by netcdf that are still compatible with
>> and can be
>> read by netcdf. Not sure this case is important or not.
>
> there will always be HDF5 files not written by netcdf that netCDF
> will read as we are now.
>
> this is not really the issue, but you just made a further issue :-)
>
> the issue is that I would like an application that reads a netCDF
> (HDF5 based) file to be able to decide whether to use the netCDF or
> HDF5 API.
> your attribute writing will do, for future files.
> for older netCDF files there may be a way to inspect the current
> attributes and data structures to see if we can make the file
> "identify itself" as netCDF. A bit of debugging will confirm that;
> since Dimension Scales are used, that would be an (imperfect maybe)
> way to do it
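
A rough sketch of the decision described above, assuming the application has already listed the attribute names on the root group by some other means. The function and type names are hypothetical. "_NCProperties" is the attribute proposed in this thread and "_nc3_strict" is the classic-model marker mentioned further down; files with neither marker fall through to the HDF5 API, so older netCDF files are still not detected by this check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { USE_HDF5_API, USE_NETCDF_API } reader_api;

/* Hypothetical decision helper: root_attrs holds the names of the
 * attributes found on the root group, n is how many there are. */
reader_api choose_api(const char *const *root_attrs, size_t n)
{
    assert(root_attrs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Marker proposed in this thread for newly written files. */
        if (strcmp(root_attrs[i], "_NCProperties") == 0)
            return USE_NETCDF_API;
        /* Marker reported in the thread for classic-model files. */
        if (strcmp(root_attrs[i], "_nc3_strict") == 0)
            return USE_NETCDF_API;
    }
    /* No marker found: treat the file as plain HDF5. */
    return USE_HDF5_API;
}
```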
>
> regarding the "further issue " above
>
> you could go one step further and, for any HDF5 files not written by
> netcdf, you could make netCDF refuse to read the file,
> because it's not "netCDF compliant".
> Since having netCDF read pure HDF5 files is not a problem (at least
> for me), I don't know if you would want to do this, just an idea.
> In my mind taking complexity and ambiguities out of problems is always a
> good thing
>
>
> ah, I forgot one thing, related to this
>
>
> In the past I have found several pure HDF5 files that netCDF failed to
> read.
> Since netCDF is HDF5 binary compatible, one would expect that all HDF5
> files will be read by netCDF,
> except if you specifically wrote something in the code that makes it
> fail if some condition is not met.
> This was a while ago, I'll try to find those cases and I'll send a bug
> report to the bug report email
>
> ----------------------
> Pedro Vicente
> pedro.vicente at space-research.org
> https://twitter.com/_pedro__vicente
> http://www.space-research.org/
>
> ----- Original Message ----- From: <dmh at ucar.edu>
> To: "Pedro Vicente" <pedro.vicente at space-research.org>; "HDF Users
> Discussion List" <hdf-forum at lists.hdfgroup.org>;
> <cf-metadata at cgd.ucar.edu>; "Discussion forum for the NeXus data
> format" <nexus at nexusformat.org>; <netcdfgroup at unidata.ucar.edu>
> Cc: "John Shalf" <jshalf at lbl.gov>; <Richard.E.Ullman at nasa.gov>;
> "Marinelli, Daniel J. (GSFC-5810)" <daniel.j.marinelli at nasa.gov>;
> "Miller, Mark C." <miller86 at llnl.gov>
> Sent: Thursday, April 21, 2016 2:30 PM
> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5
> -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>
>
>> I am in the process of adding a global attribute in the root group
>> that captures both the netcdf library version and the hdf5 library
>> version
>> whenever a netcdf file is created. The current form is
>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>> Where version is the version of the _NCProperties attribute and the
>> others
>> are e.g. 1.8.18 or 4.4.1-rc1.
>> Issues:
>> 1. I am open to suggestions about changing the format or adding info
>> to it.
>> 2. Of course this attribute will not exist in files written using
>> older versions
>> of the netcdf library, but at least the process will have begun.
>> 3. This technically does not address the original issue because there
>> exist
>> hdf5 files not written by netcdf that are still compatible with
>> and can be
>> read by netcdf. Not sure this case is important or not.
>> =Dennis Heimbigner
>> Unidata
>>
>>
>> On 4/21/2016 9:33 AM, Pedro Vicente wrote:
>>> DETECTING HDF5 VERSUS NETCDF GENERATED FILES
>>> REQUEST FOR COMMENTS
>>> AUTHOR: Pedro Vicente
>>>
>>> AUDIENCE:
>>> 1) HDF, netcdf developers,
>>> Ed Hartnett
>>> Kent Yang
>>> 2) HDF, netcdf users, that replied to this thread
>>> Miller, Mark C.
>>> John Shalf
>>> 3 ) netcdf tools developers
>>> Mary Haley , NCL
>>> 4) HDF, netcdf managers and sponsors
>>> David Pearah , CEO HDF Group
>>> Ward Fisher, UCAR
>>> Marinelli, Daniel J. , Richard Ullman, Christopher Lynnes, NASA
>>> 5)
>>> [CF-metadata] list
>>> After this thread started 2 months ago, there was an announcement on
>>> the [CF-metadata] mail list
>>> about
>>> "a meeting to discuss current and future netCDF-CF efforts and
>>> directions.
>>> The meeting will be held on 24-26 May 2016 in Boulder, CO, USA at
>>> the UCAR Center Green facility."
>>> This would be a good topic to put on the agenda, maybe?
>>> THE PROBLEM:
>>> Currently it is impossible to detect if an HDF5 file was generated
>>> by the HDF5 API or by the netCDF API.
>>> See previous email about the reasons why.
>>> WHY THIS MATTERS:
>>> Software applications that need to handle both netCDF and HDF5 files
>>> cannot decide which API to use.
>>> This includes popular visualization tools like IDL, Matlab, NCL, HDF
>>> Explorer.
>>> SOLUTIONS PROPOSED: 2
>>> SOLUTION 1: Add a flag to HDF5 source
>>> The hdf5 format specification, listed here
>>> https://www.hdfgroup.org/HDF5/doc/H5.format.html
>>> describes a sequence of bytes in the file layout that have special
>>> meaning for the HDF5 API. It is common practice, when designing a
>>> data format,
>>> to leave some fields "reserved for future use".
>>> This solution makes use of one of these empty "reserved for future
>>> use" spaces to save a byte (for example) that describes an enumerator
>>> of "HDF5 compatible formats".
>>> An "HDF5 compatible format" is a data format that uses the HDF5 API
>>> at a lower level (usually hidden from the user of the upper API),
>>> and provides its own API.
>>> This category can still be divided into 2 formats:
>>> 1) A "pure HDF5 compatible format". Example, NeXus
>>> http://www.nexusformat.org/
>>> NeXus just writes some metadata (attributes) on top of the HDF5 API,
>>> that has some special meaning for the NeXus community
>>> 2) A "non pure HDF5 compatible format". Example, netCDF
>>> Here, the format adds some extra feature besides HDF5. In the case
>>> of netCDF, these are shared dimensions between variables.
>>> This sub-division between 1) and 2) is irrelevant for the problem
>>> and solution in question
>>> The solution consists of writing a different enumerator value on the
>>> "reserved for future use" space. For example
>>> Value decimal 0 (current value): This file was generated by the HDF5
>>> API (meaning the HDF5 only API)
>>> Value decimal 1: This file was generated by the netCDF API (using HDF5)
>>> Value decimal 2: This file was generated by <put here another HDF5
>>> based format>
>>> and so on
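
A sketch of how a reader could interpret such a flag, under loud assumptions: the enumerator values 0 and 1 come from the RFC text above, but the type and function names are hypothetical, and the byte offset is a parameter because the RFC does not say which reserved field would be used. Nothing like this exists in the HDF5 library today.

```c
#include <assert.h>
#include <stddef.h>

/* Values 0 and 1 are taken from the RFC text; the names are hypothetical. */
typedef enum {
    CREATOR_HDF5 = 0,         /* written with the HDF5-only API (current value) */
    CREATOR_NETCDF = 1,       /* written with the netCDF API on top of HDF5 */
    CREATOR_UNASSIGNED = 255  /* any value not yet assigned to a format */
} h5_creator;

/* Hypothetical: read the proposed creator byte at `offset` from a buffer
 * holding the first `len` bytes of the file. */
h5_creator creator_from_header(const unsigned char *header, size_t len, size_t offset)
{
    assert(header != NULL);

    if (offset >= len)
        return CREATOR_UNASSIGNED;   /* byte not available in the buffer */

    switch (header[offset]) {
    case 0:  return CREATOR_HDF5;
    case 1:  return CREATOR_NETCDF;
    default: return CREATOR_UNASSIGNED;
    }
}
```

Because today's files already carry 0 in reserved fields, every existing file would decode as "HDF5 only" under this scheme, which matches the "current value" remark in the list above.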
>>> The advantage of this solution is that this process involves 2
>>> parties: the HDF Group and the other format's organization.
>>> This allows the HDF Group to "keep track" of new HDF5 based formats.
>>> It also allows the other format to be made "HDF5 certified".
>>> SOLUTION 2: Add some metadata to the other API on top of HDF5
>>> This is what Nexus uses.
>>> A Nexus file on creation writes several attributes on the root
>>> group, like "NeXus_version" and other numeric data.
>>> This is done using the public HDF5 API calls.
>>> The solution for netCDF consists of the same approach, just write
>>> some specific attributes, and a special netCDF API to write/read them.
>>> This solution just requires the work of one party (the netCDF group)
>>> END OF RFC
>>> In reply to people that commented in the thread
>>> @John Shalf
>>> >>Perhaps NetCDF (and other higher-level APIs that are built on top of
>>> HDF5) should include an attribute attached
>>> >>to the root group that identifies the name and version of the API
>>> that created the file? (adopt this as a convention)
>>> yes, that's one way to do it, Solution 2 above
>>> @Mark Miller
>>> >>>Hmmm. Is there any big reason NOT to try to read a netCDF produced
>>> HDF5 file with the native HDF5 library if someone so chooses?
>>> It's possible to read a netCDF file using HDF5, yes.
>>> There are 2 things that you will miss doing this:
>>> 1) the ability to inquire about shared netCDF dimensions.
>>> 2) the ability to read remotely with openDAP.
>>> Reading with HDF5 also exposes metadata that is supposed to be
>>> private to netCDF. See below
>>> >>>> And, attempting to read an HDF5 file produced by Silo using just
>>> the HDF5 library (e.g. w/o Silo) is a major pain.
>>> This I don't understand. Why not read the Silo file with the Silo API?
>>> That's the whole purpose of this issue, each higher level API on top
>>> of HDF5 should be able to detect "itself".
>>> I am not familiar with Silo, but if Silo cannot do this, then you
>>> have the same design flaw that netCDF has.
>>>
>>> >>> In a cursory look over the libsrc4 sources in netCDF distro, I see
>>> a few things that might give a hint a file was created with netCDF. . .
>>> >>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to the
>>> root group named "_nc3_strict". So, the existence of an attribute on
>>> the root group by that name would suggest the HDF5 file was
>>> generated by netCDF.
>>> I think this is done only by the "old" netCDF3 format.
>>> >>>>> Also, I tested a simple case of nc_open, nc_def_dim, etc.
>>> nc_close to see what it produced.
>>> >>>> It appears to produce datasets for each 'dimension' defined with
>>> two attributes named "CLASS" and "NAME".
>>> This is because netCDF uses the HDF5 Dimension Scales API internally
>>> to keep track of shared dimensions. These are internal attributes
>>> of Dimension Scales. This approach would not work because an HDF5
>>> only file with Dimension Scales would have the same attributes.
>>>
>>> >>>> I like John's suggestion here.
>>> >>>>>But, any code you add to any applications now will work *only*
>>> for files that were produced post-adoption of this convention.
>>> yes. there are 2 actions to take here.
>>> 1) fix the issue for the future
>>> 2) try to retroactively have some workaround that makes it possible now
>>> to differentiate HDF5 and netCDF files made before the adopted convention
>>> see below
>>>
>>> >>>> In VisIt, we support >140 format readers. Over 20 of those are
>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai,
>>> netCDF, Flash, Enzo, Chombo, etc., etc.)
>>> >>>>When opening a file, how does VisIt figure out which plugin to
>>> use? In particular, how do we avoid one poorly written reader plugin
>>> (which may be the wrong one for a given file) from preventing the
>>> correct one from being found. Its kinda a hard problem.
>>>
>>> Yes, that's the problem we are trying to solve. I have to say, that
>>> is quite a list of HDF5 based formats there.
>>> >>>> Some of our discussion is captured here. . .
>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>> I'll check it out, thank you for the suggestions
>>> @Ed Hartnett
>>> >>>I must admit that when putting netCDF-4 together I never considered
>>> that someone might want to tell the difference between a "native"
>>> HDF5 file and a netCDF-4/HDF5 file.
>>> >>>>>Well, you can't think of everything.
>>> This is a major design flaw.
>>> If you are in the business of designing data file formats, one of
>>> the things you have to do is make it possible to tell your format
>>> apart from the other formats.
>>>
>>> >>> I agree that it is not possible to canonically tell the
>>> difference. The netCDF-4 API does use some special attributes to
>>> track named dimensions,
>>> >>>>and to tell whether classic mode should be enforced. But it can
>>> easily produce files without any named dimensions, etc.
>>> >>>So I don't think there is any easy way to tell.
>>> I remember you wrote that code together with Kent Yang from the HDF
>>> Group.
>>> At the time I was with the HDF Group but unfortunately I did not follow
>>> closely what you were doing.
>>> I don't remember any design document being circulated that explains
>>> the internals of the "how to" make the netCDF (classic) model of
>>> shared dimensions
>>> use the hierarchical group model of HDF5.
>>> I know this was done using the HDF5 Dimension Scales (that I wrote),
>>> but is there any design document that explains it?
>>> Maybe just some internal email exchange between you and Kent Yang?
>>> Kent, how are you?
>>> Do you remember having any design document that explains this?
>>> Maybe something like a unique private attribute that is written
>>> somewhere in the netCDF file?
>>>
>>> @Mary Haley, NCL
>>> NCL is a widely used tool that handles both netCDF and HDF5
>>> Mary, how are you?
>>> How does NCL deal with the case of reading both pure HDF5 files and
>>> netCDF files that use HDF5?
>>> Would you be interested in joining a community based effort to deal
>>> with this, in case this is an issue for you?
>>>
>>> @David Pearah , CEO HDF Group
>>> I volunteer to participate in the effort of this RFC together with
>>> the HDF Group (and netCDF Group).
>>> Maybe we could make a "task force" between HDF Group, netCDF Group
>>> and any volunteer (such as tools developers that happen to be in
>>> these mail lists)?
>>> The "task force" would have 2 tasks:
>>> 1) make a HDF5 based convention for the future and
>>> 2) try to retroactively salvage the current design issue of netCDF
>>> My phone is 217-898-9356, you are welcome to call in anytime.
>>> ----------------------
>>> Pedro Vicente
>>> pedro.vicente at space-research.org
>>> <mailto:pedro.vicente at space-research.org>
>>> https://twitter.com/_pedro__vicente
>>> http://www.space-research.org/
>>>
>>> ----- Original Message -----
>>> *From:* Miller, Mark C. <mailto:miller86 at llnl.gov>
>>> *To:* HDF Users Discussion List
>>> <mailto:hdf-forum at lists.hdfgroup.org>
>>> *Cc:* netcdfgroup at unidata.ucar.edu
>>> <mailto:netcdfgroup at unidata.ucar.edu> ; Ward Fisher
>>> <mailto:wfisher at ucar.edu>
>>> *Sent:* Wednesday, March 02, 2016 7:07 PM
>>> *Subject:* Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>
>>> I like John's suggestion here.
>>>
>>> But, any code you add to any applications now will work *only* for
>>> files that were produced post-adoption of this convention.
>>>
>>> There are probably a bazillion files out there at this point that
>>> don't follow that convention and you probably still want your
>>> applications to be able to read them.
>>>
>>> In VisIt, we support >140 format readers. Over 20 of those are
>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo,
>>> Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When opening a
>>> file, how does VisIt figure out which plugin to use? In
>>> particular, how do we avoid one poorly written reader plugin
>>> (which may be the wrong one for a given file) from preventing the
>>> correct one from being found. Its kinda a hard problem.
>>>
>>> Some of our discussion is captured here. . .
>>>
>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>>
>>> Mark
>>>
>>>
>>> From: Hdf-forum <hdf-forum-bounces at lists.hdfgroup.org
>>> <mailto:hdf-forum-bounces at lists.hdfgroup.org>> on behalf of John
>>> Shalf <jshalf at lbl.gov <mailto:jshalf at lbl.gov>>
>>> Reply-To: HDF Users Discussion List <hdf-forum at lists.hdfgroup.org
>>> <mailto:hdf-forum at lists.hdfgroup.org>>
>>> Date: Wednesday, March 2, 2016 1:02 PM
>>> To: HDF Users Discussion List <hdf-forum at lists.hdfgroup.org
>>> <mailto:hdf-forum at lists.hdfgroup.org>>
>>> Cc: "netcdfgroup at unidata.ucar.edu
>>> <mailto:netcdfgroup at unidata.ucar.edu>"
>>> <netcdfgroup at unidata.ucar.edu
>>> <mailto:netcdfgroup at unidata.ucar.edu>>, Ward Fisher
>>> <wfisher at ucar.edu <mailto:wfisher at ucar.edu>>
>>> Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>
>>> Perhaps NetCDF (and other higher-level APIs that are built on
>>> top of HDF5) should include an attribute attached to the root
>>> group that identifies the name and version of the API that
>>> created the file? (adopt this as a convention)
>>>
>>> -john
>>>
>>> On Mar 2, 2016, at 12:55 PM, Pedro Vicente
>>> <pedro.vicente at space-research.org
>>> <mailto:pedro.vicente at space-research.org>> wrote:
>>> Hi Ward
>>> As you know, Data Explorer is going to be a general
>>> purpose data reader for many formats, including HDF5 and
>>> netCDF.
>>> Here
>>> http://www.space-research.org/
>>> Regarding the handling of both HDF5 and netCDF, it seems
>>> there is a potential issue, which is, how to tell if any
>>> HDF5 file was saved by the HDF5 API or by the netCDF API?
>>> It seems to me that this is not possible. Is this correct?
>>> netCDF uses an internal function NC_check_file_type to
>>> examine the first few bytes of a file, and for example for
>>> any HDF5 file the test is
>>> /* Look at the magic number */
>>> /* Ignore the first byte for HDF */
>>> if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
>>>     *filetype = FT_HDF;
>>>     *version = 5;
>>> The problem is that this test works for any HDF5 file and
>>> for any netCDF file, which makes it impossible to tell
>>> which is which.
>>> Which makes it impossible for any general purpose data
>>> reader to decide to use the netCDF API or the HDF5 API.
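
To make the limitation concrete, here is a small standalone sketch of a magic-number check. It is not the netCDF library's NC_check_file_type; the names are hypothetical. It compares against the full 8-byte HDF5 signature and the classic netCDF "CDF" prefix followed by version byte 1 or 2. It only looks at offset 0, so files with an HDF5 user block are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { KIND_UNKNOWN, KIND_NETCDF_CLASSIC, KIND_HDF5_BASED } file_kind;

/* Hypothetical classifier working on the first n bytes of a file. */
file_kind classify_magic(const unsigned char *magic, size_t n)
{
    /* HDF5 format signature: \211 H D F \r \n \032 \n */
    static const unsigned char hdf5_sig[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    assert(magic != NULL || n == 0);

    /* Any HDF5 container matches here, whichever API wrote it. */
    if (n >= 8 && memcmp(magic, hdf5_sig, 8) == 0)
        return KIND_HDF5_BASED;

    /* Classic netCDF: "CDF" followed by version byte 1 or 2. */
    if (n >= 4 && memcmp(magic, "CDF", 3) == 0 &&
        (magic[3] == 1 || magic[3] == 2))
        return KIND_NETCDF_CLASSIC;

    return KIND_UNKNOWN;
}
```

A netCDF-4 file and a file written with the HDF5 API alone both come back as KIND_HDF5_BASED, so the first bytes cannot settle which API to use.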
>>> I have a possible solution for this, but before going any
>>> further, I would just like to
>>> 1) Confirm that this is indeed not possible
>>> 2) See if you have a solid workaround for this,
>>> excluding the dumb ones, for example deciding on an
>>> extension .nc or .h5, or traversing the HDF5 file to see
>>> if it's a non netCDF conforming one. Yes, to further
>>> complicate things, it is possible that the above test says
>>> OK for an HDF5 file, but then the read by the netCDF API
>>> fails because the file is HDF5 but not netCDF conformant
>>> Thanks
>>> ----------------------
>>> Pedro Vicente
>>> pedro.vicente at space-research.org
>>> <mailto:pedro.vicente at space-research.org>
>>> http://www.space-research.org/
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> Hdf-forum at lists.hdfgroup.org
>>> <mailto:Hdf-forum at lists.hdfgroup.org>
>>>
>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>>
>>> Twitter: https://twitter.com/hdf5
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> netcdfgroup mailing list
>>> netcdfgroup at unidata.ucar.edu
>>> For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>
>