[Nexus] Fwd: Re: [MAHID] naming conventions

Pete R. Jemian prjemian at gmail.com
Fri Jan 29 18:16:51 GMT 2010


Matt Newville suggests a stronger declaration of our naming rules.
Our manual, in ClassDefinitions.xml, says:
-------------------% clip here %-------------------
                         <para>Short name of the data field.
                             Name must satisfy both HDF and XML
                             naming rules.</para>
-------------------% clip here %-------------------


Matt suggests something stronger, derived from the XML standard,
-------------------% clip here %-------------------
Names for Groups, Datasets, and attributes must match:
   NameStartChar ::=  _ | a..z | A..Z
   NameChar      ::=  NameStartChar | . | 0..9
   Name          ::=  NameStartChar (NameChar)*

Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*
-------------------% clip here %-------------------

Are these names validated in any way?

Also, we _must_ tighten up our examples (in the manual
and example data files)!

Comments?




-------- Original Message --------
Subject: Re: [MAHID] naming conventions
Date: Fri, 29 Jan 2010 11:16:10 -0600
From: Matt Newville <newville at cars.uchicago.edu>
Reply-To: mahid at googlegroups.com
To: mahid at googlegroups.com

Hi,

I think that the Nexus approach toward names (correct me if I have
this wrong)
     A Name for Group, Dataset or attributes
     must be a valid HDF5 and XML name.

is a bit too weak.  To verify a name is allowed, does one check both?
I don't actually see a simple grammar production for HDF5 names (I
believe it may simply be "char*").  Spaces and non-printable ASCII
characters are definitely allowed, and I suspect that Unicode support
in names may vary with HDF5 versions and libraries.

I think non-printable characters and whitespace should be avoided.  If
I read it correctly, one of the examples in the Nexus doc has a
dataset named " data " (Example 3.1, page 16):
     <NXdata name=" data " >
         <time_of_flight axis= 1 primary= 1 > 1500.0 1502.0 1504.0 ... </time_of_flight>
         <polar_angle axis= 2 primary= 1 > 15.0 15.6 16.2 ... </polar_angle>
         <data > 5 7 14 ... </data>
     </NXdata>

That could be unintentional, but (if I understand correctly) the
corresponding HDF5 file would have a Group named " data ", which is
allowed (in both HDF5 and XML).  That seems problematic to me (what if
there are Groups named 'data', ' data ', and ' data'?).  I recommend a
much simplified variation of the XML grammar production that doesn't
allow whitespace, non-printable characters, or most punctuation in
names.  Specifically, I suggest

Names for Groups, Datasets, and attributes must match:
   NameStartChar ::=  _ | a..z | A..Z
   NameChar      ::=  NameStartChar | . | 0..9
   Name          ::=  NameStartChar (NameChar)*

Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*

We could consider other punctuation characters, such as '@$&~|:-', but
I think we could easily live without these too.
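
To make the proposed rule concrete, here is a minimal Python sketch of
how a name could be checked against the regular expression above.  The
pattern is the one proposed in this message; the function name
is_valid_name and the sample names are only illustrative assumptions,
not part of any existing NeXus or HDF5 tool.

```python
import re

# Pattern copied from the proposal above.
# NAME_PATTERN and is_valid_name are hypothetical names used for illustration.
NAME_PATTERN = re.compile(r"[_a-zA-Z][_a-zA-Z.0-9]*")


def is_valid_name(name):
    """Return True if the whole of `name` matches the proposed grammar."""
    # fullmatch() requires the entire string to match, so leading or
    # trailing whitespace (including a trailing newline) is rejected.
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["data", " data ", "time_of_flight", "2theta", "polar.angle"]:
        print(repr(candidate), is_valid_name(candidate))
```

Under this rule 'data', 'time_of_flight', and 'polar.angle' pass, while
' data ' (whitespace) and '2theta' (leading digit) are rejected.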

Any comments?

Cheers,

--Matt Newville

