[Nexus] Fwd: Re: [MAHID] naming conventions
Pete R. Jemian
prjemian at gmail.com
Fri Jan 29 18:16:51 GMT 2010
Matt Newville suggests a stronger declaration of our naming rules.
Our manual, in ClassDefinitions.xml, says:
-------------------% clip here %-------------------
<para>Short name of the data field.
Name must satisfy both HDF and XML
naming rules.</para>
-------------------% clip here %-------------------
Matt suggests something stronger, derived from the XML standard,
-------------------% clip here %-------------------
Names for Groups, Datasets, and attributes must match:
NameStartChar ::= _ | a..z | A..Z
NameChar ::= NameStartChar | . | 0..9
Name ::= NameStartChar (NameChar)*
Or, as a regular expression: [_a-zA-Z][_a-zA-Z.0-9]*
-------------------% clip here %-------------------
Are these names validated in any way?
Also, we _must_ tighten up our examples (in the manual
and example data files)!
Comments?
-------- Original Message --------
Subject: Re: [MAHID] naming conventions
Date: Fri, 29 Jan 2010 11:16:10 -0600
From: Matt Newville <newville at cars.uchicago.edu>
Reply-To: mahid at googlegroups.com
To: mahid at googlegroups.com
Hi,
I think that the NeXus rule for names, which is (correct me if I have
this wrong)
    A Name for a Group, Dataset, or attribute
    must be a valid HDF5 and XML name.
is a bit too weak. To verify that a name is allowed, does one check both?
I don't actually see a simple grammar production for HDF5 names (I
believe it may simply be "char*"). Spaces and non-printable ASCII
characters are definitely allowed, and I suspect that unicode support
in names may vary with HDF5 versions and libraries.
I think non-printable characters and whitespace should be avoided. If
I read it correctly, one of the examples in the Nexus doc has a
dataset named " data " (Example 3.1, page 16):
<NXdata name=" data " >
<time_of_flight axis= 1 primary= 1 > 1500.0 1502.0 1504.0 ...
</time_of_flight>
<polar_angle axis= 2 primary= 1 > 15.0 15.6 16.2 ... </polar_angle>
<data > 5 7 14 ... </data>
</NXdata>
That could be unintentional, but (if I understand correctly) the
corresponding HDF5 file would have a Group named " data ", which is
allowed in both HDF5 and XML. That seems problematic to me (what if
there are Groups named 'data', ' data ', and ' data'?). I recommend a
much simplified variation of the XML grammar production that doesn't
allow whitespace, non-printable characters, or most punctuation in
names. Specifically, I suggest
Names for Groups, Datasets, and attributes must match:
NameStartChar ::= _ | a..z | A..Z
NameChar ::= NameStartChar | . | 0..9
Name ::= NameStartChar (NameChar)*
Or, as a regular expression: [_a-zA-Z][_a-zA-Z.0-9]*
We could consider other punctuation characters, such as '@$&~|:-', but
I think we could easily live without these too.
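As a rough illustration of how the proposed rule could be checked, here is a
minimal Python sketch. The helper name is_valid_name is hypothetical and is
not part of any NeXus or HDF5 API; it only applies the regular expression
given above. It uses fullmatch so that the whole name must match, since
anchoring with "$" alone would still accept a trailing newline.

```python
import re

# Proposed naming rule from this message: [_a-zA-Z][_a-zA-Z.0-9]*
# NAME_RE and is_valid_name are illustrative names, not an existing API.
NAME_RE = re.compile(r"[_a-zA-Z][_a-zA-Z.0-9]*")


def is_valid_name(name: str) -> bool:
    """Return True only if the entire name matches the proposed pattern."""
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Print each candidate with the result of the check.
    for candidate in ["data", " data ", "polar_angle", "2theta", "a-b", ""]:
        print(repr(candidate), is_valid_name(candidate))
```

Under this rule, " data " is rejected because of the surrounding spaces, and
names that start with a digit or contain other punctuation are rejected too.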
Any comments?
Cheers,
--Matt Newville