[Nexus] Fwd: Re: [MAHID] naming conventions

freddie.akeroyd at stfc.ac.uk
Fri Jan 29 18:51:30 GMT 2010


Pete,

I believe the NeXus aim was to stick to character sequences that are
also valid as program variable names; this allows programming language
classes/structures to be built that mirror a defined file structure. The
scheme you enclose below allows ".", which is usually invalid in program
variable names because it is reserved as an operator. The expression
"[_a-zA-Z][_a-zA-Z0-9]*" fits with the NeXus aim and is probably the
best to use. However, I've just noticed a small mistake in
http://svn.nexusformat.org/definitions/trunk/NeXus.xsd: it uses
"[_a-zA-Z0-9]+" for the "validName" restriction, which allows names to
start with a digit rather than only contain digits. We should
update it to "[_a-zA-Z][_a-zA-Z0-9]*".
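
To make the difference between the three expressions concrete, here is a
small Python sketch. It is only an illustration, not the NeXus validation
code: the function name check() and the sample names are made up for this
message. XSD patterns match the whole value, so fullmatch() is used to
mimic that.

```python
import re

# Identifier-style pattern suggested in this message.
STRICT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")
# Pattern currently used for "validName" in NeXus.xsd, as quoted above.
XSD_CURRENT = re.compile(r"[_a-zA-Z0-9]+")
# Pattern from the quoted proposal, which also admits ".".
WITH_DOT = re.compile(r"[_a-zA-Z][_a-zA-Z.0-9]*")


def check(name):
    """Report which of the three patterns accept the whole name."""
    return {
        "strict": STRICT.fullmatch(name) is not None,
        "xsd_current": XSD_CURRENT.fullmatch(name) is not None,
        "with_dot": WITH_DOT.fullmatch(name) is not None,
    }


if __name__ == "__main__":
    # Hypothetical sample names, chosen to hit each case discussed here.
    for name in ["data", "time_of_flight", "2theta", "polar.angle", " data "]:
        print(repr(name), check(name))
```

Of these samples, only "2theta" is accepted by the current XSD pattern
while being rejected by the other two, and only "polar.angle" depends on
whether "." is allowed. All three patterns reject " data ".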

Regards,

Freddie

> -----Original Message-----
> From: nexus-bounces at nexusformat.org [mailto:nexus-
> bounces at nexusformat.org] On Behalf Of Pete R. Jemian
> Sent: 29 January 2010 18:17
> To: NeXus
> Subject: [Nexus] Fwd: Re: [MAHID] naming conventions
> 
> 
> Matt Newville suggests a stronger declaration of our naming rules.
> Our manual, in ClassDefinitions.xml, says:
> -------------------% clip here %-------------------
>                          <para>Short name of the data field.
>                              Name must satisfy both HDF and XML
>                              naming rules.</para>
> -------------------% clip here %-------------------
> 
> 
> Matt suggests something stronger, derived from the XML standard,
> -------------------% clip here %-------------------
> Names for Groups, Datasets, and attributes must match:
>    NameStartChar ::=  _ | a..z | A..Z
>    NameChar      ::=  NameStartChar | . | 0..9
>    Name          ::=  NameStartChar (NameChar)*
> 
> Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*
> -------------------% clip here %-------------------
> 
> Are these names validated in any way?
> 
> Also, we _must_ tighten up our examples (in the manual
> and example data files)!
> 
> Comments?
> 
> 
> 
> 
> -------- Original Message --------
> Subject: Re: [MAHID] naming conventions
> Date: Fri, 29 Jan 2010 11:16:10 -0600
> From: Matt Newville <newville at cars.uchicago.edu>
> Reply-To: mahid at googlegroups.com
> To: mahid at googlegroups.com
> 
> Hi,
> 
> I think that the Nexus approach toward names of (correct me if I have
> this wrong)
>      A Name for Group, Dataset or attributes
>      must be a valid HDF5 and XML name.
> 
> is a bit too weak.  To verify a name is allowed, does one check both?
> I don't actually see a simple grammar production for HDF5 names (I
> believe it may simply be "char*").  Spaces and non-printable ASCII
> characters are definitely allowed, and I suspect that unicode support
> in names may vary with HDF5 versions and libraries.
> 
> I think non-printable characters and whitespace should be avoided.  If
> I read it correctly, one of the examples in the Nexus doc has a
> dataset named " data " (Example 3.1, page 16):
>      <NXdata name=" data " >
>          <time_of_flight axis= 1 primary= 1 > 1500.0 1502.0 1504.0 ...
> </time_of_flight>
>          <polar_angle axis= 2 primary= 1 > 15.0 15.6 16.2 ...
> </polar_angle>
>          <data > 5 7 14 ... </data>
>      </NXdata>
> 
> That could be unintentional, but (if I understand correctly) the
> corresponding HDF5 file would have a Group named " data ", which is
> allowed (both HDF5 and XML).  That seems problematic to me (what if
> there are Groups named 'data', ' data ', and ' data'?).  I recommend a
> much simplified variation of the XML grammar production that doesn't
> allow whitespace, non-printable characters, or most punctuation in
> names.  Specifically, I suggest
> 
> Names for Groups, Datasets, and attributes must match:
>    NameStartChar ::=  _ | a..z | A..Z
>    NameChar      ::=  NameStartChar | . | 0..9
>    Name          ::=  NameStartChar (NameChar)*
> 
> Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*
> 
> We could consider other punctuation characters, such as '@$&~|:-', but
> I think we could easily live without these too.
> 
> Any comments?
> 
> Cheers,
> 
> --Matt Newville