[Nexus] Fwd: Re: [MAHID] naming conventions

Pete R. Jemian prjemian at gmail.com
Fri Jan 29 19:01:35 GMT 2010


Freddie:

Agreed.

NeXus.xsd is not used to validate NXDL files.
NXDL files validate against nxdl.xsd which includes nxdlTypes.xsd.

Can we merge these?
Otherwise, I can add the same rule to nxdl.xsd.
Currently, nxdl.xsd permits a name to be any "xs:string", which is too permissive.
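
For comparison, here is a minimal Python sketch of the three expressions quoted in this thread. The constant and function names are made up for illustration and are not part of any NeXus tool. It uses re.fullmatch because an XSD pattern facet must match the whole value, so the Python check has to be anchored the same way.

```python
import re

# The three expressions quoted in this thread. The names of these
# constants are hypothetical, chosen only for this illustration.
CURRENT_XSD = re.compile(r"[_a-zA-Z0-9]+")           # "validName" in NeXus.xsd today
PROPOSED = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")     # Freddie's correction
WITH_DOT = re.compile(r"[_a-zA-Z][_a-zA-Z.0-9]*")    # Matt's suggestion, allows "."


def is_valid(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    # Sample names: a plain name, a leading underscore, a leading digit,
    # an embedded dot, and the padded " data " from the manual's example.
    for name in ["data", "_x1", "1data", "polar.angle", " data "]:
        print(repr(name),
              is_valid(CURRENT_XSD, name),
              is_valid(PROPOSED, name),
              is_valid(WITH_DOT, name))
```

With these definitions, "1data" passes only the current NeXus.xsd expression, "polar.angle" passes only the variant that allows ".", and " data " fails all three.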

Pete

On 1/29/2010 12:51 PM, freddie.akeroyd at stfc.ac.uk wrote:
> Pete,
>
> I believe the NeXus aim was to stick to character sequences that were
> also valid as program variable names; this allows programming language
> classes/structures to be built that mirror a defined file structure. The
> scheme you enclose below allows "." which is usually invalid in program
> variable names, instead being reserved as an operator. The expression
> "[_a-zA-Z][_a-zA-Z0-9]*" fits with the NeXus aim and is probably the
> best to use, but I've just noticed a small mistake in
> http://svn.nexusformat.org/definitions/trunk/NeXus.xsd as it uses
> "[_a-zA-Z0-9]+" for the "validName" restriction, thus allowing names
> to start with a digit rather than merely contain digits; we should
> update it to "[_a-zA-Z][_a-zA-Z0-9]*".
>
> Regards,
>
> Freddie
>
>> -----Original Message-----
>> From: nexus-bounces at nexusformat.org [mailto:nexus-
>> bounces at nexusformat.org] On Behalf Of Pete R. Jemian
>> Sent: 29 January 2010 18:17
>> To: NeXus
>> Subject: [Nexus] Fwd: Re: [MAHID] naming conventions
>>
>>
>> Matt Newville suggests a stronger declaration of our naming rules.
>> Our manual, in ClassDefinitions.xml, says:
>> -------------------% clip here %-------------------
>>                           <para>Short name of the data field.
>>                               Name must satisfy both HDF and XML
>>                               naming rules.</para>
>> -------------------% clip here %-------------------
>>
>>
>> Matt suggests something stronger, derived from the XML standard,
>> -------------------% clip here %-------------------
>> Names for Groups, Datasets, and attributes must match:
>>     NameStartChar ::=  _ | a..z | A..Z
>>     NameChar      ::=  NameStartChar | . | 0..9
>>     Name          ::=  NameStartChar (NameChar)*
>>
>> Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*
>> -------------------% clip here %-------------------
>>
>> Are these names validated in any way?
>>
>> Also, we _must_ tighten up our examples (in the manual
>> and example data files)!
>>
>> Comments?
>>
>>
>>
>>
>> -------- Original Message --------
>> Subject: Re: [MAHID] naming conventions
>> Date: Fri, 29 Jan 2010 11:16:10 -0600
>> From: Matt Newville<newville at cars.uchicago.edu>
>> Reply-To: mahid at googlegroups.com
>> To: mahid at googlegroups.com
>>
>> Hi,
>>
>> I think that the Nexus approach toward names of (correct me if I have
>> this wrong)
>>       A Name for Group, Dataset or attributes
>>       must be a valid HDF5 and XML name.
>>
>> is a bit too weak.  To verify a name is allowed, does one check both?
>> I don't actually see a simple grammar production for HDF5 names (I
>> believe it may simply be "char*").  Spaces and non-printable ASCII
>> characters are definitely allowed, and I suspect that unicode support
>> in names may vary with HDF5 versions and libraries.
>>
>> I think non-printable characters and whitespace should be avoided.  If
>> I read it correctly, one of the examples in the Nexus doc has a
>> dataset named " data " (Example 3.1, page 16):
>>       <NXdata name=" data ">
>>           <time_of_flight axis= 1 primary= 1>  1500.0 1502.0 1504.0 ...
>> </time_of_flight>
>>           <polar_angle axis= 2 primary= 1>  15.0 15.6 16.2 ...
>> </polar_angle>
>>           <data>  5 7 14 ...</data>
>>       </NXdata>
>>
>> That could be unintentional, but (if I understand correctly) the
>> corresponding HDF5 file would have a Group named " data ", which is
>> allowed (both HDF5 and XML).  That seems problematic to me (what if
>> there are Groups named 'data', ' data ', and ' data'?).  I recommend a
>> much simplified variation of the XML grammar production that doesn't
>> allow whitespace, non-printable characters, or most punctuation in
>> names.  Specifically, I suggest
>>
>> Names for Groups, Datasets, and attributes must match:
>>     NameStartChar ::=  _ | a..z | A..Z
>>     NameChar        ::=  NameStartChar | . | 0..9
>>     Name               ::=  NameStartChar (NameChar)*
>>
>> Or, as a regular expression:  [_a-zA-Z][_a-zA-Z.0-9]*
>>
>> We could consider other punctuation characters, such as '@$&~|:-', but
>> I think we could easily live without these too.
>>
>> Any comments?
>>
>> Cheers,
>>
>> --Matt Newville
>> _______________________________________________
>> NeXus mailing list
>> NeXus at nexusformat.org
>> http://lists.nexusformat.org/mailman/listinfo/nexus

-- 
----------------------------------------------------------
  Pete R. Jemian, Ph.D.                <jemian at anl.gov>
  Beam line Controls and Data Acquisition, Group Leader
  Advanced Photon Source,   Argonne National Laboratory
  Argonne, IL  60439                   630 - 252 - 3189
-----------------------------------------------------------
     Education is the one thing for which people
        are willing to pay yet not receive.
-----------------------------------------------------------

