[NeXus-committee] Fwd: WSSSPE2 notification for paper 17

Ted Habermann thabermann at hdfgroup.org
Tue Sep 2 17:11:59 BST 2014


Hello all,

Happy to let you know that the HDF Paper was accepted for the WSSSPE2 Conference! There are a number of interesting review comments that I need to respond to. Note that the four page limit is still in effect, so it may prove difficult to cover everything... Nevertheless, I will get back to you and, hopefully, meet some of you in New Orleans...

Ted

Begin forwarded message:

From: WSSSPE2 <wssspe2 at easychair.org>
Subject: WSSSPE2 notification for paper 17
Date: September 2, 2014 at 9:02:10 AM MDT
To: Ted Habermann <thabermann at hdfgroup.org>

Dear Ted Habermann

We’re happy to inform you that your paper (17: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software) has been accepted for the 2nd Workshop on
Sustainable Software for Science: Practice and Experiences (WSSSPE2). Papers for WSSSPE2 were reviewed through
a peer review process where reviewers were asked to comment broadly on papers, with the primary acceptance
criteria being that papers should make a clear contribution to the topic of the workshop. Accepted papers will be
used by the organizing committee to design sessions that will be highly interactive and targeted towards
facilitating action, and will be linked to from the WSSSPE2 web pages. Please note that most of the workshop will
be interactive sessions, and only include a small number of selected presentations.

Authors are encouraged to revise their papers based on reviewer comments that make sense given the WSSSPE2
scope and CFP*, reposting new versions at the same DOI that has already been registered with EasyChair. Please
also reformat papers that exceed the requested four page limit to be four pages or less (references can be
additional to this limit).

Best wishes,
the WSSSPE organizers

*Please be aware that the organizers do not control the reviewers, and while we suggest how they should review
papers, this doesn’t always lead to the reviews we would like.  Some of the reviewers may have viewed the papers
more as traditional conference or journal papers than as WSSSPE2 workshop papers.  We have taken this into
account in making our decisions, but you should also take it into account as you read the reviews and make
changes for your final version.


----------------------- REVIEW 1 ---------------------
PAPER: 17
TITLE: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software
AUTHORS: Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia and Suren Byna

OVERALL EVALUATION: 1 (weak accept)
REVIEWER'S CONFIDENCE: 4 (high)
Relevance to WSSSPE: 3 (good)
Action-orientation (will lead to actions?): 3 (good)
Should this paper author give a plenary talk?: 2 (yes)

----------- REVIEW -----------
This paper presents an impressive data format for interoperability (HDF5) that is in use in some research groups, and this idea of format homogenization is clearly of interest to the larger scientific community.

However, I'm not clear on the link to software (within meta-data, persistently linked) and reproducibility of results. Are there software styles within the HDF5 format? I am not convinced interoperability per se is in high demand by these communities, as we don't see much evidence of reuse in different areas - perhaps orienting toward linking code and data to results would be more broadly useful and interoperability would be a second order consideration.

There were a few framing statements on page 1 that were very strong and I felt deserved some explanation and argument, since in their current form they would demand enormous resources to effectuate. For example "Data must be preserved in well-documented, self-describing formats accessible on multiple platforms using many programming languages." Must? I think "should" might be a better word, with a description of how much time and effort this goal would take to achieve. Are there any data that would be exempt from this statement? Again, where does the code that produced the results (or the data) fit in? I had the same issue with the very strong statement: "In the short-term, sustainable formats and commercial, open-source, and community tools must support specific data and analysis needs of multiple scientific communities in order to achieve the usage and support required for sustaining maintenance and development of new capabilities." Why? And why not verification? Why is "sustaining maintenance and development of new capabilities" of such primary importance, and how much would it cost?

I also don't have much of a sense of how easy it would be to convert data to HDF5 format. What are the parameters that influence how easy this is? How much human intervention is needed? etc.

It would be nice to have had a ref for HDF5 when it is introduced on the first page.

Overall a terrific substantive paper, but it would be much improved with those concerns addressed.


----------------------- REVIEW 2 ---------------------
PAPER: 17
TITLE: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software
AUTHORS: Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia and Suren Byna

OVERALL EVALUATION: 1 (weak accept)
REVIEWER'S CONFIDENCE: 5 (expert)
Relevance to WSSSPE: 3 (good)
Action-orientation (will lead to actions?): 2 (fair)
Should this paper author give a plenary talk?: 1 (no)

----------- REVIEW -----------
This paper provides an overview of "HDF5 in Action".  I came to this manuscript expecting either a description of sustainability within the HDF5 development process itself, or a discussion of how HDF5 is being used for sustainable scientific software development.  Although the manuscript attempts to follow the latter path, it reads a little more like a laundry list of applications and tools in the HDF5 ecosystem than a discussion of how HDF5 promotes sustainability.

HDF5 is a very important library in the scientific software development community, and it is important for other developers to be aware of its contributions as well as its own development model. I would suggest that the authors remove some of the examples of HDF5 applications and spend more time expanding on how HDF5's common data standard improves the sustainability of software in practice.  I am not sure how this paper could be made more "actionable", but I feel that is also lacking here.

Other comments:

there are individuals or groups who’s -> there are individuals or groups whose


----------------------- REVIEW 3 ---------------------
PAPER: 17
TITLE: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software
AUTHORS: Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia and Suren Byna

OVERALL EVALUATION: 1 (weak accept)
REVIEWER'S CONFIDENCE: 4 (high)
Relevance to WSSSPE: 3 (good)
Action-orientation (will lead to actions?): 2 (fair)
Should this paper author give a plenary talk?: 1 (no)

----------- REVIEW -----------
This paper provides many strong use cases for use and re-utilization of the Hierarchical Data Format (HDF) in high performance computing environments and in extended areas of relevance for computing (H5MD). Having a paper like this in the WSSSPE2 program could be important for discussion around how data management and metadata management across computational infrastructures will be one of the major tenets for long-term sustainable software architectures.

Overall, I don’t see how this paper leads to an actionable item as described in the CFP for WSSSPE2. Where is the argument to support HDF as a means to support specific mechanisms for enabling better software reuse or any other component of software sustainability? I understand how better data management can help enable faster software development and support, but this should have been the paper to make that argument as an actionable component of the workshop.

Future enhancements could include a stronger argument for how data lifecycle management could improve sustainability design for software engineering processes.


----------------------- REVIEW 4 ---------------------
PAPER: 17
TITLE: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software
AUTHORS: Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia and Suren Byna

OVERALL EVALUATION: -1 (weak reject)
REVIEWER'S CONFIDENCE: 4 (high)
Relevance to WSSSPE: 3 (good)
Action-orientation (will lead to actions?): 1 (poor)
Should this paper author give a plenary talk?: 1 (no)

----------- REVIEW -----------
This paper proposes that HDF5 can be used to form the foundation for sustainability in data science. The authors give some examples of projects using HDF5, and then propose some next steps. The reason I don't think this paper is very suitable is that HDF5 is already widely accepted as the de facto format for data science, so it's hard to see how this contributes anything additional to solving sustainability issues. The paper also does not propose any specific mechanisms for enabling better software reuse, maintenance, or any other software engineering issues. HDF5 is clearly an important component in software sustainability, but the authors need to do a better job of explaining how it can be more than this.


----------------------- REVIEW 5 ---------------------
PAPER: 17
TITLE: The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software
AUTHORS: Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia and Suren Byna

OVERALL EVALUATION: 2 (strong accept)
REVIEWER'S CONFIDENCE: 3 (medium)
Relevance to WSSSPE: 4 (excellent)
Action-orientation (will lead to actions?): 2 (fair)
Should this paper author give a plenary talk?: 2 (yes)

----------- REVIEW -----------
The early parts of this paper give a really nice and concise introduction to the importance of considering data sustainability alongside software sustainability, and in describing how HDF5 addresses the issues arising. Several case studies are then given of the use of HDF5 in a wide spectrum of large scale computational science projects. This should form the foundation of a detailed action plan as to how to move forward in this area. However, I found the "Next Steps?" a little disappointing - how are the community efforts going to be galvanised, and how will the authors make use of the presentation at WSSSPE2 to bring together the leaders of these communities? I think the paper should be accepted for oral presentation, but in that presentation the authors should concentrate on how the success of HDF5 can be built upon.

There are a couple of typographical errors:

Page 4: "In" should be "in"
Page 4: "who's" should be "whose"

