National HPCC Software Exchange (NHSE)
Development Report
January - April, 1995

1. We have completed a catalog of software currently pointed to by the NHSE. This catalog is available online at http://www.netlib.org/nse/sw_survey.html. The catalog is available in a browsable HTML version, a printable PostScript version, and a searchable version. We have categorized software items into four main categories: application libraries and programs, data analysis and visualization tools, numerical libraries and routines, and parallel processing tools. We have categorized parallel processing tools further into eight subcategories. In addition, we have provided abstracts and indexing keywords drawn from the HPCC thesaurus (currently under development) and the GAMS classification scheme for mathematical software. This catalog was constructed manually by NHSE developers and is essentially a snapshot of software available from the NHSE as of February 1995. In the future, when the software submission process described below in item 2 is well underway, the catalog will be updated automatically after submissions have been processed by the NHSE librarian.

2. We have worked out the details of the software submission process, and we are ready to begin a trial run by contacting authors of selected software from the catalog described above and asking them to formally submit their software.

Contributors submit software to the NHSE by filling out an HTML form using a forms-capable WWW browser such as Mosaic or Netscape. The NHSE software submission form may be found at http://www.netlib.org/nse/software_submit/software_submit.html. This form explains the submission and review process, including authentication procedures based on PGP, and gives an example of a completed submission form. Some contributors may have fairly large collections that are already catalogued using a different data model.
The NHSE will provide assistance to such contributors in converting their catalog information to the form required for submission to the NHSE and in submitting such collections en masse.

Once a software submission has been authenticated, it is processed before being placed in the NHSE on-line software catalog. This processing involves retrieval of the files specified by the author as making up the contribution, fingerprinting these files, assigning the contribution a unique identifier, and additional cataloging by the NHSE librarian. Contributors may submit software for consideration for the Unreviewed, Partially reviewed, or Reviewed levels. If the software has been submitted for partial review, the NHSE librarian also inspects the submission for adherence to the NHSE software guidelines. After software has been submitted for full review, it will be assigned to an area editor, who recruits two to six reviewers to peer review the software according to the documentation, correctness, soundness, usability, and efficiency criteria.

3. We have developed a framework for a software repository package that will be made available for use by NHSE participants. We expect to have two kinds of sites participating in the NHSE: 1) contributor sites that make resources available on file servers, and 2) index sites that manage, catalog, and provide a search interface to the distributed collection. Initially the CRPC sites will be the index sites. We expect a variety of materials to be contributed, ranging from software to tech reports to data archives to informational HTML pages.

Software submission will be handled using the system described in item 2 above. Only index sites will run the software submission server programs. Contributor sites will interact as clients via a WWW browser forms interface.
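
The index-site processing of an authenticated submission described in item 2 (retrieving the files, fingerprinting them, and assigning a unique identifier) could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the report does not name a fingerprint algorithm or an identifier format, so SHA-256 and the "nhse-NNNNNN" scheme are placeholders, the file contents are passed in as bytes in place of real retrieval, and the names CatalogEntry and process_submission are hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Review levels named in the report.
REVIEW_LEVELS = ("Unreviewed", "Partially reviewed", "Reviewed")


@dataclass
class CatalogEntry:
    """Hypothetical record handed to the librarian for further cataloging."""
    identifier: str
    title: str
    review_level: str
    fingerprints: dict  # file name -> hex digest


def fingerprint(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint the NHSE uses.
    return hashlib.sha256(data).hexdigest()


def process_submission(title, review_level, files, sequence_number):
    """Fingerprint already-retrieved files and assign an identifier.

    `files` maps file names to their contents (bytes); retrieval itself
    is outside this sketch.
    """
    if review_level not in REVIEW_LEVELS:
        raise ValueError(f"unknown review level: {review_level!r}")
    prints = {name: fingerprint(data) for name, data in sorted(files.items())}
    # Assumption: identifiers are a prefix plus a zero-padded sequence number.
    identifier = f"nhse-{sequence_number:06d}"
    return CatalogEntry(identifier, title, review_level, prints)
```

Recording a fingerprint per file lets an index site detect later whether a file held at a contributor site still matches what was submitted.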
To provide a search interface to the distributed collection of HPCC reports, we have installed and are currently evaluating the Dienst server software being developed for the ARPA/CNRI sponsored digital library system infrastructure project.

Our plan for indexing informational HTML pages is to have each contributing site run a Harvest gatherer that collects and summarizes its own pages on a regular basis. A Harvest broker running at each index site will collect information from the individual gatherers and provide a search interface. The individual brokers will be linked together to provide an overall search interface.

Because we need to collect usage statistics for the NHSE, each contributor site will run a logging program and report statistics to an index site. The index site will gather, summarize, and display the usage statistics.

We have begun the design and implementation of a location-independent naming system for network retrievable resources. Once this system has been fully implemented, servers that map resource names to meta-information and to locations will be run by index sites.

To summarize, the following components will make up the repository package for the two kinds of sites:

contributor site
----------------
http or ftp server
Harvest gatherer
log/statistics reporter

index site
----------
Harvest broker with indexing engine and search interface
software submission and software catalog servers
programs for gathering, summarizing, and displaying usage statistics
tech report server
name resolution software

4. We are experimenting with using Latent Semantic Indexing (LSI) to provide an enhanced search interface to NHSE material. LSI uses the singular value decomposition (SVD) to construct lower-rank approximations to the term-document incidence matrix for a collection of documents. Both terms and documents are represented as vectors in the resulting concept space, and queries are processed by finding nearby terms and documents.
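
The LSI procedure just described can be illustrated with a small, self-contained Python sketch. It is not the NHSE implementation: the five-term, three-document incidence matrix is invented for illustration, a simple power iteration with deflation stands in for a real SVD routine, and the function names (truncated_svd, lsi_scores) are hypothetical. The query and the documents are both projected onto the top k left singular vectors and compared by cosine similarity, which is one common way of formulating LSI retrieval.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(M, v):
    return [dot(row, v) for row in M]


def truncated_svd(A, k, iters=100):
    """Top-k (sigma, u, v) triplets of A by power iteration with deflation."""
    At = [list(col) for col in zip(*A)]
    triplets = []
    for _component in range(k):
        v = [float(i + 1) for i in range(len(At))]  # fixed start vector
        for _step in range(iters):
            w = matvec(At, matvec(A, v))  # apply A^T A
            for _sigma, _u, prev_v in triplets:  # remove earlier directions
                c = dot(w, prev_v)
                w = [a - c * b for a, b in zip(w, prev_v)]
            norm = math.sqrt(dot(w, w))
            v = [a / norm for a in w]
        Av = matvec(A, v)
        sigma = math.sqrt(dot(Av, Av))
        triplets.append((sigma, [a / sigma for a in Av], v))
    return triplets


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


def lsi_scores(query_terms, A, terms, k=2):
    """Cosine similarity of the query to each document in concept space."""
    U = [u for _sigma, u, _v in truncated_svd(A, k)]
    q = [1.0 if t in query_terms else 0.0 for t in terms]
    q_hat = [dot(u, q) for u in U]
    docs = [list(col) for col in zip(*A)]
    return [cosine(q_hat, [dot(u, d) for u in U]) for d in docs]


# Toy term-document incidence matrix (rows: terms, columns: documents).
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = [
    [1.0, 0.0, 0.0],  # car:        document 0
    [0.0, 1.0, 0.0],  # automobile: document 1
    [1.0, 1.0, 0.0],  # engine:     documents 0 and 1
    [0.0, 0.0, 1.0],  # flower:     document 2
    [0.0, 0.0, 1.0],  # petal:      document 2
]

scores = lsi_scores({"car"}, A, TERMS)
```

In this toy collection the query "car" scores document 1 ("automobile engine") about as highly as document 0 ("car engine"), even though document 1 does not contain the query term, while document 2 scores near zero.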
LSI is good at finding documents that are semantically close to a query, even if terms in the query do not appear in the documents. As a first step, we have constructed an LSI index to the HTML version of Parallel Computing Works. Preliminary experiments show good recall of sections of the book relevant to terms selected from the HPCC glossary and HPCC thesaurus we are also developing. We plan to integrate the LSI indexes for different sets of NHSE material with the manually constructed hypertext HPCC thesaurus and roadmap. Use of LSI should allow us to leverage effort we invest in manually indexing a small portion of the NHSE material into providing an effective search interface to the large, changing overall collection of NHSE information.

5. We have established contacts with digital library and software reuse research and development groups. We have been involved in discussions with CNRI about the possibility of merging the location-independent naming system we are developing with CNRI's handle management system. We are proposing the NHSE as a testbed for comparing the two systems. We will be presenting papers on the NHSE at two upcoming digital library conferences (see item 6 below), and we will be participating in an upcoming ARPA-sponsored Digital Library Workshop May 18-19.

Netlib is a member of the Executive Board of the Reuse Library Interoperability Group (RIG) and represents the NHSE's interests as well. The May RIG meeting is being held in Knoxville and is being sponsored by Netlib. We have presented a paper about the location-independent naming system being developed for the NHSE at a software reuse conference (see item 6 below), and we have submitted a paper addressing software reusability issues for high performance computing to a software reusability workshop (see item 6 below). We are establishing contacts with researchers in the field of software reusability to investigate applying the techniques of domain analysis to the NHSE.

6.
The following papers related to the NHSE have been submitted and/or accepted for publication:

"The National HPCC Software Exchange", by Browne, Dongarra, Green, Moore, Rowan, Wade, Fox, Hawick, Kennedy, Pool, Stevens, Olson, and Disz. IEEE Computational Science and Engineering, 1995.
http://www.netlib.org/srwn/srwn08.ps (to appear)

"Location Independent Naming for Virtual Distributed Repositories", by Browne, Dongarra, Green, Moore, Pepin, Rowan, Wade, and Grosse. ACM-SIGSOFT Symposium on Software Reusability, Seattle, Apr 28-30, 1995.
http://www.netlib.org/srwn/srwn07.ps (published)

"Digital Software and Data Repositories for Support of Scientific Computing", by Boisvert, Browne, Dongarra, and Grosse. Digital Libraries Forum, McLean, Va, May 16-17, 1995.
http://www.netlib.org/srwn/srwn09.ps (accepted)

"Management of the National HPCC Software Exchange -- A Virtual Distributed Digital Library", by Browne, Dongarra, Kennedy, and Rowan. Digital Libraries 95, Austin, Texas, June 11-13, 1995.
http://www.netlib.org/srwn/srwn11.ps (accepted)

"Distributed Information Management in the National HPCC Software Exchange", by Browne, Dongarra, Fox, Hawick, Kennedy, Stevens, Olson, and Rowan. Supercomputing 95.
http://www.netlib.org/srwn/srwn10.html (submitted)

"Software Reuse in High Performance Computing", by Browne, Dongarra, Fox, Hawick, and Rowan. 7th Workshop on Institutionalizing Software Reuse (WISR 7), St. Charles, Illinois, Aug 28-30, 1995.
http://www.netlib.org/srwn/srwn14.ps (submitted)