NIEeS reached the end of its contract in August 2008. This website remains as a historical document and as a gateway to the archive of presentations and supplementary material.
eScience, by definition, requires software, and we think that software falls into three broad categories: computation, data management, and collaboration.
For the most part, the computational area is handled by software not written to perform scientific tasks, but to "carry" these tasks into an eScience environment.
Data management software has to deal with the distributed nature of eScience projects: moving data, on demand, to and from remote processing machines; making it available to researchers in scattered locations; and coping with huge volumes of data, which may include capturing metadata to further enhance the value of the data itself.
Collaborative software works to enhance the organizational aspects of a project as well as to mitigate the difficulties in communication associated with having dispersed workers.
The NIEeS staff are not restricted to purely academic knowledge of eScience software and techniques. The NIEeS director is also head of a major eScience research programme (eMinerals), and a significant contributor to another (MaterialsGrid), and we are collocated with the Cambridge members of those projects. The NIEeS staff benefit greatly from working in an eScience production environment, and contribute to the development and support of that environment. The close contact we have with scientists and developers in these large and successful eScience projects gives us unusual insight into, and practical experience of, both good practice and problems associated with doing science this way. Augment this with exposure to the wide range of eScience and ancillary topics covered in the events we run, and NIEeS is in an exceptional position to offer advice and help to many types of aspiring eScientists.
We can offer advice and help on a wide range of topics, but we rely on a number of core software tools, either developed elsewhere and used within the group, or developed within the group to solve a particular problem. In our environment, all three categories of software mentioned above are put to use in several virtual organisations made up of members working in different institutions and even different countries.
Key software tools
Condor is a specialized workload management system for compute-intensive jobs developed at the University of Wisconsin. Like PBS or other queuing systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail, such as managing heterogeneous computing resources. Condor can also be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster or several similar clusters).
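The queue-then-match behaviour described above can be illustrated with a small Python sketch. This is not Condor code and does not use any Condor interface; the `Job`, `Machine`, and `schedule` names, the memory-only requirement, and the "lower number runs first" priority convention are all assumptions made purely for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Only `priority` takes part in ordering; lower values run first (assumed convention).
    priority: int
    name: str = field(compare=False)
    min_memory_mb: int = field(compare=False)


@dataclass
class Machine:
    name: str
    memory_mb: int
    busy: bool = False


def schedule(jobs, machines):
    """Match queued jobs, in priority order, to idle machines with enough memory."""
    queue = list(jobs)
    heapq.heapify(queue)
    placements = {}   # job name -> machine name
    waiting = []      # jobs that could not be placed in this pass
    while queue:
        job = heapq.heappop(queue)
        candidates = [
            m for m in machines
            if not m.busy and m.memory_mb >= job.min_memory_mb
        ]
        if candidates:
            # Hypothetical policy: take the smallest adequate machine,
            # so that larger machines stay free for larger jobs.
            chosen = min(candidates, key=lambda m: m.memory_mb)
            chosen.busy = True
            placements[job.name] = chosen.name
        else:
            waiting.append(job.name)
    return placements, waiting


machines = [Machine("desktop", 2048), Machine("cluster-node", 8192)]
jobs = [Job(2, "small", 1024), Job(1, "big", 4096), Job(3, "extra", 1024)]
placements, waiting = schedule(jobs, machines)
```

In this toy run the heterogeneous pool (one desktop, one cluster node) is treated as a single resource: each job goes wherever its requirement can be met, and a job with no matching idle machine simply stays queued.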
Condor can be used to build Grid-style computing environments that cross administrative boundaries. Condor's "flocking" technology allows multiple Condor compute installations to work together. Condor incorporates many of the emerging Grid-based computing methodologies and protocols. It is also fully interoperable with resources managed by Globus. As a result, Condor can be used to seamlessly combine all of an organization's or an eScience workgroup's computational power into one available resource.
NIEeS has deployed Condor and integrated it into CamGrid. CamGrid is a University of Cambridge project which aims to build a university-wide grid based on the Condor middleware.
The Globus Toolkit, developed by the Globus Alliance, provides a set of software tools to implement the basic services and capabilities required to construct a computational Grid, such as security, resource location, resource management, and communications. Globus includes programs such as GRAM (Globus Resource Allocation Manager), which figures out how to convert a request for resources into commands that local computers can understand; GSI (Grid Security Infrastructure), which provides authentication of the user and works out that person's access rights; and GridFTP, which provides a secure data transfer mechanism.
NIEeS deploys the above services for UK environmental scientists to try out as a means of accessing the NIEeS test services. For example, NIEeS runs Globus as an interface to the NIEeS Condor Pool. In order to use these grid services, you can apply for a NIEeS certificate (just ask), or you can use a UK eScience certificate, if you have one.
Storage Resource Broker
The Storage Resource Broker (SRB), developed at the San Diego Supercomputer Center (SDSC), provides access to distributed data from any single point of access. From the viewpoint of the user, the SRB presents a virtual file system, with access to data based on data attributes and logical names rather than on physical location or real names. Physical location is treated as just another file characteristic. One feature of the SRB is that it allows users to easily replicate data across different physical file systems in order to provide an additional level of file protection. The SRB runs on various versions of Unix, Linux, Apple's Mac OS X, and Microsoft Windows.
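The idea of addressing data by logical name and attributes, with physical location kept as a property of the file, can be sketched in a few lines of Python. This is a conceptual model only and is unrelated to the real SRB software or its commands; the `Catalog` class, its method names, and the vault paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalFile:
    logical_name: str
    attributes: dict
    replicas: list = field(default_factory=list)  # physical locations


class Catalog:
    """Toy catalogue mapping logical names to attributes and physical replicas."""

    def __init__(self):
        self._files = {}

    def register(self, logical_name, physical_location, **attributes):
        entry = self._files.setdefault(
            logical_name, LogicalFile(logical_name, {})
        )
        entry.attributes.update(attributes)
        if physical_location not in entry.replicas:
            entry.replicas.append(physical_location)

    def replicate(self, logical_name, new_location):
        # A replica is simply one more physical location for the same logical file.
        self.register(logical_name, new_location)

    def find(self, **query):
        """Return logical names whose attributes match every query item."""
        return sorted(
            name for name, f in self._files.items()
            if all(f.attributes.get(k) == v for k, v in query.items())
        )

    def locate(self, logical_name, unavailable=()):
        """Return the first replica that is not on an unavailable store."""
        for location in self._files[logical_name].replicas:
            if location not in unavailable:
                return location
        raise LookupError(f"no reachable replica of {logical_name}")


catalog = Catalog()
catalog.register("/home/project/run42.xml", "vault-a:/disk1/0001",
                 code="simulation", year=2008)
catalog.replicate("/home/project/run42.xml", "vault-b:/store/9f3c")
```

A user of this toy catalogue never needs to know which vault holds the file: a search by attribute returns the logical name, and if the first vault is unreachable the second replica is returned instead.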
NIEeS makes SRB vaults available in our test services.
The Access Grid is a set of resources used to support distributed collaborative interactions across the internet. The main feature is scalable videoconferencing, augmented by a number of presentation and application sharing tools. The Access Grid can be used for large-scale distributed meetings, smaller collaborative work sessions, seminars, lectures, tutorials, and training. The Access Grid thus differs from other tools that focus more on individual-to-individual communication, although it can be used in this mode.
NIEeS created a local multicast environment, which is necessary to exploit Access Grid fully. We also run unicast-multicast bridges for collaborators (and others in the NERC community) who do not have access to multicast communications.
NIEeS Certificate Authority (CA)
The NIEeS CA creates X.509 certificates to support a secure grid computing environment.
A certificate uses a digital signature to bind together a public key with identity information such as the name of a person, organization, email, and so forth. The certificate can be used to verify that a public key belongs to an individual. We issue User Certificates and Host Certificates to people and sites who are interested in using the NIEeS test services.
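The binding between identity and public key can be shown with a deliberately simplified Python sketch. It is not X.509 and must not be used for real security: it uses textbook RSA without padding, a toy CA key built from two well-known Mersenne primes, and a JSON body in place of the real certificate encoding. All function names and the "Toy CA" issuer are hypothetical.

```python
import hashlib
import json

# Toy RSA key for a hypothetical CA (far too small and too structured for real use).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                             # public modulus
E = 65537                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent (needs Python 3.8+)


def _digest(body: dict) -> int:
    """Hash the certificate body to an integer smaller than the modulus."""
    data = json.dumps(body, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def issue_certificate(subject: str, subject_public_key: int) -> dict:
    """CA side: sign the (identity, public key) pair with the CA private key."""
    body = {"subject": subject, "public_key": subject_public_key, "issuer": "Toy CA"}
    return {"body": body, "signature": pow(_digest(body), D, N)}


def verify_certificate(cert: dict) -> bool:
    """Anyone holding the CA public key (N, E) can check the binding."""
    return pow(cert["signature"], E, N) == _digest(cert["body"])


cert = issue_certificate("CN=Jane Example", 123456789)
forged = {"body": dict(cert["body"], subject="CN=Mallory"),
          "signature": cert["signature"]}
```

The genuine certificate verifies, while the copy with a changed subject does not, because the signature covers the identity and the public key together.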
An issue raised at several of our events is how to handle metadata. Scientists can now produce data sets so large they would be difficult to handle without a rich set of metadata. Although manual tools exist to connect metadata to data sets, what is needed is automatic annotation.
The RCommand framework, developed in a CCLRC/Cambridge collaboration, is designed to enable metadata to be harvested from output XML files, with tools for the scientists to then extract the metadata and associated data set URLs from a central database. NIEeS is setting up a demonstration system with its own databases. We have extensive information within our GridInfo system.
My_condor_submit (MCS) is a tool developed by the eMinerals project to allow simplified job submission to remote grid resources, with built-in meta-scheduling and load balancing and with both data and metadata management. The meta-scheduling is implemented within MCS itself, the job submission is handled by Condor-G, and the metadata capture and data storage are handled by the Rcommands and the SRB Scommands, respectively. The Globus Toolkit is used to provide security but creates a data I/O problem; MCS also solves this problem. One can submit MCS jobs from a client machine with Condor-G, Globus and SRB clients installed.
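As a rough illustration of what meta-scheduling with load balancing means, the Python sketch below picks the least loaded of several candidate resources. The text above does not describe the algorithm MCS actually uses, so the load measure (running plus queued jobs per slot), the `choose_resource` function, and the resource names are assumptions for illustration only.

```python
def choose_resource(resources):
    """Return the name of the available resource with the lowest load ratio.

    Each resource is a dict with hypothetical keys:
    name, slots, running, queued, available.
    """
    usable = [r for r in resources if r["available"] and r["slots"] > 0]
    if not usable:
        raise RuntimeError("no grid resource is currently available")
    # Assumed load measure: jobs running or waiting, per available slot.
    best = min(usable, key=lambda r: (r["running"] + r["queued"]) / r["slots"])
    return best["name"]


resources = [
    {"name": "cluster-a", "slots": 16, "running": 16, "queued": 8, "available": True},
    {"name": "pool-b", "slots": 40, "running": 10, "queued": 0, "available": True},
    {"name": "cluster-c", "slots": 8, "running": 0, "queued": 0, "available": False},
]
target = choose_resource(resources)
```

Here `cluster-c` is idle but unreachable, and `cluster-a` is oversubscribed, so the job would be sent to `pool-b`.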
NIEeS is using MCS to submit a parallel remote sensing model to the NIEeS PBS cluster. Data input and output both employ SRB.
FoX: Integrating XML and Fortran
Extensible Markup Language (XML) is useful in eScience both to make data intelligible and to ease manipulation of data. FoX is a library of software routines that allows Fortran programmers to read and write XML.
NIEeS is using FoX to create another library of software routines as an interface to KML, the markup language used with Google Earth (see the NIEeS projects page).
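To show the kind of document such a KML interface has to produce, here is a minimal sketch written in Python with the standard library rather than in Fortran with FoX, so it says nothing about the FoX API itself. The `placemark_kml` function is hypothetical, and the sketch assumes the OGC KML 2.2 namespace.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed namespace (OGC KML 2.2)


def placemark_kml(name, longitude, latitude):
    """Build a KML document containing a single point placemark."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude (in that order).
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


document = placemark_kml("Cambridge", 0.1218, 52.2053)
root = ET.fromstring(document)
coordinates = root.find(f".//{{{KML_NS}}}coordinates").text
```

Reading the generated text back with the same parser recovers the coordinates, which is the round trip (write out, read in) that an XML library provides to a scientific code.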