11 October 2001

Jeremy Knowles
Dean of the Faculty of Arts and Sciences
Harvard University
University Hall
Cambridge, MA 02138


Dear Jeremy,

This letter is a request by a group of faculty to create a Center for Advanced Scientific Computing at Harvard University.

Historically, one of the primary roads to breakthroughs in the sciences has been the application of new instrumentation to an unsolved problem. In the 20th Century, particularly in the latter half, breakthroughs have involved collaborations between scientists and expert engineers in the design and realization of experimental hardware. At present, computer hardware and software are rivaling, and will likely someday overtake, experimental hardware as the single largest expense of cutting-edge science. However, unlike highly specialized experimental apparatus, computing resources can be shared among researchers to produce a highly efficient engine for sophisticated modeling and data analysis. Across a broad range of departments in the sciences at Harvard, there is a significant fraction of faculty members whose research requires advanced computing. There is a large commonality in the advanced computing requirements of these faculty members, from areas as mundane as the administration of powerful commodity computing farms to areas as sophisticated as the optimization of complex modeling problems.

The mission of the Center for Advanced Scientific Computing would be to advance scientific research at Harvard through the direct application of computer science to amenable, yet difficult, analysis and modeling problems in the Sciences. This request has a history of false starts, but a common theme: the independent realization by many faculty that such a Center is critical for forefront scientific research at Harvard. Professors Alyssa Goodman (Astronomy) and John Huth (Physics) have agreed to lead initial efforts, but all of the signatories of this letter are willing to commit significant effort to make this Center a reality. The signatories include members of the Departments of Astronomy, Chemistry, Organismic and Evolutionary Biology, Earth and Planetary Sciences, Physics, Statistics, and the Division of Engineering and Applied Sciences. We do not have an exhaustive list of potentially interested faculty members, but view this letter's signers as sufficiently representative to demonstrate the level of interest. We are currently employing the acronym "CASC" to refer to the "Center for Advanced Scientific Computing."

Why should such a center be formed?

We envisage CASC as an essential resource for many faculty and students, offering them not only forefront hardware and software, but most importantly, staff whose expertise will enable scientists from a variety of disciplines to apply the most relevant computational techniques to any particular problem. Centers of this kind exist at other universities and are proving to be a tremendous asset not only in the direct application of problem solving, but in less direct areas such as the attraction of funding and faculty recruiting. The lack of such a center at Harvard places its faculty at a significant disadvantage relative to those in other universities. We feel that this situation must be remedied.

CASC can be thought of as serving functions analogous to the combination of a library, an engineering group, and a fabrication facility, but it is the unique combination of these functions that makes CASC different. That combination turns CASC into a research Center, devoted to perpetually finding new and innovative applications of computing resources to "modern" scientific inquiry.

What kind of projects might be enabled?

Examples of projects that CASC would enable abound. Take modeling by a geophysicist or an astrophysicist of gaseous ionized matter. Today's high-end computer farms can simulate the physical behavior of a cubical volume 512 pixels on a side (more than 100 million pixels in total) in days to weeks of CPU time. Thus, in most cases the largest scale simulated can only be about 500 times larger than the smallest scale, and that is simply an insufficient range of scale to achieve credible results. In order to make modeling more relevant and realistic, the simulators must choose among several possible options. Should they strive to find a faster, but financially feasible computer? Should they optimize their current code? Should they change their software to run on many spatial scales in parallel or in sequence, and would such changes make their code run faster on some specialized machine? As talented as the applications scientists may be, they are not usually computing experts, who would be in a far better position to choose among a semi-infinite space of options to find the one that is most effective. In many cases, the choice of computing strategy itself will be as much of a scholarly contribution as the result the computing generates.
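To make the scale argument concrete, the following is a minimal sketch of how the size of such a simulation grows with its dynamic range. The cell count follows directly from the cubical geometry described above. The cost estimate is an illustration only: it assumes that cost grows with the number of cells times a number of time steps proportional to the linear resolution, an assumption that is not taken from this letter and that will not hold for every code. The function names are hypothetical.

```python
def cell_count(n_side: int) -> int:
    """Number of cells in a cubical grid with n_side cells per edge."""
    return n_side ** 3


def relative_cost(n_side: int, baseline: int = 512, time_exponent: int = 1) -> float:
    """Cost relative to the baseline grid.

    Assumption (not from the letter): cost scales as n_side**(3 + time_exponent),
    i.e. the cell count times a number of time steps that grows as
    n_side**time_exponent.
    """
    return (n_side / baseline) ** (3 + time_exponent)


# Each doubling of the dynamic range multiplies the cell count by 8 and,
# under the assumption above, the cost by 16.
for n in (512, 1024, 2048, 4096):
    print(f"{n:5d} per side: {cell_count(n):>14,d} cells, "
          f"~{relative_cost(n):,.0f}x baseline cost")
```

Under these assumptions, even a modest increase in the range of scales quickly exceeds what a faster computer alone can supply, which is why the choice among the strategies listed above matters.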

Another type of example concerns the area of so-called "smart queries" of large databases that can efficiently look for correlations in huge quantities of data, in the range of thousands of Terabytes. These data sets run the gamut from the reconstruction of particle physics tracks, to gene sequence searches, to Earth samples, to climate records, to correlations in matter density in the early universe. The data sets continue to grow, and so do the concomitant problems associated with searching them, which form an intensive area of study for computer scientists.

How will faculty make use of this center?

We see CASC as a collaborative enterprise. There will be a base support of staff who are expert in areas such as secondary and tertiary storage, high-end commodity configurations, etc., coupled with computer scientists with a strong background in scientific applications. In the beginning, about 20 researchers would be affiliated with CASC (see below for a breakdown). Faculty and researchers in the Departments and Divisions represented by the signatories of this letter would "propose" projects to CASC management, who would discuss the projects further with both their research staff and with the proposing scientist. If a project were selected, the CASC members assigned to it would play more of a collaborative than a support role. We expect that in several cases the CASC scientist would be a co-author along with the Department/Division researchers on scientific publications resulting from the collaborative effort. Also, there is nothing to stop CASC scientists from collaborating on more than one project at a time, some of which may even face common challenges.

The CASC member(s) working on any project would take a large share of the responsibility for selecting the correct hardware/software tools for the task and would often be able to draw upon shared CASC resources to implement the solution. However, we are very wary of investing too much, as many institutions have done, in one particular kind of hardware or software and then forcing "that" solution onto everyone. CASC is not a supercomputing center in the sense that it is not planned as a big building housing an expensive supercomputer that only a handful of people will find useful. The primary focus of CASC is on people, and the hardware they need to collaborate and carry out their research effectively.

We expect that CASC will be housed in a building either by itself, or co-located with some of the groups who may make use of the center. The North Campus redevelopment represents an ideal site because of its proximity to both the DEAS and science faculty of the FAS. We feel that a local concentration of almost all of CASC's research staff will foster both formal and informal discussions that will lead to the best-thought-out research. We anticipate that CASC would provide offices for visiting faculty actively working on CASC-collaborative projects, and that Departments and the Division would provide similar temporary accommodations for CASC scientists, when their physical presence in a Department or Division would be useful.

Who will work at this Center?

Given the rapid pace of computing hardware evolution, we view the establishment of specific computational engines as being secondary to the establishment of an infrastructure of knowledgeable Center staff who can help develop and use these computational engines effectively. There are new concepts in data-intensive computing, such as virtual data grids, which take advantage of smart queries of large databases of information (hundreds to thousands of terabytes of data) using computational centers linked together by high-bandwidth internet backbones. Scientists teamed up with computing professionals can take advantage of these recent developments to find powerful ways of extracting new information from data. CASC should become a powerful resource for graduate students collaborating with faculty members, where the students can work on topics at the intersection of their field and cutting-edge computer science.

We expect that the computing professionals most attracted to positions at CASC might be former scientists and/or science Ph.D.s whose careers have taken them on a more computer-oriented tack. Roughly speaking, there will be two categories of CASC research staff. Initially we believe that about five members of the staff would hold "Senior" (i.e., long term) positions, and would be charged with both research and supervisory duties. Another, larger, number of members would hold "Junior" (postdoctoral-level) positions, and would be selected in an open competition each year, by a search committee composed of faculty from the departments represented by the signatories of this letter. In its early years, we guess that about 15 postdoctoral-level positions in steady state (5 hires per year, for a 3-year term each) are about right.

In addition to the research staff, there is the need for a strong technical and support staff for configuration, procurement, installation, maintenance, 24/7 support, etc. The actual size of the support staff depends in part on the details of the Center. Centers of this kind at other universities employ ten to twenty technical/support personnel. Technical choices will determine the staffing needs. If, for example, one decides to employ robotic tape storage systems, this pushes the staffing needs to the high end.

How much will a Center cost, and how should it be funded?

We imagine that the initial incarnation of the Center should have a technical support staff of roughly 15 FTEs, 5 post-docs, ramping up to 15 total at the end of three years, and 5 senior staff. In addition to this, there should be at least two supported high-end platforms: one suited to CPU-intensive activity, the other more multipurpose, but specializing in data-intensive work, and high-bandwidth connectivity. The support staff should be capable of helping groups external to the Center set up systems such as Beowulf-class commodity CPU systems as well. With a loaded cost of $240k/year for senior personnel, $100k/year for postdoc-level positions, and $150k/year for technical staff, plus replacement and licensing costs of roughly $1M/year for hardware and software after initial investments, one expects an annual CASC operating budget on the order of $6M/year. It is expected that a substantial fraction of the cost of the center will eventually be derived from matching funds resulting from grants, likely at the level of 50% of the operating cost of the center.
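The arithmetic behind the figure of roughly $6M/year can be laid out as a short sketch. The headcounts and loaded costs are the ones quoted in the paragraph above; the variable and function names are illustrative, and the first-year figure simply assumes that only the postdoctoral headcount differs from the steady state.

```python
# Loaded annual cost per position, in dollars (figures from the letter).
LOADED_COST = {"senior": 240_000, "postdoc": 100_000, "technical": 150_000}

# Steady-state headcount after the three-year postdoc ramp-up.
STEADY_STATE = {"senior": 5, "postdoc": 15, "technical": 15}

# Recurring hardware/software replacement and licensing, in dollars per year.
HARDWARE_SOFTWARE = 1_000_000


def annual_budget(headcount: dict, recurring: int = HARDWARE_SOFTWARE) -> int:
    """Personnel cost for the given headcount plus recurring hardware/software cost."""
    personnel = sum(n * LOADED_COST[role] for role, n in headcount.items())
    return personnel + recurring


steady = annual_budget(STEADY_STATE)
# Assumption: in the first year only the postdoc count (5) differs.
first_year = annual_budget({**STEADY_STATE, "postdoc": 5})

print(f"Steady-state budget: ${steady:,}")
print(f"First-year budget:   ${first_year:,}")
print(f"Grant share at 50%:  ${steady // 2:,}")
```

The steady-state total comes to $5.95M, consistent with the stated budget on the order of $6M/year, of which about $3M would come from grants at the anticipated 50% level.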

The existence of advanced computing centers at other universities has created a tremendous asset for those schools, not only in the direct support of scientific applications, but also in the competitive edge in funding, where funding agencies increasingly look for university infrastructure to add synergy to new initiatives. So, although we are looking to Harvard to endow a Center, we imagine its efforts will grow far beyond its Harvard-sponsored effort. Several signatories on this letter already have government funding for advanced-computing related projects (e.g. The National Virtual Observatory (NVO), The Grid Physics Network (GriPhyN)). We expect that CASC would provide a perfect lever for acquiring more such funding, some of which might ultimately be channeled back into CASC (e.g. in paying for additional postdocs).

We believe that Harvard must invest in a Center like the one we have described in order to compete effectively in an era when information technology is coming to dominate the scientific landscape. We will be more than happy to meet to discuss how we might plan to form such a Center.

Sincerely,

Jeremy Bloxham, Chair
Earth and Planetary Sciences

Colleen Cavanaugh
Organismic & Evolutionary Biology

David van Dyk
Statistics

Giuseppina (Pepi) Fabbiano
CfA/SAO

Alyssa Goodman
CfA/Astronomy

Lincoln Greenhill
CfA/SAO

Lars Hernquist
CfA/Astronomy

John Huth
Physics

Efthimios (Tim) Kaxiras
Physics/DEAS

Avi Loeb
CfA/Astronomy

James Rice
Earth and Planetary Sciences/DEAS

Eugene Shakhnovich
Chemistry

Irwin Shapiro, Director
CfA/Astronomy/SAO/Physics

Kris Stanek
CfA/Astronomy

Steven Wofsy
Earth and Planetary Sciences
