BioHPC Home

BioHPC
Computational Biology Application Suite for High Performance Computing


BioHPC @ CBSU

BioHPC is the main CBSU venue for delivering high-performance computing to biological research groups. It was developed in collaboration with Microsoft, and thanks to Microsoft's sponsorship, part of the CBSU resources is open to the general scientific community. BioHPC users are divided into two categories: guests and registered users. Registered users come from the Cornell community and have several privileges over guests, including a higher limit on the number of jobs and full access to restricted, computationally expensive programs. BioHPC has been very well received by the scientific community, and its usage, by both registered users and guests, increases substantially every year.

The Cornell BioHPC installation is available at http://cbsuapps.tc.cornell.edu/ with all implemented applications included. Some applications are freely available for general public use; others, because of their very high computational demand, are available to registered users only. The CBSU BioHPC installation at Cornell is currently linked to 327 compute nodes with 978 CPU cores, grouped in 8 clusters. One compute node is a 16-core server with 64GB RAM; 32 nodes are 8-core servers with 16GB RAM; and 234 nodes are 2-core servers with 4GB RAM. Twenty compute nodes (4-core servers with 16GB RAM) from a 64-node Athena cluster at Microsoft headquarters in Redmond, WA are linked to this interface via an HPC Basic Profile/JSDL connection. Because of its size and load, the Cornell BioHPC installation has a dedicated web server, a separate file server (6TB storage), a separate FTP server (6TB storage), and a separate Microsoft SQL Server. Currently 2.5 FTEs are dedicated to BioHPC development and maintenance at CBSU.

The primary installation, running on Cornell's CBSU clusters, processes over 80,000 job submissions per year, the majority requiring multiple processors over multiple days. From its initial deployment in 2003 through 3/31/2011, BioHPC processed 229,375 jobs, with an average load of 27,597 jobs per year and 86,151 jobs over the last full year (4/1/2010-3/31/2011). Jobs were submitted by 16,224 users from 83 countries, with the largest share of jobs (49% by CPU time used) originating in the U.S. The users include 3,380 from .edu domains representing 498 .edu institutions, 7,230 from .com domains (including 6,520 with Yahoo, Gmail, and Hotmail e-mail addresses), and 311 from Cornell.

The most popular application category overall (counting all years since 2003) is population genetics, with 84% of CPU time used, followed by protein structure (10%) and sequence analysis (5%).

Cornell registered users accounted for 24% of CPU usage while making up only 2.2% of the total number of users, which means that Cornell users consume significant computational resources per job and per user. Registered users have access to the most resource-intensive sequence analysis programs, which are not available to guests, although the majority of their usage is still focused on population genetics. They can also run more jobs per user than guests, resulting in higher resource utilization per category.
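To make the per-user comparison concrete, the two percentages quoted above can be turned into a rough ratio. The sketch below is only an illustration built on those two figures; the function name is hypothetical, and it assumes the remaining CPU time and users all belong to the non-Cornell group.

```python
# Shares quoted in the text for Cornell registered users.
cornell_cpu_share = 0.24    # fraction of total CPU time used
cornell_user_share = 0.022  # fraction of all users


def relative_cpu_per_user(cpu_share, user_share):
    """Mean CPU time per user inside a group, divided by the mean
    CPU time per user outside it (hypothetical helper)."""
    inside = cpu_share / user_share
    outside = (1.0 - cpu_share) / (1.0 - user_share)
    return inside / outside


ratio = relative_cpu_per_user(cornell_cpu_share, cornell_user_share)
print(f"Cornell users average about {ratio:.0f}x the CPU time of other users")
```

Under these assumptions the ratio comes out at roughly 14, that is, an average Cornell registered user consumed on the order of fourteen times the CPU time of an average non-Cornell user.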

Recently, we gathered geographical information on visitors to the Cornell BioHPC installation site between June 3, 2010 and May 19, 2011 using RevolverMaps. It registered 89,306 visits with the following geographical distribution:

Visits are well correlated with job submissions. The current geographical distribution of visitors can be accessed here.
