BioHPC
Computational Biology Application Suite for High Performance Computing

BioHPC: Applications

BioHPC provides users with popular bioinformatics tools covering various aspects of computational biology:

Data mining / sequence analysis
BLAST, BLAT, HMMER, GIMSAN, InterProScan, RepeatFinder, SLIM
Next Generation Sequencing tools
Bowtie, BWA, Cufflinks, FastX, RNASeq, SamTools, TopHat
Protein structure prediction and modeling
LOOPP, Modeller
Population genetics
BayesScan, BEAST, BEST, Clumpp, Colony, IM, IMa, IMa2, InStruct, LAMARC, MCMCcoal, MDIV, Migrate, MKPRF, MSVAR, OmegaMap, Parentage, SFS_CODE, Structurama, Structure, TESS
Phylogenetics
MrBayes, ClustalW, Stretcher, T-COFFEE
Association analysis / statistics
PLINK, R, NAM-GWAS
MSR Biomedical
CreateEpitome, Epipred, FalseDiscoveryRate, HlaAssignment, HlaCompletion, PhyloD

The system is flexible and can easily be customized to include other software; the number of applications available in BioHPC is growing quickly. The interface to each application is standardized: users can choose the cluster and the number of nodes, or let the interface determine them based on the best load balance and node availability.

BioHPC is also scalable. The installation on our servers currently (January 2011) processes approximately 80,000 job submissions per year, many of them requiring long-running, massively parallel computations. It integrates several cluster technologies (MS CCS, MS HPC Server 2008, JSDL, Linux schedulers).

Both parallel and serial applications are available through the interface. LOOPP and MrBayes are examples of genuinely parallel applications. P-BLAST, P-HMMER and P-IPRSCAN are parallelized by distributing the input sequences across nodes (trivial parallelization). MPI is used for communication.
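The input sequence distribution described above for P-BLAST, P-HMMER and P-IPRSCAN can be illustrated with a minimal sketch. This is not BioHPC's actual code: the function names `parse_fasta` and `distribute` are hypothetical, and the greedy longest-first balancing is an assumption chosen for illustration. The idea is only that a multi-sequence FASTA input is split into independent chunks, one per worker, so that each worker can run the unmodified serial program on its own chunk.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def distribute(records: List[Record], n_workers: int) -> List[List[Record]]:
    """Assign sequences to workers so total residues per worker are roughly equal.

    Hypothetical strategy: longest sequence first, always to the least-loaded worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for record in sorted(records, key=lambda r: len(r[1]), reverse=True):
        target = loads.index(min(loads))  # least-loaded worker so far
        chunks[target].append(record)
        loads[target] += len(record[1])
    return chunks


if __name__ == "__main__":
    fasta = ">a\nACGTACGTAC\n>b\nACGT\n>c\nACG\n>d\nTTT\n"
    for worker, chunk in enumerate(distribute(list(parse_fasta(fasta)), 2)):
        print(worker, [name for name, _ in chunk])
```

Because the sequences are independent, the per-chunk results only need to be concatenated at the end; in an MPI setting, one rank would typically scatter the chunks and gather the outputs.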

The applications accessible via BioHPC are third-party programs governed by their respective licenses. Only the part developed at CBSU is covered by the BioHPC license. It is the sole responsibility of the administrators or owners of a particular BioHPC server to ensure that these applications are used in accordance with their respective licenses.

The BioHPC @ CBSU implementation is a good example of typical application usage. Below are job counts and typical resource requirements for a few popular programs between 6/13/2003 and 1/17/2011. For up-to-date information about BioHPC @ CBSU, please go to this page.

Application	Jobs	Category	Typical resource usage
MDIV	22,856	(population genetics)	1 core for a few hours to two weeks (average: 2-5 days)
LOOPP	22,974	(protein structure prediction)	5-20 cores for 3-10 hours
MrBayes	30,890	(phylogenetics)	8-20 cores for a few hours to two weeks (average: 5 days)
P-BLAST	5,947	(sequence analysis / data mining)	10-100 cores for a few days to a week (average: 3 days)
IM / IMa / IMa2	30,590	(population genetics)	1 core for 2-5 days
Structure	29,527	(population genetics)
All applications	208,760	(average 27,420 per year)
All applications (since 1/18/2010)	79,691
