BioHPC
Computational Biology Application Suite for High Performance Computing

BioHPC: Applications

BioHPC provides users with popular bioinformatics tools covering various aspects of computational biology:
  • Data mining / sequence analysis
    BLAST, BLAT, HMMER, GIMSAN, InterProScan, RepeatFinder, SLIM

  • Next Generation Sequencing tools
    Bowtie, BWA, Cufflinks, FastX, RNASeq, SamTools, TopHat

  • Protein structure prediction and modeling
    LOOPP, Modeller

  • Population genetics
    BayeScan, BEAST, BEST, Clumpp, Colony, IM, IMa, IMa2, InStruct, LAMARC, MCMCcoal, MDIV, Migrate, MKPRF, MSVAR, OmegaMap, Parentage, SFS_CODE, Structurama, Structure, TESS

  • Phylogenetics
    MrBayes, ClustalW, Stretcher, T-COFFEE

  • Association analysis / statistics
    PLINK, R, NAM-GWAS

  • MSR Biomedical
    CreateEpitome, Epipred, FalseDiscoveryRate, HlaAssignment, HlaCompletion, PhyloD

The system is flexible and can easily be customized to include other software; in fact, the number of applications available in BioHPC is growing quickly. The interface to each application is standardized: users can choose the cluster and the number of nodes, or let the interface choose them based on load balance and node availability. The system is also scalable: as of January 2011, the installation on our servers processes approximately 80,000 job submissions per year, many of them long-running, massively parallel computations. It integrates different cluster technologies (MS CCS, MS HPC Server 2008, JSDL, Linux schedulers). Both parallel and serial applications are available through the interface. LOOPP and MrBayes are examples of genuinely parallel applications, while P-BLAST, P-HMMER and P-IPRSCAN are parallelized by distributing the input sequences across nodes (trivial parallelization). MPI is used for communication.
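Trivial parallelization of the kind described for P-BLAST, P-HMMER and P-IPRSCAN amounts to splitting the query file into independent chunks, running one unmodified search per chunk on separate cores, and concatenating the outputs. The Python sketch below shows only the splitting step, assuming FASTA input. The function names are hypothetical, and balancing chunks by total residue count (greedy longest-first assignment) is an assumed heuristic, not necessarily what BioHPC itself does.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_by_length(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Hypothetical splitter: assign the longest remaining sequence to the
    chunk with the fewest residues so far, so each worker gets similar work."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    heap = [(0, i) for i in range(n_chunks)]  # (total residues, chunk index)
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (load + len(seq), idx))
    return chunks


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA, one chunk file per worker."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    example = ">q1\nACGTACGTAC\n>q2\nACG\n>q3\nACGTACG\n>q4\nAC\n"
    for i, chunk in enumerate(split_by_length(parse_fasta(example), 2)):
        print(i, [h for h, _ in chunk])
    # prints:
    # 0 ['q1', 'q4']
    # 1 ['q3', 'q2']
```

Each chunk would then be written out with `to_fasta` and submitted as a separate serial search job; because the queries are independent, the per-chunk results can simply be concatenated.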

The applications accessible via BioHPC are third-party programs governed by their respective licenses. Only the part developed at CBSU is covered by the BioHPC license. It is the sole responsibility of the administrators/owners of a particular BioHPC server to ensure that use of these applications complies with their respective licenses.

The BioHPC@CBSU installation is a good example of typical application usage. Below is the usage of a few popular programs between 6/13/2003 and 1/17/2011. For up-to-date information about BioHPC@CBSU, please see the BioHPC@CBSU web site.

Application          Jobs   Category                          Typical resources and run time
MDIV               22,856   population genetics               1 core; a few hours to two weeks (average 2-5 days)
LOOPP              22,974   protein structure prediction      5-20 cores; 3-10 hours
MrBayes            30,890   phylogenetics                     8-20 cores; a few hours to two weeks (average 5 days)
P-BLAST             5,947   sequence analysis / data mining   10-100 cores; a few days to a week (average 3 days)
IM / IMa / IMa2    30,590   population genetics               1 core; 2-5 days
Structure          29,527   population genetics
All applications  208,760   (average 27,420 per year)
Since 1/18/2010    79,691


