mpiBLAST

Description

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.

Version

  • 1.6.0

Authorized Users

  • CIRCE account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • SC cluster

Modules

mpiBLAST requires the following module file to run:

  • apps/mpiblast/1.6.0

Running mpiBLAST on CIRCE/SC

The mpiBLAST user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your sequence searches. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Creating and Submitting a Job

Create a .ncbirc file in your home directory. It should be formatted as follows (the location of your /work directory will vary):

[mpiBLAST]
  Shared=/work/j/joeuser/blast
  Local=/tmp/joeuser/blast

‘Shared’ is a location that all of the nodes doing the computation can access, and ‘Local’ should be a faster drive that is local to each node.
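
The file can also be generated from the shell. The following is a minimal sketch, not an official mpiBLAST tool: write_ncbirc is a hypothetical helper name, and the paths you pass to it are your own.

```shell
#!/bin/sh
# Sketch: write an mpiBLAST section to a .ncbirc-style file.
# write_ncbirc is a hypothetical helper, not part of mpiBLAST.
#   $1 = Shared directory (visible to all compute nodes)
#   $2 = Local directory (fast storage local to each node)
#   $3 = file to write (normally $HOME/.ncbirc)
write_ncbirc() {
    cat > "$3" <<EOF
[mpiBLAST]
Shared=$1
Local=$2
EOF
}
```

For example, `write_ncbirc "$WORK/blast" "/tmp/$USER/blast" "$HOME/.ncbirc"` would produce a file equivalent to the one shown above. Note that this overwrites any existing file at the target path.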

Prepare a database for use

Research Computing mirrors several of the available databases. They can be found in /opt/apps/ncbi-6.1/blast/fasta. If one of those will serve your needs, simply copy it from that location to your Shared location, and decompress it.

[joeuser@host ~]$ module add apps/mpiblast/1.6.0
[joeuser@host ~]$ mkdir $WORK/blast
[joeuser@host ~]$ cp /opt/apps/ncbi-6.1/blast/fasta/drosoph.nt.gz $WORK/blast
[joeuser@host ~]$ cd $WORK/blast
[joeuser@host blast]$ gunzip drosoph.nt.gz

Next, format the database for use. The example below assumes you want to search against the drosoph.nt database, split into 4 fragments (one per worker process). The -pF option marks the input as a nucleotide database.

[joeuser@host blast]$ mpiformatdb --nfrags=4 -i drosoph.nt -pF --quiet

Submit the Job

Below is a sample job script. It assumes that you want a hard runtime limit of 1 hour and that you have split the database into 4 fragments. It also assumes that your query sequence is in a file called query.in in the current working directory. You’ll need to request two more processors than the number of fragments, to cover the two coordinating processes that are created during execution (here, 4 + 2 = 6).

#!/bin/sh
#
#SBATCH --job-name=mpiblast-test
#SBATCH --time=01:00:00
#SBATCH --ntasks=6

#### SLURM 6 processor mpiBLAST test to run for 1 hour.

module purge
module load apps/mpiblast/1.6.0

mpirun mpiblast -d drosoph.nt -i query.in -p blastn -o results.txt
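
The --ntasks value in the script above follows from the fragment count: one task per database fragment plus two for the coordinating processes. A minimal sketch of that arithmetic, where NFRAGS, NTASKS, and ntasks_for_frags are illustrative names rather than mpiBLAST or SLURM settings:

```shell
#!/bin/sh
# Sketch: derive the SLURM task count from the number of database fragments.
# ntasks_for_frags is a hypothetical helper, not an mpiBLAST or SLURM command.
ntasks_for_frags() {
    # One task per fragment, plus two coordinating processes.
    echo $(( $1 + 2 ))
}

NFRAGS=4                                # must match --nfrags given to mpiformatdb
NTASKS=$(ntasks_for_frags "$NFRAGS")
echo "#SBATCH --ntasks=$NTASKS"         # prints: #SBATCH --ntasks=6
```

If you reformat the database with a different number of fragments, the task count in the job script needs to change with it.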

Save the script (for example, as mpiblast-test.sh) in your job’s directory. Then change to that directory and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./mpiblast-test.sh

Example query.in file for drosoph.nt:

>Search sequence
AGTACGTAAAGAAAATCTTTTTTTGGCGACATCATTGTATTTGGTAGTATACGATTTCCAGCATCCACAACTATTTCCT
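
Before submitting, it can help to sanity-check the query file. The sketch below is an illustrative helper (check_query is a hypothetical name, not an mpiBLAST tool). It assumes a nucleotide query and accepts only A, C, G, T, and N, so a file that uses other IUPAC ambiguity codes would be rejected even though it may be valid.

```shell
#!/bin/sh
# Sketch: basic sanity check for a nucleotide FASTA query file.
# check_query is a hypothetical helper, not part of mpiBLAST.
check_query() {
    file=$1
    # The first line must be a FASTA header starting with '>'.
    if ! head -n 1 "$file" | grep -q '^>'; then
        echo "missing FASTA header in $file" >&2
        return 1
    fi
    # Non-header, non-empty lines may contain only A, C, G, T, or N.
    if grep -v '^>' "$file" | grep -v '^$' | grep -q '[^ACGTNacgtn]'; then
        echo "unexpected characters in sequence in $file" >&2
        return 1
    fi
    return 0
}
```

Calling `check_query query.in` returns a zero exit status when the file passes both checks and a non-zero status, with a message on standard error, otherwise.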

Documentation

Home Page, User Guides, and Manuals

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with mpiBLAST to the IT Help Desk: rc-help@usf.edu