mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.
mpiBLAST requires the following module file to run:
- See Modules for more information.
Running mpiBLAST on CIRCE/SC
The mpiBLAST user guide is essential to understanding the application and making the most of it. Together with this page, it should help you get started with your searches. A link to the guide is listed under Home Page, User Guides, and Manuals below.
- Note on CIRCE: Make sure to run your jobs from your $WORK directory!
- Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.
Creating and Submitting a Job
Create a .ncbirc file in your home directory, formatted as follows (the path to your /work directory will vary):
[mpiBLAST]
Shared=/work/j/joeuser/blast
Local=/tmp/joeuser/blast
‘Shared’ must be a directory that all of the compute nodes can access, and ‘Local’ should be on a faster drive that is local to each node.
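The .ncbirc file can also be written from the command line. The sketch below is a hypothetical helper, not part of mpiBLAST; it assumes $WORK points at your work directory and that /tmp is node-local storage on the compute nodes. It creates only the Shared directory, because the Local path refers to storage on each compute node, so creating it on the login node would not help.

```shell
#!/bin/bash
# Hypothetical helper (not part of mpiBLAST): writes an mpiBLAST
# configuration file and creates the Shared directory.
#   $1 = Shared directory (must be visible to every compute node)
#   $2 = Local directory (fast, node-local storage such as /tmp)
#   $3 = configuration file to write (normally ~/.ncbirc)
write_ncbirc() {
    local shared="$1" local_dir="$2" conf="$3"

    # Shared lives on the cluster file system, so creating it once
    # from the login node is enough. Local is NOT created here.
    mkdir -p "$shared"

    # Overwrites any existing file; back up an old ~/.ncbirc first.
    cat > "$conf" <<EOF
[mpiBLAST]
Shared=$shared
Local=$local_dir
EOF
}
```

Usage, assuming the same layout as the example above: `write_ncbirc "$WORK/blast" "/tmp/$USER/blast" "$HOME/.ncbirc"`.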
Prepare a database for use
Research Computing mirrors several BLAST databases in /opt/apps/ncbi-6.1/blast/fasta. If one of them meets your needs, copy it from there to your Shared location and decompress it:
[joeuser@host ~]$ module add apps/mpiblast/1.6.0
[joeuser@host ~]$ mkdir $WORK/blast
[joeuser@host ~]$ cp /opt/apps/ncbi-6.1/blast/fasta/drosoph.nt.gz $WORK/blast
[joeuser@host ~]$ cd $WORK/blast
[joeuser@host blast]$ gunzip drosoph.nt.gz
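If you stage databases often, the copy-and-decompress steps can be wrapped in a small function. This is a hypothetical helper, not something provided by Research Computing. MIRROR defaults to the mirror directory named above, and the function assumes each mirrored file is named <database>.gz, as drosoph.nt.gz is.

```shell
#!/bin/bash
# Hypothetical helper: copy a mirrored FASTA database into the Shared
# directory and decompress it. MIRROR can be overridden if needed.
MIRROR="${MIRROR:-/opt/apps/ncbi-6.1/blast/fasta}"

stage_db() {
    local name="$1" dest="$2"   # e.g. drosoph.nt  "$WORK/blast"

    if [ ! -f "$MIRROR/$name.gz" ]; then
        echo "stage_db: $MIRROR/$name.gz not found" >&2
        return 1
    fi
    mkdir -p "$dest"

    # Skip the copy if the decompressed database is already there.
    if [ -f "$dest/$name" ]; then
        echo "stage_db: $dest/$name already exists, skipping" >&2
        return 0
    fi

    cp "$MIRROR/$name.gz" "$dest/"
    gunzip "$dest/$name.gz"     # replaces name.gz with name
}
```

For example, `stage_db drosoph.nt "$WORK/blast"` reproduces the session above, and running it a second time leaves the existing copy untouched.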
Next, format the database for use. The example below assumes you want to search the drosoph.nt database with a job that splits it into 4 fragments, one for each of the job's 4 worker processes:
[joeuser@host blast]$ mpiformatdb --nfrags=4 -i drosoph.nt -pF --quiet
Submit the Job
Below is a sample job script. It assumes a hard runtime limit of 1 hour and a database formatted into 4 fragments. It also assumes that your query sequence is in a FASTA file called query.in in the current working directory. Request two more processors than the number of fragments, to account for the two coordinating processes that mpiBLAST creates during execution.
#!/bin/sh
#
#SBATCH --job-name=mpiblast-test
#SBATCH --time=01:00:00
#SBATCH --ntasks=6
#### SLURM 6 processor mpiBLAST test to run for 1 hour.

module purge
module load apps/mpiblast/1.6.0

mpirun mpiblast -d drosoph.nt -i query.in -p blastn -o results.txt
Save the sample job script as mpiblast-test.sh in your job’s directory, then change to that directory and submit it with sbatch:
[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./mpiblast-test.sh
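To keep the task count in step with the --nfrags value given to mpiformatdb, submission can be scripted. The functions below are a hypothetical sketch, not part of mpiBLAST or SLURM. They rely only on standard sbatch and squeue options (--parsable, --ntasks, -j) and on the rule above that a job needs two more tasks than there are fragments.

```shell
#!/bin/bash
# Hypothetical helpers (not part of mpiBLAST or SLURM).

# Tasks needed = database fragments + 2 coordinating processes.
ntasks_for() {
    local nfrags="$1"
    case "$nfrags" in
        ''|*[!0-9]*|0)
            echo "ntasks_for: fragment count must be a positive integer" >&2
            return 1 ;;
    esac
    echo $(( nfrags + 2 ))
}

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the ID.
parse_jobid() {
    echo "${1%%;*}"
}

# Submit SCRIPT for a database split into NFRAGS fragments.
# --ntasks on the command line overrides the #SBATCH --ntasks line.
submit_mpiblast() {
    local script="$1" nfrags="$2" ntasks out jobid
    ntasks=$(ntasks_for "$nfrags") || return 1
    out=$(sbatch --parsable --ntasks="$ntasks" "$script") || return 1
    jobid=$(parse_jobid "$out")
    echo "Submitted job $jobid with $ntasks tasks"
    # Unless the script sets --output, SLURM writes the job's
    # stdout/stderr to slurm-<jobid>.out in the submission directory.
    squeue -j "$jobid"
}
```

From your job directory, `submit_mpiblast ./mpiblast-test.sh 4` requests 6 tasks, matching the sample script. Passing --ntasks on the command line lets the same script be reused for databases formatted with a different number of fragments.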
Example query.in file for drosoph.nt:
>Search sequence
AGTACGTAAAGAAAATCTTTTTTTGGCGACATCATTGTATTTGGTAGTATACGATTTCCAGCATCCACAACTATTTCCT
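Checking query.in before submitting can save a wasted trip through the queue. The sketch below is a hypothetical check, not part of mpiBLAST: it only verifies that the file starts with a FASTA header line and that the sequence lines use IUPAC nucleotide codes, which suits blastn queries like the example above. It also writes that example query to query.in.

```shell
#!/bin/bash
# Hypothetical sanity check (not part of mpiBLAST) for a nucleotide
# FASTA query file. It does not validate the file fully.
check_query() {
    local file="$1"
    [ -s "$file" ] || { echo "$file is missing or empty" >&2; return 1; }

    # The first line must be a FASTA header.
    if ! head -n 1 "$file" | grep -q '^>'; then
        echo "$file: first line must start with '>'" >&2
        return 1
    fi

    # Non-header lines may contain only IUPAC nucleotide codes or '-'.
    if grep -v '^>' "$file" | grep -q '[^ACGTURYKMSWBDHVNacgturykmswbdhvn-]'; then
        echo "$file: unexpected characters in sequence lines" >&2
        return 1
    fi
}

# Write the example query from this page, then check it.
cat > query.in <<'EOF'
>Search sequence
AGTACGTAAAGAAAATCTTTTTTTGGCGACATCATTGTATTTGGTAGTATACGATTTCCAGCATCCACAACTATTTCCT
EOF
check_query query.in && echo "query.in looks OK"
```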
Home Page, User Guides, and Manuals
- mpiBLAST Home Page
- mpiBLAST User's Guide
More Job Information
See the following for more detailed job submission information:
Report problems with mpiBLAST to the IT Help Desk: firstname.lastname@example.org