Bcbio-nextgen

Revision as of 19:12, 10 October 2017 by Tgreen

Description

From the bcbio web site: bcbio is a python toolkit providing best-practice pipelines for fully automated high throughput sequencing analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.

Versions

  • 1.0.5

Authorized Users

  • CIRCE account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • RRA cluster
  • SC cluster

Modules

  • apps/bcbio/nextgen-1

Running Bcbio-nextgen on CIRCE/SC

The Bcbio-nextgen user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your analyses. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

How to Submit Jobs

Below is a batch script for running 'bcbio' as a multi-processor job. Copy this script into your work directory (the folder with your input files and other job files) so that you can submit it to the queue.

Batch Job Submit Script

  • Run analysis, distributed across 8 local cores
  • NOTE: The value of --ntasks in your SLURM submit script should be 2 more than the value passed to -n on the bcbio_nextgen.py line (here, 10 and 8).
#!/bin/bash
#
#SBATCH --job-name=bcbio-test
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --output=output.%j.bcbio-test
#SBATCH --time=24:00:00

### Single-node bcbio-nextgen test: 10 tasks, 24-hour time limit.
### 10 CPU cores are requested in this example because "-n 8" is
### specified on the bcbio_nextgen.py line below.

module purge
module load apps/bcbio/nextgen-1

bcbio_nextgen.py ../config/file-name.yaml -n 8
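
To keep the two numbers in step, the core count can be derived from the task count instead of being typed twice. The following is a minimal sketch, not part of the provided script: the helper name bcbio_cores is hypothetical, and it assumes SLURM sets the SLURM_NTASKS variable inside the job (it does so when --ntasks is given). Outside a job, the sketch falls back to 10 purely for illustration, and it only prints the resulting command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: given the SLURM task count, print the core count
# to pass to "-n" (task count minus 2, per the note above).
bcbio_cores() {
    local ntasks="$1"
    if [ "$ntasks" -le 2 ]; then
        echo "ntasks must be greater than 2" >&2
        return 1
    fi
    echo $(( ntasks - 2 ))
}

# SLURM_NTASKS is set by SLURM inside a job; 10 is an illustrative fallback.
cores=$(bcbio_cores "${SLURM_NTASKS:-10}")

# Print the command line; in a real submit script you would run it instead.
echo "bcbio_nextgen.py ../config/file-name.yaml -n ${cores}"
```

In a submit script, the final echo would be replaced by the bcbio_nextgen.py call itself, so changing --ntasks is the only edit needed when resizing the job.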

 
Next, you can change to your job’s directory, and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./bcbio-test.sh
  • You can view the status of your job with the “squeue -u <username>” command.
  • In addition, bcbio-nextgen jobs can run in parallel across multiple nodes. To do this, your job configuration must use the IPython parallel feature with Python 2.7.11. Please review the documentation on the bcbio-nextgen website for more details about this feature.
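
As a rough illustration of the multi-node case, the sketch below builds a bcbio command line that requests IPython parallel execution with SLURM as the scheduler. The flags (-t ipython, -s slurm, -q) follow the bcbio-nextgen parallel execution documentation as we understand it; confirm them against the documentation for the installed version before use. The helper name bcbio_ipython_cmd, the core count of 24, and the queue name my_queue are placeholders, not site settings.

```shell
#!/bin/bash
# Hypothetical helper: print a bcbio command line that uses IPython
# parallel with SLURM. Arguments: config file, total cores, queue name.
bcbio_ipython_cmd() {
    local config="$1" cores="$2" queue="$3"
    echo "bcbio_nextgen.py ${config} -t ipython -n ${cores} -s slurm -q ${queue}"
}

# "my_queue" is a placeholder; substitute a partition you have access to.
bcbio_ipython_cmd ../config/file-name.yaml 24 my_queue
```

The sketch only prints the command so that it can be checked first. In this mode bcbio submits its own worker jobs to the scheduler, so the resource requests in your submit script may need to differ from the single-node example above.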

Documentation

Home Page, User Guides, and Manuals

Benchmarks, Known Tests, Examples, Tutorials, and Other Resources

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with Bcbio-nextgen to the IT Help Desk: rc-help@usf.edu