Difference between revisions of "ABySS"

Revision as of 18:54, 29 June 2016

Description

From the ABySS web site: ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

Version

1.2.7

Authorized Users

CIRCE account holders
RRA account holders
SC account holders

Platforms

CIRCE cluster
RRA cluster
SC cluster

Modules

ABySS requires the following module file to run:

apps/abyss/1.2.7

See Modules for more information.

Running ABySS Jobs on CIRCE

The ABySS user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your assemblies. Please refer to the Documentation section for a link to the guide.

Note on CIRCE: Make sure to run your jobs from your $WORK directory!
Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

To assemble transcriptome data, see Trans-ABySS.

How to Submit Jobs

Provided are batch scripts for running ABySS as a single-processor or multi-processor job. These scripts can be copied into your work directory (the folder with your input files and database files) so that you can submit batch processes to the queue.

Serial Submit Script

Single-end assembly

Assemble short reads in a file named reads.fa into contigs in a file named contigs.fa with the following script:

#!/bin/bash
#
#SBATCH --comment=abyss-test
#SBATCH --ntasks=1
#SBATCH --job-name=abyss-test
#SBATCH --output=output.%j.abyss-test
#SBATCH --time=01:00:00

#### Slurm 1 processor ABySS test to run for 1 hour.

module purge
module load apps/abyss/1.2.7

ABYSS -k25 reads.fa -o contigs.fa

where ‘-k’ is an appropriate k-mer length. The only method to find the optimal value of ‘k’ is to run multiple trials and inspect the results. The maximum value for ‘k’ is 64.
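
Because the best ‘k’ can only be found by trial, one way to organize the trials is to generate one ABYSS command per candidate value. The following is a minimal sketch, not a site-provided tool: the helper name kmer_sweep_commands, the candidate values 21, 25 and 31, and the output names contigs-k<k>.fa are assumptions made for illustration. Only the ‘-k’ and ‘-o’ options shown in the script above are used.

```shell
#!/bin/bash
# Hypothetical helper: print one ABYSS command per candidate k-mer length.
# Usage: kmer_sweep_commands <reads file> <k> [<k> ...]
kmer_sweep_commands() {
    local reads="$1"; shift
    local k
    for k in "$@"; do
        # This page states that the maximum value for k is 64.
        if [ "$k" -gt 64 ]; then
            echo "skipping k=$k (maximum is 64)" >&2
            continue
        fi
        # Write each trial to its own file so results can be compared later.
        echo "ABYSS -k$k $reads -o contigs-k$k.fa"
    done
}

# Print the commands for review (candidate values are illustrative only).
kmer_sweep_commands reads.fa 21 25 31
```

The printed lines can be pasted into the batch script in place of the single ABYSS call, after which the resulting contig files can be inspected side by side.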
Next, you can change to your job’s directory, and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./abyss-test.sh

You can view the status of your job with the “squeue -u <username>” command.
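
When several trial jobs are in the queue, it can help to track a single job by its ID rather than listing everything for a user. The sketch below assumes that sbatch prints its usual confirmation line, "Submitted batch job <id>". The helper name job_id_from_sbatch is hypothetical, and the sbatch and squeue calls appear only in a comment because they need a live SLURM environment.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's confirmation
# line, which is assumed to read "Submitted batch job <id>".
job_id_from_sbatch() {
    local line="$1"
    # Keep only the last space-separated field.
    local id="${line##* }"
    # Reject anything that is empty or not purely numeric.
    case "$id" in
        ''|*[!0-9]*) return 1 ;;
        *) echo "$id" ;;
    esac
}

# On a login node (not run here):
#   id=$(job_id_from_sbatch "$(sbatch ./abyss-test.sh)") && squeue -j "$id"
job_id_from_sbatch "Submitted batch job 12345"
```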

Distributed Parallel

Paired-end assembly

To assemble paired short reads in two files named reads1.fa and reads2.fa into contigs in a file named ecoli-contigs.fa, use the script:

#!/bin/bash
#
#SBATCH --comment=abyss-test
#SBATCH --ntasks=8
#SBATCH --job-name=abyss-test
#SBATCH --output=output.%j.abyss-test
#SBATCH --time=01:00:00

#### Slurm 8 processor ABySS test to run for 1 hour.

module purge
module load apps/abyss/1.2.7

abyss-pe np=8 k=25 n=10 in='reads1.fa reads2.fa' name=ecoli

where ‘np’ is the number of MPI processes and should match the --ntasks value requested from SLURM; without ‘np’, abyss-pe runs the single-processor assembler. ‘k’ is the k-mer length as before. ‘n’ is the minimum number of pairs needed to consider joining two contigs. The optimal value for ‘n’ must be found by trial. ‘in’ specifies the input files to read, which may be in FASTA, FASTQ, qseq, export, SAM or BAM format, may be compressed with gz, bz2 or xz, and may be tarred. The assembled contigs will be stored in ${name}-contigs.fa.

The suffix of the read identifier for a pair of reads must be one of ‘1’ and ‘2’, or ‘A’ and ‘B’, or ‘F’ and ‘R’, or ‘F3’ and ‘R3’, or ‘forward’ and ‘reverse’. The reads may be interleaved in the same file or found in different files. If the mates are in different files, it’s highly recommended to place each pair of files adjacent on the command line and to use an even number of threads. Even if you are running on a single-processor machine, using two threads will help performance. Do not group together all the files containing the forward reads followed by all the files containing the reverse reads.
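
The file-ordering advice above can be sketched as a small reordering step. This is an illustration under stated assumptions: the helper name interleave_pairs and the file names libA_1.fa, libB_1.fa, libA_2.fa and libB_2.fa are hypothetical, and the helper expects all forward files first, then the matching reverse files in the same order, with no spaces in the file names.

```shell
#!/bin/bash
# Hypothetical helper: given forward files followed by the same number of
# reverse files, print them with each pair adjacent (f1 r1 f2 r2 ...).
interleave_pairs() {
    local n=$(( $# / 2 ))
    if [ $(( $# % 2 )) -ne 0 ] || [ "$n" -eq 0 ]; then
        echo "expected an even, non-zero number of files" >&2
        return 1
    fi
    local files=("$@")
    local out=()
    local i
    for (( i = 0; i < n; i++ )); do
        # Pair the i-th forward file with the i-th reverse file.
        out+=("${files[i]}" "${files[i + n]}")
    done
    echo "${out[*]}"
}

# Two illustrative libraries, listed forward-first as a glob might return them.
in_files=$(interleave_pairs libA_1.fa libB_1.fa libA_2.fa libB_2.fa)
echo "abyss-pe k=25 n=10 in='$in_files' name=ecoli"
```

The final line only prints the resulting command so the ordering can be checked before it is placed in a batch script.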

Reads without mates should be placed in a file specified by the ‘se’ (single-end) parameter. Reads without mates in the paired-end files will slow down the paired-end assembler considerably during the ParseAligns stage.
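
As a sketch of how the ‘se’ parameter fits alongside ‘in’, the following helper builds an abyss-pe command line and adds se= only when a file of unpaired reads is supplied. The helper name abyss_pe_command and the file name unpaired.fa are assumptions for illustration. How the unpaired reads are separated from the paired files is not covered here.

```shell
#!/bin/bash
# Hypothetical helper: assemble an abyss-pe command line, adding se= only
# when a file of unpaired reads is given as the fifth argument.
# Usage: abyss_pe_command <name> <k> <n> '<paired files>' [<unpaired file>]
abyss_pe_command() {
    local name="$1" k="$2" n="$3" paired="$4" unpaired="${5:-}"
    local cmd="abyss-pe k=$k n=$n in='$paired'"
    if [ -n "$unpaired" ]; then
        cmd="$cmd se='$unpaired'"
    fi
    echo "$cmd name=$name"
}

# Print the command for the example data set plus an illustrative
# file of reads without mates.
abyss_pe_command ecoli 25 10 'reads1.fa reads2.fa' unpaired.fa
```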

Documentation

Home Page, User Guides, and Manuals

ABySS Home Page
- http://www.bcgsc.ca/platform/bioinfo/software/abyss

Benchmarks, Known Tests, Examples, Tutorials, and Other Resources

ABySS Options
- http://seqanswers.com/wiki/ABySS#Single-end_assembly

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with ABySS to the IT Help Desk: rc-help@usf.edu

@@ Line 1: / Line 1: @@
 == Description ==
-''From the ABySS web site'': ABySS (Assembly By Short Sequences) is a ''de novo'', parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
+''From the ABySS web site'': '''ABySS (Assembly By Short Sequences)''' is a ''de novo'', parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
-* [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABySS Home Page]
+{{AppStandardHeader|1.2.7|abyss}}
-== Version ==
+== Running ABySS Jobs on CIRCE ==
-*1.2.7
-== Authorized Users ==
-*<code>CIRCE</code> account holders
-== Platforms ==
+{{PleaseReadUserGuide}}
+{{SLURMAppParams}}
-*<code>CIRCE</code> cluster
-== Running ABySS Jobs on CIRCE ==
 To assemble transcriptome data, see [[Trans-ABySS]].
-=== [[Modules]] ===
-Before running a job, you must first set up your environment properly. Here are the required module files:
-*apps/abyss/1.2.7
-To run ABySS on the cluster, ensure that you use <code>module add</code> prior to using any executables. See [[Modules]] for more information.
 === How to Submit Jobs ===
-Provided are batch scripts for running ABySS as a single processor or multi-processor job. These scripts can be copied into your work directory (the folder with your input files and database files) so that you can submit batch processes to the queue. For help on submitting jobs to the queue, see our [[SLURM Users|SLURM User’s Guide]]. These scripts are provided as examples only. Your SLURM and ABySS options will vary.
+Provided are batch scripts for running ABySS as a single processor or multi-processor job. These scripts can be copied into your work directory (the folder with your input files and database files) so that you can submit batch processes to the queue. .
-Please refer to the [[ABySS#Additional Documentation | Additional Documentation]] section for a link to ABySS option information.
 === Serial Submit Script ===
@@ Line 108: / Line 90: @@
 *You can view the status of your job with the “squeue -u <username>” command
-== Additional Documentation ==
+{{Documentation}}
+*ABySS Home Page
+**http://www.bcgsc.ca/platform/bioinfo/software/abyss
+{{BKETOR}}
 *ABySS Options
 ** http://seqanswers.com/wiki/ABySS#Single-end_assembly