MaSuRCA

Description

From the University of Maryland Assembly Group website: MaSuRCA is the Maryland Super-Read Celera Assembler and can be used on assembly projects of all sizes, from bacteria genomes to mammalian genomes to large plant genomes. MaSuRCA has been used to assemble de novo a variety of genomes, sometimes improving on published genomes using added data, sometimes creating the first publicly available draft genome for the species.

Version

  • 3.2.2

Authorized Users

  • CIRCE account holders
  • RRA account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • RRA cluster
  • SC cluster

Modules

MaSuRCA requires the following module file to run:

  • apps/masurca/3.2.2

Running MaSuRCA on CIRCE/SC

The MaSuRCA user guide is essential to understanding the application and making the most of it. Together, the guide and this page should help you get started with your assemblies. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Submitting Jobs

The assembly is driven by a configuration file that specifies the location of the read files and some parameters. From this configuration, MaSuRCA generates a shell script (assemble.sh) that runs the actual assembler.

For example, you can generate a MaSuRCA input configuration file named configuration.txt using the command below (after loading the appropriate module file):

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ masurca -g configuration.txt

Next, edit configuration.txt with a text editor to set the paths to your read files and the assembly parameters, then start the assembly with the example script below. Copy the script (for testing, name it “masurca-test.sh”) into your job directory (the folder with your input and configuration files) and modify it as needed to submit batch jobs to the queue.
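
Because the configuration file points at your read files, it can help to confirm that every listed path exists before you submit the job. The following is a minimal sketch, not part of MaSuRCA: check_masurca_reads is a hypothetical helper that assumes read files are given as absolute paths without spaces, and it simply tests every token beginning with "/" on non-comment lines.

```shell
# Hypothetical helper (not shipped with MaSuRCA): check that every absolute
# path mentioned in a configuration file exists and is readable.
# Assumes paths contain no spaces. Returns 0 if all are readable, 1 otherwise.
check_masurca_reads() {
    config_file=$1
    missing=0
    # Drop comment lines, split on spaces, tabs and "=", keep tokens starting with "/".
    for path in $(grep -v '^[[:space:]]*#' "$config_file" | tr -s ' \t=' '\n' | grep '^/'); do
        if [ ! -r "$path" ]; then
            echo "missing or unreadable: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running "check_masurca_reads configuration.txt" on a freshly generated file should report any placeholder paths that have not been replaced yet.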

Please note: MaSuRCA can run on multiple cores but cannot be distributed across nodes, so the node count must always be 1.

#!/bin/bash
#
#SBATCH --comment=masurca-test
#SBATCH --nodes=1  ###Must be 1
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=masurca-test
#SBATCH --output=output.%j.masurca-test
#SBATCH --time=01:00:00

#### SLURM 8 processor MaSuRCA test to run for 1 hour on a single node.

module purge
module load apps/masurca/3.2.2

masurca configuration.txt
./assemble.sh
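
The number of cores requested with --ntasks-per-node is only useful if the assembler is told to use that many threads. The sketch below assumes that your generated configuration.txt contains a NUM_THREADS line in its PARAMETERS section, as the example configurations distributed with MaSuRCA do; set_masurca_threads is a hypothetical helper, and it relies on GNU sed's -i option.

```shell
# Hypothetical helper (not shipped with MaSuRCA): set NUM_THREADS in a
# configuration file so it matches the cores requested from SLURM.
# Usage: set_masurca_threads <config-file> <thread-count>
set_masurca_threads() {
    config_file=$1
    threads=$2
    # Rewrite the existing NUM_THREADS line in place (requires GNU sed).
    sed -i "s/^NUM_THREADS[[:space:]]*=.*/NUM_THREADS = ${threads}/" "$config_file"
}
```

In a job script, one option is to call set_masurca_threads configuration.txt "${SLURM_NTASKS_PER_NODE:-8}" before the "masurca configuration.txt" line, so that the generated assemble.sh picks up the same value SLURM allocated (falling back to 8 if the variable is unset).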

Next, be sure that you are in your job's directory, and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./masurca-test.sh
  • You can view the status of your job with the “squeue -u <username>” command.
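
If you prefer to follow one specific job rather than list all of your jobs, a small wrapper around squeue can be used. This is a sketch under assumptions: job_in_queue is a hypothetical helper, and it treats any output from "squeue -h -j <jobid>" as meaning the job is still pending or running.

```shell
# Hypothetical helper: succeed while the given job ID is still listed by squeue.
# -h suppresses the header line; -j restricts the listing to one job ID.
job_in_queue() {
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}
```

One way to use it: capture the job ID at submission with jobid=$(sbatch --parsable ./masurca-test.sh), poll with "while job_in_queue "$jobid"; do sleep 60; done", and then inspect the file output.<jobid>.masurca-test named by the --output line of the example script. On multi-cluster setups --parsable may append a cluster name after a semicolon, so the captured value may need trimming.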

Documentation

Home Page, User Guides, and Manuals

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with MaSuRCA to the IT Help Desk: rc-help@usf.edu