MaSuRCA
Description
From the University of Maryland Assembly Group website: MaSuRCA is the Maryland Super-Read Celera Assembler and can be used on assembly projects of all sizes, from bacteria genomes to mammalian genomes to large plant genomes. MaSuRCA has been used to assemble de novo a variety of genomes, sometimes improving on published genomes using added data, sometimes creating the first publicly available draft genome for the species.
Version
- 3.2.2
Authorized Users
- CIRCE account holders
- RRA account holders
- SC account holders
Platforms
- CIRCE cluster
- RRA cluster
- SC cluster
Modules
MaSuRCA requires the following module file to run:
apps/masurca/3.2.2
- See Modules for more information.
Running MaSuRCA on CIRCE/SC
The MaSuRCA user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your assemblies. Please refer to the Documentation section for a link to the guide.
- Note on CIRCE: Make sure to run your jobs from your $WORK directory!
- Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.
Submitting Jobs
The assembly is driven by a configuration file that specifies the location of the read files and some parameters. Running the masurca command on this configuration file generates a shell script, assemble.sh, which runs the actual assembler.
For example, you can generate a MaSuRCA input configuration file named configuration.txt using the command below (after loading the appropriate module file):
[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ masurca -g configuration.txt
Then edit the configuration file 'configuration.txt' with a text editor and start the assembly using the example script below. The script (for testing, name it “masurca-test.sh”) can be copied into your job directory (the folder with your input and configuration files) and modified so that you can submit batch processes to the queue.
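As a rough sketch of what the edited file might look like, the example below writes a small demonstration configuration and then sets NUM_THREADS to 8, the number of cores requested per node in the example job script. The file name configuration-demo.txt, the read paths, and the library statistics (mean insert size 180, standard deviation 20) are placeholders chosen for illustration, not recommendations. The template produced by masurca -g remains the authoritative list of options for the installed version.

```shell
#!/bin/bash
# Illustrative only: file name, read paths, and library values are placeholders.
cfg=configuration-demo.txt

# A minimal configuration with one paired-end library.
# DATA line format: PE= <two-letter prefix> <mean> <stdev> <forward reads> <reverse reads>
cat > "$cfg" <<'EOF'
DATA
PE= pe 180 20 /path/to/reads_1.fastq /path/to/reads_2.fastq
END

PARAMETERS
GRAPH_KMER_SIZE = auto
NUM_THREADS = 16
JF_SIZE = 200000000
END
EOF

# Match NUM_THREADS to the cores requested from SLURM (--ntasks-per-node=8).
cores=8
sed "s/^NUM_THREADS *=.*/NUM_THREADS = ${cores}/" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"

# Show the resulting setting.
grep '^NUM_THREADS' "$cfg"
```

Requesting more threads in the configuration than cores in the job script oversubscribes the node, while requesting fewer leaves allocated cores idle, so it is worth keeping the two values in step.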
Please note: MaSuRCA can run on multiple cores, but is not distributable across nodes. Therefore the node count must always be 1.
#!/bin/bash
#
#SBATCH --comment=masurca-test
### The node count must be 1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=masurca-test
#SBATCH --output=output.%j.masurca-test
#SBATCH --time=01:00:00

#### SLURM 8 processor MaSuRCA test to run for 1 hour on a single node.

module purge
module load apps/masurca/3.2.2

masurca configuration.txt
./assemble.sh
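Because the job runs two steps in sequence (masurca generates assemble.sh, which then performs the assembly), it can help to stop early when an expected file is absent rather than let the job fail later with a less obvious message. The helper below is a minimal sketch of that idea; require_file is a hypothetical function name and is not part of MaSuRCA or SLURM.

```shell
#!/bin/bash
# Hypothetical helper (not part of MaSuRCA): succeed only if the
# named file exists and is not empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "ERROR: $1 is missing or empty" >&2
        return 1
    fi
}

# Intended use inside the job script, commented out here because it
# needs the MaSuRCA module to be loaded:
#
#   require_file configuration.txt || exit 1
#   masurca configuration.txt
#   require_file assemble.sh || exit 1
#   ./assemble.sh
```

Any error message from the helper goes to standard error, which SLURM writes to the file named by the --output option unless a separate error file is configured.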
Next, be sure that you are in your job's directory, and run the sbatch command to submit the job:
[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./masurca-test.sh
- You can view the status of your job with the “squeue -u <username>” command.
Documentation
Home Page, User Guides, and Manuals
- MaSuRCA Home Page
- MaSuRCA Manual
- ftp://ftp.genome.umd.edu/pub/MaSuRCA/MaSuRCA_QuickStartGuide.pdf
- /apps/masurca/3.2.2/docs/MaSuRCA_QuickStartGuide.pdf
More Job Information
For more detailed job submission information, see our SLURM User’s Guide.
Reporting Bugs
Report bugs with MaSuRCA to the IT Help Desk: rc-help@usf.edu