ESPRIT

Description

ESPRIT is a computational algorithm that can be used to estimate microbial diversity based on 16S rRNA shotgun sequences. It consists of four modules: (1) removes low-quality reads using various criteria, (2) computes pairwise distances of reads, (3) groups reads into OTUs at different dissimilarity levels, and (4) performs statistical inference to estimate species richness.

Version

  • 20090715

Authorized Users

  • CIRCE account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • SC cluster

Modules

ESPRIT requires the following module file to run:

  • apps/esprit/20090715

Running ESPRIT on CIRCE/SC

The ESPRIT user guide is essential to understanding the application and making the most of it. The guide and this page should help you to get started with your simulations. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Creating and Submitting a Job

Job Considerations

In order to submit a ESPRIT job, you will need to have two things:

  • an input FASTA file that is pre-trimmed
  • the number of pieces the FASTA file should be split into for parallelization

The number of pieces you choose determines how many tasks will be submitted to the cluster. The kmer-distance calculation will require ((num_pieces SELECT 2) + num_pieces) tasks, and the Needleman-Wunch calculation will require (num_pieces * num_pieces) tasks.

For a FASTA file split into 10 pieces, for example, you will have 55 and 100 tasks, respectively.

It is recommended that you do not split the file into more than 10 pieces, since the grid will have to dispatch a lot of tasks and it will take longer for them to finish since not all of them will be dispatched at the same time.

It is also recommended that you run your job from /work to ensure integrity of results.

Submit the Job

  1. Ensure that the ESPRIT module is loaded in the persistent environment:
    [user@login1 ~]$ module load apps/esprit/20090715
  2. Create a directory for the ESPRIT run:
    [user@login1 ~]$ mkdir $WORK/esprit-job1
  3. Copy your FASTA file into the directory:
    [user@login1 ~]$ cp data/test.fas $WORK/esprit-job1
  4. Finally, change into the job directory and run esprit_cc with the proper options. Press ENTER again to confirm the parameters:
    [user@login1 esprit-job1]$ esprit_cc test.fas 5
    Specified input to be handled in 5 parts.
    The number of processes for the KMER distance and Needleman-Wunsh calculations
    are 15 and 25, respectively.
    
    Press Enter to continue. Otherwise, press Ctrl+c to exit.
    
    17666 Seqs Match Primer
    17666 Seqs Valid Len
    
    8234 Seqs After Process
          0.07 secs in Purging Strings. 
    Calculating KMER distances..
    Your job-array 77519.1-15:1 ("kmerdist.26893") has been submitted
    Waiting on job kmerdist.26893 to complete. Please wait..
    Counting Total Records....
    497836 Records Found, Splitting...
     Writing kmer.dist_24
    Calculating Needleman-Wunch Distances:
    Your job-array 77521.1-25:1 ("needledist.26893") has been submitted
    Waiting on job needledist.26893 to complete. Please wait..
    Your job 77523 ("hcluster.26893") has been submitted
    Waiting on job hcluster.26893 to complete. Please wait..
    Dist 0.01 Seqs 17666 Clusters 8028
    Dist 0.02 Seqs 17666 Clusters 6363
    Dist 0.03 Seqs 17666 Clusters 6108
    Dist 0.04 Seqs 17666 Clusters 5035
    Dist 0.05 Seqs 17666 Clusters 4335
    Dist 0.06 Seqs 17666 Clusters 3961
    Dist 0.07 Seqs 17666 Clusters 3479
    Dist 0.08 Seqs 17666 Clusters 3107
    Dist 0.09 Seqs 17666 Clusters 2816
    Dist 0.10 Seqs 17666 Clusters 2536
    Dist 0.11 Seqs 17666 Clusters 2312
    Dist 0.12 Seqs 17666 Clusters 2147
    Dist 0.13 Seqs 17666 Clusters 2053
    Dist 0.14 Seqs 17666 Clusters 1994
    Dist 0.15 Seqs 17666 Clusters 1942
    Dist 0.16 Seqs 17666 Clusters 1917
    Dist 0.17 Seqs 17666 Clusters 1896
    Dist 0.18 Seqs 17666 Clusters 1892
    Dist 0.19 Seqs 17666 Clusters 1891
    Dist 0.20 Seqs 17666 Clusters 1890
          4.85 secs in Statistical Analysis.

Documentation

Home Page, User Guides, and Manuals

  • ESPRIT User's Guide
    • /opt/apps/esprit/20090715/esprit_user_guide.pdf

Reporting Bugs

Report bugs with ESPRIT to the IT Help Desk: rc-help@usf.edu