Mothur-Sapelo2

From Research Computing Center Wiki

Category: Bioinformatics
Program On: Sapelo2
Version: 1.45.0
Author / Distributor: Please see Mothur

Description

"Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. The functionality of different software including dotur, sons, treeclimber, s-libshuff, unifrac, and others have been incorporated in Mothur. In addition to improving the flexibility of these software, a number of other features including calculators and visualization tools are available with Mothur." More details, documentation and tutorials are available on the Mothur Wiki.

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

The latest version of this application is at /apps/singularity-images/mothur-1.45.0.sif

To use this version, please run

singularity exec /apps/singularity-images/mothur-1.45.0.sif [command] [options]
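For repeated use, you may find it convenient to wrap this command in a shell function. The sketch below is an assumption, not an RCC-provided tool: run_mothur is a hypothetical name, and the file names in the commented examples are placeholders. Mothur runs a command file when given its name, and it also accepts commands directly on the command line when they are prefixed with #.

```shell
#!/bin/bash
# Image path as given on this page.
MOTHUR_SIF=/apps/singularity-images/mothur-1.45.0.sif

# Hypothetical convenience wrapper (not provided by RCC or Mothur):
# forwards all of its arguments to the mothur executable inside the image.
run_mothur() {
    singularity exec "$MOTHUR_SIF" mothur "$@"
}

# Examples (uncomment to run on Sapelo2; file names are placeholders):
# run_mothur Mother_Commandfile.txt                  # batch mode: run a command file
# run_mothur "#summary.seqs(fasta=stability.fasta)"  # command-line mode
```

Remember that anything heavier than a quick check should run inside a batch job rather than on a login node.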

To run Mothur in "batch mode", collect your Mothur commands into a command file and use that file in a batch job. More information about creating Mothur batch files can be found at https://mothur.org/wiki/batch_mode/

Some Mothur commands can use multiple cores. To use them, set the processors parameter for each such command in the Mothur command file. You must also set the --cpus-per-task option in your submission script to the same number of processors that you request in your Mothur command file.

The following is an example of a Mothur command file requesting 8 CPUs. This file is called Mother_Commandfile.txt; it assembles paired-end reads and prepares them for analysis.

make.file(inputdir=./MiSeq_SOP, type=gz, prefix=stability)
make.contigs(file=current, processors=8)
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)
unique.seqs()
count.seqs(name=current, group=current)
align.seqs(fasta=current, reference=silva.v4.fasta)
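Because the processors value has to match --cpus-per-task, one option is to write the command file from inside the job script, using the SLURM_CPUS_PER_TASK variable that Slurm sets when --cpus-per-task is requested. This is a sketch under that assumption; it reproduces the commands above and overwrites Mother_Commandfile.txt in the current directory.

```shell
#!/bin/bash
# Sketch: generate the Mothur command file so processors always equals
# the CPUs Slurm allocated. SLURM_CPUS_PER_TASK is set by Slurm when
# --cpus-per-task is given; fall back to 1 outside a job (e.g. testing).
NPROC="${SLURM_CPUS_PER_TASK:-1}"
CMDFILE="Mother_Commandfile.txt"

# Unquoted EOF: the shell expands ${NPROC}; the Mothur commands contain
# no other $ characters, so nothing else is substituted.
cat > "$CMDFILE" <<EOF
make.file(inputdir=./MiSeq_SOP, type=gz, prefix=stability)
make.contigs(file=current, processors=${NPROC})
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)
unique.seqs()
count.seqs(name=current, group=current)
align.seqs(fasta=current, reference=silva.v4.fasta)
EOF
```

Placed in sub.sh before the singularity exec line, this keeps a single source of truth for the core count: the #SBATCH --cpus-per-task directive.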

The following is an example job submission script (sub.sh) that runs the above Mothur command file:

#!/bin/bash
#SBATCH --job-name=Mothurtest         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=8             # Use 8 cpus
#SBATCH --mem=10gb                    # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=%x_%j.out            # Standard output log
#SBATCH --error=%x_%j.err             # Standard error log

#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd "$SLURM_SUBMIT_DIR"
singularity exec /apps/singularity-images/mothur-1.45.0.sif mothur Mother_Commandfile.txt
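A mismatch between processors and --cpus-per-task is easy to introduce when editing one file but not the other. The function below is a hypothetical pre-flight check (not part of Mothur or Slurm) that you could paste into sub.sh before the singularity exec line. It assumes processors is written without spaces around the equals sign, as in the example command file.

```shell
#!/bin/bash
# Hypothetical helper: compare every processors=N in a Mothur command file
# with the CPUs Slurm allocated and return non-zero on any mismatch.
check_processors() {
    local cmdfile="$1"
    local want="${SLURM_CPUS_PER_TASK:-1}"   # 1 if not inside a job
    local n bad=0
    # grep -o prints each "processors=N" match on its own line;
    # cut keeps only the number after the equals sign.
    while read -r n; do
        if [ "$n" != "$want" ]; then
            echo "processors=$n in $cmdfile, but $want CPU(s) allocated" >&2
            bad=1
        fi
    done < <(grep -o 'processors=[0-9][0-9]*' "$cmdfile" | cut -d= -f2)
    return "$bad"
}

# In sub.sh, before calling mothur:
# check_processors Mother_Commandfile.txt || exit 1
```

Failing early costs only a few seconds of queue time, whereas a run with too many Mothur processors for its allocation may be slow or exceed its memory request.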

Submit the job with the following command:

sbatch ./sub.sh 
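With --output=%x_%j.out and --error=%x_%j.err, Slurm names the log files after the job name (%x) and job ID (%j). The sketch below is a hypothetical helper that submits the script with sbatch --parsable, which prints only the job ID (followed by ";cluster" on multi-cluster systems), and reports where the logs will appear. It assumes the job name matches the --job-name in sub.sh.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print its log file names.
submit_mothur_job() {
    local script="${1:-./sub.sh}"
    local name="${2:-Mothurtest}"   # must match --job-name in the script
    local out jobid
    out=$(sbatch --parsable "$script") || return 1
    jobid="${out%%;*}"              # drop ";cluster" if present
    echo "Submitted job $jobid"
    echo "stdout: ${name}_${jobid}.out"
    echo "stderr: ${name}_${jobid}.err"
}

# Usage: submit_mothur_job ./sub.sh Mothurtest
# Then check the job with: squeue -j <jobid>
```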

Documentation

[cft07037@b1-24 singularity-images]$ singularity exec /apps/singularity-images/mothur-1.45.0.sif mothur --help
INFO:    underlay of /etc/localtime required more than 50 (72) bind mounts
Linux version

Using Boost
mothur v.1.45.0
Last updated: 3/22/21
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://forum.mothur.org

Type 'quit()' to exit program

[NOTE]: Setting random seed to 19760620.

Script Mode


mothur > help()

NOTE: sens.spec assumes that only unique sequences were used to generate the distance matrix.


Clustering commmands include: cluster, cluster.classic, cluster.fit, cluster.split, mgcluster, phylotype

General commmands include: get.current, get.dists, make.biom, make.file, make.group, make.lefse, merge.count, merge.files, merge.groups, remove.dists, rename.file, set.current, set.dir, set.logfile, set.seed, system

Hypothesis Testing commmands include: amova, anosim, clearcut, cooccurrence, corr.axes, deunique.tree, homova, indicator, kruskal.wallis, libshuff, mantel, nmds, otu.association, parsimony, pca, pcoa, phylo.diversity, unifrac.unweighted, unifrac.weighted

OTU-Based Approaches commmands include: biom.info, classify.svm, collect.shared, collect.single, create.database, dist.shared, estimator.single, filter.shared, get.communitytype, get.coremicrobiome, get.group, get.groups, get.label, get.otus, get.otulist, get.oturep, get.otus, get.rabund, get.relabund, get.sabund, get.sharedseqs, heatmap.bin, heatmap.sim, lefse, list.otus, list.otus, make.clr, make.shared, merge.otus, metastats, normalize.shared, otu.hierarchy, primer.design, rarefaction.shared, rarefaction.single, remove.groups, remove.otus, remove.otus, remove.rare, sens.spec, sparcc, split.abund, summary.shared, summary.single, tree.shared, venn

Phylotype Analysis commmands include: classify.otu, classify.seqs, classify.tree, get.lineage, merge.taxsummary, remove.lineage, summary.tax

Sequence Processing commmands include: align.check, align.seqs, bin.seqs, chimera.bellerophon, chimera.ccode, chimera.check, chimera.perseus, chimera.pintail, chimera.slayer, chimera.uchime, chimera.vsearch, chop.seqs, cluster.fragments, consensus.seqs, count.groups, count.seqs, degap.seqs, deunique.seqs, dist.seqs, fastq.info, filter.seqs, get.mimarkspackage, get.seqs, list.seqs, make.contigs, make.fastq, make.lookup, make.sra, count.seqs, merge.sfffiles, pairwise.seqs, pcr.seqs, pre.cluster, remove.seqs, rename.seqs, reverse.seqs, screen.seqs, seq.error, sff.multiple, sff.info, shhh.flows, shhh.seqs, sort.seqs, split.groups, sra.info, sub.sample, summary.qual, summary.seqs, trim.flows, trim.seqs, unique.seqs

For more information about a specific command type 'commandName(help)' i.e. 'cluster(help)'


Common Questions: 

1. How do I site mothur?
	Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

2. Do you have an example analysis?
	Yes, https://mothur.org/wiki/454_SOP and https://mothur.org/wiki/MiSeq_SOP highlight some of the things you can do with mothur.

3. Do you offer workshops?
	Yes! Please see our https://mothur.org/wiki/Workshops page for more information.

4. What are mothur's file types?
	Mothur uses and creates many file types. Including fasta, name, group, design, count, list, rabund, sabund, shared, relabund, oligos, taxonomy, constaxonomy, phylip, column, flow, qfile, file, biom and tree. You can find out more about these formats here: https://www.mothur.org/wiki/File_Types.

5. Is there a list of all of mothur's commands?
	Yes! You can find it here, http://www.mothur.org/wiki/Category:Commands.

6. Why does the cutoff change when I cluster with average neighbor?
	This is a product of using the average neighbor algorithm with a sparse distance matrix. When you run cluster, the algorithm looks for pairs of sequences to merge in the rows and columns that are getting merged together. Let's say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it's not possible to merge at a higher level and keep all the data. All of the sequences are still there from multiple phyla. Incidentally, although we always see this, it is a bigger problem for people that include sequences that do not fully overlap.


Common Issues: 

1. Mothur can't find my input files. What wrong?
	By default, mothur will then look for the input files in the directory where mothur's executable is located. Mothur will also search the input, output and temporary default locations. You can set these locations using the set.dir command: set.dir(input=/users/myuser/desktop/mothurdata). Alternatively you can provide complete file names, or move the input files to mothur's executable location.

2. I installed the latest version, but I am still running an older version. Why?
	We often see this issue when you have an older version of mothur installed in your path. You can find out where by opening a terminal window and running: 

	yourusername$ which mothur
	path_to_old_version
	for example: yourusername$ which mothur
	/usr/local/bin

	When you find the location of the older version, you can delete it or move it out of your path with the following:

	yourusername$ mv path_to_old_version/mothur new_location
	for example: yourusername$ mv /usr/local/bin/mothur /Users/yourusername/desktop/old_version_mothur

3. File Mismatches - 'yourSequence is in fileA but not in fileB, please correct.'
	The most common reason this occurs is because you forgot to include a name or count file on a command, or accidentally included the wrong one due to a typo. Mothur has a 'current' option, which allows you to set file parameters to 'current'. For example, if fasta=current mothur will use the last fasta file given or created. The current option was designed to help avoid typo mistakes due to mothur's long filenames. Another reason this might occur is a process failing when you are using multiple processors. If a process dies, a file can be incomplete which would cause a mismatch error downstream.

4. I don't have enough RAM or processing power. What are my options?
	If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required.
	Alternatively, you can use AWS to run your analysis. Here are instructions: https://mothur.org/wiki/Mothur_AMI.

5. Mothur crashes when I read my distance file. What's wrong?
	There are two common causes for this, file size and format.

	FileSize:	The cluster command loads your distance matrix into RAM, and your distance file is most likely too large to fit in RAM. There are two options to help with this. The first is to use a cutoff. By using a cutoff mothur will only load distances that are below the cutoff. If that is still not enough, there is a command called cluster.split, http://www.mothur.org/wiki/cluster.split. Cluster.split divides the dataset by taxonomic assignment and generates matrices for each grouping, and then clusters the smaller pieces separately. You may also be able to reduce the size of the original distance matrix by using the commands outline in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP

	Wrong Format:	This error can be caused by trying to read a column formatted distance matrix using the phylip parameter. By default, the dist.seqs command generates a column formatted distance matrix. To make a phylip formatted matrix set the dist.seqs command parameter output to lt.

6. Why do I have such a large distance matrix?
	This is most often caused by poor overlap of your reads. When reads have poor overlap, it greatly increases your error rate. Also, sequences that should cluster together don't because the errors appear to be genetic differences when in fact they are not. The quality of the data you are processing can not be overstressed. Error filled reads produce error filled results!

	Check out Pat's blog: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/

	NOTE: To take a step back, if you look through our MiSeq SOP, you’ll see that we go to great pains to only work with the unique sequences to limit the number of sequences we have to align, screen for chimeras, classify, etc. We all know that 20 million reads will never make it through the pipeline without setting your computer on fire. Returning to the question at hand, you can imagine that if the reads do not fully overlap then any error in the 5’ end of the first read will be uncorrected by the 3’ end of the second read. If we assume for now that the errors are random, then every error will generate a new unique sequence. Granted, this happens less than 1% of the time, but multiply that by 20 million reads at whatever length you choose and you’ve got a big number. Viola, a bunch of unique reads and a ginormous distance matrix.

7. Mothur reports a 'bad_alloc' error in the shhh.flows command. What's wrong?
	This error indicates your computer is running out of memory. The shhh.flows command is very memory intensive. This error is most commonly caused by trying to process a dataset too large, using multiple processors, or failing to run trim.flows before shhh.flows. If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Running trim.flows with an oligos file, and then shhh.flows with the file option may also resolve the issue. If for some reason you are unable to run shhh.flows with your data, a good alternative is to use the trim.seqs command using a 50-bp sliding window and to trim the sequence when the average quality score over that window drops below 35. Our results suggest that the sequencing error rates by this method are very good, but not quite as good as by shhh.flows and that the resulting sequences tend to be a bit shorter.


How To: 

1. How do I make a tree?
	Mothur has two commands that create trees: clearcut and tree.shared.

	The clearcut commands creates a phylogenetic tree that represents how sequences relate. The clearcut program written by Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho. For more information about clearcut please refer to http://bioinformatics.hungry.com/clearcut/

	The tree.shared command will generate a newick-formatted tree file that describes the dissimilarity (1-similarity) among multiple groups. Groups are clustered using the UPGMA algorithm using the distance between communities as calculated using any of the calculators describing the similarity in community membership or structure.

2. How do I know 'who' is in an OTU in a shared file?
	You can run the get.otulist command on the list file you used to generate the shared file. You want to be sure you are comparing the same distances. ie final.opti_mcc.0.03.otulist would relate to the 0.03 distance in your shared file. Also, if you subsample your data set and want to compare things, be sure to subsample the list and group file and then create the shared file to make sure you are working with the same sequences.

	sub.sample(list=yourListFile, count=yourCountFile, persample=t)
	make.shared(list=yourSubsampledListFile, group=yourSubsampledCountFile, label=0.03)
	get.otulist(list=yourSubsampledListFile, label=0.03)

3. How do I know 'who' is in the OTUs represented in the venn picture?
	You can use the get.sharedseqs command. Be sure to pay close attention to the 'unique' and 'shared' parameters.

4. How do I select certain sequences or groups of sequences?
	Mothur has several 'get' and 'remove' commands: get.seqs, get.lineage, get.groups, get.dists, get.otus, remove.seqs, remove.lineage, remove.dists, remove.otus and remove.groups.

5. How do I visualize my results from mothur?
	To visual your data with R follow this tutorial http://www.riffomonas.org/minimalR/06_line_plots.html.


For further assistance please refer to the Mothur manual on our wiki at http://www.mothur.org/wiki.


For further assistance please refer to the Mothur manual on our wiki at http://www.mothur.org/wiki, or contact Pat Schloss at mothur.bugs@gmail.com.


mothur > quit()


It took 0 seconds to run 2 commands from your script.
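The answer to How To #2 in the help output above can be turned into a Mothur command file for a batch job. In this sketch the input file names are hypothetical placeholders for your own list and count files, the subsampled count table is passed to make.shared through its count parameter, and Mothur's current-file mechanism is relied on to pick up the subsampled list and count files produced by sub.sample.

```shell
#!/bin/bash
# Sketch: write How To #2 as a Mothur command file.
# Hypothetical input names; replace with your own files.
LIST=final.opti_mcc.list
COUNT=final.count_table
LABEL=0.03

cat > otu_lookup.txt <<EOF
sub.sample(list=${LIST}, count=${COUNT}, persample=t, label=${LABEL})
make.shared(list=current, count=current, label=${LABEL})
get.otulist(list=current, label=${LABEL})
EOF

# Run it in a batch job (for example via sub.sh) with:
# singularity exec /apps/singularity-images/mothur-1.45.0.sif mothur otu_lookup.txt
```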


Installation

The source code was obtained from the Mothur GitHub repository.

System

64-bit Linux