PartitionFinder2-Sapelo2: Difference between revisions

@@ Line 1: / Line 1: @@
-[[Category:Sapelo2]][[Category:Software]][[Category:Bioinformatics]]
-=== Category ===
-Bioinformatics
-=== Program On ===
-Sapelo2
-=== Version ===
-.1.1
-=== Author / Distributor ===
-Details at [http://www.robertlanfear.com/partitionfinder/ partitionfinder]
-=== Description ===
-"PartitionFinder is free open source software to select best-fit partitioning schemes and models of molecular evolution for phylogenetic analyses."
-More details are at [http://www.robertlanfear.com/partitionfinder/ partitionfinder]
-=== Running Program ===
-Also refer to [[Running Jobs on zcluster]]
-/usr/local/partitionfinder/ points to the latest version, at /usr/local/partitionfinder/2.0.0-pre11.
-Use Anaconda 2.3.0 to run it.
-Example of a shell script, sub.sh:
-<pre class="gscript">
-#!/bin/bash
-cd working_directory
-export PATH=/usr/local/anaconda/2.3.0/bin:${PATH}
-time python /usr/local/partitionfinder/2.0.0-pre11/PartitionFinder.py -p 4 [options] <foldername>
-</pre>
-where <foldername> is the name of the folder containing a Phylip alignment and its associated .cfg file. By default the program uses as many cores as the node has, even if those cores are already busy running other jobs. It is therefore very important to use the '''-p''' option to specify how many cores to use (e.g. 4) and to request the same number of cores in the job submission command.
-Example of submission to the queue:
-<pre  class="gcommand">
-qsub -q queueName -pe thread 4 sub.sh
-</pre>
-where the number that follows '''-pe thread''' has to match the number that follows the '''-p''' option in the PartitionFinder.py command in sub.sh.
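One way to keep the two numbers in sync is to read the core count from the scheduler instead of hard-coding it in sub.sh. The sketch below assumes the queue is a Grid Engine variant (as the qsub -pe syntax suggests) that exports NSLOTS inside the job. The helper name pf_procs and the fallback value of 1 are illustrative assumptions, not part of the documented site setup.

```shell
#!/bin/bash
# Hypothetical helper: choose the value passed to PartitionFinder's -p option.
# Assumption: Grid Engine sets NSLOTS to the slot count granted by "-pe thread N".
pf_procs() {
    local n="${NSLOTS:-1}"
    # Fall back to 1 if NSLOTS is unset or not a positive integer
    if [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
        echo "$n"
    else
        echo 1
    fi
}

# In sub.sh the python line could then read (shown as a comment, not executed here):
#   python /usr/local/partitionfinder/2.0.0-pre11/PartitionFinder.py -p "$(pf_procs)" [options] <foldername>
echo "would run with -p $(pf_procs)"
```

With this in place, only the number after '''-pe thread''' in the qsub command needs to change; the '''-p''' value follows it.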
-=== Documentation ===
-Details at http://www.robertlanfear.com/partitionfinder/
-<pre  class="gcommand">
-export PATH=/usr/local/anaconda/2.3.0/bin:${PATH}
-python /usr/local/partitionfinder/1.1.1/PartitionFinder.py --help
-Usage: python PartitionFinder.py [options] <foldername>
-    PartitionFinder and PartitionFinderProtein are designed to discover optimal
-    partitioning schemes for nucleotide and amino acid sequence alignments.
-    They are also useful for finding the best model of sequence evolution for datasets.
-    The Input: <foldername>: the full path to a folder containing:
-        - A configuration file (partition_finder.cfg)
-        - A nucleotide/aa alignment in Phylip format
-    Take a look at the included 'example' folder for more details.
-    The Output: A file in the same directory as the .cfg file, named
-    'analysis' This file contains information on the best
-    partitioning scheme, and the best model for each partiiton
-    Usage Examples:
-        >python PartitionFinder.py example
-        Analyse what is in the 'example' sub-folder in the current folder.
-        >python PartitionFinder.py -v example
-        Analyse what is in the 'example' sub-folder in the current folder, but
-        show all the debug output
-        >python PartitionFinder.py -c ~/data/frogs
-        Check the configuration files in the folder data/frogs in the current
-        user's home folder.
-        >python PartitionFinder.py --force-restart ~/data/frogs
-        Deletes any data produced by the previous runs (which is in
-        ~/data/frogs/output) and starts afresh
-Options:
-  -h, --help            show this help message and exit
-  -v, --verbose         show debug logging information (equivalent to --debug-
-                        out=all)
-  -c, --check-only      just check the configuration files, don't do any
-                        processing
-  -f, --force-restart   delete all previous output and start afresh (!)
-  -p N, --processes=N   Number of concurrent processes to use. Use -1 to match
-                        the number of cpus on the machine. The default is to
-                        use -1.
-  --show-python-exceptions
-                        If errors occur, print the python exceptions
-  --save-phylofiles     save all of the phyml or raxml output. This can take a
-                        lot of space(!)
-  --dump-results        Dump all results to a binary file. This is only of use
-                        for testing purposes.
-  --compare-results     Compare the results to previously dumped binary
-                        results. This is only of use for testing purposes.
-  -q, --quick           Avoid anything slow (like writing schemes at each
-                        step),useful for very large datasets.
-  -r, --raxml           Use RAxML (rather than PhyML) to do the analysis. See
-                        the manual
-  -m, --ml-tree         Estimate a starting tree using maximum likelihood in
-                        RAxML
-  --cmdline-extras=N    Add additional commands to the phyml or raxml
-                        commandlines that PF uses.This can be useful e.g. if
-                        you want to change the accuracy of lnL calculations
-                        ('-e' option in raxml), or use multi-threaded versions
-                        of raxml that require you to specify the number of
-                        threads you will let raxml use ('-T' option in raxml.
-                        E.g. you might specify this: --cmndline_extras ' -e
-.0 -T 10 ' N.B. MAKE SURE YOU PUT YOUR EXTRAS IN
-                        QUOTES, and only use this command if you really know
-                        what you're doing and are very familiar with raxml and
-                        PartitionFinder
-  --weights=N           Mainly for algorithm development. Only use it if you
-                        know what you're doing.A list of weights to use in the
-                        clustering algorithms. This list allows you to assign
-                        different weights to: the overall rate for a subset,
-                        the base/amino acid frequencies, model parameters, and
-                        alpha value. This will affect how subsets are
-                        clustered together. For instance: --cluster_weights
-                        '1, 2, 5, 1', would weight the base freqeuncies 2x
-                        more than the overall rate, the model parameters 5x
-                        more, and the alpha parameter the same as the model
-                        rate
-  --kmeans=type         This defines which sitewise values to use: entropy or
-                        tiger  --kmeans entropy: use entropies for sitewise
-                        values --kmeans tiger: use TIGER rates for sitewise
-                        values
-  --rcluster-percent=N  This defines the proportion of possible schemes that
-                        the relaxed clustering algorithm will consider before
-                        it stops looking. The default is 10%. e.g. --rcluster-
-                        percent 10.0
-  --rcluster-max=N      This defines the number of possible schemes that the
-                        relaxed clustering algorithm will consider before it
-                        stops looking. The default is to look at just the top
-                        1000 schemes. e.g. --rcluster-max 1000
-  --min-subset-size=N   This defines the minimum subset size that the kmeans
-                        and rcluster algorithm will accept. Subsets smaller
-                        than this  will be merged at with other subsets at the
-                        end of the algorithm (for kmeans) or at the start of
-                        the algorithm (for rcluster). See manual for details.
-                        The default value for kmeans is 100. The default value
-                        for rcluster is to ignore this option. e.g. --min-
-                        subset-size 100
-  --debug-output=REGION,REGION,...
-                        (advanced option) Provide a list of debug regions to
-                        output extra information about what the program is
-                        doing. Possible regions are 'all' or any of {subset,su
-                        bset_ops,neighbour,raxml,parser,model_util,results,ent
-                        ropy,alignment,future_stdlib,threadpool,progress,main,
-                        config,pandas,boto.perf,reporter,kmeans,pandas.io.gbq,
-                        pandas.io,analysis_m,util,scheme,submodels,boto,databa
-                        se,analysis,phyml,raxml_mode,model_load,phyml_mode}.
-  --all-states          In the kmeans and rcluster algorithms, this stipulates
-                        that PartitionFinder should not produce subsets that
-                        do not have all possible states present. E.g. for DNA
-                        sequence data, all subsets in the final scheme must
-                        have A, C, T,  and G nucleotides present. This can
-                        occasionally be useful for downstream  analyses,
-                        particularly concerning amino acid datasets.
-  --profile             Output profiling information after running (this will
-                        slow everything down!)
-</pre>
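The help text above says the input folder must hold a configuration file (partition_finder.cfg) and a Phylip alignment. A quick pre-flight check before submitting a job might look like the sketch below. The function name check_pf_folder and the .phy/.phylip file extensions are assumptions made for illustration; the help text only specifies "Phylip format".

```shell
#!/bin/bash
# Hypothetical pre-flight check for a PartitionFinder input folder.
# Assumption: the alignment file ends in .phy or .phylip.
check_pf_folder() {
    local dir="$1" f found=0
    if [[ ! -d "$dir" ]]; then
        echo "not a folder: $dir" >&2
        return 1
    fi
    if [[ ! -f "$dir/partition_finder.cfg" ]]; then
        echo "missing partition_finder.cfg in $dir" >&2
        return 1
    fi
    # An unmatched glob stays literal, so the -f test filters it out
    for f in "$dir"/*.phy "$dir"/*.phylip; do
        if [[ -f "$f" ]]; then
            found=1
            break
        fi
    done
    if [[ "$found" -ne 1 ]]; then
        echo "no .phy/.phylip alignment found in $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

If the check passes, running PartitionFinder.py with the '''-c''' option on the same folder validates the configuration file itself without doing any processing.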
-[[#top|Back to Top]]
-=== Installation ===
-Source code from [https://github.com/brettc/partitionfinder/releases PartitionFinder]
-=== System ===
--bit Linux

Latest revision as of 10:12, 5 June 2018