DMRIharmonization-Sapelo2: Difference between revisions

From Research Computing Center Wiki

Revision as of 08:31, 17 September 2024


Category

Engineering

Program On

Sapelo2

Version

20240227

Author / Distributor

See https://github.com/pnlbwh/dMRIharmonization

Description

"dMRIharmonization repository is developed by Tashrif Billah, Sylvain Bouix, Suheyla Cetin Karayumak, and Yogesh Rathi, Brigham and Women's Hospital (Harvard Medical School)." More details are at https://github.com/pnlbwh/dMRIharmonization.

Running Program

  • Version 20240227 is installed as a Python virtual environment on Sapelo2 at /apps/gb/dMRIharmonization/20240227

To use it, load the module and activate its environment with:

   ml dMRIharmonization/20240227
   source ${EBROOTDMRIHARMONIZATION}/harmonization/bin/activate
   source ${EBROOTDMRIHARMONIZATION}/../env.sh
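
After activation it can be useful to confirm that the tool is actually reachable before starting a long job. The following is a minimal sketch, not part of the official instructions. It assumes that harmonization.py ends up on PATH once the module is loaded and the environment is activated, as the job script on this page implies. The helper name have_cmd is hypothetical.

```shell
#!/bin/bash
# Sketch: confirm that a command is reachable on PATH before relying on it.
# Assumption: harmonization.py is on PATH after the module is loaded and
# the virtual environment is activated.

# Return 0 if the named command can be found, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# VIRTUAL_ENV is set by a Python virtual environment's activate script.
echo "active virtual environment: ${VIRTUAL_ENV:-none}"

if have_cmd harmonization.py; then
    echo "harmonization.py found at: $(command -v harmonization.py)"
else
    echo "harmonization.py not found; load the module and activate the environment first" >&2
fi
```

If the second message appears, repeating the three commands above in the same shell session is the first thing to try.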

To deactivate the environment, run:

   deactivate


Below is an example of a job submission script (sub.sh) to run harmonization.py on the batch queue:

   #!/bin/bash
   #SBATCH --job-name=dc_h
   #SBATCH --partition=batch
   #SBATCH --mem=160G
   #SBATCH --nodes=1
   #SBATCH --ntasks=24
   #SBATCH --cpus-per-task=1
   #SBATCH --time=7-00
   #SBATCH --constraint="Genoa|Milan"

   #SBATCH --mail-type=ALL
   #SBATCH --mail-user=jbrown95@uga.edu

   cd $SLURM_SUBMIT_DIR

   ml purge
   ml dMRIharmonization/20240227
   source ${EBROOTDMRIHARMONIZATION}/harmonization/bin/activate
   source ${EBROOTDMRIHARMONIZATION}/../env.sh

   export OMP_NUM_THREADS=1

   site=dallas

   harmonization.py \
   --tar_list full_inputs/${site}_all.csv \
   --tar_name ${site} \
   --template ${site}_to_chicago_template/ \
   --nshm 8 \
   --nzero 10 \
   --nproc 24 \
   --process

In a real submission script, review at least the job-specific values above (for example the job name, resource requests, email address, site name, and file paths) and replace them with the proper values.

Please include the #SBATCH --constraint="Genoa|Milan" header in your job submission script for a quicker job start time and optimal job performance.
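
The example script states the number 24 twice, once in --ntasks and once in --nproc, so the two can drift apart when one is edited. The sketch below shows one possible way to derive the worker count from the allocation and to check the input list before launching. SLURM_NTASKS is set by Slurm inside a job that requests --ntasks. The helper names resolve_nproc and input_ok, and the fallback value of 1 outside a job, are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: take the worker count from the Slurm allocation instead of
# hard-coding it, and check the input list before starting.

# Print the task count of the current job, or 1 when run outside a job
# (the fallback is an assumption for illustration).
resolve_nproc() {
    echo "${SLURM_NTASKS:-1}"
}

# Return 0 only if the given file exists and is not empty.
input_ok() {
    [ -s "$1" ]
}

site=dallas
tar_list="full_inputs/${site}_all.csv"
nproc=$(resolve_nproc)

if input_ok "$tar_list"; then
    echo "input list ${tar_list} found; would use --nproc ${nproc}"
else
    echo "input list ${tar_list} is missing or empty" >&2
fi
```

In the job script, the line --nproc 24 could then be written as --nproc "$nproc", so that only the #SBATCH --ntasks header needs to change.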

Here is an example of a job submission command:

sbatch ./sub.sh 
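
For scripted submissions it can help to capture the job ID so the job can be checked afterwards. The following is a sketch under stated assumptions: it relies on the --parsable option of sbatch, which prints the job ID optionally followed by a semicolon and the cluster name, and on squeue -j to show the job's state. The helper name job_id_from_parsable is hypothetical.

```shell
#!/bin/bash
# Sketch: submit sub.sh, capture the job ID, and show the job in the queue.

# Extract the job ID from `sbatch --parsable` output,
# which has the form "jobid" or "jobid;cluster".
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    out=$(sbatch --parsable ./sub.sh) || exit 1
    job_id=$(job_id_from_parsable "$out")
    echo "submitted job ${job_id}"
    squeue -j "$job_id"
else
    # Outside the cluster there is nothing to submit to.
    echo "sbatch is not available on this machine; nothing submitted" >&2
fi
```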

Documentation

ml CellRanger-ATAC/1.2.0
cellranger-atac -h

cellranger-atac -h (1.2.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Usage:
    cellranger-atac mkfastq

    cellranger-atac count
    cellranger-atac aggr
    cellranger-atac reanalyze

    cellranger-atac mkref

    cellranger-atac testrun
    cellranger-atac upload
    cellranger-atac sitecheck


cellranger-atac count -h

cellranger-atac count (1.2.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

The commands below should be preceded by 'cellranger-atac':

Usage:
    count
        --id=ID
        --fastqs=PATH
        [--sample=PREFIX]
        [options]
    count <run_id> <mro> [options]
    count -h | --help | --version

Arguments:
    id      A unique run id, used to name output folder [a-zA-Z0-9_-]+.
    fastqs  Path of folder created by mkfastq or bcl2fastq.
    sample  Prefix of the filenames of FASTQs to select.

Options:
# Sample Specification
    --reference=PATH    Path of folder containing a 10x-compatible reference.
                            Required.
    --description=TEXT  More detailed sample description. Optional.
    --lanes=NUMS        Comma-separated lane numbers.
    --indices=INDICES   Deprecated. Not needed with the output of
                            cellranger-atac mkfastq, or bcl2fastq
    --project=TEXT      Name of the project folder within a mkfastq or
                            bcl2fastq-generated folder to pick FASTQs from.
# ATAC analysis 
    --force-cells=N     Define the top N barcodes with the most reads as
                            cells. N must be a positive integer <=
                            20,000. Please consult the documentation
                            before using this option. Optional.
    --dim-reduce=MODE   Dimensionality reduction mode for clustering: 'lsa'
                            (default), 'plsa', or 'pca'. Optional.
# Downsampling
    --downsample=GB     Downsample input FASTQs to approximately GB 
                            gigabases of input sequence. Optional.
# Martian Runtime
    --jobmode=MODE      Job manager to use. Valid options:
                            local (default), sge, lsf, or a .template file
    --localcores=NUM    Set max cores the pipeline may request at one time.
                            Only applies to local jobs.
    --localmem=NUM      Set max GB the pipeline may request at one time.
                            Only applies to local jobs.
    --localvmem=NUM     Set max virtual address space in GB for the pipeline.
                            Only applies to local jobs.
    --mempercore=NUM    Reserve enough threads for each job to ensure enough
                        memory will be available, assuming each core on your
                        cluster has at least this much memory available.
                            Only applies in cluster jobmodes.
    --maxjobs=NUM       Set max jobs submitted to cluster at one time.
                            Only applies in cluster jobmodes.
    --jobinterval=NUM   Set delay between submitting jobs to cluster, in ms.
                            Only applies in cluster jobmodes.
    --overrides=PATH    The path to a JSON file that specifies stage-level
                            overrides for cores and memory.  Finer-grained
                            than --localcores, --mempercore and --localmem.
                            Consult the 10x support website for an example
                            override file.
    --uiport=PORT       Serve web UI at http://localhost:PORT
    --disable-ui        Do not serve the UI.
    --noexit            Keep web UI running after pipestance completes or fails.
    --nopreflight       Skip preflight checks.

    -h --help           Show this message.
    --version           Show version.

Note: 'cellranger-atac count' works as follows:
set --fastqs to the folder containing FASTQ files. In addition,
set --sample to the name prefixed to the FASTQ files comprising your sample. 
For example, if your FASTQs are named:
    subject1_S1_L001_R1_001.fastq.gz
then set --sample=subject1


Installation

Source code is downloaded from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

System

64-bit Linux