DMRIharmonization-Sapelo2

From Research Computing Center Wiki
Latest revision as of 09:01, 17 September 2024


Category: Engineering

Program On: Sapelo2

Version: 20240227

Author / Distributor: See https://github.com/pnlbwh/dMRIharmonization

Description: "dMRIharmonization repository is developed by Tashrif Billah, Sylvain Bouix, Suheyla Cetin Karayumak, and Yogesh Rathi, Brigham and Women's Hospital (Harvard Medical School)." More details are at https://github.com/pnlbwh/dMRIharmonization.

Running Program

  • Version 20240227 is installed as a Python virtual environment on Sapelo2 at /apps/gb/dMRIharmonization/20240227

To use it, please load the module and activate its env with:

ml dMRIharmonization/20240227
source ${EBROOTDMRIHARMONIZATION}/harmonization/bin/activate
source ${EBROOTDMRIHARMONIZATION}/../env.sh

To deactivate its env, please do:

deactivate

Below is an example of a job submission script (sub.sh) to run harmonization.py with 24 parallel processes on a single compute node on the batch partition:

#!/bin/bash
#SBATCH --job-name=test_dMRIharmonization           
#SBATCH --partition=batch          
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1
#SBATCH --mem=100G
#SBATCH --time=7-00:00:00
#SBATCH --constraint="Genoa|Milan"

#SBATCH --mail-type=ALL     
#SBATCH --mail-user=<yourMyID>@uga.edu

cd $SLURM_SUBMIT_DIR

ml purge
ml dMRIharmonization/20240227
source ${EBROOTDMRIHARMONIZATION}/harmonization/bin/activate
source ${EBROOTDMRIHARMONIZATION}/../env.sh

export OMP_NUM_THREADS=1

harmonization.py --nproc 24 <your other options and arguments>

deactivate

In your actual submission script, please ensure that you request the appropriate computing resources for your job. For example, you can request CPU cores for running parallel processes with Slurm headers such as --ntasks=24 and --cpus-per-task=1.

Please note:

  • Use the header --constraint="Genoa|Milan" in your job submission script for optimal job performance.
  • The value of the --ntasks header, e.g., --ntasks=24, should match the number given to the --nproc option on your command line, e.g., --nproc 24.
  • We highly recommend setting --cpus-per-task=1 and adding export OMP_NUM_THREADS=1 to your job submission script.
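One way to keep --nproc and --ntasks from drifting apart is to derive --nproc from the SLURM_NTASKS environment variable that Slurm sets inside a job. This is a minimal sketch, not part of the installed tool: the fallback value of 4 simply mirrors the harmonization.py default shown in the help output below.

```shell
# Reuse the task count requested via "#SBATCH --ntasks" so that --nproc
# always matches it. Falls back to 4 (the harmonization.py default) when
# run outside a Slurm job.
NPROC="${SLURM_NTASKS:-4}"
export OMP_NUM_THREADS=1

echo "Running with --nproc ${NPROC}"
# harmonization.py --nproc "${NPROC}" <your other options and arguments>
```

With this pattern, changing the #SBATCH --ntasks line is the only edit needed when you scale the job up or down.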


Here is an example of job submission command:

sbatch ./sub.sh 

Documentation

ml dMRIharmonization/20240227 
source ${EBROOTDMRIHARMONIZATION}/harmonization/bin/activate
source ${EBROOTDMRIHARMONIZATION}/../env.sh
harmonization.py -h

===============================================================================
dMRIharmonization (2018) pipeline is written by-

TASHRIF BILLAH
Brigham and Women's Hospital/Harvard Medical School
tbillah@bwh.harvard.edu, tashrifbillah@gmail.com

===============================================================================
See details at https://github.com/pnlbwh/dMRIharmonization
Submit issues at https://github.com/pnlbwh/dMRIharmonization/issues
View LICENSE at https://github.com/pnlbwh/dMRIharmonization/blob/master/LICENSE
===============================================================================

Template creation, harmonization, and debugging

Usage:
    harmonization.py [SWITCHES] 

Meta-switches:
    -h, --help                          Prints this help message and quits
    --help-all                          Prints help messages of all sub-commands and quits
    -v, --version                       Prints the program's version and quits

Switches:
    --bvalMap VALUE:str                 specify a bmax to scale bvalues into
    --create                            turn on this flag to create template
    --debug                             turn on this flag to debug harmonized data (valid only with --process)
    --denoise                           turn on this flag to denoise voxel data
    --force                             turn on this flag to overwrite existing data
    --harm_list VALUE:ExistingFile      harmonized csv/txt file with first column for dwi and 2nd column for mask: dwi1,mask1\n dwi2,mask2\n...
    --nproc VALUE:str                   number of processes/threads to use (-1 for all available, may slow down your system); the default is 4
    --nshm VALUE:str                    spherical harmonic order; the default is -1
    --nzero VALUE:str                   number of zero padding for denoising skull region during signal reconstruction; the default is 10
    --process                           turn on this flag to harmonize
    --ref_list VALUE:ExistingFile       reference csv/txt file with first column for dwi and 2nd column for mask: dwi1,mask1\n dwi2,mask2\n...
    --ref_name VALUE:str                reference site name
    --resample VALUE:str                voxel size MxNxO to resample into
    --stats                             print statistics of all sites, useful for recomputing --debug statistics separately
    --tar_list VALUE:ExistingFile       target csv/txt file with first column for dwi and 2nd column for mask: dwi1,mask1\n dwi2,mask2\n...; required
    --tar_name VALUE:str                target site name; required
    --template VALUE:str                template directory; required
    --travelHeads                       travelling heads
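As the --tar_list, --ref_list, and --harm_list descriptions above indicate, the list files are plain CSV/text with one dwi,mask pair per line. A minimal sketch of preparing a target list (all file names here are hypothetical placeholders, not files that exist on Sapelo2):

```shell
# Build a target-site list file: first column is the DWI image,
# second column is its brain mask. The sub-0x names are placeholders.
printf '%s\n' \
    'sub-01_dwi.nii.gz,sub-01_mask.nii.gz' \
    'sub-02_dwi.nii.gz,sub-02_mask.nii.gz' > target_list.csv

# The list is then passed to harmonization.py, for example:
# harmonization.py --tar_list target_list.csv --tar_name mysite \
#     --template mytemplate/ --process
```

The reference-site list for --ref_list follows the same two-column format.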

Back to Top

Installation

Source code is downloaded from https://github.com/pnlbwh/dMRIharmonization

System

64-bit Linux