Partek Flow Documentation


Cell Ranger is a set of analysis pipelines that process Chromium single cell data to align reads, generate feature-barcode matrices, and perform clustering and other analyses for 10x Genomics Chromium technology[1].

Cell Ranger - ATAC in Partek Flow

The Cell Ranger - ATAC task in Partek Flow includes two different wrappers. To process single cell ATAC-Seq datasets, the ‘cellranger-atac count’ pipeline from Cell Ranger ATAC v2.0[2] has been wrapped in Partek® Flow® as the Cell Ranger - ATAC task. It takes FASTQ files from ‘cellranger-atac mkfastq’ and performs ATAC analysis, including read filtering and alignment, barcode counting, identification of transposase cut sites, peak and cell calling, and count matrix generation. Its outputs then become the starting point for downstream analysis of scATAC-Seq data in Flow. To process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data, ‘cellranger-arc count’ v2.0[3] has been wrapped to generate a variety of analyses pertaining to gene expression, chromatin accessibility, and their linkage.

Importing Cell Ranger - ATAC data in a Partek Flow project

When importing raw reads for processing with the Cell Ranger - ATAC task for scATAC-Seq data, the user is not required to specify the data type. If you are importing 10x multiome ATAC + Gene Expression data, first select the ATAC files, choose ATAC-Seq as the data type, and complete the import. Once the import task has run successfully, add the gene expression reads to each of the samples. Remember to specify mRNA as the data type during this second import.

Running Cell Ranger - ATAC in Flow

To run the Cell Ranger - ATAC task for scATAC-Seq data in Flow, select the Unaligned reads data node, then select Cell Ranger - ATAC in the 10x Genomics section (left panel, Figure 1). For 10x multiome ATAC + Gene Expression data, there will be two data nodes once the FASTQ files have been imported into Flow properly: ATAC-Seq and mRNA (right panel, Figure 1). Users should select the ATAC-Seq data node to run the Cell Ranger - ATAC task.

Figure 1. Selecting the Cell Ranger - ATAC task for converting FASTQs to single cell counts.

Similar to the Cell Ranger - Gene Expression task, a first-time user will be asked to create a reference assembly. In Partek® Flow®, Cell Ranger ARC 2.0.0 is used to create the reference assembly for all 10x Genomics analysis pipelines. Please refer to our Cell Ranger - Gene Expression task manual for how to build or use a reference assembly.


Clicking the large grey Create Cell Ranger ARC 2.0.0 reference button opens a new window that lists the information users need to provide. To create the same reference genomes (2020-A) that are provided with Cell Ranger by default, use the GENCODE v32 transcriptome annotations for human and vM23 for mouse, which are equivalent to Ensembl 98[3]. If no options are available in the dropdown lists, users can click Add annotation model (GTF file) for Index, or New assembly... (FASTA file) for Assembly, and upload the files.


Once the right assembly has been chosen or provided, simply press the Finish button to run the task with default settings. The reference assembly ‘Homo sapiens (human) - hg38’ is used as an example here (Figure 2).

Figure 2. Run the Cell Ranger - ATAC task in Flow.

The main task menu interface will be refreshed as above (Figure 2) once references have been added. Users can then click the Finish button to run the task with default settings. The interface is different for Single Cell Multiome ATAC + Gene Expression sequencing data, because the gene expression data from the very same cells has to be paired with the ATAC-Seq data (Figure 3).

Figure 3. Run the Cell Ranger - ATAC task for 10x Single Cell Multiome ATAC + Gene Expression data with reference assembly hg38 in Flow.


After the task has finished successfully, a new data node named Single cell counts will be displayed in Flow (Figure 4). This data node contains a filtered peak-barcode count matrix for ATAC-Seq data, or a unified feature-barcode matrix that contains gene expression counts alongside ATAC-Seq peak counts for each cell barcode for multiomic data. To open the task report, double click the output data node, or single click the data node and select Task report in the Task results section. The task report (Figure 5) is the same as the ‘Summary HTML’ from the Cell Ranger ATAC output.

Figure 4. The finished Cell Ranger - ATAC task in Flow.

Cell Ranger - ATAC task report in Flow

The task report is sample based. Users can use the dropdown list at the top left to switch samples. Under the sample name, there are two tabs on each report: Summary report and Data Quality report (Figure 5). Important information on the Estimated Number of Cells, Median high-quality fragments per cell, and Fraction of high-quality fragments overlapping peaks, as well as information on Sample, Sequencing, Mapping, and Cells, is summarized in different panels. Importantly, the Barcode Rank Plot and the Fragment Distribution plot are also included in the Cells section of the Summary report (Figure 5). Descriptions of the metrics in each section can also be found by clicking the icon next to the section header in the Summary HTML file itself.

Figure 5. The example report of the Cell Ranger - ATAC task in Flow.


The Library Complexity section of the Data Quality report plots the observed per-cell complexity, measured as median unique fragments per cell, as a function of mean reads per cell (Figure 6). The Mapping section displays the Insert Size Distribution plot and the metrics derived from it. Single Cell ATAC read pairs carry detailed information about nucleosome packing and positioning, and the fragment length distribution captures the periodicity of nucleosome positioning. The Targeting section profiles the chromatin accessibility of the library at epigenetically relevant regions of the genome. The Enrichment around TSS plot helps assess the signal-to-noise ratio of the library, because transcription start sites (TSSs) and the promoter regions around them are known to have a higher degree of chromatin accessibility than other regions of the genome. The Peaks targeting plot shows the variation in the number of on-target fragments (fragments that overlap peaks) within each barcode group. Cell-associated barcodes are expected to have a higher percentage of their fragments overlapping peaks.
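The on-target idea behind the Peaks targeting plot reduces to a per-barcode ratio. The sketch below is an illustration only, not Cell Ranger's implementation. It assumes fragments are available as (chrom, start, end, barcode) tuples, mirroring the first four columns of fragments.tsv.gz, and that peaks are sorted, non-overlapping, half-open BED intervals grouped by chromosome. The function name `fraction_in_peaks` is hypothetical.

```python
import bisect
from collections import defaultdict


def fraction_in_peaks(fragments, peaks):
    """Per-barcode fraction of fragments overlapping at least one peak.

    fragments: iterable of (chrom, start, end, barcode) tuples, half-open coordinates.
    peaks: dict mapping chrom -> sorted list of non-overlapping (start, end) tuples.
    """
    # Peak start positions per chromosome, used for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in peaks.items()}
    total = defaultdict(int)
    on_target = defaultdict(int)
    for chrom, start, end, barcode in fragments:
        total[barcode] += 1
        ivs = peaks.get(chrom, [])
        # Index of the last peak that starts before the fragment ends.
        i = bisect.bisect_left(starts.get(chrom, []), end) - 1
        # Peaks are sorted and non-overlapping, so their ends increase too:
        # peak i has the largest end among candidates, so checking it suffices.
        if i >= 0 and ivs[i][1] > start:
            on_target[barcode] += 1
    return {bc: on_target[bc] / total[bc] for bc in total}


peaks = {"chr1": [(100, 200), (500, 600)]}
fragments = [
    ("chr1", 150, 300, "AAAC-1"),  # overlaps the first peak
    ("chr1", 300, 400, "AAAC-1"),  # falls between peaks
    ("chr1", 550, 560, "TTTG-1"),  # inside the second peak
    ("chr2", 0, 50, "TTTG-1"),     # chromosome with no peaks
]
print(fraction_in_peaks(fragments, peaks))
```

Binary search keeps the per-fragment lookup logarithmic in the number of peaks, which matters because a single sample can have millions of fragments.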

Figure 6. Data Quality report of the Cell Ranger - ATAC task report in Flow.


The task report for 10x multiome ATAC + Gene Expression data is more complicated. It contains summary metrics at different levels: ATAC, gene expression, and joint. The Joint view is the default view when the summary is first rendered and can be accessed by clicking "Joint" at the top left corner. Metrics specific to the Chromatin Accessibility library appear in the ATAC tab, and metrics specific to the gene expression library appear in the Gene Expression tab (Figure 7). For details, please refer to the 10x Genomics webpage[4].

Figure 7. Report for 10x Chromium Single Cell Multiome ATAC + Gene Expression data from the Cell Ranger - ATAC task in Flow.



Other adjustable parameters in this task (Figure 2) include:

Subsample percentile: Downsample to preserve this fraction of reads.

Users can also click Configure to change the default settings in Advanced options (Figure 2).

Include introns: Count reads mapping to intronic regions. This may improve sensitivity for samples with a significant amount of pre-mRNA molecules, such as nuclei.

Expected cells: Expected number of recovered cells. Default: 3,000 cells.

Override peak caller: To override the peak caller, users can specify the peaks to use in downstream analyses in a 3-column BED file. The supplied peaks file must be sorted by position and must not contain overlapping peaks; comment lines beginning with `#` are allowed.

Force cells: Define the top N barcodes with the most fragments overlapping peaks as cells, overriding the cell calling algorithm. N must be a positive integer <= 20,000. Use this option if the number of cells estimated by Cell Ranger - ATAC is not consistent with the barcode rank plot.

Memory limit (GB): Restricts Cell Ranger - ATAC to the specified amount of memory (in GB) when executing pipeline stages.
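The requirements stated for Override peak caller and Force cells can be made concrete with a small sketch. It assumes tab-delimited BED lines and treats intervals as half-open, so touching peaks do not count as overlapping. It also assumes "sorted by position" means each chromosome's peaks are contiguous with non-decreasing starts, and it breaks ties in the Force cells ranking by barcode. Cell Ranger's own checks and tie-breaking may differ. Both function names are hypothetical.

```python
def check_peaks_bed(lines):
    """Return a list of problems in a 3-column peaks BED file (empty list if none)."""
    problems = []
    seen = set()  # chromosomes already encountered
    prev = None  # (chrom, start, end) of the previous peak
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("#"):
            continue  # comment lines are allowed
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(f"line {n}: expected 3 columns, found {len(fields)}")
            continue
        chrom, start, end = fields
        if not (start.isdigit() and end.isdigit()) or int(start) >= int(end):
            problems.append(f"line {n}: invalid start/end")
            continue
        start, end = int(start), int(end)
        if prev is None or chrom != prev[0]:
            if chrom in seen:
                problems.append(f"line {n}: {chrom} peaks are not contiguous (not sorted)")
            seen.add(chrom)
        elif start < prev[1]:
            problems.append(f"line {n}: not sorted by start position")
        elif start < prev[2]:
            problems.append(f"line {n}: overlaps the previous peak")
        prev = (chrom, start, end)
    return problems


def force_cells(peak_fragment_counts, n):
    """Return the top-n barcodes ranked by fragments overlapping peaks."""
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 20_000:
        raise ValueError("N must be a positive integer <= 20,000")
    # Highest count first; ties broken by barcode so the result is deterministic.
    ranked = sorted(peak_fragment_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [barcode for barcode, _ in ranked[:n]]


print(check_peaks_bed(["chr1\t100\t200", "chr1\t150\t250"]))
print(force_cells({"A": 5, "B": 12, "C": 7, "D": 12}, 3))
```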


Import Cell Ranger output into Flow

If users have processed the FASTQ files outside of Partek Flow, the resulting count matrix can be imported along with additional files (Figure 8A). The files Flow needs to complete the import are:

filtered_feature_bc_matrix.h5

per_barcode_metrics.csv (or singlecell.csv)

peaks.bed

fragments.tsv.gz.tbi

fragments.tsv.gz

These five files can usually be found in the outs/ subdirectory of the pipeline output directory (Figure 8B). Five files are needed per sample because scATAC-Seq data is more complicated than RNA-Seq data. If peak calling was performed on each sample/dataset independently, the peaks are unlikely to be exactly the same, so the peaks from all of the samples/datasets need to be merged into a common set. This merge is performed during data import, so all of the samples/datasets must be imported at one time, not separately. To add samples, click the green + button (Figure 8A).
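Two steps described above can be sketched in plain Python: checking that a sample's outs/ directory holds the required files, and merging per-sample peaks into a common set. How Flow actually merges peaks is not described on this page. The union of overlapping intervals used here, similar in spirit to bedtools merge, is an assumption. Both function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# File names from the list above; the metrics file may use either name.
REQUIRED = ["filtered_feature_bc_matrix.h5", "peaks.bed",
            "fragments.tsv.gz", "fragments.tsv.gz.tbi"]
METRICS_ALTERNATIVES = ["per_barcode_metrics.csv", "singlecell.csv"]


def missing_import_files(outs_dir):
    """List required files that are absent from a sample's outs/ directory."""
    outs = Path(outs_dir)
    missing = [name for name in REQUIRED if not (outs / name).is_file()]
    if not any((outs / name).is_file() for name in METRICS_ALTERNATIVES):
        missing.append(" or ".join(METRICS_ALTERNATIVES))
    return missing


def merge_peak_sets(*peak_sets):
    """Merge per-sample (chrom, start, end) peaks into one non-overlapping set.

    Intervals are half-open; overlapping peaks from any sample become one interval.
    Chromosomes are returned in lexicographic order.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:  # overlaps the current interval: extend it
                cur_end = max(cur_end, end)
            else:  # gap or touching: emit the current interval, start a new one
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


sample1 = [("chr1", 100, 200), ("chr1", 500, 600)]
sample2 = [("chr1", 150, 250), ("chr2", 10, 20)]
print(merge_peak_sets(sample1, sample2))
```

Merging every sample's peaks in a single pass is why all samples need to be present when the merge runs, which matches the requirement to import them at one time.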

Figure 8. Single cell ATAC count matrix import in Flow.



Import FASTQs for Cell Ranger - ATAC in Flow

Although the index files (I1 or I2) are optional, we encourage users to include all of the FASTQs in the table (Figure 9) when importing data for Cell Ranger - ATAC.

Figure 9. Input FASTQs for Cell Ranger - ATAC in Flow.

Importing ATAC FASTQ files is as straightforward as importing (sc)RNA-Seq data. However, for 10x multiome ATAC + Gene Expression data, the two types of data must be associated with each other. This takes two steps: 1) Import the ATAC FASTQ files on the new page that opens after clicking "Automatically create samples from files" in the Data tab (Figure 10A), and select ATAC-Seq as the data type from the dropdown list (Figure 10B). 2) Go back to the Data tab and display all of the files by clicking the Show data files button at the bottom left of the Sample name table. Then click the green + button (Figure 10C) to add the RNA FASTQs to the same sample. Similarly, select mRNA as the data type from the dropdown before finishing the import by clicking the Associate file button (Figure 10D).

Figure 10. Import FASTQs for 10x multiome ATAC + GEX data in Flow.

 



References

  1. https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome

  2. https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/release-notes

  3. https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/what-is-cell-ranger-arc

  4. https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/summary


Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.