The Descriptive statistics task can be invoked on any matrix data node, for example a Gene Counts or Normalized Counts data node in a bulk RNA-seq analysis pipeline, or a Single Cell Counts data node. It calculates measures of central tendency and variability on the observations or features of the matrix data.

## Running Descriptive statistics

- Click on a counts data node
- Choose **Descriptive Statistics** in the *Pre-analysis tools* section of the toolbox (Figure 1)

This will invoke the configuration dialog; use it to specify which calculation(s) will be performed on cells (or samples, for a bulk analysis data node) or features (Figure 2).

The available statistics are listed in the left panel. Suppose $x_{1}, x_{2}, ..., x_{n}$ represent an array of numbers:

- Coefficient of variation (CV): $CV = s / \bar{x}$, where $s$ represents the standard deviation and $\bar{x}$ the mean
- Geometric mean: $g = \sqrt[n]{x_{1} x_{2} \cdots x_{n}}$
- Max: $x_{max} = \max(x_{1}, x_{2}, ..., x_{n})$
- Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$
- Median: with the values sorted in ascending order as $x_{(1)} \le x_{(2)} \le ... \le x_{(n)}$, when n is odd, median is $x_{((n+1)/2)}$; when n is even, median is $(x_{(n/2)} + x_{(n/2+1)}) / 2$
- Median absolute deviation: $MAD = \mathrm{median}(|x_{i} - \tilde{x}|)$, where $\tilde{x}$ is the median of $x_{1}, x_{2}, ..., x_{n}$
- Min: $x_{min} = \min(x_{1}, x_{2}, ..., x_{n})$
- Number of cells: Available when *Calculate for* is set to *Features*. Reports the number of cells whose value is [<, <=, =, !=, >, >=] (select one from the drop-down list) the cutoff value entered in the text box. The cutoff is applied to the values present in the input data node, i.e. if the task is invoked on a non-normalized data node, the values are raw counts. For instance, use this option if you want to know the number of cells in which each feature was detected; possible filter: *Number of cells whose value > 0.0*
- Percent of cells: Available when *Calculate for* is set to *Features*. Reports the percentage of cells whose value is [<, <=, =, !=, >, >=] (select one from the drop-down list) the cutoff value entered in the text box.
- Number of features: Available when *Calculate for* is set to *Cells*. Reports the number of features whose value is [<, <=, =, !=, >, >=] (select one from the drop-down list) the cutoff value entered in the text box. The cutoff is applied to the values present in the input data node, i.e. if the task is invoked on a non-normalized data node, the values are raw counts. For example, use this option if you want to know the number of detected genes in each cell; filter: *Number of features whose value > 0.0*
- Percent of features: Available when *Calculate for* is set to *Cells*. Reports the percentage of features whose value is [<, <=, =, !=, >, >=] (select one from the drop-down list) the cutoff value entered in the text box.
- Q1: 25th percentile
- Q3: 75th percentile
- Range: $x_{max} - x_{min}$
- Standard deviation: $s = \sqrt{s^{2}}$, where $s^{2}$ is the variance
- Sum: $\sum_{i=1}^{n} x_{i}$
- Variance: $s^{2}$, computed from the squared deviations $(x_{i} - \bar{x})^{2}$ from the mean $\bar{x}$
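To make the definitions above concrete, the following minimal sketch computes most of the listed statistics for a small array using only the Python standard library. It is an illustration, not the software's implementation: the function name `describe` is hypothetical, and two choices are assumptions that this page does not confirm, namely the sample (n - 1) denominator for the variance and standard deviation, and linear interpolation for the quartiles.

```python
import math
import statistics


def describe(values):
    """Return descriptive statistics for a list of at least two numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Assumption: sample variance and standard deviation (n - 1 denominator).
    variance = statistics.variance(values)
    sd = math.sqrt(variance)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "CV": sd / mean,
        # The geometric mean is only defined for positive values.
        "Geometric mean": statistics.geometric_mean(values),
        "Max": max(values),
        "Mean": mean,
        "Median": median,
        # Median of the absolute deviations from the median (unscaled).
        "Median absolute deviation": statistics.median(
            [abs(v - median) for v in values]
        ),
        "Min": min(values),
        "Q1": q1,
        "Q3": q3,
        "Range": max(values) - min(values),
        "Standard deviation": sd,
        "Sum": sum(values),
        "Variance": variance,
    }


stats = describe([2.0, 4.0, 4.0, 8.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Software packages differ in their quartile and variance conventions, so results for small arrays may not match another tool exactly.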

**Left-click** a measurement and drag it to the right panel, one at a time, or mouse over a measurement and click the **green plus** button to move it to the right panel. When *Sample (Cell)* is selected, the calculation is performed across all the features in the input matrix for each sample (or cell). When *Feature* is selected, the calculation is performed across all the samples (cells) in the input matrix for each feature.
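The difference between calculating per cell and per feature, and the cutoff-based counts described in the list of statistics, can be illustrated with a small sketch. It assumes a matrix stored with cells as rows and features as columns; the names `COMPARATORS`, `count_per_row`, and `transpose` are hypothetical helpers, and the toy counts are invented for illustration.

```python
import operator

# The comparison operators offered in the drop-down list, mapped to functions.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
}


def count_per_row(matrix, op=">", cutoff=0.0):
    """Count, for each row, the values that satisfy `value op cutoff`."""
    compare = COMPARATORS[op]
    return [sum(1 for v in row if compare(v, cutoff)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns so that features become rows."""
    return [list(column) for column in zip(*matrix)]


# Assumed layout: rows are cells, columns are features (raw counts).
counts = [
    [5, 0, 2],  # cell_1
    [0, 0, 7],  # cell_2
    [3, 1, 0],  # cell_3
    [0, 0, 4],  # cell_4
]

# Calculate for cells: number of features whose value > 0.0 in each cell.
features_per_cell = count_per_row(counts, ">", 0.0)

# Calculate for features: number and percent of cells whose value > 0.0.
cells_per_feature = count_per_row(transpose(counts), ">", 0.0)
percent_cells = [100.0 * c / len(counts) for c in cells_per_feature]

print(features_per_cell)
print(cells_per_feature)
print(percent_cells)
```

Because the cutoff is compared with whatever values the matrix holds, running the same filter on normalized values instead of raw counts can give different results.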

In addition, when *Feature* is selected, there is an extra *Group by* option (Figure 3).

From the drop-down list, choose a categorical attribute to calculate the descriptive statistics on all the subgroups for each feature.
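As a rough illustration of what grouping means here, the sketch below computes a per-feature statistic separately within each subgroup of cells. The layout (cells as rows, features as columns), the attribute values, and the helper name `grouped_feature_stat` are assumptions made for this example and do not reflect the software's internals.

```python
from collections import defaultdict
from statistics import mean


def grouped_feature_stat(matrix, groups, stat=mean):
    """Apply `stat` to each feature column within each subgroup of rows."""
    rows_by_group = defaultdict(list)
    for row, group in zip(matrix, groups):
        rows_by_group[group].append(row)
    # zip(*rows) iterates over feature columns within one subgroup.
    return {
        group: [stat(column) for column in zip(*rows)]
        for group, rows in rows_by_group.items()
    }


# Assumed layout: rows are cells, columns are features.
feature_names = ["gene_A", "gene_B"]
counts = [
    [5.0, 0.0],
    [1.0, 2.0],
    [3.0, 4.0],
    [7.0, 6.0],
]
# Hypothetical categorical attribute with one value per cell.
cell_type = ["T", "B", "T", "B"]

group_means = grouped_feature_stat(counts, cell_type)
for group, values in group_means.items():
    for name, value in zip(feature_names, values):
        print(f"{name} mean in {group}: {value}")
```

Any of the other statistics can be substituted for `mean` by passing a different function as `stat`, for example `max` or `statistics.median`.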

The output of the task is a matrix: *Cell stats* (result of *Calculate for Cells*) or *Feature stats* (result of *Calculate for Features*) (Figure 4). The results can be visualized in the *Data Viewer*.

## Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
