Now that the data has been imported, we need to make a few changes to the data annotation before analysis.
Notice that the Sample ID names in column 1 are gray (Figure 1). This indicates that Sample ID is a text factor. Text factors cannot be used as variables in downstream analysis, so we need to change Sample ID to a categorical factor.
The sample names in column 1 are now black, indicating that Sample ID has been changed to a categorical factor. Next, we will add attributes for grouping the data.
Creating a categorical sample attribute allows us to group samples. This is useful for designating samples as replicates, as members of an experimental group, or as sharing a phenotype of interest. In this tutorial, we have four different samples from different tissues and different donors, but to illustrate the available statistical analysis options, we need to divide the samples into two groups: muscle (Heart and Muscle) and not muscle (Brain and Liver).
The attribute will now appear as a new column in the RNA-seq spreadsheet with the heading Tissue and the groups muscle and not muscle.
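To make the grouping concrete, the same assignment can be sketched outside the software in a few lines of Python. This is only an illustration of the logic, not how the application stores attributes internally. The names `MUSCLE_TISSUES`, `assign_tissue_group`, and `tissue_attribute` are hypothetical, and the tissue names stand in for the actual Sample IDs.

```python
from collections import defaultdict

# Tissues that the tutorial places in the "muscle" group.
MUSCLE_TISSUES = {"Heart", "Muscle"}


def assign_tissue_group(tissue):
    """Return 'muscle' for Heart and Muscle, 'not muscle' for anything else."""
    return "muscle" if tissue in MUSCLE_TISSUES else "not muscle"


# The four tissues in this tutorial, used here as stand-in sample names.
samples = ["Brain", "Heart", "Liver", "Muscle"]

# Equivalent of the new Tissue column: one categorical value per sample.
tissue_attribute = {sample: assign_tissue_group(sample) for sample in samples}

# Collect the samples that fall into each group.
groups = defaultdict(list)
for sample, group in tissue_attribute.items():
    groups[group].append(sample)

for group, members in sorted(groups.items()):
    print(group, members)
```

Each sample receives exactly one group label, which is what allows the attribute to serve as a categorical factor in later statistical comparisons.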
The next available step in the Import panel of the RNA-seq workflow is Choose Sample ID Column. Verifying that the correct column is designated as the Sample ID becomes particularly important when data from multiple experiments are being combined.
The next step is to assess the quality of the data by checking the alignments per read.
A new child spreadsheet named Alignment_Counts will be created (Figure 7).
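As a conceptual aid, an alignments-per-read summary can be thought of as a tally of how many reads aligned zero times, once, twice, and so on. The sketch below illustrates that idea with made-up read identifiers. It is an assumption about the general concept, not a description of how the software computes or lays out this spreadsheet, and the function name `alignments_per_read` is hypothetical.

```python
from collections import Counter


def alignments_per_read(alignment_read_ids, all_read_ids):
    """Tally how many reads have 0, 1, 2, ... alignments.

    alignment_read_ids: one read ID per reported alignment (IDs may repeat).
    all_read_ids: every read that was sequenced, aligned or not.
    """
    hits = Counter(alignment_read_ids)  # read ID -> number of alignments
    return Counter(hits.get(read_id, 0) for read_id in all_read_ids)


# Made-up example: five reads, six reported alignments.
all_reads = ["r1", "r2", "r3", "r4", "r5"]
alignments = ["r1", "r2", "r2", "r4", "r4", "r4"]

distribution = alignments_per_read(alignments, all_reads)

for n_alignments in sorted(distribution):
    print(n_alignments, "alignment(s):", distribution[n_alignments], "read(s)")
```

A large share of reads with zero alignments or with many alignments is the kind of pattern this quality check is meant to surface.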