Now that the data has been imported, we need to make a few changes to the data annotation before analysis. 

Modifying Sample Factors

Notice that the Sample ID names in column 1 are gray (Figure 1). This indicates that Sample ID is a text factor. Text factors cannot be used as variables in downstream analysis, so we need to change Sample ID to a categorical factor.

Once the change has been made, the sample names in column 1 appear in black, indicating that Sample ID is now a categorical factor. Next, we will add an attribute for grouping the data.

Creating a categorical sample attribute allows us to group samples. This is useful for designating samples as replicates, as members of an experimental group, or as sharing a phenotype of interest. In this tutorial, we have four samples, each from a different tissue and a different donor. To illustrate the statistical analysis options available later on, we will assign the samples to two groups: muscle (heart and muscle) and not muscle (brain and liver).

The attribute will now appear as a new column in the RNA-seq spreadsheet with the heading Tissue and the groups muscle and not muscle.
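
For readers who think in code, the grouping above amounts to a simple mapping from tissue to group. The following is a minimal sketch in Python, not part of the software's workflow. The row order and the names tissues, MUSCLE_TISSUES, and tissue_group are assumptions made for illustration only.

```python
# Tissue of origin for each of the four samples.
# The row order here is hypothetical; the tutorial does not specify it.
tissues = ["brain", "heart", "liver", "muscle"]

# Group definition taken from the text:
# heart and muscle -> "muscle"; brain and liver -> "not muscle".
MUSCLE_TISSUES = {"heart", "muscle"}


def tissue_group(tissue: str) -> str:
    """Return the two-level group label for a tissue."""
    return "muscle" if tissue in MUSCLE_TISSUES else "not muscle"


# Build the new categorical column, one label per sample row.
tissue_column = [tissue_group(t) for t in tissues]

# Print each sample's tissue next to its assigned group.
for tissue, group in zip(tissues, tissue_column):
    print(f"{tissue}\t{group}")
```

Each sample receives exactly one of the two labels, which is what makes the new Tissue column usable as a categorical grouping variable.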