Creating and Analyzing a Project

Partek® Flow® software manages separate experiments as projects. A complete project consists of input data, tasks used to analyze the data, the resulting output files, and a list of users involved in the analysis.

This chapter provides instructions for creating and analyzing a project.

Types of Data

Partek Flow can import a wide variety of data types, including raw counts, microarray data, and unaligned and aligned NGS data. The following file types are valid and will be recognized by the Partek Flow file browser:

idat
vcf
bz2
gz
tar
csfasta
csfastq
fastq
txt
SAM
bgx
bpm
probe_tab
CEL
qual
zip
bcf
sra
sff
fasta
BAM

There is no need to unzip compressed files.
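As a rough illustration of how a file browser can decide which files to offer for import, the sketch below checks a file's last extension against the types listed above. Because gz, bz2, zip, and tar are themselves on the list, a compressed file such as reads.fastq.gz is recognized without unzipping. This is a hypothetical helper, not Partek Flow's code, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path

# File types listed in this section. Matching ignores case (an assumption),
# so "CEL", "SAM", and "BAM" also match "cel", "sam", and "bam".
RECOGNIZED = {
    "idat", "vcf", "bz2", "gz", "tar", "csfasta", "csfastq", "fastq",
    "txt", "sam", "bgx", "bpm", "probe_tab", "cel", "qual", "zip",
    "bcf", "sra", "sff", "fasta", "bam",
}


def is_recognized(filename: str) -> bool:
    """Return True if the file's last extension is one of the listed types."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in RECOGNIZED


def importable(filenames: list[str]) -> list[str]:
    """Keep only the files that would be offered for import."""
    return [name for name in filenames if is_recognized(name)]
```

For example, `importable(["reads_R1.fastq.gz", "chip.CEL", "notes.docx"])` keeps the first two names and drops the .docx file.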

Paired-end files are also recognized automatically, and their paired relationship is maintained throughout the analysis.

Paired-end files are matched by file name: every character in the two file names must match, except for the section that determines whether a file is the first or the second file. For instance, if the first file name contains "_R1", "_1", "_F3", or "_F5", the second file name must contain a corresponding identifier, such as "_R2", "_2", "_F5", "_F5-P2", "_F5-BC", "_R3", or "_R5". The identifying section must be separated from the rest of the file name by underscores or dots. If two conflicting identifiers are present, the files are treated as single-end. For example, s_1_1 matches s_1_2, as described above. However, s_2_1 does not mate with s_1_2, and the two files are treated as two single-end files.
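The matching rule above can be sketched as follows. This is a hypothetical helper, not Partek Flow's implementation. It splits each name on underscores and dots, requires the two names to differ in exactly one section, and accepts that section only if it forms a known identifier pair. The MATE_PAIRS set is an assumption built from a few of the examples in the text; the real list of identifiers is longer.

```python
import re

# Assumed (first mate, second mate) identifier pairs, taken from a few of
# the examples above; this is an illustrative subset, not the full list.
MATE_PAIRS = {("R1", "R2"), ("1", "2"), ("F3", "F5")}


def sections(filename: str) -> list[str]:
    """Split a file name into the sections separated by underscores or dots."""
    return re.split(r"[_.]", filename)


def are_mates(first: str, second: str) -> bool:
    """Return True if the names differ only in one first/second identifier."""
    a, b = sections(first), sections(second)
    if len(a) != len(b):
        return False
    differing = [(x, y) for x, y in zip(a, b) if x != y]
    # Exactly one differing section, and it must be a recognized pair;
    # anything else (e.g. two conflicting sections) means single-end.
    return len(differing) == 1 and differing[0] in MATE_PAIRS
```

With these definitions, `are_mates("s_1_1.fastq.gz", "s_1_2.fastq.gz")` returns True, while `are_mates("s_2_1.fastq.gz", "s_1_2.fastq.gz")` returns False because two sections differ.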

Creating a New Project

Using a web browser, log in to Partek Flow. From the Home page click the New Project button; enter a project name (Figure 1) and then click Create project.

Figure 1. Partek Flow Home page and the dialog box for naming a project (inset)

The Project name is the basis of the default name of the output directory for this project. Project names must be unique: a new project cannot have the same name as an existing project on the same Partek Flow server.

Once a new project has been created, the user is automatically directed to the Data tab of the Project View.

The Data Tab

The Data Tab is where users can add samples, import data, and assign sample attributes. This is also where users can modify the location of the project output folder.

Adding samples

For a new project, where no samples have yet been added, the Data tab will automatically prompt you to add samples (Figure 2). To add samples to the project, click Import data. Four options will be displayed.

Figure 2. The Partek Flow Data tab and selecting options for adding samples

Automatically create samples from files

This method creates samples as the data are imported into a project. Sample names are assigned automatically based on the file names.

Before proceeding, you should ideally have already copied the data you wish to analyze into a folder (with appropriate permissions) on the Partek Flow server. Please ask your system administrator for assistance with uploading your data directly to the server.

Select the Automatically create samples from files button. The next screen shows a file browser listing the folders you have access to on the Partek Flow server (Figure 3). Select a folder by clicking its name. Files in the selected folder whose formats can be imported by Partek Flow are displayed, already checked, in the right panel (Figure 3). You can exclude files by clearing the check box to the left of the file name. When you have made your selections, click the Create sample button.

Figure 3. Selecting files in the Partek Flow server to be imported in a project

Alternatively, files can also be uploaded and imported into the project from the user's local computer. Select the My computer radio button (Figure 4) and the options of selecting the local file and the upload (destination) directory will appear. Only one file at a time can be imported to a project using this method.

Figure 4. Selecting files from the user's local computer for upload and import

Multiple data files can be compressed into a single .zip file before uploading. Partek Flow will automatically unzip the files and put them in the upload directory.
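Any zip tool works for this step. As one example, the minimal sketch below bundles files with Python's standard zipfile module; the file names are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle(files: list[Path], archive: Path) -> Path:
    """Compress several data files into one .zip archive for upload."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store each file under its bare name, without local directories.
            zf.write(path, arcname=path.name)
    return archive
```

Storing bare names means the extracted files land directly in the upload directory rather than in nested folders that mirror your local disk.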

Please be aware that the method illustrated in Figure 4 depends heavily on the speed and latency of the Internet connection between the user's computer and the server where Partek Flow is installed. Given the large size of most genomics data sets, it is not recommended in most cases.

After successful creation of samples from files, the Data tab now contains a Sample management table (Figure 5). The Sample name column in the table is automatically generated based on the filenames and the table is sorted in alphabetical order.

Clicking the Show data files link on the lower right side of the sample management table expands the table and reveals the names of the files associated with each sample (Figure 5). Conversely, clicking Hide data files hides the file information.

The columns in the expanded view show the files associated with each sample. Files are organized by file type. Any filename extensions that indicate compression (such as .gz) are not shown.

Figure 5. The sample management table with data files shown

Once a sample is created in a project, the files associated with it can be modified. In the expanded view, mouse over the +/- column of a sample. The highlighted icons will correspond to the options for the sample on that row.

Click the green icon to associate additional files or the red icon to dissociate a file from a sample. You can manually associate multiple files with one sample. Dissociating a file from a sample does not delete the file from the Partek Flow server.

Import samples from another project

This method adds samples from previously created projects (on the same Partek Flow server) to a new project. This option is useful for re-analyzing a dataset in a new project, analyzing a subset of the data from a previous project, or combining data from different projects.

Select the Import samples from another project button. A dialog box allows you to select the samples (Figure 6). The Project drop-down menu lists the existing projects that the current user has access to. These may be previous projects from the same user, previous projects from collaborators, or (if the user has an administrative account) all the projects on the Partek Flow server.

Click the Project drop-down menu to select an existing project. The Samples box lists the samples from the selected project. Choose the samples you wish to add to your new project by highlighting them with your mouse; hold the Ctrl or Shift key to select more than one sample. Once you have made your selection, click the Add button. These samples will now appear in the Sample management table in the Data tab.

Figure 6. Selecting samples to import from other projects

Create a new blank sample

Samples can be added one at a time by selecting the Create a new blank sample option (Figure 7). In the following dialog box, type a sample name and click Create. This process creates a sample entry in the sample management table but there is no associated file with it, hence it is a "blank sample."

Expanding the Sample management table by clicking Show data files on the lower left corner of the table will reveal the option to associate files to the blank sample.

Mouse over the +/- column and click the green icon to associate one or more files with the sample. Repeat this process for every sample in your project.

Figure 7. Adding a blank sample

Adding Files to an existing project

Additional samples can be added to any existing project simply by opening the project, going to the Data tab, and clicking the Import data button (Figure 8). Three options to add samples will be revealed.

Figure 8. The Import data button in the Partek Flow Data tab will reveal options for importing additional samples

Importing feature counts

Alternatively, if you have a matrix of data, such as raw read count data in text format, select Import feature counts. This will bring up the Input options page (Figure 9).

A box showing a text preview of the first 15 rows of the text file helps you determine on which row the relevant feature counts are located. Inspect the text preview and indicate the row on which the data header begins.
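If you want to find the header row before importing, you can reproduce the same kind of preview locally. The following is a hypothetical Python sketch using the standard csv module. It assumes a tab-delimited file and 1-based row numbers, which may not match exactly how Partek Flow counts rows.

```python
import csv


def preview(text: str, n_rows: int = 15) -> None:
    """Print the first rows with 1-based row numbers, like a text preview."""
    for number, line in enumerate(text.splitlines()[:n_rows], start=1):
        print(f"{number:>3}: {line}")


def read_counts(text: str, header_row: int) -> tuple[list[str], list[list[str]]]:
    """Parse a tab-delimited matrix whose header is on `header_row` (1-based)."""
    lines = text.splitlines()[header_row - 1:]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows
```

In a file that opens with two comment lines, the header sits on row 3, so you would call `read_counts(text, 3)`.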

If the read counts are based on a compatible annotation file in Partek Flow, you can specify that annotation file under File format. Select the appropriate genome build and annotation model for your count data.

Figure 9. Importing feature counts uploads quantification data in text format

Otherwise, manually specify the orientation of the data matrix by changing the Input format drop-down menu (Figure 10). If the data has been log transformed, specify the base under Counts format.

Figure 10. Specify input format and log base of your feature counts
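The two settings correspond to two simple operations. Orientation determines whether features run along the rows or the columns, and the log base determines how transformed values map back to counts. The sketch below is a hypothetical illustration of both operations in plain Python, not Partek Flow's import code. The pseudocount argument is an assumption for data transformed as log(count + 1); confirm how your own file was produced before relying on it.

```python
def transpose(matrix: list[list[float]]) -> list[list[float]]:
    """Swap rows and columns, e.g. samples-as-rows to features-as-rows."""
    return [list(column) for column in zip(*matrix)]


def unlog(values: list[float], base: float, pseudocount: float = 0.0) -> list[float]:
    """Undo a transform of the form log_base(count + pseudocount)."""
    return [base ** v - pseudocount for v in values]
```

For example, with base 2 a stored value of 3.0 corresponds to a count of 8.0. If the data were produced as log2(count + 1), the same value corresponds to 7.0.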

Project output directory

The project output directory is the folder within the Partek Flow server where all output files produced during analysis will be stored.

The default directory is configured by the Partek Flow Administrator under the Settings menu (under System Preferences > Default project output directory).

If the user does not override the default, the task output will go to a subdirectory with the name of the Project.

The user has the option of specifying an existing folder or creating a new one as the project output directory. To do so, click the icon next to the directory and specify or create a new folder in the dialog box.
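The default-location rule can be summarized in a few lines. In this sketch the function, root path, and project name are hypothetical. It shows only that output lands in a subdirectory named after the project unless the user overrides it. Because project names are unique on a server, default directories of different projects do not collide.

```python
from pathlib import Path
from typing import Optional


def project_output_dir(default_root: str, project_name: str,
                       override: Optional[str] = None) -> Path:
    """Return where task output goes for a project.

    Without an override, output goes to a subdirectory of the
    administrator-configured default, named after the project.
    """
    if override:
        return Path(override)
    return Path(default_root) / project_name
```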

Sample Annotation

After samples have been added in the project, additional information about the samples can be added. Information such as disease type, age, treatment, or sex can be annotated to the data by assigning the Attributes for each sample.
Certain tasks in Partek Flow, such as Gene-Specific Analysis, require that samples be assigned attributes in order to do statistical comparisons between groups or samples. As attributes are added to the project, additional columns in the sample management table will be created.

Sample attributes

Attributes can be managed or created within a project. Under the Data tab, click the Manage attributes button to open the Manage attributes page (Figure 11).

Figure 11. Managing attributes

To prepare for later data analysis using statistical tools, attributes can either be categorical or numeric (i.e., continuous).

Adding a categorical attribute

For categorical attributes, there are two levels of visibility. Project-specific categorical attributes are visible only within the current project. System-wide categorical attributes are visible across all the projects on the Partek Flow server and are useful for maintaining uniformity of terms. Importing samples into a new project will retain the system-wide attributes, but not the project-specific attributes.

A feature of Partek Flow is the use of controlled vocabulary for categorical attributes, allowing samples to be assigned only within pre-defined categories. It was designed to effectively manage content and data and allow teams to share common definitions. The use of standard terms minimizes confusion.
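The idea behind a controlled vocabulary can be shown with a small class that accepts only predefined categories. This is a hypothetical illustration; the class and attribute names are not part of Partek Flow.

```python
class CategoricalAttribute:
    """A categorical attribute whose values must come from a fixed vocabulary."""

    def __init__(self, name: str, categories: list[str]) -> None:
        self.name = name
        self.categories = set(categories)

    def assign(self, sample: dict[str, str], value: str) -> None:
        """Record `value` for a sample, rejecting terms outside the vocabulary."""
        if value not in self.categories:
            raise ValueError(f"{value!r} is not a category of {self.name!r}")
        sample[self.name] = value
```

With the categories "Liver" and "Kidney", assigning "liver" raises ValueError. Variant spellings of the same term therefore cannot end up side by side in the table.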

To add a categorical attribute in the Manage attributes page, click Add new attribute (Figure 12). In the dialog box, type a Name for the attribute, select the Categorical radio button next to Attribute type, select the visibility of the attribute, and then click the Add button.

Figure 12. Adding a categorical attribute and defining the categories

Individual categories for the attribute must then be entered. Enter the name of the New category and click Add. The Name of the new category will show up in the table. A category can be edited or deleted by clicking the corresponding icon in its row, and sub-categories can be added by clicking the icon in the Options column. After all the categories have been added to this attribute, click Close to proceed.

Repeat the process for additional attributes of the samples in your study. When done, click Back to sample management table. Categorical attributes default to Project-specific visibility.

An additional feature that supports the controlled vocabulary in Partek Flow is the integration of terms from SNOMED CT, a healthcare terminology used for electronic health records. These terms can be added by managing the categories of an attribute and selecting Import terms from SNOMED CT (Figure 13). Available terms include those for body structure, specimen, clinical findings, or organism. To select multiple terms from SNOMED CT, hold the Ctrl key while clicking. These terms will become new top-level categories in your attribute. When terms are imported from SNOMED CT, the Term depth indicates the number of sublevels that will also be imported as subcategories in your attribute.

Figure 13. Importing SNOMED CT terms as categories

Adding a numeric attribute

To add a numeric attribute in the Manage attributes page, click Add new attribute. In the dialog box (Figure 14), type a Name for the attribute, select the Numeric radio button next to Attribute type, and optionally specify the Minimum value, Maximum value, and Units. Click Add to return to the Manage attributes page. Repeat the process to add more numeric attributes. When done, click Back to sample management table.

Figure 14. Adding a numeric attribute and specifying the units
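The optional parameters of a numeric attribute can be pictured as a small record. The sketch below is hypothetical. It assumes that Minimum value and Maximum value act as validation bounds, which this section does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericAttribute:
    """A numeric attribute with optional bounds and units (hypothetical model)."""

    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    units: str = ""

    def check(self, value: float) -> float:
        """Return the value if it lies within the bounds that were set."""
        if self.minimum is not None and value < self.minimum:
            raise ValueError(f"{self.name}: {value} is below {self.minimum}")
        if self.maximum is not None and value > self.maximum:
            raise ValueError(f"{self.name}: {value} is above {self.maximum}")
        return value
```

An attribute defined without bounds, such as `NumericAttribute("Dose")`, accepts any number.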

Adding a system-wide attribute

Since system-wide attributes do not have to be created by the current user, they only need to be added to the sample management table in a project.

In the Data tab, click the Add a system-wide attribute button. In the dialog box that follows (Figure 15), use the drop-down menu next to Add attribute to select the system-wide attribute you would like to add to the project. Once selected, the attribute is automatically recognized as either a system-wide categorical attribute or a numeric attribute.

For a system-wide categorical attribute, the different categories are listed and you have the option of pre-filling the column with N/A (or any other category within the attribute). Click Add column to return to the Data tab.

Figure 15. Adding a system-wide categorical attribute column

Assigning categories or values to attributes

After adding all the desired attributes to a project, the sample management table will show a new column for each attribute (Figure 16). The columns will initially show "N/A", as the samples have not yet been categorized or assigned a value. To edit the table, click the edit icon. Assign the sample attributes using a drop-down menu for categorical attributes (controlled vocabulary) or by typing values for numeric attributes.

Figure 16. A "blank" sample management table showing attribute columns

When all the attributes have been entered, click Apply changes and the sample management table will be updated. After editing the sample table, make sure there are no fields with blank or N/A values before proceeding. To rename or delete attributes, click Manage attributes from the Data tab to access the Manage attributes page. Note that you cannot delete an attribute if a sample is assigned to it.

Assigning attributes using a Sample Annotation Text File

Another way to assign attributes to samples in the Data tab is to use a text file that contains the table of attributes and categories/values. This table is prepared outside of Partek Flow using any text editing software capable of saving tab-separated text files.

Using a text editor, prepare a table containing the attributes. An example is shown in Figure 17. Columns must be separated by exactly one tab, with no extra tab after the last column. In this particular example, the first column contains the file names and the text file is saved as Sampleinfo.txt.

Figure 17. A sample annotation text file. This view shows tab stops

The first row of the table in the text file contains the attributes (as headers). The first column of the table in the text file, regardless of the header of the first column, should contain either the sample names or the file names of the samples already added in Partek Flow. The first column is the unique identifier that will match the samples to the correct values or categories.
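If you generate the annotation table with a script rather than a text editor, use a writer that places exactly one tab between fields to avoid stray tabs. This minimal sketch uses Python's standard csv module. The attribute names, sample names, and file path are hypothetical.

```python
import csv


def write_annotation(path: str, header: list[str], rows: list[list[str]]) -> None:
    """Write a tab-separated sample annotation table.

    csv.writer puts exactly one tab between fields and none after the
    last column, matching the formatting rule described above.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # first row: attribute names as headers
        writer.writerows(rows)   # first column: sample (or file) identifiers
```

For example, `write_annotation("Sampleinfo.txt", ["Sample", "Treatment", "Age"], [["S1", "Control", "34"], ["S2", "Drug", "51"]])` writes a three-line table whose first column holds sample names.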

To upload sample attributes, click Assign sample attributes from a file in the Data tab. Then indicate where the attribute text file is stored and navigate to it. Partek Flow will parse the text file and present attributes that will be available for import (Figure 18).

Select the attributes you want to import by clicking the Import check box. Imported attributes that do not currently exist in the project will create new project-specific attributes.

Figure 18. Assigning attributes of samples using a sample annotation text file

You can change the name of a specific attribute by editing the Attribute name text box. Columns containing letter characters are automatically selected as categorical attributes. Columns containing numbers are suggested to be numeric attributes and can be changed to categorical using the drop down menu under Attribute type.
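The type-suggestion rule can be sketched as a small function. This is a hypothetical reading of the rule, not Partek Flow's code. Treating values that have no letters but still do not parse as numbers (for example "3-4") as categorical is an assumption.

```python
def suggest_type(values: list[str]) -> str:
    """Suggest an attribute type: any letter means categorical, else numeric."""
    filled = [v.strip() for v in values if v.strip()]  # ignore blank cells
    if any(ch.isalpha() for v in filled for ch in v):
        return "categorical"
    try:
        for v in filled:
            float(v)
    except ValueError:
        # No letters but not a clean number (e.g. "3-4"): assumed categorical.
        return "categorical"
    return "numeric"
```

A batch column containing 1, 2, and 3 would be suggested as numeric. You can then change it to categorical in the Attribute type drop-down menu.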

Guidelines for preparing the sample annotation text file

The first column is always the unique identifier and can refer only to File names or Sample names.
If using Sample names in the first column, they must match the entries of the Sample name column in the Sample management table.
If using File names in the first column, use the filenames shown in the fastq column of the expanded sample management table (see Figure 5) then add the extension .gz. All filenames must include the complete file extension (e.g., Samplename.fastq.gz).
The header of the first column of the table (the top-left cell of the text table) is irrelevant but should not be left blank. Whether the first column contains File names or Sample names is chosen during the import process.
The last column cannot have empty values.
Missing data (blank cells) can only be handled if the attribute is numeric. If the attribute is categorical, enter a value (at least one character) in the cell.
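The guidelines above can be checked before upload with a short script. The validator below is a hypothetical sketch. It takes the identifiers you expect (entries from the Sample name column, or complete file names such as S1_R1.fastq.gz), plus the names of the columns you intend to import as numeric. It then reports every rule it finds broken.

```python
def check_annotation(header: list[str], rows: list[list[str]],
                     known_ids: set[str], numeric_columns: set[str]) -> list[str]:
    """Return a list of problems; an empty list means no rule was broken."""
    if not header or not header[0].strip():
        return ["header of the first column is blank"]
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 holds the header
        cells = row + [""] * (len(header) - len(row))  # pad short rows
        if cells[0] not in known_ids:
            problems.append(f"line {line_no}: {cells[0]!r} matches no sample or file")
        if not cells[len(header) - 1].strip():
            problems.append(f"line {line_no}: last column is empty")
        # Middle columns: blanks are allowed only for numeric attributes.
        for name, cell in zip(header[1:-1], cells[1:-1]):
            if not cell.strip() and name not in numeric_columns:
                problems.append(f"line {line_no}: blank categorical value in {name!r}")
    return problems
```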

It is advisable to use Sample name as the first column identifier when:

Samples are associated with more than one file (for instance, paired-end reads and/or technical replicates).
The files were imported in the SRA format (from the Sequence Read Archive database). Partek Flow automatically converts them to the FASTQ format, so their file names change once they are imported. The new file names, with the extension .fastq.gz, can be seen by expanding the sample management table.

If attributes are assigned from two different text files, the following will happen:

If an attribute has the same header and type (both categorical or both numeric) as an existing attribute, the values are overwritten.
If there are different/additional headers on the "second round" of assignment, these new attributes will be appended to the table.
For numeric attributes, a "blank" value will not override a previous value.
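These three rules can be expressed as a merge function. The data model below (attribute name mapped to a type and a sample-to-value dictionary) is hypothetical. This section does not say what happens when a header matches but the type differs, so the sketch leaves such an attribute unchanged; treat that branch as an assumption.

```python
# Hypothetical model: attribute name -> (type, {sample name: value}),
# where type is "categorical" or "numeric".
Table = dict[str, tuple[str, dict[str, str]]]


def merge(existing: Table, incoming: Table) -> Table:
    """Apply a second annotation file on top of the current attributes."""
    result = {name: (kind, dict(values)) for name, (kind, values) in existing.items()}
    for name, (kind, values) in incoming.items():
        if name not in result:
            # New header: append it as a new attribute column.
            result[name] = (kind, dict(values))
        elif result[name][0] == kind:
            current = result[name][1]
            for sample, value in values.items():
                if kind == "numeric" and not value.strip():
                    continue  # a blank numeric cell keeps the previous value
                current[sample] = value
        # Same header but a different type: not described in the text,
        # so this sketch leaves the existing attribute unchanged.
    return result
```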

Deleting or Renaming samples within a Project

In the Data tab, each sample can be renamed or deleted from the project by clicking the gear icon next to the sample name. The gear icon is readily visible upon mouse over (Figure 19). Deleting a sample from a project does not delete the associated files, which will remain on the disk.

Figure 19. Renaming or deleting a sample

You can download your completed Sample management table by clicking the Download link at the lower corner of the table. This will export a tab-delimited text file with contents of the table.

The Analyses Tab

After samples have been added and associated with valid data files, a data node will appear in the Analyses tab (Figure 20). The Analyses tab is where different analysis tools and the corresponding reports are accessed.

Figure 20. The Analyses tab showing a data node of unaligned reads

Data and Task nodes

The Analyses tab contains two elements: data nodes (circles) and task nodes (rounded rectangles) connected by lines and arrows. Collectively, they represent a data analysis pipeline.
Data nodes (Figure 21) may represent a file imported into the project, or a file generated by Partek Flow as an output of a task (e.g., alignment of FASTQ files generates BAM files).

Figure 21. Examples of different types of data nodes

Task nodes (Figure 22) represent the analysis steps performed on the data within a project. For details on the tasks available in Partek Flow, see the specific chapters of this user manual dedicated to the different tasks.

Figure 22. Examples of different types of task nodes

The context sensitive menu

Clicking on a node reveals the context sensitive menu, on the right side of the screen.

Figure 23. The context sensitive menu is revealed when a node is selected

Only the tasks that are available for the selected data node will be listed in the menu. For data nodes, actions that can be performed on that specific data type will appear.

In Figure 23, a node that contains Unaligned reads is selected (bold black line). The tasks listed are the ones that can be performed on unaligned data (QA/QC, Pre-analysis tools, and Aligners).

To hide the task pane, simply click the symbol in the upper left corner of the task pane. Clicking the triangles will collapse or expand the different categories of tasks that are shown.

After a task is performed on a data node, a new task node is created and connected to the original data node. Depending on the task, a new data node may automatically be generated that contains the resulting data.

In Figure 24, alignment was performed on the unaligned reads. Two additional nodes were added: a task node for Align reads and an output data node containing the Aligned reads.

Figure 24. Certain tasks performed on a data node generate additional data nodes. The example shows the Aligned reads node, which was generated upon alignment of the Unaligned reads node

Running a task

To run a task, select a data node and then locate the task you wish to perform from the task pane. Mouse over to see a description of the action to be performed. Click the specific task, set the additional parameters (Figure 25), and click Finish. The task will be scheduled, the display will refresh, and the screen will return to the project's Analyses tab.

In Figure 25, the STAR aligner was selected and the choices for the aligner index and additional alignment options appeared.

Figure 25. Running a STAR alignment task in Partek Flow. Dialog boxes to set the parameters appear

Tasks that are currently running (or scheduled in the queue) appear as translucent nodes. The progress of the task is indicated by the progress bar within the task node. Hovering the mouse pointer over the node will highlight the related nodes (with a thin black outline) and display the status of the task (Figure 26).

If a task is expected to generate data nodes, these nodes appear even before the task is completed. They have a lighter shade of color to indicate that they have not yet been generated because the task is still running. Once all tasks are done, all nodes appear in the same shade.

Figure 26. A running task showing the progress indicator. The output data node, also in the lighter shade of color, appears even before the task completes. This enables the user to launch additional tasks while an upstream task is still in progress

Canceling and deleting tasks

Tasks can only be canceled or deleted by the user who started the task. Running or pending tasks can be canceled by right-clicking the task node and then selecting Cancel (Figure 27). Alternatively, select the task node and then select Cancel task from the task pane.

Figure 27. Canceling a task may be done by right clicking on the running task or by selecting Cancel task in the task panel

Canceled or failed tasks are flagged by small red circles with exclamation points on the task nodes. Data nodes connected to incomplete tasks are also incomplete, as no output can be generated (Figure 28). For failures due to errors, see the Task details.

Figure 28. Warnings indicate that the task failed (or was cancelled) and the data node is empty

To delete tasks from the project click the right mouse button on the task node and then select Delete (Figure 29). Alternatively, click the task node and select Delete task from the task pane. The nodes downstream of this task will be deleted. However, deleting the output files is optional (Figure 28, inset).

Figure 29. A task can be deleted by right clicking on the task and selecting Delete or selecting Delete task in the context sensitive menu

Task Results and Task Actions

Selecting a task node will reveal a menu pane with two sections: Task results and Task actions (Figure 30).

Figure 30. Context sensitive menu after selecting a task node

Items in the Task results section provide information about the action performed in that node. Certain tasks generate a Task report (Figure 31), which includes any tables or charts that the task may have produced.

Figure 31. An example of a Task report for the Trim bases task

The Task details shows detailed diagnostic information about the task (Figure 32). It includes the duration and parameters of the task, lists of input and output files, and the actual commands (in the command line) that were run.

Figure 32. An example of a Task details page for a Pre-alignment QA/QC task

Additionally, the Task details page contains the error logs of unsuccessful runs. The user can download the logs or send them directly to Partek. This page plays an important role in diagnosing and troubleshooting issues related to the task.

Double clicking on a task node will show the Task report page. However, if no report was generated, the user will be directed to the Task details page.

In the Task actions section, the selected task can be Re-run w/new parameters; if it is part of a pipeline that includes additional tasks after it, the Downstream tasks can also be re-run. Re-running tasks creates a new layer in the Analyses tab.

Another action available for a task node is Add task description (Figure 33), which is a way to add notes to the project. The user can enter a description, which will be displayed when the mouse pointer is hovered over the task node.

Figure 33. Adding a task description

Layers

It is common for next-generation sequencing data analysis to examine different task parameters for optimization. Users may want to modify an upstream step (e.g. alignment parameters) and inspect its effect on downstream results (e.g. percent aligned reads).

The implementation of Layers in Partek Flow makes optimizations easy and organized. Instead of creating separate nodes in a pipeline, another set of nodes with a different color is stacked on top of previous analyses (Figure 34). To see the parameters that were changed between runs, hover the mouse pointer over the set of stacked task nodes and a pop-up balloon will display them. The text color indicates the layer corresponding to a specific parameter.

Figure 34. Layers and balloon text correspond to different parameters

Layers are formed when the same task is performed on the same data node more than once. They are also formed when a task node is selected and Re-run w/new parameters is selected in the task pane. This allows the user to change the options only for the selected task. The user may also choose to re-run the modified task together with all the downstream tasks until the analysis is completed. To do so, select Re-run w/new parameters, downstream tasks from the task pane.

To select a different layer, use the left mouse button to click on any node of the desired layer. All the nodes associated with the selected layer have the same color and when clicked will be displayed on the top of the stack.
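The balloon comparison can be pictured as a difference between two parameter sets. The sketch below is a conceptual illustration only. The parameter names in the usage example are hypothetical and are not actual STAR or Partek Flow option names.

```python
def changed_parameters(first: dict[str, object],
                       second: dict[str, object]) -> dict[str, tuple[object, object]]:
    """Return the parameters whose values differ between two runs of a task."""
    keys = first.keys() | second.keys()
    # A parameter missing from one run is reported with None for that run.
    return {k: (first.get(k), second.get(k))
            for k in sorted(keys) if first.get(k) != second.get(k)}
```

For two hypothetical runs that differ only in a `min_mapq` setting of 10 versus 20, the result is `{"min_mapq": (10, 20)}`, which is the kind of difference the balloon text highlights.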

Downloading Data

Data associated with any data node can be downloaded using the Download data link in the task pane (Figure 35). Compressed files will be downloaded to the local computer from which the user is accessing the Partek Flow server. Note that larger files (such as unaligned reads) take longer to download; a file size estimate is provided for each data node for guidance. These zipped files can easily be imported by the Partek® Genomics Suite® software.

Figure 35. Downloading the data from a data node using the task pane (an example is shown)

The Log Tab

The Log tab contains a table of the tasks that are running, scheduled, or completed within the Partek Flow project (Figure 36). It provides an overview of task progress, enables task management, and links to detailed reports for each task.

Figure 36. The Queue tab showing running, waiting, and done tasks

Each row of the table corresponds to a task node in the Analyses tab. The list can be sorted according to a specific column using the sort icon.

The Task column lists the name of the tasks. On the left of the task name is a colored circle indicating the layer of this task. The column is searchable by task name. Clicking the task name will open the Task report page. If the task did not generate a report, the link will go to the Task details page.

The User column identifies the task owner. Aside from the user who created the project, only collaborators and users with admin privileges can start tasks in a project. Clicking a name in the User column will display the corresponding User profile.

The End column shows when the task was completed. It will show the actual time for completed tasks, and the estimated time for running tasks. These estimates improve in accuracy as more tasks are completed in the current Partek Flow instance.

The Status column displays the current status of the task, such as Waiting, Running, Done, Canceled. If the task is currently running, a status progress bar will appear in the column. Once completed, the status of a task will be Done and the End column will be updated with the completion time.

A waiting task may be waiting for upstream tasks to complete or waiting for more computing resources to be available.

The Action column contains the cancel button while a task is queued or running. Clicking this button will cancel the task. A trash icon will appear in the Action column for completed, canceled, or failed tasks, and will allow the task to be deleted from the project. Deleting a task in the Queue tab will remove the corresponding nodes in the Analyses tab. Unless the user has admin privileges, a user may only cancel and delete a task that he/she started. The User, End, and Status columns may be used to filter the table.

The Collaborators Tab

The Collaborators tab provides an overview of users associated with a particular project and enables project creators and administrators to add collaborators (Figure 37). A user (without administrator status) has to be specified as a collaborator in a project to be able to access the project in his/her home folder and to perform tasks.

To add a collaborator, type a username in the Add member box and click the add button. Previous collaborators may be selected using the drop-down box. To remove a collaborator, click the icon next to their username.

Figure 37. Collaborators tab controls the user accounts that are permitted to work in the current project

Project Management

Project Archiving and Deletion

A project may be archived or deleted using the button on the upper right side of the Project View page (Figure 38).

Figure 38. Archiving or deleting a project

When deleting a project, the project will be deleted from the Partek Flow instance. In addition, the user is also given the choice to delete the output files produced by the project.

If Archive project is selected, the project will also be deleted from the Partek Flow instance (Figure 39). However, the user will be presented with a list of files which can be zipped together to create an archived project file.

Options include the raw input files, the annotation files, and the output files associated with the project. Select the files you would like to zip and click Archive.

Figure 39. Archiving a project

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
