...

Next, select the Time, Event, and Event status using the drop-down window. Partek Flow will automatically guess factors that might be appropriate for these options. Click Next to proceed with the task.

The predictors (factors or variables) and co-predictors in the model must be defined. Co-predictors are numeric or categorical factors that will be included in the Cox regression model. By default, time-to-event analysis is performed on features (e.g. genes) unless Use feature expression as predictor is unchecked. If it is unchecked, select a factor that is not a feature and click Add factors to model a different variable. With the default setting, Use feature expression as predictor, the user can Add factors to the model that help explain time-to-event (co-predictors) in addition to features. Choose Add interaction to add co-predictors with known dependencies. If factors are added here, they cannot be added as stratification factors. Click Next to proceed with the task.

Next, the user can define comparisons for the co-predictors if any have been added. Configure contrasts by moving factor levels into the numerator (e.g. experimental factor) or denominator (e.g. control factor / reference), choose Combine or Pairwise, and add the comparison, which will be displayed below. Combine pools all numerator levels and all denominator levels into a single comparison. Pairwise splits the numerator and denominator levels into a factorial set of comparisons, meaning every numerator level is paired with every denominator level. Multiple comparisons from different factors can be added with Add comparison. The Low value filter can be used to exclude features; choose a filter or select none. Click Next to proceed with the task.
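
The difference between Combine and Pairwise can be illustrated with a minimal Python sketch. The function name and the factor levels below are hypothetical and are not part of Partek Flow; the sketch only shows how many comparisons each option would produce.

```python
from itertools import product

def build_comparisons(numerator, denominator, mode):
    """Return a list of (numerator_levels, denominator_levels) comparisons."""
    if mode == "combine":
        # All numerator levels pooled against all denominator levels.
        return [(tuple(numerator), tuple(denominator))]
    if mode == "pairwise":
        # Every numerator level paired with every denominator level.
        return [((n,), (d,)) for n, d in product(numerator, denominator)]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical levels of a treatment co-predictor.
numerator = ["DrugA", "DrugB"]
denominator = ["Placebo", "Untreated"]

combined = build_comparisons(numerator, denominator, "combine")
pairwise = build_comparisons(numerator, denominator, "pairwise")

print(len(combined))  # 1 comparison: (DrugA, DrugB) vs (Placebo, Untreated)
print(len(pairwise))  # 4 comparisons: 2 numerator levels x 2 denominator levels
```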

...

The results of Cox regression analysis provide key information to interpret, including:
- Hazard ratio (HR): the ratio of the event rate in a group to the event rate in the control (reference) group. If HR = 0.5, half as many patients are experiencing the event at a given time compared to the control group; if HR = 1, the event rates are the same in both groups; if HR = 2, twice as many are experiencing the event compared to the control group.
- HR limit: this is the confidence interval of the hazard ratio.
- P-value: the lower the p-value, the greater the significance of the observation.
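
As a rough illustration of how the hazard ratio and its limits relate to a fitted Cox coefficient, the sketch below exponentiates a coefficient and a Wald-type interval around it. The coefficient and standard error are made-up values, and the use of a 95% Wald interval is an assumption; the exact method Partek Flow uses for the HR limit is not stated here.

```python
from math import exp

def hazard_ratio(coef, se, z=1.96):
    """Return (HR, lower limit, upper limit) for a Cox regression coefficient.

    Assumes a Wald-type interval: exp(coef +/- z * se), with z = 1.96 for 95%.
    """
    return exp(coef), exp(coef - z * se), exp(coef + z * se)

# Hypothetical coefficient and standard error for one feature.
hr, lower, upper = hazard_ratio(coef=0.693, se=0.25)

print(round(hr, 2), round(lower, 2), round(upper, 2))
# An interval that excludes 1 suggests the event rates differ between groups.
print(lower > 1 or upper < 1)
```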

(For example, if you have selected both a co-predictor and a strata factor, then a comparison using the co-predictors and a Type III p-value for the co-predictor will be generated in the Cox regression report.)

Kaplan-Meier Survival Curve

The Kaplan-Meier task is used for comparing the survival curves among two or more groups of samples. The groups are defined by one or more categorical attributes (factors) specified by the user. As with Cox Regression, it is possible to use feature expression data, if available. In that case, quantitative feature expression is converted into a feature-specific categorical attribute. Each combination of the attribute levels corresponds to a distinct group. If one selects three factors with 2, 3 and 5 levels, respectively, then the total count of compared groups is 2*3*5 = 30. Therefore, selecting too many factors and/or factors with many levels may not work, since the total number of samples may not be enough to fill all of the groups.
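
The group count from the example above can be checked with a short Python sketch. The sample count is a hypothetical value chosen only to show how quickly the groups become sparse.

```python
from math import prod

# Number of levels of each selected factor, as in the example above.
levels_per_factor = [2, 3, 5]

# Every combination of levels defines one compared group.
n_groups = prod(levels_per_factor)
print(n_groups)  # 30

# Hypothetical study size: the average number of samples per group drops fast.
n_samples = 60
print(n_samples / n_groups)  # 2.0 samples per group on average
```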

...

The Kaplan-Meier task begins similarly to the Cox regression task, then differs when selecting categorical attributes to define the compared groups.

For each feature (e.g. gene), the expression values are sorted in ascending order and placed into B bins of (roughly) equal size. As a result, a feature-specific categorical attribute with B levels is constructed, which can be used by itself or in combination with other categorical attributes. For instance, for B = 2, we take a given feature and compute its median expression. The samples are separated into two bins, depending on whether the expression in the sample is below or above the median (Figure 1). The levels of the categorical attribute created this way are automatically denoted by P_1, P_2, …, P_B. Here P stands for “percentile”, and the higher the bin number, the higher the feature expression of the samples in the bin. If two percentiles are chosen, the bins are automatically labeled "Low" and "High", but the text box can be used to re-label the bins. The bins are feature-specific since this procedure is repeated for each feature separately.

Figure 1. Selecting categorical attributes to define compared groups
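
The binning of feature expression described above can be sketched in Python as follows. This is a minimal illustration, not Partek Flow's implementation: the function name is hypothetical, and the handling of ties and of sample counts that do not divide evenly by B is an assumption.

```python
def bin_feature(expression, n_bins):
    """Assign each sample to one of n_bins roughly equal-sized bins by rank.

    Returns one label per sample, in the original sample order.
    """
    n = len(expression)
    # Sample indices sorted by ascending expression.
    order = sorted(range(n), key=lambda i: expression[i])
    # Two bins get the "Low"/"High" labels, otherwise P_1 ... P_B.
    if n_bins == 2:
        labels = ["Low", "High"]
    else:
        labels = [f"P_{b + 1}" for b in range(n_bins)]
    bins = [None] * n
    for rank, sample in enumerate(order):
        # Integer arithmetic spreads the ranks evenly over the bins.
        bins[sample] = labels[rank * n_bins // n]
    return bins

# Hypothetical expression values of one gene across six samples.
expression = [5.1, 0.3, 2.2, 9.8, 7.4, 1.0]

two_bins = bin_feature(expression, 2)
three_bins = bin_feature(expression, 3)
print(two_bins)    # ['High', 'Low', 'Low', 'High', 'High', 'Low']
print(three_bins)  # ['P_2', 'P_1', 'P_2', 'P_3', 'P_3', 'P_1']
```

Because the function is applied to one feature at a time, running it for every feature yields a separate categorical attribute per feature.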

For each group, the survival curve (aka survival function) is estimated using the Kaplan-Meier estimator [1]. For instance, if one selects ER status, which has two levels, and chooses two feature expression bins, four survival curves are displayed in the Data Viewer (Figure 2). The Grouping configuration option can be used to split and modify the connections.

Figure 2. Each of the defined groups produces a survival curve
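
For readers who want to see what the Kaplan-Meier (product limit) estimator [1] computes for a single group, here is a minimal pure-Python sketch. The function name and the data are hypothetical; the task itself estimates one such curve per defined group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Samples still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        # Product-limit update: multiply by the fraction surviving past t.
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical group of five samples; 0 marks a censored observation.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored samples never trigger a drop in the curve, but they do count toward the number at risk until their censoring time.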

To see whether the survival curves are statistically different, the Kaplan-Meier task runs Log-rank and Wilcoxon (aka Wilcoxon-Gehan) tests. The null hypothesis is that the survival curves do not differ among the groups (the computational details are available in [2]). When feature expression is used, the p-values are also feature-specific (Figure 3). Select the step-plot icon under View to visualize the Kaplan-Meier survival curves for each gene.

Figure 3. Log-rank and Wilcoxon p-values when feature expression is used
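
To give a feel for the two tests, the sketch below implements the textbook two-group versions in pure Python. It is an illustration under stated assumptions rather than the computation documented in [2]: the function name and data are hypothetical, only two groups are handled, and the Wilcoxon variant is taken to be Gehan's, which weights each event time by the number of samples at risk.

```python
from math import erfc, sqrt

def two_group_test(times, events, groups, weight="logrank"):
    """Compare two survival curves; groups[i] is 0 or 1, events[i] is 1 if observed.

    weight="logrank" weights every event time equally; weight="wilcoxon"
    weights each event time by the number at risk (Gehan's version).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    score = 0.0
    variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)  # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        w = n if weight == "wilcoxon" else 1
        # Observed minus expected events in group 1 under the null hypothesis.
        score += w * (d1 - d * n1 / n)
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = score ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical data: the "High" bin (1) fails early, the "Low" bin (0) late.
times = [1, 2, 3, 4, 6, 7, 9, 10]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

logrank_chi2, logrank_p = two_group_test(times, events, groups, "logrank")
wilcoxon_chi2, wilcoxon_p = two_group_test(times, events, groups, "wilcoxon")
print(logrank_p < 0.05, wilcoxon_p < 0.05)
```

The Wilcoxon weighting emphasizes early event times, when more samples are still at risk, so the two tests can disagree when the curves differ mainly early or mainly late.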

Choosing stratification factors

As in the Cox Regression task, it is possible to choose stratification factor(s), but the purpose and meaning of stratification are not the same as in Cox Regression. Suppose we want to compare the survival among the four groups defined by the two levels of ER status and the two bins of feature expression. We can select the two factors on the “Select group factor(s)” page (Figure 1). In that case, the reported p-values will reflect the statistical difference among the four survival curves that is due to both ER status and the feature expression. Imagine that our primary interest is the effect of feature expression on survival. Although ER status can be important and therefore should be included in the model, we want to know whether the effect of feature expression is significant after the contribution of ER status is taken into account. In other words, the goal is to treat ER status as a nuisance factor and the binned feature expression as a factor of interest.

In qualitative terms, it is possible to obtain an answer if we group the survival curves by the level of ER status. This can be achieved in the Data Viewer by choosing Grouping > Split by under Configure (Figure 4). That makes it easy to compare the survival curves that have the same level of ER status and avoid the comparison of curves across different levels of ER status.

Figure 4. Grouping of survival curves by the level of a specified factor

If, in Figure 4, we see one or more subplots where the survival curves differ a lot, that is evidence that the feature expression affects survival even after adjusting for the contribution of ER status. To obtain an answer in terms of adjusted Log-rank and Wilcoxon p-values, one should deselect ER status as a “group factor” (Figure 1) and mark it as a stratification factor instead (Figure 5). The computation of stratification-adjusted p-values is elaborated in [2].

Figure 5. Selecting one or more stratification factors

Suppose that when the feature expression and ER status are selected as “group factors” (Figure 1), the Log-rank p-value is 0.001, and when ER status is marked as a stratification factor, the p-value becomes 0.70. This means that ER status is very useful for explaining the difference in survival, while the feature expression factor is of no use if ER status is already in the model. In other words, the marginal contribution of the binned expression factor is low.

If more than two attributes are present, it is possible to measure the marginal contribution of any single factor in a similar manner: the attribute of interest should be selected as a “group factor” (Figure 1) and the other attributes should be marked as stratification factors (Figure 5). There is no limit on the number of factors that can be selected as “group” or stratification factors, except that all of the selected factors are involved in defining the groups, and the groups should contain enough samples (at the very least, be non-empty) for the results to be reliable.
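
The idea behind a stratification-adjusted p-value can be sketched with the textbook stratified log-rank test: the observed-minus-expected event counts and their variances are computed within each stratum and then summed before forming the test statistic. The code below is a simplified two-group illustration with hypothetical names and data, not the exact computation described in [2]. It compares a pooled test (strata ignored) with the stratified test, whereas the workflow above compares a run with both factors as group factors against a run with ER status as a stratification factor.

```python
from math import erfc, sqrt

def logrank_terms(times, events, groups):
    """Return (observed minus expected events in group 1, variance) for one stratum."""
    score = 0.0
    variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        score += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return score, variance

def stratified_logrank(times, events, groups, strata):
    """Log-rank p-value for groups 0 vs 1, adjusted for a stratification factor."""
    total_score = 0.0
    total_variance = 0.0
    for level in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == level]
        score, variance = logrank_terms([times[i] for i in idx],
                                        [events[i] for i in idx],
                                        [groups[i] for i in idx])
        # Samples are only ever compared within their own stratum.
        total_score += score
        total_variance += variance
    if total_variance == 0:
        return 1.0
    chi2 = total_score ** 2 / total_variance
    return erfc(sqrt(chi2 / 2))

# Hypothetical data: ER status drives survival, and the expression bin
# (0 = Low, 1 = High) is unevenly distributed across the ER levels.
times = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
events = [1] * 10
groups = [0, 1, 0, 0, 0, 1, 1, 0, 1, 1]
er_status = ["neg"] * 5 + ["pos"] * 5

p_pooled = stratified_logrank(times, events, groups, ["all"] * 10)
p_adjusted = stratified_logrank(times, events, groups, er_status)
print(round(p_pooled, 3), round(p_adjusted, 3))
```

In this made-up data set the adjusted p-value is larger than the pooled one, which mirrors the situation described above: most of the apparent effect of the expression bin is explained by ER status.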

Troubleshooting

If the task fails (no report is produced), please follow the directions in Reporting a problem.

If the task report is produced, but the results are missing for some features, it may be possible to fix the issue by following the directions in the Differential Analysis Troubleshooting section.

References

[1] Kaplan-Meier (product limit) estimator: https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator

...
