
Kaplan-Meier Survival Curve

The Kaplan-Meier task is used to compare survival curves among two or more groups of samples. The groups are defined by one or more categorical attributes (factors) specified by the user. As in the Cox Regression task, feature expression data can also be used, if available; in that case, quantitative feature expression is converted into a feature-specific categorical attribute. Each combination of attribute levels corresponds to a distinct group. If one selects three factors with 2, 3 and 5 levels, respectively, the total number of compared groups is 2*3*5 = 30. Therefore, selecting too many factors, or factors with many levels, may not work, because there may not be enough samples to fill all of the groups.
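A minimal Python sketch of how the group count grows, assuming each sample is represented as a dictionary that maps factor names to levels (a hypothetical layout, not the Partek Flow data model; the function name `count_groups` is also hypothetical). It enumerates every combination of the observed levels and reports the combinations that contain no samples.

```python
from collections import Counter
from itertools import product

def count_groups(samples, factors):
    """Count samples in every combination of levels of the chosen factors.

    samples: list of dicts mapping factor name -> level (hypothetical layout)
    factors: names of the factors that define the groups
    """
    # Levels observed in the data for each factor, in sorted order
    levels = [sorted({s[f] for s in samples}) for f in factors]
    # Number of samples that fall into each observed combination
    observed = Counter(tuple(s[f] for f in factors) for s in samples)
    # Every combination of levels is a group, even if no sample falls into it
    return {combo: observed.get(combo, 0) for combo in product(*levels)}

samples = [
    {"FactorA": "a1", "Sex": "F"},
    {"FactorA": "a1", "Sex": "M"},
    {"FactorA": "a2", "Sex": "F"},
    {"FactorA": "a3", "Sex": "F"},
]
groups = count_groups(samples, ["FactorA", "Sex"])
print(len(groups))                               # 6 groups (3 x 2)
print([g for g, n in groups.items() if n == 0])  # [('a2', 'M'), ('a3', 'M')]
```

With factors of 2, 3 and 5 levels, `product` yields 30 combinations, matching the count above; empty or very small groups are a sign that too many factors were selected.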

To perform Kaplan-Meier survival analysis, at least two pieces of information must be provided for each sample: time-to-event (a numeric factor) and event status (a categorical factor with two levels). Time-to-event is the time elapsed between the enrollment of a subject in the study and the occurrence of the event (or, for a censored subject, the end of its follow-up). Event status indicates whether the event occurred or the subject was censored (did not experience the event during the observation period). The survival curve is not drawn as straight lines connecting the points; instead, it follows a staircase pattern in which each drop corresponds to the occurrence of an event.
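The staircase shape follows from the product-limit form of the Kaplan-Meier estimator [1]. Write $t_1 < t_2 < \dots$ for the distinct times at which at least one event occurs, $d_i$ for the number of events at $t_i$, and $n_i$ for the number of subjects still at risk just before $t_i$ (those who have neither experienced the event nor been censored earlier). The estimated survival function is

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right), \qquad \hat{S}(t) = 1 \text{ for } t < t_1$$

so $\hat{S}(t)$ stays constant between event times and drops only at each $t_i$. A censored subject causes no drop; it only leaves the risk set, which reduces $n_i$ at later event times. As a small worked example with hypothetical data, take five subjects with times 2 (event), 3 (censored), 4 (event), 4 (event) and 6 (censored):

$$\hat{S}(2) = 1 - \frac{1}{5} = 0.8, \qquad \hat{S}(4) = 0.8 \times \left(1 - \frac{2}{3}\right) \approx 0.267$$

After $t = 4$ the curve stays at about 0.267, because the last subject is censored at $t = 6$ rather than experiencing the event.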


Getting started with the Kaplan-Meier task

The Kaplan-Meier task begins in the same way as the Cox Regression task and differs when it comes to selecting the categorical attributes that define the compared groups.

For each feature, the expression values are sorted in ascending order and placed into B bins of (roughly) equal size. As a result, a feature-specific categorical attribute with B levels is constructed, which can be used by itself or in combination with other categorical attributes. For instance, for B = 2, the median feature expression is computed and the samples are separated into two groups depending on whether the expression in the sample is below or above the median, as seen below. The levels of the resulting categorical attribute are automatically named P_1, P_2, …, P_B. Here P stands for “percentile”, and the higher the bin number, the higher the feature expression of the samples in the bin.
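A minimal Python sketch of the binning step, assuming that samples are ranked by expression and split into B bins whose sizes differ by at most one. The function name `bin_expression` is hypothetical, and how Partek Flow handles ties or sample counts not divisible by B is not described here; this sketch simply breaks ties by sample order.

```python
def bin_expression(values, n_bins):
    """Assign each sample to one of n_bins roughly equal-size expression bins.

    Returns labels 'P_1' .. 'P_B'; a higher bin number means higher expression.
    Ties are broken by sample order (sorted() is stable), an assumption here.
    """
    # Sample indices ordered by ascending expression
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        # rank * n_bins // n spreads ranks 0..n-1 evenly over bins 0..B-1
        labels[i] = f"P_{rank * n_bins // len(values) + 1}"
    return labels

expr = [5.2, 0.3, 7.9, 2.4, 9.1, 1.0]
print(bin_expression(expr, 2))  # ['P_2', 'P_1', 'P_2', 'P_1', 'P_2', 'P_1']
```

For B = 2 this reproduces the median split described above: the three lowest-expressing samples get P_1 and the three highest get P_2.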


Each of the defined groups produces a survival curve.

For each group, the survival curve (aka survival function) is estimated using the Kaplan-Meier estimator [1]. For instance, if one selects FactorA with three levels and two feature expression bins, six survival curves are displayed in Data Viewer, as shown below.
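The per-group estimation can be sketched in Python as below. This is a minimal illustration of the product-limit estimator, assuming each sample is given as a (group, time, event) tuple whose group label is the tuple of its levels (for example, a FactorA level and an expression bin); the function names and data are hypothetical. Subjects censored at the same time as an event are counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter, defaultdict

def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times:  follow-up time for each subject
    events: True if the event occurred, False if the subject was censored
    Returns (time, survival) pairs at each event time; S(t) = 1 before the first.
    """
    n_leaving = Counter(times)  # subjects leaving the risk set at each time
    n_events = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    survival = 1.0
    steps = []
    for t in sorted(n_leaving):
        d = n_events[t]
        if d:
            # One drop of the staircase: multiply by conditional survival at t
            survival *= 1 - d / at_risk
            steps.append((t, survival))
        # Events and censorings at t leave the risk set after t is processed
        at_risk -= n_leaving[t]
    return steps

def km_by_group(records):
    """records: iterable of (group, time, event); returns one curve per group."""
    times = defaultdict(list)
    events = defaultdict(list)
    for group, t, e in records:
        times[group].append(t)
        events[group].append(e)
    return {g: kaplan_meier(times[g], events[g]) for g in times}

records = [
    (("a1", "P_1"), 2, True), (("a1", "P_1"), 3, False),
    (("a1", "P_1"), 4, True), (("a1", "P_1"), 4, True),
    (("a1", "P_1"), 6, False),
    (("a1", "P_2"), 5, True), (("a1", "P_2"), 8, False),
]
curves = km_by_group(records)
print(curves[("a1", "P_2")])  # [(5, 0.5)]
```

With three FactorA levels and two bins, `km_by_group` would return up to six curves, one per combination that contains samples.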


Log-rank and Wilcoxon p-values when feature expression is used.

To test whether the survival curves are statistically different, the Kaplan-Meier task runs the Log-rank and Wilcoxon (aka Wilcoxon-Gehan) tests. The null hypothesis is that the survival curves do not differ among the groups (the computational details are available in [2]). When feature expression is used, the p-values are also feature-specific, as seen below.
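A minimal Python sketch of the two-group case, assuming exactly two groups coded 0 and 1 (the function name `two_group_test` and the data are hypothetical). It follows the weighted log-rank framework described in [2]: the Log-rank test gives every event time weight 1, while Gehan's generalization of the Wilcoxon test weights each event time by the number of subjects at risk. With k > 2 groups the statistic uses a covariance matrix and has k - 1 degrees of freedom; that case is omitted, and this is not necessarily how Partek Flow computes its p-values.

```python
import math

def two_group_test(times, events, groups, weight="logrank"):
    """Weighted log-rank test comparing group 1 with group 0.

    weight="logrank": every event time has weight 1
    weight="gehan":   weight = number at risk (Gehan's generalized Wilcoxon)
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    event_times = sorted({x for x, e in zip(times, events) if e})
    score = 0.0     # weighted sum of (observed - expected) events in group 1
    variance = 0.0  # variance of that sum under the null hypothesis
    for t in event_times:
        # Risk set at t: subjects whose recorded time is >= t
        at_risk = [g for x, g in zip(times, groups) if x >= t]
        n, n1 = len(at_risk), at_risk.count(1)
        # Events occurring exactly at t
        failed = [g for x, e, g in zip(times, events, groups) if x == t and e]
        d, d1 = len(failed), failed.count(1)
        w = 1.0 if weight == "logrank" else float(n)
        score += w * (d1 - d * n1 / n)
        if n > 1:
            # Hypergeometric variance of d1 given n, n1 and d
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = score ** 2 / variance if variance > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical data: group 0 has events at times 1-5, group 1 at times 6-10
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [True] * 10
groups = [0] * 5 + [1] * 5
for w in ("logrank", "gehan"):
    chi2, p = two_group_test(times, events, groups, weight=w)
    print(w, round(chi2, 2), round(p, 4))
```

Because the Gehan weights are largest at early event times, the Wilcoxon test is more sensitive to early differences between curves, while the Log-rank test weights all event times equally.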


Choosing stratification factors

As in the Cox Regression task, it is possible to choose stratification factor(s) in the GUI, but the purpose and meaning of stratification are not the same as in Cox Regression. Suppose we want to compare survival among the six groups defined by the three levels of FactorA and the two bins of feature expression. We can select the two factors on the “Select group factor(s)” page (Fig 1). In that case, the reported p-values reflect the statistical difference among the six survival curves due to both FactorA and feature expression. Now imagine that our primary interest is the effect of feature expression on survival. Although FactorA may be important and should therefore be included in the model, we want to know whether the effect of feature expression is significant after the contribution of FactorA is taken into account. In other words, the goal is to treat FactorA as a nuisance factor and the binned feature expression as the factor of interest.

In qualitative terms, an answer can be obtained by grouping the survival curves by the level of FactorA. In Data Viewer, this can be done with the “Grouping > Split by” function (Fig 4). This makes it easy to compare survival curves that share the same level of FactorA while avoiding comparisons of curves across different levels of FactorA.

Grouping of survival curves by the level of a specified factor

If in Fig 4 one or more subplots show survival curves that differ substantially, that is evidence that feature expression affects survival even after adjusting for the contribution of FactorA. To obtain an answer in terms of adjusted Log-rank and Wilcoxon p-values, deselect FactorA as a “group factor” (Fig 1) and mark it as a stratification factor instead (Fig 5). The computation of stratification-adjusted p-values is described in [2].
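One standard form of this adjustment, the stratified log-rank test described in [2], computes the observed-minus-expected score and its variance separately within each stratum and sums them before forming the chi-square statistic, so subjects are only compared with others at the same level of the stratification factor. A minimal Python sketch for two groups follows; the function names and data are hypothetical, and Partek Flow's exact implementation may differ.

```python
import math
from collections import defaultdict

def logrank_parts(times, events, groups):
    """(observed - expected, variance) of group-1 events within one stratum."""
    score = variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = [g for x, g in zip(times, groups) if x >= t]
        n, n1 = len(at_risk), at_risk.count(1)
        failed = [g for x, e, g in zip(times, events, groups) if x == t and e]
        d, d1 = len(failed), failed.count(1)
        score += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return score, variance

def stratified_logrank(times, events, groups, strata):
    """Log-rank test of group 1 vs group 0, adjusted for a stratification factor."""
    by_stratum = defaultdict(lambda: ([], [], []))
    for t, e, g, s in zip(times, events, groups, strata):
        ts, es, gs = by_stratum[s]
        ts.append(t)
        es.append(e)
        gs.append(g)
    # Scores and variances are computed within strata, then summed
    parts = [logrank_parts(*cols) for cols in by_stratum.values()]
    score = sum(u for u, _ in parts)
    variance = sum(v for _, v in parts)
    chi2 = score ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical data: FactorA level "a1" has early events, "a2" late events, and
# within each level both expression groups have the same event-time distribution
times = [1, 1, 2, 2, 3, 3, 1, 2, 3, 10, 11, 12, 10, 10, 11, 11, 12, 12]
events = [True] * 18
groups = [1] * 6 + [0] * 3 + [1] * 3 + [0] * 6
strata = ["a1"] * 9 + ["a2"] * 9

_, p_stratified = stratified_logrank(times, events, groups, strata)
_, p_pooled = stratified_logrank(times, events, groups, ["all"] * 18)
print(p_stratified)  # 1.0
```

Passing a single stratum reproduces the unstratified test; with these data it gives a smaller p-value than the stratified one, because the apparent group difference comes entirely from the imbalance of groups across FactorA levels.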

Selecting one or more stratification factors

Suppose that when the feature expression and FactorA are both selected as “group factors” (Fig 1), the Log-rank p-value is 0.001, and when FactorA is marked as a stratification factor instead, the p-value becomes 0.70. This means that FactorA is very useful for explaining the difference in survival, while the feature factor adds nothing once FactorA is already in the model. In other words, the marginal contribution of the binned expression factor is low.

If more than two attributes are present, the marginal contribution of any single factor can be measured in the same way: select the attribute of interest as the “group factor” (Fig 1) and mark the other attributes as stratification factors (Fig 5). There is no limit on the number of factors that can be selected as group or stratification factors, except that all of the selected factors are involved in defining the groups, and the groups should contain enough samples (at the very least, be non-empty) for the results to be reliable.

References

[1] Kaplan-Meier (product limit) estimator. https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator

[2] Klein JP, Moeschberger ML (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer. ISBN-13: 978-0387948294


Additional assistance


Partek Flow Documentation
