


Introduction to Kaplan-Meier

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from time-to-event data, i.e., the probability that a subject in the population has not yet experienced the event by a given time. The estimate is displayed as a Kaplan-Meier curve, a series of declining horizontal steps. With a sufficiently large sample size, the Kaplan-Meier curve approaches the true survival curve of the population. Kaplan-Meier survival analysis can handle censored data, i.e., data where the event is not observed for some subjects during the study.
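The product-limit idea behind the curve can be sketched in a few lines of Python. This is a minimal illustration, not the Partek Genomics Suite implementation; the function name `kaplan_meier` and its inputs are assumptions made for the example. At each distinct time t_i at which deaths occur, the running estimate is multiplied by (1 - d_i/n_i), where d_i is the number of deaths at t_i and n_i is the number of subjects still at risk just before t_i. Censored subjects leave the risk set without changing the estimate, and a subject censored at the same time as a death is counted as at risk at that time (the usual convention).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct time at which at least one death occurred.

    times:  time-to-event (or time to censoring) for each subject
    events: True if the event was observed, False if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    # Everyone who leaves the risk set at each time (deaths plus censored).
    exits = Counter(times)
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Conditional probability of surviving time t, given survival up to t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t are no longer at risk afterwards.
        n_at_risk -= exits[t]
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [True, True, False, True, False, True])` steps down at times 1, 2, 3 and 5 to about 0.833, 0.667, 0.444 and 0.0; the censoring at time 4 only shrinks the risk set.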

To perform Kaplan-Meier survival analysis, at least two pieces of information (one column each) must be provided for each sample: time-to-event (a numeric factor) and event status (a categorical factor with two levels). Event status indicates whether the event occurred or the subject was censored (the event was not observed during the study). Time-to-event indicates the time elapsed between the enrollment of a subject in the study and the occurrence of the event.

A common example of Kaplan-Meier analysis is estimating the fraction of patients who remain disease-free after cancer remission. In this case, the event would be disease recurrence, and patients would be listed as censored if they do not experience recurrence during the study or if they drop out of the study before experiencing recurrence.

Partek Genomics Suite does not impose any limitation on the labels used for the event and censored categories; in this tutorial, the events are coded as either "death" or "censored". If a subject is still alive at the end of the study, time-to-event indicates the period between enrollment and the end of the study. If a subject dropped out of the study, time-to-event indicates the period between enrollment and the last recorded time point.
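The rules above for assigning time-to-event can be written as a small function. This is a hypothetical sketch: the `Subject` record, its fields, and the use of times measured in years from the start of the study are assumptions for illustration, while the labels follow this tutorial's "death"/"censored" coding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subject:
    # Hypothetical record layout; all times are in years since the study started.
    enrolled: float
    died: Optional[float] = None          # time of death, if observed
    last_contact: Optional[float] = None  # last recorded time point, if the subject dropped out


def to_survival_row(s: Subject, study_end: float) -> Tuple[float, str]:
    """Return (time-to-event, event status) using the tutorial's labels."""
    if s.died is not None:
        # Event observed: time from enrollment to death.
        return s.died - s.enrolled, "death"
    if s.last_contact is not None:
        # Dropped out: censored at the last recorded time point.
        return s.last_contact - s.enrolled, "censored"
    # Still alive at the end of the study: censored at the study end.
    return study_end - s.enrolled, "censored"
```

A subject enrolled at 0.5 who died at 3.0 gives `(2.5, "death")`; one enrolled at 1.0 and last seen at 4.0 gives `(3.0, "censored")`; one enrolled at 2.0 and still alive at a study end of 10.0 gives `(8.0, "censored")`. An analysis routine would then typically turn the status into a boolean event indicator such as `status == "death"`.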

Performing Kaplan-Meier Survival analysis

To begin, you should have the Survival Tutorial data set open in Partek Genomics Suite.

  • Select Stat from the main toolbar
  • Select Survival Analysis then Kaplan-Meier from the Stat menu (Figure 1)

Figure 1. Invoking Kaplan-Meier

The Kaplan-Meier dialog will open. Please note that in this tutorial data set, column 1. Survival (years) indicates the survival time of each patient in years and column 2. Event indicates the event status for each patient, death or censored. 

  • Set Time Variable to 1. Survival (years) using the drop-down menu
  • Set Event Variable to 2. Event using the drop-down menu

Only numeric data are displayed in the Time Variable drop-down list, and only categorical data with two categories are displayed in the Event Variable drop-down list.
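The filtering rule for the two drop-down lists can be illustrated as follows. This is a hypothetical sketch of the described behavior, not Partek code; it assumes each column is a Python list and treats a column as categorical when all of its values are strings.

```python
from numbers import Real
from typing import Dict, List


def time_variable_candidates(columns: Dict[str, list]) -> List[str]:
    # Numeric columns only (bool is excluded because it is a subclass of int).
    return [name for name, values in columns.items()
            if all(isinstance(v, Real) and not isinstance(v, bool) for v in values)]


def event_variable_candidates(columns: Dict[str, list]) -> List[str]:
    # Categorical (text) columns with exactly two distinct categories.
    return [name for name, values in columns.items()
            if all(isinstance(v, str) for v in values) and len(set(values)) == 2]
```

With columns resembling the tutorial data (`1. Survival (years)` numeric, `2. Event` and `3. p53 status` each holding two text values), only the survival column qualifies as a Time Variable under this rule, while both two-level columns would qualify as an Event Variable.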

  • Set Event Status to death using the drop-down menu (Figure 2)

Event Status should be set to the primary event outcome.

Figure 2. Configuring the Kaplan-Meier dialog
  • Select 3. p53 status from the Candidate(s) panel
  • Select Add Factor > to add 3. p53 status to the Strata (Categorical) panel

This will test the difference in survival rates between patients with mutant p53 (mutant) and patients with wild-type p53 (wt).

  • Select OK to run the test (Figure 3)

Figure 3. Configuring the Kaplan-Meier dialog to test the difference in survival rates between patients with different p53 status

The Kaplan-Meier Plot will open in a new tab (Figure 4).

Figure 4. Kaplan-Meier plot comparing the survival curves between two groups.

The horizontal axis indicates time-to-event; the vertical axis shows the cumulative percentage of survival. Censoring is shown as a triangle, and each death is shown as a step down in the curve. Partek Genomics Suite performs two statistical tests to compare the survival curves: the log-rank test and the Wilcoxon-Gehan test. A low p-value indicates that the survival curves of the groups differ significantly.
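Both tests compare, at each death time, the observed number of deaths in one group with the number expected if the groups shared the same survival, and they differ only in how the time points are weighted. The sketch below assumes the common weighted log-rank formulation: weight 1 for the log-rank test and weight equal to the number at risk for the Gehan (Gehan-Breslow) generalized Wilcoxon test, with the statistic referred to a chi-square distribution with 1 degree of freedom. The function name and arguments are hypothetical, and the variance formula used by Partek Genomics Suite may differ, so p-values are not guaranteed to match the software.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank test; returns (chi-square statistic, p-value).

    groups: group label for each subject (exactly two distinct labels)
    weight: "logrank" (w = 1) or "gehan" (w = number at risk, Gehan-Breslow)
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    g1 = labels[0]
    death_times = sorted({t for t, e in zip(times, events) if e})
    num = 0.0  # weighted sum of (observed - expected) deaths in group g1
    var = 0.0  # variance of that weighted sum
    for t in death_times:
        n = sum(1 for ti in times if ti >= t)  # at risk in both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == g1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == g1)
        w = 1.0 if weight == "logrank" else float(n)
        num += w * (d1 - d * n1 / n)
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    stat = num * num / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Because the Gehan weights are largest early, when many subjects are still at risk, that test emphasizes early differences between the curves, whereas the log-rank test weights all death times equally.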

  • Select the Analysis tab to switch to the Kaplan-Meier results spreadsheet (Figure 5)

Figure 5. Kaplan-Meier spreadsheet. Each row represents a time point at which at least one event (death or censoring) was recorded.

The spreadsheet is organized into two sections: the analysis of the p53 mutant group and the analysis of the p53 wild type group. Each row represents a time point at which at least one event occurred; the columns provide the following information:

1. Identifies the group membership (according to the strata) 

2. Survival time corresponds to the entries in column 1. of the original (Survival_Tutorial) spreadsheet. At each given time, at least one event, either a death or a censoring, was recorded.

3. Probability of Survival: cumulative probability of survival at a given time point (also known as the KM survival estimate). The cumulative probability is the probability of surviving beyond this time point, i.e., the product of the conditional probabilities of surviving each interval up to and including it. As time increases, the cumulative survival probability decreases at each time point where a death occurs; censoring alone does not change it.

4. Number of group members at risk, i.e., those who have neither experienced the event nor been censored before this time point. The count in each row is calculated by subtracting the number of deaths and censored events in the row above from the number at risk in the row above.

5. Count of deaths at this time point in the group

6. Count of censored events at the given time in the group

7. Total number of deaths in all groups at the given time

8. Total number of participants at risk in all groups. The count in each row is calculated by subtracting the number of deaths and censored events at the previous time point in both groups from the total number at risk at the previous time point.

9. Natural logarithm of column 3., also denoted ln(KM)

10. Natural logarithm of the negative of column 9., i.e., ln(-ln(KM)). A plot of ln(-ln(KM)) vs. ln(t) is often used to check the proportional hazards assumption. To view this plot, select this column and select View > Log Log S Plot (Figure 6).
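The column definitions above can be reproduced for a small data set with the sketch below, which returns one tuple per group and time point in the order of columns 1 to 10. It is an illustration under stated assumptions, not Partek's code: the function name `km_table` is hypothetical, a row is emitted for every time at which the group recorded a death or a censoring, and ln(KM) and ln(-ln(KM)) are left as NaN where they are undefined (KM = 0 for column 9, and KM = 0 or KM = 1 for column 10).

```python
import math
from collections import Counter


def km_table(times, events, groups):
    """Rows shaped like the Kaplan-Meier results spreadsheet (columns 1-10)."""
    all_deaths = Counter(t for t, e in zip(times, events) if e)
    rows = []
    for g in sorted(set(groups)):
        member = [(t, e) for t, e, grp in zip(times, events, groups) if grp == g]
        deaths = Counter(t for t, e in member if e)
        censored = Counter(t for t, e in member if not e)
        at_risk = len(member)  # column 4 for the first row of the group
        s = 1.0
        for t in sorted({t for t, _ in member}):
            d, c = deaths[t], censored[t]
            s *= 1.0 - d / at_risk  # column 3: cumulative KM estimate
            # Column 8: subjects in all groups still at risk just before t.
            total_at_risk = sum(1 for ti in times if ti >= t)
            ln_km = math.log(s) if s > 0 else float("nan")
            ln_ln = math.log(-ln_km) if 0 < s < 1 else float("nan")
            rows.append((g, t, s, at_risk, d, c, all_deaths[t],
                         total_at_risk, ln_km, ln_ln))
            # Next row: subtract this row's deaths and censored events.
            at_risk -= d + c
    return rows
```

To reproduce the Log Log S view by hand, one would plot column 10 against the logarithm of column 2 for each group and check whether the resulting lines stay roughly parallel.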

Please note that the Kaplan-Meier results spreadsheet is a temporary file. If you would like to view the spreadsheet again after closing Partek Genomics Suite, be sure to save it by selecting the Save Active Spreadsheet icon.

Figure 6. Log Log S plot of KM data. As the lines are mostly parallel and do not cross, the proportional hazards assumption of the log-rank test appears to hold. Had the lines crossed or not been parallel, the Wilcoxon-Gehan test would have had more power, although it performs less well when there is extensive censoring.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
