Choosing the right test

Dr Anne Bernard

Experimental Studies (Clinical Trial Designs)

Parallel Trial Designs

A parallel group study is a simple and commonly used clinical design that compares two or more treatments, each allocated to a different group of patients. Usually a test treatment is compared with a standard treatment (or a placebo). Patients are usually allocated to the groups by randomisation, and the groups are typically named the treatment group and the control group. Parallel group designs do not require the same number of patients in each group. The statistical analysis is usually a simple test of the between-group difference in the outcome: a two-sample t-test when the outcome is a mean, or a chi-square test when it is a proportion.

Example of parallel design study

Consider a 24-week placebo-controlled trial of protection against sunlight in adults in Australia. A total of 200 patients are recruited and randomly allocated to the placebo or the treatment group. The two groups can be compared with a t-test (a parametric test). If the data are not normally distributed, the Mann-Whitney test (the non-parametric equivalent of the t-test) is chosen instead.

The t-test tests the null hypothesis that there is no difference in the mean of the variable of interest between the placebo and the treatment group, against the alternative that the two means differ. If the null hypothesis is rejected (significant p-value), the conclusion is that there is a significant difference between the means of the two groups.
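As a sketch of the two tests named above, the Python below computes the pooled (equal-variance) two-sample t statistic and the Mann-Whitney U statistic using only the standard library. The function names and the toy data are hypothetical, and the p-value uses a large-sample normal approximation, which is reasonable for a trial of about 200 patients but not for small samples; in practice a library routine such as `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance.

    Returns (t, df, p_approx). The two-sided p-value uses a standard normal
    approximation to the t distribution (an assumption that only holds for
    large samples); small samples need the exact t distribution.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled estimate of the common within-group variance
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    t = (mean(x) - mean(y)) / se
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


def mann_whitney_u(x, y):
    """Mann-Whitney U for x: the number of pairs (xi, yj) with xi > yj,
    ties counted as 1/2. Under the null hypothesis U has mean len(x) * len(y) / 2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical outcome scores, for illustration only
treatment = [1, 2, 3, 4, 5]
placebo = [3, 4, 5, 6, 7]
t, df, p = pooled_t_test(treatment, placebo)
u = mann_whitney_u(treatment, placebo)
```

With only 8 degrees of freedom, as in the toy data, the exact t-based two-sided p-value would be larger than 0.05 (the 97.5% quantile of the t distribution with 8 degrees of freedom is about 2.31), so the normal shortcut should be kept for large samples.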

Figure 1: Example of parallel trial design.

Cross-over trials

In a cross-over trial, each patient recruited into the study receives two treatments consecutively (Wellek and Blettner, 2012). The main purpose of the design is to provide a basis for separating treatment effects from period effects. The response is assumed to be a continuous random variable that follows the normal distribution. In the two-period cross-over design, patients are randomly assigned to one of two groups: one group receives treatment A followed by treatment B, and the other receives treatment B followed by treatment A. Each patient is measured at least twice, once under each treatment, so that the effectiveness of a new treatment can be compared within patients against that of the treatment currently in use.

The statistical analysis of a cross-over experiment is more complex than that of a parallel group experiment and requires additional assumptions: no period effect and no treatment-by-period interaction. It may be difficult to separate the treatment effect from the time (period) effect and from the carry-over effect of the previous treatment.

Example of cross-over trial

A prospective, open, randomised cross-over trial was conducted to evaluate the efficacy of acitretin for the chemoprevention of squamous cell carcinomas and basal cell carcinomas in renal allograft recipients (George et al., 2002). Twenty-three patients with a previous history of non-melanoma skin cancer were enrolled. A two-period cross-over design was used, in which the effects of treatment were compared in the same subjects during two different treatment periods. Participants were randomly allocated either to receive the treatment or to remain off treatment, and cross-over occurred at the end of the first year. Fourteen patients allocated to group A received the treatment (25 mg of acitretin once daily with food, with the dose increased or decreased depending on side effects) during the first year, and nine patients allocated to group B received the drug during the second year.

The treatment effect is examined by computing each patient's difference between the two periods, comparing the mean differences of the two treatment orders, and testing this comparison with a standard t-test for independent samples (under the hypothesis of no period effect). The presence of a period effect can also be examined with a t-test. If the period effect is significant, the test is adjusted to take the period effect into consideration (Pocock, 1983).

A washout period (an interval without treatment between the two periods) is intended to minimise the risk of the treatment effect from period 1 carrying over into period 2. Even so, a carry-over effect or another interaction between period and treatment may be present. A test of the treatment-by-period interaction can therefore be performed by comparing each patient's mean response over both periods between the two arms. If a significant interaction is found, Pocock (1983) recommends abandoning the planned within-patient analysis and instead analysing the between-patient comparison using the first period only. Non-parametric tests can be used instead of t-tests for non-normally distributed data. More information can be found in Jones and Kenward (2003) or Senn (2002).
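The analysis described above can be sketched for a two-period AB/BA design as follows, assuming one continuous response per patient per period. The sketch follows the common approach based on within-patient period differences: half the difference between the sequence-group means of these differences estimates the treatment effect (the period effect cancels), half their sum estimates the period effect, and a comparison of patient totals between the sequence groups tests for carry-over (treatment-by-period interaction). Function names and data are hypothetical; only t statistics are returned, and p-values would come from the t distribution with n1 + n2 - 2 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample (independent groups) t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def crossover_2x2(seq_ab, seq_ba):
    """Analyse a two-period, two-sequence cross-over trial.

    seq_ab: (period1, period2) responses for patients given A then B.
    seq_ba: (period1, period2) responses for patients given B then A.
    """
    # Within-patient differences, period 1 minus period 2
    d_ab = [p1 - p2 for p1, p2 in seq_ab]  # (A - B) + period effect
    d_ba = [p1 - p2 for p1, p2 in seq_ba]  # (B - A) + period effect
    # Patient totals over both periods, used for the carry-over test
    tot_ab = [p1 + p2 for p1, p2 in seq_ab]
    tot_ba = [p1 + p2 for p1, p2 in seq_ba]
    return {
        # Treatment effect A - B: the period effect cancels in this contrast
        "treatment_effect": (mean(d_ab) - mean(d_ba)) / 2,
        "t_treatment": pooled_t(d_ab, d_ba),
        # Period effect (period 1 - period 2): the treatment effect cancels
        "period_effect": (mean(d_ab) + mean(d_ba)) / 2,
        "t_period": pooled_t(d_ab, [-d for d in d_ba]),
        # Carry-over (treatment-by-period interaction): compare totals
        "t_carryover": pooled_t(tot_ab, tot_ba),
    }


# Hypothetical responses for three patients per sequence
res = crossover_2x2([(10, 8), (12, 9), (11, 8)], [(7, 10), (8, 11), (9, 11)])
```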

Sequential Trials

Sequential designs take the accumulating data into account and allow the design to be modified before the initial target number of patients is reached (DeMets, 1998). Statistical methods for monitoring accumulating data aim to minimise either the number of patients entered into a trial or the length of their follow-up, by basing the decision to stop or continue on the results of the patients already entered.

In sequential designs, treatment benefit is evaluated each time a new patient's primary outcome becomes available. A positive test statistic reflects treatment benefit and a negative one reflects harm, with the size of the statistic reflecting the magnitude of the effect. After each new outcome, all the data collected up to that point are analysed, and a test statistic (such as a maximum likelihood test statistic) is computed and compared with stopping criteria established to control the false positive error rate (Type I error), that is, the probability of claiming that the treatment has an effect, either beneficial or harmful, when in fact there is none. The trial is then either stopped or continued. If the trial continues to the final stage, the null hypothesis is either rejected or not rejected.
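The stopping rule can be sketched as below. The text describes fully sequential monitoring after every patient; this sketch uses the closely related group-sequential version with a fixed number of planned interim analyses, because boundary values are usually tabulated for that case. It assumes a standardised statistic (z) is available at each analysis. The boundary constants 2.413 (Pocock) and 2.040 (O'Brien-Fleming) are the commonly tabulated values for five equally spaced analyses at a two-sided Type I error of 0.05 and would have to be recomputed for any other design; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# A boundary maps (analysis number k, planned number of analyses K) to a critical value
Boundary = Callable[[int, int], float]


def pocock_5(k: int, n_looks: int) -> float:
    # Same critical value at every analysis; 2.413 is the tabulated Pocock
    # constant for 5 equally spaced analyses, two-sided alpha = 0.05 (assumed design)
    return 2.413


def obrien_fleming_5(k: int, n_looks: int) -> float:
    # Very strict early, close to the fixed-sample value at the end:
    # C * sqrt(K / k) with C = 2.040 for 5 analyses, two-sided alpha = 0.05 (assumed design)
    return 2.040 * math.sqrt(n_looks / k)


def monitor(z_stats: Sequence[float], boundary: Boundary, planned_looks: int = 5):
    """Apply a two-sided stopping rule to the z statistic from each analysis so far.
    Positive z favours the treatment (benefit), negative z indicates harm."""
    for k, z in enumerate(z_stats, start=1):
        c = boundary(k, planned_looks)
        if z >= c:
            return k, "stop: benefit"
        if z <= -c:
            return k, "stop: harm"
    if len(z_stats) < planned_looks:
        return len(z_stats), "continue"
    return planned_looks, "final analysis: null hypothesis not rejected"
```

For example, hypothetical z statistics of 0.5, 1.2, 2.6, 3.0 and 3.1 would stop the trial for benefit at the third analysis under the Pocock boundary, but only at the fourth under the O'Brien-Fleming boundary, which spends less of the Type I error early on.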

Factorial Trials

Factorial trial designs are used to evaluate combinations of interventions, such as treatment A + treatment B, treatment A + placebo, treatment B + placebo, or two placebos. The regression model for a simple 2 × 2 factorial design relates the outcome to the effect of treatment A, the effect of treatment B, and the interaction of treatment A and treatment B (i.e. whether the effect of treatment A is modified by the presence of treatment B). Most factorial trials assume that the effects of the different active interventions are independent, that is, that there is no interaction (Higgins, 2008).

A 2×2 factorial trial can be seen as two trials addressing different questions. It is important that both parts of the trial are reported as if they were just a two-arm parallel group trial. The main analytical issues relate to the investigation of main effects and the interaction between the interventions in appropriate regression models. Factorial designs provide the only way to study interactions between treatment A and treatment B. This is because the design has treatment groups with all possible combinations of treatments. In the presence of interactions, it may not be possible to assess the main effects because the effect of treatment A changes according to the level of treatment B.
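For a 2 × 2 factorial trial, the saturated regression model is y = b0 + b1·A + b2·B + b3·A·B + error, where A and B are 0/1 indicators of active treatment. Because the model has one parameter per cell, its least-squares estimates can be read directly from the four cell means, as in this sketch (the function name and data are hypothetical). The main effects "at the margins", averaged over the levels of the other treatment, are only meaningful when the interaction b3 is close to zero.

```python
from statistics import mean


def factorial_2x2(cells):
    """Least-squares estimates for y = b0 + b1*A + b2*B + b3*A*B.

    cells maps (a, b), with a, b in {0, 1}, to the list of outcomes in that
    arm; a = 1 means active treatment A (0 = placebo A), likewise for b.
    """
    m = {arm: mean(values) for arm, values in cells.items()}
    b0 = m[(0, 0)]                                # placebo + placebo
    b1 = m[(1, 0)] - b0                           # effect of A without B
    b2 = m[(0, 1)] - b0                           # effect of B without A
    b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + b0   # A x B interaction
    return {
        "b0": b0, "b1": b1, "b2": b2, "b3": b3,
        # Effect of A averaged over both levels of B (and vice versa)
        "main_A": ((m[(1, 0)] - m[(0, 0)]) + (m[(1, 1)] - m[(0, 1)])) / 2,
        "main_B": ((m[(0, 1)] - m[(0, 0)]) + (m[(1, 1)] - m[(1, 0)])) / 2,
    }


# Hypothetical outcomes for the four arms, for illustration only
cells = {(0, 0): [10, 12], (1, 0): [14, 16], (0, 1): [13, 15], (1, 1): [17, 19]}
res = factorial_2x2(cells)
```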

Combination therapies can be profitable, because more symptoms can be relieved with one dose of medicine if a company can combine the active ingredients of treatment A and treatment B into one pill. However, approval of a combination therapy requires evidence that the AB combination is superior to both the A monotherapy and the B monotherapy. We therefore compare the population means for the A monotherapy, the B monotherapy and the AB combination therapy. If the data are normally distributed, we can construct two two-sample t statistics, one comparing the AB combination with the A monotherapy and the other comparing the AB combination with the B monotherapy. Superiority is shown only if both comparisons are significant: if either p-value is not significant, the corresponding null hypothesis cannot be rejected, and the AB combination has not been shown to be better than both monotherapies.
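The approval logic can be sketched as two comparisons that must both succeed. Requiring both tests to be significant, each at the full level alpha, is sometimes called the min test. Treating the comparisons as one-sided and approximating the t distribution by the normal distribution (adequate only for large samples) are assumptions of this sketch, as are the function names and data.

```python
import math
from statistics import NormalDist, mean, variance


def one_sided_t(x, y):
    """Pooled two-sample t statistic for H1: mean(x) > mean(y).
    The p-value uses a large-sample normal approximation (assumption)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, 1 - NormalDist().cdf(t)


def combination_superior(ab, a, b, alpha=0.05):
    """AB must beat BOTH monotherapies: each comparison is tested at alpha,
    and superiority is claimed only if the larger p-value is below alpha."""
    _, p_vs_a = one_sided_t(ab, a)
    _, p_vs_b = one_sided_t(ab, b)
    return max(p_vs_a, p_vs_b) < alpha, p_vs_a, p_vs_b


# Hypothetical outcomes, for illustration only
ab = list(range(20, 30))      # AB combination
a_only = list(range(10, 20))  # A monotherapy
b_only = list(range(5, 15))   # B monotherapy
superior, p_a, p_b = combination_superior(ab, a_only, b_only)
```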

Statistical tests for longitudinal studies (time to event)

When trials follow patients for a long time (longitudinal studies) and the primary event of interest is death, relapse or the development of a new disease, a different set of statistical methods, time-to-event analysis (survival analysis), is used to analyse the data. Time-to-event analysis is fundamental to most epidemiological cohort studies, as well as to many randomised controlled trials (RCTs).

Such data are commonly depicted with a Kaplan-Meier curve, which helps to answer the question "What proportion of the population survives beyond a specified time without a particular event happening?" (Singh and Mukhopadhyay, 2011). Kaplan-Meier curves are often used to represent survival functions, which give the probability that a person survives longer than some specified time t. The median survival time (the time by which the event of interest has occurred in 50% of cases) and the mean (the average time to the event, usually restricted to the follow-up period because the curve often does not reach zero) can be derived from these curves. Some examples of clinical studies and explanations are presented in Langova (2008).

Usually not all participants are followed up until they experience the event of interest, so their times are 'censored'. In this case, the available information consists only of a lower bound for their actual event time. Censoring occurs for three main reasons: the person does not experience the event before the study ends, the person is lost to follow-up during the study period, or the person withdraws from the study.
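The Kaplan-Meier estimator multiplies, at each distinct event time, the conditional probability of surviving that time: S(t) is the product over event times t_i ≤ t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk just before t_i. Censored participants contribute to n_i until they leave the risk set. The minimal sketch below (hypothetical names and data) treats participants censored at an event time as still at risk at that time, a common convention, and reads the median as the first time at which S(t) drops to 0.5 or below.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    times: follow-up time for each participant.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (event time, S(t)) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all participants whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Both events and censored participants leave the risk set after t
        at_risk -= deaths + censored
    return curve


def median_survival(curve):
    """First event time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up (e.g. in months) for six participants
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```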

Several methods are available to analyse time-to-event data, such as the Cox proportional hazards model, the log-rank test and the Wilcoxon two-sample test. In contrast to most regression approaches, which typically model the mean of a distribution given explanatory variables, many survival analysis models are defined in terms of the hazard (or rate) of the event of interest. Fully parametric models assume a particular form for the baseline hazard, the simplest being that it is constant over time (the exponential model, which can be fitted as a Poisson regression). Logistic regression can be applied when investigators examine the relationship between risk factors and the occurrence of disease events, but the Cox model is often preferred because logistic regression ignores survival time and censoring information.

The Cox model is a regression method for survival data. It takes into account the time until an event of interest occurs and compares the risk of the event over time between two or more cohorts, while adjusting for other influential covariates. It provides an estimate of the hazard ratio. The Cox proportional hazards model assumes that the hazard ratio between two individuals does not depend on time, an assumption that in the basic model applies to time-independent covariates, but it makes no parametric assumption about the baseline hazard. Both the Poisson and Cox regression models assume the hazards to be proportional for individuals with different values of the explanatory variables. Statistical explanations are presented in Fleming and Lin (2000).
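To show what fitting a Cox model involves, the sketch below estimates the log hazard ratio for a single covariate by maximising the partial likelihood with Newton-Raphson, handling tied event times with Breslow's approximation. Because the baseline hazard cancels from the partial likelihood, no assumption about its shape is needed. The sketch is illustrative only (hypothetical names and data, no handling of convergence failure or completely separated groups); real analyses would use a survival package such as `lifelines.CoxPHFitter` in Python or `coxph` from R's survival package, whose default Efron handling of ties can give slightly different estimates.

```python
import math


def cox_one_covariate(times, events, x, tol=1e-10, max_iter=50):
    """Fit h(t | x) = h0(t) * exp(beta * x) by maximising the Cox partial
    likelihood with Newton-Raphson (Breslow handling of tied event times).
    Returns (beta, hazard_ratio, standard_error)."""
    beta = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(max_iter):
        score = info = 0.0
        for t in event_times:
            # Risk set: everyone still under observation just before time t
            risk = [xj for tj, xj in zip(times, x) if tj >= t]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk)) / s0
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0
            # Covariate values of the participants who had the event at time t
            failed = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            score += sum(failed) - len(failed) * s1
            info += len(failed) * (s2 - s1 * s1)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the observed information of the last iteration
    se = 1 / math.sqrt(info)
    return beta, math.exp(beta), se


# Hypothetical data: group = 1 for exposed, 0 for unexposed; all events observed
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [1, 1, 1, 1, 1, 1, 1, 1]
group = [1, 1, 0, 0, 1, 0, 1, 0]
beta, hr, se = cox_one_covariate(times, events, group)
```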

The log-rank test is usually used to compare two or more survivor functions. It is a large-sample chi-square test of the null hypothesis that the survival curves are the same in all groups. The test is most likely to detect a difference between groups when the risk of an event is consistently greater in one group than in another, and it is unlikely to detect a difference when the survival curves cross. It is therefore useful to plot the survival curves when analysing survival data.
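The two-group log-rank statistic can be sketched as below. At every distinct event time it compares the observed number of events in group 1 with the number expected if both groups shared the same hazard, sums these observed-minus-expected differences, and divides the squared sum by its hypergeometric variance; the result is referred to a chi-square distribution with 1 degree of freedom. Names and data are hypothetical, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math
from statistics import NormalDist


def logrank(times1, events1, times2, events2):
    """Two-group log-rank test. times*: follow-up times; events*: 1 = event, 0 = censored.
    Returns (chi-square statistic with 1 df, p-value)."""
    event_times = sorted(
        {t for t, e in zip(times1, events1) if e} | {t for t, e in zip(times2, events2) if e}
    )
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)  # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        d2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p
```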

Observational Studies

An important objective of epidemiological research is to identify risk factors for disease. The choice of study design has a decisive influence on the analysis of the study. Depending on the question being asked, the study type and the available data, different kinds of study and statistical analysis are conducted.

Cohort Studies

In cohort studies, persons exposed to specific risk factors are compared with persons not exposed to these factors (Ressing et al., 2010). The occurrence of events in the two groups is observed prospectively. For example, in a cohort study of 5000 women in Australia aged 30 to 64 years and without skin cancer, the occurrence of melanoma in this group is observed over 10 years of follow-up.

Important frequency measures in cohort studies are incidence and mortality.

Comparative effect measures include the relative risk (RR), calculated by dividing the risk of disease for an exposed person by the risk of disease for a non-exposed person, and the hazard ratio (HR), which compares instantaneous risk over the study period rather than cumulative risk over the entire study (Case et al., 2002; Spruance et al., 2004).

The standardised incidence ratio (SIR) and the standardised mortality ratio (SMR) are used to compare the incidence or mortality in the cohort with that of the general population; they are calculated by dividing the observed number of cases or deaths in the cohort by the number expected from general-population rates. The odds ratio (OR) can also be calculated (it approximates the RR for rare diseases). Effect estimates such as the RR are usually calculated with regression models, taking influencing factors into consideration. Regression investigates the joint influence of several potential influencing factors on the target parameter. The most important frequency measures and comparative measures are presented in Figure 2.

Figure 2: Calculation of important measures and comparative measures (extracted from Ressing, 2010).
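The measures named above can be computed from a 2 × 2 table of exposure by disease and, for the SIR or SMR, from the observed count and the count expected by applying general-population age-specific rates to the cohort's person-years (indirect standardisation). The sketch below uses hypothetical names and numbers; in practice the RR and the other estimates are usually adjusted in a regression model rather than computed from a crude table.

```python
def cohort_measures(a, b, c, d):
    """Crude measures from a cohort 2x2 table (counts of persons):

                    disease   no disease
        exposed        a          b
        non-exposed    c          d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "RR": risk_exposed / risk_unexposed,  # relative risk
        "OR": (a * d) / (b * c),              # odds ratio
        "RD": risk_exposed - risk_unexposed,  # risk difference
    }


def expected_cases(person_years, reference_rates):
    """Indirect standardisation: apply general-population rates for each
    age band to the cohort's person-years in that band."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardised_ratio(observed, expected):
    """SIR (for incident cases) or SMR (for deaths): observed / expected."""
    return observed / expected


# Hypothetical numbers, for illustration only
m = cohort_measures(30, 970, 10, 990)
expected = expected_cases([1000, 2000], [0.001, 0.002])  # two age bands
sir = standardised_ratio(10, expected)
```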

Cox regression, Poisson regression or logistic regression can be used to analyse data from cohort studies, depending on the variable of interest (see Table 1). The statistical analyses usually performed on time-to-event data are explained in detail in the section "Statistical tests for longitudinal studies (time to event)".

In Cox regression, the variable of interest is the time until the occurrence of an event (e.g. disease or death). The data are usually censored, meaning that not all participants could be observed throughout the entire study duration. Cox regression uses a proportional hazards model to calculate the hazard ratio; the underlying assumption is that the hazards in the two groups differ by a constant factor over time. Interactions between factors can also be examined by including them in the regression model.

Poisson regression is used if the variable of interest is the number of occurrences of a rare event, for example, the number of melanoma cases within a defined period. To assess whether an observed effect is statistically significant, the confidence interval (CI) of each effect estimate should be considered (du Prel et al., 2009). If a statement is to be made about the number of cases of the disease caused by the risk factor, the risk difference (RD) is used: the risk of disease for an exposed person minus the risk of disease for a non-exposed person. An RD of 0 means that there is no difference between exposed and non-exposed persons.

The ratio measures described above (RR, OR, HR, SIR and SMR) measure the change in the frequency of a disease due to a specific risk factor. A value of 1 means that exposed persons have the same risk of falling ill as non-exposed persons. If the value is above 1, the risk factor increases the frequency of the disease; if it is below 1, the factor is considered protective. Confidence intervals (CI) and p-values help to assess whether the observed effects are statistically significant. The confidence level is usually 95%: a 95% CI is constructed so that, across repeated studies, 95% of such intervals would contain the true value. If the interval does not include 1, the effect estimate is considered statistically significant at the 5% level.
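The significance check described above can be sketched with Wald confidence intervals computed on the log scale, a standard approach for ratio measures: the RR uses the standard error sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)), the OR uses Woolf's sqrt(1/a + 1/b + 1/c + 1/d), and an incidence rate ratio (the quantity a Poisson regression with one binary exposure estimates) uses sqrt(1/a + 1/b) for the two event counts. Names and numbers are hypothetical. For the risk difference the same idea applies, but its null value is 0 rather than 1, which is why `significant` takes the null value as a parameter.

```python
import math
from statistics import NormalDist


def log_scale_ci(estimate, se_log, level=0.95):
    """Wald CI for a ratio measure, built on the log scale and back-transformed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    centre = math.log(estimate)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)


def rr_ci(a, b, c, d, level=0.95):
    """a, b: exposed with / without disease; c, d: non-exposed with / without."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, log_scale_ci(rr, se, level)


def or_ci(a, b, c, d, level=0.95):
    """Same table layout as rr_ci; Woolf's standard error for log(OR)."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, log_scale_ci(odds_ratio, se, level)


def rate_ratio_ci(events_exp, py_exp, events_unexp, py_unexp, level=0.95):
    """Incidence rate ratio; equals exp(coefficient) of a Poisson regression
    with one binary exposure and log person-time as offset."""
    irr = (events_exp / py_exp) / (events_unexp / py_unexp)
    se = math.sqrt(1 / events_exp + 1 / events_unexp)
    return irr, log_scale_ci(irr, se, level)


def significant(ci, null_value=1.0):
    """True if the CI excludes the null value (1 for ratios, 0 for differences)."""
    low, high = ci
    return not (low <= null_value <= high)


# Hypothetical counts and person-years, for illustration only
rr, rr_interval = rr_ci(30, 970, 10, 990)
irr, irr_interval = rate_ratio_ci(30, 10000, 10, 10000)
```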

Table 1: Calculation of the effects depending on the variable of interest and the comparison group.

Type of variable of interest | Effect estimate | Example | Statistical analysis

Comparison within the study
Dichotomous | Odds ratio, relative risk | Skin cancer (yes/no) | Two-by-two table
Dichotomous | Odds ratio (approximates the relative risk for rare diseases) | Skin cancer (yes/no) | Logistic regression
Dichotomous | Relative risk | Skin cancer (yes/no) | Poisson regression
Time to first event | Hazard ratio | Time to death, time to recurrence, time to disease | Cox regression

Comparison of the study population with the general population
Dichotomous | Standardised incidence ratio, standardised mortality ratio | Skin cancer (yes/no) | Age standardisation

Case-control Studies

In case-control studies, patients suffering from the studied disease are compared with controls that do not have the disease. Exposure is recorded retrospectively. The odds ratio (OR) is calculated as a comparative effect measure.

Example 1 of case-control study

A case-control study was performed in Valencia (Spain) on the effect of protective measures against sunlight on the risk of non-melanoma skin cancer (NMSC) (Suarez-Varela et al., 1995). Between 1990 and 1992, 276 cases of histologically confirmed NMSC and 552 control patients matched by age, sex and area of residence were enrolled in the study and asked about their ultraviolet exposure, phenotypic features and protective measures against sunlight.

For the binary target variable of a case-control study (disease yes/no), logistic regression is the best-suited statistical model to estimate the OR (multivariable models can be used to account for additional potential risk factors for NMSC). It is not possible to calculate the RR in a case-control study, because incidence cannot be calculated. However, the OR can be interpreted as an approximation of the RR if the disease is rare.
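The link between logistic regression and the OR can be checked with a small sketch: with a single binary exposure, the fitted exposure coefficient b1 satisfies exp(b1) = (a × d) / (b × c), the cross-product ratio of the 2 × 2 table. The Newton-Raphson fit below is a minimal illustration with hypothetical names and counts, not a substitute for a statistics package. In a case-control study the intercept b0 reflects the chosen ratio of cases to controls and is not interpretable as a risk.

```python
import math


def logistic_or(records, tol=1e-10, max_iter=50):
    """Fit logit P(case) = b0 + b1 * exposure by Newton-Raphson.
    records: list of (exposure 0/1, case 0/1). Returns (b0, b1, OR = exp(b1))."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in records:
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            w = p * (1 - p)       # information weights
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = gradient
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1, math.exp(b1)


# Hypothetical counts: exposed cases, exposed controls, unexposed cases, unexposed controls
a, b, c, d = 40, 30, 60, 70
records = [(1, 1)] * a + [(1, 0)] * b + [(0, 1)] * c + [(0, 0)] * d
b0, b1, odds_ratio = logistic_or(records)
```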

Example 2 of case-control study

A case-control study was conducted in the United States to investigate the relationship between melanoma and dietary factors (Millen et al., 2004). A total of 502 newly diagnosed patients with melanoma were recruited from pigmented lesion clinics and 565 controls were recruited from outpatient clinics. Participants were asked to complete a food frequency questionnaire, which assessed their diet over the previous year. Odds ratios for melanoma were computed for nutrient and alcohol intake using logistic regression.

Cross-sectional Studies

In cross-sectional studies, exposure and disease status are examined at the same time point in a sample from a defined population. The most important measures are the prevalence of the disease and of the risk factors, together with the OR.

Example of cross-sectional study

In a cross-sectional study, with follow-up, conducted in South Wales, 1034 patients aged 60 years and over were selected from the Family Health Services Authority register (Harvey et al., 1996). This study aimed to describe the prevalence and incidence of solar keratoses and skin cancers and the natural history of solar keratoses in a random population sample. The main outcome measures were detection of the presence of solar keratoses and skin cancers on sun-exposed skin (yes/no) and photographic validation of solar keratoses and biopsy confirmation of cancers wherever possible.

The prevalence can be calculated in cross-sectional studies as a measure of frequency. It describes how frequently a specific disease or a specific risk factor occurs in a population at a defined point in time. The prevalence OR can be calculated as a measure of effect.
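Prevalence is the number of people with the condition divided by the number examined. The sketch below adds a Wilson score confidence interval, one common choice that behaves better than the simple normal interval when the prevalence is small, and computes the prevalence OR from a 2 × 2 table. Names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def prevalence(cases, n, level=0.95):
    """Point prevalence with a Wilson score confidence interval."""
    p = cases / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (centre - half, centre + half)


def prevalence_or(a, b, c, d):
    """Prevalence OR from a cross-sectional 2x2 table:
    a, b = exposed with / without the condition; c, d = non-exposed with / without."""
    return (a * d) / (b * c)


# Hypothetical survey counts, for illustration only
p, (low, high) = prevalence(50, 200)
por = prevalence_or(20, 80, 10, 90)
```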


  • Breslow, N. E. and Day, N. E. (1987). Statistical methods in cancer research. Vol. II: The analysis of cohort studies. IARC Scientific Publications No. 82. Lyon: International Agency for Research on Cancer.
  • Case, L. D. et al. (2002). Interpreting measures of treatment effect in cancer clinical trials. The Oncologist, 7(3):181-187.
  • DeMets, D. L. (1998). Sequential designs in clinical trials. Cardiac Electrophysiology Review, 2(1):57-60.
  • du Prel, J.-B. et al. (2009). Confidence interval or p-value? Part 4 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 106(19):335-339.
  • Fleming, T. R. and Lin, D. Y. (2000). Survival analysis in clinical trials: past developments and future directions. Biometrics, 56(4):971-983.
  • George, R. et al. (2002). Acitretin for chemoprevention of nonmelanoma skin cancers in renal transplant recipients. Australasian Journal of Dermatology, 43(4):269-273.
  • Harvey, I. et al. (1996). Non-melanoma skin cancer and solar keratoses. I. Methods and descriptive results of the South Wales Skin Cancer Study. British Journal of Cancer, 74(8):1302.
  • Higgins, J. P. (2008). Cochrane handbook for systematic reviews of interventions (Vol. 5). Chichester: Wiley-Blackwell.
  • Jones, B. and Kenward, M. G. (2003). Design and analysis of cross-over trials (Vol. 98). CRC Press.
  • Langova, K. (2008). Survival analysis for clinical studies. Biomedical Papers, 152(2):303-307.
  • Millen, A. E. et al. (2004). Diet and melanoma in a case-control study. Cancer Epidemiology, Biomarkers & Prevention, 13(6):1042-1051.
  • Ressing, M. et al. (2010). Data analysis of epidemiological studies: part 11 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 107(11):187.
  • Senn, S. (2002). Cross-over trials in clinical research (Vol. 5). John Wiley & Sons.
  • Singh, R. and Mukhopadhyay, K. (2011). Survival analysis in clinical trials: basics and must know areas. Perspectives in Clinical Research, 2(4):145.
  • Spruance, S. L. et al. (2004). Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy, 48(8):2787-2792.
  • Suarez-Varela, M. M. et al. (1995). Non-melanoma skin cancer: a case-control study on risk factors and protective measures. Journal of Environmental Pathology, Toxicology and Oncology, 15(2-4):255-261.
  • Wellek, S. and Blettner, M. (2012). On the proper use of the crossover design in clinical trials: part 18 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 109(15):276.