Abstract
- Evidence-based medicine (EBM) is the conscientious and judicious use of the best current evidence to inform clinical decisions, defined famously by David Sackett.
- The strongest evidence from a single study for the efficacy of a drug comes from a randomised controlled trial (RCT); the strongest evidence overall comes from a systematic review and meta-analysis of multiple RCTs.
- Trials are valued for their ability to minimise bias and confounding through randomisation, blinding and intention-to-treat analysis. Effect size is described by relative risk, absolute risk difference, and the number needed to treat (NNT).
- Statistical significance is summarised by the p-value and 95% confidence interval; the standard threshold for significance is p < 0.05.
Core
Introduction to Evidence-Based Medicine
Almost every drug a doctor prescribes was licensed because of a clinical trial. Almost every recommendation in NICE guidance is the synthesis of multiple trials. Understanding how the evidence is generated, summarised and weighed is therefore the foundation of safe, modern prescribing, and a substantial part of every UK medical school's pharmacology curriculum.
The classical definition, by the Canadian epidemiologist David Sackett in 1996, is worth quoting in full:
"Evidence-based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients."
The three components are equally important. Conscientious means actively seeking the evidence rather than relying on memory; explicit means making the reasoning transparent; and judicious means tempering the evidence with clinical judgement and patient preference. EBM is not "treatment by checklist"; it is the disciplined integration of evidence, expertise and the individual patient.
The Hierarchy of Evidence
Different study designs answer different questions, and not all generate equally robust evidence. The hierarchy of evidence is conventionally drawn as a pyramid, with the most reliable evidence at the top:
Diagram: The hierarchy of evidence. Systematic reviews of RCTs sit at the top because they minimise bias; expert opinion and pre-clinical research sit at the bottom.
From top to bottom:
- Systematic reviews and meta-analyses of randomised controlled trials.
- Randomised controlled trials (RCTs).
- Cohort studies: observational, follow groups exposed and not exposed forwards in time.
- Case-control studies: observational, compare those with and without disease backwards in time.
- Cross-sectional studies: observational, snapshot in time.
- Case series and case reports.
- Editorials and expert opinion.
The pyramid does not say RCTs are always better than observational studies; only that they are usually better at answering the question "does this treatment work?" Other questions (prognosis, prevalence, harm of a rare side effect) are sometimes best answered by other designs.
Clinical Trials
A clinical trial is, in the textbook definition, "any form of planned experiment which involves patients and is designed to elucidate the most appropriate method of treatment for future patients with a given medical condition."
Phases of a Clinical Trial
Drug development in the UK and internationally proceeds through four numbered phases, each with a different question:
- Phase I. Small number (20-100) of healthy volunteers. The question is "is the drug safe in humans?", addressed through pharmacokinetics, dose finding, and tolerability.
- Phase II. Several hundred patients with the condition of interest. The question is "does the drug appear to work, and at what dose?".
- Phase III. Hundreds to thousands of patients across multiple centres. The question is "is the drug better than current standard treatment?". Almost always a randomised controlled trial. This is the phase that supports licensing.
- Phase IV. Post-marketing surveillance after the drug is licensed. The question is "what happens in the real world, and what rare adverse effects emerge?". This is where the Yellow Card Scheme contributes (covered in Pharmacovigilance and Pharmacogenetics).
The Randomised Controlled Trial
The randomised controlled trial (RCT) is the design of choice for testing whether a drug works. The basic structure is:
- Define the disease of interest and a clear research question (often framed using PICO: Population, Intervention, Comparator, Outcome).
- Identify eligible patients, define inclusion and exclusion criteria.
- Take baseline measurements of any variable that might confound the result (age, sex, severity of disease, comorbidities).
- Randomly allocate patients to the new treatment or to the comparator (placebo or current standard).
- Blind patients, clinicians and assessors where possible.
- Pre-specify a single primary outcome and a small number of secondary outcomes.
- Follow the patients for a defined period.
- Analyse on an intention-to-treat basis.
Pre-specifying a single primary outcome and a small number of secondary outcomes is a critical safeguard against data-dredging: the practice of running many statistical tests until one comes back "significant" by chance.
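The arithmetic behind data-dredging can be sketched in a few lines. The example below assumes that every test is independent and that every null hypothesis is true, which is a simplification; the function name is illustrative only.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests
    when there is no real effect anywhere (assumed for illustration)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One pre-specified outcome versus 5 or 20 unplanned comparisons.
for n_tests in (1, 5, 20):
    print(n_tests, round(familywise_error_rate(n_tests), 3))
```

Under these assumptions, one test carries a 5% chance of a false positive, but twenty tests carry a chance of roughly 64% that at least one comes back "significant" by chance alone.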
Bias and Confounding
Two distinct threats to the validity of a trial:
- Bias is a systematic error in the way the trial is conducted, analysed or reported. Selection bias, allocation bias, recall bias, observer bias and reporting bias are all examples.
- Confounding arises when a third variable is independently associated with both the exposure and the outcome, producing an apparent association that is not causal. The classic example: does coffee cause lung cancer? Coffee drinkers tend to smoke more, and smoking causes lung cancer; smoking confounds the apparent coffee-cancer relationship.
Randomisation, blinding and intention-to-treat analysis are the three principal tools used to minimise these problems.
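The coffee and smoking example can be illustrated with a small stratified calculation. All counts below are invented for illustration and are chosen so that coffee has no effect within either smoking stratum; they are not real data.

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


# Hypothetical cohort: (cancers in coffee drinkers, coffee drinkers,
#                       cancers in non-drinkers, non-drinkers)
strata = {
    "smokers": (80, 800, 20, 200),
    "non-smokers": (2, 200, 8, 800),
}

# Crude analysis: ignore smoking and add the strata together.
crude_counts = [sum(counts[i] for counts in strata.values()) for i in range(4)]
crude_rr = risk_ratio(*crude_counts)

# Stratified analysis: look at coffee within each smoking group.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

print("crude RR:", round(crude_rr, 2))
print("stratum RRs:", stratum_rr)
```

In this made-up cohort the crude relative risk is close to 3, yet the relative risk within smokers and within non-smokers is 1. The apparent association comes entirely from coffee drinkers being more likely to smoke.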
Randomisation
Randomisation distributes confounders, both known and unknown, evenly between treatment and control groups, on average. Methods include simple coin tosses, random number tables, and (in modern trials) computer-generated allocation. The allocation should be concealed: the person enrolling the next patient must not be able to predict which group they will be allocated to, otherwise selection bias creeps in.
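As a rough sketch of computer-generated allocation, the example below uses permuted-block randomisation, one common scheme that keeps group sizes balanced. The choice of block randomisation, the function name and the parameters are assumptions for illustration; the text above does not specify a particular method.

```python
import random


def block_randomise(n_patients, block_size=4, arms=("treatment", "control"), seed=None):
    """Return an allocation list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a seed makes the list reproducible for audit
    allocations = []
    while len(allocations) < n_patients:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]


allocations = block_randomise(12, block_size=4, seed=2024)
print(allocations)
```

In a real trial a list like this would be held centrally and revealed one patient at a time, so that the enrolling clinician cannot predict the next allocation.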
Blinding
Blinding hides the treatment allocation so that the placebo effect applies equally to both groups and so that observer bias and reporting bias do not distort the results:
- Single blind: the patient does not know which treatment they are receiving.
- Double blind: neither the patient nor the clinician/assessor knows.
- Triple blind: the data analyst is also blinded; rarely used and often considered a subset of double blinding.
Where blinding is impossible (surgery, psychotherapy), researchers use sham procedures and blinded outcome assessment to recover what they can.
Intention-to-Treat versus As-Treated
Once a trial is over, the question is how to analyse the results. There are two competing principles:
- Intention-to-treat (ITT) analysis: analyses every patient in the group they were originally randomised to, regardless of whether they completed treatment, switched groups, or dropped out. ITT preserves the benefits of randomisation and reflects what would happen in the real world. This is the standard approach and is sometimes called a pragmatic analysis.
- As-treated or per-protocol analysis (also called explanatory analysis): a per-protocol analysis includes only patients who completed the protocol as intended, while an as-treated analysis groups patients by the treatment they actually received. Either can answer "what is the maximum biological effect of the drug?", but at the cost of losing the protection of randomisation, because non-compliers are systematically different from compliers.
The two approaches answer different questions. A pragmatic ITT analysis tells you how the drug will perform in clinical practice; an as-treated explanatory analysis tells you how it performs under ideal conditions. Both have their place, but ITT is the primary analysis for almost all licensing trials.
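The difference between the two analyses can be seen in a toy data set. The twenty patients below are invented for illustration, and the second analysis shown is a simple completers-only (per-protocol) analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    arm: str         # group the patient was randomised to
    completed: bool  # finished the protocol as intended?
    event: bool      # suffered the outcome of interest?


def risk_by_arm(patients):
    """Proportion of patients with the event in each randomised arm."""
    risks = {}
    for arm in sorted({p.arm for p in patients}):
        group = [p for p in patients if p.arm == arm]
        risks[arm] = sum(p.event for p in group) / len(group)
    return risks


# Hypothetical trial: three treatment patients drop out, two of them have events.
patients = (
    [Patient("treatment", True, True)] * 1
    + [Patient("treatment", True, False)] * 6
    + [Patient("treatment", False, True)] * 2
    + [Patient("treatment", False, False)] * 1
    + [Patient("control", True, True)] * 4
    + [Patient("control", True, False)] * 6
)

itt = risk_by_arm(patients)                                   # everyone, as randomised
per_protocol = risk_by_arm([p for p in patients if p.completed])  # completers only

print("ITT:", itt)
print("Per-protocol:", per_protocol)
```

In this sketch the treatment-arm risk is 30% under ITT but about 14% among completers, because the patients who dropped out did worse. The completers-only figure flatters the drug.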
Outcomes and Effect Size
Types of Outcome
Trial outcomes are conventionally divided into three types:
- Pathophysiological outcomes: tumour size, blood pressure, HbA1c, ejection fraction. Easy to measure, but only useful if they actually predict patient-relevant events.
- Clinical outcomes: death, myocardial infarction, stroke, hospital admission. The currency of clinical decision-making.
- Patient-focused outcomes: quality of life, symptom scores, functional ability, satisfaction. Increasingly recognised as the most important measure of a treatment's value.
An ideal outcome is appropriate, valid, sensitive, specific, reliable, simple, sustainable and timely. In practice, no single measure ticks every box, and most trials use a combination.
Measures of Effect
The size of a treatment effect can be expressed in two complementary ways:
Relative Risk (RR) = Risk in treatment group ÷ Risk in control group
Absolute Risk Difference (ARD) = Risk in control group − Risk in treatment group
A relative risk of 1 means no difference between groups. A relative risk less than 1 means the treatment reduces the risk; greater than 1 means it increases it.
Relative measures can mislead when the underlying risk is small. A drug that "halves your risk of stroke" sounds impressive, but if the baseline risk was 2 in 1000, a 50% reduction means only 1 fewer stroke per 1000 patients treated: a useful effect, but not as transformative as the relative figure suggests. Always look at both the relative and the absolute numbers.
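The stroke example can be checked directly from the two formulas. This is a minimal sketch using the risks quoted in the paragraph above; the function names are illustrative.

```python
def relative_risk(risk_treatment, risk_control):
    """Risk in treatment group divided by risk in control group."""
    return risk_treatment / risk_control


def absolute_risk_difference(risk_treatment, risk_control):
    """Risk in control group minus risk in treatment group."""
    return risk_control - risk_treatment


risk_control = 2 / 1000    # baseline risk of stroke
risk_treatment = 1 / 1000  # risk after the drug "halves" it

rr = relative_risk(risk_treatment, risk_control)
ard = absolute_risk_difference(risk_treatment, risk_control)

print("relative risk:", rr)
print("strokes avoided per 1000 treated:", round(ard * 1000, 1))
```

The same trial result reads as a relative risk of 0.5 or as one stroke avoided per thousand patients treated, depending on which measure is reported.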
Number Needed to Treat
The number needed to treat (NNT) is the most clinically intuitive measure of effect size. It is the number of patients who must be treated for one to benefit:
NNT = 1 ÷ Absolute Risk Difference
An NNT of 10 means that for every 10 patients treated, one will benefit. The corresponding measure for harm is the number needed to harm (NNH): the number of patients treated for one to suffer an adverse outcome.
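A small sketch of the NNT calculation follows. It works from event counts with exact fractions to avoid floating-point surprises, and rounds up to a whole patient, which is the usual reporting convention. The function name and the trial counts are hypothetical.

```python
import math
from fractions import Fraction


def number_needed_to_treat(events_control, n_control, events_treatment, n_treatment):
    """NNT = 1 / absolute risk difference, rounded up to a whole patient."""
    ard = Fraction(events_control, n_control) - Fraction(events_treatment, n_treatment)
    if ard <= 0:
        raise ValueError("treatment shows no absolute risk reduction; NNT is undefined")
    return math.ceil(1 / ard)


# Hypothetical trial: 20% of controls and 10% of treated patients have the event.
print(number_needed_to_treat(20, 100, 10, 100))    # ARD = 0.10
# Stroke-style example: 2 in 1000 versus 1 in 1000.
print(number_needed_to_treat(2, 1000, 1, 1000))    # ARD = 0.001
```

The first call gives an NNT of 10 and the second an NNT of 1000, even though both treatments halve the relative risk.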
Approximate figures worth remembering:
- Aspirin in secondary prevention after myocardial infarction: NNT ≈ 50 to prevent one further vascular event over two years.
- Statins in secondary prevention: NNT ≈ 30 to prevent one major vascular event over five years.
- Antibiotics for sore throat: NNT ≈ 4000 to prevent one case of rheumatic fever in modern UK populations.
Statistical Significance
P-Values
The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis (no real difference) were true. A p-value below the conventional threshold of 0.05 is described as "statistically significant".
Two important caveats:
- Statistical significance is not the same as clinical importance. A massive trial can find a tiny, clinically irrelevant difference to be highly statistically significant.
- The 0.05 threshold is a convention, not a fact. The probability that a "significant" finding is a false positive depends on prior probability, sample size and the number of tests performed.
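The first caveat can be illustrated with a simple two-proportion z-test. This is a sketch using the normal approximation and invented event counts; a real trial analysis would follow its own pre-specified statistical plan.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_a - risk_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same risks in both scenarios: 10.0% versus 9.5% (hypothetical).
p_small = two_proportion_p_value(100, 1_000, 95, 1_000)
p_large = two_proportion_p_value(10_000, 100_000, 9_500, 100_000)

print("1,000 per arm:   p =", round(p_small, 3))
print("100,000 per arm: p =", format(p_large, ".1e"))
```

The half-percentage-point difference is nowhere near significant with 1,000 patients per arm but is highly significant with 100,000 per arm. The clinical importance of the difference is the same in both cases.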
Confidence Intervals
The 95% confidence interval (CI) is a frequentist concept: if the study were repeated many times, 95% of the resulting CIs would contain the true effect. The intuitive interpretation, "the true effect lies within this range with 95% probability", is strictly a Bayesian reading and is best avoided in formal statistical writing, although it captures the practical meaning. The CI is more informative than the p-value alone, because it shows both the magnitude and the precision of the estimate.
Two rules to remember:
- For a relative measure (relative risk, odds ratio): if the 95% CI includes 1, the result is not statistically significant.
- For an absolute measure (risk difference, mean difference): if the 95% CI includes 0, the result is not statistically significant.
A wide confidence interval means the data are imprecise; a narrow interval means they are precise. A wide interval that just excludes 1 is a far weaker finding than a narrow interval that excludes 1 by a large margin, even if both have p < 0.05.
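A confidence interval for a relative risk can be sketched with the standard log-transformation method. The method choice, the function name and the two hypothetical trials are assumptions for illustration; both trials have the same point estimate but very different precision.

```python
import math


def relative_risk_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


small_trial = relative_risk_ci(10, 100, 20, 100)      # 10% vs 20%, 100 per arm
large_trial = relative_risk_ci(100, 1000, 200, 1000)  # 10% vs 20%, 1000 per arm

for name, (rr, lower, upper) in (("small", small_trial), ("large", large_trial)):
    verdict = "includes 1" if lower <= 1 <= upper else "excludes 1"
    print(f"{name}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), {verdict}")
```

Both trials estimate a relative risk of 0.5, but only the larger trial has an interval narrow enough to exclude 1.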
Systematic Reviews and Meta-Analyses
Few clinical questions are settled by a single trial. The reliable answer comes from synthesising all the available trials together, a process formalised as the systematic review and, when statistical pooling is appropriate, the meta-analysis.
The Systematic Review
A systematic review is "an overview of primary studies that uses explicit and reproducible methods". The defining features are:
- Pre-specified protocol setting out the question, search strategy, inclusion and exclusion criteria, and analysis plan.
- Comprehensive search of multiple databases (Medline, Embase, Cochrane CENTRAL) and grey literature.
- Explicit eligibility criteria.
- Critical appraisal of each included study using a structured tool.
- Transparent synthesis of the results, narrative or quantitative.
- Reproducibility: another reviewer following the same protocol should reach the same conclusions.
The Cochrane Collaboration, founded in 1993 in Oxford, produces and maintains a library of high-quality systematic reviews and is the standard UK reference.
The Meta-Analysis
A meta-analysis is "a quantitative synthesis of the results of two or more primary studies that addressed the same hypothesis in the same way". A systematic review may include a meta-analysis, but does not have to; if the included studies are too clinically heterogeneous, statistical pooling is misleading.
The output of a meta-analysis is a pooled estimate of the effect size, with a 95% confidence interval. Each contributing study is weighted by its size and precision; larger trials with narrower confidence intervals contribute more to the pooled result.
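One common way to weight studies by precision is inverse-variance pooling, sketched below for a fixed-effect model. The three studies are hypothetical, each given as a log relative risk and its standard error; the function name and return structure are illustrative only.

```python
import math


def pool_fixed_effect(studies, z=1.96):
    """Inverse-variance pooled relative risk from (log_rr, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in studies]  # precise studies weigh more
    total = sum(weights)
    pooled_log = sum(w * log_rr for w, (log_rr, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1 / total)
    return {
        "rr": math.exp(pooled_log),
        "ci": (math.exp(pooled_log - z * pooled_se),
               math.exp(pooled_log + z * pooled_se)),
        "weights_percent": [100 * w / total for w in weights],
    }


# Hypothetical studies: RR 0.8, 0.7 and 0.9 with differing precision.
studies = [(math.log(0.8), 0.20), (math.log(0.7), 0.10), (math.log(0.9), 0.30)]
result = pool_fixed_effect(studies)

print("pooled RR:", round(result["rr"], 2))
print("95% CI:", tuple(round(x, 2) for x in result["ci"]))
print("weights (%):", [round(w, 1) for w in result["weights_percent"]])
```

The middle study has the smallest standard error, so it carries most of the weight and pulls the pooled estimate towards its own result.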
Forest Plots
The standard graphical summary of a meta-analysis is the forest plot. Each horizontal line represents one study; the box on the line is the point estimate of effect, with the size of the box proportional to the study's weight; the line itself is the 95% CI. A vertical line at "no effect" (1 for relative measures, 0 for absolute) lets the reader see at a glance which studies showed a significant effect. The pooled estimate is plotted as a diamond at the bottom.
Diagram: A forest plot summarising a meta-analysis. Each square is a study's point estimate (size = study weight); horizontal lines are 95% CIs. The diamond at the bottom is the pooled estimate; if its width crosses the line of no effect (RR = 1), the pooled finding is not statistically significant.
Heterogeneity and Fixed-versus-Random-Effects Models
The studies in a meta-analysis are rarely identical. Differences in patient population, intervention, comparator, outcome measure, or follow-up duration produce heterogeneity. Two statistical models exist for handling this:
- Fixed-effect model. Assumes that every study is estimating the same single underlying effect, and that any differences between them are due to chance alone. Larger studies are weighted heavily.
- Random-effects model. Assumes that the true effect varies between studies (because of differences in patients, settings, etc.), and that the studies are sampling from a distribution of true effects. Weights are more evenly distributed across studies, and the resulting confidence interval is wider.
If heterogeneity is low, the two models give similar answers. If heterogeneity is high, a random-effects model is usually preferred, although either choice can be defended.
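The contrast between the two models can be sketched with the DerSimonian-Laird estimator, one widely used random-effects method, alongside the I² statistic as a rough measure of heterogeneity. The choice of estimator is an assumption here, and the three deliberately inconsistent studies are invented for illustration.

```python
import math


def compare_models(studies):
    """Fixed-effect versus DerSimonian-Laird random-effects pooling.

    studies: list of (log_rr, standard_error) pairs, at least two.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    y = [log_rr for log_rr, _ in studies]
    v = [se ** 2 for _, se in studies]
    w = [1 / vi for vi in v]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q measures scatter around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    w_re = [1 / (vi + tau2) for vi in v]          # random-effects weights
    random_est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(random_est),
        "fixed_se": math.sqrt(1 / sum(w)),
        "random_se": math.sqrt(1 / sum(w_re)),
        "tau2": tau2,
        "i2": i2,
        "fixed_weights": [100 * wi / sum(w) for wi in w],
        "random_weights": [100 * wi / sum(w_re) for wi in w_re],
    }


# Hypothetical studies that disagree: RR 0.5, 0.9 and 0.7.
studies = [(math.log(0.5), 0.10), (math.log(0.9), 0.10), (math.log(0.7), 0.25)]
summary = compare_models(studies)

print("I2 (%):", round(summary["i2"], 1))
print("fixed weights (%): ", [round(x, 1) for x in summary["fixed_weights"]])
print("random weights (%):", [round(x, 1) for x in summary["random_weights"]])
print("SE fixed vs random:", round(summary["fixed_se"], 3), round(summary["random_se"], 3))
```

With these inconsistent studies the random-effects weights are spread more evenly and the pooled standard error is larger, which is what produces the wider confidence interval described above.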
Publication Bias
Trials with positive results are more likely to be published than trials with negative or null results. This is publication bias, and it inflates the apparent effect of any drug in any meta-analysis that misses the unpublished trials. The most common diagnostic tool is the funnel plot: the size of each study (vertical axis) is plotted against the effect estimate (horizontal axis). In the absence of publication bias the plot is symmetric; asymmetry suggests missing studies.
Trial registration in advance (now mandatory for NHS-supported trials, recorded on registries such as ClinicalTrials.gov and the ISRCTN registry) is the principal regulatory response to publication bias.
Pragmatic and Explanatory Trials
Trials sit on a spectrum between two extremes:
- Explanatory trials (also "efficacy trials") test whether a drug can work under ideal conditions. They use highly selected patient populations, strict protocols, and intensive follow-up. They answer the question "does this drug have any biological effect at all?".
- Pragmatic trials (also "effectiveness trials") test whether the drug does work in real-world clinical practice. They use less restrictive entry criteria, settings that resemble normal NHS care, and less intensive follow-up. They answer the question "should we use this drug in routine practice?".
The corresponding distinction in everyday language is between efficacy (does it work in ideal conditions?) and effectiveness (does it work in the real world?). A drug can have high efficacy but low effectiveness if patients don't take it, can't tolerate it, or don't represent the trial population.
Ethics of Clinical Trials
A clinical trial is a deliberate experiment on human beings, and ethical review is mandatory before any UK trial begins. The principles are set out in the Declaration of Helsinki and in UK law (the Medicines for Human Use (Clinical Trials) Regulations 2004) and overseen by Health Research Authority Research Ethics Committees.
Three concepts are routinely examined in the pre-clinical years of the course:
- Clinical equipoise. A trial is only ethical if there is genuine uncertainty in the expert community about which treatment is better. If we already know one treatment is superior, randomising patients to the inferior one is unethical.
- Informed consent. Participants must understand what the trial involves, the risks and benefits, and that they can withdraw at any time without affecting their care.
- Do no harm. Independent ethical review continues throughout the trial, with stopping rules in case of unexpected harm or overwhelming benefit.
Summary
Research methods underpin every recommendation in modern pharmacology.
- Evidence-based medicine integrates the best current evidence with clinical expertise and patient preference.
- The hierarchy of evidence places systematic reviews and meta-analyses at the top, followed by RCTs, then cohort and case-control studies.
- Randomisation, blinding and intention-to-treat analysis are the three principal tools used to minimise bias and confounding in a trial.
- Effect sizes are expressed as relative risk, absolute risk difference, and NNT; the absolute and relative figures should always be considered together.
- Statistical significance is summarised by the p-value and 95% confidence interval.
- A systematic review synthesises all available evidence; a meta-analysis pools the numbers, displayed as a forest plot.
- Trials must satisfy clinical equipoise, obtain informed consent, and operate under independent ethical review.
These principles are tested in the UK Prescribing Safety Assessment and underlie every NICE technology appraisal.
Reviewed by: Dr. Marcus Judge