Most readers of clinical research are not researchers. They are journalists, patients, family members, policy makers, and curious citizens trying to make sense of headlines that often summarize complex studies in misleading ways. The gap between what a study actually shows and what the surrounding coverage claims is, in many fields, considerable. In psychedelic research, the gap can be particularly large.
This article is a practical guide to reading clinical trial papers. It is not a methodology textbook. It is the set of habits and questions that working clinicians, methodologists, and careful science journalists use to evaluate what a study can support and what it cannot. We use psilocybin trials as our running example because they are recent, accessible, and methodologically representative of the broader landscape. The skills generalize directly to other medical and psychiatric research.
If you read all the way through and remember nothing else, remember this: in a clinical trial paper, the most important section is usually the Methods, not the Results. The Results tell you what happened. The Methods tell you what it means.
Start With the Question
Before you read the paper, you should be able to articulate what question it is trying to answer. This sounds trivial. It is not. Many readers go straight to the abstract, absorb a result, and never circle back to ask whether the result actually answers the question they had in mind.
A question that a single clinical trial can answer has a specific shape: “In patients with condition X, does treatment Y produce better outcomes on measure Z than treatment W (or placebo) over time period T?” Those five elements (which patients, what treatment, what outcome, what comparator, what duration) define the only kind of question any single trial can speak to.
Useful questions clinical trials usually cannot answer in a single study include: “Is this treatment going to work for me?” “Should this treatment be standard of care?” “What is the long-term safety profile across the general population?” These broader questions require accumulated evidence from many studies, often with different designs, and rarely admit confident answers from a single paper.
When you start a paper, write down — even just in your head — what you are hoping it can tell you. Then check, as you read, whether the study design can actually support that conclusion. Often it cannot, and that is the most important takeaway.
Read the Methods First
Most readers read papers in the order they are written: title, abstract, introduction, methods, results, discussion. Trained readers often skip directly to the methods after a quick glance at the abstract. The reason is simple: until you know how a study was conducted, the results do not have a defined meaning.
Several methodological elements deserve close attention.
The population. Who was studied? What were the inclusion and exclusion criteria? What were the demographic characteristics? In psilocybin trials, common exclusion criteria include personal or family history of psychotic disorders, cardiovascular disease, pregnancy, and current substance use disorders. These exclusions are sensible safety choices, but they mean the trial cannot speak to how psilocybin would affect anyone who would have been excluded. If a depression trial screened out patients with psychotic vulnerability, its results cannot be assumed to generalize to depression with comorbid psychotic features. This is not a flaw of the trial. It is a boundary on its claims.
The intervention. What exactly was administered, in what dose, over what schedule, in what setting? In psilocybin research, the intervention is usually a bundled package: drug dose plus preparation sessions plus structured dosing-day support plus integration follow-up. Treating the intervention as “psilocybin” alone misrepresents what the trial tested.
The comparator. What was the control or comparison group experiencing? An untreated waitlist is not the same comparator as an active drug; a 1 mg psilocybin “placebo” is not the same as a true placebo; a niacin pill producing flushing is not the same as an inert pill. The comparator defines what the intervention is being measured against, and weak comparators inflate apparent effects.
Randomization. Were participants assigned to groups by chance, or by some other procedure? Random assignment is the principal tool for ensuring that the groups are comparable before treatment. Non-randomized studies — observational designs, open-label single-arm trials, retrospective analyses — provide weaker evidence about cause and effect.
Blinding. Did the participants know which condition they were in? Did the people administering the treatment know? Did the people assessing outcomes know? Each of these is a separate question. Single-blind, double-blind, and triple-blind studies refer to different patterns of who knows what. In psychedelic research, true blinding is essentially impossible at meaningful doses. The participant knows; the staff usually know. This unblinding is, as we have discussed elsewhere, a structural limit on what these trials can conclude.
The outcome measures. What was being measured, and how? Was it a participant-reported scale (subject to expectancy effects), a clinician-rated measure (subject to rater bias unless the rater is blinded), or a biological outcome (more objective but possibly less clinically meaningful)? The choice of outcome measure shapes the apparent results substantially.
The Sample Size Conversation
A perennial question in reading clinical trials is whether the sample size was adequate. The honest answer is that it depends on the effect size being investigated and the variability of the outcome.
For very large effects in low-variability outcomes, small samples can produce informative findings. For modest effects in high-variability outcomes, very large samples may be needed. Most clinical trials are powered, statistically, to detect effects of a specified size with a specified probability. The methods section will usually report a sample size calculation that explains what the trial was designed to detect.
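The sample size calculation described above can be sketched with the standard normal-approximation formula for comparing two group means. This is a minimal illustration, not the calculation any particular trial used. The function name `n_per_group` and the defaults (two-sided alpha of 0.05, power of 0.80, equal allocation between arms) are assumptions chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect a standardized
    mean difference d (Cohen's d) with a two-sided test.

    Uses the normal approximation n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2,
    which slightly understates what an exact t-test calculation would give.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Illustrative effect sizes: small, moderate, large (assumed values).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per arm")
```

Under these assumptions, a moderate effect (d = 0.5) needs roughly 63 participants per arm, while a small effect (d = 0.2) needs nearly 400. Halving the effect size roughly quadruples the required sample.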
In psilocybin trials, sample sizes have historically been small: fewer than 50 participants in many published studies, sometimes fewer than 20. Small samples have several consequences. They can reliably detect only large effects. They inflate the apparent magnitude of the effects that are detected, because in a small trial only an unusually large observed difference reaches statistical significance. They are more vulnerable to chance findings. And they limit subgroup analyses, because dividing a small sample into smaller subgroups quickly leaves too few participants to draw conclusions from.
This is not, by itself, a reason to dismiss small studies. Pilot studies and feasibility studies are intentionally small. They are designed to answer narrow questions about whether a larger study is worth conducting. The problem arises when small pilot studies are interpreted, in the surrounding coverage, as if they answered the broader questions that only larger trials can address.
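One way to see why small trials overstate the effects they detect is to ask how large an observed difference must be before it counts as statistically significant at all. The sketch below uses a normal approximation in which the standard error of a standardized mean difference is about sqrt(2 / n) for n participants per arm. The function name `smallest_significant_d` and the arm sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def smallest_significant_d(n_per_group: int, alpha: float = 0.05) -> float:
    """Smallest observed standardized difference that reaches two-sided
    significance, under a normal approximation with equal arm sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(2 / n_per_group)  # approximate SE of observed d
    return z_alpha * standard_error


# Hypothetical arm sizes, from a small pilot to a large trial.
for n in (10, 25, 100, 400):
    print(f"n = {n} per arm: observed d must exceed "
          f"{smallest_significant_d(n):.2f} to be significant")
```

With 10 participants per arm, only observed effects near d = 0.9 or larger can reach significance in this approximation. If the true effect is smaller than that, every significant small trial necessarily reports an overestimate.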
What the Results Actually Say
When you finally turn to the results section, several specific habits help.
Look at the numbers, not just the verbal summary. The text of a results section frames the numbers in a way that reflects the authors’ interpretation. The tables and figures contain the data. Sometimes the data is stronger than the text suggests; sometimes it is weaker. Reading both lets you form an independent view.
Pay attention to effect sizes, not just statistical significance. A statistically significant result tells you that a difference at least as large as the one observed would be unlikely to arise by chance alone if the treatment truly had no effect. It does not tell you the difference is large or clinically meaningful. A trial with thousands of participants can detect tiny differences with high statistical significance; the differences may still be too small to matter for any individual patient.
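The separation between effect size and statistical significance can be made concrete with a rough calculation. The sketch below converts an observed standardized difference and a per-arm sample size into an approximate two-sided p-value, using the same normal approximation as a simple z-test. The function name `two_sided_p` and both scenarios are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for an observed standardized
    difference d with n_per_group participants in each arm."""
    z = d * sqrt(n_per_group / 2)  # test statistic under the approximation
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A tiny effect in a very large trial (assumed numbers).
tiny_effect_big_trial = two_sided_p(d=0.05, n_per_group=5000)
# A large effect in a very small trial (assumed numbers).
large_effect_small_trial = two_sided_p(d=0.8, n_per_group=10)

print(f"d = 0.05, n = 5000 per arm: p = {tiny_effect_big_trial:.3f}")
print(f"d = 0.80, n = 10 per arm:   p = {large_effect_small_trial:.3f}")
```

In this illustration the negligible effect is significant (p of about 0.01) and the large effect is not (p of about 0.07). The p-value reflects sample size as much as it reflects the size of the effect, which is why the two have to be read separately.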
Look at the variability. Means and medians summarize the typical response. Standard deviations, interquartile ranges, and the distribution of outcomes tell you whether that summary reflects a tightly clustered response or wide variation. In psychiatric research, response is often highly variable, and the average can hide the fact that some participants improved dramatically and others not at all.
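A small invented example shows how the same average can describe very different trials. Both lists below are hypothetical changes in a depression score (negative means improvement) for ten participants. The numbers are made up for illustration and do not come from any study.

```python
from statistics import mean, stdev

# Hypothetical score changes: everyone improves by a similar amount.
uniform_response = [-7, -8, -9, -8, -8, -7, -9, -8, -8, -8]
# Hypothetical score changes: half improve dramatically, half not at all.
split_response = [-16, -16, -16, -16, -16, 0, 0, 0, 0, 0]

for label, changes in (("uniform", uniform_response),
                       ("split", split_response)):
    print(f"{label}: mean change = {mean(changes):.1f}, "
          f"standard deviation = {stdev(changes):.1f}")
```

Both groups have a mean change of -8, but the standard deviation is under 1 in the first and over 8 in the second. A paper that reports only the mean cannot distinguish between them.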
Look at the dropout rates and the handling of missing data. Patients who leave a trial early may differ systematically from those who stay. How the analysis handles their absence substantially affects results: an intention-to-treat analysis includes everyone as randomized, while a completer analysis includes only those who finished the trial.
Look at the adverse events. Reports that focus exclusively on efficacy and minimize discussion of harms and difficult experiences are incomplete. Honest reports include both, and the language used to describe adverse events often tells you something about the authors’ framing. Generic terms like “mild and transient” may obscure events that participants experienced as significant.
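The effect of the analysis choice can be shown with invented numbers. In the sketch below, each arm randomizes 20 participants, and more people drop out of the treatment arm. The `Arm` class, its field names, and all counts are hypothetical. Counting every dropout as a non-responder is one conservative convention for an intention-to-treat analysis, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    randomized: int   # everyone assigned to this arm
    completed: int    # participants who finished the trial
    responders: int   # responders among those who completed

    def completer_rate(self) -> float:
        """Response rate among participants who finished."""
        return self.responders / self.completed

    def itt_rate(self) -> float:
        """Response rate among everyone randomized, treating each
        dropout as a non-responder (a conservative assumption)."""
        return self.responders / self.randomized


# Hypothetical trial: 6 dropouts on treatment, 2 on control.
treatment = Arm("treatment", randomized=20, completed=14, responders=9)
control = Arm("control", randomized=20, completed=18, responders=6)

for arm in (treatment, control):
    print(f"{arm.name}: completer {arm.completer_rate():.0%}, "
          f"intention-to-treat {arm.itt_rate():.0%}")
```

In this example the gap between arms is about 31 percentage points among completers and 15 points under intention-to-treat. The same data support noticeably different headlines depending on which analysis is reported.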
The Discussion Section
The discussion section is where the authors interpret their results, place them in the context of prior research, and acknowledge limitations. It is also where the most enthusiastic language often appears.
Two questions are particularly useful when reading a discussion.
What do the authors themselves identify as limitations? Conscientious authors enumerate the limits of their work. If the discussion of limitations is brief and dismissive, that is itself a signal. If it is detailed and candid, the rest of the paper is more credible. In the Hopkins psilocybin trials, the limitations sections are notably extensive — this is a feature of the field’s better work.
Do the conclusions actually follow from the data? Compare the specific claims in the discussion to the specific results in the tables. Sometimes the discussion extends well beyond what the data can support, particularly in the closing paragraphs that look toward future implications. Trained readers learn to notice the gap.
Conflicts of Interest and Funding
Modern clinical research is conducted in a landscape where most large trials are funded by parties with financial stakes in their outcome. This is not, by itself, evidence of wrongdoing. It does require attention.
Look at who funded the trial, who employs the authors, and who holds intellectual property related to the intervention. These are usually disclosed at the end of the paper, often in a section labeled “Conflicts of Interest” or “Funding Sources.”
A trial funded by an industry sponsor is not automatically suspect, but it is statistically more likely to report positive findings than an independently funded trial. Several meta-analyses have documented this pattern across multiple fields of medicine. Trials registered in advance, with pre-specified primary outcomes that match what is reported in the published paper, are more credible than trials where the outcomes appear to have been chosen after the data was in.
In contemporary psilocybin research, an increasing proportion of trials are sponsored by companies pursuing regulatory approval. This is normal for any drug development pathway. It does mean that contemporary readers should pay particular attention to funding sources, pre-registration, and the consistency between reported and pre-specified outcomes.
Putting It Together
A useful way to integrate what you have read is to write, in your own words, two sentences when you finish a trial paper:
- What this study actually showed. The narrow, specific finding, in the population studied, with the intervention tested, against the comparator used, on the outcome measured, at the time point reported.
- What this study did not show. The larger questions you might have hoped the study addressed but that the methodology cannot support.
If you can write those two sentences without reaching for the abstract, you have understood the paper. If you cannot, you have not yet read it as carefully as it warrants.
A Worked Example
Consider, briefly, a hypothetical psilocybin depression trial that finds a 50 percent response rate in a sample of 30 patients with treatment-resistant depression, compared to 25 percent in a 1 mg psilocybin “placebo” arm, at four weeks post-treatment. The press release reports an “unprecedented breakthrough.”
A careful reader can quickly identify what this hypothetical trial supports and does not support.
Supports: Some signal that a higher dose outperforms a very low dose in this small sample of selected patients over a short follow-up window. This justifies further investigation.
Does not support: General clinical effectiveness, durability of effect, generalizability to less selected populations, or the absence of expectancy-driven inflation of effect (the 1 mg “placebo” was likely identifiable to participants).
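A rough uncertainty calculation makes the same point numerically. The example does not specify arm sizes, so the sketch below assumes, purely for illustration, that the 30 patients were split 18 to the higher dose (9 responders, 50 percent) and 12 to the 1 mg arm (3 responders, 25 percent). It computes a simple Wald confidence interval for the difference in response rates. That interval is known to be crude at sample sizes this small, so treat the output as indicative only. The function name `risk_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(responders_a: int, n_a: int,
                       responders_b: int, n_b: int,
                       level: float = 0.95) -> tuple[float, float, float]:
    """Difference in response rates (arm A minus arm B) with a Wald
    confidence interval. Returns (difference, lower, upper)."""
    p_a = responders_a / n_a
    p_b = responders_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se


# Assumed split of the 30 patients: 18 higher dose, 12 on 1 mg.
diff, lower, upper = risk_difference_ci(9, 18, 3, 12)
print(f"difference = {diff:.0%}, 95% CI from {lower:.0%} to {upper:.0%}")
```

Under these assumptions the 25-point difference comes with an interval running from slightly below zero to nearly 60 points. The data are compatible with a very large benefit and with no benefit at all, which is what "some signal" means in practice.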
The press release’s framing is enthusiastic in a way the trial cannot support. A careful reader notices, treats the trial as preliminary, and waits for replication in larger and better-controlled studies.
This is not pedantry. It is the basic operating posture of evidence-based medicine. Most readers do not need to become experts. They need to maintain enough humility about single studies that the broader weight of evidence, over time, can shape their conclusions.
A Few Habits
If you are not already in the habit of reading primary research, several small practices will help you build the skill.
Pick a topic you care about and read three full trials on it. Not abstracts. Full papers, including the methods. After three trials, you will start to notice patterns in how trials in a given field are designed and reported.
When you read coverage of a new trial in the press, look up the original paper. Compare what the paper actually says to what the coverage claims. The exercise is consistently humbling.
Subscribe to a methodology blog or newsletter that critically evaluates new research. Several exist in psychiatry and adjacent fields. Reading critiques alongside the trials sharpens your own reading.
Notice when you are most tempted to overclaim. The trials that produce the most exciting headlines are the trials whose limitations are most likely to be overlooked. Building the habit of reading carefully under those conditions is most of what research literacy means in practice.
The evidence base for psilocybin in depression and other conditions is, at this writing, real but incomplete. Reading it carefully is part of how a serious reader can stay in honest relationship to a developing field. The same skills, applied broadly, are part of how an informed citizen can navigate the larger landscape of medical research without being misled by the loudest claims.
This article is part of the Magic Mushroom Institute’s research literacy series. We provide tools for reading and evaluating clinical research. Last reviewed May 2026.