Category: Statistics

  • Causality, the Philosophy, Evaluation, and the Tawhidic View

Understanding causality is essential in everyday life because it shapes how people make decisions, assign responsibility, and anticipate outcomes. From simple actions such as taking medicine to relieve pain, to complex choices like implementing public health policies, people rely on assumptions about cause and effect. When these assumptions are unclear or mistaken, decisions may be ineffective or harmful. Reflecting on causality is therefore not merely philosophical; it directly affects daily routines, professional judgement, and ethical responsibility.

    The classical philosophical discussion of causality begins with Aristotle in the 4th century BCE. Aristotle proposed that a complete explanation of anything requires four causes. The material cause explains what something is made of, the formal cause explains what makes it the kind of thing it is, the efficient cause explains what brings it about or produces change, and the final cause explains its purpose. These causes work together rather than separately. Aristotle assumed that causes have real power in nature. Under similar conditions, similar causes will tend to produce similar effects. Nature, in this view, is orderly, purposeful, and intelligible, and human reason can understand how it operates.

    This understanding was critically examined within Islamic thought, most notably by Al-Ghazali in the 11th century CE, particularly in Tahafut al-Falasifah written around 1095. Al-Ghazali challenged the idea that natural objects possess intrinsic causal power. His critique focused on efficient causation and the notion of natural necessity. He argued that observing events occurring regularly together does not prove that one causes the other by itself. Fire does not burn by its own power, and medicine does not heal by itself. Rather, Allah creates both the apparent cause and the effect at each moment. The regularity observed in nature reflects divine custom, not independent natural necessity. Al-Ghazali did not deny purpose, but he rejected the idea that purpose is built into nature itself. Final causation, in his view, belongs to divine wisdom rather than autonomous natural processes.

    Modern discussions of causality emerged strongly in the 18th century CE through the work of David Hume, especially his writings published around 1748. Hume argued that humans never observe necessary connections between events. What we observe are repeated patterns, from which we form expectations through habit. Causality therefore becomes an inference rather than a certainty. This view influenced modern science, where causation is treated as probabilistic and open to revision. Rather than claiming absolute certainty, science evaluates causal claims based on evidence, consistency, and explanatory value.

    In applied sciences, particularly epidemiology, causality is evaluated using structured reasoning rather than philosophical proof. Austin Bradford Hill articulated this approach in 1965 by proposing considerations to assess whether an observed association is likely to be causal. These considerations accept uncertainty as unavoidable and focus on judgement rather than necessity. Causality in modern science is therefore practical, evidence-based, and aimed at guiding decisions rather than establishing metaphysical truths.

    From a tawhidic perspective, Muslims engage with all these levels of causality while maintaining a clear theological position. Islam affirms that Allah is the ultimate cause of all events. Natural causes, regularities, and scientific laws are real at the level of human experience and reasoning, but they operate only by divine permission. This allows Muslims to accept empirical causality for evaluation and action, while rejecting the idea that nature possesses independent or self-sustaining power. Causality therefore operates at two levels, an observable level that supports scientific inquiry and decision-making, and an ultimate level grounded in tawhid, where all power, purpose, and outcome return to Allah.

    In this way, causality is not rejected but properly ordered. Philosophy explains its structure, science evaluates it through evidence, and the tawhidic worldview places it within a coherent and meaningful understanding of reality and daily life.

  • The Evolution of Statistical Inference: From Formulas to Computers

    Statistics is the science of learning from data. Every time researchers use a sample to understand a population, they are practising statistical inference. Over the past century, the way we make these inferences has changed dramatically. Each new approach has brought a different philosophy about what “truth” means and how we can best estimate it.

    This article explains the main schools of thought in statistical inference, showing how they evolved from the early 1900s to the computer age. The discussion draws on Efron and Hastie’s Computer Age Statistical Inference (2021), which describes how classical, Bayesian, resampling, and modern computational methods all aim to uncover truth from limited data.

    The Classical or Frequentist Era

    The first formal school of statistical inference is known as the classical or frequentist approach. Developed by statisticians such as Ronald Fisher, Jerzy Neyman, and Egon Pearson in the early twentieth century, this framework treats the truth as something fixed but unknown. The data we observe are random samples from a larger population, and by studying these samples, we can estimate the true values that describe the population (Efron & Hastie, 2021).

    In this view, probability represents the long-run frequency of events. For example, if we say there is a 5 percent chance of a test result being significant when there is no real effect, we mean that if we repeated the same experiment many times, about 5 out of 100 would show a false signal.
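
To see this long-run interpretation in action, the short R sketch below (an illustration added here, not drawn from Efron and Hastie) simulates many experiments in which there is truly no effect and counts how often a t-test comes out "significant" at the 5 percent level.

# Simulate the long-run meaning of a 5% significance level (hypothetical settings)
set.seed(123)
n_experiments <- 10000   # number of repeated experiments under "no real effect"
n_per_group <- 30        # hypothetical sample size per group

p_values <- replicate(n_experiments, {
  control <- rnorm(n_per_group, mean = 0, sd = 1)
  treatment <- rnorm(n_per_group, mean = 0, sd = 1)   # same distribution: no true effect
  t.test(control, treatment)$p.value
})

# The proportion of falsely "significant" results should be close to 0.05
mean(p_values < 0.05)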

    A key tool in this school is maximum likelihood estimation (MLE). It finds the values of unknown parameters that make the observed data most likely. This approach became the foundation for many classical methods, including hypothesis testing and confidence intervals. Classical inference is elegant and mathematically precise, but it depends on strong assumptions and analytical formulas. As problems became more complex, those formulas were often too difficult to compute.
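
As a minimal sketch of the idea (with hypothetical data, not taken from the book), the R code below finds the maximum likelihood estimate of a binomial proportion by searching over the log-likelihood and compares it with the closed-form answer, the sample proportion.

# Maximum likelihood estimation of a binomial proportion (hypothetical data)
x <- 18   # observed number of successes
n <- 50   # number of trials

# Log-likelihood of the binomial model as a function of the unknown proportion p
log_lik <- function(p) dbinom(x, size = n, prob = p, log = TRUE)

# Numerical search for the value of p that makes the observed data most likely
mle <- optimize(log_lik, interval = c(0.001, 0.999), maximum = TRUE)$maximum

mle     # approximately 0.36
x / n   # the closed-form MLE: the sample proportion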

    The Bayesian Revival

    An alternative view, known as Bayesian inference, was proposed much earlier, by Reverend Thomas Bayes in the 1700s. However, it became practical only in the late twentieth century, when computers made its calculations possible. Bayesian inference treats truth as something we hold beliefs about rather than something fixed and unknown.

    In this view, we start with a prior belief—what we think is likely before seeing any data—and then update it using the evidence we collect to form a posterior belief. The process reflects how humans naturally think and learn. For instance, a doctor might believe a patient probably has a certain illness based on symptoms, but then revise that belief after seeing lab results.

    Unlike classical inference, which relies on repeated sampling theory, Bayesian methods focus on how data change our level of belief. This approach is flexible and intuitive, but it requires specifying prior beliefs, which can introduce subjectivity. With modern computing, especially techniques such as Markov Chain Monte Carlo (MCMC), Bayesian methods have become widely used in fields such as medicine, economics, and artificial intelligence (Efron & Hastie, 2021).
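
The following R sketch is a deliberately simple illustration (hypothetical numbers and a conjugate prior, so no MCMC is required): a Beta prior for a proportion is updated with observed data to give a posterior mean and a 95% credible interval.

# Bayesian updating of a proportion with a conjugate Beta prior (hypothetical numbers)
prior_a <- 2
prior_b <- 2                 # a vague prior roughly centred on 0.5
x <- 18                      # observed events
n <- 50                      # observations

# By conjugacy, the posterior is Beta(prior_a + x, prior_b + n - x)
post_a <- prior_a + x
post_b <- prior_b + (n - x)

post_a / (post_a + post_b)                 # posterior mean
qbeta(c(0.025, 0.975), post_a, post_b)     # 95% credible interval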

    The Resampling Revolution

    By the 1980s, statisticians began to take advantage of computers to bypass complicated formulas entirely. Bradley Efron introduced the bootstrap, a resampling method that lets data “speak for themselves.” Instead of depending on mathematical derivations, the bootstrap repeatedly resamples from the observed data to estimate variability and uncertainty (Efron & Hastie, 2021).

    This approach belongs to what Efron and Hastie call the computer age of inference. It does not require assumptions about theoretical distributions or prior beliefs. Instead, it uses the computer to generate thousands of simulated datasets from the original sample. By examining how results vary across these resamples, statisticians can understand how stable or uncertain their findings are.
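
As a minimal sketch using hypothetical skewed data, the base R code below bootstraps the sample median, a statistic with no simple textbook formula for its standard error, and reads off a percentile interval.

# Bootstrap standard error and percentile CI for a sample median (hypothetical data)
set.seed(2024)
observed <- rexp(100, rate = 0.1)   # a skewed sample, e.g. hypothetical lengths of stay

n_boot <- 5000
boot_medians <- replicate(n_boot, median(sample(observed, replace = TRUE)))

sd(boot_medians)                          # bootstrap estimate of the standard error
quantile(boot_medians, c(0.025, 0.975))   # 95% percentile bootstrap interval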

    Resampling methods changed the practice of statistics. They made inference accessible for complex problems where mathematical solutions were impossible. The bootstrap also bridged classical and modern approaches, keeping the idea of estimation but relying on computation instead of theory.

    The Modern Predictive and Machine Learning Era

    As data grew larger and more complex, statisticians faced a new challenge. Classical and Bayesian models often became too limited or too slow to handle modern datasets with thousands of variables. This led to new methods that emphasised prediction rather than pure inference.

    Techniques such as penalised regression (for example, ridge and lasso regression) and machine learning algorithms emerged to handle this complexity. These approaches trade a little accuracy for much greater stability and predictive power. Instead of focusing on exact parameter estimates, they aim to predict outcomes reliably for new data (Efron & Hastie, 2021).
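
To make the idea concrete, the R sketch below fits ridge and lasso regressions with the glmnet package on simulated data containing many predictors; the data, seed, and tuning choices are hypothetical.

# Penalised regression on simulated high-dimensional data (illustration only)
library(glmnet)

set.seed(7)
n <- 200
p <- 100
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)   # only the first two predictors matter

# alpha = 0 gives ridge, alpha = 1 gives lasso; lambda is chosen by cross-validation
ridge_fit <- cv.glmnet(x, y, alpha = 0)
lasso_fit <- cv.glmnet(x, y, alpha = 1)

# Lasso sets most coefficients exactly to zero; ridge only shrinks them towards zero
sum(coef(lasso_fit, s = "lambda.min") != 0)
predict(lasso_fit, newx = x[1:5, ], s = "lambda.min")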

    Philosophically, this represents a shift from “What is the true parameter?” to “Can we make good predictions?” Machine learning methods such as decision trees and neural networks no longer rely on probability theory in the traditional sense. They learn directly from data patterns and have transformed fields such as healthcare, finance, and climate science.

    Comparing the Philosophies

    Each school of inference reflects a different way of thinking about knowledge and truth.

For each school, the summary below gives its view of truth, its view of probability, and the key question it asks.

• Classical (Frequentist): truth is fixed and data are random; probability is a long-run frequency; the key question is "What value makes the observed data most likely?"
• Bayesian: truth is uncertain and beliefs can change; probability is a degree of belief; the key question is "How should we update what we believe after seeing the data?"
• Resampling: truth can be estimated directly from the data; probability appears as empirical variation; the key question is "What do the data themselves say about uncertainty?"
• Modern predictive / machine learning: truth may be too complex to model; probability is often not used explicitly; the key question is "How can we best predict new outcomes?"

    Despite their differences, these schools share one purpose: to draw meaningful conclusions from imperfect data. Each arose to overcome the limitations of the previous one. The classical approach provided solid mathematical foundations. The Bayesian approach added flexibility and belief updating. Resampling empowered statisticians through computation. Modern predictive methods embraced the complexity of real-world data.

    Conclusion

    The story of statistical inference is the story of how humans have tried to reason about the unknown. From equations written by hand to millions of simulations run by computers, each generation of statisticians has pushed the boundaries of what can be learned from data.

    Efron and Hastie (2021) describe this journey as a transition from the “formula age” to the “computer age.” The essence of inference, however, remains the same: using limited evidence to understand the world. Whether through classical, Bayesian, resampling, or machine learning approaches, all aim to find truth in uncertainty and to make knowledge from data.

    References

    Efron, B., & Hastie, T. (2021). Computer age statistical inference: Student edition. Cambridge University Press.

  • Statistics and Machine Learning in Public Health: When to Use What

    If you’re trained in epidemiology or biostatistics, you likely think in terms of models, inference, and evidence. Now, with machine learning entering the scene, you’re probably hearing about algorithms that can “predict” disease, “detect” outbreaks, and “learn” from data. But while ML offers exciting possibilities, it’s important to understand how it differs from classical statistical approaches—especially when public health decisions depend on more than just prediction.

    Let’s explore how statistics and machine learning differ—not just in technique, but in mindset, use case, and the all-important concept of causality.

    How They Think

    Statistics and machine learning begin with different goals.

    Statistics is built to answer questions like: Does exposure X cause outcome Y? It aims to explain relationships, test hypotheses, and estimate effect sizes. It relies on assumptions—like randomness, independence, and model structure—to ensure that findings reflect the real world, not just the sample at hand.

    Machine learning, in contrast, asks: Given this data, what outcome should I predict? It doesn’t aim to explain but to perform—minimising error and maximising predictive accuracy, even if the relationships are complex or difficult to interpret.

    That’s a major shift. While statistics seeks truth about the population, ML seeks performance in unseen data.

    How They Work

    Statistical methods are grounded in probability theory and estimation. They involve fitting models with interpretable parameters: coefficients, confidence intervals, p-values. The analyst usually specifies the form of the model in advance, guided by theory and prior evidence.

    Machine learning models are trained through algorithms, often using large datasets and iterative techniques to optimise performance. Models like decision trees, support vector machines, and random forests find patterns without assuming linearity or distribution. You don’t always know what the model is “looking at”—you just know if it works.

    There are also hybrid approaches—like regularised regression, ensemble models, and causal forests—that blend the logic of both.

    What They Do Well

    Statistics excels in clarity and rigour. It tells you not just whether something matters, but how much, and with what certainty. It’s ideally suited for:

• Identifying risk factors
• Estimating treatment effects
• Designing policy interventions
• Publishing findings with transparent reasoning

    Machine learning is best when:

• Relationships are non-linear or unknown
• You have many predictors and large datasets
• You need fast, repeatable predictions (e.g. real-time risk scoring)
• The goal is performance, not explanation

In short, statistics helps you understand, while ML helps you predict.

    Where They Fall Short

    Statistics can break down when data gets messy—especially when model assumptions are violated or the number of variables overwhelms the number of observations. It also isn’t built to handle unstructured data like images or free text.

    Machine learning’s biggest limitation is often overlooked: it doesn’t care about causality. A model may predict hospitalisation risk with 95% accuracy, but it doesn’t tell you why. It might rely on variables that are associated, not causal. Worse, it might act on misleading proxies that look predictive but don’t offer actionable insight.

    This matters deeply in public health. Predicting who dies is not the same as preventing death. Models that ignore cause can lead to misguided interventions or unjust decisions.

    Another weakness of ML is interpretability. Many powerful algorithms (like gradient boosting or neural networks) are “black boxes”—hard to explain and harder to justify in policy decisions. While newer tools like SHAP can improve transparency, they still fall short of the clarity offered by traditional statistical models.

    When to Use Each

    Use statistics when:

• Your primary goal is inference or explanation
• You need to estimate effects or support causal conclusions
• You’re informing policy or making ethical decisions
• You want results that are interpretable and reportable

    Use machine learning when:

• Your primary goal is prediction or classification
• You’re handling high-dimensional or complex data
• You need scalable automation (e.g. early warning systems)
• You can validate predictions with real-world data

    Most importantly, if causality matters, don’t rely solely on ML—use statistical thinking or causal ML techniques that explicitly model counterfactuals and assumptions.

    What You Should Expect

    From statistics, expect:

• Clear models with interpretable outputs
• Transparent assumptions
• The ability to test hypotheses and quantify uncertainty

    From machine learning, expect:

• High performance with minimal assumptions
• Useful predictions even when mechanisms are unknown
• Some loss of interpretability (unless addressed deliberately)

    Just remember: good prediction doesn’t imply good understanding. And good models don’t always lead to good decisions—unless we interpret them wisely.

    A Path Forward for Epidemiologists and Biostatisticians

    Here’s the good news: your training in statistics and epidemiology is not a limitation—it’s your greatest asset. You already understand data, confounding, validity, and generalisability. You’re equipped to evaluate models critically and ask: Does this make sense? Is it actionable? Is it ethical?

Start small. Try ML approaches that are extensions of what you know—like regularised logistic regression, decision trees, or ensemble methods (a small starter sketch follows the list below). Explore tools like caret, tidymodels, or scikit-learn. And when you’re ready to dive deeper, look into causal ML methods like:

    • Targeted maximum likelihood estimation (TMLE)
    • Causal forests (grf)
    • Double machine learning (EconML)
    • DoWhy (for structural causal models)
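
The sketch below is one hypothetical starting point in R: a simple classification tree fitted with the rpart package to simulated data whose variables and outcome are invented purely for illustration.

# A small first step: a classification tree on simulated (hypothetical) data
library(rpart)

set.seed(1)
n <- 500
dat <- data.frame(
  age = rnorm(n, mean = 55, sd = 12),
  smoker = rbinom(n, size = 1, prob = 0.3),
  sbp = rnorm(n, mean = 130, sd = 15)
)

# Hypothetical outcome whose risk rises with age, smoking, and blood pressure
risk <- plogis(-10 + 0.08 * dat$age + 0.9 * dat$smoker + 0.03 * dat$sbp)
dat$event <- factor(rbinom(n, size = 1, prob = risk), levels = c(0, 1), labels = c("no", "yes"))

# Fit and inspect the tree
tree_fit <- rpart(event ~ age + smoker + sbp, data = dat, method = "class")
print(tree_fit)

# Predicted probabilities for new (hypothetical) individuals
newdata <- data.frame(age = c(45, 70), smoker = c(0, 1), sbp = c(120, 150))
predict(tree_fit, newdata = newdata, type = "prob")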

    The best analysts of the future won’t just be statisticians or ML engineers—they’ll be methodologically bilingual, able to switch between explanation and prediction as the question demands.

    Your role isn’t to replace one with the other, but to integrate both—so that public health remains not just data-driven, but wisely so.

  • Training Critical Thinking and Logical Thinking in the Age of AI for Biostatistics and Epidemiology

    The arrival of generative AI tools like ChatGPT is changing the way we teach and practise biostatistics and epidemiology. Tasks that once took hours, like coding analyses or searching for information, can now be completed within minutes by simply asking the right questions. This development brings many opportunities, but it also brings new challenges. One of the biggest risks is that students may rely too much on AI without properly questioning what it produces.

    In this new environment, our responsibility as educators must shift. It is no longer enough to teach students how to use AI. We must now teach them how to think critically about AI outputs. We must train them to question, verify and improve what AI generates, not simply accept it as correct.

    Why critical thinking is important

    AI produces answers that often sound very convincing. However, sounding convincing is not the same as being right. AI tools are trained to predict the most likely words and patterns based on large amounts of data. They do not understand the meaning behind the information they provide. In biostatistics and epidemiology, where careful thinking about study design, assumptions and interpretation is vital, careless use of AI could easily lead to wrong conclusions.

    This is why students must develop a critical and questioning attitude. Every output must be seen as something to be checked, not something to be believed blindly.

    Recent academic work also supports this direction. Researchers have pointed out that users must develop what is now called “critical AI literacy”, meaning the ability to question and verify AI outputs rather than accept them passively (Ng, 2023; Mocanu, Grzyb, & Liotta, 2023). Although the terms differ, the message is the same: critical thinking remains essential when working with AI.

    How to train critical thinking when using AI

    Build a sceptical mindset

Students should be taught from the beginning that AI is only a tool. It is not a source of truth. It should be treated like a junior intern: helpful and fast, but not always right. They should learn to ask questions such as:

• What assumptions are hidden in this output?
• Are the methods suggested suitable for the data and research question?
• Is anything important missing?

    Simple exercises, like showing students examples of AI outputs with clear mistakes, can help build this habit.

    Teach structured critical appraisal

    To help students evaluate AI outputs properly, it is useful to give them a structured way of thinking. A good framework involves five main points:

    First, methodological appropriateness

    Students must check whether the AI suggested the correct statistical method or study design. For example, if the outcome is time to death, suggesting logistic regression instead of survival analysis would be wrong.
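
A quick check in R can make the point concrete; the sketch below uses the lung dataset that ships with the survival package and fits a Cox model for the time-to-death outcome, the kind of method students should expect in place of logistic regression here.

# Time-to-event outcome: survival analysis, not logistic regression
library(survival)

# 'lung' ships with the survival package: time = follow-up days, status = censored/dead
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox_fit)

# A logistic model of death alone would ignore follow-up time and censoring,
# which is exactly the kind of mismatch students should learn to spot.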

    Second, assumptions and preconditions

    Every method has assumptions. Students must identify whether these assumptions are mentioned and whether they make sense. If assumptions are not stated, students must learn to recognise them and decide whether they are acceptable.

    Third, completeness and relevance

    Students should check whether the AI output missed important steps, variables or checks. For instance, has the AI forgotten to adjust for confounding factors? Is stratification by key variables missing?

    Fourth, logical and statistical coherence

    The reasoning must be checked for soundness. Are the conclusions supported by the results? Is there any step that does not follow logically?

    Fifth, source validation and evidence support

    Students should verify any references or evidence provided. AI sometimes produces references that do not exist or that are outdated. Cross-checking with real sources is necessary.

    By using these five points, students can build a habit of structured checking, instead of relying on their instincts alone.

    Encourage comparison and cross-verification

    Students should not depend on one AI output. They should learn to ask the same question in different ways and compare the answers. They should also check against textbooks, lectures, or real research papers.

    Practise reverse engineering

    One effective exercise is to give students an AI-generated answer with hidden mistakes and ask them to find and correct the errors. This strengthens their ability to read carefully and think independently.

    Make students teach back to AI

    Another good practice is to ask students to correct the AI. After finding an error, they should write a prompt that explains the mistake to the AI and asks for a better answer. Being able to explain an error clearly shows true understanding.

    Why logical thinking in coding and analysis planning remains essential

Although AI can now generate code and suggest analysis steps, it does not replace the need for human logical thinking. Writing good analysis plans and coding correctly require structured reasoning. Without this ability, students will not know how to guide AI properly, how to spot mistakes, or how to build reliable results from raw data.

    Logical thinking in analysis means asking and answering step-by-step questions such as:

• What is the research question?
• What are the variables and their roles?
• What is the right type of analysis for this question?
• What assumptions need to be checked?
• What is the correct order of steps?

    If students lose this skill and depend only on AI, they will not be able to detect when AI suggests inappropriate methods, forgets a critical step, or builds a wrong model. Therefore, teaching logical thinking in data analysis planning and coding must stay an important part of the curriculum.

    Logical planning and good coding are not simply technical skills. They reflect the student’s ability to reason clearly, to see the structure behind the problem, and to create a defensible path from data to answer. These are skills that no AI can replace.

    Ethical use of generative AI and the need for transparency

    Along with critical and logical thinking, students must also be trained to use generative AI tools ethically. They must understand that using AI does not remove their professional responsibility. If they rely on AI outputs for any part of their work, they must check it, improve it where needed, and take ownership of the final product.

    Students should also be taught about data privacy. Sensitive or identifiable information must never be shared with AI platforms, even during casual exploration or practice. Responsibility for patient confidentiality, research ethics, and academic honesty remains with the human user.

    Another important point is transparency. Whenever AI tools are used to assist in study design, data analysis, writing or summarising, this use should be openly declared. Whether in academic assignments, published articles or professional reports, readers have the right to know how AI was involved in shaping the content. Full and honest declaration supports academic integrity, maintains trust, and shows respect for the standards of research and publication.

    Students should be guided to include a simple statement such as:

    “An AI tool was used to assist with [describe briefly], and the final content has been reviewed and verified by the author.”

    By practising transparency from the beginning, students learn that AI is not something to hide, but something to use responsibly and openly.

    Building a modern curriculum

    To prepare students for this new reality, we must design courses that combine:

• Training in critical thinking when using AI outputs
• Training in logical thinking for building analysis plans and writing code
• Training in ethical use and transparent declaration of AI assistance

    Students should be given real-world tasks where they must plan analyses from scratch, use AI as a helper but not as a leader, check every output carefully, and justify every step they take. They should also be trained to reflect on the choices they make, and on how to improve AI suggestions if they find them weak or incorrect.

    By doing this, we can prepare future biostatisticians and epidemiologists who are not only technically skilled but also intellectually strong and ethically responsible.

    A new way forward

    Teaching students to use AI critically is not just a good idea. It is essential for the future. In biostatistics and epidemiology, where errors can affect public health and policy, we must prepare a new generation who can use AI wisely without losing their own judgement.

    The best users of AI will not be those who follow it blindly, but those who can guide it with intelligence, knowledge and ethical care. Our role as teachers is to help students become leaders in the AI age, not followers.

    References

    Ng, W. (2023). Critical AI literacy: Toward empowering agency in an AI world. AI and Ethics, 3(1), 137–146. https://doi.org/10.1007/s43681-021-00065-5

    Mocanu, E., Grzyb, B., & Liotta, A. (2023). Critical thinking in AI-assisted decision-making: Challenges and opportunities. Frontiers in Artificial Intelligence, 6, Article 1052289. https://doi.org/10.3389/frai.2023.1052289

    Disclaimer

    This article discusses the responsible use of generative AI tools in education and research. It is based on current understanding and practices as of 2025. Readers are encouraged to apply critical judgement, stay updated with evolving guidelines, and ensure compliance with their institutional, professional, and ethical standards.

  • Understanding the Central Limit Theorem and Estimating Population Mean Using Sample Data

    The Central Limit Theorem (CLT) is a fundamental concept in statistics and an essential tool in biostatistics. It provides a foundation for understanding how sample data can be used to make inferences about an entire population. This article will guide students through the development and significance of the CLT, exploring the role of sample means, population means, and the measures of dispersion—particularly the standard error (SE), standard deviation (SD), and confidence interval (CI).

    The Origins of the Central Limit Theorem

    The CLT emerged in the 18th century through the work of mathematicians like Abraham de Moivre and Pierre-Simon Laplace. De Moivre, while studying probabilities in games of chance, observed that repeated trials formed a bell-shaped distribution. Laplace expanded on this by demonstrating that the sum of independent random variables approximates a normal distribution as the sample size increases. This was a profound realization because it showed that even non-normally distributed data could produce a predictable distribution of means.

    Later, Carl Friedrich Gauss solidified the concept of the normal distribution while studying measurement errors in astronomy. Gauss observed that when repeated measurements are taken, the errors typically form a bell-shaped curve. This normal distribution became the foundation for much of statistical analysis, allowing researchers to describe and predict patterns in data.

    How the Central Limit Theorem Works

    The CLT states that, for a sufficiently large sample size, the distribution of sample means will approximate a normal distribution regardless of the original population’s distribution. This result is powerful because it allows us to use a sample mean as an estimate for the population mean, even when we don’t know the distribution of the underlying data.

    In practical terms, the CLT explains why the mean of a sample often provides a reliable estimate of the population mean. As we increase the sample size, the sample mean will tend to get closer to the true population mean, creating the basis for inferential statistics.

    Understanding Central Tendency and Dispersion

    To make inferences about a population, we need to understand central tendency (mean, median, mode) and dispersion (SD, SE, CI). Central tendency measures provide a summary of where most data points fall, while dispersion measures show how spread out the data points are.

    Mean: The average value of the data points, highly sensitive to outliers.

    Median: The middle value in a sorted dataset, less affected by extreme values.

    Mode: The most frequently occurring value, useful for identifying common outcomes.

    In medical research, choosing the correct central tendency measure is important. For instance, in analyzing cholesterol levels among patients, the median might offer a more accurate “central” measure than the mean if the dataset contains extreme outliers.

    Measures of Dispersion: SD, SE, and CI

    Understanding variability is essential in research, as it indicates how consistent or spread out data points are around the mean. Here’s how SD, SE, and CI are applied:

    Standard Deviation (SD): This measures the spread of data within a sample. A high SD means individual values vary widely around the mean, while a low SD means values cluster closely around the mean.

Standard Error (SE): This measures how much the sample mean is expected to deviate from the true population mean; it equals the SD divided by the square root of the sample size. The SE therefore decreases as sample size increases, reflecting that larger samples provide more precise estimates of the population mean.

    Confidence Interval (CI): This gives a range within which the population mean likely falls. A 95% CI means we are 95% confident that the interval contains the true population mean. CIs allow researchers to report not only an estimate but also the reliability of that estimate.
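
A short R sketch with hypothetical blood pressure measurements shows how the three quantities are calculated from a single sample.

# SD, SE, and 95% CI from one hypothetical sample of systolic blood pressure
set.seed(11)
bp <- rnorm(40, mean = 120, sd = 15)        # hypothetical measurements, n = 40

sample_mean <- mean(bp)
sample_sd <- sd(bp)                         # spread of the individual values
sample_se <- sample_sd / sqrt(length(bp))   # expected deviation of the sample mean

# 95% confidence interval for the population mean using the t distribution
ci <- sample_mean + c(-1, 1) * qt(0.975, df = length(bp) - 1) * sample_se
c(mean = sample_mean, SD = sample_sd, SE = sample_se, lower = ci[1], upper = ci[2])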

    Interpreting “Small” SE and CI

    What constitutes a “small” SE or CI depends on several factors:

    1. Relative Size to the Mean: Typically, an SE or CI that is within 5-10% of the mean can be considered precise in many fields. For example, if the mean blood pressure reduction in a study is 10 mmHg, an SE between 0.5 and 1 mmHg would be considered precise because it reflects only a small percentage (5-10%) of the mean value.

    2. Clinical Relevance: In medicine, a small SE or narrow CI must also be clinically meaningful. A small SE that doesn’t offer insight into a meaningful treatment effect wouldn’t necessarily be useful.

    3. Sample Size and Precision: SE decreases as sample size increases. Larger samples reduce SE, resulting in narrower CIs, and provide more reliable estimates of the population mean.

    Proving Standard Error with R Simulation

    To demonstrate that SE accurately represents the precision of the sample mean, we can use a simulation. Here’s an R script that simulates a population, repeatedly samples it, and shows that the simulated SE (standard deviation of sample means) approximates the theoretical SE:

# Load necessary libraries
library(dplyr)
library(gt)

# Set parameters for population
population_mean <- 120     # Hypothetical population mean (e.g., blood pressure)
population_sd <- 15        # Hypothetical population standard deviation
population_size <- 100000  # Large population size for accurate simulation
num_samples <- 1000        # Number of samples to draw per sample size

# Generate a large population
set.seed(42)
population_data <- rnorm(population_size, mean = population_mean, sd = population_sd)

# Define different sample sizes for comparison
sample_sizes <- c(10, 50, 100, 500, 1000)

# Create a data frame to store results
results <- data.frame(Sample_Size = integer(),
                      Mean_of_Sample_Means = numeric(),
                      Simulated_SE = numeric(),
                      CI_Lower_Bound = numeric(),
                      CI_Upper_Bound = numeric())

# Loop through each sample size and calculate metrics
for (sample_size in sample_sizes) {
  # Draw multiple samples and calculate mean for each sample
  sample_means <- replicate(num_samples, mean(sample(population_data, sample_size, replace = FALSE)))

  # Calculate the mean of sample means, simulated SE, and CI bounds
  mean_of_sample_means <- mean(sample_means)
  simulated_se <- sd(sample_means)
  ci_lower_bound <- mean_of_sample_means - 1.96 * simulated_se
  ci_upper_bound <- mean_of_sample_means + 1.96 * simulated_se

  # Store the results in the data frame
  results <- results %>% add_row(Sample_Size = sample_size,
                                 Mean_of_Sample_Means = mean_of_sample_means,
                                 Simulated_SE = simulated_se,
                                 CI_Lower_Bound = ci_lower_bound,
                                 CI_Upper_Bound = ci_upper_bound)
}

# Display the results table using gt for a neat format
results %>%
  gt() %>%
  tab_header(title = "Comparison of SE and CI Across Different Sample Sizes")

Running this simulation with different sample sizes illustrates how the SE and CI change as the sample size increases.

    Explanation of Results

    In this simulation:

    • The mean of the sample means should be close to the population mean, confirming that sample means provide a reliable estimate of the true mean.

• The standard deviation of the sample means (the simulated SE) will approximate the theoretical SE, supporting the interpretation of the SE as a measure of how much the sample mean varies around the population mean.

    • The decreasing SE and narrowing CI with increasing sample size illustrate that larger samples improve the precision of the sample mean.
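
To tie the simulation back to theory, the short addition below (which reuses the objects created by the script above) computes the theoretical SE, the population SD divided by the square root of each sample size, so it can be placed alongside the simulated SE.

# Theoretical SE = population SD / sqrt(n), for comparison with the simulated SE
theoretical_se <- population_sd / sqrt(sample_sizes)

data.frame(Sample_Size = sample_sizes,
           Theoretical_SE = round(theoretical_se, 3),
           Simulated_SE = round(results$Simulated_SE, 3))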

    Application in Medical Research

    In medical research, these measures of dispersion are vital. For instance, in a clinical trial, researchers might measure the mean reduction in blood pressure after a treatment. Reporting the mean reduction alone isn’t enough; they also report SE and CI to show the precision and reliability of this estimate. A smaller SE suggests the sample mean closely approximates the true effect in the population, while the CI gives a range within which the true effect likely falls.

    Summary

    The Central Limit Theorem and measures like SD, SE, and CI form the statistical backbone of medical research. Through understanding these concepts, researchers can confidently use sample data to estimate population parameters, assess data reliability, and make evidence-based decisions in healthcare.

    Disclaimer

    This article was created using ChatGPT for educational purposes and should not replace professional statistical advice.

  • All Swans Are White


    Imagine that every swan observed in a particular region is white, leading to the belief that “all swans are white.” This conclusion appears reliable until the unexpected discovery of a black swan, which disproves this assumption. This example demonstrates a fundamental principle of hypothesis testing: science is often less about proving ideas to be universally true and more about disproving them to allow for further understanding. The discovery of a black swan challenges the certainty of the prior belief and highlights the scientific value of testing and questioning.

    A similar principle operates in the legal system, particularly in criminal trials. Here, the defendant is presumed “not guilty” until proven otherwise. This presumption of innocence serves as a null hypothesis, establishing a baseline assumption that remains in place unless substantial evidence refutes it. The burden of proof lies with the prosecution to present sufficient evidence to reject the “not guilty” assumption. If the evidence does not meet the required threshold, the verdict is “not guilty.” However, a verdict of “not guilty” does not equate to proof of innocence; it merely indicates that there was not enough evidence to reject the initial assumption. This cautious approach minimises errors in the justice process, underscoring the importance of evidence-based decision-making.

    In scientific research, hypothesis testing operates on a similar foundation, using a “null hypothesis” that provides a default assumption, usually stating that there is no effect or difference. Statisticians like Ronald A. Fisher developed hypothesis testing methods to structure scientific inquiry. Researchers gather data to test whether there is enough evidence to reject the null hypothesis. This method ensures that claims are only accepted when supported by statistically significant evidence, reducing the chance of errors.

    Why Rejection is Easier than Acceptance

    It is easier to prove that not all swans are white than to prove that every swan is white, as a single black swan is enough to challenge this certainty. In hypothesis testing, we often start with a “no effect” or “no difference” assumption because it allows us to look for evidence that disproves it rather than trying to prove a universal truth.

    Imagine researchers are testing a new drug intended to improve cancer survival rates compared to the standard treatment.

    Null Hypothesis (H₀): The new drug does not improve survival rates compared to the standard treatment (no effect).

    Alternative Hypothesis (H₁): The new drug does improve survival rates compared to the standard treatment.

    Here, the null hypothesis assumes no improvement with the new drug. By starting with this “no effect” assumption, researchers have a neutral basis for testing. If there’s strong evidence against the null hypothesis, they can reject it in favour of the alternative.

    Researchers conduct a trial with two groups of patients: one receiving the new drug and the other receiving the standard treatment. If they observe significantly higher survival rates in the group taking the new drug, they calculate the probability of seeing this difference under the assumption of “no effect.” If the probability of achieving these results by chance alone is very low (for example, below 5%, or p-value < 0.05), it suggests that the survival benefit observed is unlikely if the drug truly had no effect. This gives researchers grounds to reject the null hypothesis and conclude that the drug likely improves survival rates.
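
As a hedged illustration with made-up numbers (not from any real trial), the R sketch below compares the survival proportions in the two groups and computes the probability of seeing a difference at least this large if the drug truly had no effect.

# Hypothetical trial: survival counts in the new-drug and standard-treatment groups
survived <- c(78, 62)   # number surviving in each group (made-up numbers)
total <- c(100, 100)    # number of patients per group

# Two-sample test of proportions under the null hypothesis of "no effect"
test <- prop.test(survived, total)
test$p.value            # a value below 0.05 would be grounds to reject the null hypothesis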

    To reject the null hypothesis, researchers only need a few trials showing significant improvement with the new drug to question the “no effect” assumption. But to prove that the drug universally improves survival rates in every case would require countless trials and still wouldn’t provide absolute certainty.

    Beyond Detecting Differences

    In hypothesis testing, the traditional approach often aims to detect a difference between groups or treatments, such as whether a new drug performs better than an existing one. However, researchers sometimes have different objectives, and other types of hypotheses can better reflect these specific goals. These include superiority, non-inferiority, and equivalence hypotheses, each of which frames the research question in a distinct way.

    Superiority Hypothesis: This hypothesis seeks to determine whether one treatment is better than another. For example, testing whether a new cancer drug improves survival rates more than the current standard drug. Here, the null hypothesis assumes the new treatment is no better than the standard, while the alternative suggests the new treatment is superior. If the data shows a statistically significant improvement with the new treatment, researchers can reject the null hypothesis and conclude that the new drug is likely superior.

    Non-Inferiority Hypothesis: In some studies, the goal is to show that a new treatment is not worse than the existing standard by more than an acceptable margin. This is often used when the new treatment may have other benefits, such as being less costly or easier to administer. For instance, researchers might test whether a new oral medication for high blood pressure is not significantly less effective than an injectable option, but is easier to use. Here, the null hypothesis assumes that the new treatment is worse than the standard by more than the acceptable margin, while the alternative hypothesis is that it is not worse. Results within the non-inferiority margin allow researchers to reject the null hypothesis and conclude that the new treatment is non-inferior.

    Equivalence Hypothesis: This hypothesis tests whether two treatments produce similar effects within a specified range. Equivalence studies are useful when researchers want to show that a new treatment performs just as well as an existing one, typically to confirm that the new treatment can replace the current standard without loss of efficacy. For example, researchers might test whether a generic drug is as effective as a brand-name drug within a small acceptable range of difference. The null hypothesis here assumes the treatments differ by more than the acceptable range, while the alternative suggests they are equivalent. Rejecting the null hypothesis indicates that the treatments can be considered interchangeable.

    Errors in Hypothesis Testing

    Hypothesis testing involves managing the risks of two main errors:

Type I Error (False Positive): This error occurs when a true null hypothesis is incorrectly rejected. In a medical context, this could mean diagnosing a patient with a disease they do not actually have. Such an error could lead to unnecessary anxiety, additional tests, or even unnecessary treatment. In statistical terms, the acceptable probability of a Type I error is represented by alpha (α), the significance level, which is often set at 0.05; the p-value calculated from the data is then compared against this threshold.

    In hypothesis testing, the p-value helps us determine whether the results we observe are likely to reflect the real truth—an actual effect of the treatment—or are simply due to chance. When we talk about the “truth” in this context, we’re asking if the observed benefit of a new drug (for instance, an improvement in cancer survival rates) is genuinely due to the drug itself, rather than a random occurrence. The goal of statistical testing is to help us differentiate between results that reflect real-world effects and those that could have happened by coincidence.

    Type II Error (False Negative): This error occurs when a false null hypothesis is not rejected. In medical testing, this would be failing to diagnose a patient who actually has the disease, potentially leading to missed treatment. The probability of a Type II error is represented by beta (β), with the power of a test (1 – β) reflecting its ability to detect true effects.

    Both errors have real-world implications, especially in medicine, where incorrectly diagnosing a healthy patient (Type I error) or missing a diagnosis in an ill patient (Type II error) can have significant consequences.
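
To show how the two error rates are balanced when planning a study, the sketch below uses base R's power.prop.test with hypothetical survival proportions to find the sample size per group that keeps the Type I error at 5% and the Type II error at 20% (80% power).

# Sample size per group for hypothetical survival proportions of 62% vs 78%,
# with alpha = 0.05 (Type I error) and power = 0.80 (so beta, the Type II error, is 0.20)
power.prop.test(p1 = 0.62, p2 = 0.78, sig.level = 0.05, power = 0.80)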

    Looking Beyond Numbers

    The white swan story, the legal system, and hypothesis testing share a common theme: reliable conclusions require rigorous evidence and careful judgment. Hypothesis testing isn’t only about calculating probabilities and p-values; it’s about interpreting what those numbers mean in real-world contexts. By prioritising the rejection of assumptions rather than their acceptance, science follows a cautious, methodical path that allows for meaningful and reliable discoveries. This approach encourages researchers to look beyond statistical outcomes alone and consider the larger implications of their findings. Through thorough testing and the careful interpretation of evidence, hypothesis testing fosters a structured and reliable process for understanding the world.