Tag: biostatistics

  • The Evolution of Statistical Inference: From Formulas to Computers

    Statistics is the science of learning from data. Every time researchers use a sample to understand a population, they are practising statistical inference. Over the past century, the way we make these inferences has changed dramatically. Each new approach has brought a different philosophy about what “truth” means and how we can best estimate it.

    This article explains the main schools of thought in statistical inference, showing how they evolved from the early 1900s to the computer age. The discussion draws on Efron and Hastie’s Computer Age Statistical Inference (2021), which describes how classical, Bayesian, resampling, and modern computational methods all aim to uncover truth from limited data.

    The Classical or Frequentist Era

    The first formal school of statistical inference is known as the classical or frequentist approach. Developed by statisticians such as Ronald Fisher, Jerzy Neyman, and Egon Pearson in the early twentieth century, this framework treats the truth as something fixed but unknown. The data we observe are random samples from a larger population, and by studying these samples, we can estimate the true values that describe the population (Efron & Hastie, 2021).

    In this view, probability represents the long-run frequency of events. For example, if we say there is a 5 percent chance of a test result being significant when there is no real effect, we mean that if we repeated the same experiment many times, about 5 out of 100 would show a false signal.

    A key tool in this school is maximum likelihood estimation (MLE). It finds the values of unknown parameters that make the observed data most likely. This approach became the foundation for many classical methods, including hypothesis testing and confidence intervals. Classical inference is elegant and mathematically precise, but it depends on strong assumptions and analytical formulas. As problems became more complex, those formulas were often too difficult to compute.
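    As a small illustration of the idea (not drawn from Efron and Hastie, and using made-up coin-flip data), the sketch below finds the success probability that makes the observed data most likely, using base R's optimize():

    # Simulate 100 coin flips from a hypothetical "true" success probability of 0.3
    set.seed(1)
    flips <- rbinom(100, size = 1, prob = 0.3)

    # Log-likelihood of the observed flips for a candidate probability p
    log_lik <- function(p) sum(dbinom(flips, size = 1, prob = p, log = TRUE))

    # Maximum likelihood estimate: the p that makes the observed data most likely
    mle <- optimize(log_lik, interval = c(0.001, 0.999), maximum = TRUE)$maximum
    mle   # close to the sample proportion mean(flips)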

    The Bayesian Revival

    An alternative view, known as Bayesian inference, was proposed much earlier, by Reverend Thomas Bayes in the 1700s. However, it became practical only in the late twentieth century, when computers made its calculations possible. Bayesian inference treats truth as something we hold beliefs about rather than something fixed and unknown.

    In this view, we start with a prior belief—what we think is likely before seeing any data—and then update it using the evidence we collect to form a posterior belief. The process reflects how humans naturally think and learn. For instance, a doctor might believe a patient probably has a certain illness based on symptoms, but then revise that belief after seeing lab results.

    Unlike classical inference, which relies on repeated sampling theory, Bayesian methods focus on how data change our level of belief. This approach is flexible and intuitive, but it requires specifying prior beliefs, which can introduce subjectivity. With modern computing, especially techniques such as Markov Chain Monte Carlo (MCMC), Bayesian methods have become widely used in fields such as medicine, economics, and artificial intelligence (Efron & Hastie, 2021).
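    A minimal sketch of this belief-updating step, using a hypothetical beta-binomial example with arbitrary prior and data values, might look like this in R:

    # Hypothetical example: prior belief about a success probability, updated with data
    # Beta(2, 2) prior (mildly centred on 0.5), then 18 successes out of 30 trials observed
    prior_a <- 2; prior_b <- 2
    successes <- 18; failures <- 12

    # Conjugate update: the posterior is Beta(prior_a + successes, prior_b + failures)
    post_a <- prior_a + successes
    post_b <- prior_b + failures

    # Posterior mean and a 95% credible interval for the success probability
    post_a / (post_a + post_b)
    qbeta(c(0.025, 0.975), post_a, post_b)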

    The Resampling Revolution

    By the 1980s, statisticians began to take advantage of computers to bypass complicated formulas entirely. Bradley Efron introduced the bootstrap, a resampling method that lets data “speak for themselves.” Instead of depending on mathematical derivations, the bootstrap repeatedly resamples from the observed data to estimate variability and uncertainty (Efron & Hastie, 2021).

    This approach belongs to what Efron and Hastie call the computer age of inference. It does not require assumptions about theoretical distributions or prior beliefs. Instead, it uses the computer to generate thousands of simulated datasets from the original sample. By examining how results vary across these resamples, statisticians can understand how stable or uncertain their findings are.
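    As a hedged, minimal sketch of the idea (hypothetical measurements, an arbitrary number of resamples), the bootstrap can be written in a few lines of base R:

    # Hypothetical sample of 40 skewed measurements; estimate the uncertainty of the median
    set.seed(7)
    x <- rexp(40, rate = 0.1)

    # Draw 2000 bootstrap resamples (with replacement) and recompute the median each time
    boot_medians <- replicate(2000, median(sample(x, replace = TRUE)))

    # Bootstrap standard error and a simple percentile 95% interval for the median
    sd(boot_medians)
    quantile(boot_medians, c(0.025, 0.975))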

    Resampling methods changed the practice of statistics. They made inference accessible for complex problems where mathematical solutions were impossible. The bootstrap also bridged classical and modern approaches, keeping the idea of estimation but relying on computation instead of theory.

    The Modern Predictive and Machine Learning Era

    As data grew larger and more complex, statisticians faced a new challenge. Classical and Bayesian models often became too limited or too slow to handle modern datasets with thousands of variables. This led to new methods that emphasised prediction rather than pure inference.

    Techniques such as penalised regression (for example, ridge and lasso regression) and machine learning algorithms emerged to handle this complexity. These approaches accept a small amount of bias in exchange for much greater stability and predictive power. Instead of focusing on exact parameter estimates, they aim to predict outcomes reliably for new data (Efron & Hastie, 2021).
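    To make the idea concrete, here is a minimal sketch of ridge regression fitted through its closed-form solution on simulated data; the penalty value and variable names are arbitrary, and in practice dedicated packages are normally used:

    # Simulated data: 50 observations, 10 predictors, only 3 with real effects
    set.seed(3)
    n <- 50; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    beta_true <- c(3, -2, 1, rep(0, p - 3))
    y <- X %*% beta_true + rnorm(n)

    # Ridge estimate: add a penalty lambda to the normal equations to shrink the coefficients
    lambda <- 5
    beta_ridge <- solve(t(X) %*% X + lambda * diag(p), t(X) %*% y)

    # Compare with ordinary least squares (no penalty); ridge coefficients are pulled toward zero
    beta_ols <- solve(t(X) %*% X, t(X) %*% y)
    round(cbind(OLS = as.vector(beta_ols), Ridge = as.vector(beta_ridge)), 2)

    Shrinking the coefficients in this way stabilises the estimates when predictors are numerous or correlated, which is exactly the trade-off described above.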

    Philosophically, this represents a shift from “What is the true parameter?” to “Can we make good predictions?” Machine learning methods such as decision trees and neural networks no longer rely on probability theory in the traditional sense. They learn directly from data patterns and have transformed fields such as healthcare, finance, and climate science.

    Comparing the Philosophies

    Each school of inference reflects a different way of thinking about knowledge and truth.

    • Classical (Frequentist): truth is fixed and data are random; probability is a long-run frequency; key question: what value makes the data most likely?

    • Bayesian: truth is uncertain and beliefs can change; probability is a degree of belief; key question: how should we update what we believe after seeing data?

    • Resampling: truth can be estimated from the data directly; probability reflects empirical variation; key question: what does the data itself say about uncertainty?

    • Modern predictive / machine learning: truth may be too complex to model; probability is often not used explicitly; key question: how can we best predict new outcomes?

    Despite their differences, these schools share one purpose: to draw meaningful conclusions from imperfect data. Each arose to overcome the limitations of the previous one. The classical approach provided solid mathematical foundations. The Bayesian approach added flexibility and belief updating. Resampling empowered statisticians through computation. Modern predictive methods embraced the complexity of real-world data.

    Conclusion

    The story of statistical inference is the story of how humans have tried to reason about the unknown. From equations written by hand to millions of simulations run by computers, each generation of statisticians has pushed the boundaries of what can be learned from data.

    Efron and Hastie (2021) describe this journey as a transition from the “formula age” to the “computer age.” The essence of inference, however, remains the same: using limited evidence to understand the world. Whether through classical, Bayesian, resampling, or machine learning approaches, all aim to find truth in uncertainty and to make knowledge from data.

    References

    Efron, B., & Hastie, T. (2021). Computer age statistical inference: Student edition. Cambridge University Press.

  • Training Critical Thinking and Logical Thinking in the Age of AI for Biostatistics and Epidemiology

    The arrival of generative AI tools like ChatGPT is changing the way we teach and practise biostatistics and epidemiology. Tasks that once took hours, like coding analyses or searching for information, can now be completed within minutes by simply asking the right questions. This development brings many opportunities, but it also brings new challenges. One of the biggest risks is that students may rely too much on AI without properly questioning what it produces.

    In this new environment, our responsibility as educators must shift. It is no longer enough to teach students how to use AI. We must now teach them how to think critically about AI outputs. We must train them to question, verify and improve what AI generates, not simply accept it as correct.

    Why critical thinking is important

    AI produces answers that often sound very convincing. However, sounding convincing is not the same as being right. AI tools are trained to predict the most likely words and patterns based on large amounts of data. They do not understand the meaning behind the information they provide. In biostatistics and epidemiology, where careful thinking about study design, assumptions and interpretation is vital, careless use of AI could easily lead to wrong conclusions.

    This is why students must develop a critical and questioning attitude. Every output must be seen as something to be checked, not something to be believed blindly.

    Recent academic work also supports this direction. Researchers have pointed out that users must develop what is now called “critical AI literacy”, meaning the ability to question and verify AI outputs rather than accept them passively (Ng, 2023; Mocanu, Grzyb, & Liotta, 2023). Although the terms differ, the message is the same: critical thinking remains essential when working with AI.

    How to train critical thinking when using AI

    Build a sceptical mindset

    Students should be taught from the beginning that AI is only a tool. It is not a source of truth. It should be treated like a junior intern: helpful and fast, but not always right. They should learn to ask questions such as:

    • What assumptions are hidden in this output?

    • Are the methods suggested suitable for the data and research question?

    • Is anything important missing?

    Simple exercises, like showing students examples of AI outputs with clear mistakes, can help build this habit.

    Teach structured critical appraisal

    To help students evaluate AI outputs properly, it is useful to give them a structured way of thinking. A good framework involves five main points:

    First, methodological appropriateness

    Students must check whether the AI suggested the correct statistical method or study design. For example, if the outcome is time to death, suggesting logistic regression instead of survival analysis would be wrong.
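    To make this check concrete, the hedged sketch below (simulated, hypothetical data; the survival package is assumed to be available) contrasts the two model calls a student might be asked to compare:

    library(survival)   # provides Surv() and coxph()

    # Hypothetical time-to-event data: follow-up time (months), death indicator, treatment group
    set.seed(11)
    dat <- data.frame(
      time      = rexp(200, rate = 0.05),
      event     = rbinom(200, 1, 0.7),
      treatment = rep(c("new", "standard"), each = 100)
    )

    # Inappropriate for a time-to-death question: logistic regression ignores follow-up time and censoring
    glm(event ~ treatment, family = binomial, data = dat)

    # More appropriate: a survival model (here, Cox regression) uses both time and event status
    coxph(Surv(time, event) ~ treatment, data = dat)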

    Second, assumptions and preconditions

    Every method has assumptions. Students must identify whether these assumptions are mentioned and whether they make sense. If assumptions are not stated, students must learn to recognise them and decide whether they are acceptable.

    Third, completeness and relevance

    Students should check whether the AI output missed important steps, variables or checks. For instance, has the AI forgotten to adjust for confounding factors? Is stratification by key variables missing?

    Fourth, logical and statistical coherence

    The reasoning must be checked for soundness. Are the conclusions supported by the results? Is there any step that does not follow logically?

    Fifth, source validation and evidence support

    Students should verify any references or evidence provided. AI sometimes produces references that do not exist or that are outdated. Cross-checking with real sources is necessary.

    By using these five points, students can build a habit of structured checking, instead of relying on their instincts alone.

    Encourage comparison and cross-verification

    Students should not depend on one AI output. They should learn to ask the same question in different ways and compare the answers. They should also check against textbooks, lectures, or real research papers.

    Practise reverse engineering

    One effective exercise is to give students an AI-generated answer with hidden mistakes and ask them to find and correct the errors. This strengthens their ability to read carefully and think independently.

    Make students teach back to AI

    Another good practice is to ask students to correct the AI. After finding an error, they should write a prompt that explains the mistake to the AI and asks for a better answer. Being able to explain an error clearly shows true understanding.

    Why logical thinking in coding and analysis planning remains essential

    Although AI can now generate code and suggest analysis steps, it does not replace the need for human logical thinking. Writing good analysis plans and coding correctly require structured reasoning. Without this ability, students will not know how to guide AI properly, how to spot mistakes, or how to build reliable results from raw data.

    Logical thinking in analysis means asking and answering step-by-step questions such as:

    • What is the research question?

    • What are the variables and their roles?

    • What is the right type of analysis for this question?

    • What assumptions need to be checked?

    • What is the correct order of steps?

    If students lose this skill and depend only on AI, they will not be able to detect when AI suggests inappropriate methods, forgets a critical step, or builds a wrong model. Therefore, teaching logical thinking in data analysis planning and coding must stay an important part of the curriculum.

    Logical planning and good coding are not simply technical skills. They reflect the student’s ability to reason clearly, to see the structure behind the problem, and to create a defensible path from data to answer. These are skills that no AI can replace.

    Ethical use of generative AI and the need for transparency

    Along with critical and logical thinking, students must also be trained to use generative AI tools ethically. They must understand that using AI does not remove their professional responsibility. If they rely on AI outputs for any part of their work, they must check it, improve it where needed, and take ownership of the final product.

    Students should also be taught about data privacy. Sensitive or identifiable information must never be shared with AI platforms, even during casual exploration or practice. Responsibility for patient confidentiality, research ethics, and academic honesty remains with the human user.

    Another important point is transparency. Whenever AI tools are used to assist in study design, data analysis, writing or summarising, this use should be openly declared. Whether in academic assignments, published articles or professional reports, readers have the right to know how AI was involved in shaping the content. Full and honest declaration supports academic integrity, maintains trust, and shows respect for the standards of research and publication.

    Students should be guided to include a simple statement such as:

    “An AI tool was used to assist with [describe briefly], and the final content has been reviewed and verified by the author.”

    By practising transparency from the beginning, students learn that AI is not something to hide, but something to use responsibly and openly.

    Building a modern curriculum

    To prepare students for this new reality, we must design courses that combine:

    • Training in critical thinking when using AI outputs

    • Training in logical thinking for building analysis plans and writing code

    • Training in ethical use and transparent declaration of AI assistance

    Students should be given real-world tasks where they must plan analyses from scratch, use AI as a helper but not as a leader, check every output carefully, and justify every step they take. They should also be trained to reflect on the choices they make, and on how to improve AI suggestions if they find them weak or incorrect.

    By doing this, we can prepare future biostatisticians and epidemiologists who are not only technically skilled but also intellectually strong and ethically responsible.

    A new way forward

    Teaching students to use AI critically is not just a good idea. It is essential for the future. In biostatistics and epidemiology, where errors can affect public health and policy, we must prepare a new generation who can use AI wisely without losing their own judgement.

    The best users of AI will not be those who follow it blindly, but those who can guide it with intelligence, knowledge and ethical care. Our role as teachers is to help students become leaders in the AI age, not followers.

    References

    Ng, W. (2023). Critical AI literacy: Toward empowering agency in an AI world. AI and Ethics, 3(1), 137–146. https://doi.org/10.1007/s43681-021-00065-5

    Mocanu, E., Grzyb, B., & Liotta, A. (2023). Critical thinking in AI-assisted decision-making: Challenges and opportunities. Frontiers in Artificial Intelligence, 6, Article 1052289. https://doi.org/10.3389/frai.2023.1052289

    Disclaimer

    This article discusses the responsible use of generative AI tools in education and research. It is based on current understanding and practices as of 2025. Readers are encouraged to apply critical judgement, stay updated with evolving guidelines, and ensure compliance with their institutional, professional, and ethical standards.

  • Epidemiology and Biostatistics in the Light of Divine Unity

    In the Islamic worldview, knowledge is not categorised into ‘Islamic’ and ‘secular.’ There is only one knowledge — al-‘ilm — bestowed by Allah, whether discovered through divine revelation (wahy) or human reason (‘aql). All beneficial knowledge should ultimately draw us closer to Allah, the All-Knowing. This article explores the field of epidemiology and biostatistics through this lens of divine unity, affirming that scientific inquiry and statistical reasoning are not merely technical disciplines, but pathways to understanding the patterns and wisdom embedded in Allah’s creation.

    John M. Last (1988) defined epidemiology as “the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems.” This definition highlights three core components: distribution, determinants, and application. Distribution refers to patterns — who is affected, where, and when. Determinants delve into the causes, risk factors, and protective factors. Application demands action — the use of findings to prevent and control diseases.

    In Islam, observation of patterns in nature and society is encouraged. The Qur’an repeatedly invites reflection (tadabbur) on signs (ayat) in the universe and within ourselves. Understanding patterns of disease aligns with this call to contemplation and action. Epidemiology, therefore, becomes a means of fulfilling the Islamic obligation to protect life (hifz al-nafs), one of the five higher objectives of Shariah (maqasid al-shariah).

    Sir Austin Bradford Hill (1965) introduced a set of principles to guide causal inference in epidemiology. His criteria — strength, consistency, temporality, biological gradient, plausibility, coherence, experiment, specificity, and analogy — serve as guides rather than strict rules.

    Yet, we must recognise the epistemological humility within our methods. In regression models, confidence intervals, and Hill’s criteria, there is always an element of uncertainty. This aligns with the Islamic view that human knowledge is inherently limited. As Allah reminds us: “And you (O mankind) have not been given of knowledge except a little.” (Qur’an, Al-Isra’, 17:85)

    Hence, we strive to understand cause and effect through careful observation and reasoning, but ultimately, we acknowledge that true causality is known only to Allah. Our frameworks are approximations — tools to aid, not final truths.

    Historical accounts during the time of the Prophet Muhammad ﷺ and his companions demonstrate the application of outbreak control principles. One notable example is the plague (ṭā‘ūn) during the rule of Caliph Umar ibn al-Khattab. When the plague broke out in Syria, Umar decided not to enter the area, and advised others not to leave or enter — an early form of quarantine.

    The Prophet ﷺ said: “If you hear of a plague in a land, do not enter it; and if it breaks out in a land where you are, do not leave it.” (Sahih al-Bukhari, Hadith 5728; Sahih Muslim, Hadith 2219)

    This hadith reflects core outbreak control principles such as isolation, movement restriction, and collective responsibility — key strategies in modern epidemiology.

    Islam strongly advocates prevention. The Prophet ﷺ advised moderation in eating: “The son of Adam does not fill any vessel worse than his stomach. It is sufficient for the son of Adam to eat a few mouthfuls to keep him going. If he must do that (fill his stomach), then let him fill one-third with food, one-third with drink, and one-third with air.” (Sunan Ibn Majah, Hadith 3349)

    This guidance is preventive in nature and closely aligns with public health nutrition. Islam connects overindulgence and lack of restraint to the whispers of Shayṭān. Preventive health, therefore, is not just a matter of science, but a matter of spiritual discipline.

    Islamic rituals incorporate hygiene into acts of worship. Ablution (wudu’), performed five times daily before prayer, involves washing the hands, mouth, nose, face, arms, head, and feet — the very areas associated with microbial transmission.

    The Prophet ﷺ also instructed: “Cover your utensils and tie your water skins, for there is a night in the year when plague descends, and it does not pass an uncovered utensil or untied water skin without some of that plague descending into it.” (Sahih Muslim, Hadith 2014)

    These teachings reflect divine wisdom in infection prevention, centuries before the discovery of microbes and germ theory.

    Biostatistics provides us with essential tools to summarise data and draw meaningful inferences about populations from sample observations. Among its most powerful techniques is regression analysis, which allows us to explore and quantify the relationship between an outcome (dependent variable) and one or more explanatory (independent) variables.

    The general form of a multiple linear regression model is:

    y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

    In this equation:

    • y represents the outcome or response variable we aim to predict or explain,

    • x₁ to xₖ are the predictor variables that we believe influence the outcome,

    • β₀ is the intercept, the expected value of y when all predictors are zero,

    • β₁ to βₖ are the regression coefficients that quantify the effect of each predictor on the outcome, and

    • ε is the error term, capturing the variability in y that the model cannot explain.
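    As a simple illustration (hypothetical variable names and simulated data, not from any real study), such a model can be fitted in R with lm(), and the residuals play the role of the error term discussed below:

    # Minimal sketch: fitting a multiple linear regression on simulated data
    set.seed(5)
    n  <- 100
    x1 <- rnorm(n)                              # e.g. age (standardised)
    x2 <- rnorm(n)                              # e.g. body mass index (standardised)
    y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)    # outcome with built-in random error

    fit <- lm(y ~ x1 + x2)
    summary(fit)                # estimates of the intercept and the regression coefficients
    residuals(fit)[1:5]         # the part of y the model cannot explain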

    This error term is more than just a technical component; it is a profound acknowledgment of the limits of human understanding. Even with the most refined models and abundant data, there will always be elements of unpredictability — due to omitted variables, imprecise measurements, biological variation, or other unknown factors. The presence of this uncertainty is a built-in reminder that our knowledge is partial and conditional.

    From an Islamic perspective, this aligns beautifully with the concept of epistemic humility. As Allah states in the Qur’an: “And you (O mankind) have not been given of knowledge except a little.” (Qur’an, Al-Isra’, 17:85)

    Thus, while biostatistics helps us make informed decisions and uncover meaningful relationships, it also teaches us to recognise the boundaries of what we can know. The error term symbolises the divine reality — that ultimate knowledge lies only with Allah. It invites us to pursue knowledge responsibly, with sincerity, but never with arrogance.

    This concept is further reinforced in the Qur’an: “And above every possessor of knowledge is one [more] knowing.” (Qur’an, Yusuf, 12:76)

    Every estimate, statistical model, and inference must be grounded in this awareness. We can model, measure, and approximate, but only Allah knows the unseen, the future, and the full complexity of creation. Biostatistics, therefore, is not only a scientific tool but also a spiritual exercise in recognising our role as seekers of knowledge, always dependent on the One who knows all.

    Epidemiology and biostatistics, when viewed through the Islamic perspective of tawḥīd (oneness of Allah), are not detached from faith but are deeply connected to it. These sciences offer not just understanding but also tools to protect life, serve society, and fulfil the trust placed upon us as khalifah (stewards) on Earth. By unifying rational inquiry with spiritual awareness, we find that knowledge — whether derived from revelation or observation — is ultimately from the same source. Through this lens, our pursuit of health knowledge becomes a journey toward Allah.

    References
    1. Last, J. M. (1988). A Dictionary of Epidemiology (2nd ed.). Oxford University Press.
    2. Hill, A. B. (1965). The Environment and Disease: Association or Causation? Proceedings of the Royal Society of Medicine, 58(5), 295–300.
    3. The Noble Qur’an, Surah Al-Isra’ (17:85), Surah Yusuf (12:76).
    4. Sahih al-Bukhari, Book 76, Hadith 5728.
    5. Sahih Muslim, Book 39, Hadith 2219; Book 23, Hadith 2014.
    6. Sunan Ibn Majah, Book 29, Hadith 3349.
    7. Al-Ghazali, I. H. Ihya Ulum al-Din – On the virtues of knowledge and its relation to action and worship.
    8. Nasr, S. H. (1992). Science and Civilization in Islam. Harvard University Press.

  • Understanding the Central Limit Theorem and Estimating Population Mean Using Sample Data

    The Central Limit Theorem (CLT) is a fundamental concept in statistics and an essential tool in biostatistics. It provides a foundation for understanding how sample data can be used to make inferences about an entire population. This article will guide students through the development and significance of the CLT, exploring the role of sample means, population means, and the measures of dispersion—particularly the standard error (SE), standard deviation (SD), and confidence interval (CI).

    The Origins of the Central Limit Theorem

    The CLT emerged in the 18th century through the work of mathematicians like Abraham de Moivre and Pierre-Simon Laplace. De Moivre, while studying probabilities in games of chance, observed that repeated trials formed a bell-shaped distribution. Laplace expanded on this by demonstrating that the sum of independent random variables approximates a normal distribution as the sample size increases. This was a profound realization because it showed that even non-normally distributed data could produce a predictable distribution of means.

    Later, Carl Friedrich Gauss solidified the concept of the normal distribution while studying measurement errors in astronomy. Gauss observed that when repeated measurements are taken, the errors typically form a bell-shaped curve. This normal distribution became the foundation for much of statistical analysis, allowing researchers to describe and predict patterns in data.

    How the Central Limit Theorem Works

    The CLT states that, for a sufficiently large sample size, the distribution of sample means will approximate a normal distribution regardless of the original population’s distribution (provided the population has a finite mean and variance). This result is powerful because it allows us to use a sample mean as an estimate for the population mean, even when we don’t know the distribution of the underlying data.

    In practical terms, the CLT explains why the mean of a sample often provides a reliable estimate of the population mean. As we increase the sample size, the sample mean will tend to get closer to the true population mean, creating the basis for inferential statistics.

    Understanding Central Tendency and Dispersion

    To make inferences about a population, we need to understand central tendency (mean, median, mode) and dispersion (SD, SE, CI). Central tendency measures provide a summary of where most data points fall, while dispersion measures show how spread out the data points are.

    Mean: The average value of the data points, highly sensitive to outliers.

    Median: The middle value in a sorted dataset, less affected by extreme values.

    Mode: The most frequently occurring value, useful for identifying common outcomes.

    In medical research, choosing the correct central tendency measure is important. For instance, in analyzing cholesterol levels among patients, the median might offer a more accurate “central” measure than the mean if the dataset contains extreme outliers.

    Measures of Dispersion: SD, SE, and CI

    Understanding variability is essential in research, as it indicates how consistent or spread out data points are around the mean. Here’s how SD, SE, and CI are applied:

    Standard Deviation (SD): This measures the spread of data within a sample. A high SD means individual values vary widely around the mean, while a low SD means values cluster closely around the mean.

    Standard Error (SE): This measures how much the sample mean is expected to deviate from the true population mean. The SE decreases as sample size increases, reflecting that larger samples provide more precise estimates of the population mean.

    Confidence Interval (CI): This gives a range within which the population mean likely falls. A 95% CI is constructed so that, if the study were repeated many times, about 95% of the resulting intervals would contain the true population mean. CIs allow researchers to report not only an estimate but also the reliability of that estimate.
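    As a brief sketch using hypothetical systolic blood pressure readings, the three quantities can be computed from a single sample as follows:

    # Hypothetical sample of 50 systolic blood pressure readings
    set.seed(9)
    bp <- rnorm(50, mean = 120, sd = 15)

    n     <- length(bp)
    sd_bp <- sd(bp)                               # SD: spread of individual readings
    se_bp <- sd_bp / sqrt(n)                      # SE: expected deviation of the sample mean
    ci_95 <- mean(bp) + c(-1.96, 1.96) * se_bp    # approximate 95% CI for the population mean

    c(mean = mean(bp), SD = sd_bp, SE = se_bp)
    ci_95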

    Interpreting “Small” SE and CI

    What constitutes a “small” SE or CI depends on several factors:

    1. Relative Size to the Mean: Typically, an SE or CI that is within 5-10% of the mean can be considered precise in many fields. For example, if the mean blood pressure reduction in a study is 10 mmHg, an SE between 0.5 and 1 mmHg would be considered precise because it reflects only a small percentage (5-10%) of the mean value.

    2. Clinical Relevance: In medicine, a small SE or narrow CI must also be clinically meaningful. A small SE that doesn’t offer insight into a meaningful treatment effect wouldn’t necessarily be useful.

    3. Sample Size and Precision: SE decreases as sample size increases. Larger samples reduce SE, resulting in narrower CIs, and provide more reliable estimates of the population mean.

    Proving Standard Error with R Simulation

    To demonstrate that SE accurately represents the precision of the sample mean, we can use a simulation. Here’s an R script that simulates a population, repeatedly samples it, and shows that the simulated SE (standard deviation of sample means) approximates the theoretical SE:

    # Load necessary libraries
    library(dplyr)
    library(tibble)   # provides add_row(), used below to append results
    library(gt)

    # Set parameters for population
    population_mean <- 120      # Hypothetical population mean (e.g., blood pressure)
    population_sd <- 15         # Hypothetical population standard deviation
    population_size <- 100000   # Large population size for accurate simulation
    num_samples <- 1000         # Number of samples to draw per sample size

    # Generate a large population
    set.seed(42)
    population_data <- rnorm(population_size, mean = population_mean, sd = population_sd)

    # Define different sample sizes for comparison
    sample_sizes <- c(10, 50, 100, 500, 1000)

    # Create a data frame to store results
    results <- data.frame(Sample_Size = integer(),
                          Mean_of_Sample_Means = numeric(),
                          Simulated_SE = numeric(),
                          CI_Lower_Bound = numeric(),
                          CI_Upper_Bound = numeric())

    # Loop through each sample size and calculate metrics
    for (sample_size in sample_sizes) {
      # Draw multiple samples and calculate the mean of each sample
      sample_means <- replicate(num_samples, mean(sample(population_data, sample_size, replace = FALSE)))

      # Calculate the mean of sample means, simulated SE, and CI bounds
      mean_of_sample_means <- mean(sample_means)
      simulated_se <- sd(sample_means)
      ci_lower_bound <- mean_of_sample_means - 1.96 * simulated_se
      ci_upper_bound <- mean_of_sample_means + 1.96 * simulated_se

      # Store the results in the data frame
      results <- results %>% add_row(Sample_Size = sample_size,
                                     Mean_of_Sample_Means = mean_of_sample_means,
                                     Simulated_SE = simulated_se,
                                     CI_Lower_Bound = ci_lower_bound,
                                     CI_Upper_Bound = ci_upper_bound)
    }

    # Display the results table using gt for a neat format
    results %>%
      gt() %>%
      tab_header(title = "Comparison of SE and CI Across Different Sample Sizes")

    Running this simulation with different sample sizes shows how the SE and CI change as the sample size increases.

    Explanation of Results

    In this simulation:

    • The mean of the sample means should be close to the population mean, confirming that sample means provide a reliable estimate of the true mean.

    • The standard deviation of the sample means (simulated SE) will approximate the theoretical SE, supporting that SE reflects how much the sample mean varies from the population mean.

    • The decreasing SE and narrowing CI with increasing sample size illustrate that larger samples improve the precision of the sample mean.
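    To connect the simulation back to the textbook formula SE = SD / √n, the short addition below (reusing the population_sd and sample_sizes objects defined in the script above) computes the theoretical SE for each sample size so it can be compared with the simulated values:

    # Theoretical SE for each sample size, using SE = population SD / sqrt(n)
    theoretical_se <- population_sd / sqrt(sample_sizes)
    data.frame(Sample_Size = sample_sizes, Theoretical_SE = round(theoretical_se, 3))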

    Application in Medical Research

    In medical research, these measures of dispersion are vital. For instance, in a clinical trial, researchers might measure the mean reduction in blood pressure after a treatment. Reporting the mean reduction alone isn’t enough; they also report SE and CI to show the precision and reliability of this estimate. A smaller SE suggests the sample mean closely approximates the true effect in the population, while the CI gives a range within which the true effect likely falls.

    Summary

    The Central Limit Theorem and measures like SD, SE, and CI form the statistical backbone of medical research. Through understanding these concepts, researchers can confidently use sample data to estimate population parameters, assess data reliability, and make evidence-based decisions in healthcare.

    Disclaimer

    This article was created using ChatGPT for educational purposes and should not replace professional statistical advice.

  • All Swans Are White


    Imagine that every swan observed in a particular region is white, leading to the belief that “all swans are white.” This conclusion appears reliable until the unexpected discovery of a black swan, which disproves this assumption. This example demonstrates a fundamental principle of hypothesis testing: science is often less about proving ideas to be universally true and more about disproving them to allow for further understanding. The discovery of a black swan challenges the certainty of the prior belief and highlights the scientific value of testing and questioning.

    A similar principle operates in the legal system, particularly in criminal trials. Here, the defendant is presumed “not guilty” until proven otherwise. This presumption of innocence serves as a null hypothesis, establishing a baseline assumption that remains in place unless substantial evidence refutes it. The burden of proof lies with the prosecution to present sufficient evidence to reject the “not guilty” assumption. If the evidence does not meet the required threshold, the verdict is “not guilty.” However, a verdict of “not guilty” does not equate to proof of innocence; it merely indicates that there was not enough evidence to reject the initial assumption. This cautious approach minimises errors in the justice process, underscoring the importance of evidence-based decision-making.

    In scientific research, hypothesis testing operates on a similar foundation, using a “null hypothesis” that provides a default assumption, usually stating that there is no effect or difference. Statisticians like Ronald A. Fisher developed hypothesis testing methods to structure scientific inquiry. Researchers gather data to test whether there is enough evidence to reject the null hypothesis. This method ensures that claims are only accepted when supported by statistically significant evidence, reducing the chance of errors.

    Why Rejection is Easier than Acceptance

    It is easier to prove that not all swans are white than to prove that every swan is white, as a single black swan is enough to challenge this certainty. In hypothesis testing, we often start with a “no effect” or “no difference” assumption because it allows us to look for evidence that disproves it rather than trying to prove a universal truth.

    Imagine researchers are testing a new drug intended to improve cancer survival rates compared to the standard treatment.

    Null Hypothesis (H₀): The new drug does not improve survival rates compared to the standard treatment (no effect).

    Alternative Hypothesis (H₁): The new drug does improve survival rates compared to the standard treatment.

    Here, the null hypothesis assumes no improvement with the new drug. By starting with this “no effect” assumption, researchers have a neutral basis for testing. If there’s strong evidence against the null hypothesis, they can reject it in favour of the alternative.

    Researchers conduct a trial with two groups of patients: one receiving the new drug and the other receiving the standard treatment. If they observe significantly higher survival rates in the group taking the new drug, they calculate the probability of seeing this difference under the assumption of “no effect.” If the probability of achieving these results by chance alone is very low (for example, below 5%, or p-value < 0.05), it suggests that the survival benefit observed is unlikely if the drug truly had no effect. This gives researchers grounds to reject the null hypothesis and conclude that the drug likely improves survival rates.
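    As an entirely hypothetical numerical illustration of this step, suppose 60 of 100 patients on the new drug and 45 of 100 on the standard treatment are alive at five years; base R's prop.test() then gives the p-value for the observed difference in survival proportions:

    # Hypothetical five-year survival counts: 60/100 on the new drug vs 45/100 on standard treatment
    survivors <- c(60, 45)
    patients  <- c(100, 100)

    # Two-sample test of proportions; the p-value is the probability of a difference at least
    # this large arising by chance if the null hypothesis of "no effect" were true
    prop.test(survivors, patients)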

    To reject the null hypothesis, researchers only need a few trials showing significant improvement with the new drug to question the “no effect” assumption. But to prove that the drug universally improves survival rates in every case would require countless trials and still wouldn’t provide absolute certainty.

    Beyond Detecting Differences

    In hypothesis testing, the traditional approach often aims to detect a difference between groups or treatments, such as whether a new drug performs better than an existing one. However, researchers sometimes have different objectives, and other types of hypotheses can better reflect these specific goals. These include superiority, non-inferiority, and equivalence hypotheses, each of which frames the research question in a distinct way.

    Superiority Hypothesis: This hypothesis seeks to determine whether one treatment is better than another. For example, testing whether a new cancer drug improves survival rates more than the current standard drug. Here, the null hypothesis assumes the new treatment is no better than the standard, while the alternative suggests the new treatment is superior. If the data shows a statistically significant improvement with the new treatment, researchers can reject the null hypothesis and conclude that the new drug is likely superior.

    Non-Inferiority Hypothesis: In some studies, the goal is to show that a new treatment is not worse than the existing standard by more than an acceptable margin. This is often used when the new treatment may have other benefits, such as being less costly or easier to administer. For instance, researchers might test whether a new oral medication for high blood pressure is not significantly less effective than an injectable option, but is easier to use. Here, the null hypothesis assumes that the new treatment is worse than the standard by more than the acceptable margin, while the alternative hypothesis is that it is not worse. Results within the non-inferiority margin allow researchers to reject the null hypothesis and conclude that the new treatment is non-inferior.

    Equivalence Hypothesis: This hypothesis tests whether two treatments produce similar effects within a specified range. Equivalence studies are useful when researchers want to show that a new treatment performs just as well as an existing one, typically to confirm that the new treatment can replace the current standard without loss of efficacy. For example, researchers might test whether a generic drug is as effective as a brand-name drug within a small acceptable range of difference. The null hypothesis here assumes the treatments differ by more than the acceptable range, while the alternative suggests they are equivalent. Rejecting the null hypothesis indicates that the treatments can be considered interchangeable.
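    As a hedged sketch of how a non-inferiority comparison might be checked (hypothetical cure counts and an arbitrary margin of 5 percentage points), the confidence interval for the difference is compared against the margin rather than against zero:

    # Hypothetical cure counts: 78/100 on the new oral drug vs 80/100 on the injectable standard
    test <- prop.test(c(78, 80), c(100, 100))
    diff_ci <- test$conf.int    # 95% CI for the difference (new minus standard) in cure proportions

    # Non-inferiority margin of 5 percentage points: the new drug is considered non-inferior
    # only if the lower bound of the CI stays above -0.05
    margin <- 0.05
    diff_ci
    diff_ci[1] > -margin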

    Errors in Hypothesis Testing

    Hypothesis testing involves managing the risks of two main errors:

    Type I Error (False Positive): This error occurs when a true null hypothesis is incorrectly rejected. In a medical context, this could mean diagnosing a patient with a disease they do not actually have. Such an error could lead to unnecessary anxiety, additional tests, or even unnecessary treatment. In statistical terms, the probability of a Type I error is represented by alpha (α), the significance level, which is often set at 0.05; the p-value calculated from the data is compared against this threshold.

    In hypothesis testing, the p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. It helps us judge whether the results we observe are likely to reflect a real effect of the treatment or could simply be due to chance. When we talk about the “truth” in this context, we’re asking whether the observed benefit of a new drug (for instance, an improvement in cancer survival rates) is genuinely due to the drug itself, rather than a random occurrence. The goal of statistical testing is to help us differentiate between results that reflect real-world effects and those that could have happened by coincidence.

    Type II Error (False Negative): This error occurs when a false null hypothesis is not rejected. In medical testing, this would be failing to diagnose a patient who actually has the disease, potentially leading to missed treatment. The probability of a Type II error is represented by beta (β), with the power of a test (1 – β) reflecting its ability to detect true effects.

    Both errors have real-world implications, especially in medicine, where incorrectly diagnosing a healthy patient (Type I error) or missing a diagnosis in an ill patient (Type II error) can have significant consequences.
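    Both error rates are usually planned for at the design stage. As a small illustration with hypothetical survival proportions, base R's power.prop.test() shows the sample size per group needed to detect an improvement from 45% to 60% survival with an alpha of 0.05 and 80% power (that is, beta of 0.20):

    # Sample size needed per group to detect a rise in survival from 45% to 60%
    # with a 5% Type I error rate (alpha) and 80% power (1 - beta)
    power.prop.test(p1 = 0.45, p2 = 0.60, sig.level = 0.05, power = 0.80)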

    Looking Beyond Numbers

    The white swan story, the legal system, and hypothesis testing share a common theme: reliable conclusions require rigorous evidence and careful judgment. Hypothesis testing isn’t only about calculating probabilities and p-values; it’s about interpreting what those numbers mean in real-world contexts. By prioritising the rejection of assumptions rather than their acceptance, science follows a cautious, methodical path that allows for meaningful and reliable discoveries. This approach encourages researchers to look beyond statistical outcomes alone and consider the larger implications of their findings. Through thorough testing and the careful interpretation of evidence, hypothesis testing fosters a structured and reliable process for understanding the world.