▶ What is the difference between Type I and Type II errors, and how do I control them?
A Type I error is rejecting the null hypothesis when it is true (a false positive, with rate α, e.g., 0.05); a Type II error is failing to reject the null when it is false (a false negative, with rate β; power = 1 - β). You control α directly by choosing your significance level, typically 0.05. β and power depend on effect size, sample size, and variability: a larger sample, a bigger effect, or less noise all increase power. For a fixed sample size and effect size, lowering α raises β, so you cannot shrink both at once without collecting more data. In most research, α = 0.05 (5% false positive rate) and power = 0.80 (80% true positive rate, β = 0.20) are the conventional trade-off. High-stakes decisions (approving a risky drug) might demand a tighter α; exploratory research might accept lower power. Always report effect sizes and confidence intervals, not just p-values.
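The relationship between α, sample size, effect size, and power can be sketched numerically. The example below is a minimal illustration, assuming a two-sided one-sample z-test with known variance and a standardized effect size; the function name `z_test_power` and the chosen inputs are hypothetical, not part of any particular package.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized mean shift (assumed known);
    n is the sample size; alpha is the chosen Type I error rate.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic falls in either rejection region
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Power grows with sample size for a fixed effect and alpha
for n in (20, 50, 100):
    print(n, round(z_test_power(0.3, n), 3))

# Tightening alpha at a fixed n lowers power (raises beta)
print(round(z_test_power(0.3, 50, alpha=0.01), 3))
```

With an effect size of zero the function returns α itself, which is the false positive rate you chose.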
▶ What is p-hacking and how do I avoid it?
P-hacking is the practice of trying many analyses and reporting only those that achieve statistical significance, which inflates the false positive rate. Examples: testing multiple outcomes and reporting only the significant one, excluding outliers post hoc to reach p < 0.05, or trying different statistical tests until one works. It undermines reproducibility: results obtained this way often fail to replicate. Prevention strategies: pre-register your hypotheses and analysis plan before analyzing data, commit to a primary outcome and treat others as exploratory, set your significance level *before* running tests, report effect sizes and confidence intervals (which convey magnitude regardless of p-value), and commit to reporting negative findings equally. Registered reports (review and approval before data collection) are the strongest defense.
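A small simulation can make the inflation concrete. The sketch below assumes independent outcomes, no true effect on any of them, and a z-test with known standard deviation; the function names, sample sizes, and seed are illustrative choices only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def p_value_null_study(rng, n=30):
    """Two-sided z-test p-value for one outcome with no true effect (sigma = 1)."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, alpha=0.05, seed=1):
    """Share of null studies that can report at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [p_value_null_study(rng) for _ in range(n_outcomes)]
        if min(p_values) < alpha:  # report only the best-looking outcome
            hits += 1
    return hits / n_studies


honest = false_positive_rate(n_outcomes=1)
hacked = false_positive_rate(n_outcomes=5)
print(f"one pre-specified outcome: {honest:.3f}")
print(f"best of five outcomes:     {hacked:.3f}")
```

Under these assumptions the single pre-specified outcome should land near the nominal 5%, while picking the best of five should land near 1 - 0.95^5, or roughly 23%.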
▶ When should I use linear regression versus logistic regression?
Use linear regression when your outcome is continuous (height, income, test scores); it models the mean outcome as a linear function of predictors. Use logistic regression when your outcome is binary (disease yes/no, voted yes/no); it models the probability of the outcome occurring. Linear regression on a binary outcome violates assumptions (residuals are neither normally distributed nor constant in variance) and can produce predictions outside [0,1]. Logistic regression accounts for the binary nature and constrains predictions to probabilities. Multinomial logistic regression extends this to outcomes with more than two categories. Violating assumptions has consequences: misestimated standard errors, misleading significance tests, and unrealistic predictions. Choose the right tool for your outcome type.
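To see why predictions matter here, compare a straight-line prediction with the inverse-logit transform used by logistic regression. The coefficients below are hypothetical values picked only for illustration, not estimates from any data.

```python
from math import exp


def linear_prediction(x, intercept, slope):
    """Straight-line prediction: unbounded, so it can leave the [0, 1] range."""
    return intercept + slope * x


def logistic_prediction(x, intercept, slope):
    """Inverse-logit of the linear predictor: always strictly between 0 and 1."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))


# Hypothetical coefficients, chosen only to show the behavior at extreme x
for x in (-10, 0, 10):
    linear = linear_prediction(x, intercept=0.5, slope=0.1)
    logistic = logistic_prediction(x, intercept=0.0, slope=0.4)
    print(x, round(linear, 2), round(logistic, 3))
```

The linear column drifts below 0 and above 1 at the extremes, which cannot be read as a probability; the logistic column stays inside the unit interval.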
▶ What is a confidence interval and why is it better than a p-value?
A confidence interval (CI) is a range of plausible values for a parameter, estimated from data. A 95% CI means that if you repeated your study many times and computed a CI each time, about 95% of those intervals would contain the true parameter. This is a frequentist interpretation: the 95% describes the long-run behavior of the procedure, not the probability that any single computed interval contains the parameter. A p-value is the probability of observing a result as or more extreme than what you got, *if the null hypothesis were true*. It is a conditional probability that many people misinterpret as the "probability the hypothesis is true" (which it isn't). A CI communicates both the point estimate and its precision (narrow CI = more precise estimate). Effect sizes (like Cohen's d) combined with CIs are far more informative than p-values alone. Report both CIs and p-values for completeness, but lead with effect sizes and CIs.
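The repeated-sampling interpretation can be checked by simulation. The following is a rough sketch, assuming normally distributed data and a normal-approximation interval for the mean; the true mean, standard deviation, sample size, and seed are arbitrary illustrative values.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_ci(sample, level=0.95):
    """Normal-approximation CI for a mean (a t-based interval is preferable for small n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = fmean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return center - half_width, center + half_width


def coverage(true_mean=10.0, sd=2.0, n=50, n_studies=2000, seed=7):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_ci(sample)
        covered += low <= true_mean <= high
    return covered / n_studies


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should sit close to 0.95. Any single interval either contains the true mean or it does not; the 95% is a property of the procedure across repetitions.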
▶ What are violations of regression assumptions and how do I diagnose them?
Key assumptions in linear regression: linearity (the outcome is a linear function of the predictors), independence (observations are independent), homoscedasticity (residual variance is constant), and normality (residuals are normally distributed). With multiple predictors, strong multicollinearity is a further concern because it inflates standard errors. Violations matter: if assumptions are violated, standard errors can be biased and significance tests misleading. Diagnose using: scatter plots (check linearity and outliers), residual plots (check homoscedasticity and patterns), Q-Q plots (check normality), and VIF values (check multicollinearity). If violations exist, consider transformations (log, square root), robust standard errors, or alternative methods. Violations don't invalidate regression entirely; they mainly affect how far you can trust the inference. Transparent reporting of violations and sensitivity analyses (showing results are robust to violations) strengthen your work.
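As a rough numerical companion to a residual plot, one can compare the spread of residuals at low and high values of the predictor. The sketch below is a crude check rather than a formal test; the helper names, the simulated data, and the split-in-half comparison are all assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_variance_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half."""
    slope, intercept = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order observations by the predictor
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    half = len(residuals) // 2
    return pvariance(residuals[half:]) / pvariance(residuals[:half])


rng = random.Random(3)
x = [rng.uniform(1, 10) for _ in range(400)]
y_constant = [2 + 3 * a + rng.gauss(0, 1) for a in x]      # constant noise
y_fanning = [2 + 3 * a + rng.gauss(0, 0.5 * a) for a in x]  # noise grows with x

print("constant-variance data:", round(residual_variance_ratio(x, y_constant), 2))
print("fanning-out data:      ", round(residual_variance_ratio(x, y_fanning), 2))
```

A ratio near 1 is consistent with constant variance, while a ratio well above 1 matches the funnel shape you would see in a residual plot. A plot remains the more informative diagnostic.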
▶ What is power analysis and when should I conduct it?
Power analysis determines the sample size needed to detect a hypothesized effect with the desired statistical power (typically 80%), given your significance level (α = 0.05) and expected effect size. Conduct it *before* data collection, during study design, to answer the question: what sample size do I need? Use tools like G*Power, online calculators, or software-specific functions. Supply: effect size (from theory, prior research, or the minimally meaningful difference), α, desired power, and study design. Power analysis is essential for: grant proposals (funders expect it), ethical research (an underpowered study spends resources and participants' effort on a design unlikely to detect the effect), and research credibility (underpowered studies are prone to false negatives, and the effects they do publish tend to be inflated). Post-hoc power analysis (computing power after the study) is less useful: if you didn't achieve significance, it doesn't tell you whether the effect truly doesn't exist or you were underpowered.
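For a sense of what such tools compute, here is a minimal sketch of the standard normal-approximation formula for comparing two group means, n per group = 2 * ((z_(1-α/2) + z_(power)) / d)^2, where d is the standardized effect size. The function name `n_per_group` is hypothetical, and the formula assumes equal group sizes and a two-sided test.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation; effect_size is the standardized
    mean difference (Cohen's d) you expect or consider meaningful.
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return ceil(n)  # round up: you cannot recruit a fraction of a participant


# Smaller effects require far larger samples
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} per group")
```

Tools that use the t distribution instead of the normal approximation report slightly larger numbers, so treat this as a first estimate and confirm with dedicated software.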
▶ How do I handle multiple comparisons and multiple testing problems?
When you conduct many statistical tests, the overall false positive rate (Type I error) inflates. Example: with 20 independent tests at α = 0.05 each and no true effects, you expect about 1 false positive by chance, and the probability of at least one is about 64%. Methods to control this include: Bonferroni correction (divide α by the number of tests; very conservative), Holm-Bonferroni (a sequential procedure whose thresholds become less stringent at each step), False Discovery Rate (FDR; controls the expected proportion of false positives among discoveries and is less conservative than Bonferroni), or pre-specification of a primary hypothesis and secondary hypotheses. Best practice: declare your primary outcome and hypothesis; power the sample for that alone; treat other analyses as exploratory and report them as such. This maintains scientific integrity without overly penalizing discovery. Registered reports (pre-specifying analyses) are the gold standard for avoiding multiple testing issues.
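The corrections named above can be sketched in a few lines. This is an illustrative implementation, assuming the Benjamini-Hochberg procedure as the FDR method and using made-up p-values; in practice a vetted statistics library is the safer choice.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent null tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: thresholds loosen as smaller p-values are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # stop at the first p-value that fails its threshold
        reject[i] = True
    return reject


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


p_values = [0.001, 0.011, 0.02, 0.03, 0.2]  # made-up values for illustration
print("familywise rate, 20 tests:", round(familywise_error_rate(20), 3))
print("Bonferroni rejections:    ", sum(bonferroni(p_values)))
print("Holm rejections:          ", sum(holm(p_values)))
print("BH (FDR) rejections:      ", sum(benjamini_hochberg(p_values)))
```

On these example p-values the three procedures reject progressively more hypotheses, reflecting their ordering from most to least conservative.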