An exam companion

Statistics.

From probability to hypothesis testing — theory, flashcards, spaced repetition, and worked exercises, built to be read.

Reading Instructions

How to use this volume

Begin with Theory. Every concept is introduced from first principles, with worked examples.
Move to Flashcards for active recall. Click any card to flip, then mark HARD or GOT IT — your judgments persist.
Attempt each Exercise on paper. Reveal the hint if stuck, then the full solution to check your reasoning.
Navigate with ← →. Focus search with /. Mark a problem solved to track your progress.

CH. 01 Statistics

Probability Foundations

Starting from zero: what probability is, how to combine events, and how to reason with Bayes — built up with concrete examples before any exam problem.

Sections 10

Flashcards 11

Exercises 8

Read time 7'

Sources Ross, Introductory Statistics 3E, ch. 4 · MACS lectures 02–05 (Perone Pacifico, LUISS) · Past exams: MACSf1/2, MAIf2/3/5, sampletest1, EBf1

Key concepts

Sample space & events · Set operations · Probability rules · Equally likely outcomes · Conditional probability · Independence · Total probability · Bayes' theorem · Counting

1 · What probability actually is

Start with the picture, not the formula. An experiment is any situation whose result you can't predict for sure, but where you do know the list of possible results. Tossing a coin is an experiment; so is "which operating system will my next laptop run?" or "what will Apple's share price be on Monday?"

Each single result is an outcome. The full list of possible outcomes is the sample space, written \(S\).

Coin toss: \(S=\{\text{Heads},\text{Tails}\}\).
Roll a die: \(S=\{1,2,3,4,5,6\}\).
Next laptop OS: \(S=\{\text{Windows},\text{MacOS},\text{Linux},\dots\}\).

An event is just a statement about the result — "the die shows an even number", "I get Heads". Formally it's a subset of \(S\): the even-number event is the set \(\{2,4,6\}\). We say the event occurs when the actual outcome is one of the outcomes inside it.

So what is a probability? The most useful picture is long-run frequency: if you repeated the experiment over and over, the probability of an event is the proportion of times it would happen. Flip a fair coin thousands of times and the fraction landing Tails settles near \(0.5\) — that limiting fraction is the probability. (Real example: across the world about 105 boys are born for every 100 girls, year after year — so \(P(\text{newborn is male})\approx0.51\).) A probability is always a number between 0 and 1: 0 = never, 1 = certain.
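A quick simulation makes the long-run frequency picture concrete. This is only an illustrative sketch: the function name `tails_fraction`, the fixed seed, and the flip counts are arbitrary choices, not part of the source material.

```python
import random

def tails_fraction(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the fraction of Tails."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    tails = sum(rng.random() < 0.5 for _ in range(n_flips))
    return tails / n_flips

# The fraction wanders for small n and settles near 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, tails_fraction(n))
```

With few flips the fraction can sit far from 0.5; only the long run is stable, which is exactly what the definition asks for.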

2 · Combining events: AND, OR, NOT

Events are sets, so we combine them like sets. Picture a rectangle for \(S\) and circles inside it for events (a Venn diagram).

OR — union \(A\cup B\): the outcomes in \(A\), or in \(B\), or in both. It occurs if at least one of them happens.
AND — intersection \(A\cap B\): the outcomes in both. It occurs only if they happen together.
NOT — complement \(A^c\): everything in \(S\) that is not in \(A\). It occurs exactly when \(A\) doesn't.

Mutually exclusive (disjoint) events can't happen at the same time — their intersection is empty, \(A\cap B=\varnothing\). "The die shows 2" and "the die shows 5" are mutually exclusive. The empty set \(\varnothing\) is the impossible event.

One identity worth seeing now, because it drives a lot of exam problems: "in \(A\) but not in \(B\)" is \(A\cap B^c\), and it equals what's in \(A\) minus the overlap. We'll turn that into numbers next.

3 · The rules of probability

Three basic rules (they just codify the frequency picture):

\(0\le P(A)\le1\) — proportions live between 0 and 1.
\(P(S)=1\) — something in the list always happens.
If \(A,B\) are mutually exclusive, \(P(A\cup B)=P(A)+P(B)\) — non-overlapping chances just add.

From these you derive the two you'll actually use:

Complement rule. Since \(A\) and \(A^c\) split \(S\): \(P(A^c)=1-P(A)\). (For a biased coin with \(P(\text{Heads})=0.4\), Tails has probability 0.6.) This is the engine behind "at least one" problems: it's usually easier to compute the opposite and subtract.

Addition rule (when events can overlap): \[ P(A\cup B)=P(A)+P(B)-P(A\cap B). \] Why subtract? If you just add \(P(A)+P(B)\), the overlap \(A\cap B\) gets counted twice, so you remove it once.

Concrete example (Ross 4.3). A shop takes Amex or VISA. 22% of customers carry Amex, 58% carry VISA, 14% carry both. Probability a customer has at least one card: \(0.22+0.58-0.14=0.66\). And "VISA but not Amex" \(=0.58-0.14=0.44\) — exactly the \(A\cap B^c\) idea from section 2.
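The card example can be checked with exact fractions. The helper names `union_prob` and `only_first_prob` are made up for this sketch; the three input percentages are the ones quoted above.

```python
from fractions import Fraction

def union_prob(p_a, p_b, p_both):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

def only_first_prob(p_a, p_both):
    """P(A and not B) = P(A) - P(A and B)."""
    return p_a - p_both

amex, visa, both = Fraction(22, 100), Fraction(58, 100), Fraction(14, 100)
at_least_one = union_prob(amex, visa, both)   # 33/50, i.e. 0.66
visa_only = only_first_prob(visa, both)       # 11/25, i.e. 0.44
print(float(at_least_one), float(visa_only))
```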

4 · Equally likely outcomes — just count

When every outcome in \(S\) is equally likely (a fair die, a well-shuffled deck, "a person chosen at random"), probability becomes pure counting:

\[ P(A)=\frac{\#\text{outcomes in }A}{\#\text{outcomes in }S}. \]

The phrase "chosen at random" is your signal that outcomes are equally likely.

Fair die: \(P(\text{even})=3/6=1/2\).
European roulette, bet on odd: numbers \(\{0,1,\dots,36\}\), 18 are odd ⇒ \(18/37\).
Retirement centre (Ross 4.4): 420 members, 144 smokers ⇒ \(P(\text{smoker})=144/420=12/35\).

This is why counting techniques (section 9) matter: to get a probability you often just need to count the favourable outcomes and divide by the total.
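As a small sketch of "count favourable, divide by total", assuming every outcome in the list is equally likely (the helper `prob_equally_likely` is a hypothetical name, not from the source):

```python
from fractions import Fraction

def prob_equally_likely(sample_space, event):
    """P(A) = (# outcomes in A) / (# outcomes in S); `event` is a yes/no test on an outcome."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

die_even = prob_equally_likely(range(1, 7), lambda x: x % 2 == 0)       # 1/2
roulette_odd = prob_equally_likely(range(0, 37), lambda x: x % 2 == 1)  # 18/37
print(die_even, roulette_odd)
```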

5 · Conditional probability — updating on information

Often you learn something partway through. Conditional probability \(P(B\mid A)\) is "the probability of \(B\) given that \(A\) has happened."

The intuition (no formula yet). Roll two dice and let \(B\) be "the sum is 10". Before you know anything, three of the 36 outcomes give a 10, so \(P(B)=3/36=1/12\). Now you're told \(A\): the first die is a 4. That knowledge shrinks the world: only six outcomes are still possible, \((4,1),\dots,(4,6)\). Among those, only \((4,6)\) makes the sum 10, so the chance is \(1/6\). You re-computed the probability inside a reduced sample space: \(A\) became your new \(S\).

Turning that into a formula — measure the overlap relative to the thing you now know:

\[ P(B\mid A)=\frac{P(A\cap B)}{P(A)}. \]

The famous trap (Ross 4.10 / the two-children problem). A couple has two children; you learn at least one is a girl. Probability both are girls? Equally likely families are \(\{(g,g),(g,b),(b,g),(b,b)\}\). "At least one girl" rules out only \((b,b)\), leaving three equally likely cases; just one is \((g,g)\). So the answer is \(1/3\), not \(1/2\) — the extra information reshapes the sample space. (This exact reasoning powers exam exercise 5 below.)

Rearranging the formula gives the multiplication rule: \(P(A\cap B)=P(A)\,P(B\mid A)\) — useful for "draw two without replacement" problems, where the second draw's odds depend on the first.
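The reduced-sample-space idea is easy to verify by brute-force enumeration. The sketch below assumes equally likely outcomes; `conditional_prob` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

def conditional_prob(outcomes, b, a):
    """P(B | A) over equally likely outcomes: count B inside the reduced space A."""
    reduced = [o for o in outcomes if a(o)]
    return Fraction(sum(1 for o in reduced if b(o)), len(reduced))

# Two dice: P(sum is 10 | first die is 4)
dice = list(product(range(1, 7), repeat=2))
p_sum10_given_first4 = conditional_prob(dice, lambda o: sum(o) == 10, lambda o: o[0] == 4)

# Two children: P(both girls | at least one girl)
families = list(product("gb", repeat=2))
p_both_girls = conditional_prob(families, lambda o: o == ("g", "g"), lambda o: "g" in o)

print(p_sum10_given_first4, p_both_girls)  # 1/6 1/3
```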

6 · Independence — when information doesn't help

Sometimes knowing \(A\) tells you nothing about \(B\). Then \(P(B\mid A)=P(B)\), and the multiplication rule simplifies to the test you'll use:

\[ A,B\text{ independent}\iff P(A\cap B)=P(A)\,P(B). \]

Signals of independence: "with replacement", "i.i.d.", "each toss is fair" — separate trials that don't influence each other.

Why it's not automatic (Ross 4.13). Two fair dice. Let \(A=\)"first die is 3". Compare two events: \(B=\)"sum is 8" and \(C=\)"sum is 7". Knowing the first die is 3 changes the chance of an 8 (now you just need a 5 next) — so \(A,B\) are dependent. But the chance of a 7 stays \(1/6\) whatever the first die shows (there's always exactly one matching second die) — so \(A,C\) are independent. Same first event, opposite verdicts: independence is something you check, not assume.

For "at least one" across independent trials, lean on the complement: three children, \(P(\text{at least one girl})=1-P(\text{all boys})=1-(1/2)^3=7/8\).
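The two-dice verdicts can be confirmed by applying the product test to all 36 outcomes. This is a sketch with hypothetical helper names (`prob`, `independent`), assuming fair dice.

```python
from fractions import Fraction
from itertools import product

DICE = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a yes/no test on an outcome."""
    return Fraction(sum(1 for o in DICE if event(o)), len(DICE))

def independent(a, b):
    """Product test: A and B are independent iff P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

first_is_3 = lambda o: o[0] == 3
sum_is_8 = lambda o: sum(o) == 8
sum_is_7 = lambda o: sum(o) == 7

print(independent(first_is_3, sum_is_8))  # False: 1/36 vs (1/6)(5/36)
print(independent(first_is_3, sum_is_7))  # True:  1/36 vs (1/6)(1/6)
```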

7 · Total probability — averaging over cases

Often the thing you want depends on a hidden "case" or "cause". Split the problem by the case, compute each piece, and combine. If \(B\) either happens or not, then for any \(A\):

\[ P(A)=P(A\mid B)\,P(B)+P(A\mid B^c)\,P(B^c). \]

Read it as a weighted average: the chance of \(A\) in each case, weighted by how likely that case is. (It generalises to any set of cases \(B_1,\dots,B_k\) that are mutually exclusive and together cover every possibility: \(P(A)=\sum_i P(A\mid B_i)P(B_i)\).)

Concrete example (insurance). 30% of drivers are high-risk, 70% low-risk. A high-risk driver has an accident this year with probability 0.4, a low-risk one with probability 0.2. Overall chance a random driver has an accident: \[ P(\text{accident})=0.4(0.30)+0.2(0.70)=0.12+0.14=0.26. \] You couldn't answer without splitting by risk type — that's the whole move.
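The weighted average can be written as a short function. A minimal sketch, assuming the cases are mutually exclusive and their probabilities sum to 1; `total_probability` is an illustrative name.

```python
from fractions import Fraction

def total_probability(cases):
    """cases: list of (P(case), P(A | case)) pairs over exclusive, exhaustive cases."""
    assert sum(p for p, _ in cases) == 1, "case probabilities must sum to 1"
    return sum(p_case * p_a_given_case for p_case, p_a_given_case in cases)

accident = total_probability([
    (Fraction(3, 10), Fraction(4, 10)),  # high-risk drivers
    (Fraction(7, 10), Fraction(2, 10)),  # low-risk drivers
])
print(accident, float(accident))  # 13/50 0.26
```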

8 · Bayes' theorem — reasoning backwards

Total probability runs cause → effect. Bayes runs it backwards: you observed the effect, and you want the probability of the cause. "It rained — how likely is it that the morning had been sunny?" "The test is positive — how likely is the disease?"

Start from the definition \(P(\text{cause}\mid\text{effect})=\dfrac{P(\text{cause}\cap\text{effect})}{P(\text{effect})}\), write the top as \(P(\text{effect}\mid\text{cause})P(\text{cause})\), and expand the bottom with total probability:

\[ P(H\mid E)=\frac{P(E\mid H)\,P(H)}{P(E\mid H)\,P(H)+P(E\mid H^c)\,P(H^c)}. \]

The procedure: (1) name the causes \(H\) and the observed effect \(E\); (2) write the priors \(P(H)\) and the likelihoods \(P(E\mid H)\) straight from the text; (3) total probability gives the denominator; (4) divide.

The showcase example — why a positive test can still be reassuring (Ross 4.17). A blood test is 99% accurate when the disease is present (\(P(E\mid H)=0.99\)) and gives a false positive 2% of the time (\(P(E\mid H^c)=0.02\)). Only 0.5% of people have the disease (\(P(H)=0.005\)). You test positive — what's the chance you're actually sick? \[ P(H\mid E)=\frac{0.99(0.005)}{0.99(0.005)+0.02(0.995)}\approx0.199. \] About 20% — surprisingly low, because the huge healthy population produces many false positives that swamp the few true cases. This is exactly the engine behind exam exercises 3 and 4.
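The four-step procedure for a two-hypothesis problem fits in a few lines. This is a sketch, not a general-purpose routine; the function and parameter names are chosen for illustration.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) for a cause H versus its complement, via Bayes' theorem."""
    numerator = p_e_given_h * prior                       # likelihood x prior
    evidence = numerator + p_e_given_not_h * (1 - prior)  # total probability of E
    return numerator / evidence

posterior = bayes_posterior(prior=0.005, p_e_given_h=0.99, p_e_given_not_h=0.02)
print(round(posterior, 3))  # 0.199
```

Raising the prior to 0.5 in the same call pushes the posterior above 0.98, which shows how strongly the rarity of the disease drives the result.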

9 · Counting — tools for equally-likely problems

Section 4 said probability is often just counting favourable ÷ total. Here are the tools, introduced only because we need them.

Basic principle. If step 1 has \(n\) options and step 2 has \(m\), together there are \(n\cdot m\). (One man from 8, one woman from 12 → \(96\) pairs.)

Permutations / factorial. The number of orderings of \(n\) distinct objects is \(n!=n(n-1)\cdots2\cdot1\) (with \(0!=1\)). Order matters here.

Combinations. When order does not matter — choosing a group of \(k\) from \(n\):

\[ \binom{n}{k}=\frac{n!}{k!\,(n-k)!}. \]

Two reflexes for the exam: "all different" → count the complement (the few repeats) and subtract; "these specific items are included" → fix them, then count the choices for the remaining slots. Watch for repeated elements, e.g. the two identical P's in APPLE: treat the five letter positions as distinct, so there are \(\binom{5}{2}=10\) equally likely selections of two, and only one of them is the pair PP.
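Python's standard library already has the three counting tools, so the formulas above are easy to sanity-check. `n_choose_k` is a hypothetical helper that spells out the factorial formula for comparison with `math.comb`.

```python
from math import comb, factorial

def n_choose_k(n, k):
    """Combinations from the factorial formula n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

pairs_man_woman = 8 * 12           # basic principle: 96 pairs
orderings_of_5 = factorial(5)      # 5! = 120 orderings of five distinct objects
groups_2_of_5 = n_choose_k(5, 2)   # 10 unordered pairs

print(pairs_man_woman, orderings_of_5, groups_2_of_5, comb(5, 2))
```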

10 · Which tool? — recognition guide

Once the concepts are in place, exam problems are about spotting which one applies.

If the wording says…	It's a…	What to do
"chosen at random" among options, then observe a result; "given [result], prob it was [cause]"	Bayes	priors × likelihoods, divide
"a cause is random, then count X", asked an unconditional \(P(X{=}k)\)	Total probability (mixture)	\(\sum P(X{=}k\mid \text{case})P(\text{case})\)
two overlapping groups + counts; "at least one", "one but not the other"	Addition / inclusion–exclusion	\(P(A\cup B)=P(A)+P(B)-P(A\cap B)\)
"choose \(k\) without replacement", "all different", "both included"	Counting	\(\binom{n}{k}\) + complement / fix-items
"knowing exactly \(m\) of \(n\) are…", asked about a specific draw	Conditional in a fair setup	reduced sample space; count favourable ÷ remaining

Click any card to flip. Rate it after to track what you need to revisit.

Card 1

In plain words: what are an experiment, an outcome, the sample space S, and an event?

Experiment = a situation with an unpredictable result but a known list of possibilities. Outcome = one single result. Sample space \(S\) = the set of all possible outcomes. Event = a statement about the result, i.e. a subset of \(S\); it 'occurs' when the actual outcome is inside it.

Card 2

What does it mean to say 'the probability of Tails is 0.5' (the long-run frequency picture)?

If you repeated the experiment many times, the proportion of times the event happens settles toward that number. Probability = limiting relative frequency. Always between 0 (never) and 1 (certain).

Card 3

AND, OR, NOT for events — which set operation is each, and what does 'mutually exclusive' mean?

OR = union \(A\cup B\) (at least one). AND = intersection \(A\cap B\) (both). NOT = complement \(A^c\). Mutually exclusive (disjoint) = can't both happen, \(A\cap B=\varnothing\).

Card 4

Why does the addition rule subtract \(P(A\cap B)\)? State it.

\(P(A\cup B)=P(A)+P(B)-P(A\cap B)\). Adding \(P(A)+P(B)\) double-counts the overlap, so you remove it once. (Also \(P(A\cap B^c)=P(A)-P(A\cap B)\).)

Card 5

When outcomes are equally likely, how do you compute a probability? What phrase signals this?

\(P(A)=\#A/\#S\) — favourable outcomes divided by total. The phrase 'chosen at random' signals equally-likely outcomes.

Card 6

Conditional probability: the intuition (reduced sample space) and the formula.

Given \(A\) happened, \(A\) becomes your new sample space; re-measure \(B\) inside it. \(P(B\mid A)=\dfrac{P(A\cap B)}{P(A)}\). The two-children problem: 'at least one girl' leaves 3 equally likely cases, so P(both girls)=1/3, not 1/2.

Card 7

Independence: the test, and how to recognise it in a problem.

\(A,B\) independent \(\iff P(A\cap B)=P(A)P(B)\); when \(P(A)>0\) this is the same as \(P(B\mid A)=P(B)\). Signalled by 'with replacement', 'i.i.d.', 'each trial is fair'. Don't assume it; check it.

Card 8

Law of total probability — state it and explain it in words.

\(P(A)=\sum_i P(A\mid B_i)P(B_i)\) over cases \(B_i\) that are mutually exclusive and together cover every possibility. It's a weighted average: the chance of \(A\) in each case, weighted by how likely that case is. Use it when the answer depends on a hidden case/cause.

Card 9

Bayes' theorem: what does it do, and what are the priors and likelihoods?

It reverses conditioning: from effect back to cause. \(P(H\mid E)=\dfrac{P(E\mid H)P(H)}{\sum_i P(E\mid H_i)P(H_i)}\). Priors \(P(H_i)\) = how likely each cause before evidence; likelihoods \(P(E\mid H_i)\) = how the cause produces the effect. The denominator is total probability.

Card 10

Why can a positive result from a test that detects 99% of true cases still mean only a ~20% chance of disease?

If the disease is rare (e.g. 0.5%), the large healthy group produces many false positives that outnumber the few true cases. Bayes weighs prior × likelihood: \(\frac{0.99\cdot0.005}{0.99\cdot0.005+0.02\cdot0.995}\approx0.20\). Rarity of the cause matters as much as test accuracy.

Card 11

Counting: factorial, combinations, and the two exam reflexes.

Orderings of \(n\): \(n!\). Choosing \(k\) of \(n\) (order irrelevant): \(\binom{n}{k}=\frac{n!}{k!(n-k)!}\). Reflexes: 'all different' → count the complement and subtract; 'specific items included' → fix them, count the rest.

EX. 01

Coin tossed a random number of times — P(X=0)

MACSf1-P2 medium

Let \(N\) be a number chosen from \(\{1,2,3\}\) with equal probability. Throw a fair coin \(N\) times, counting the number \(X\) of Heads obtained. Calculate \(P(X=0)\).

The hidden 'case' is how many times you tossed (\(N\)). No reversal is asked, so it's the law of total probability (section 7). Given \(N=n\), getting zero Heads means \(n\) Tails in a row: \((\tfrac12)^n\).

Step 1. Identify the structure
Cases: \(P(N=1)=P(N=2)=P(N=3)=\tfrac13\). The question asks a plain \(P(X=0)\) — no 'given' — so average over the cases (total probability), not Bayes.

Step 2. Chance of zero Heads in each case
\(n\) independent fair tosses, all Tails: \(P(X=0\mid N=n)=(\tfrac12)^n\). For \(n=1,2,3\) that's \(\tfrac12,\tfrac14,\tfrac18\).

Step 3. Weighted average
\[ P(X=0)=\sum_{n=1}^{3}P(X=0\mid N=n)P(N=n)=\tfrac13\left(\tfrac12+\tfrac14+\tfrac18\right)=\tfrac13\cdot\tfrac78. \]

\( P(X=0)=\dfrac{7}{24} \).

EX. 02

Same coin experiment — reverse it with Bayes

MACSf2-P2 medium

Same setup (\(N\) chosen from \(\{1,2,3\}\), fair coin tossed \(N\) times, \(X\) = Heads). Given that \(X=0\), calculate \(P(N=1\mid X=0)\).

Now you observe the effect \(X=0\) and want the hidden cause \(N=1\) → Bayes (section 8). The denominator is the \(P(X=0)=\tfrac{7}{24}\) you just built.

Step 1. Denominator = total probability
From exercise 1, \(P(X=0)=\tfrac{7}{24}\).

Step 2. Numerator = prior × likelihood for the cause N=1
\(P(X=0\mid N=1)\,P(N=1)=\tfrac12\cdot\tfrac13=\tfrac16\).

Step 3. Divide (Bayes)
\[ P(N=1\mid X=0)=\frac{1/6}{7/24}=\frac16\cdot\frac{24}{7}. \]

\( P(N=1\mid X=0)=\dfrac{4}{7} \).
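Exercises 1 and 2 share the same priors and likelihoods, so both answers can be checked together with exact fractions. The dictionary names below are illustrative.

```python
from fractions import Fraction

prior = {n: Fraction(1, 3) for n in (1, 2, 3)}        # P(N = n)
likelihood = {n: Fraction(1, 2) ** n for n in prior}  # P(X = 0 | N = n): n Tails in a row

p_x0 = sum(likelihood[n] * prior[n] for n in prior)   # total probability (exercise 1)
posterior_n1 = likelihood[1] * prior[1] / p_x0        # Bayes (exercise 2)

print(p_x0, posterior_n1)  # 7/24 4/7
```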

EX. 03

Which coin was it? — Bayes with two coins

sampletest1-P1 easy

An urn has two coins: coin A with \(P(\text{Heads})=\tfrac14\), coin B with \(P(\text{Heads})=\tfrac12\). One coin is picked at random and thrown, giving "Heads". Given this result, what is the probability it was coin A?

Cause = which coin (priors \(\tfrac12,\tfrac12\)); effect = Heads (likelihoods \(\tfrac14,\tfrac12\)). You observed the effect, want the cause → Bayes. Same shape as the medical-test example.

Step 1. Priors and likelihoods
\(P(A)=P(B)=\tfrac12\); \(P(H\mid A)=\tfrac14\), \(P(H\mid B)=\tfrac12\).

Step 2. Total probability of Heads (denominator)
\(P(H)=\tfrac14\cdot\tfrac12+\tfrac12\cdot\tfrac12=\tfrac18+\tfrac14=\tfrac38\).

Step 3. Bayes
\[ P(A\mid H)=\frac{P(H\mid A)P(A)}{P(H)}=\frac{1/8}{3/8}. \]

\( P(A\mid H)=\dfrac13 \).

EX. 04

Was the morning sunny? — Bayes with rain

EBf1-P2 easy

If the morning is sunny, the chance of rain that day is \(\tfrac16\). On non-sunny mornings, the chance of rain is \(\tfrac12\). 60% of mornings start sunny. Given that it rained, what is the probability the morning was sunny?

Cause = sunny vs not (priors \(0.6,0.4\)); effect = rain (likelihoods \(\tfrac16,\tfrac12\)). Observed the effect (rain), want the cause (sunny) → Bayes.

Step 1. Priors and likelihoods
\(P(S)=0.6\), \(P(S^c)=0.4\); \(P(R\mid S)=\tfrac16\), \(P(R\mid S^c)=\tfrac12\).

Step 2. Total probability of rain
\(P(R)=\tfrac16\cdot0.6+\tfrac12\cdot0.4=0.1+0.2=0.3\).

Step 3. Bayes
\[ P(S\mid R)=\frac{P(R\mid S)P(S)}{P(R)}=\frac{0.1}{0.3}. \]

\( P(S\mid R)=\dfrac13 \).

EX. 05

Exactly two women — conditioning in a fair setup

MAIf3-P2 hard

Three students are selected at random with replacement. Knowing that exactly two of the selected are women, what is the probability that the first selected is a woman?

This is the two-children trap (section 5) with three slots. 'With replacement' → independent trials, and each selected student is a woman with the same probability \(p\). Let \(E_i\)='the \(i\)-th is a woman', \(W\)=number of women. Want \(P(E_1\mid W=2)\). Watch \(p\) cancel.

Step 1. The conditioning event
Number of women among 3 is Binomial: \(P(W=2)=\binom{3}{2}p^2(1-p)=3p^2(1-p)\).

Step 2. Favourable: first is a woman AND exactly two women
The arrangements are \(E_1E_2E_3^c\) and \(E_1E_2^cE_3\) (disjoint). By independence each is \(p\cdot p\cdot(1-p)\), so \(P(E_1\cap\{W=2\})=2p^2(1-p)\).

Step 3. Divide — and watch p cancel
\[ P(E_1\mid W=2)=\frac{2p^2(1-p)}{3p^2(1-p)}. \] The \(p^2(1-p)\) cancels, just like the count-the-cases reasoning in the two-children problem.

\( P(E_1\mid W=2)=\dfrac23 \) — independent of \(p\).
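To see the cancellation numerically, one can enumerate all eight woman/man patterns and weight each by its probability. This is a sketch; the function name and the trial values of \(p\) are arbitrary.

```python
from fractions import Fraction
from itertools import product

def first_is_woman_given_two(p):
    """P(first pick is a woman | exactly two women) for three independent picks."""
    num = den = Fraction(0)
    for picks in product((True, False), repeat=3):  # True means "woman"
        weight = Fraction(1)
        for is_woman in picks:
            weight *= p if is_woman else 1 - p
        if sum(picks) == 2:          # conditioning event: exactly two women
            den += weight
            if picks[0]:             # favourable: the first pick is a woman
                num += weight
    return num / den

for p in (Fraction(1, 2), Fraction(1, 3), Fraction(9, 10)):
    print(p, first_is_woman_given_two(p))  # 2/3 each time
```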

EX. 06

Biology not Chemistry — addition rule

MAIf2-P2 easy

A school has 300 students; every student takes at least one of Biology or Chemistry, possibly both. Biology has 250 students, Chemistry 150. Picking a student at random, what is the probability they take Biology and not Chemistry?

Two overlapping groups + 'and not' → addition rule (section 3). Everyone is in the union, so \(P(B\cup C)=1\); solve for the overlap, then use \(P(B\cap C^c)=P(B)-P(B\cap C)\).

Step 1. Translate counts to probabilities
\(P(B)=\tfrac{250}{300}\), \(P(C)=\tfrac{150}{300}\), \(P(B\cup C)=1\) (all take at least one).

Step 2. Find the overlap via the addition rule
\(1=\tfrac{250}{300}+\tfrac{150}{300}-P(B\cap C)\Rightarrow P(B\cap C)=\tfrac{100}{300}\).

Step 3. 'And not'
\[ P(B\cap C^c)=P(B)-P(B\cap C)=\tfrac{250}{300}-\tfrac{100}{300}=\tfrac{150}{300}. \]

\( P(B\cap C^c)=\dfrac12 \) (150 of 300 take Biology only).

EX. 07

Two letters from APPLE, all different — counting

MAIf5-P2a easy

Consider the word APPLE. Choose two letters at random without replacement. What is the probability that they are different?

Equally-likely selections → count (section 9). \(\binom{5}{2}=10\) pairs. 'All different' → count the complement (the pairs that repeat) and subtract.

Step 1. Count the sample space
Letters A, P, P, L, E → \(\binom{5}{2}=10\) unordered pairs.

Step 2. Complement
The only pair of equal letters is PP — exactly 1 outcome.

Step 3. Subtract
Different pairs \(=10-1=9\), so \(P(\text{different})=\tfrac{9}{10}\).

\( P(\text{different})=\dfrac{9}{10} \).

EX. 08

Three letters from APPLE, both P's chosen — counting

MAIf5-P2b easy

From the word APPLE, choose three letters at random without replacement. What is the probability that both P's are among the chosen letters?

\(\binom{5}{3}=10\) selections. 'Both P's included' → fix the two P's, count choices for the one remaining slot.

Step 1. Count the sample space
\(\binom{5}{3}=10\) unordered selections of three letters.

Step 2. Fix the required items
If both P's are taken, the third letter is one of A, L, E → 3 favourable selections.

Step 3. Divide
\( P(\text{both P's})=\tfrac{3}{10} \).

\( P(\text{both P's})=\dfrac{3}{10} \).
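Both APPLE answers can be confirmed by enumerating selections of letter positions, so the two P's stay distinguishable. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

WORD = "APPLE"
positions = range(len(WORD))

# Exercise 7: two letters, are they different?
pairs = list(combinations(positions, 2))
different = [c for c in pairs if WORD[c[0]] != WORD[c[1]]]
p_different = Fraction(len(different), len(pairs))

# Exercise 8: three letters, are both P's among them?
triples = list(combinations(positions, 3))
both_ps = [c for c in triples if sum(WORD[i] == "P" for i in c) == 2]
p_both_ps = Fraction(len(both_ps), len(triples))

print(p_different, p_both_ps)  # 9/10 3/10
```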

CH. 02 Statistics

Discrete Random Variables & Key Models

Expectation, variance, and the Bernoulli / Binomial / Poisson / Geometric toolkit that feeds every later chapter

Sections 04

Flashcards 8

Exercises 6

Read time 3'

Sources Ross, Introductory Statistics 3E, ch. 5 · Ross, Prob & Stats for Engineers 5E, §5.1–5.4 · Esercizi 3–4 (worked)

Key concepts

pmf · Expectation E[X] · Variance Var(X) · Var of linear combos · Bernoulli/Binomial · Poisson · Geometric · Discrete uniform

pmf, expectation, variance

A discrete random variable \(X\) is described by its pmf \(p(x)=P(X=x)\), with \(\sum_x p(x)=1\). Two summaries do most of the work:

\[ E[X]=\sum_x x\,p(x), \qquad \mathrm{Var}(X)=E[X^2]-\big(E[X]\big)^2,\quad E[X^2]=\sum_x x^2 p(x). \]

The variance shortcut \(E[X^2]-(E[X])^2\) is almost always faster than \(\sum (x-\mu)^2 p(x)\) on the exam — compute \(E[X]\) and \(E[X^2]\) in one pass over the table, then subtract.

Normalising trick. If a pmf is given up to a constant \(c\) (e.g. \(p(i)=c\,\lambda^i/i!\)), find \(c\) by forcing \(\sum_x p(x)=1\). Recognising the series (here \(\sum \lambda^i/i! = e^\lambda\)) names the distribution.
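The one-pass shortcut for \(E[X]\) and \(\mathrm{Var}(X)\) might look like this in code. A minimal sketch: `mean_and_variance` is a made-up helper, and the fair die is used only as a familiar test case.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """pmf: dict mapping value -> probability. Returns (E[X], Var(X))."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    ex = ex2 = Fraction(0)
    for x, p in pmf.items():   # single pass over the table
        ex += x * p            # accumulates E[X]
        ex2 += x * x * p       # accumulates E[X^2]
    return ex, ex2 - ex ** 2   # Var(X) = E[X^2] - (E[X])^2

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
mean, variance = mean_and_variance(fair_die)
print(mean, variance)  # 7/2 35/12
```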

Expectation & variance of linear combinations

These rules are the engine of the estimator-theory chapter (bias/MSE) — learn them cold.

\[ E[aX+bY+c]=aE[X]+bE[Y]+c \quad(\text{always}). \]\[ \mathrm{Var}(aX+b)=a^2\mathrm{Var}(X), \qquad \mathrm{Var}(aX+bY)=a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)+2ab\,\mathrm{Cov}(X,Y). \]

If \(X,Y\) are independent then \(\mathrm{Cov}(X,Y)=0\), so \(\mathrm{Var}(aX+bY)=a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)\). Expectation is linear whether or not \(X,Y\) are independent; variance is not, which is why the covariance term appears. An added constant never changes the variance: \(\mathrm{Var}(Y+4)=\mathrm{Var}(Y)\).
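The two rules translate directly into code. The sketch below uses hypothetical helper names and made-up moments (\(\mathrm{Var}(X)=4\), \(\mathrm{Var}(Y)=9\), \(\mathrm{Cov}(X,Y)=1\)) purely for illustration.

```python
def mean_linear(a, b, c, mean_x, mean_y):
    """E[aX + bY + c] = a E[X] + b E[Y] + c (always holds)."""
    return a * mean_x + b * mean_y + c

def var_linear(a, b, var_x, var_y, cov_xy=0):
    """Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y); additive constants drop out."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

print(mean_linear(2, -1, 3, mean_x=1, mean_y=2))       # 2(1) - 2 + 3 = 3
print(var_linear(2, -1, var_x=4, var_y=9, cov_xy=1))   # 16 + 9 - 4 = 21
print(var_linear(2, -1, var_x=4, var_y=9))             # independent case: 25
```

Note the sign: with \(b<0\) a positive covariance reduces the variance of the combination.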

The key discrete models

Model	When	pmf	Mean	Var
Bernoulli(p)	one yes/no trial	\(p,\,1-p\)	\(p\)	\(p(1-p)\)
Binomial(n,p)	# successes in \(n\) indep. trials	\(\binom{n}{k}p^k(1-p)^{n-k}\)	\(np\)	\(np(1-p)\)
Poisson(λ)	count of rare events / given a rate	\(e^{-\lambda}\lambda^k/k!\)	\(\lambda\)	\(\lambda\)
Geometric(p)	# trials until first success	\((1-p)^{k-1}p\)	\(1/p\)	\((1-p)/p^2\)
Discrete Unif.	\(m\) equally-likely values	\(1/m\)	average of the values; \((m+1)/2\) for \(1,\dots,m\)	\((m^2-1)/12\) for \(1,\dots,m\)

Geometric on the exam: its pmf is usually given inside the problem (e.g. "draw with replacement until a black ball"); you just identify \(p\) and plug in. The MLE \(\hat\theta=1/\bar X\) is derived later (estimator chapter).

Poisson tell: "average number per week/page/area" and a count question → Poisson with \(\lambda\)=that average. \(P(X\ge1)=1-e^{-\lambda}\) is the most common ask.
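The three pmfs from the table can be written with the standard library alone. This is a sketch with illustrative function names; the printed checks use a Binomial\((5,0.05)\) and a Poisson\((1.2)\) as sample inputs.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a count with average rate lam."""
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    """P(X = k): the first success arrives on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p

print(round(binomial_pmf(0, 5, 0.05), 4))   # 0.7738
print(round(1 - poisson_pmf(0, 1.2), 4))    # 0.6988, i.e. P(X >= 1)
print(geometric_pmf(3, 0.5))                # 0.125: two failures, then a success
```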

Which model? — recognition guide

If the wording says…	Model
"\(n\) independent trials, each succeeds w.p. \(p\); how many succeed"	Binomial\((n,p)\)
"average / mean number of [events] per [unit]", count question	Poisson\((\lambda)\)
"repeat until the first success", "number of draws needed"	Geometric\((p)\)
finitely many equally-likely outcomes	Discrete uniform
pmf given up to a constant \(c\)	normalise: \(\sum p(x)=1\)

Click any card to flip. Rate it after to track what you need to revisit.

Card 1

Fastest way to compute Var(X) from a discrete pmf table?

\( \mathrm{Var}(X)=E[X^2]-(E[X])^2 \). One pass over the table: accumulate \(\sum x\,p(x)\) and \(\sum x^2 p(x)\), then subtract the square of the first from the second.

Card 2

Variance of a linear combination \(aX+bY\) — general formula, and the independent case?

\( \mathrm{Var}(aX+bY)=a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)+2ab\,\mathrm{Cov}(X,Y) \). If independent, \(\mathrm{Cov}=0\) so it drops to \(a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)\). Adding a constant changes nothing: \(\mathrm{Var}(X+c)=\mathrm{Var}(X)\).

Card 3

Recognition cue: "n independent trials, each succeeds with probability p, how many succeed?" — model + mean + variance.

Binomial\((n,p)\): \(P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\), \(E=np\), \(\mathrm{Var}=np(1-p)\).

Card 4

Recognition cue: "the average number of events per week/page is λ", asked a count probability — model + the most common quantity.

Poisson\((\lambda)\): \(P(X=k)=e^{-\lambda}\lambda^k/k!\), \(E=\mathrm{Var}=\lambda\). Most asked: \(P(X\ge1)=1-e^{-\lambda}\).

Card 5

Recognition cue: "repeat trials until the first success / number of draws needed" — model + pmf + mean.

Geometric\((p)\): \(P(X=k)=(1-p)^{k-1}p\) for \(k=1,2,\dots\), \(E=1/p\), \(\mathrm{Var}=(1-p)/p^2\). On the exam \(p\) is identified from the setup; pmf often given.

Card 6

Bernoulli(p): mean and variance? How does it relate to Binomial?

\(E=p\), \(\mathrm{Var}=p(1-p)\). A Binomial\((n,p)\) is the sum of \(n\) independent Bernoulli\((p)\), hence \(E=np\), \(\mathrm{Var}=np(1-p)\).

Card 7

A pmf is given as \(p(i)=c\,\lambda^i/i!\), \(i=0,1,2,\dots\). How do you find c, and what distribution is it?

Force \(\sum_i p(i)=1\): \(c\sum \lambda^i/i! = c\,e^\lambda=1\Rightarrow c=e^{-\lambda}\). So \(X\sim\)Poisson\((\lambda)\).

Card 8

Why is \(E[X^2]\neq (E[X])^2\) in general, and what is the gap equal to?

The gap is exactly the variance: \(E[X^2]-(E[X])^2=\mathrm{Var}(X)\ge0\). Equality only if \(X\) is a.s. constant.

EX. 01

Expectation and variance from a frequency table

Esercizio 3, n.11 easy

A company has 50 employees. The number of years \(X\) each has been with the company is distributed as: 12 employees → 1 year, 15 → 2 years, 16 → 3 years, 7 → 4 years. A worker is chosen at random. Find \(E[X]\) and \(\mathrm{Var}(X)\).

Turn frequencies into a pmf (divide by 50). Then \(E[X]=\sum x\,p(x)\), \(E[X^2]=\sum x^2 p(x)\), \(\mathrm{Var}=E[X^2]-(E[X])^2\).

Step 1. pmf
\(P(X{=}1)=\tfrac{12}{50}\), \(P(X{=}2)=\tfrac{15}{50}\), \(P(X{=}3)=\tfrac{16}{50}\), \(P(X{=}4)=\tfrac{7}{50}\).

Step 2. E[X]
\( E[X]=\dfrac{12+30+48+28}{50}=\dfrac{118}{50}=2.36 \).

Step 3. E[X²]
\( E[X^2]=\dfrac{1\cdot12+4\cdot15+9\cdot16+16\cdot7}{50}=\dfrac{12+60+144+112}{50}=\dfrac{328}{50}=6.56 \).

Step 4. Variance
\( \mathrm{Var}(X)=6.56-2.36^2=6.56-5.5696=0.9904 \).

\( E[X]=2.36 \), \( \mathrm{Var}(X)=0.9904 \). (Note: the original solution sheet mis-summed \(E[X^2]\) as 6.36 — the correct value is 6.56.)

EX. 02

Defective bearings — Binomial

Esercizio 3, n.4 medium

Each ball bearing is independently defective with probability 0.05. A sample of 5 is inspected. Find (a) the probability none are defective, (b) the probability two or more are defective.

\(X=\)#defective \(\sim\)Binomial\((5,0.05)\). For (b) use the complement \(P(X\ge2)=1-P(X{=}0)-P(X{=}1)\).

Step 1. (a) none defective
\( P(X=0)=\binom{5}{0}(0.05)^0(0.95)^5=0.95^5=0.7738 \).

Step 2. P(X=1)
\( P(X=1)=\binom{5}{1}(0.05)(0.95)^4=5\cdot0.05\cdot0.8145=0.2036 \).

Step 3. (b) two or more
\( P(X\ge2)=1-P(X{=}0)-P(X{=}1)=1-0.7738-0.2036=0.0226 \).

(a) \(0.7738\); (b) \(0.0226\).

EX. 03

Highway accidents — Poisson

Esercitazione 4, n.5 easy

The average number of accidents per week on a highway is 1.2. Compute the probability that there is at least one accident this week.

"Average number per week" + count → Poisson\((\lambda=1.2)\). "At least one" → complement of zero.

Step 1. Model
\(X\sim\)Poisson\((1.2)\), \(P(X=k)=e^{-1.2}\,1.2^k/k!\).

Step 2. Complement
\( P(X\ge1)=1-P(X=0)=1-e^{-1.2} \).

Step 3. Evaluate
\( =1-0.3012=0.6988 \).

\( P(X\ge1)=1-e^{-1.2}\approx0.6988 \) (≈70%).

EX. 04

Draw until a black ball — Geometric

Esercizio 3, n.8 medium

An urn has \(N\) white and \(M\) black balls. Balls are drawn one at a time with replacement until a black ball appears. What is the probability that exactly \(n\) draws are needed?

Each draw is independent, success = black, \(p=M/(N+M)\). "Number of draws until first success" → Geometric: \(P(X=n)=(1-p)^{n-1}p\).

Step 1. Success probability per draw
\( p=\dfrac{M}{N+M} \), so failure (white) has prob \(1-p=\dfrac{N}{N+M}\).

Step 2. Geometric pmf
Need \(n-1\) whites then a black: \( P(X=n)=\left(\dfrac{N}{N+M}\right)^{n-1}\dfrac{M}{N+M} \).

Step 3. Simplify
\( P(X=n)=\dfrac{M\,N^{\,n-1}}{(N+M)^n},\quad n=1,2,3,\dots \)

\( P(X=n)=\dfrac{M\,N^{\,n-1}}{(N+M)^{n}} \).

EX. 05

Mean and variance of linear combinations

Esercitazione 4, n.4 medium

Given \(E[X]=5\), \(\mathrm{Var}(X)=2\), \(E[Y]=12\), \(\mathrm{Var}(Y)=1\), \(\mathrm{Cov}(X,Y)=3\), compute: (a) \(E[3X+4Y]\); (b) \(E[2+5Y+X]\); (c) \(\mathrm{Var}(3X+4Y)\); (d) \(\mathrm{Var}(2+5Y+X)\); (e) \(\mathrm{Var}(Y+4)\).

Expectation is linear (constants pass through). For variance use \(\mathrm{Var}(aX+bY)=a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)+2ab\,\mathrm{Cov}(X,Y)\); additive constants vanish.

Step 1. (a),(b) expectations
\(E[3X+4Y]=3(5)+4(12)=63\). \(E[2+5Y+X]=2+5(12)+5=67\).

Step 2. (c) variance with covariance
\( \mathrm{Var}(3X+4Y)=9(2)+16(1)+2\cdot3\cdot4\cdot3=18+16+72=106 \).

Step 3. (d) drop the constant
\( \mathrm{Var}(2+5Y+X)=\mathrm{Var}(5Y+X)=25(1)+1(2)+2\cdot5\cdot1\cdot3=25+2+30=57 \).

Step 4. (e) constant only
\( \mathrm{Var}(Y+4)=\mathrm{Var}(Y)=1 \).

(a) 63; (b) 67; (c) 106; (d) 57; (e) 1.

EX. 06

Normalise a pmf given up to a constant

Esercizio 3, n.7 medium

The pmf of \(X\) is \(p(i)=c\,\lambda^i/i!\) for \(i=0,1,2,\dots\) (\(\lambda>0\)). Find (a) \(c\) and \(P(X=0)\); (b) \(P(X>2)\). Hint: \(e^x=\sum_{i\ge0}x^i/i!\).

Make probabilities sum to 1; recognise the exponential series. The result is a named distribution.

Step 1. Normalise
\( \sum_{i\ge0}c\lambda^i/i!=c\,e^{\lambda}=1\Rightarrow c=e^{-\lambda} \). So \(X\sim\)Poisson\((\lambda)\).

Step 2. (a) P(X=0)
\( P(X=0)=e^{-\lambda}\lambda^0/0!=e^{-\lambda} \).

Step 3. (b) P(X>2)
\( P(X>2)=1-P(X{=}0)-P(X{=}1)-P(X{=}2)=1-e^{-\lambda}\left(1+\lambda+\tfrac{\lambda^2}{2}\right) \).

(a) \(c=e^{-\lambda}\), \(P(X=0)=e^{-\lambda}\); (b) \(P(X>2)=1-e^{-\lambda}\left(1+\lambda+\tfrac{\lambda^2}{2}\right)\).

CH. 03 Statistics

Normal Distribution

Standardize with Z, read Φ — tail probabilities, inverse-σ, percentiles, moments→probability

Sections 03

Flashcards 7

Exercises 4

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, §5.5 · Ross, Introductory Statistics 3E, §3.6 · Ross_Tables.pdf · Past exams: MACSf1/2, MAIf3/5

Key concepts

Standardization Z=(X−μ)/σ · Φ table · Inverse Φ⁻¹ · Symmetry Φ(−z)=1−Φ(z) · Percentiles · Moments→probability

Standardize and read the table

If \(X\sim N(\mu,\sigma^2)\), convert to the standard normal \(Z\sim N(0,1)\) by

\[ Z=\frac{X-\mu}{\sigma}, \qquad P(X\le a)=\Phi\!\left(\frac{a-\mu}{\sigma}\right). \]

Everything is one of: a tail probability, an inverse (recover \(\sigma\) or a percentile), or a comparison. Two identities you use every time:

Symmetry: \( \Phi(-z)=1-\Phi(z) \), so \( P(X\ge a)=1-\Phi\!\big(\tfrac{a-\mu}{\sigma}\big)=\Phi\!\big(\tfrac{\mu-a}{\sigma}\big) \).
Central band: \( P(\mu-d\le X\le \mu+d)=2\Phi\!\big(\tfrac{d}{\sigma}\big)-1 \).

Table values worth memorising: \(\Phi(0.5)=0.6915\), \(\Phi(1)=0.8413\), \(\Phi(1/3)=0.6293\); inverses \(\Phi^{-1}(0.975)=1.96\), \(\Phi^{-1}(0.75)=0.675\).
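
The table values can be reproduced without the printed table. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later is assumed); it also shows what the table rounding hides:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Phi at the memorised points, plus z = 0.33 (the table row used for 1/3).
phi = {z: Z.cdf(z) for z in (0.5, 1.0, 0.33, 1 / 3)}

# Inverse values: exact quantiles rather than table look-ups.
phi_inv = {p: Z.inv_cdf(p) for p in (0.975, 0.75)}

for z, value in phi.items():
    print(f"Phi({z:.4f}) = {value:.4f}")
for p, value in phi_inv.items():
    print(f"Phi^-1({p}) = {value:.4f}")
```

Two things to notice when running it: the memorised \(0.6293\) is the table entry for \(z=0.33\), while the exact \(\Phi(1/3)\) is slightly larger, and the exact 75% quantile is about \(0.6745\), which the table-based value \(0.675\) rounds. Neither difference changes any answer in this chapter at the precision the exams use.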

The four exam flavours

Tail / comparison ("probability the return is at least 50", "which is more likely"): standardize, read \(\Phi\); for a comparison the bigger \(\Phi\big(\tfrac{\mu-a}{\sigma}\big)\) wins.
Inverse-σ ("95% are between 140 and 160, find σ"): write the band as \(2\Phi(d/\sigma)-1=\text{coverage}\), solve \(d/\sigma=z\), then \(\sigma=d/z\).
Percentiles ("25th pct = 3, 75th pct = 7"): \(\mu\) is the midpoint of symmetric percentiles; get \(\sigma\) from \(\Phi\big(\tfrac{x_p-\mu}{\sigma}\big)=p\).
Moments→probability ("E[X]=2, E[X(X−1)]=6, find Var and P(X≤4)"): recover \(\sigma^2=E[X^2]-(E[X])^2\) using \(E[X(X-1)]=E[X^2]-E[X]\), then standardize.
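
The four flavours can be sketched as four small functions. This is an illustration under stated assumptions, not an official solution method: the function names are hypothetical, `statistics.NormalDist` stands in for the printed table, and the strategy parameters \(N(100,100^2)\) and \(N(60,30^2)\) are taken from the first exercise of this chapter.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def upper_tail(mu, sigma, a):
    """P(X >= a) for X ~ N(mu, sigma^2), via Phi((mu - a) / sigma)."""
    return Z.cdf((mu - a) / sigma)

def sigma_from_band(half_width, coverage):
    """Solve 2*Phi(d/sigma) - 1 = coverage for sigma."""
    z = Z.inv_cdf((1 + coverage) / 2)
    return half_width / z

def mu_sigma_from_percentiles(x_low, x_high, p_high):
    """Mean and sd from symmetric percentiles (p_low = 1 - p_high)."""
    mu = (x_low + x_high) / 2
    sigma = (x_high - mu) / Z.inv_cdf(p_high)
    return mu, sigma

def var_from_factorial_moment(mean, e_x_x_minus_1):
    """Var(X) from E[X] and E[X(X-1)] = E[X^2] - E[X]."""
    e_x2 = e_x_x_minus_1 + mean
    return e_x2 - mean**2

# Tail / comparison: return at least 50 under each strategy.
p_a = upper_tail(100, 100, 50)
p_b = upper_tail(60, 30, 50)

# Inverse-sigma: 95% between 140 and 160, so half-width 10.
sigma_heights = sigma_from_band(10, 0.95)

# Percentiles: 25th = 3, 75th = 7.
mu_q, sigma_q = mu_sigma_from_percentiles(3, 7, 0.75)

# Moments -> probability: E[X] = 2, E[X(X-1)] = 6, then P(X <= 4).
var_x = var_from_factorial_moment(2, 6)
p_le_4 = Z.cdf((4 - 2) / var_x**0.5)

print(p_a, p_b, sigma_heights, (mu_q, sigma_q), var_x, p_le_4)
```

Because the code uses exact quantiles instead of two-decimal table rows, the last digits can differ slightly from hand computations; the conclusions do not.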

Which flavour? — recognition guide

Wording	Flavour	Key move
"probability ≥/≤ a", "which strategy/option is better"	tail / comparison	\(\Phi\big(\tfrac{\mu-a}{\sigma}\big)\), bigger wins
"X% are between a and b", find σ	inverse-σ	\(2\Phi(d/\sigma)-1=\) coverage
"the p-percentile is …", find μ and σ	percentiles	μ = midpoint; σ from \(\Phi^{-1}(p)\)
given E[X] and E[X²] or E[X(X−1)]	moments→prob	\(\sigma^2=E[X^2]-(E[X])^2\)

Card 1

How do you turn \(P(X\le a)\) for \(X\sim N(\mu,\sigma^2)\) into a table lookup?

Standardize: \(Z=\tfrac{X-\mu}{\sigma}\), so \(P(X\le a)=\Phi\!\big(\tfrac{a-\mu}{\sigma}\big)\). For the upper tail \(P(X\ge a)=\Phi\!\big(\tfrac{\mu-a}{\sigma}\big)\) by symmetry.

Card 2

Symmetry identity for the standard normal cdf, and the central-band probability?

\( \Phi(-z)=1-\Phi(z) \). Central band: \( P(\mu-d\le X\le\mu+d)=2\Phi(d/\sigma)-1 \).

Card 3

Recognition cue: "95% of values lie between a and b", asked the standard deviation. Procedure?

Inverse-σ. Half-width \(d=(b-a)/2\) about the mean; set \(2\Phi(d/\sigma)-1=0.95\Rightarrow\Phi(d/\sigma)=0.975\Rightarrow d/\sigma=1.96\Rightarrow\sigma=d/1.96\).

Card 4

Recognition cue: "the 25th percentile is 3 and the 75th is 7". Find μ and σ.

μ = midpoint of symmetric percentiles = (3+7)/2 = 5. For σ: \(\Phi\big(\tfrac{7-5}{\sigma}\big)=0.75\Rightarrow 2/\sigma=\Phi^{-1}(0.75)=0.675\Rightarrow\sigma\approx2.96\).

Card 5

You're given E[X] and E[X(X−1)] for a normal X. How do you get Var(X)?

\(E[X(X-1)]=E[X^2]-E[X]\Rightarrow E[X^2]\). Then \(\mathrm{Var}(X)=E[X^2]-(E[X])^2\).

Card 6

Comparing two normals \(N(\mu_1,\sigma_1^2)\) vs \(N(\mu_2,\sigma_2^2)\) for \(P(\text{return}\ge a)\) — which is larger?

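Compute \(\Phi\big(\tfrac{\mu_i-a}{\sigma_i}\big)\) for each; the larger value wins. A bigger mean always helps. The effect of \(\sigma_i\) depends on which side of the threshold the mean sits: when \(\mu_i>a\), a larger \(\sigma_i\) hurts, because it pulls \(\tfrac{\mu_i-a}{\sigma_i}\) toward 0 and the probability down toward 0.5; when \(\mu_i<a\), a larger \(\sigma_i\) helps for the same reason.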

Card 7

Key Φ and Φ⁻¹ values to have memorised for the exam?

\(\Phi(0.5)=0.6915\), \(\Phi(1)=0.8413\), \(\Phi(1/3)=0.6293\); \(\Phi^{-1}(0.975)=1.96\), \(\Phi^{-1}(0.95)=1.645\), \(\Phi^{-1}(0.75)=0.675\).

EX. 01

Two strategies — compare normal tail probabilities

MACSf1-P3 medium

Strategy A yields a return \(\sim N(100,100^2)\); strategy B yields a return \(\sim N(60,30^2)\). If the return must be at least 50, which strategy should be chosen?

Compute \(P(R\ge50)=\Phi\big(\tfrac{\mu-50}{\sigma}\big)\) for each, then compare.

Step 1. Strategy A
\( P(R_A\ge50)=P\big(Z\ge\tfrac{50-100}{100}\big)=P(Z\ge-\tfrac12)=\Phi(\tfrac12)=0.6915 \).

Step 2. Strategy B
\( P(R_B\ge50)=P\big(Z\ge\tfrac{50-60}{30}\big)=P(Z\ge-\tfrac13)=\Phi(\tfrac13)=0.6293 \).

Step 3. Compare
\(0.6915>0.6293\), so A is more likely to clear the 50 threshold.

Choose Strategy A (\(0.6915\) vs \(0.6293\)).

EX. 02

Children's heights — recover σ (inverse problem)

MACSf2-P1 medium

Heights of 12-year-olds are \(N(150,\sigma^2)\) cm, and 95% lie between 140 and 160 cm. Find \(\sigma\).

The band 140–160 is symmetric about 150 with half-width \(d=10\). Use \(2\Phi(d/\sigma)-1=0.95\).

Step 1. Write the band
\( 0.95=P(140\le X\le160)=2\Phi\!\big(\tfrac{10}{\sigma}\big)-1 \).

Step 2. Solve for the z
\( \Phi(10/\sigma)=0.975\Rightarrow 10/\sigma=\Phi^{-1}(0.975)=1.96 \).

Step 3. Recover σ
\( \sigma=10/1.96\approx5.10 \).

\( \sigma\approx5.10 \) cm.

EX. 03

From moments to a probability

MAIf3-P3 medium

A normal \(X\) has \(E[X]=2\) and \(E[X(X-1)]=6\). Determine \(\mathrm{Var}(X)\) and \(P(X\le4)\).

\(E[X(X-1)]=E[X^2]-E[X]\) gives \(E[X^2]\); then variance, then standardize.

Step 1. Find E[X²]
\( 6=E[X(X-1)]=E[X^2]-E[X]=E[X^2]-2\Rightarrow E[X^2]=8 \).

Step 2. Variance
\( \mathrm{Var}(X)=E[X^2]-(E[X])^2=8-4=4 \), so \(\sigma=2\).

Step 3. Standardize
\( P(X\le4)=P\big(Z\le\tfrac{4-2}{2}\big)=\Phi(1)=0.8413 \).

\( \mathrm{Var}(X)=4 \), \( P(X\le4)=\Phi(1)=0.8413 \).

EX. 04

Quartiles → mean and standard deviation

MAIf5-P3 medium

A normal random variable has 25th percentile 3.0 and 75th percentile 7.0. Find its mean and standard deviation.

By symmetry μ is the midpoint of the two percentiles. For σ use \(\Phi\big(\tfrac{x_{0.75}-\mu}{\sigma}\big)=0.75\).

Step 1. Mean
\( \mu=\tfrac{3+7}{2}=5 \) (midpoint of symmetric percentiles).

Step 2. Set up σ
\( 0.75=P(X\le7)=\Phi\!\big(\tfrac{7-5}{\sigma}\big)=\Phi(2/\sigma)\Rightarrow 2/\sigma=\Phi^{-1}(0.75)=0.675 \).

Step 3. Recover σ
\( \sigma=2/0.675\approx2.96 \).

\( \mu=5 \), \( \sigma\approx2.96 \).

CH. 04 Statistics

Descriptive Statistics & Correlation

Weighted & grouped means, back-solving, augmenting a sample, reading dispersion, and the collinearity shortcut for correlation

Sections 06

Flashcards 6

Exercises 6

Read time 3'

Sources Ross, Introductory Statistics 3E, ch. 2–3 · Past exams: MACSf1/2, MAIf2/3/5, sampletest1, EBf1

Key concepts

Sample mean & variance · Σxᵢ = n·x̄ · Weighted/grouped mean · Back-solving group sizes · Augmenting a sample · Dispersion ↔ SD · Sample correlation r

Sample mean, variance, and the two sums you reconstruct

For a sample \(x_1,\dots,x_n\):

\[ \bar x=\frac1n\sum_i x_i, \qquad s^2=\frac{1}{n-1}\sum_i (x_i-\bar x)^2 \quad(\text{note the }n-1). \]

Exam problems rarely give you the raw data — they give summaries and ask you to recover the two totals:

\[ \sum_i x_i = n\,\bar x, \qquad \sum_i (x_i-\bar x)^2 = (n-1)\,s^2. \]

Once you hold those two sums you can add a point, merge groups, or back-solve a missing piece.

Weighted / grouped mean & back-solving

An overall mean is the weighted average of subgroup means, weights = subgroup proportions (or sizes):

\[ \bar x=\sum_g w_g\,\bar x_g, \qquad w_g=\frac{n_g}{\sum_h n_h}. \]

Forward: proportions and group means given → plug in. Back-solve: the overall mean is given and a group mean / group size is unknown → set up one linear equation (or a 2×2 system if a difference between group means is also stated) and solve.

Typical back-solve: total grade \(=\) (class-A mean)·\(n_A+\)(class-B mean)·\((N-n_A)\), divided by \(N\), equals the stated overall mean → solve for \(n_A\).

Augmenting a sample with a new observation

Adding one point \(x_{n+1}\): recover \(\sum x_i=n\bar x\) and \(\sum(x_i-\bar x)^2=(n-1)s^2\), add the new value, recompute.

New mean: \( \bar x_{new}=\dfrac{n\bar x + x_{n+1}}{n+1} \).
If the new point equals the old mean, the mean is unchanged and the sum of squared deviations is unchanged; the variance still shifts because the divisor grows from \(n-1\) to \(n\). Example: \(n=12\), \(\bar x=10\), \(s^2=12\), add \(x_{13}=10\) → mean stays 10, SS stays \(132\), new \(s^2=132/12=11\).
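
The recover, add, recompute routine can be sketched as a small function. The helper name `augment` is hypothetical. The correction term for the sum of squares is the standard one-point update identity, which the text above does not state; it reduces to "SS unchanged" when the new point equals the old mean.

```python
def augment(n, mean, var, x_new):
    """Mean and sample variance after adding x_new to a sample of n points."""
    total = n * mean          # recover the sum of the x_i
    ss = (n - 1) * var        # recover the sum of squared deviations
    new_mean = (total + x_new) / (n + 1)
    # Sum of squares about the NEW mean: old SS plus a correction that
    # vanishes when x_new equals the old mean.
    new_ss = ss + n / (n + 1) * (x_new - mean) ** 2
    new_var = new_ss / n      # new divisor is (n + 1) - 1 = n
    return new_mean, new_var

# The example from the text: n = 12, mean 10, variance 12, new point 10.
print(augment(12, 10, 12, 10))
```

The same function handles a new point that differs from the old mean, which is the case where recomputing by hand is most error-prone.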

Reading dispersion from a histogram (no computation)

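When several histograms share the same mean and you must rank their standard deviations without computing, reason about spread. The SD is the root-mean-square distance of the observations from the mean, so it grows as mass moves away from \(\bar x\) and the squared deviations \((x_i-\bar x)^2\) get larger.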

Mass concentrated near the centre (mound/bell shape) → smallest SD.
Mass spread evenly (flat/uniform) → middle SD.
Mass pushed to the extremes (U-shape / bimodal) → largest SD.

So for shapes {mound B, uniform A, U-shape C} with SDs {1.38, 1.71, 1.98}: B = 1.38 (smallest), A = 1.71, C = 1.98 (largest). You cannot actually compute an SD from a histogram — the argument is purely about where the mass sits.

Sample correlation — spot the line

The sample correlation coefficient is

\[ r=\frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_i (x_i-\bar x)^2}\,\sqrt{\sum_i (y_i-\bar y)^2}}\in[-1,1]. \]

Exam shortcut: if the points lie exactly on a straight line \(y=a+bx\), then \(r=\operatorname{sign}(b)\) — \(+1\) for positive slope, \(-1\) for negative — with no arithmetic. The hint "try to plot the points" is the giveaway. Only compute the full formula if the points are not perfectly collinear.

Which tool? — recognition guide

Wording	Tool
subgroup proportions + group means → overall mean	weighted mean (forward)
overall mean given, a group mean or group size missing	back-solve linear eq / 2×2 system
"add a new observation", new mean/variance	recover \(\sum x\), \(\sum(x-\bar x)^2\); add; recompute
several histograms, same mean, "match the SDs"	dispersion reasoning (no computation)
few (x,y) pairs, "correlation", "plot the points"	collinear → \(r=\pm1\); else full formula

Card 1

You're given the sample mean x̄ and variance s² for n points but not the raw data. Which two totals do you reconstruct, and how?

\( \sum x_i=n\bar x \) and \( \sum(x_i-\bar x)^2=(n-1)s^2 \). These let you add points, merge groups, or back-solve.

Card 2

Overall mean from subgroup means — formula, and what changes when you instead know the overall mean and a group is missing?

\( \bar x=\sum_g w_g\bar x_g \), \(w_g=n_g/\sum n_h\). If the overall mean is given and a group mean/size is unknown, set the same identity equal to the given overall mean and solve the resulting linear equation.

Card 3

Augment a sample of n (mean x̄, var s²) with x_{n+1}=x̄ (equal to the old mean). What happens to the mean and the variance?

Mean unchanged. Sum of squared deviations unchanged (new point adds 0). But variance changes because the divisor grows: \(s^2_{new}=\dfrac{(n-1)s^2}{n}\). E.g. 12→13 points: \(132/12=11\) (was 12).

Card 4

Several histograms share the same mean; rank their standard deviations without computing. What's the rule?

SD ↔ spread about the mean. Mass near centre (mound) = smallest SD; flat/uniform = middle; mass at the extremes (U-shape) = largest SD. SD cannot actually be read off a histogram — argue from where the mass sits.

Card 5

Exam shortcut for the sample correlation coefficient when the (x,y) points lie exactly on a line y=a+bx?

\( r=\operatorname{sign}(b) \): +1 for positive slope, −1 for negative. No arithmetic needed. The hint "plot the points" signals collinearity.

Card 6

Full formula for the sample correlation coefficient, and its range?

\( r=\dfrac{\sum(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum(x_i-\bar x)^2}\sqrt{\sum(y_i-\bar y)^2}}\in[-1,1] \).

EX. 01

Average salary across staff types

MACSf2-P3 easy

Staff is 10% type A, 70% type B, 20% type C, with average monthly salaries 1000, 2000, 3000 respectively. What is the average salary across all staff?

Weighted mean: proportions are the weights.

Step 1. Apply the weighted mean
\( \bar x=0.1(1000)+0.7(2000)+0.2(3000)=100+1400+600 \).

\( \bar x=2100 \).

EX. 02

Augment a 12-point sample

MAIf3-P1 medium

Twelve numbers have sample mean 10 and sample variance 12. A new observation \(x_{13}=10\) is added. Find the mean and variance of the 13-point sample.

Recover \(\sum x_i=12\cdot10\) and \(\sum(x_i-10)^2=11\cdot12\). The new point equals the mean.

Step 1. Recover the totals
\( \sum_{i=1}^{12}x_i=12\cdot10=120 \); \( \sum_{i=1}^{12}(x_i-10)^2=11\cdot12=132 \).

Step 2. New mean
\( \bar x_{13}=\dfrac{120+10}{13}=\dfrac{130}{13}=10 \) (unchanged).

Step 3. New variance
New point adds \((10-10)^2=0\) to the SS, so \( s^2_{13}=\dfrac{132+0}{13-1}=\dfrac{132}{12}=11 \).

Mean \(=10\), variance \(=11\) (variance drops from 12 because the divisor grew from 11 to 12).

EX. 03

Female vs male average grade — a 2×2 system

MAIf5-P1 medium

A class of 100 has overall average grade 25.0. It is 60% female, 40% male, and the female–male difference in averages is 1.0. Find the female and male average grades.

Let \(\bar x_F=1+\bar x_M\) and \(0.6\bar x_F+0.4\bar x_M=25\).

Step 1. Set up
\( \bar x_F=1+\bar x_M \); weighted mean \(0.6\bar x_F+0.4\bar x_M=25\).

Step 2. Substitute
\( 0.6(1+\bar x_M)+0.4\bar x_M=25\Rightarrow 0.6+\bar x_M=25\Rightarrow \bar x_M=24.4 \).

Step 3. Back out the other
\( \bar x_F=1+24.4=25.4 \).

\( \bar x_M=24.4 \), \( \bar x_F=25.4 \).

EX. 04

Back-solve class sizes from a pooled mean

EBf1-P1 medium

Two classes take the same test. Class A averages 7.2, class B averages 6.7, and the 50 students together average 6.9. How many students are in each class?

Let \(n_A\) be class A's size, \(50-n_A\) class B's. Pooled mean = total grades / 50 = 6.9.

Step 1. Equation
\( \dfrac{7.2\,n_A+6.7(50-n_A)}{50}=6.9 \).

Step 2. Clear and solve
\( 7.2n_A+335-6.7n_A=345\Rightarrow 0.5\,n_A=10\Rightarrow n_A=20 \).

Class A has 20 students, class B has 30.
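
Both directions of the weighted-mean identity fit in a few lines. A minimal sketch, with hypothetical helper names; `fractions.Fraction` is used so that decimals such as 6.9 are handled exactly and the class size comes out as a whole number rather than a float close to one.

```python
from fractions import Fraction

def weighted_mean(weights, means):
    """Overall mean as the weight-normalised average of subgroup means."""
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

def solve_group_size(total_n, mean_a, mean_b, overall_mean):
    """Size of group A from mean_a*n_A + mean_b*(N - n_A) = N*overall_mean."""
    return total_n * (overall_mean - mean_b) / (mean_a - mean_b)

# Back-solve: the two-class exercise above.
mean_a, mean_b, overall = Fraction("7.2"), Fraction("6.7"), Fraction("6.9")
n_a = solve_group_size(50, mean_a, mean_b, overall)
n_b = 50 - n_a

# Forward check: the recovered sizes must reproduce the pooled mean.
check = weighted_mean([n_a, n_b], [mean_a, mean_b])
print(n_a, n_b, check)
```

The forward function also covers the staff-salary exercise: `weighted_mean([10, 70, 20], [1000, 2000, 3000])` uses the percentages directly as weights.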

EX. 05

Match standard deviations to histograms

MAIf2-P1 / sampletest1-P6 easy

Three samples have the same mean \(\bar x=3.5\) over the range 0–7, with standard deviations (in some order) \(1.38,\,1.71,\,1.98\). Sample A is a flat/uniform histogram; Sample B is mound-shaped (mass concentrated near the centre); Sample C is U-shaped (mass at the extremes). Match each SD to its sample — without computing.

SD measures spread about the mean. Rank by how far the typical observation sits from 3.5.

Step 1. Smallest spread
Sample B (mound) keeps observations closest to \(\bar x\) → smallest SD \(=1.38\).

Step 2. Largest spread
Sample C (U-shape) pushes mass to the extremes → largest \((x-\bar x)^2\) → largest SD \(=1.98\).

Step 3. Middle
Sample A (uniform) is in between → \(1.71\).

B = 1.38, A = 1.71, C = 1.98. (An SD can't be read off a histogram — the ranking comes from where the mass sits.)

EX. 06

Correlation of five collinear points

MACSf1-P1 easy

Five pairs: \((1,3),(3,7),(5,11),(4,9),(2,5)\). What is the sample correlation coefficient? (Hint: plot the points.)

Check whether they lie on a line \(y=a+bx\). If so, \(r=\operatorname{sign}(b)\).

Step 1. Spot the line
Each pair satisfies \(y=2x+1\): \(1\!\to\!3,\,2\!\to\!5,\,3\!\to\!7,\,4\!\to\!9,\,5\!\to\!11\). All five are exactly collinear.

Step 2. Read off r
Positive slope \(b=2>0\), perfect line → \(r=+1\). No computation needed.

\( r=1 \).
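
For a check of the shortcut, the full formula can be evaluated on the same five pairs. A minimal sketch in plain Python; `sample_correlation` is an illustrative name, and the floating-point result should be read as 1 up to rounding error.

```python
import math

def sample_correlation(points):
    """Sample correlation coefficient r of a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

pairs = [(1, 3), (3, 7), (5, 11), (4, 9), (2, 5)]
r = sample_correlation(pairs)

# Reflecting y turns the slope negative, so r should flip sign.
r_flipped = sample_correlation([(x, -y) for x, y in pairs])
print(round(r, 10), round(r_flipped, 10))
```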

CH. 05 Statistics

CLT & Sampling Distributions

Sums and counts of many i.i.d. units go Normal — standardize the total and read Φ

Sections 04

Flashcards 5

Exercises 3

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, ch. 6 (§6.2 sample mean, §6.3 CLT) · Ross, Introductory Statistics 3E, §5.5 (binomial→normal) · Past exams: MAIf2, EBf1, sampletest1

Key concepts

Central Limit Theorem · Sum S_n≈N(nμ,nσ²) · Sample mean X̄≈N(μ,σ²/n) · Standardize the total · Binomial→normal · Continuity correction

The Central Limit Theorem

Take \(n\) independent, identically distributed units \(X_1,\dots,X_n\), each with mean \(\mu\) and variance \(\sigma^2\). For large \(n\) (rule of thumb \(n\ge30\)) their sum and mean are approximately Normal:

\[ S_n=\sum_{i=1}^n X_i \;\approx\; N\big(n\mu,\;n\sigma^2\big), \qquad \bar X \;\approx\; N\!\left(\mu,\;\frac{\sigma^2}{n}\right). \]

The shape of the individual \(X_i\) does not matter — only \(\mu\), \(\sigma^2\), and \(n\). This is what lets you answer "probability that the total exceeds a threshold" without knowing the per-unit distribution.

The standardization template

Almost every CLT exam problem is: many i.i.d. units, per-unit \(\mu\) and \(\sigma\) given, asked the probability a total crosses a threshold \(s\). The recipe:

Total parameters: mean \(n\mu\), variance \(n\sigma^2\), sd \(\sigma\sqrt n\).
Standardize: \( Z=\dfrac{s-n\mu}{\sigma\sqrt n} \).
Read \(\Phi\): \( P(S_n\le s)=\Phi(Z) \), \( P(S_n\ge s)=1-\Phi(Z) \).

\[ P(S_n\ge s)=1-\Phi\!\left(\frac{s-n\mu}{\sigma\sqrt n}\right). \]

Watch the direction ("sufficient to cover" / "exceed" → upper tail). A standardized value \(|Z|\gtrsim4\) means the probability is effectively 0 or 1.
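
The recipe translates directly into code. A minimal sketch, assuming the hypothetical helper name `clt_sum_upper_tail` and using `statistics.NormalDist` in place of the printed table; the two calls reuse the numbers from this chapter's battery and paint exercises.

```python
import math
from statistics import NormalDist

def clt_sum_upper_tail(n, mu, sigma, s):
    """Approximate P(S_n >= s) for a sum of n i.i.d. units via the CLT."""
    # Standardize the TOTAL: mean n*mu, standard deviation sigma*sqrt(n).
    z = (s - n * mu) / (sigma * math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# 36 batteries, mean 10 days, sd 1 day, threshold 365 days.
z_batt, p_batt = clt_sum_upper_tail(36, 10, 1, 365)

# 40 cans, mean 52 m^2, sd 3 m^2, threshold 2260 m^2.
z_paint, p_paint = clt_sum_upper_tail(40, 52, 3, 2260)

print(round(z_batt, 3), round(p_batt, 4))
print(round(z_paint, 2), p_paint)
```

The paint case illustrates the \(|Z|\gtrsim4\) remark: the tail probability is too small to distinguish from 0 in double precision.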

Binomial → Normal approximation

A Binomial count is a sum of \(n\) independent Bernoulli(\(p\)) variables, so for large \(n\) the CLT gives

\[ X\sim\mathrm{Bin}(n,p)\;\approx\;N\big(np,\;np(1-p)\big). \]

Standardize with \(\mu=np\), \(\sigma=\sqrt{np(1-p)}\): \( P(X\le k)\approx\Phi\!\big(\tfrac{k-np}{\sqrt{np(1-p)}}\big) \).

Continuity correction. Because \(X\) is discrete, the more accurate version replaces \(k\) with \(k+\tfrac12\) (for \(\le\)) or \(k-\tfrac12\) (for \(\ge\)). Exams often say "without (half) continuity correction" — then use \(k\) as-is. Always check the wording.
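
To see how much the half-unit shift matters, the approximation can be compared with the exact Binomial probability. A sketch under stated assumptions: the fair-coin, 100-toss, \(k=60\) setting is used as the illustration, the helper names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p, correction=True):
    """Normal approximation to P(X <= k), with an optional +1/2 correction."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cutoff = k + 0.5 if correction else k
    return NormalDist(mean, sd).cdf(cutoff)

n, p, k = 100, 0.5, 60
exact = binom_cdf(k, n, p)
plain = normal_approx_cdf(k, n, p, correction=False)    # Phi(2)
corrected = normal_approx_cdf(k, n, p, correction=True)  # Phi(2.1)
print(exact, plain, corrected)
```

In this setting the corrected value lands closer to the exact one, which is why the wording "without half correction" has to be read carefully: it asks for the less accurate but simpler number.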

Recognition guide

Wording	Tool
"\(n\) (≥30) i.i.d. units, per-unit μ and σ, prob the TOTAL exceeds …"	CLT on the sum: \(N(n\mu,n\sigma^2)\)
"average of \(n\) measurements is within … of μ"	CLT on the mean: \(N(\mu,\sigma^2/n)\)
"coin thrown \(n\) times / count of successes", large \(n\), "normal approximation"	Binomial→Normal: \(N(np,np(1-p))\)
"without half correction"	use \(k\) as-is, no \(\pm\tfrac12\)

Card 1

State the CLT for the sum and the mean of n i.i.d. units with mean μ and variance σ².

\( S_n=\sum X_i\approx N(n\mu,\,n\sigma^2) \) and \( \bar X\approx N(\mu,\,\sigma^2/n) \), for large n (≥30). The per-unit distribution shape is irrelevant.

Card 2

Recognition cue: many i.i.d. units, per-unit μ and σ given, asked the probability the TOTAL exceeds a threshold s. Procedure?

Sum ≈ \(N(n\mu,n\sigma^2)\). Standardize \(Z=\dfrac{s-n\mu}{\sigma\sqrt n}\), then \(P(S_n\ge s)=1-\Phi(Z)\).

Card 3

Binomial(n,p) normal approximation — the approximating distribution, and the continuity correction?

\( \mathrm{Bin}(n,p)\approx N(np,\,np(1-p)) \). Continuity correction: use \(k+\tfrac12\) for \(P(X\le k)\), \(k-\tfrac12\) for \(P(X\ge k)\). Skip it if the problem says "without half correction".

Card 4

When standardizing a SUM of n units, what are the mean and standard deviation you divide by?

Mean \(n\mu\), standard deviation \(\sigma\sqrt n\) (variance \(n\sigma^2\)). Do NOT use \(\sigma/\sqrt n\) — that's for the mean \(\bar X\), not the sum.

Card 5

A CLT problem gives a standardized value of Z≈9.5. What's the probability the total exceeds the threshold?

Essentially 0: \(1-\Phi(9.5)\approx0\). \(|Z|\gtrsim4\) already pins the tail probability to 0 (or 1 on the other side).

EX. 01

Will 40 cans of paint be enough?

MAIf2-P3 medium

A can of paint covers on average 52 m² with sd 3 m². You must paint 2260 m². What is the approximate probability that 40 cans suffice?

"Sufficient" means the total coverage \(S_{40}=\sum X_i\ge2260\). Use CLT on the sum: \(N(40\mu,40\sigma^2)\).

Step 1. Total parameters
Per can \(\mu=52\), \(\sigma^2=9\). Sum of 40: mean \(40\cdot52=2080\), variance \(40\cdot9=360\), sd \(\sqrt{360}\approx18.97\).

Step 2. Standardize
\( Z=\dfrac{2260-2080}{\sqrt{360}}=\dfrac{180}{18.97}\approx9.49 \).

Step 3. Read the tail
\( P(S_{40}\ge2260)=1-\Phi(9.49)\approx0 \).

≈ 0 — it is essentially impossible for 40 cans to cover 2260 m² (you'd expect them to cover only ~2080).

EX. 02

Do 36 batteries last a year?

EBf1-P3 medium

A battery lasts on average 10 days with sd 1 day; lifetimes are i.i.d. and each expired battery is replaced. Find the approximate probability that the total lifetime of 36 batteries exceeds one year (365 days).

Total lifetime \(L=\sum_{i=1}^{36}X_i\). CLT: \(L\approx N(36\mu,36\sigma^2)\). Want \(P(L>365)\).

Step 1. Total parameters
Per battery \(\mu=10\), \(\sigma^2=1\). For 36: mean \(360\), variance \(36\), sd \(6\).

Step 2. Standardize
\( Z=\dfrac{365-360}{6}=\dfrac{5}{6}\approx0.83 \).

Step 3. Read the tail
\( P(L>365)=1-\Phi(5/6)\approx1-0.7977=0.2023 \).

≈ 0.2023 (about 20%). Note: using the coarse table value \(\Phi(0.83)=0.7967\) gives ≈0.2033; the official answer 0.2023 uses \(\Phi(0.833)\approx0.7977\).

EX. 03

100 coin tosses — normal approximation

sampletest1-P3 medium

A fair coin is thrown 100 times; \(X\) = number of Heads. Using the normal approximation without half-correction, compute \(P(X\le60)\).

\(X\sim\mathrm{Bin}(100,\tfrac12)\). Approximate by \(N(np,np(1-p))\). "Without half correction" → use 60 directly.

Step 1. Approximating normal
\( np=50 \), \( np(1-p)=25 \), so \(X\approx N(50,25)\), sd \(=5\).

Step 2. Standardize (no correction)
\( Z=\dfrac{60-50}{5}=2 \).

Step 3. Read
\( P(X\le60)\approx\Phi(2)=0.9772 \).

\( P(X\le60)\approx0.9772 \).

CH. 06 Statistics

Estimator Theory

Bias, variance, MSE, efficiency, unbiasedness — plus MLE and method-of-moments. The estimator-algebra slice appears in 5 of 7 exams.

Sections 07

Flashcards 8

Exercises 7

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, §7.2 (MLE), §7.7 (evaluating an estimator) · Bombelli, StimaPunti Proprietà Estimatori (point estimation notes) · Past exams: MACSf1/2, MAIf2/3/5, sampletest1

Key concepts

Bias · Variance · MSE = Var + Bias² · Unbiasedness · Efficiency · Min-MSE weighting · Jensen's inequality · MLE · Method-of-moments

The estimator vocabulary

An estimator \(T\) for a parameter \(\theta\) is a function of the sample. Three numbers grade it:

\[ \mathrm{Bias}(T)=E[T]-\theta, \qquad \mathrm{Var}(T), \qquad \mathrm{MSE}(T)=E[(T-\theta)^2]=\mathrm{Var}(T)+\mathrm{Bias}(T)^2. \]

Unbiased means \(E[T]=\theta\) (Bias \(=0\)), and then \(\mathrm{MSE}=\mathrm{Var}\). More efficient = smaller MSE (smaller variance, among unbiased estimators).
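
The identity \(\mathrm{MSE}=\mathrm{Var}+\mathrm{Bias}^2\) is worth seeing once. A short derivation, sketched here as the standard add-and-subtract argument (write \(T-\theta=(T-E[T])+(E[T]-\theta)\) and expand the square):

\[ E[(T-\theta)^2]=E\big[(T-E[T])^2\big]+2\,(E[T]-\theta)\,E\big[T-E[T]\big]+(E[T]-\theta)^2=\mathrm{Var}(T)+\mathrm{Bias}(T)^2, \]

because \(E[T]-\theta\) is a constant and \(E\big[T-E[T]\big]=0\), so the cross term vanishes.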

The algebra engine (from the discrete-RV chapter): \(E\) is linear always; for independent pieces \( \mathrm{Var}(aU+bW)=a^2\mathrm{Var}(U)+b^2\mathrm{Var}(W) \). Population moments you reuse: Bernoulli \(E=p,\mathrm{Var}=p(1-p)\); Poisson \(E=\mathrm{Var}=\lambda\); sample mean of \(m\) obs \(\mathrm{Var}(\bar X_m)=\sigma^2/m\); and \(\sigma^2=E[X^2]-\mu^2\).

Unbiasedness & finding the constant

Weights that sum to 1 → automatically unbiased for the mean. If \(T=c\bar X_1+(1-c)\bar X_2\) then \(E[T]=\mu\) for every \(c\).

Find a constant. When asked "for what \(a\) is \(T\) unbiased for \(\sigma^2\)", set \(E[T]=\theta\) and solve. The recurring move is \(E[X_i^2]=\sigma^2+\mu^2\), and for independent \(X_i,X_j\), \(E[X_iX_j]=\mu^2\). Example: \(T=\frac1n\sum X_i^2+a\) with \(\mu\) known → \(E[T]=\sigma^2+\mu^2+a=\sigma^2\Rightarrow a=-\mu^2\).

Efficiency & minimum-MSE weighting

Compare efficiency: if all candidates are unbiased, compute each variance and pick the smallest.

Optimal weight. For \(M_c=c\bar X_1+(1-c)\bar X_2\) (independent), \(\mathrm{MSE}=c^2\mathrm{Var}_1+(1-c)^2\mathrm{Var}_2\); differentiate and set to 0. The minimiser is inverse-variance weighting

\[ c^*=\frac{1/\mathrm{Var}_1}{1/\mathrm{Var}_1+1/\mathrm{Var}_2}. \]

With sample means of sizes \(n_1,n_2\) (\(\mathrm{Var}_i=\sigma^2/n_i\)) this becomes \(c^*=\dfrac{n_1}{n_1+n_2}\) — weight each sample by its size. Put more weight on the more precise (larger / lower-variance) sample.
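
The inverse-variance rule is easy to check with exact arithmetic. A minimal sketch with hypothetical helper names; variances are expressed in units of \(\sigma^2\), and the sample sizes 10 and 15 are chosen as an illustration.

```python
from fractions import Fraction

def optimal_weight(var1, var2):
    """Inverse-variance weight c* placed on the first estimator."""
    return (1 / var1) / (1 / var1 + 1 / var2)

def combined_variance(c, var1, var2):
    """Var(c*U + (1-c)*W) for independent U and W."""
    return c**2 * var1 + (1 - c) ** 2 * var2

# Sample means of sizes 10 and 15: Var_i = sigma^2 / n_i.
var1, var2 = Fraction(1, 10), Fraction(1, 15)
c_star = optimal_weight(var1, var2)
best = combined_variance(c_star, var1, var2)

# Brute-force check on a grid of weights: none should beat c_star.
grid = [Fraction(i, 100) for i in range(101)]
grid_min = min(combined_variance(c, var1, var2) for c in grid)
print(c_star, best, grid_min)
```

The minimum variance equals \(\sigma^2/(n_1+n_2)\), the variance of the pooled sample mean, which is what size-proportional weighting reconstructs.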

Nonlinear transforms break unbiasedness (Jensen)

If \(T\) is unbiased for \(\sigma^2\), is \(\sqrt T\) unbiased for \(\sigma\)? No. For a nonlinear \(g\), \(E[g(T)]\neq g(E[T])\) in general (Jensen's inequality). Since \(\sqrt{\cdot}\) is concave, \(E[\sqrt T]\le\sqrt{E[T]}=\sigma\), with strict inequality unless \(T\) is constant → \(\sqrt T\) is biased low for \(\sigma\). Whenever an exam takes a square root, log, or reciprocal of an unbiased estimator, the answer to "still unbiased?" is no.
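
A two-point example makes the inequality concrete. The distribution below is purely hypothetical, chosen so the arithmetic is exact: \(T\) takes the values 1 and 9 with probability \(\tfrac12\) each.

```python
import math

# Hypothetical estimator T: values 1 and 9, each with probability 1/2.
values, probs = (1.0, 9.0), (0.5, 0.5)

e_t = sum(p * t for t, p in zip(values, probs))                  # E[T]
e_sqrt_t = sum(p * math.sqrt(t) for t, p in zip(values, probs))  # E[sqrt(T)]
sqrt_e_t = math.sqrt(e_t)                                        # sqrt(E[T])

print(e_t, e_sqrt_t, sqrt_e_t)
```

Here \(E[T]=5\) and \(E[\sqrt T]=2\), which is strictly below \(\sqrt5\): if \(T\) were unbiased for \(\sigma^2=5\), then \(\sqrt T\) would underestimate \(\sigma\) on average.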

Maximum likelihood (MLE)

Recipe: write the likelihood \(L(\theta)=\prod_i f(x_i\mid\theta)\), take \(\log\), differentiate, set to 0, solve.

\[ \ell(\theta)=\log L(\theta)=\sum_i \log f(x_i\mid\theta), \qquad \frac{d\ell}{d\theta}=0. \]

Geometric example \(f(x\mid\theta)=\theta(1-\theta)^{x-1}\): \(L=\theta^n(1-\theta)^{\sum(x_i-1)}\), \(\ell=n\log\theta+\big(\sum(x_i-1)\big)\log(1-\theta)\), giving \(\hat\theta=\dfrac{n}{\sum x_i}=\dfrac1{\bar X}\). On the exam the pmf/pdf is given in the problem — you just run the recipe.

Method-of-moments (MoM)

Equate the population moment to the sample moment and solve for the parameter. For one parameter, set \(E[X]=\bar X\) (using the theoretical mean as a function of \(\theta\)) and invert.

Example density \(f(x)=2x/\theta^2\) on \([0,\theta]\): \(E[X]=\int_0^\theta x\frac{2x}{\theta^2}dx=\frac{2\theta}{3}\). Set \(\bar X=\frac{2\theta}{3}\Rightarrow \hat\theta_M=\frac{3}{2}\bar X\). It is unbiased here (\(E[\hat\theta_M]=\theta\)) with \(\mathrm{MSE}=\theta^2/(8n)\to0\) — consistent. MoM and MLE can differ; MoM is usually the quicker algebra.
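
The quoted MSE can be verified in two lines. A worked derivation, assuming an i.i.d. sample of size \(n\) and using \(E[X^2]=\int_0^\theta x^2\frac{2x}{\theta^2}dx=\frac{\theta^2}{2}\):

\[ \mathrm{Var}(X)=\frac{\theta^2}{2}-\Big(\frac{2\theta}{3}\Big)^2=\frac{\theta^2}{18}, \qquad \mathrm{MSE}(\hat\theta_M)=\mathrm{Var}\Big(\tfrac32\bar X\Big)=\frac94\cdot\frac{\theta^2}{18\,n}=\frac{\theta^2}{8n}. \]

Unbiasedness follows from \(E[\hat\theta_M]=\tfrac32\cdot\tfrac{2\theta}{3}=\theta\), so the MSE is pure variance.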

Recognition guide

Wording	Do
"which estimator is more efficient" (several unbiased)	compute each variance, smallest wins
"value of c that minimizes MSE"	differentiate MSE in c, or inverse-variance weight \(c^*\)
"determine constant a so T is unbiased for σ²/μ"	set \(E[T]=\theta\), use \(E[X^2]=\sigma^2+\mu^2\), solve
"is √T (or log, 1/T) unbiased"	No — Jensen, nonlinear ⇒ biased
"write the likelihood / find the MLE"	\(L=\prod f\) → log → differentiate → solve
"find the moments estimator"	\(E[X]=\bar X\) (as function of θ), invert
"Bias and MSE of T"	\(E[T]-\theta\); \(\mathrm{Var}(T)+\mathrm{Bias}^2\)

Card 1

Define bias, variance, and MSE of an estimator T for θ, and the identity linking them.

\(\mathrm{Bias}(T)=E[T]-\theta\); \(\mathrm{Var}(T)=E[(T-E[T])^2]\); \(\mathrm{MSE}(T)=E[(T-\theta)^2]=\mathrm{Var}(T)+\mathrm{Bias}(T)^2\). Unbiased ⇒ MSE = Var.

Card 2

Several UNBIASED estimators are given; how do you pick the most efficient?

Most efficient = smallest MSE = smallest variance (since unbiased). Compute each \(\mathrm{Var}\) via \(\mathrm{Var}(aU+bW)=a^2\mathrm{Var}(U)+b^2\mathrm{Var}(W)\) (independence) and compare coefficients.

Card 3

For \(M_c=c\bar X_1+(1-c)\bar X_2\) with independent samples of sizes \(n_1,n_2\): is it unbiased, and what c minimizes MSE?

Unbiased for every c (weights sum to 1). Min-MSE weight is inverse-variance: \(c^*=\dfrac{1/\mathrm{Var}_1}{1/\mathrm{Var}_1+1/\mathrm{Var}_2}=\dfrac{n_1}{n_1+n_2}\).

Card 4

How do you find the constant a making \(T=\frac1n\sum X_i^2+a\) unbiased for σ² when μ is known?

Use \(E[X_i^2]=\sigma^2+\mu^2\). Then \(E[T]=\sigma^2+\mu^2+a=\sigma^2\Rightarrow a=-\mu^2\).

Card 5

If T is unbiased for σ², is √T unbiased for σ? Why / why not?

No. Square root is concave, so by Jensen \(E[\sqrt T]\le\sqrt{E[T]}=\sigma\), strict unless T is degenerate → √T is biased low. Any nonlinear transform of an unbiased estimator is generally biased.

Card 6

MLE recipe in four moves?

1) Likelihood \(L(\theta)=\prod_i f(x_i\mid\theta)\). 2) Log-likelihood \(\ell=\sum\log f\). 3) \(d\ell/d\theta=0\). 4) Solve for \(\hat\theta\). Geometric \(\theta(1-\theta)^{x-1}\) → \(\hat\theta=1/\bar X\).

Card 7

Method-of-moments recipe for a one-parameter family?

Express the population mean as a function of θ, set it equal to the sample mean \(\bar X\), and invert. E.g. \(E[X]=2\theta/3\Rightarrow\hat\theta_M=\tfrac32\bar X\).

Card 8

Two key population-moment facts used constantly in estimator algebra?

\(E[X^2]=\sigma^2+\mu^2\) (from \(\sigma^2=E[X^2]-\mu^2\)); and for independent \(X_i,X_j\), \(E[X_iX_j]=E[X_i]E[X_j]=\mu^2\).

EX. 01

Which of three unbiased estimators is most efficient?

MACSf1-P6 hard

\(\hat p\) is the success proportion in a Bernoulli sample of size \(n=9\); \(Y\) is one further independent observation. Consider \(T_1=\hat p\), \(T_2=\tfrac12\hat p+\tfrac12 Y\), \(T_3=\tfrac{9}{10}\hat p+\tfrac{1}{10}Y\). Which is most efficient?

Check all three are unbiased (they are), then compare variances. Use \(\mathrm{Var}(\hat p)=p(1-p)/9\), \(\mathrm{Var}(Y)=p(1-p)\).

Step 1. All unbiased
Each is a weight-1 combination of unbiased pieces, so \(E[T_i]=p\) and MSE = Var.

Step 2. Variances
\(\mathrm{Var}(T_1)=\tfrac19 p(1-p)\). \(\mathrm{Var}(T_2)=\tfrac14\cdot\tfrac{p(1-p)}{9}+\tfrac14 p(1-p)=\tfrac{10}{36}p(1-p)\). \(\mathrm{Var}(T_3)=\tfrac{81}{100}\cdot\tfrac{p(1-p)}{9}+\tfrac1{100}p(1-p)=\tfrac{1}{10}p(1-p)\).

Step 3. Compare coefficients
\(\tfrac1{10}=0.100<\tfrac19\approx0.111<\tfrac{10}{36}\approx0.278\).

\(T_3\) is most efficient (smallest variance, coeff \(1/10\)); \(T_2\) is least efficient. Intuition: \(T_3\) leans on the bigger, more precise sample.

EX. 02

Combine two sample means — unbiased c and optimal c

MACSf2-P6 hard

\(\bar X_{10}\) and \(\bar Y_{15}\) are means of independent samples of sizes 10 and 15 from a population with mean μ and variance σ². Let \(M_c=c\bar X_{10}+(1-c)\bar Y_{15}\). (a) For which c is \(M_c\) unbiased? (b) Find the c minimizing MSE.

Weights sum to 1 → unbiased for all c. MSE \(=c^2\sigma^2/10+(1-c)^2\sigma^2/15\); differentiate.

Step 1. (a) Unbiasedness
\(E[M_c]=c\mu+(1-c)\mu=\mu\) for every c → unbiased for all c.

Step 2. (b) MSE
\(\mathrm{MSE}(M_c)=\mathrm{Var}(M_c)=\sigma^2\!\left(\tfrac{c^2}{10}+\tfrac{(1-c)^2}{15}\right)\).

Step 3. Minimize
\(\dfrac{d}{dc}=\sigma^2\!\left(\tfrac{2c}{10}-\tfrac{2(1-c)}{15}\right)=\dfrac{\sigma^2}{15}(5c-2)=0\Rightarrow c=\tfrac25\). (Matches \(c^*=\tfrac{n_1}{n_1+n_2}=\tfrac{10}{25}\).)

(a) unbiased for every c; (b) \(c^*=\tfrac25\).

EX. 03

Constant for an unbiased variance estimator, and √T

MAIf2-P6 hard

Sample \((X_1,\dots,X_n)\) from a population with known mean \(\mu=3\) and unknown variance \(\sigma^2\). (a) Find \(a\) so that \(T=\frac1n\sum_{i=1}^n X_i^2+a\) is unbiased for \(\sigma^2\). (b) Is \(\sqrt T\) unbiased for \(\sigma\)?

(a) \(E[X_i^2]=\sigma^2+\mu^2=\sigma^2+9\). (b) Think about \(E[\sqrt T]\) vs \(\sqrt{E[T]}\).

Step 1. (a) Expectation of T
\(E[T]=E[X_i^2]+a=(\sigma^2+9)+a\). Unbiased ⇒ \(\sigma^2+9+a=\sigma^2\Rightarrow a=-9\).

Step 2. (b) Square root
We'd need \(E[\sqrt T]=\sigma=\sqrt{E[T]}\). But \(\sqrt{\cdot}\) is concave, so by Jensen \(E[\sqrt T]<\sqrt{E[T]}=\sigma\) (strict unless T is constant).

(a) \(a=-9\); (b) No — \(\sqrt T\) is biased (low) for \(\sigma\) by Jensen's inequality.

EX. 04

Make a(X₁−Xₙ)² unbiased for σ²

MAIf3-P6 hard

Sample from a population with known mean \(\mu=3\), unknown variance \(\sigma^2\). Find the constant \(a\) so that \(T=a(X_1-X_n)^2\) is unbiased for \(\sigma^2\).

Expand the square; use \(E[X_i^2]=\sigma^2+9\) and \(E[X_1X_n]=\mu^2=9\) (independence).

Step 1. Expand
\(E[(X_1-X_n)^2]=E[X_1^2]+E[X_n^2]-2E[X_1X_n]\).

Step 2. Substitute moments
\(=(\sigma^2+9)+(\sigma^2+9)-2(9)=2\sigma^2\).

Step 3. Solve
\(E[T]=a\cdot2\sigma^2=\sigma^2\Rightarrow a=\tfrac12\).

\(a=\tfrac12\).

EX. 05

Bias and MSE of a Poisson estimator

sampletest1-P4 hard

Sample \((X_1,\dots,X_n)\) from Poisson(λ). For \(T=\tfrac12\!\left(\dfrac{X_1+\cdots+X_{n-1}}{n-1}+X_n\right)\), determine the bias and the MSE.

Write \(T=\tfrac12\bar X_{n-1}+\tfrac12 X_n\). Poisson: \(E=\mathrm{Var}=\lambda\); \(\mathrm{Var}(\bar X_{n-1})=\lambda/(n-1)\).

Step 1. Expectation → bias
\(E[T]=\tfrac12\lambda+\tfrac12\lambda=\lambda\Rightarrow\mathrm{Bias}=0\).

Step 2. Variance
\(\mathrm{Var}(T)=\tfrac14\cdot\dfrac{\lambda}{n-1}+\tfrac14\lambda=\dfrac{\lambda}{4}\!\left(\dfrac{1}{n-1}+1\right)=\dfrac{\lambda}{4}\cdot\dfrac{n}{n-1}\).

Step 3. MSE
Unbiased ⇒ \(\mathrm{MSE}=\mathrm{Var}=\dfrac{n\lambda}{4(n-1)}\).

Bias \(=0\); \(\mathrm{MSE}(T)=\dfrac{n\lambda}{4(n-1)}\).

EX. 06

Geometric MLE

MAIf5-P6 medium

Sample of \(n\) observations from \(f(x\mid\theta)=\theta(1-\theta)^{x-1}\), \(x=1,2,\dots\), \(\theta\in(0,1)\). (a) Write the likelihood. (b) Find the MLE. (c) For the sample \(3,5,1,2,4\), give the estimate.

\(L=\prod f\), then log, differentiate, set 0. The sum of exponents is \(\sum(x_i-1)\).

Step 1. (a) Likelihood
\(L(\theta)=\prod_{i=1}^n\theta(1-\theta)^{x_i-1}=\theta^n(1-\theta)^{\sum(x_i-1)}\).

Step 2. (b) Maximize log-likelihood
\(\ell=n\log\theta+\big(\sum(x_i-1)\big)\log(1-\theta)\); \(\dfrac{d\ell}{d\theta}=\dfrac{n}{\theta}-\dfrac{\sum(x_i-1)}{1-\theta}=0\Rightarrow\hat\theta=\dfrac{n}{\sum x_i}=\dfrac1{\bar X}\).

Step 3. (c) Plug in data
\(\bar X=(3+5+1+2+4)/5=3\Rightarrow\hat\theta=1/3\).

(a) \(L=\theta^n(1-\theta)^{\sum(x_i-1)}\); (b) \(\hat\theta=1/\bar X\); (c) \(\hat\theta=1/3\).
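
The closed-form estimate can be cross-checked numerically. The sketch below computes \(\hat\theta=n/\sum x_i\) for the sample in part (c) and confirms that no point on a coarse grid gives a larger log-likelihood. The helper names and the grid resolution are assumptions made for illustration.

```python
import math

def geometric_mle(sample):
    """MLE of theta for f(x|theta) = theta*(1-theta)**(x-1), x = 1, 2, ..."""
    return len(sample) / sum(sample)

def log_likelihood(theta, sample):
    """Log-likelihood n*log(theta) + sum(x_i - 1)*log(1 - theta)."""
    n = len(sample)
    return n * math.log(theta) + (sum(sample) - n) * math.log(1 - theta)

sample = [3, 5, 1, 2, 4]
theta_hat = geometric_mle(sample)

# Crude check: the best grid point in (0, 1) should sit next to the closed form
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, sample))

print(theta_hat, best)
```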

EX. 07

Method-of-moments for a triangular density

StimaPunti (Bombelli), Ex. 3 hard

Sample from \(f(x;\theta)=\dfrac{2x}{\theta^2}\) for \(x\in[0,\theta]\) (0 otherwise), \(\theta>0\). (a) Find \(E[X]\) and \(\mathrm{Var}(X)\). (b) Find the method-of-moments estimator of \(\theta\). (c) Its bias and MSE; behaviour as \(n\to\infty\).

\(E[X]=\int_0^\theta x\frac{2x}{\theta^2}dx\). MoM: set \(\bar X=E[X]\) and invert.

Step 1. (a) Moments
\(E[X]=\dfrac{2}{\theta^2}\!\int_0^\theta x^2dx=\dfrac{2\theta}{3}\); \(E[X^2]=\dfrac{2}{\theta^2}\!\int_0^\theta x^3dx=\dfrac{\theta^2}{2}\); \(\mathrm{Var}(X)=\dfrac{\theta^2}{2}-\dfrac{4\theta^2}{9}=\dfrac{\theta^2}{18}\).

Step 2. (b) MoM estimator
Set \(\bar X=E[X]=\dfrac{2\theta}{3}\Rightarrow\hat\theta_M=\dfrac32\bar X\).

Step 3. (c) Bias and MSE
\(E[\hat\theta_M]=\tfrac32\cdot\tfrac{2\theta}{3}=\theta\) → unbiased. \(\mathrm{Var}(\hat\theta_M)=\tfrac94\cdot\dfrac{\mathrm{Var}(X)}{n}=\tfrac94\cdot\dfrac{\theta^2}{18n}=\dfrac{\theta^2}{8n}\). So \(\mathrm{MSE}=\dfrac{\theta^2}{8n}\to0\).

\(E[X]=\tfrac{2\theta}{3}\), \(\mathrm{Var}(X)=\tfrac{\theta^2}{18}\); \(\hat\theta_M=\tfrac32\bar X\); unbiased with \(\mathrm{MSE}=\theta^2/(8n)\to0\) (consistent).

CH. 07 Statistics

Confidence Intervals & Sample Size

Build an interval, back-solve the sample size, recover n from a published CI, handle asymmetric tails — appears in all 7 past exams.

Sections 06

Flashcards 7

Exercises 7

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, §7.3 (CI mean), §7.4 (two-mean), §7.5 (proportion), §6.3.2 (sample size) · Past exams: MACSf1/2, MAIf2/3/5, sampletest1, EBf1

Key concepts

estimate ± margin · z critical values · CI for the mean · CI for a proportion · Sample size back-solve · Worst-case p(1−p)=¼ · Asymmetric tails · z vs t

The CI machinery

Every confidence interval is estimate ± margin, the margin being a critical value times a standard error.

\[ \text{mean (known }\sigma\text{ / large }n): \quad \bar x\pm z_{\alpha/2}\,\frac{\sigma}{\sqrt n}\quad(\text{use }s\text{ if }\sigma\text{ unknown, large }n). \]\[ \text{proportion}: \quad \hat p\pm z_{\alpha/2}\,\sqrt{\frac{\hat p(1-\hat p)}{n}}. \]

Critical values: \(z_{0.025}=1.96\) (95%), \(z_{0.005}=2.576\) (99%), \(z_{0.05}=1.645\) (90%). The total length of a symmetric CI is twice the margin, \(2z_{\alpha/2}\cdot\text{SE}\).
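
The two interval formulas above can be sketched in a few lines of Python using only the standard library. The function names and the sample inputs are hypothetical. The critical value is computed from the normal quantile function rather than read from a table, so results may differ from hand calculations in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def z_crit(confidence):
    """Upper alpha/2 critical value of N(0,1) for a two-sided interval."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sd, n, confidence=0.95):
    """z-interval for a mean: xbar ± z * sd / sqrt(n)."""
    margin = z_crit(confidence) * sd / sqrt(n)
    return xbar - margin, xbar + margin

def proportion_ci(p_hat, n, confidence=0.95):
    """z-interval for a proportion: p_hat ± z * sqrt(p_hat(1-p_hat)/n)."""
    margin = z_crit(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical inputs, chosen only to exercise the functions
print(mean_ci(xbar=50.0, sd=8.0, n=64, confidence=0.95))
print(proportion_ci(p_hat=0.4, n=400, confidence=0.90))
```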

Sample size: back-solve from a target precision

Given a target on the interval (total length \(L\), or half-width / accuracy \(d=L/2\)), set the length condition and solve for \(n\), then round up.

\[ \text{mean}: \; 2z_{\alpha/2}\frac{\sigma}{\sqrt n}\le L \;\Rightarrow\; n\ge\left(\frac{2z_{\alpha/2}\sigma}{L}\right)^2. \]\[ \text{proportion}: \; n\ge\frac{z_{\alpha/2}^2\,p(1-p)}{d^2}, \quad\text{worst case }p(1-p)=\tfrac14. \]

When \(\hat p\) is unknown (sample not yet taken), use the worst case \(p(1-p)\le\frac14\) — it guarantees the precision whatever the true \(p\).
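
A minimal sketch of the two back-solve formulas follows. The function names and the targets in the usage lines are hypothetical. The critical value is passed in explicitly (for example the table value 1.96), so the result matches a hand calculation that uses the same table.

```python
from math import ceil

def n_for_mean(z, sigma, total_length):
    """Smallest n with 2*z*sigma/sqrt(n) <= total_length."""
    return ceil((2 * z * sigma / total_length) ** 2)

def n_for_proportion(z, half_width, p=0.5):
    """Smallest n with z*sqrt(p(1-p)/n) <= half_width; p=0.5 is the worst case."""
    return ceil(z ** 2 * p * (1 - p) / half_width ** 2)

# Hypothetical targets; z values are the table entries quoted above
print(n_for_mean(z=1.96, sigma=10.0, total_length=4.0))
print(n_for_proportion(z=1.96, half_width=0.03))
```

Using `ceil` encodes the "round up" rule: rounding to the nearest integer could leave the interval slightly too long.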

Recover n from a published interval

If a poll reports "with \(C\%\) confidence, support is between \(a\) and \(b\)", read off \(\hat p=\tfrac{a+b}{2}\) and half-width \(d=\tfrac{b-a}{2}\), then invert the margin:

\[ z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}=d \;\Rightarrow\; n=\frac{z_{\alpha/2}^2\,\hat p(1-\hat p)}{d^2}. \]

E.g. \((48\%,52\%)\) at 90%: \(\hat p=0.5\), \(d=0.02\), \(n=1.645^2(0.25)/0.02^2\approx1691\).
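
The inversion can be written as a short function. This is only a sketch: the name `recover_n` is an illustrative choice, and the table value \(z_{0.05}=1.645\) is supplied by the caller.

```python
def recover_n(lower, upper, z):
    """Invert z*sqrt(p(1-p)/n) = d for n, with p = midpoint and d = half-width."""
    p_hat = (lower + upper) / 2
    d = (upper - lower) / 2
    return z ** 2 * p_hat * (1 - p_hat) / d ** 2

# The published interval (48%, 52%) at 90% confidence
print(round(recover_n(0.48, 0.52, z=1.645)))
```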

Asymmetric tails (non-standard split)

A "95% CI" normally splits \(\alpha=0.05\) as \(2.5\%\) per tail. If the problem asks for an unequal split — say \(3\%\) in the lower tail, \(2\%\) in the upper — use a different z on each side:

\[ \Big(\bar x - z_{0.03}\tfrac{\sigma}{\sqrt n}, \; \bar x + z_{0.02}\tfrac{\sigma}{\sqrt n}\Big), \quad z_{0.03}=1.88,\; z_{0.02}=2.055. \]

The total tail probability still sums to \(\alpha\); only the per-side allocation changed, so the interval is no longer symmetric about \(\bar x\).
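
The unequal split can be sketched as follows, using the same pairing as the formula above (the lower-tail z on the lower limit, the upper-tail z on the upper limit). The function name and the data in the usage line are hypothetical. The quantiles come from the standard library instead of a table, so they differ slightly from the rounded values 1.88 and 2.055.

```python
from math import sqrt
from statistics import NormalDist

def asymmetric_ci(xbar, sigma, n, lower_tail, upper_tail):
    """Interval (xbar - z_lower*SE, xbar + z_upper*SE) with unequal tail areas."""
    std_normal = NormalDist()
    z_lower = std_normal.inv_cdf(1 - lower_tail)   # e.g. z_{0.03}
    z_upper = std_normal.inv_cdf(1 - upper_tail)   # e.g. z_{0.02}
    se = sigma / sqrt(n)
    return xbar - z_lower * se, xbar + z_upper * se

# Hypothetical data: xbar = 20, sigma = 4, n = 64
low, high = asymmetric_ci(20.0, 4.0, 64, lower_tail=0.03, upper_tail=0.02)
print(round(low, 3), round(high, 3))
```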

z or t?

Default to z: known σ, or large \(n\) (≥30) so the CLT applies and \(s\) is a good plug-in. Use t\(_{n-1}\) only for a small sample from a normal population with unknown σ. For \(n=100\) the two barely differ (e.g. \(z_{0.025}=1.96\) vs \(t_{99,0.025}\approx1.98\)) and either is acceptable — the exams lean on z.

Recognition guide

Wording	Do
"compute a C% CI for the mean/proportion"	estimate ± \(z_{\alpha/2}\)·SE
"how large a sample", "length less than L", "accurate to within ±d"	back-solve \(n\), round up; prop → use ¼
"support between a% and b%, find n"	\(\hat p\)=mid, \(d\)=half-width, invert
"lower tail 3%, upper tail 2%"	different z each side (asymmetric)
small n, normal, σ unknown	t\(_{n-1}\); else z

Click any card to flip. Rate it after to track what you need to revisit.

Card 1

CI for a mean (known σ or large n) and CI for a proportion — write both.

Mean: \(\bar x\pm z_{\alpha/2}\,\sigma/\sqrt n\) (use s if σ unknown, large n). Proportion: \(\hat p\pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}\). Critical z: 1.96 (95%), 2.576 (99%), 1.645 (90%).

Card 2

Sample-size formula for a mean given target total length L, and for a proportion given half-width d?

Mean: \(n\ge(2z_{\alpha/2}\sigma/L)^2\). Proportion: \(n\ge z_{\alpha/2}^2\,p(1-p)/d^2\), worst case \(p(1-p)=\tfrac14\). Always round UP.

Card 3

Why and when do you use the worst case p(1−p)=¼ in a sample-size calculation?

When p̂ is unknown because the sample hasn't been taken. \(p(1-p)\) is maximised at \(p=\tfrac12\) giving \(\tfrac14\); using it guarantees the target precision for any true p.

Card 4

A poll reports "90% confident, support between 48% and 52%". How do you recover the sample size?

\(\hat p=(0.48+0.52)/2=0.5\), half-width \(d=0.02\). Invert: \(n=z_{0.05}^2\hat p(1-\hat p)/d^2=1.645^2(0.25)/0.02^2\approx1691\).

Card 5

How do you build a 95% CI with 3% in the lower tail and 2% in the upper tail?

Use a different z each side: \((\bar x - z_{0.03}\sigma/\sqrt n,\; \bar x + z_{0.02}\sigma/\sqrt n)\), with \(z_{0.03}=1.88\), \(z_{0.02}=2.055\). Tails sum to α=5% but the interval is asymmetric.

Card 6

z or t for a confidence interval — decision rule?

z if σ known or n large (≥30). t\(_{n-1}\) only for small n from a normal population with unknown σ. For n=100 they nearly coincide (1.96 vs ≈1.98); exams default to z.

Card 7

What is the total length of a symmetric confidence interval, and how does it scale with n and with confidence?

Length \(=2z_{\alpha/2}\cdot\text{SE}\). It shrinks like \(1/\sqrt n\) (quadruple n to halve it) and grows with higher confidence (bigger \(z_{\alpha/2}\)).

EX. 01

99% CI for mean toothbrush purchases

MACSf1-P4 easy

In a sample of \(n=100\), the number of toothbrushes bought per year has mean \(\bar x=0.9\) and sd \(s=0.2\). Compute an approximate 99% confidence interval for the population mean.

Large n → z CI with s. 99% → \(z_{0.005}=2.576\). SE \(=0.2/\sqrt{100}=0.02\).

Step 1. Margin
\(z_{0.005}\,s/\sqrt n=2.576\cdot0.02=0.05152\).

Step 2. Interval
\(0.9\pm0.05152\).

\((0.84848,\,0.95152)\).

EX. 02

Tire lifetimes — CI then required sample size

MAIf3-P5 medium

Tire lifetimes are normal with known \(\sigma=3600\) miles. A sample of \(n=81\) gave \(\bar x=28400\). (a) Build a 95% CI for the mean. (b) How large a sample gives a 99% CI shorter than the interval in (a)?

(a) \(z_{0.025}=1.96\), \(\sqrt{81}=9\). (b) Set the 99% length \(\le\) the (a) length, solve for n, round up.

Step 1. (a) Margin and CI
\(1.96\cdot3600/9=1.96\cdot400=784\); CI \(=28400\pm784=(27616,29184)\), length \(1568\).

Step 2. (b) Length condition
99% length \(=2\cdot2.576\cdot3600/\sqrt n=18547.2/\sqrt n\le1568\).

Step 3. Solve for n
\(\sqrt n\ge18547.2/1568=11.83\Rightarrow n\ge139.9\Rightarrow n\ge140\).

(a) \((27616,\,29184)\); (b) at least \(140\) tires.

EX. 03

Sample size for a proportion (length < 0.1)

MACSf2-P4 medium

Estimate the percentage of spaghetti eaters who use parmigiano. With \(\hat p\) unknown, what sample size gives a 95% CI of total length less than 0.1?

\(2\cdot1.96\sqrt{p(1-p)/n}<0.1\); use the worst case \(p(1-p)=\tfrac14\).

Step 1. Length condition
\(2\cdot1.96\sqrt{p(1-p)/n}<0.1\Rightarrow n>\dfrac{4\cdot1.96^2}{0.1^2}p(1-p)\).

Step 2. Worst case
Use \(p(1-p)=\tfrac14\): \(n>\dfrac{4\cdot3.8416}{0.01}\cdot\tfrac14=384.16\).

Step 3. Round up
\(n\ge385\).

\(n\ge385\).

EX. 04

Super Bowl poll — sample size within ±0.02

EBf1-P5 medium

How large a sample is needed to be 90% confident that the estimated proportion of households watching is accurate to within ±0.02?

Half-width \(d=0.02\), 90% → \(z=1.645\), worst case \(p(1-p)=\tfrac14\).

Step 1. Formula
\(n\ge\dfrac{z_{0.05}^2\,p(1-p)}{d^2}=\dfrac{1.645^2\cdot\tfrac14}{0.02^2}\).

Step 2. Evaluate
\(=\dfrac{2.706\cdot0.25}{0.0004}=1691.3\).

Step 3. Round up
\(n\ge1692\).

\(n\ge1692\).

EX. 05

Recover the sample size from a published CI

MAIf2-P5 medium

A poll states: "with 90% confidence, the minister's support is between 48% and 52%". Using the standard proportion-CI formula, how large was the sample?

\(\hat p=0.5\) (midpoint), half-width \(d=0.02\). Invert \(z\sqrt{\hat p(1-\hat p)/n}=d\).

Step 1. Read off the CI
\(\hat p=(0.48+0.52)/2=0.5\); margin \(=0.02\); 90% → \(z=1.645\).

Step 2. Invert
\(1.645\sqrt{0.25/n}=0.02\Rightarrow n=\dfrac{1.645^2\cdot0.25}{0.02^2}=1691.3\).

\(n\approx1691\) (i.e. about 1691–1692).

EX. 06

Confidence interval with asymmetric tails

MAIf5-P4 medium

\(n=100\) observations, normal with known \(\sigma=1\), \(\bar x=3.5\). Build a 95% CI but with 3% probability in the lower tail and 2% in the upper tail.

Different z each side: \(z_{0.03}=1.88\) (lower), \(z_{0.02}=2.055\) (upper). \(\sigma/\sqrt n=0.1\).

Step 1. Lower bound
\(3.5-1.88\cdot0.1=3.5-0.188=3.312\).

Step 2. Upper bound
\(3.5+2.055\cdot0.1=3.5+0.2055=3.7055\).

\((3.312,\,3.7055)\) — note it is not symmetric about \(3.5\).

EX. 07

Large-sample CI — z or t

sampletest1-P5 easy

\(n=100\) measurements give mean \(1.0\) and sample sd \(2.0\). Compute a 95% CI for the true value.

Large n → z is the standard choice (\(z_{0.025}=1.96\)); t with df 99 (≈1.98) gives almost the same answer. SE \(=2/\sqrt{100}=0.2\).

Step 1. z interval
\(1\pm1.96\cdot0.2=1\pm0.392=(0.608,1.392)\).

Step 2. t interval (alternative)
Using \(t_{99,0.025}\approx1.98\): \(1\pm1.98\cdot0.2=1\pm0.396=(0.604,1.396)\).

z-CI \((0.608,1.392)\); t-CI \((0.604,1.396)\) — both legitimate, nearly identical at \(n=100\).

CH. 08 Statistics

Hypothesis Testing

One skeleton, five standard-error variants: one/two mean, one/two proportion, paired. Appears in all 7 past exams.

Sections 05

Flashcards 7

Exercises 7

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, ch. 8 (§8.2–8.4 means, §8.6 proportions, asymptotic) · Past exams: MACSf1/2, MAIf2/3/5, sampletest1, EBf1

Key concepts

H₀ / H₁ · Test statistic · z vs t · One- vs two-sided · Critical value · p-value · Pooled proportion · Paired differences

The test skeleton

Every test is the same four moves; only the standard error changes.

State \(H_0,H_1\) — the direction comes from the wording (below).
Pick the statistic and its null distribution: \( \text{TS}=\dfrac{\text{estimate}-\text{null}}{\text{SE}} \), \(\sim N(0,1)\) (large n / known σ) or \(t_{df}\) (small n, normal).
Compute the observed value \(ts_{obs}\).
Decide: reject if \(ts_{obs}\) falls in the rejection region — two-sided \(|ts|\ge z_{\alpha/2}\), one-sided \(ts\ge z_\alpha\) (or \(\le-z_\alpha\)). Or report the p-value and reject when \(\text{p-value}\le\alpha\).

Default to z (large samples / known σ); §8.6–8.7 proportion and Poisson tests are asymptotic-normal too. Use t only for small-n normal data (and the paired test).

The standard-error table

Test	Statistic	Null dist.
1-mean, known σ / large n	\((\bar x-\mu_0)/(\sigma/\sqrt n)\)	N(0,1)
1-mean, small n	\((\bar x-\mu_0)/(s/\sqrt n)\)	\(t_{n-1}\)
paired	\((\bar d-0)/(s_d/\sqrt n)\) on \(d_i=\)before−after	\(t_{n-1}\)
2-mean, large n	\((\bar x_1-\bar x_2)/\sqrt{s_1^2/n_1+s_2^2/n_2}\)	N(0,1)
1-proportion	\((\hat p-p_0)/\sqrt{p_0(1-p_0)/n}\)	N(0,1)
2-proportion (pooled)	\((\hat p_1-\hat p_2)/\sqrt{\hat p_p(1-\hat p_p)(1/n_1+1/n_2)}\)	N(0,1)

Pooled proportion: \( \hat p_p=\dfrac{X_1+X_2}{n_1+n_2} \). Critical values: \(z_{0.025}=1.96\), \(z_{0.05}=1.645\); \(t_{9,0.05}=1.833\).
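
The z rows of the table translate directly into small functions. The sketch below is one possible Python rendering. The function names are illustrative, and the counts in the usage line are hypothetical.

```python
from math import sqrt

def z_one_mean(xbar, mu0, sd, n):
    """(xbar - mu0) / (sd / sqrt(n)); sd is sigma if known, else s for large n."""
    return (xbar - mu0) / (sd / sqrt(n))

def z_two_means(x1, s1, n1, x2, s2, n2):
    """(x1 - x2) / sqrt(s1^2/n1 + s2^2/n2), large samples."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

def z_one_proportion(successes, n, p0):
    """(p_hat - p0) / sqrt(p0(1-p0)/n); the SE uses the null value p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def z_two_proportions(x1, n1, x2, n2):
    """Pooled two-proportion statistic; x1, x2 are success counts."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Hypothetical counts, only to show the call pattern
print(round(z_two_proportions(60, 100, 45, 100), 2))
```

Each function returns only the observed statistic. Comparing it with \(z_\alpha\) or \(z_{\alpha/2}\) is left to the caller, because the tail depends on the wording of \(H_1\).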

One-sided or two-sided? Read the wording

One-sided: "more than", "over 30%", "more effective", "improved", "larger mean" → \(H_1\) points one way; reject only in that tail with \(z_\alpha\) (1.645 at 5%).
Two-sided: "changed", "differ significantly", "need to recalibrate", "is there a difference" → \(H_1\neq\); reject in both tails with \(z_{\alpha/2}\) (1.96 at 5%).

For a two-group test, set \(H_0:\) difference \(=0\). The direction of \(H_1\) decides which tail; e.g. "B more effective" with statistic \((\bar x_A-\bar x_B)/SE\) rejects for small (negative) values.

p-value & the threshold significance level

The p-value is the null-probability of a statistic at least as extreme as observed, in the direction of \(H_1\): one-sided \(P(Z\ge ts_{obs})\) for an upper-tail \(H_1\) (or \(P(Z\le ts_{obs})\) for a lower-tail one); two-sided \(2P(Z\ge|ts_{obs}|)\). Reject whenever \(\alpha\ge\) p-value.
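
These definitions can be sketched with the standard normal CDF from the Python standard library. The function name `p_value` and its `alternative` labels are assumptions chosen for illustration, and the statistic in the usage lines is an arbitrary example.

```python
from statistics import NormalDist

def p_value(ts, alternative="greater"):
    """p-value of a z statistic; alternative is 'greater', 'less' or 'two-sided'."""
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(ts)
    if alternative == "less":
        return phi(ts)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(ts)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Arbitrary statistic, evaluated under each alternative
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(1.2, alt), 4))
```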

"Find the significance levels at which \(H_0\) is rejected" is just the p-value: e.g. a one-sided \(ts=0.93\) gives \(P(Z\ge0.93)=1-\Phi(0.93)=1-0.8238=0.1762\), so reject for \(\alpha\ge17.62\%\). A \(ts=2.4\) gives p \(=1-\Phi(2.4)=0.0082\) → reject for \(\alpha\ge0.82\%\).

Recognition guide

Wording	Test
one group vs a target μ₀, "recalibrate / changed"	1-mean z (two-sided)
one group vs target proportion, "more than X%"	1-prop z (one-sided)
two groups, "more effective / larger"	2-mean z (one-sided)
two proportions, "significant difference"	2-prop pooled z (two-sided)
same units measured before/after (Initial/Final)	paired t on differences
"at what significance levels reject"	compute the p-value

Click any card to flip. Rate it after to track what you need to revisit.

Card 1

The four moves of any hypothesis test?

1) State \(H_0,H_1\) (direction from wording). 2) Statistic \(\text{TS}=(\text{estimate}-\text{null})/\text{SE}\), null dist z or \(t_{df}\). 3) Compute \(ts_{obs}\). 4) Reject if in the rejection region (\(|ts|\ge z_{\alpha/2}\) two-sided, \(ts\ge z_\alpha\) one-sided), or if p-value ≤ α.

Card 2

Standard errors: 1-mean (known σ), 2-mean (large n), 1-proportion, 2-proportion pooled.

1-mean: \(\sigma/\sqrt n\). 2-mean: \(\sqrt{s_1^2/n_1+s_2^2/n_2}\). 1-prop: \(\sqrt{p_0(1-p_0)/n}\). 2-prop: \(\sqrt{\hat p_p(1-\hat p_p)(1/n_1+1/n_2)}\), \(\hat p_p=(X_1+X_2)/(n_1+n_2)\).

Card 3

Paired test: when, and what's the statistic?

When the same units are measured before/after (Initial/Final pairs). Form differences \(d_i\), then \(\text{TS}=\dfrac{\bar d-0}{s_d/\sqrt n}\sim t_{n-1}\). It's a one-sample t-test on the differences.

Card 4

Which words signal a ONE-sided vs a TWO-sided alternative?

One-sided: "more than", "over X%", "more effective", "improved", "larger". Two-sided: "changed", "differ", "significant difference", "need to recalibrate". One-sided uses \(z_\alpha\) (1.645), two-sided \(z_{\alpha/2}\) (1.96).

Card 5

How do you compute a p-value, and how do you answer "at what α is H₀ rejected?"

One-sided: \(P(Z\ge ts_{obs})=1-\Phi(ts_{obs})\); two-sided: \(2P(Z\ge|ts_{obs}|)\). Reject for every \(\alpha\ge\) p-value. So the p-value IS the threshold significance level.

Card 6

Two-group test: how do you set H₀, and how does the H₁ direction pick the tail?

\(H_0:\) difference \(=0\). If \(H_1:\mu_B>\mu_A\) and the statistic is \((\bar x_A-\bar x_B)/SE\), you reject for small (negative) values; if the statistic is \((\bar x_B-\bar x_A)/SE\), reject for large values. Keep the sign convention consistent.

Card 7

z or t for a hypothesis test?

z: known σ, or large n (proportions and Poisson tests are asymptotic-normal too). t\(_{n-1}\): small n from a normal population with unknown σ, and the paired test. Exams are mostly large-sample z; t appears once (paired).

EX. 01

Recalibrate the bottling machine? (1-mean, two-sided)

MACSf1-P5 easy

A machine should fill bottles to a mean of 750 g; content is normal with known \(\sigma=5\) g. A sample of \(n=25\) gives \(\bar x=745\) g. Is there reason to recalibrate (α = 0.05)?

"Recalibrate" = changed = two-sided. \(\text{TS}=(\bar x-750)/(\sigma/\sqrt n)\), reject if \(|ts|\ge1.96\).

Step 1. Hypotheses
\(H_0:\mu=750\) vs \(H_1:\mu\neq750\).

Step 2. Statistic
\(ts=\dfrac{745-750}{5/\sqrt{25}}=\dfrac{-5}{1}=-5\).

Step 3. Decide
\(-5<-1.96\) → in the rejection region.

Reject \(H_0\): the machine needs recalibrating.

EX. 02

Over 30% smokers? (1-proportion, one-sided + p-value)

MACSf2-P5 medium

66 of 200 adults are smokers. (a) At α = 0.05, can we conclude more than 30% are smokers? (b) At which significance levels would \(H_0\) be rejected?

One-sided: \(H_0:p\le0.30\) vs \(H_1:p>0.30\). \(\text{TS}=(\hat p-0.3)/\sqrt{0.3\cdot0.7/n}\); reject if \(ts\ge1.645\). (b) is the p-value.

Step 1. Statistic
\(\hat p=66/200=0.33\); \(ts=\dfrac{0.33-0.3}{\sqrt{0.21/200}}=\dfrac{0.03}{0.0324}=0.93\).

Step 2. (a) Decide
\(0.93<1.645\) → do not reject.

Step 3. (b) p-value
Reject when \(z_\alpha\le0.93\): \(1-\alpha\le\Phi(0.93)=0.8238\Rightarrow\alpha\ge0.1762\).

(a) Do not reject — can't conclude >30%. (b) Reject for \(\alpha\ge17.62\%\).

EX. 03

Is treatment B more effective? (2-mean, one-sided)

MAIf2-P4 medium

Two groups of \(n=140\): A has \(\bar x_A=105\), \(s_A=50\); B has \(\bar x_B=120\), \(s_B=60\) (higher = more effective). Can we claim B is more effective (α = 0.05)?

\(H_1:\mu_B>\mu_A\), i.e. \(H_0:\mu_A-\mu_B\ge0\) vs \(H_1:\mu_A-\mu_B<0\). Statistic \((\bar x_A-\bar x_B)/\sqrt{s_A^2/n_A+s_B^2/n_B}\); reject if \(ts\le-1.645\).

Step 1. SE
\(\sqrt{2500/140+3600/140}=\sqrt{43.571}=6.601\).

Step 2. Statistic
\(ts=\dfrac{105-120}{6.601}=-2.27\).

Step 3. Decide
\(-2.27<-1.645\) → reject.

Reject \(H_0\): treatment B is more effective.

EX. 04

Heavier bananas from provider 1? (2-mean, p-value)

MAIf3-P4 medium

Two providers, \(n=128\) each: provider 1 \(\bar x_1=155.0\), \(s_1=10\); provider 2 \(\bar x_2=152.0\), \(s_2=10\). Is the claim that provider 1's bananas are heavier justified?

\(H_0:\mu_1-\mu_2\le0\) vs \(H_1:\mu_1-\mu_2>0\). Compute \(ts\), then \(p=P(Z\ge ts)\).

Step 1. Statistic
\(ts=\dfrac{155-152}{\sqrt{100/128+100/128}}=\dfrac{3}{\sqrt{1.5625}}=\dfrac{3}{1.25}=2.4\).

Step 2. p-value
\(P(Z\ge2.4)=1-\Phi(2.4)=0.0082\).

Step 3. Decide
Reject for \(\alpha\ge0.82\%\); at 5% (and 1%) → reject.

\(ts=2.4\), p \(=0.0082\) → reject \(H_0\): the claim is justified.

EX. 05

Do two coins differ? (2-proportion, pooled)

MAIf5-P5 medium

Two coins are each thrown 800 times: A shows Heads 430 times, B shows Heads 400 times. Is there a significant difference in their Heads probabilities (α = 0.05)?

Two-sided 2-proportion test with pooled \(\hat p_p=(430+400)/1600\). Reject if \(|ts|\ge1.96\).

Step 1. Pooled proportion
\(\hat p_p=830/1600=0.51875\); \(\hat p_A=0.5375\), \(\hat p_B=0.5\).

Step 2. Statistic
\(ts=\dfrac{0.5375-0.5}{\sqrt{0.51875\cdot0.48125\,(1/800+1/800)}}=\dfrac{0.0375}{0.02498}=1.50\).

Step 3. Decide
\(1.50<1.96\) → do not reject.

Do not reject \(H_0\): the difference is not significant.

EX. 06

Did training improve scores? (paired t)

sampletest1-P2 hard

Ten gamers' scores are recorded before (Initial) and after (Final) a week of training. The improvements (Final−Initial) are \(0.6,-0.7,0.4,-1.4,1.7,0.6,2.4,1.0,1.9,0.5\). Does the data support that training improved the average score (α = 0.05)?

Paired data → one-sample t on the differences. \(H_0:\mu\le0\) vs \(H_1:\mu>0\). Critical \(t_{9,0.05}=1.833\).

Step 1. Difference summaries
\(\bar d=0.7\), \(s_d=1.15\), \(n=10\).

Step 2. Statistic
\(ts=\dfrac{0.7-0}{1.15/\sqrt{10}}=\dfrac{0.7}{0.3637}=1.92\).

Step 3. Decide
\(1.92>t_{9,0.05}=1.833\) → reject.

Reject \(H_0\): training improved the average score (this is the one exam where t, not z, is used).
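
The summaries in Step 1 can be reproduced from the raw differences. The sketch below uses Python's standard `statistics` module. The critical value is not computed but taken from the table entry \(t_{9,0.05}=1.833\) quoted in the hint, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Improvements (Final - Initial) from the exercise
d = [0.6, -0.7, 0.4, -1.4, 1.7, 0.6, 2.4, 1.0, 1.9, 0.5]

n = len(d)
d_bar = mean(d)           # sample mean of the differences
s_d = stdev(d)            # sample sd (divides by n - 1)
ts = d_bar / (s_d / sqrt(n))

T_CRIT = 1.833            # t_{9,0.05}, table value (assumed, not computed here)
print(round(d_bar, 2), round(s_d, 2), round(ts, 2), ts > T_CRIT)
```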

EX. 07

Has the average grade changed? (1-mean, two-sided)

EBf1-P4 easy

Historical average grade is 23.5. A recent exam with \(n=100\) students had mean 25.0, sd 2.5. Can we conclude the average changed (α = 0.05)?

"Changed" = two-sided. Large n → z with s. Reject if \(|ts|\ge1.96\).

Step 1. Hypotheses
\(H_0:\mu=23.5\) vs \(H_1:\mu\neq23.5\).

Step 2. Statistic
\(ts=\dfrac{25.0-23.5}{2.5/\sqrt{100}}=\dfrac{1.5}{0.25}=6\).

Step 3. Decide
\(6\gg1.96\) → reject.

Reject \(H_0\): the average grade has significantly changed.

CH. 09 Statistics

Appendix — Low-ROI Topics

In the syllabus but never exam-tested: exponential, Chebyshev, the t/χ² distributions, variance inference — plus the out-of-scope chi-squared and F tests. Read once; don't over-invest.

Sections 05

Flashcards 5

Exercises 2

Read time 3'

Sources Ross, Prob & Stats for Engineers 5E, §4.9, §5.6, §5.8, §7.3.3, §8.5 · Margin note: §9 (chi-squared homogeneity, OUT of program)

Key concepts

Exponential & memoryless · Chebyshev's inequality · Weak Law of Large Numbers · t distribution · χ² distribution · Variance CI/test · Chi-squared homogeneity (out) · F-test (out)

Exponential distribution (memoryless)

In the program but its inference (§7.6 exp-CI) is skipped, and it never appeared on an exam. Know only the basics: \(f(x)=\lambda e^{-\lambda x}\) for \(x\ge0\), with \(E[X]=1/\lambda\), \(\mathrm{Var}(X)=1/\lambda^2\), and tail \(P(X>t)=e^{-\lambda t}\).

Memoryless property: \(P(X>s+t\mid X>s)=P(X>t)\) — a used component is as good as new. This is the one fact most likely to be asked.
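
The memoryless property can be checked numerically from the tail formula. In the sketch below the function names and the rate and times are hypothetical. The conditional probability is computed from its definition, \(P(X>s+t)/P(X>s)\), and compared with \(P(X>t)\).

```python
from math import exp

def exp_tail(t, rate):
    """P(X > t) for X ~ Exponential(rate), t >= 0."""
    return exp(-rate * t)

def conditional_tail(s, t, rate):
    """P(X > s + t | X > s), from the definition of conditional probability."""
    return exp_tail(s + t, rate) / exp_tail(s, rate)

# Hypothetical rate and times
rate, s, t = 0.5, 3.0, 2.0
print(conditional_tail(s, t, rate), exp_tail(t, rate))   # equal up to rounding
```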

Chebyshev's inequality & the Weak Law

Distribution-free bound on how far a variable strays from its mean:

\[ P\big(|X-\mu|\ge a\big)\le\frac{\sigma^2}{a^2}, \qquad\text{equivalently}\qquad P\big(|X-\mu|\ge k\sigma\big)\le\frac1{k^2}. \]

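The bound is loose, because it has to hold for every distribution with finite variance, but it needs no normality assumption. The Weak Law of Large Numbers follows by applying it to \(\bar X_n\): since \(\mathrm{Var}(\bar X_n)=\sigma^2/n\), \(P\big(|\bar X_n-\mu|\ge\varepsilon\big)\le\dfrac{\sigma^2}{n\varepsilon^2}\to0\) for every \(\varepsilon>0\), i.e. \(\bar X_n\to\mu\) in probability as \(n\to\infty\).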
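
To see how loose the bound is, the sketch below puts the Chebyshev bound \(1/k^2\) next to the exact two-sided tail of a normal variable. The normal comparison is an added illustration, not part of the syllabus material, and the function names are hypothetical.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma); capped at 1 since it is a probability."""
    return min(1.0, 1 / k**2)

def normal_two_sided_tail(k):
    """Exact P(|Z| >= k) when X happens to be normal."""
    return 2 * (1 - NormalDist().cdf(k))

for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 4), round(normal_two_sided_tail(k), 4))
```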

The t and χ² distributions

t\(_{df}\): bell-shaped, symmetric, heavier tails than the normal; \(df=n-1\). Used for the mean when \(\sigma\) is unknown and \(n\) is small; as \(df\to\infty\) it converges to \(N(0,1)\). On these exams it appears only once (the paired test).

χ²\(_{df}\): the distribution of a sum of \(df\) independent squared standard normals; right-skewed, positive. It is the sampling distribution behind variance inference: for a sample from a normal population, \(\dfrac{(n-1)s^2}{\sigma^2}\sim\chi^2_{n-1}\).

Confidence interval & test for a variance

Author-light — plausibly examinable, never seen. Using \(\dfrac{(n-1)s^2}{\sigma^2}\sim\chi^2_{n-1}\):

\[ \text{CI for }\sigma^2:\quad \left(\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}},\;\frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right). \]

Test \(H_0:\sigma^2=\sigma_0^2\): statistic \(\chi^2=\dfrac{(n-1)s^2}{\sigma_0^2}\), compared to \(\chi^2_{n-1}\) critical values. (Note the chi-square table is asymmetric — you read two different critical values for the two tails.)

Out of scope — recognise and move on

Chi-squared test of homogeneity / independence (contingency table). Appeared once (EBf1-P6, no solution provided) but it is Ross chapter 9, outside this program (ch4–8). If you ever see a contingency table: build expected counts \(E_{ij}=\dfrac{\text{row}_i\cdot\text{col}_j}{N}\), statistic \(\chi^2=\sum\dfrac{(O-E)^2}{E}\), \(df=(r-1)(c-1)\). Just a margin note — do not study deeply.

F-test (equality of two variances, §8.5.1). The F-distribution (§5.8.3) is explicitly skipped, so treat the F-test as out of scope — recognition only, never required on these papers.

Click any card to flip. Rate it after to track what you need to revisit.

Card 1

Exponential distribution: tail probability and the memoryless property?

\(P(X>t)=e^{-\lambda t}\); \(E=1/\lambda\), \(\mathrm{Var}=1/\lambda^2\). Memoryless: \(P(X>s+t\mid X>s)=P(X>t)\) — a used item is as good as new.

Card 2

Chebyshev's inequality — state it both ways. When would you reach for it?

\(P(|X-\mu|\ge a)\le\sigma^2/a^2\), or \(P(|X-\mu|\ge k\sigma)\le1/k^2\). Use it for a distribution-free bound when you don't know the shape (and can't assume normal).

Card 3

What is the sampling distribution linking the sample variance to σ², and what is it used for?

\(\dfrac{(n-1)s^2}{\sigma^2}\sim\chi^2_{n-1}\). It underlies the confidence interval and the test for a population variance.

Card 4

You see a contingency table asking about differences between groups. What test — and is it in scope here?

Chi-squared test of homogeneity: \(E_{ij}=\text{row}_i\text{col}_j/N\), \(\chi^2=\sum(O-E)^2/E\), \(df=(r-1)(c-1)\). It is Ross ch9 — OUT of this program. Recognise it, don't study it deeply.

Card 5

How does the t distribution relate to the normal, and the χ² distribution to the normal?

t\(_{df}\): like the normal but heavier tails; → N(0,1) as df→∞. χ²\(_{df}\): the distribution of a sum of df independent squared standard normals; right-skewed, positive.

EX. 01

A distribution-free bound (Chebyshev)

canonical (not exam-tested) easy

A variable has mean \(\mu=50\) and standard deviation \(\sigma=5\), with unknown distribution. Bound the probability that it differs from 50 by at least 15.

\(a=15=3\sigma\), so \(k=3\). Use \(P(|X-\mu|\ge k\sigma)\le1/k^2\).

Step 1. Express in σ units
\(a=15=3\sigma\Rightarrow k=3\).

Step 2. Apply Chebyshev
\(P(|X-50|\ge15)\le\dfrac1{3^2}=\dfrac19\).

\(\le\dfrac19\approx0.111\) (a loose bound, valid for any distribution).

EX. 02

Exponential tail & memorylessness

canonical (not exam-tested) easy

A component's lifetime is exponential with mean 10 hours. (a) Find \(P(X>20)\). (b) Given it has already lasted 30 hours, find \(P(X>50\mid X>30)\).

Mean \(=1/\lambda=10\Rightarrow\lambda=0.1\). Tail \(P(X>t)=e^{-\lambda t}\). Part (b) uses memorylessness.

Step 1. (a) Tail
\(P(X>20)=e^{-0.1\cdot20}=e^{-2}\approx0.1353\).

Step 2. (b) Memoryless
\(P(X>50\mid X>30)=P(X>20)=e^{-2}\approx0.1353\): the 30 hours already survived are forgotten.

(a) \(e^{-2}\approx0.135\); (b) the same \(e^{-2}\approx0.135\) (memoryless).

CH. 10 Statistics

Cheatsheet & Decision Map

The open-book weapon: which procedure → which formula. Print this and bring it to the exam.

Sections 07

Flashcards 0

Exercises 0

Read time 3'

Sources Synthesis of chapters 1–9 · exam-taxonomy.md

Key concepts

Decision tree · CI & test formulas · Critical values · Estimator recipes · Bayes template · Normal shortcuts

Which procedure? — top-level routing

The question is about…	Go to
"chosen at random then observe", "given the result, prob it was…"	Bayes / total probability →
two overlapping groups, "at least one", "not the other"	inclusion–exclusion
"choose k without replacement", "all different"	counting \(\binom{n}{k}\)
"normally distributed", a probability / percentile / unknown σ	standardize Z, Φ →
weighted/grouped mean, add an observation, back-solve a size	descriptive (Σx=nx̄)
many i.i.d. units, prob the TOTAL exceeds a threshold	CLT: \(N(n\mu,n\sigma^2)\)
symbolic estimator T: unbiased? MSE? efficient? find a / c?	estimator theory →
"write the likelihood / MLE"; "moments estimator"	MLE / method-of-moments
"confidence interval", "how large a sample"	CI / sample size →
"is there reason to conclude / more than / changed"	hypothesis test →

Inference: which CI / which test?

Walk the questions in order:

CI or test? "compute an interval / how large a sample" → CI. "is there reason / more than / changed" → test.
Mean or proportion? averages/measurements → mean. percentages/counts of successes → proportion.
One sample or two? one group vs a target → one-sample. two groups compared → two-sample. same units before/after → paired.
z or t? known σ or large n (≥30) → z. small n, normal, unknown σ → t. (Proportions always z.)
One- or two-sided? "more / over / improved" → one-sided \(z_\alpha\). "changed / differ" → two-sided \(z_{\alpha/2}\).

CI & test formula bank (T1 + T2 spine)

Case	CI: estimate ± margin	Test statistic
1-mean, σ known / large n	\(\bar x\pm z_{\alpha/2}\sigma/\sqrt n\)	\((\bar x-\mu_0)/(\sigma/\sqrt n)\)
1-mean, small n	\(\bar x\pm t_{n-1,\alpha/2}s/\sqrt n\)	\((\bar x-\mu_0)/(s/\sqrt n)\sim t_{n-1}\)
paired	\(\bar d\pm t_{n-1,\alpha/2}s_d/\sqrt n\)	\(\bar d/(s_d/\sqrt n)\sim t_{n-1}\)
2-mean, large n	—	\((\bar x_1-\bar x_2)/\sqrt{s_1^2/n_1+s_2^2/n_2}\)
1-proportion	\(\hat p\pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}\)	\((\hat p-p_0)/\sqrt{p_0(1-p_0)/n}\)
2-proportion (pooled)	—	\((\hat p_1-\hat p_2)/\sqrt{\hat p_p(1-\hat p_p)(1/n_1+1/n_2)}\)

\(\hat p_p=\dfrac{X_1+X_2}{n_1+n_2}\). Reject: two-sided \(|ts|\ge z_{\alpha/2}\); one-sided \(ts\ge z_\alpha\) (or \(\le-z_\alpha\)). p-value: one-sided \(1-\Phi(ts)\), two-sided \(2(1-\Phi(|ts|))\); reject when \(\alpha\ge\) p-value.

Sample size (round UP)

Target	Formula
mean, total length \(L\)	\(n\ge(2z_{\alpha/2}\sigma/L)^2\)
proportion, half-width \(d\)	\(n\ge z_{\alpha/2}^2\,p(1-p)/d^2\), worst case \(p(1-p)=\tfrac14\)
recover n from CI	\(\hat p\)=mid, \(d\)=half-width, \(n=z_{\alpha/2}^2\hat p(1-\hat p)/d^2\)

Critical values & Φ shortcuts

Confidence / tail	z
90% (two-sided) / 0.05 tail	\(z_{0.05}=1.645\)
95% / 0.025 tail	\(z_{0.025}=1.96\)
99% / 0.005 tail	\(z_{0.005}=2.576\)
0.03 / 0.02 tails (asymmetric)	\(z_{0.03}=1.88,\;z_{0.02}=2.055\)
small-n paired	\(t_{9,0.05}=1.833\)

Φ values: \(\Phi(0.5)=0.6915\), \(\Phi(1)=0.8413\), \(\Phi(2)=0.9772\), \(\Phi(2.4)=0.9918\), \(\Phi(1/3)\approx\Phi(0.33)=0.6293\). Symmetry \(\Phi(-z)=1-\Phi(z)\). Central band \(P(|X-\mu|\le d)=2\Phi(d/\sigma)-1\). Inverse: \(\Phi^{-1}(0.975)=1.96\), \(\Phi^{-1}(0.75)=0.675\). Standardize \(Z=(X-\mu)/\sigma\).

Estimator-theory recipes (T3 / T10)

MSE: \(\mathrm{MSE}=\mathrm{Var}+\mathrm{Bias}^2\); Bias \(=E[T]-\theta\). Unbiased ⇒ MSE=Var.
Var of combo: \(\mathrm{Var}(aU+bW)=a^2\mathrm{Var}(U)+b^2\mathrm{Var}(W)\) (independent).
Unbiased weights: any \(c\bar X_1+(1-c)\bar X_2\) is unbiased for μ (weights sum to 1).
Min-MSE weight: inverse-variance, \(c^*=\dfrac{n_1}{n_1+n_2}\) for sample means.
Find constant: set \(E[T]=\theta\); use \(E[X^2]=\sigma^2+\mu^2\), \(E[X_iX_j]=\mu^2\) (indep).
Jensen: \(\sqrt T\) (or log, 1/T) of an unbiased T is biased.
MLE: \(L=\prod f\) → \(\log\) → \(d/d\theta=0\). Geometric → \(\hat\theta=1/\bar X\).
Method-of-moments: set \(E[X]=\bar X\) (as a function of θ), invert.
Moments: Bernoulli \(E{=}p,V{=}p(1{-}p)\); Poisson \(E{=}V{=}\lambda\); \(\mathrm{Var}(\bar X_m)=\sigma^2/m\).

Probability templates (T4 / counting / CLT)

Total probability: \(P(E)=\sum_i P(E\mid H_i)P(H_i)\).
Bayes: \(P(H_j\mid E)=\dfrac{P(E\mid H_j)P(H_j)}{\sum_i P(E\mid H_i)P(H_i)}\).
Coin mixture: \(P(X=0\mid N=n)=(\tfrac12)^n\).
Conditional in Bernoulli: disjoint arrangements ÷ binomial; \(p\) cancels.
Inclusion–exclusion: \(P(B\cup C)=P(B)+P(C)-P(B\cap C)\); \(P(B\cap C^c)=P(B)-P(B\cap C)\).
Counting: \(P=\#\text{fav}/\binom{n}{k}\); complement for "different", fix items for "included".
CLT (sum): \(S_n\approx N(n\mu,n\sigma^2)\), \(Z=\dfrac{s-n\mu}{\sigma\sqrt n}\) (SUM uses \(\sigma\sqrt n\), not \(\sigma/\sqrt n\)).
Binomial→Normal: \(N(np,np(1-p))\); skip \(\pm\tfrac12\) if "no continuity correction".
Normal models→prob: \(\mathrm{Var}=E[X^2]-(E[X])^2\), \(E[X(X-1)]=E[X^2]-E[X]\).
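
The total-probability and Bayes templates from the list above can be sketched as two short functions. The names are illustrative, and the priors and conditional probabilities in the usage lines are made-up numbers, not taken from any exam.

```python
def total_probability(priors, likelihoods):
    """P(E) = sum_i P(E | H_i) P(H_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(H_j | E) for every hypothesis H_j."""
    evidence = total_probability(priors, likelihoods)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Hypothetical numbers: three sources with priors P(H_i) and rates P(E | H_i)
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.05]

print(total_probability(priors, likelihoods))
print(bayes_posteriors(priors, likelihoods))
```

The denominator of Bayes' formula is the total probability, so it is computed once and reused for every hypothesis.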