Notes for reviewing my probability and statistics course at Northeastern.
Sample Spaces
Experiment
Repeatable procedure with a set of possible results
Sample Outcome
One of many possible results of an experiment
Sample Space
$S$: The set of all possible outcomes of an experiment
Event
$A \subset S$ : 0 or more outcomes of an experiment
Probability
When all outcomes in $S$ are equally likely: $P(A) = \frac{n(A)}{n(S)}$
Set Theory
Discrete Set
A finite or countably infinite set, e.g. tossing a coin until a head comes up:
$S = \{ H, TH, TTH, \dots \}$
Continuous Set
An uncountable set, such as a range of real values. E.g. getting a number smaller than 1 when picking a real number between 0 and $\sqrt{2}$: $S = [0, \sqrt{2}]$, $A = [0, 1)$
Intersection
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$ for some events $A$, $B$
Disjoint
$A \cap B = \emptyset$. Also known as "mutually exclusive"
Union
$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
Complement
$A^{c} = \{x \in S \mid x \notin A\}$
DeMorgan's Law(s)
 $(A \cup B)^{c} = A^{c} \cap B^{c}$
 $(A \cap B)^{c} = A^{c} \cup B^{c}$
Probability Function
$P$ assigns a real number to every event of a sample space and satisfies the following axioms (the first three suffice when $S$ is finite; the fourth is needed when $S$ is infinite):
 $P(A) \geq 0$ for all $A$
 $P(S) = 1$
 $A \cap B = \emptyset \implies P(A \cup B) = P(A) + P(B)$; in general, it follows that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
 $P(\bigcup_{i=1}^{\infty} A_{i}) = \sum_{i=1}^{\infty} P(A_{i})$ if $A_1, A_2, A_3, \dots$ are pairwise mutually exclusive in $S$
Conditional Probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ for $P(B) > 0$, so $P(A \cap B) = P(B \mid A)P(A) = P(A \mid B)P(B)$.
Solving conditional probability: make a tree! Fork at each choice, recording the probability on each branch; multiplying along a path from the root to a leaf gives the probability of that series of events.
Independence
$A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$; put another way (when both probabilities are nonzero), $P(A \mid B) = P(A) \iff P(B \mid A) = P(B)$. For three events $A$, $B$, $C$, all of the following must hold:
 $P(A \cap B \cap C) = P(A)P(B)P(C)$
 $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$, and so on for more events.
Series
Geometric
$S_{n} = \sum_{k=0}^{n}r^{k} = 1 + r + r^{2} + \dots + r^{n}$. For $-1 < r < 1$, the sum converges: $S \equiv S_{\infty} = \sum_{k=0}^{\infty}r^{k} = \frac{1}{1 - r}$
[[PDF]] of the geometric distribution (number of the trial on which the first success occurs): $p_{Y}(k) = (1 - p)^{k - 1}p$ for $k = 1, 2, \dots$
mean
$\frac{1}{p}$
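A quick numeric sanity check in Python: the partial sums approach $\frac{1}{1 - r}$, and the geometric pmf sums to 1 with mean $\frac{1}{p}$. The function names, the example values $r = 0.5$ and $p = 0.2$, and the 10,000-term cutoff for the infinite sums are hypothetical choices for this sketch.

```python
def geometric_partial_sum(r, n):
    """S_n = 1 + r + r^2 + ... + r^n, added term by term."""
    return sum(r**k for k in range(n + 1))


def geometric_pmf(k, p):
    """P(Y = k) = (1 - p)^(k - 1) * p for k = 1, 2, ... (trial of the first success)."""
    return (1 - p) ** (k - 1) * p


r = 0.5
print(geometric_partial_sum(r, 50), 1 / (1 - r))  # partial sum approaches 1 / (1 - r) = 2

p = 0.2
cutoff = 10_000  # hypothetical truncation point for the infinite sums
total = sum(geometric_pmf(k, p) for k in range(1, cutoff))
mean = sum(k * geometric_pmf(k, p) for k in range(1, cutoff))
print(total, mean)  # close to 1 and to 1 / p = 5
```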
Probability Distributions
$\binom{n}{r} = {}_{n}C_{r} = \frac{n!}{(n - r)!\, r!}$
Binomial Distribution
Given $n$ independent trials, each with two outcomes and a constant $p$ = P(success), the probability of exactly $k$ successes is $P(k) = \binom{n}{k} p^{k}(1 - p)^{n - k}$ for $k = 0, 1, 2, \dots, n$.
[[PDF]] $P(X = k) = \binom{n}{k} p^{k}(1 - p)^{n - k}$
 calculate: use binompdf
[[CDF]] $P(X \leq t) = \sum_{k=0}^{t}\binom{n}{k} p^{k}(1 - p)^{n - k}$
 calculate: use binomcdf
[[Expected Value]] $E(X) = np$
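A minimal Python sketch of what binompdf and binomcdf compute, using `math.comb`. The function names mirror the calculator functions, but the argument order (n, p, k) and the example values are assumptions of this sketch. The last line checks $E(X) = np$ by brute force; the difference of two cdfs gives a range probability $P(3 < X \leq 6)$.

```python
from math import comb


def binompdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomcdf(n, p, t):
    """P(X <= t): the pdf summed from k = 0 up to t."""
    return sum(binompdf(n, p, k) for k in range(0, t + 1))


n, p = 10, 0.5
print(binompdf(n, p, 5))                                 # P(X = 5) = 252/1024
print(binomcdf(n, p, 6) - binomcdf(n, p, 3))             # P(3 < X <= 6)
print(sum(k * binompdf(n, p, k) for k in range(n + 1)))  # should equal np = 5
```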
Bernoulli Distribution
Essentially a [[Binomial Distribution]] where $n = 1$: $P(k) = \binom{1}{k} p^{k}(1 - p)^{1 - k}$ for $k = 0, 1$
Hypergeometric Distribution
$P(x) = \frac{\binom{k}{x}\binom{N - k}{n - x}}{\binom{N}{n}}$
 Random selection of $n$ items without replacement from a set of $N$ items
 P(success) does not stay constant from draw to draw.
 $k$ items are successes; $N - k$ items are failures.
This is the without-replacement counterpart of the [[binomial distribution]]; when $N$ is large relative to $n$, the binomial is a good approximation to it.
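A sketch comparing the hypergeometric pmf with the binomial pmf as $N$ grows while the success fraction stays at 20%; the function names, the argument order, and the numbers are hypothetical. The gap shrinks as $N$ gets large relative to $n = 5$.

```python
from math import comb


def hypergeom_pmf(x, N, k, n):
    """P(X = x): x successes in n draws without replacement from N items, k of them successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), i.e. drawing with replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical lots of N items, 20% of them successes, with n = 5 draws.
for N in (20, 200, 20_000):
    k = N // 5
    print(N, hypergeom_pmf(1, N, k, 5), binom_pmf(1, 5, 0.2))
```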
Uniform Distribution
All values in a range are equally likely. For some interval $[a, b]$:
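[[PDF]] $f_{X}(x) = \frac{1}{b - a}$ for $a \leq x \leq b$ (0 elsewhere)
[[CDF]] $F_{X}(x) = \frac{x - a}{b - a}$ for $a \leq x \leq b$ (0 below $a$, 1 above $b$)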
Exponential Distribution
[[PDF]] $f_{X}(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
[[CDF]] $F_{X}(x) = 1 - e^{-\lambda x}$ for $x \geq 0$
mean
$E(X) = \frac{1}{\lambda}$
var
$Var(X) = \frac{1}{\lambda^{2}}$
Poisson Distribution
Counts the number of occurrences per unit of measurement: over a specific period of time, a specific area, a volume, etc.
The probability of an event occurring in a unit of measurement must be the same for all similar units; for example, if the unit of measurement is a month, then the probability must be the same for all months.
Poisson is often used to approximate the binomial distribution: given that $n$ is large ($n \geq 100$) and $p$ is small, we can let $\lambda = np$! More generally, $\lambda$ = (average number of occurrences per unit) × (length of observation period).
Use the poissonpdf and poissoncdf calculator functions to approximate this.
[[PDF]] $p_{X}(k) = P(X = k) := \frac{\lambda^{k}e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$
mean
$E(X) = \lambda$
variance
$Var(X) = \lambda$
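A sketch of the Poisson approximation with hypothetical values $n = 1000$ and $p = 0.003$ (so $\lambda = np = 3$). `poissonpdf` and `binompdf` here are plain Python stand-ins named after the calculator functions, not the calculator's own implementation.

```python
from math import comb, exp, factorial


def poissonpdf(lam, k):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)


def binompdf(n, p, k):
    """Exact binomial probability, for comparison."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 1000, 0.003  # hypothetical: large n, small p
lam = n * p         # lambda := np = 3
for k in range(6):
    print(k, round(binompdf(n, p, k), 4), round(poissonpdf(lam, k), 4))
```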
Normal Distribution
Density Functions
Discrete Random Variable
Some $X$ such that:
 $X: S \rightarrow \mathbb{R}$
 The range of $X$ is a countable subset of $\mathbb{R}$
 Motivation: constrain the sample space to a smaller one, using a single variable to represent each outcome we're investigating. For example, if we only care about the sum of a pair of numbers, (1, 2) and (2, 1) both map to the same value, 3.
Continuous random variable
Like a discrete random variable, but ranges over a continuous interval of $\mathbb{R}$ instead. Two continuous random variables are independent if functions $g(x)$ and $h(y)$ exist such that:
 $f_{X,Y}(x,y) = g(x)h(y)$
 $f_{X}(x) = g(x)$
 $f_{Y}(y) = h(y)$
In other words, multiplying the marginal pdfs must reproduce the joint pdf of the two variables.
PDF (Probability Density Function)
Discrete
For a discrete random variable $X$, the probability density function (pdf) is $p_{X}(k) = P(X = k) := P(\{s \in S \mid X(s) = k\})$, where $p_{X}: \mathbb{R} \rightarrow \mathbb{R}$. Here, $s$ and $S$ come from the original sample space we're sampling from.
Continuous
Some $f_{X}(x)$ satisfying:
 $f_{X}(x) \geq 0$
 $\int_{-\infty}^{\infty}f_{X}(x)\,dx = 1$
$P(a \leq X \leq b) = \int_{a}^{b}f_{X}(x)\,dx$
Joint PDF
$p_{X,Y}(x,y) := P(X = x, Y = y)$, satisfying:
 $p_{X,Y}(x,y) \geq 0$
 $\sum_{\text{all } x} \sum_{\text{all } y} p_{X,Y}(x,y) = 1$

Discrete
Given the joint pdf of $X$ and $Y$, the marginal pdfs of $X$ and $Y$ are:
 $p_{X}(x) = \sum_{all y}p_{X,Y}(x,y)$
 $p_{Y}(y) = \sum_{all x}p_{X,Y}(x,y)$

Continuous
$P((X,Y) \in R) = \iint_{R}f_{X,Y}(x,y)\, dx\, dy$. When solving, identify the bound that depends on the other variable! It can really help to sketch the 2D plane and graph the relationship between the two continuous random variables. From this graph it's often fairly easy to identify, and roughly estimate, the area we're integrating over, which tells us what the bounds of each integral should be. Absolutely worth working through some practice problems.
CDF
Cumulative Distribution Function
Discrete
$F_{X}(t) = P(X \leq t) := P(\{s \in S \mid X(s) \leq t\})$, where $F_{X}: \mathbb{R} \rightarrow \mathbb{R}$. For a discrete random variable, $F_{X}(t) = \sum_{k \leq t} p_{X}(k)$; the pdf gives the probability of the random variable taking one specific value, while the cdf gives the probability that it takes any value less than or equal to the upper bound $t$.
Continuous
$F_{X}(x) = P(X \leq x) = \int_{-\infty}^{x}f_{X}(t)\,dt$, and $P(a \leq X \leq b) = F_{X}(b) - F_{X}(a)$
Expected Value
A generalization of the concept of "average". As the name suggests, it's the probability-weighted average of the possible results: the result we expect on average.
For example, if I have a 5% chance at 100 dollars and a 95% chance at 20 dollars, the expected value is $0.05 \cdot 100 + 0.95 \cdot 20 = 24$, i.e. 24 dollars.
Discrete
$E(X) = \sum_{\text{all } k} k \cdot p_{X}(k)$
Continuous
$E(X) = \int_{-\infty}^{\infty} x \cdot f_{X}(x)\, dx$
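A minimal sketch of the discrete formula, applied to the 100 dollar / 20 dollar example; representing $p_X$ as a dictionary from values to probabilities is an assumption of this sketch.

```python
def expected_value(dist):
    """E(X) = sum over all k of k * p_X(k); dist maps each value k to P(X = k)."""
    assert abs(sum(dist.values()) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(k * p for k, p in dist.items())


payout = {100: 0.05, 20: 0.95}  # the 100 / 20 dollar example
print(expected_value(payout))   # 24, up to floating-point rounding
```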
Other Properties
$E(aX + bY) = aE(X) + bE(Y)$ for any random variables $X, Y$ and numbers $a$ and $b$. Separately, if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.
Sample Mean
Denoted $\bar{X}$: $\bar{X} = \frac{1}{n}(X_{1} + X_{2} + \dots + X_{n})$
Median
Discrete
"Middle number" of the sorted values, or the average of the two middle numbers if there is an even number of values; the standard definition.
Continuous
$m$ such that $\int_{-\infty}^{m}f_{Y}(y)\, dy = 0.5$. Finding the median:
 Integrate the pdf up to $m$ and substitute the bounds.
 Set the result equal to 0.5 and solve for $m$.
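When the equation for $m$ is hard to solve by hand, bisection on the cdf gives a numeric answer. A sketch assuming a continuous, nondecreasing cdf; the example density $f_Y(y) = 2y$ on $[0, 1]$ (so $F_Y(m) = m^2$ and $m = \frac{1}{\sqrt{2}}$) is hypothetical.

```python
import math


def median_by_bisection(cdf, lo, hi, tol=1e-10):
    """Find m in [lo, hi] with cdf(m) = 0.5; assumes cdf is continuous and nondecreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid  # the median lies to the right of mid
        else:
            hi = mid  # the median lies at or to the left of mid
    return (lo + hi) / 2


def cdf_y(m):
    """Hypothetical CDF from f_Y(y) = 2y on [0, 1]: F_Y(m) = m^2."""
    return m**2


print(median_by_bisection(cdf_y, 0.0, 1.0), 1 / math.sqrt(2))
```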
Variance
A measure of how far the distribution spreads from its mean. $Var(X) := E((X - \mu)^{2})$, where:
 $\mu = E(X)$ is the mean of $X$
 $\sigma := \sqrt{Var(X)}$ is the standard deviation of $X$
If $X$ and $Y$ are independent, then: $Var(aX + bY) = a^{2}Var(X) + b^{2}Var(Y)$. In general: $Var(aX + bY) = a^{2}Var(X) + b^{2}Var(Y) + 2ab\,Cov(X,Y)$
Covariance
where $Cov(X, Y)$, the covariance of $X$ and $Y$, is: $Cov(X,Y) := E(XY) - E(X)E(Y)$. Since $E(XY) = E(X)E(Y)$ for independent variables, if $X$ and $Y$ are independent, then $Cov(X,Y) = 0$ (the converse does not hold in general).
Correlation
$Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}$, a value between $-1$ and $1$ measuring the strength of the linear relationship; if positive, the two random variables tend to increase together; if negative, one tends to decrease as the other increases.
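A sketch computing variance, covariance, and correlation from a small hypothetical discrete joint pmf, and checking $Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)$ directly (the case $a = b = 1$).

```python
from math import sqrt


def E(joint, g):
    """E(g(X, Y)) for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


# Hypothetical joint pmf of two 0/1 random variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

ex, ey = E(joint, lambda x, y: x), E(joint, lambda x, y: y)
var_x = E(joint, lambda x, y: x**2) - ex**2   # Var(X) = E(X^2) - E(X)^2
var_y = E(joint, lambda x, y: y**2) - ey**2
cov = E(joint, lambda x, y: x * y) - ex * ey  # Cov(X,Y) = E(XY) - E(X)E(Y)
corr = cov / sqrt(var_x * var_y)

# Var(X + Y) computed directly agrees with Var(X) + Var(Y) + 2Cov(X, Y).
var_sum = E(joint, lambda x, y: (x + y) ** 2) - E(joint, lambda x, y: x + y) ** 2
print(cov, corr, var_sum, var_x + var_y + 2 * cov)  # about 0.15, 0.6, 0.8, 0.8
```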
Double Integrals
Fubini's Theorem: $\int_{a}^{b} \int_{c}^{d} f(x,y)\, dy\, dx = \int_{c}^{d} \int_{a}^{b} f(x,y)\, dx\, dy$ over the rectangle $a \leq x \leq b$, $c \leq y \leq d$. Sketch the region in $\mathbb{R}^{2}$ to find the bounds, then evaluate from the inside out: while integrating the inner integral, treat the outer variable as a constant. Often one order of integration is much easier than the other; pick the right one! This takes practice.
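A rough numeric check of an iterated integral with dependent bounds, using the midpoint rule; the density $f(x, y) = 2$ on $0 < y < x < 1$ is a hypothetical example. Computing $E(XY)$ in both orders shows the two setups agree once the bounds are rewritten for the swapped order (exact answer $\frac{1}{4}$).

```python
def iterated_integral(f, outer_lo, outer_hi, inner_lo, inner_hi, n=400):
    """Midpoint-rule approximation of the integral over u from outer_lo to outer_hi
    of the integral over v from inner_lo(u) to inner_hi(u) of f(u, v).
    The inner bounds are functions of the outer variable u."""
    hu = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        u = outer_lo + (i + 0.5) * hu
        a, b = inner_lo(u), inner_hi(u)  # inner bounds depend on the outer variable
        hv = (b - a) / n
        inner = sum(f(u, a + (j + 0.5) * hv) for j in range(n)) * hv
        total += inner * hu
    return total


def density(x, y):
    """Hypothetical joint pdf: 2 on the triangle 0 < y < x < 1."""
    return 2.0


total_prob = iterated_integral(density, 0.0, 1.0, lambda x: 0.0, lambda x: x)

# E(XY) with y inner (0 < y < x) and x outer (0 < x < 1).
exy_dy_dx = iterated_integral(lambda x, y: x * y * density(x, y), 0.0, 1.0, lambda x: 0.0, lambda x: x)

# Same region, order swapped: x inner (y < x < 1), y outer (0 < y < 1); here u = y, v = x.
exy_dx_dy = iterated_integral(lambda y, x: x * y * density(x, y), 0.0, 1.0, lambda y: y, lambda y: 1.0)

print(total_prob, exy_dy_dx, exy_dx_dy)  # about 1, 1/4, 1/4
```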
TODO Problems to Practice
3
Graphing PDF, CDF
Converting between the two, esp. with continuous piecewise
what's with that variance theorem and E(g(X))? practice those problems.
there are some good exercises for deriving expected value and variance available in the textbook: review these!
do the exam
3.7, 3.9, 4.x, 5.x, especially the problem i missed on the last exam; exam problems and how exactly to approach them; the rest of the 6s and 7s, though i can probably wing those just
Types of Problems
Exam 1
Probabilities and Sets
e.g. given $P(A), P(A \cup B), P(A \cap B^{c})$, find some other probability. Draw things out as Venn diagrams to help visualize. Nice properties:
 $P(A \cup B) = P(A \cap B^{c}) + P(A \cap B) + P(A^{c} \cap B)$
 Conditional probability (the starting point for Bayes' rule): $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Simple Probabilities
Generally, use some arrangement of $nCr$, plus complements: P(at most $x$) = 1 - P(at least $x + 1$), and vice versa (e.g. P(at least one) = 1 - P(none)).
Conditional Probabilities with Scenarios
Draw a tree:

At the root, no branch is chosen

After the root, branch first on the information you know more about: e.g. if you're looking for P(Defective | built at plant X), draw a tree with the first set of descendant nodes representing the plant at which the item was built, then branch off each plant with the chance that the product is defective given that plant.
From here, use Bayes' rule to fill out the tree, then find the desired probability.
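A sketch of the tree as arithmetic: multiply along each branch, add the leaves for the total probability, then divide for Bayes' rule. The three plants and all the percentages are hypothetical.

```python
# Hypothetical numbers: share of output from each plant, and P(defective | plant).
p_plant = {"A": 0.5, "B": 0.3, "C": 0.2}
p_def_given_plant = {"A": 0.01, "B": 0.02, "C": 0.03}

# Multiply along each branch: P(plant and defective) = P(plant) * P(defective | plant).
leaves = {pl: p_plant[pl] * p_def_given_plant[pl] for pl in p_plant}

# Total probability: add up the "defective" leaves.
p_def = sum(leaves.values())

# Bayes' rule: P(plant | defective) = P(plant and defective) / P(defective).
posterior = {pl: leaves[pl] / p_def for pl in leaves}
print(p_def, posterior)
```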
Exam 2
distributions to know: binomial,
Finding Mean, stdev, sample from given problem
$\mu = np$, $\sigma = \sqrt{np(1 - p)}$
$P(X = a) = {}_{n}C_{a}\, p^{a}(1 - p)^{n - a}$ ~ binompdf
w/ $n$, $p$, $a$
Finding P in a range: $P(a < X \leq b) = F_{X}(b) - F_{X}(a)$, i.e. binomcdf(n, p, b) - binomcdf(n, p, a), gives the probability for that range!
Find the pdf and cdf of a discrete set of scenarios
 Write out each possible scenario and the associated value of the random variable
 Use P(scenario) * (number of occurrences of the scenario) for each random variable value to compute P(X = k) for each
 Assemble into a table: P(X = k) for values k = 1, k = 2, k = 3, for example; adding up the table through each value gives the cdf.
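A sketch of these steps for a hypothetical experiment (three fair coin tosses, $X$ = number of heads), using exact fractions; accumulating the table gives the cdf.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: toss a fair coin 3 times; X = number of heads.
outcomes = list(product("HT", repeat=3))  # every scenario, all equally likely
p_each = Fraction(1, len(outcomes))

pmf = {}
for s in outcomes:
    x = s.count("H")                  # value of X for this scenario
    pmf[x] = pmf.get(x, 0) + p_each   # adds P(scenario) once per matching scenario

cdf, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running                  # F_X(x) = P(X <= x)

for x in sorted(pmf):
    print(x, pmf[x], cdf[x])
```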
PDF → CDF
Integrate. Outside the range given for the pdf, state that the cdf is 0 before the range and 1 after it; otherwise the cdf is undefined there, but a cdf must be defined for every real number.
CDF → PDF
Differentiate.
Find E, Var given density function
 $E(X) = \int_{a}^{b} x f_{X}(x)\, dx$ over the provided range $a \leq x \leq b$ on which $f_{X}$ is defined

$Var(X) = E(X^{2})  (E(X))^{2}$
 where $E(X^{2}) = \int_{a}^{b} x^{2} f_{X}(x) dx$
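A numeric check of these formulas with a midpoint-rule integral, for the hypothetical density $f_X(x) = 2x$ on $0 \leq x \leq 1$ (exact answers: $E(X) = \frac{2}{3}$, $Var(X) = \frac{1}{18}$).

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def pdf(x):
    return 2 * x  # hypothetical density on 0 <= x <= 1


a, b = 0.0, 1.0
ex = integrate(lambda x: x * pdf(x), a, b)       # E(X)
ex2 = integrate(lambda x: x**2 * pdf(x), a, b)   # E(X^2)
var = ex2 - ex**2                                # Var(X) = E(X^2) - (E(X))^2
print(ex, var)  # close to 2/3 and 1/18
```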
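Piecewise CDF → PDF, PDF → CDF
Differentiate or integrate, respectively, each piece over its own range; when integrating, add the total probability accumulated from earlier pieces to each later piece so the cdf stays continuous.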
Joint PDF, CDF
 Find the marginal pdfs: for $f_{X}$, integrate the joint pdf with respect to $y$; for $f_{Y}$, integrate with respect to $x$
 Set up the integral for $E(XY)$; this should be some $\int_{c}^{d} \int_{a}^{b} xyf_{X,Y}(x,y)\, dx\, dy$ for bounds $a \leq x \leq b$ and $c \leq y \leq d$. If the bounds depend on each other, use the relationship between them (e.g. $0 < y < x < 1 \implies a=y, b=1, c=0, d=1$), since the bounds of one variable depend on the value of the other.
 Find the [[covariance]] $Cov(X,Y) := E(XY) - E(X)E(Y)$. Use the marginal pdfs and integrate to find $E(X)$ and $E(Y)$, then follow the formula
Finding $c$ for a pdf with an unknown constant: integrate the pdf over its whole range, set the result equal to 1, and solve for $c$
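A sketch for the hypothetical pdf $f(x) = cx^2$ on $0 \leq x \leq 2$: since the integral of $x^2$ over $[0, 2]$ is $\frac{8}{3}$, $c = \frac{3}{8}$; the code reaches the same value numerically with a midpoint-rule integral.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


# Hypothetical pdf f(x) = c * x^2 on 0 <= x <= 2 (zero elsewhere).
# Total probability must be 1, so c = 1 / (integral of x^2 over [0, 2]).
c = 1 / integrate(lambda x: x**2, 0.0, 2.0)
print(c)  # close to 3/8
```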
TODO Combining Random Variables
e.g. test 2: 13, 14
Exam 3
use normal distribution with continuity correction to estimate probability in bounds
 find mean ($\mu$) and standard deviation ($\sigma$) of the provided scenario for the sample mean; note that $\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}$

state that the normal approximation can be used if the problem says "normal approximation" or if $n \geq 30$; cite the "Central Limit Theorem" when using this approximation, where $\bar{X} \approx N(\mu, \frac{\sigma}{\sqrt{n}})$ (second parameter written as the standard deviation)
 TODO when to use $\frac{\sigma^{2}}{n}$?
 continuity correction: extend each bound by 0.5 so that the continuous interval covers every integer value in the discrete event, e.g. $P(X \leq 10)$ becomes $P(X < 10.5)$ and $P(X \geq 5)$ becomes $P(X > 4.5)$
 apply normalcdf: $normalcdf(lower, upper, \mu, \sigma)$; with the sample mean, use $\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}$ instead. Use $10^{99}$ for a missing upper bound and $-10^{99}$ for a missing lower bound to fill in open-ended intervals.
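A sketch comparing the exact binomial probability with the normal approximation, with and without the continuity correction, for a hypothetical $X \sim$ Binomial(100, 0.5). It uses Python's `statistics.NormalDist` in place of normalcdf.

```python
from math import comb
from statistics import NormalDist


def binomcdf(n, p, t):
    """Exact P(X <= t) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


n, p = 100, 0.5  # hypothetical binomial count
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
approx = NormalDist(mu, sigma)

# P(X <= 55): the continuity correction extends the bound to 55.5.
exact = binomcdf(n, p, 55)
with_cc = approx.cdf(55.5)  # like normalcdf(-1E99, 55.5, mu, sigma)
without_cc = approx.cdf(55)
print(exact, with_cc, without_cc)
```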
solve for interval given resultant probability
 Investigate the interval; sketch it out relative to a normal distribution. If it's two-tailed, mark that and split the excluded area evenly between the two tails
 Finding the bound asked for: $invNorm(\text{area before the bound}, \mu, \sigma(\bar{x}))$ provides such a bound.
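A sketch of invNorm using `statistics.NormalDist.inv_cdf`, for a hypothetical sample mean with $\mu = 50$, $\sigma = 10$, $n = 25$ (so $\sigma(\bar{x}) = 2$); the two-tailed case leaves half of the excluded area in each tail.

```python
from statistics import NormalDist

# Hypothetical sample-mean setup: mu = 50, sigma = 10, n = 25.
mu, sigma, n = 50, 10, 25
xbar_dist = NormalDist(mu, sigma / n ** 0.5)  # sigma(x-bar) = sigma / sqrt(n) = 2

# One bound: the value with 90% of the area below it, like invNorm(0.90, 50, 2).
upper_90 = xbar_dist.inv_cdf(0.90)

# Two-tailed: the middle 95% leaves 2.5% in each tail.
lo, hi = xbar_dist.inv_cdf(0.025), xbar_dist.inv_cdf(0.975)
print(upper_90, lo, hi)
```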
probabilities with poisson dist.
finding confidence interval
maximum likelihood estimation
 Find $L(\theta) = \prod_{i=1}^{n} f_{X}(x_{i}; \theta)$
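The usual next step is to maximize $\ln L(\theta)$, which has the same maximizer as $L(\theta)$. A sketch for a made-up exponential sample: a crude grid search over $\lambda$ lands on the same value as the closed form $\hat{\lambda} = \frac{n}{\sum x_i} = \frac{1}{\bar{x}}$, obtained by setting the derivative of $\ln L$ to 0. The data and the grid are hypothetical.

```python
import math


def log_likelihood(lam, data):
    """ln L(lambda) = sum of ln f(x_i) for the exponential pdf f(x) = lambda * e^(-lambda x)."""
    return sum(math.log(lam) - lam * x for x in data)


data = [0.8, 1.7, 0.3, 2.4, 1.1, 0.6, 1.9, 0.2]  # hypothetical observations

# Crude numerical maximization: evaluate ln L on a fine grid of lambda values in (0, 5).
grid = [i / 10_000 for i in range(1, 50_000)]
lam_numeric = max(grid, key=lambda lam: log_likelihood(lam, data))

# Closed form from d/d(lambda) ln L = n / lambda - sum(x) = 0.
lam_closed = len(data) / sum(data)
print(lam_numeric, lam_closed)  # both near 8/9
```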
Exam 4
unbiased estimators, variance
Some $\hat{\theta}$ is an unbiased estimator for $\theta$ if $E(\hat{\theta}) = \theta$, so:
 Set up the expected value: pull out the constants and take the expected value of each random variable used to calculate the estimator
 Substitute based on what's provided for these existing random variables; e.g. if $E(X) = E(Y) = \theta$, substitute $\theta$ for the expected value of each when calculating
 If the result simplifies to $\theta$, it's an unbiased estimator!
variance
examine the calculation for the random variable, squaring the constants and taking the variance of each (independent) random variable used to calculate it
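A sketch of both checks for a linear estimator $\hat{\theta} = \sum c_i X_i$, assuming $E(X_i) = \theta$ for every $i$ and independent $X_i$ (needed for the variance rule); the helper name and the example coefficients are hypothetical.

```python
from fractions import Fraction


def linear_estimator_summary(coeffs, variances):
    """For theta_hat = sum(c_i * X_i) with E(X_i) = theta and independent X_i:
    E(theta_hat) = (sum of c_i) * theta, so it is unbiased exactly when the c_i sum to 1;
    Var(theta_hat) = sum of c_i^2 * Var(X_i)."""
    mean_factor = sum(coeffs)
    variance = sum(c**2 * v for c, v in zip(coeffs, variances))
    return mean_factor == 1, variance


# Hypothetical: theta_hat = (X + 2Y) / 3 with Var(X) = Var(Y) = 1.
print(linear_estimator_summary([Fraction(1, 3), Fraction(2, 3)], [1, 1]))  # (True, Fraction(5, 9))
# The plain average (X + Y) / 2, for comparison.
print(linear_estimator_summary([Fraction(1, 2), Fraction(1, 2)], [1, 1]))  # (True, Fraction(1, 2))
```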
find confidence interval

T test
 State facts: $s$, $df = n - 1$, $\alpha$
 calculate $t_{\frac{\alpha}{2}, df}$ with $invT(1 - \frac{\alpha}{2}, df)$.
 Find interval: $\bar{X} \pm t_{\frac{\alpha}{2}, df} \frac{s}{\sqrt{n}}$; can use TInterval
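A sketch of the interval computation; the critical value is passed in rather than computed, since Python's standard library has no t distribution. The sample is hypothetical, and $t_{0.025, 9} \approx 2.262$ is the usual table value for a 95% interval with $n = 10$.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """x-bar +/- t_crit * s / sqrt(n); t_crit comes from a t table or invT(1 - alpha/2, df)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev uses the n - 1 divisor
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.3, 4.7, 5.0, 5.1, 4.9]  # hypothetical sample, n = 10
# 95% confidence, df = n - 1 = 9: t_{0.025, 9} is about 2.262 from a t table.
print(t_interval(data, 2.262))
```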

Z test
statistical test
critical values and errors
 find critical value in nonstandard form: typically $invNorm(\text{accept area}, \mu(H_{0}), \frac{\sigma}{\sqrt{n}})$
 Find type 1 error: typically $\alpha$, the significance level
 find type 2 error (given the real mean, $\mu(H_{a})$): $\beta = P(\text{Type II}) = P(\text{accept } H_{0} \mid \mu = \mu(H_{a}))$, computed as $normalcdf(b1, b2, \mu(H_{a}), \sigma(\bar{x}))$ where $b1, b2$ are the bounds of the acceptance region
 Find the power of the test: $Power = 1 - \beta$
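A sketch of the whole chain for a hypothetical one-sided test ($H_0: \mu = 50$ vs $H_a: \mu > 50$, $\sigma = 10$, $n = 25$, $\alpha = 0.05$, true mean 55), using `statistics.NormalDist` in place of invNorm and normalcdf.

```python
from statistics import NormalDist

# Hypothetical one-sided test: H0: mu = 50 vs Ha: mu > 50.
mu0, mu_a, sigma, n, alpha = 50, 55, 10, 25, 0.05
se = sigma / n ** 0.5  # sigma(x-bar) = 2

# Critical value in x-bar units, like invNorm(1 - alpha, mu0, se): reject H0 if x-bar > crit.
crit = NormalDist(mu0, se).inv_cdf(1 - alpha)

type_1 = 1 - NormalDist(mu0, se).cdf(crit)  # equals alpha by construction
beta = NormalDist(mu_a, se).cdf(crit)       # P(accept H0 | mu = mu_a)
power = 1 - beta
print(crit, type_1, beta, power)
```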
Additional Material
 public document at doc.anagora.org/probstat
 video call at meet.jit.si/probstat