Lab 2: Probability Theory

Digital Accessibility

Please note that all images were created with modifications to the default settings to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

Binomial Probabilities

Here is a review of some of the code you will use in Lab 2. You can also refer to Note Outline 2 from class.

Theoretical Probabilities

If your scenario meets the conditions for a binomial setting, then the number of successes follows the binomial distribution, and you can use the binomial formula to calculate the probability of any one outcome or any range of outcomes.
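For reference, this is the standard binomial formula for the probability of exactly \(X\) successes in \(n\) cases when each case has probability \(\pi\) of success:

\[
P(x = X) = \binom{n}{X}\, \pi^{X} (1 - \pi)^{n - X}, \qquad \binom{n}{X} = \frac{n!}{X!\,(n - X)!}
\]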

\(P(x = X)\)

We can calculate the probability that an event occurs exactly X times in n cases (\(P(x = X)\)), when the probability of the event occurring in any single case is pi, using either of the lines of code below:

dbinom(x = X, size = n, prob = pi)
dbinom(X, n, pi) # you do not need to label the arguments if you provide them in exactly this order
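As a quick check, the minimal sketch below compares dbinom() with the binomial formula written out by hand using choose(). The values n = 10, X = 4, and p = 0.3 are made up for illustration. The success probability is stored in p rather than pi because pi is already a built-in constant in R (3.14159...), so if you forget to define your own pi, dbinom() receives 3.14159... and returns NaN.

```r
# Hypothetical example values (not from the lab scenarios)
n <- 10   # number of cases
X <- 4    # number of successes we ask about
p <- 0.3  # probability of success in one case (named p so R's built-in pi is not masked)

# Built-in binomial probability P(x = X)
by_function <- dbinom(x = X, size = n, prob = p)

# Same probability from the formula: choose(n, X) * p^X * (1 - p)^(n - X)
by_formula <- choose(n, X) * p^X * (1 - p)^(n - X)

by_function
by_formula
all.equal(by_function, by_formula)  # TRUE
```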

\(P(x < X)\)

We can calculate the probability that an event occurs fewer than X times in n cases (\(P(x < X)\)), when the probability of the event occurring in any single case is pi, using either of the lines of code below:

pbinom(q = X-1, size = n, prob = pi) # we do not want to include P(x = X) in this calculation, so we supply X - 1 to the first argument
pbinom(X-1, n, pi)

pbinom() always calculates the sum of the probabilities for the value you supply to the first argument (q =) and every value below it, down to 0. So in the example above, to calculate the probability of fewer than X successes (not including X), we supply X - 1 to q =.

Using pbinom() is equivalent to adding up the dbinom() calculations for the value you supply and every value below it, down to 0.
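A minimal sketch of that equivalence, using made-up values (n = 10, X = 4, p = 0.3). dbinom() accepts a vector of values, so sum(dbinom(0:(X - 1), ...)) adds \(P(x = 0) + \dots + P(x = X - 1)\) in one line.

```r
# Hypothetical values for illustration
n <- 10
X <- 4
p <- 0.3

# P(x < X) with pbinom(): supply X - 1, i.e. P(x <= 3)
by_pbinom <- pbinom(q = X - 1, size = n, prob = p)

# Same probability as a sum of dbinom() terms: P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)
by_sum <- sum(dbinom(x = 0:(X - 1), size = n, prob = p))

all.equal(by_pbinom, by_sum)  # TRUE
```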

\(P(x > X)\)

We can calculate the probability that an event occurs more than X times in n cases (\(P(x > X)\)), when the probability of the event occurring in any single case is pi, using either of the lines of code below:

1-pbinom(q = X, size = n, prob = pi) 
pbinom(q = X, size = n, prob = pi, lower.tail = FALSE)

Both lines of code above calculate the same thing. The first calculates the probability of X or fewer successes and subtracts it from 1, which leaves the probability of more than X. Setting lower.tail = FALSE asks pbinom() to return that upper-tail probability directly. Unlike \(P(x < X)\), here we supply X itself (not X - 1), because the probability of X occurring must be removed along with every value below it.
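The sketch below, again with hypothetical values n = 10, X = 4, p = 0.3, checks that both lines agree with adding up dbinom() for every value from X + 1 to n.

```r
n <- 10
X <- 4
p <- 0.3

# P(x > X) three ways
upper_subtract <- 1 - pbinom(q = X, size = n, prob = p)
upper_tail     <- pbinom(q = X, size = n, prob = p, lower.tail = FALSE)
upper_sum      <- sum(dbinom(x = (X + 1):n, size = n, prob = p))  # P(x = 5) + ... + P(x = 10)

all.equal(upper_subtract, upper_tail)  # TRUE
all.equal(upper_subtract, upper_sum)   # TRUE
```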

\(P(x \leq X)\)

We can calculate the probability that an event occurs X times or fewer in n cases (\(P(x \leq X)\)), when the probability of the event occurring in any single case is pi, using either of the lines of code below:

pbinom(q = X, size = n, prob = pi)
pbinom(X, n, pi) # you do not need to label the arguments if you provide them in exactly this order

\(P(x \geq X)\)

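We can calculate the probability that an event occurs X times or more in n cases (\(P(x \geq X)\)), when the probability of the event occurring in any single case is pi, using either of the lines of code below. Because this probability includes X itself, we subtract only the values below X, so we supply X - 1:

1-pbinom(q = X-1, size = n, prob = pi)
pbinom(X-1, n, pi, lower.tail = FALSE) # you do not need to label the first three arguments if you provide them in exactly this order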
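\(P(x \geq X)\) must keep \(P(x = X)\), which is why the cutoff given to pbinom() is X - 1. This sketch, with hypothetical values n = 10, X = 4, p = 0.3, confirms that the subtraction version, the lower.tail = FALSE version, and a direct sum of dbinom() agree, and shows that supplying X instead of X - 1 leaves out exactly \(P(x = X)\).

```r
n <- 10
X <- 4
p <- 0.3

# P(x >= X) three ways
atleast_subtract <- 1 - pbinom(q = X - 1, size = n, prob = p)
atleast_tail     <- pbinom(q = X - 1, size = n, prob = p, lower.tail = FALSE)
atleast_sum      <- sum(dbinom(x = X:n, size = n, prob = p))  # P(x = 4) + ... + P(x = 10)

all.equal(atleast_subtract, atleast_tail)  # TRUE
all.equal(atleast_subtract, atleast_sum)   # TRUE

# Common mistake: using X instead of X - 1 gives P(x > X), which is short by P(x = X)
mistake <- 1 - pbinom(q = X, size = n, prob = p)
all.equal(mistake + dbinom(x = X, size = n, prob = p), atleast_subtract)  # TRUE
```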

Practice

Scenario: It is commonly repeated on the internet (but unsourced and unverified) that only 10% of National Park visitors walk more than 1 mile from a road / trailhead.

If this is true, what is the probability ….

1.1. …that you talk to 10 people and only 3 of them have walked a mile away from a road?

Fill in the blanks to calculate the probability.

1.2. …that you talk to 30 people and 5 or fewer of them have walked a mile away from a road?

Fill in the blanks to calculate the probability.

1.3. …that you talk to 25 people and 10 or more of them have walked a mile away from a road?

Fill in the blanks to calculate the probability.

Empirical Calculations

We can also estimate probabilities empirically by simulating a binomial sampling distribution and building our own probability distribution from the results.

We can draw random samples from a binomial distribution and store the results using the code below, where R is the number of times to repeat the sample, N is the number of cases in one sample, and pi is the probability of success:

name <- rbinom(n = R, size = N, prob = pi)

name is a vector of length R that holds the number of ‘successes’ in each of the R samples of size N.
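A minimal sketch of what the stored object looks like, with hypothetical values R = 1000, N = 15, and p = 0.4 (p is used instead of pi to avoid masking R's built-in constant). set.seed() is added only so the random draws are reproducible; the seed value itself is arbitrary.

```r
set.seed(2)  # arbitrary seed so the random draws can be reproduced
R <- 1000    # number of repeated samples
N <- 15      # number of cases in each sample
p <- 0.4     # hypothetical probability of success in one case

sims <- rbinom(n = R, size = N, prob = p)

length(sims)  # 1000: one count of successes per sample
head(sims)    # the first few counts
range(sims)   # every count lies between 0 and N
```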

Recall that if you put a vector in an inequality, R returns a vector of TRUE/FALSE values showing whether the inequality holds for each element.

example > 10

 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[13] FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE

If we take the mean() of that inequality, R treats TRUE as 1 and FALSE as 0, so the mean is the proportion of samples with more than 10 successes. That proportion is our empirical estimate of the probability that a sample has more than 10 successes:

mean(example > 10)

We could also estimate the probability that a sample had 10 or more successes with

mean(example >= 10)

The comparison operators you can use are:

Comparison	Operator Code
Equal to	`==`
Greater than	`>`
Less than	`<`
Greater than or equal to	`>=`
Less than or equal to	`<=`
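To see the operators in action, the sketch below simulates a hypothetical binomial setting (N = 20, p = 0.3, R = 100000 repeats), estimates three probabilities with mean() of a comparison, and lines them up against the theoretical values from dbinom() and pbinom(). With this many repeats the simulated and theoretical columns should agree closely, though not exactly.

```r
set.seed(42)  # arbitrary seed for reproducibility
R <- 100000   # number of simulated samples
N <- 20       # cases per sample (hypothetical)
p <- 0.3      # probability of success (hypothetical)

sims <- rbinom(n = R, size = N, prob = p)

# Empirical estimates: proportion of samples meeting each condition
est_eq <- mean(sims == 6)  # P(x = 6)
est_ge <- mean(sims >= 8)  # P(x >= 8)
est_lt <- mean(sims < 4)   # P(x < 4)

# Theoretical values for comparison
th_eq <- dbinom(6, N, p)
th_ge <- 1 - pbinom(7, N, p)
th_lt <- pbinom(3, N, p)

data.frame(
  event       = c("x == 6", "x >= 8", "x < 4"),
  simulated   = c(est_eq, est_ge, est_lt),
  theoretical = c(th_eq, th_ge, th_lt)
)
```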

Practice

Scenario: It is commonly repeated on the internet (but unsourced and unverified) that only 10% of National Park visitors walk more than 1 mile from a road / trailhead.

If this is true, create a simulation to calculate the probability that ….

1.4. That you would get a sample of 25 people and find that 10 or more of them have walked a mile away from a road?

Normal Distribution

Normal distributions are defined by their mean \(\mu\) and standard deviation \(\sigma\): \(N(\mu, \sigma)\). We can calculate the probability of a value or less, \(P(X \leq x)\), in a Normal distribution using:

pnorm(q = X, mean = mu, sd = sigma)
pnorm(X, mu, sigma) #again, you don't need the argument names if you keep them in this exact order

where X is the cutoff value you want to calculate the probability for, mu is the mean, and sigma is the standard deviation of the normal distribution.

You can calculate the probability of a value or more (\(P(X \geq x)\)) using

1-pnorm(q = X, mean = mu, sd = sigma)
pnorm(X, mu, sigma, lower.tail = FALSE)
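A minimal sketch with a hypothetical \(N(100, 15)\) distribution and cutoff X = 130, which sits 2 standard deviations above the mean. It checks that the two upper-tail lines agree and that the lower and upper tails add to 1.

```r
mu <- 100     # hypothetical mean
sigma <- 15   # hypothetical standard deviation
X <- 130      # cutoff value

below   <- pnorm(q = X, mean = mu, sd = sigma)                       # P(X <= 130), about 0.977
above_1 <- 1 - pnorm(q = X, mean = mu, sd = sigma)                   # P(X >= 130)
above_2 <- pnorm(q = X, mean = mu, sd = sigma, lower.tail = FALSE)   # same, computed directly

all.equal(above_1, above_2)   # TRUE
all.equal(below + above_2, 1) # TRUE: the two tails cover the whole distribution
```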

With the normal distribution, we do not need to worry about including or excluding the cutoff value itself, because in a continuous probability distribution the probability of any single exact value is 0. That also means that \(P(X \geq x) = P(X > x)\) and \(P(X \leq x) = P(X < x)\).

Don’t forget that the standard deviation is the square root of the variance, \(s = \sqrt{s^2}\), so you may need to calculate \(s\) from \(s^2\) using the function sqrt().
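A short sketch of converting a variance to a standard deviation before calling pnorm(). The numbers (mean 50, variance 225) are hypothetical. pnorm() expects the standard deviation in sd =, so passing the variance there by mistake gives a very different answer.

```r
mu <- 50           # hypothetical mean
variance <- 225    # hypothetical variance
sigma <- sqrt(variance)  # standard deviation: 15

# P(X > 65): 65 is one standard deviation above the mean, so this is about 0.159
p_above <- pnorm(q = 65, mean = mu, sd = sigma, lower.tail = FALSE)
p_above

# Common mistake: supplying the variance as sd gives a much larger (wrong) probability
pnorm(q = 65, mean = mu, sd = variance, lower.tail = FALSE)
```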

Practice

Scenario: Sardine length is normally distributed with a mean of 120 mm and a standard deviation of 20 mm.

If this is true, find the probability that….

2.1. … a randomly selected sardine would have a length of greater than 134 mm

2.2. … a randomly selected sardine would have a length of less than 75 mm

Interpreting probabilities

When interpreting probabilities, make sure you include the full context and state the assumption under which the probability was calculated!

Final Reminders

Remember, all the code needed for the labs can be found in your note outlines.

Please see Canvas for information on the Math/Stat Cafe (generic R help available, but not class-specific help), and your Instructor’s Student Hours (Office Hours) if you need any additional help.