0:00

In this video, we explore comparing independent means from a Bayesian perspective. We'll take an estimation perspective, using credible intervals to quantify how large the difference is. We'll illustrate this using the distracted eater study from the Statistics course to compare the average snack consumption of distracted and non-distracted eaters post lunch.

0:23

As a reminder, the study was called "Playing a computer game during lunch affects fullness, memory for lunch, and later snack intake." The researchers set out to evaluate the relationship between distraction, recall of food consumed, and snacking. They had a sample of 44 volunteer participants that they randomized into two equally sized groups. One group played solitaire on the computer while eating and was instructed to win as many games as possible and focus on the game.

0:54

The other group was asked to eat their lunch without any distractions, focusing on what they were eating. Both groups were provided the same amount of lunch and offered the same amount of biscuits (or, for the Americans, cookies) to snack on afterwards. The researchers measured the snack consumption of subjects in each group. The study reports average snack consumption levels for both groups, as well as the standard deviations and sample sizes. Suppose we want to estimate how much more or less distracted eaters snack compared to non-distracted eaters. We would use an estimate of the difference with a credible interval.

1:37

We need to specify models for the snack consumption in the two groups. I'll use A and B to denote the two groups, respectively. As with all inferential methods, there are some conditions that we need to meet. First is independence, both within and between the groups. We will assume independent normal distributions, where each group has its own mean and its own variance. Inference in the case where the variances are not assumed to be the same is known as the Behrens-Fisher problem.

2:08

Next, we need to specify a prior distribution on all four unknown parameters. We will use the reference priors for the parameters in each group. This is known as the independent Jeffreys prior and is a limit of conjugate prior distributions. Under the independent Jeffreys prior, the marginal posterior distribution for the mean of Group A is a Student t distribution centered at the sample mean and with scale given by the standard error of the estimate of the mean. Likewise, the marginal posterior distribution for the mean of Group B is a Student t distribution with parameters again taken from the frequentist summaries. Because the data and the parameters of the two groups are independent, the means are independent a posteriori.
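In symbols, the marginal posteriors just described can be written compactly (here $\bar{y}$, $s^2$, and $n$ denote the sample mean, sample variance, and sample size of each group):

```latex
\mu_A \mid \text{data} \sim t_{n_A - 1}\!\left(\bar{y}_A,\; s_A^2 / n_A\right),
\qquad
\mu_B \mid \text{data} \sim t_{n_B - 1}\!\left(\bar{y}_B,\; s_B^2 / n_B\right),
```

with the two distributions independent a posteriori.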

The point estimate is the posterior mean of the difference, which is also the difference of the posterior means. With the independent Jeffreys prior, this is the difference between the two sample means, about 25 grams. This amounts to about two cookies.

To provide some measure of uncertainty, we would report a credible interval for the difference of the group means. This requires the posterior distribution of the difference. Unfortunately, there's no closed-form expression for the distribution of the difference of two Student t distributions. However, we can use simulation to draw samples from the posterior distribution using what is called Monte Carlo sampling. With Monte Carlo sampling, we simulate possible values of the parameters from their posterior distributions. In this case, first we generate a large number of values from the Student t distribution for the mean of Group A. Second, we generate an equivalent number of values from the Student t distribution for the mean of Group B.

4:03

For each sample m, we form the difference of the generated means. And with our samples from the posterior distribution, we can now make inferences by calculating what are called Monte Carlo averages and using the frequentist definition of probability to calculate posterior probabilities.
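The two-step simulation just described can be sketched in a few lines of Python. The group summary statistics below are illustrative placeholders only, not the values reported in the paper; substitute the study's published means, standard deviations, and sample sizes.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(42)
m = 25_000  # number of Monte Carlo samples, as in the video

# Illustrative summary statistics (sample size, mean in grams, SD);
# placeholders only -- replace with the values from the study.
n_a, ybar_a, s_a = 22, 52.1, 45.5   # Group A: distracted eaters
n_b, ybar_b, s_b = 22, 27.1, 26.4   # Group B: non-distracted eaters

# Under the independent Jeffreys prior, each group mean has a Student t
# posterior with n - 1 degrees of freedom, centered at the sample mean,
# with scale equal to the standard error s / sqrt(n).
mu_a = t.rvs(df=n_a - 1, loc=ybar_a, scale=s_a / np.sqrt(n_a),
             size=m, random_state=rng)
mu_b = t.rvs(df=n_b - 1, loc=ybar_b, scale=s_b / np.sqrt(n_b),
             size=m, random_state=rng)

diff = mu_a - mu_b  # Monte Carlo samples of mu_A - mu_B

# Monte Carlo averages: the posterior mean of the difference, and the
# posterior probability that distracted eaters snack more.
print(diff.mean())
print((diff > 0).mean())
```

Because the two posteriors are independent, sampling each marginal separately and subtracting gives valid draws from the posterior of the difference.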

4:25

The figure shows our Monte Carlo estimate of the posterior distribution using a smoothed version of a histogram of the sampled differences of the means. The blue area represents the highest posterior density, or HPD, region, where the probability that the difference is in that region is equal to 0.95. These are the most plausible values for the difference. The 95% HPD interval is a 95% credible interval. Using our Monte Carlo samples, this is 1.85 to 48.37 grams, suggesting that being distracted does increase snack consumption later. This estimate is based on 25,000 Monte Carlo samples, but if you try this on your own you may get a slightly different answer if the number of Monte Carlo samples is different.
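For a unimodal posterior, the HPD interval is the shortest interval containing the stated probability, and it can be approximated from Monte Carlo draws by a sliding-window search over the sorted samples. Libraries such as ArviZ provide this as `arviz.hdi`; a minimal hand-rolled sketch:

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing `prob` of the samples.

    Approximates the highest posterior density (HPD) interval of a
    unimodal posterior from Monte Carlo draws.
    """
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(prob * n))           # draws the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]  # widths of all candidate intervals
    i = int(np.argmin(widths))           # index of the shortest one
    return x[i], x[i + k - 1]
```

Applied to the sampled differences of means, this kind of search is what produces an interval like the 1.85 to 48.37 grams reported here; with a different seed or number of samples the endpoints shift slightly.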

5:13

Let's recap everything we've done so far. We started with a study where the researchers randomly assigned respondents into distracted and non-distracted eating groups and compared their snack intake post meal. The sample statistics suggested that the distracted eaters consumed more snacks on average. However, just because we observe a difference in the sample means doesn't necessarily mean that there is something going on that is statistically significant or practically significant in the actual population. So we use statistical inference tools to evaluate whether this apparent relationship between distracted eating and snacking provides evidence of a real difference at the population level. The credible interval for the average difference was 1.85 to 48.37 grams using the Monte Carlo estimate, which corresponds to anywhere from some crumbs to just more than four cookies. Note that we have a randomized controlled trial here, so if we do indeed find a significant result, we could then talk about a causal relationship between these two variables.

6:19

We used the independent Jeffreys prior as a reference analysis. This problem, where we have two groups with unequal variances, is known as the Behrens-Fisher problem. There are other prior distributions that you may come across for this famous problem, such as matching priors. Those require more advanced methods for simulation than what we will cover here.

6:39

The posterior distribution was computed assuming that the means were different, using credible intervals to quantify the magnitude of the difference. However, if we are interested in testing whether the means are the same, and that playing solitaire has no effect on consumption, then we need to assign positive probability to the means being equal. In the next video, we'll explore testing this hypothesis using Bayes factors and posterior probabilities.
