0:04

The results section of your final report is where you summarize and

interpret the results of each statistical analysis.

Typically when we develop a results section,

we start with the simplest analysis, which is usually descriptive statistics

in which we provide the reader with a little bit more information

about the distributions of our predictors and response variable.

Then we move on to bivariate analyses, where we describe the association between

each predictor and the response variable.

And then finally we report our multivariable analyses,

typically our primary analysis with multiple predictors.

The results of our descriptive and bivariate analyses can be brief.

What we really want to do is provide most of the detail in the results for

the multivariable analyses since this is typically the goal of our study.

One thing to keep in mind is that tables can be useful for

summarizing a lot of results.

In many statistical analyses that we do, particularly machine learning methods,

we may have a lot of predictors, and writing out the descriptive statistics in text for

each of these variables can make the results section very long and

difficult to read.

Summarizing all that information in a table makes it a lot easier.

If you have your statistics in a table

then you really don't need to report those numbers again in the text unless,

of course, you want to highlight some particular variable.

And figures are also very important.

1:56

Most of the time, final reports tend to be pretty brief, so

we don't always have room for a lot of figures.

So we pick and choose those figures carefully by choosing the ones that really

show an important result or describe a more complex result.

When we write about our results and we include figures in the results section

we want to make sure that we refer to the figures in the text so that we can

point the readers to the figures while we're describing the results.

Here's an example of a results section.

We're starting out with our descriptive statistics.

You can see that I've presented the descriptive statistics for

the quantitative data analytic variables in a table.

And I refer the reader to the table.

You can also see that I am not

repeating the actual statistics in the text because it's in the table.

Although I do so for

the response variable in order to highlight that particular variable.

In addition, there were a couple of binary variables.

The descriptive statistics for

those variables are not in the table, so I present percentages and

the number of observations for each of the two binary variables.
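
If you wanted to build this kind of summary in Python, a minimal sketch might look like the following. The data here is randomly generated and the column names are made up for illustration; they are assumptions, not the data set shown in this example. The quantitative variables go into one table, and the binary variables get counts and percentages.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every column name and value below is invented for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "lead_time": rng.normal(48, 6, n),
    "ingredient_units": rng.integers(10, 100, n),
    "shift_hours_worked": rng.uniform(0, 8, n),
    "equipment_failure": rng.integers(0, 2, n),  # binary: 1 = failure occurred
    "trainee_involved": rng.integers(0, 2, n),   # binary: 1 = trainee took part
})

quantitative = ["lead_time", "ingredient_units", "shift_hours_worked"]
binary = ["equipment_failure", "trainee_involved"]

# One row per quantitative variable: mean, standard deviation, minimum, maximum.
desc_table = df[quantitative].agg(["mean", "std", "min", "max"]).T.round(2)
print(desc_table)

# Binary variables: number of observations coded 1 and the matching percentage.
binary_summary = pd.DataFrame({
    "n": df[binary].sum(),
    "percent": (df[binary].mean() * 100).round(1),
})
print(binary_summary)
```

The first table is what you would point the reader to in the text, and the second gives you the percentages and counts to write out for the binary variables.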

Then we move on to describing our bivariate analyses.

Here I present some scatter plots,

which provide a great visualization of the association between

our manufacturing lead time response variable and the quantitative predictors.

Again, you can see in parentheses that I've written

Figure 1 to refer the reader to the figure while I discuss the results.

Then I talk about what the figures are showing.

The scatter plots reveal that the manufacturing lead times were shorter when

there was a greater number of ingredient units in stock.

Then I add the results of my Pearson correlation analysis.

r equals -0.79, which is the value of the Pearson correlation coefficient,

and then the p value associated with that correlation coefficient.

The scatter plots also show that manufacturing lead times

increased when production workers had worked more hours on their shift

before beginning production.

And again I provide the Pearson correlation coefficient.

4:02

Note that I talk about the direction of the association.

So rather than just saying that manufacturing lead times are significantly

associated with the number of ingredient units in stock,

I talk about the direction of the association.

Otherwise, it's not especially informative.

When I talk about the direction of the association, what I mean is that I'm

indicating whether it's a positive or negative association.

Manufacturing lead time was negatively associated with ingredient units in stock.

And I described that negative association by writing that manufacturing lead

times were shorter when there was a greater number of ingredient units in stock.

And then I do the same thing with the association between the number of

hours production workers had worked on their shift before beginning production and

manufacturing lead times, which was a positive association.

Manufacturing lead times increased when production workers had

worked more hours on their shift before beginning production.

And then I write that manufacturing lead time was not significantly associated with

the number of steps involved in the production of a batch, and

I provide the Pearson correlation coefficient, -0.05, and

the p value. It was also not associated with the number of hours of sleep that

production workers reported getting the night before batch production began.

And again, I report the Pearson correlation coefficient,

r = 0.01, p = 0.71.
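
To make the idea of reporting direction concrete, here is a minimal sketch in Python using scipy's pearsonr function. The data is simulated so that lead time falls as stock rises; the variable names and numbers are assumptions for illustration and are not the results reported in this example.

```python
import numpy as np
from scipy import stats

# Hypothetical data, generated so that the association is negative by construction.
rng = np.random.default_rng(1)
n = 150
units_in_stock = rng.uniform(10, 100, n)
lead_time = 60 - 0.2 * units_in_stock + rng.normal(0, 3, n)

# pearsonr returns the correlation coefficient and its two-sided p value.
r, p = stats.pearsonr(units_in_stock, lead_time)

# Describe the direction in words instead of only saying "associated".
if p >= 0.05:
    sentence = "Lead time was not significantly associated with units in stock"
elif r < 0:
    sentence = "Lead times were shorter when more units were in stock"
else:
    sentence = "Lead times were longer when more units were in stock"

print(f"{sentence} (r = {r:.2f}, p = {p:.3g}).")
```

The sign of r tells you which sentence to write, and the coefficient and p value go in parentheses after it.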

Note that the figure has a title that describes what's going on in the figure.

Note also that I follow the standard graphing convention

where my predictor variable is plotted on the horizontal, or x, axis.

And my response variable is plotted on the vertical, or y, axis.

Note also, that the variable labels are explained.

Because I provide a description rather than just the actual variable name, the reader can understand

what the variable means without having to refer to a code book.

These are important characteristics of figures.

You want to make sure that you have a title and

you want to make sure that the variable labels are informative.
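
As a rough sketch of these conventions in Python, here is how you might set up a scatter plot with matplotlib. The data is simulated, and the title and label wording are my own assumptions for illustration rather than the exact text of the figure in this example.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, invented for illustration.
rng = np.random.default_rng(2)
units_in_stock = rng.uniform(10, 100, 100)
lead_time = 60 - 0.2 * units_in_stock + rng.normal(0, 3, 100)

fig, ax = plt.subplots()

# Predictor on the horizontal (x) axis, response on the vertical (y) axis.
ax.scatter(units_in_stock, lead_time)

# Descriptive labels, so the reader does not need a code book.
ax.set_xlabel("Number of ingredient units in stock")
ax.set_ylabel("Manufacturing lead time")

# A title that says what the figure shows.
ax.set_title("Figure 1. Manufacturing lead time by ingredient units in stock")

# To export the figure for a report, you could call fig.savefig with a file name.
```

Compare these labels with a raw variable name such as a short column code, which would force the reader to look up what it means.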

So in the previous slide, I summarized the association between

each quantitative predictor and the quantitative response variable

using scatter plots and a Pearson correlation coefficient.

In this next section I discuss the association between each

categorical predictor and the quantitative response variable.

The appropriate bivariate analysis when you have a categorical predictor and

a quantitative response variable is analysis of variance.

So I write that analysis of variance indicated that average manufacturing

lead times did not differ significantly as a function of equipment failure, and

I provide the F statistic and the associated degrees of freedom

in parentheses, and the p value associated with the F statistic.

And then finally, the R-square, which is the proportion of variance

in manufacturing lead times that is accounted for by equipment failure.

I also write that trainee involvement in

production is also not significantly associated with manufacturing lead time.

And again I provide the F statistic, the p value, and the R-square.

And then I point the reader to figures 2 and

3 to give them a visual of the association.

Notice also that when I describe the results of the analysis of variance,

I do not include the means in the text.

That's because I have provided the means in the figures.
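
Here is a minimal sketch of how the numbers in an analysis of variance sentence could be computed in Python. It uses scipy's f_oneway function for the F statistic and p value, and works out the degrees of freedom and R-square by hand. The two groups are simulated, so the group names and values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical lead times for batches without and with an equipment failure.
rng = np.random.default_rng(3)
no_failure = rng.normal(48, 6, 120)
failure = rng.normal(49, 6, 80)
groups = [no_failure, failure]

# One-way analysis of variance: F statistic and its p value.
f_stat, p_value = stats.f_oneway(*groups)

# Degrees of freedom: (number of groups - 1) and (observations - number of groups).
all_values = np.concatenate(groups)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# R-square: between-group sum of squares as a proportion of the total sum of squares.
grand_mean = all_values.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()
r_squared = ss_between / ss_total

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, "
      f"p = {p_value:.3f}, R-square = {r_squared:.3f}")
```

The printed line has the same pieces as the sentence in the results section: the F statistic with its degrees of freedom in parentheses, the p value, and the R-square.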

Finally I discuss the results of my multivariable analysis.

I first point the reader to figure four, which showed that five of the six

variables were retained in the model selected by the Lasso Regression Analysis.

Only the number of production steps predictor was excluded.

The number of ingredient units in stock

and the number of shift hours employees worked before beginning production were most

strongly associated with manufacturing lead time, followed by equipment failure,

trainee involvement in production, and the number of hours of sleep

that production workers reported getting the night before their shifts began, and

this is shown in Table 2.

Table 2 shows the LAR, or least angle regression, variable selection summary,

including the average squared error, which is also the mean squared error,

associated with each variable as it was entered into the model.

Then I provide a little bit of information about the direction of the association

between each variable and the response variable, manufacturing lead time.

Manufacturing lead times were shorter for batches that had a greater number of

ingredient units in stock, and when production operators reported sleeping for

more hours the night prior to batch production.

On the other hand, working more shift hours prior to manufacturing,

equipment failure, and having trainees involved in batch production

were associated with increased lead times.

Together these five predictors accounted for

93.3% of the variance in manufacturing lead time.

Then I report the mean squared error for

the test data set compared to the mean squared error for the training data set

to show that they differed very little, which suggests that predictive accuracy

did not decline when the lasso regression algorithm developed on the training data

set was applied to predict manufacturing lead times in the test data set.

And again, I refer the readers to Figure 4,

which shows the mean squared error rates for both the test and training data sets.
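
To show where numbers like these come from, here is a minimal sketch in Python using scikit-learn's LassoLarsCV, which fits a lasso model with the least angle regression algorithm and chooses the penalty by cross-validation. This is one way to do it, not necessarily the software used for this example. The data is simulated, the predictor names are made up, and all six predictors are drawn as standardized numbers for simplicity, so treat every name and value as an assumption.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical predictors; the last one has no true effect in this simulation.
names = ["ingredient_units", "shift_hours", "equipment_failure",
         "trainee_involved", "sleep_hours", "production_steps"]
rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, len(names)))
y = (50 - 4.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0 * X[:, 2]
     + 0.8 * X[:, 3] - 0.5 * X[:, 4] + rng.normal(0, 1, n))

# Hold out a test set so predictive accuracy can be checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize using the training data only, so coefficients are comparable.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Lasso fitted by least angle regression, penalty chosen by 10-fold cross-validation.
model = LassoLarsCV(cv=10).fit(X_train_s, y_train)

# A coefficient of exactly zero means the predictor was excluded from the model.
for name, coef in zip(names, model.coef_):
    print(f"{name}: {coef:.3f}")

# Compare mean squared error in the training and test data sets.
train_mse = mean_squared_error(y_train, model.predict(X_train_s))
test_mse = mean_squared_error(y_test, model.predict(X_test_s))
r_square = model.score(X_train_s, y_train)
print(f"training MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}, "
      f"training R-square = {r_square:.3f}")
```

The size of each coefficient tells you which predictors are most strongly associated with the response, the sign tells you the direction to describe in the text, and a test mean squared error close to the training one is what supports the statement that predictive accuracy did not decline.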