### Correlation & Significance | STAT

Correlation measures the direction and strength of the linear relationship between two variables. There are three classifications for the correlation of data: positive correlation, negative correlation, and no correlation. If the relationship between the variables is not linear, then the correlation coefficient does not adequately describe it. A perfect positive linear relationship is one for which r = 1 (Figure 1). The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other) (Figure 2).
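As a concrete check of the r = 1 case, here is a minimal sketch (with made-up numbers) using NumPy's `np.corrcoef`, which returns the Pearson correlation matrix:

```python
import numpy as np

# Hypothetical data: y is an exact linear function of x,
# so the correlation coefficient r should be exactly 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # perfect positive linear relationship

# Pearson's r: the covariance of x and y divided by the
# product of their standard deviations.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))  # 1.0
```

For data that only roughly follow a line, the same call returns an r strictly between -1 and 1.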

Outliers: well, what looks pretty far from the rest of the data? This point looks like an outlier, and this one could also be an outlier. Let me label these.

**Determine whether there is a linear correlation between two variables**

Now, pause the video and see if you can think about this one. Is this positive or negative, is it linear, non-linear, is it strong or weak? I'll get my ruler tool out here.

So, this goes here. It seems like I can fit a line pretty well to this. So, I could fit, maybe I'll do the line in purple. I could fit a line that looks like that. And so, this one looks like it's positive. As one variable increases, the other one does, for these data points.

So it's a positive. I'd say this was pretty strong.

## Bivariate relationship linearity, strength and direction

The dots are pretty close to the line there. It really does look like a little bit of a fat line, if you just look at the dots.

So, positive, strong, linear relationship. And none of these data points are really strong outliers. This one's a little bit further out, but they're all pretty close to the line and seem to roughly describe that trend. All right, now, let's look at this data right over here. Let me get my line tool out again. It looks like I can fit a line, and it looks like it's a positive relationship.

The line would be upward sloping. It would look something like this. And, once again, I'm eyeballing it. You can use computers and other methods to actually find a more precise line that minimizes the collective distance to all of the points, but it looks like there is a positive, but I would say, this one is a weak linear relationship, 'cause we have a lot of points that are far off the line.
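The "computers and other methods" mentioned here usually mean ordinary least squares. As a sketch with made-up noisy data, NumPy's `np.polyfit` with degree 1 returns the slope and intercept that minimize the sum of squared vertical distances from the points to the line:

```python
import numpy as np

# Hypothetical scatter: points loosely spread around an upward trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 0.8 * x + 2.0 + rng.normal(0.0, 2.0, size=x.size)  # positive but noisy

# Degree-1 polyfit performs a least squares line fit; it returns the
# coefficients in order [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope comes out positive for this trend
```

The more scatter there is around the fitted line, the weaker the relationship, even though the fitted slope stays positive.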

So, not so strong.

So, I would call this a positive, weak, linear relationship. And there are a lot of outliers here; this one over here is pretty far out. Now let's look at the next plot. Pause this video and think about: is it positive or negative, is it strong or weak, is it linear or non-linear?

Well, the first thing we wanna do is think about whether it's linear or non-linear. I could try to put a line on it, but if I try to do a line like this, you'll notice everything is kind of bending away from the line. It looks like, generally, as one variable increases, the other variable decreases, but they're not doing it in a linear fashion.

It looks like there's some other type of curve at play. So, I could try to do a fancier curve that looks something like this, and this seems to fit the data a lot better. So this one, I would describe as non-linear. And it is a negative relationship. As one variable increases, the other variable decreases.

So, this is a negative, I would say, reasonably strong non-linear relationship. And once again, this is subjective. So, I'll say negative, reasonably strong, non-linear relationship. And maybe you could call this one an outlier, but it's not that far, and I might even be able to fit a curve that gets a little bit closer to that.
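The distinction drawn here can be quantified. Pearson's r captures only linear association, while Spearman's rho (Pearson's r computed on the ranks of the data) captures any monotonic trend, linear or not. A minimal sketch with a made-up decreasing curve (the simple ranking helper assumes no tied values):

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation coefficient via NumPy's correlation matrix.
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    # Spearman's rho: Pearson's r applied to the ranks of the data.
    # This double-argsort ranking assumes there are no ties.
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(a), rank(b))

# Hypothetical monotonic but non-linear negative relationship:
# y falls as x rises, but along a curve, not a straight line.
x = np.arange(1.0, 11.0)
y = 100.0 / x

print(round(pearson(x, y), 3))   # around -0.8: clearly negative, but not -1
print(round(spearman(x, y), 3))  # -1.0: the trend is perfectly monotonic
```

So a strong curved relationship can have a Pearson's r well short of -1 while its Spearman's rho is exactly -1.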

### Linear, nonlinear, and monotonic relationships - Minitab

And once again, I'm eyeballing this. Now let's do this last one. This one looks like a negative linear relationship to me, a fairly strong negative linear relationship, although there are some outliers. So, let me draw this line. That seems to fit the data pretty well.

### Correlation and Linear Regression

So this is a negative, reasonably strong linear relationship. But these are very clear outliers.

The figure below is a scatter diagram illustrating the relationship between BMI and total cholesterol. Each point represents an observed (x, y) pair, in this case the BMI and the corresponding total cholesterol measured in each participant.

Note that the independent variable, BMI, is on the horizontal axis and the dependent variable, total serum cholesterol, is on the vertical axis.

**BMI and Total Cholesterol.** The graph shows that there is a positive or direct association between BMI and total cholesterol: participants with lower BMI are more likely to have lower total cholesterol levels, and participants with higher BMI are more likely to have higher total cholesterol levels.

## Introduction

For either of these relationships we could use simple linear regression analysis to estimate the equation of the line that best describes the association between the independent variable and the dependent variable.

The simple linear regression equation is as follows:

Ŷ = b₀ + b₁X

where b₀ is the Y-intercept and b₁ is the slope. The Y-intercept and slope are estimated from the sample data, and they are the values that minimize the sum of the squared differences between the observed and the predicted values of the outcome. These differences between observed and predicted values of the outcome are called residuals.

The estimates of the Y-intercept and slope minimize the sum of the squared residuals, and are called the least squares estimates. If every residual were 0, the predicted values would equal the observed values exactly; that would mean that variability in Y could be completely explained by differences in X. However, if the differences between observed and predicted values are not 0, then we are unable to entirely account for differences in Y based on X, and there are residual errors in the prediction.

The residual error could result from inaccurate measurements of X or Y, or there could be other variables besides X that affect the value of Y. Based on the observed data, the best estimate of a linear relationship will be obtained from an equation for the line that minimizes the differences between observed and predicted values of the outcome.
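The least squares estimates described above have a closed form: slope b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept b₀ = ȳ − b₁x̄. A minimal sketch with made-up numbers standing in for BMI (x) and total cholesterol (y):

```python
import numpy as np

# Hypothetical (x, y) pairs, standing in for BMI and total cholesterol.
x = np.array([22.0, 25.0, 27.0, 30.0, 33.0])
y = np.array([180.0, 195.0, 200.0, 220.0, 235.0])

# Closed-form least squares estimates:
#   slope     b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)**2)
#   intercept b0 = y_bar - b1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Residuals: observed minus predicted values of the outcome.
residuals = y - (b0 + b1 * x)
# For a least squares line with an intercept, the residuals sum to
# (numerically) zero.
print(b1, b0, round(residuals.sum(), 9))
```

These are the same estimates a regression routine such as `np.polyfit(x, y, 1)` would return; writing out the formulas just makes the "minimize the squared residuals" criterion explicit.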

The Y-intercept of this line is the value of the dependent variable Y when the independent variable X is zero.