SAS will not print any plots if ODS Graphics is not turned on.
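Turning on ODS Graphics is a single statement placed before the procedures:

```sas
ods graphics on;
```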

Then, I'll use the survey select procedure to randomly split my data set

into a training data set consisting of 70% of the total observations in the data set,

and a test data set consisting of the other 30% of the observations.

data=new specifies the name of my input data set.

And out= specifies the name of the randomly split output data set,

which I will call traintest.

With it, we include the seed option, which allows us to specify a random number seed

to ensure that the data are randomly split the same way if I run the code again.

The samprate option tells SAS to split the input data set so

that 70% of the observations are designated as training observations, and

the remaining 30% are designated as test observations.

method=srs specifies that the data are to be split using simple random sampling.

And the outall option tells SAS to include both the training and test

observations in a single output data set that has a new variable called selected,

to indicate whether an observation belongs to the training set, or the test set.
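Put together, the splitting step might look something like the following sketch. The input data set name (new) and output name (traintest) come from the description above; the seed value here is arbitrary, chosen only for illustration:

```sas
/* Randomly split the data: 70% training, 30% test.
   outall keeps every observation in one output data set,
   with the variable Selected flagging training (1) vs. test (0). */
proc surveyselect data=new out=traintest seed=123
    samprate=0.7 method=srs outall;
run;
```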

I will use the glmselect procedure to test my lasso regression model.

data=traintest tells SAS to use the randomly split data set, and the plots=all
option asks that all plots associated with the lasso regression be printed.

With it we include the seed option, which allows us to specify a random number seed,

which will be used in the cross-validation process.

The partition command assigns each observation a role,

based on the variable called selected,

to indicate whether the observation is a training or test observation.

Observations with a value of one on the selected variable
are assigned the role of training observation,
and observations with a value of zero are assigned the role of test observation.

The model command specifies the regression model, in which my response variable,
school connectedness,
is predicted from the list of 23 candidate predictor variables.

After the slash, we specify the options we want to use to test the model.

The selection option tells us which method to use to compute the parameters for

variable selection.

In this example, I will use the LAR algorithm, which stands for
Least Angle Regression.
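A sketch of the model-testing step, under some stated assumptions: the response and predictor names (schconn, x1-x23) are placeholders, since the actual variable names are not given here, and the choose=cv and cvmethod=random(10) suboptions are one plausible way to wire in the cross-validation the seed is used for:

```sas
/* Fit lasso via LAR, using the Selected flag from surveyselect
   to assign training (1) and test (0) roles. */
proc glmselect data=traintest plots=all seed=123;
    partition role=selected (train='1' test='0');
    model schconn = x1-x23   /* placeholders for the 23 candidate predictors */
        / selection=lar(choose=cv stop=none) cvmethod=random(10);
run;
```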

This algorithm starts with no predictors in the model, and

adds a predictor at each step.

It first adds the predictor that is most correlated with the response variable and
moves its coefficient toward its least squares estimate, until another predictor
is equally correlated with the model residual.

It adds this predictor to the model and

starts the least squares estimation process over again, with both variables.

The LAR algorithm continues with this process

until it has tested all the predictors.

Parameter estimates at any step are shrunk, and
predictors with coefficients that are shrunk to zero are removed from the model,
and the process starts all over again.