data310

Week 2 Monday Response

dataset: https://archive.ics.uci.edu/ml/datasets/Automobile

Of the 26 variables, these are categorical:

make
fuel-type
aspiration
num-of-doors
body-style
drive-wheels
engine-location
engine-type
num-of-cylinders
fuel-system

Start with the Regression script that we prepared during today’s class. Replace the Auto MPG dataset provided in the tensorflow exercise with the Auto Imports dataset provided in the UCI Machine Learning Repository. Specify a model with the following target and features.

highway-mpg (continuous)
num-of-cylinders (categorical)
engine-size (continuous)
horsepower (continuous)
curb-weight (continuous)
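Because num-of-cylinders is categorical (its values are words such as "four" and "six"), it has to be turned into numeric indicator columns before it can be fed to a regression model. The sketch below shows one way to do that with the standard library only. The helper name `one_hot_encode` and the three sample records are hypothetical illustrations, not rows from the dataset or code from the class script.

```python
TARGET = "highway-mpg"
CATEGORICAL = "num-of-cylinders"


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per observed category.
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


# Hypothetical records for illustration only (not real rows from the dataset).
sample_rows = [
    {"highway-mpg": 30.0, "num-of-cylinders": "four", "engine-size": 110.0,
     "horsepower": 100.0, "curb-weight": 2300.0},
    {"highway-mpg": 25.0, "num-of-cylinders": "six", "engine-size": 160.0,
     "horsepower": 150.0, "curb-weight": 2900.0},
    {"highway-mpg": 32.0, "num-of-cylinders": "four", "engine-size": 100.0,
     "horsepower": 90.0, "curb-weight": 2100.0},
]

encoded_rows = one_hot_encode(sample_rows, CATEGORICAL)

# Split into model inputs (features) and the value to predict (labels).
features = [{k: v for k, v in row.items() if k != TARGET} for row in encoded_rows]
labels = [row[TARGET] for row in encoded_rows]
```

In a pandas-based script the same step is usually done with `pandas.get_dummies`; the manual version here only makes the transformation explicit.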

Specify and train both a multi-class linear regression and a multi-class DNN regression. Which of the two models produces a better loss metric (see this link for an explanation of the loss function)? Produce a plot that supports your answer. Return to the remainder of the variables from the dataset and add additional continuous and categorical features with the intent of improving your loss metric. Produce a plot that demonstrates the value of your model. What is the best model your team was able to produce?

loss plots
Between the multi-class linear regression and the multi-class DNN regression, the DNN model performs better, with the lower loss:
- Mean Absolute Error
  - DNN: 2.058904
  - linear regression: 2.571499
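For reference, mean absolute error is the average of the absolute differences between the true and predicted values, so a lower value means a better fit. The sketch below is a minimal illustration: the function name and the toy numbers are assumptions made for the example, and only the two values in `reported_mae` come from the results above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only (not model output).
toy_true = [30.0, 25.0, 40.0]
toy_pred = [28.0, 26.0, 43.0]
toy_mae = mean_absolute_error(toy_true, toy_pred)  # (2 + 1 + 3) / 3

# The two MAE values reported above; the smaller one identifies the better model.
reported_mae = {"DNN": 2.058904, "linear regression": 2.571499}
better_model = min(reported_mae, key=reported_mae.get)
```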
Next step: add additional variables one by one to “raw_dataset_variables” to see whether the model improves.
After running the models with the 5 variables listed above (with num-of-cylinders as the categorical variable), I ran 24 more models (12 multi-class linear regressions and 12 multi-class DNN regressions), each adding a new continuous/numeric feature from the list of 26 variables to the original 5, still with num-of-cylinders as the categorical variable. I then ran 26 x 10 more models (24 models + 1 original linear regression + 1 original DNN in each case), with each ‘loop’ using a different categorical variable (see the list of categorical variables above).
Within each ‘loop’ (holding one categorical variable constant), a new continuous feature is added for each model run, until all the continuous features have been added.
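The looping scheme described above can be sketched as a generator of model specifications. This is a hypothetical outline, not the script that was actually run: the names `BASE_CONTINUOUS` and `model_specs`, the `dnn` prefix, the `original` label, and the short list of extra features are all assumptions, and each yielded specification would still have to be trained and evaluated.

```python
# Continuous features of the original specification.
BASE_CONTINUOUS = ["engine-size", "horsepower", "curb-weight"]


def model_specs(categoricals, extra_continuous, model_types=("lin", "dnn")):
    """Yield one specification per (categorical, feature set, model type).

    Within each loop the categorical variable is held fixed and the extra
    continuous features are added cumulatively, one per step.
    """
    extras = list(extra_continuous)
    for categorical in categoricals:
        for n_extra in range(len(extras) + 1):
            continuous = BASE_CONTINUOUS + extras[:n_extra]
            # Name each model after the feature added last (assumed convention).
            last_added = extras[n_extra - 1] if n_extra else "original"
            for model_type in model_types:
                yield {
                    "name": f"{model_type}_{last_added}",
                    "categorical": categorical,
                    "continuous": continuous,
                }


# Shortened example: two loops, three extra features.
specs = list(model_specs(
    ["num-of-cylinders", "engine-type"],
    ["symboling", "normalized-losses", "wheel-base"],
))
```

With 12 extra continuous features, each loop yields 2 x (12 + 1) = 26 specifications, which matches the per-loop count given above.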
You can see the MAE for every model in the MAE dataframes.
- I highlighted the model with the lowest MAE in each loop.
Because so many models were run, a single plot would be too cluttered to read.
Therefore, I chose to present the MAE values as dataframes.
The best model is “lin_city-mpg”, run with the categorical variable engine-type: it has the lowest MAE, 0.690061, which is lower than that of both the original DNN (2.058904) and the original linear regression (2.571499).
- “lin_city-mpg” means the linear regression model run with the continuous features
  - symboling, normalized-losses, wheel-base, length, width, height, bore, stroke, compression-ratio, peak-rpm, and finally city-mpg
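Choosing the highlighted model in each loop, and then the best model overall, amounts to taking a minimum over MAE values. The sketch below assumes the results are stored as a nested dictionary (categorical variable of the loop, then model name, then MAE). That layout and the labels `lin_original` and `dnn_original` are hypothetical; the three MAE values are the ones reported in this response.

```python
# MAE values reported in this response, keyed by loop and model name.
results = {
    "num-of-cylinders": {"lin_original": 2.571499, "dnn_original": 2.058904},
    "engine-type": {"lin_city-mpg": 0.690061},
}


def best_per_loop(results):
    """Return {loop: (model_name, mae)} for the lowest-MAE model in each loop."""
    return {
        loop: min(models.items(), key=lambda item: item[1])
        for loop, models in results.items()
    }


best = best_per_loop(results)

# The overall best model is the smallest of the per-loop minima.
overall_loop, (overall_name, overall_mae) = min(
    best.items(), key=lambda item: item[1][1]
)
print(f"{overall_name} ({overall_loop}): MAE {overall_mae}")
```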