
Concept of overfitting using higher-order linear regression

Vinay Anant

Overview

Introduction to the concept of overfitting through the use of higher order linear regression


Overfitting


Overfitting occurs when a model learns the detail and noise in the training data to the point where it degrades the model's performance on fresh data. In other words, the model picks up noise or random fluctuations in the training data and learns them as if they were real concepts. The problem is that these concepts do not apply to new data, which limits the model's ability to generalize.



Underfitting

Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, as will be obvious from its poor performance on the training data. Underfitting is rarely discussed, since it is easy to detect given a good performance metric; the remedy is simply to move on and try different machine learning techniques. Nonetheless, it serves as a good counterpoint to the problem of overfitting.



Generate data pairs

Let us now generate 20 data pairs (X, Y) using y = sin(2*pi*X) + 0.1 * N.

We can draw X from a uniform distribution between 0 and 1. This is easy to do with NumPy using np.random.uniform.

After this we can sample N from a standard normal (Gaussian) distribution, again with NumPy using np.random.normal.

Y can then be computed as y = sin(2*pi*X) + 0.1 * N.
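A minimal sketch of this step in NumPy (the fixed seed and the standard-normal choice for N are assumptions for illustration, since the post does not state them):

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the sketch is reproducible

# 20 inputs X drawn from a uniform distribution on [0, 1]
X = np.random.uniform(0, 1, 20)

# Noise N sampled from a standard normal (Gaussian) distribution
N = np.random.normal(0, 1, 20)

# Targets: y = sin(2*pi*X) + 0.1 * N
Y = np.sin(2 * np.pi * X) + 0.1 * N
```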


Split dataset

Split the dataset into two halves: 10 pairs for training and 10 for testing.
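Continuing from the snippet above (X is already in random order, so taking the first and last ten pairs is effectively a random split):

```python
# First 10 pairs for training, remaining 10 for testing
X_train, Y_train = X[:10], Y[:10]
X_test, Y_test = X[10:], Y[10:]
```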


Root Mean Square Error

The Root Mean Square Error (RMSE) is a standard way of measuring a model's error when predicting quantitative data.



RMSE is a good estimator for the standard deviation σ of the distribution of our errors!
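RMSE is the square root of the mean squared difference between the predictions and the true values. A small helper along those lines (the function name is just illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt(mean((y_true - y_pred)**2))."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```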



Gradient Descent

Gradient descent is an optimization approach for finding the values of a function's parameters that minimize a cost function.


When the parameters cannot be determined analytically (e.g., using linear algebra) and must be found using an optimization algorithm, gradient descent is the best method to utilize.

The procedure begins with initial values for the function's coefficient or coefficients; these can simply be zero. The cost of these coefficients is determined by plugging them into the function and calculating the cost. The derivative of the cost is then computed, and this derivative is used to update the values of the coefficients. A learning rate parameter, which controls how much the coefficients can change on each update, must also be specified. (Source: Lecture 03: Gradient Descent slides.)
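A minimal sketch of batch gradient descent for fitting polynomial weights. The helper names, learning rate, and iteration count here are illustrative and may need tuning (especially for higher orders); the notebook's actual implementation may differ:

```python
import numpy as np

def design_matrix(x, order):
    """Polynomial basis: columns [1, x, x**2, ..., x**order]."""
    return np.vander(x, order + 1, increasing=True)

def fit_gradient_descent(x, y, order, lr=0.1, epochs=50_000):
    """Minimize mean squared error with plain batch gradient descent."""
    A = design_matrix(x, order)
    w = np.zeros(order + 1)                # start with all coefficients at 0
    n = len(y)
    for _ in range(epochs):
        error = A @ w - y                  # the cost is based on this prediction error
        grad = (2.0 / n) * (A.T @ error)   # derivative of the cost w.r.t. the weights
        w -= lr * grad                     # step size controlled by the learning rate
    return w
```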


Order (0, 1, 3, 9)

We can find the weights of the polynomial regression fit for orders 0, 1, 3, and 9.
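One straightforward way to get these weights is a least-squares fit with np.polyfit (the notebook may instead use the gradient-descent routine sketched above). Continuing from the earlier snippets:

```python
import numpy as np

orders = [0, 1, 3, 9]

# np.polyfit returns coefficients from the highest degree down to the constant term
weights = {m: np.polyfit(X_train, Y_train, m) for m in orders}
```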


Pandas dataframe to display

The weights for each order can be displayed in a table using Pandas, which provides data frames for exactly this purpose.
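For example, continuing from the `weights` dictionary above, the coefficients can be lined up in one DataFrame (shorter polynomials are padded with NaN so the columns align):

```python
import pandas as pd

# One column per order; rows are the coefficients w0, w1, ...
table = pd.DataFrame({f"M={m}": pd.Series(w[::-1]) for m, w in weights.items()})
table.index = [f"w{i}" for i in range(len(table))]
print(table)
```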



Plot generation for the fitted curves of various orders using Matplotlib
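A sketch of how such plots could be produced, reusing the data and `weights` from the earlier snippets (figure size and axis limits are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 1, 200)  # dense grid for smooth curves

fig, axes = plt.subplots(1, 4, figsize=(16, 3))
for ax, m in zip(axes, orders):
    ax.scatter(X_train, Y_train, label="train data")
    ax.plot(xs, np.sin(2 * np.pi * xs), "g", label="sin(2*pi*x)")
    ax.plot(xs, np.polyval(weights[m], xs), "r", label=f"order {m} fit")
    ax.set_title(f"M={m}")
    ax.set_ylim(-1.5, 1.5)
plt.tight_layout()
plt.show()
```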





[Figures: training data and fitted curves for M = 0, M = 1, M = 3, and M = 9.]
Train error vs Test error

Plotting a graph makes it easy to compare the training and test errors after fitting.

The graph clearly shows the comparison of the training and test RMSE for each order from 0 to 9, i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
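A sketch of that comparison, reusing the `rmse` helper and the train/test split from the earlier snippets:

```python
import numpy as np
import matplotlib.pyplot as plt

orders_all = list(range(10))
train_rmse, test_rmse = [], []
for m in orders_all:
    w = np.polyfit(X_train, Y_train, m)
    train_rmse.append(rmse(Y_train, np.polyval(w, X_train)))
    test_rmse.append(rmse(Y_test, np.polyval(w, X_test)))

plt.plot(orders_all, train_rmse, "o-", label="train RMSE")
plt.plot(orders_all, test_rmse, "o-", label="test RMSE")
plt.xlabel("polynomial order M")
plt.ylabel("RMSE")
plt.legend()
plt.show()
```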






Generating 100 more data pairs

Let's generate 100 more data pairs and fit a 9th-order model to see how it behaves.

On the left we can see the 100 data pairs; on the right we can see the fit.


We can avoid this problem of overfitting using regularization.


Regularization

We can regularize by adding a penalty based on the sum of the squared weights to the cost function.
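Concretely, for L2 regularization the error being minimized looks roughly like this, where λ controls how strongly large weights are penalized (the notation here is only for illustration):

```latex
\tilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\bigl(\hat{y}(x_i,\mathbf{w}) - y_i\bigr)^2 + \frac{\lambda}{2}\sum_{j} w_j^2
```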


Regularization reduces the variance of a model without a substantial increase in its bias, which helps avoid overfitting.


L1 and L2

Regularization comes in two common forms, L1 and L2. In L2 regularization the cost function is modified by adding a penalty term proportional to the squared weights; this is also called Ridge regression.

L1, or Lasso regression, is another regularization technique for reducing model complexity. Lasso is an abbreviation for Least Absolute Shrinkage and Selection Operator.


We can fit the model for various lambda values: 1, 1/10, 1/100, 1/1000, 1/10000, and 1/100000.
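One closed-form way to fit the L2-regularized (ridge) model for each of these lambda values, continuing from the earlier snippets; the notebook may instead add the penalty term to the gradient-descent cost:

```python
import numpy as np

def fit_ridge(x, y, order, lam):
    """Closed-form ridge (L2-regularized) least squares on a polynomial basis."""
    A = np.vander(x, order + 1, increasing=True)   # columns 1, x, ..., x**order
    I = np.eye(order + 1)
    # Solve (A^T A + lam * I) w = A^T y
    # Note: these weights come out lowest degree first, unlike np.polyfit
    return np.linalg.solve(A.T @ A + lam * I, A.T @ y)

lambdas = [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
ridge_weights = {lam: fit_ridge(X_train, Y_train, 9, lam) for lam in lambdas}
```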


Using L2 regularization brings the training and test errors much closer together and ultimately reduces or avoids overfitting.


So regularization really helps here.



Experiments with various lambda values







Conclusion

After performing various experiments, it was observed that the ninth-order model performed very well on the training data but overfitted.

Also, it is difficult to say definitively which model performed best across the given lambda values; still, allowing for the various deviations, a lambda close to 0.1 appeared to perform better than the others.



Contribution

  • Performed experiments for various orders and plotted different graphs

  • Researched information for overfitting and its possible solution

  • Implemented L2 Regularization to overcome overfitting


Challenges

  • Implementing this problem was new to me, and the references helped me a lot to gain understanding and eventually solve it

  • Displaying the weights in a table was a challenge; after multiple unsuccessful attempts, I solved it with a Pandas data frame

  • Implementing the model was challenging due to the different dimensions and ordering; reshaping, and sorting with zip, helped resolve this



The notebook can be found here


