A closed-form approach in NumPy

Last week we took a look at how to solve linear regression from scratch, using the normal equation. If you need a quick refresher, I highly recommend starting there before moving forward. If not, let’s dive into ridge regression!

Ridge Regression, like its sibling, Lasso Regression, is a way to “regularize” a linear model. In this context, regularization can be taken as a synonym for preferring a simpler model by penalizing larger coefficients. …
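To make the idea of penalizing larger coefficients concrete, here is a minimal NumPy sketch of a closed-form ridge solution. It assumes the usual penalized normal equation, (XᵀX + alpha·I)·theta = Xᵀy; the function name `ridge_closed_form`, the parameter `alpha`, and the toy data are illustrative choices, not taken from the article.

```python
import numpy as np

def ridge_closed_form(X, y, alpha=1.0):
    """Solve (X^T X + alpha * I) theta = X^T y for theta."""
    n_features = X.shape[1]
    # Assumption: X has no intercept column, so every coefficient is penalized.
    penalty = alpha * np.eye(n_features)
    # Solving the linear system avoids forming an explicit matrix inverse.
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Toy data, made up for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

theta_ols = ridge_closed_form(X, y, alpha=0.0)     # alpha = 0 is plain least squares
theta_ridge = ridge_closed_form(X, y, alpha=10.0)  # a larger alpha shrinks the coefficients

print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))
```

With `alpha=0` this reduces to ordinary least squares, and increasing `alpha` pulls the coefficient vector toward zero, which is the "simpler model" preference described above.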


Understanding the Procfile and setup.sh

You’ve probably heard of streamlit. It’s one of the best ways out there to quickly build attractive, user-friendly data apps. If you haven’t heard of streamlit, you should definitely go check it out. You can find a quick-start guide here.

If you have heard of streamlit, and you’ve started playing around with it a bit, then sooner or later you’ve probably wanted to share your app outside of your local machine. I’ll go a step further and say that since much of the beauty of streamlit lies in its ability to make components that typically lie in the general realm…


Extend Built-in Functionality with Your Own Pipeline-Compatible Preprocessing Tools

We all know the importance of preprocessing in a machine learning project. It typically makes sense to handle some missing values, scale various features, one-hot encode others, etc., and scikit-learn has prebuilt tools that do a great job of all of these steps right out of the box. But what about adding new features, or applying a custom transformation? Did you know that scikit-learn also makes it easy to build these steps into a standard pipeline workflow? Here’s how!

FunctionTransformer

Let’s start simple with a great tool for on-the-fly transformations: FunctionTransformer. FunctionTransformer can be used for everything from applying…
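As a quick taste, here is a minimal sketch of FunctionTransformer inside a pipeline. The choice of a log transform, the toy array, and the variable names are assumptions made for illustration; the article itself may use different examples.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Wrap a plain NumPy function so it behaves like any other scikit-learn transformer.
# log1p computes log(1 + x); expm1 is its inverse, used by inverse_transform.
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

# The custom step slots into a standard pipeline next to built-in tools.
pipeline = make_pipeline(log_transformer, StandardScaler())

# Toy, non-negative data made up for this sketch.
X = np.array([[0.0, 10.0], [1.0, 100.0], [3.0, 1000.0]])
X_scaled = pipeline.fit_transform(X)

print(X_scaled.shape)
```

Because the wrapped function is stateless, nothing is learned in the `fit` step; the transformer just applies `np.log1p` whenever the pipeline transforms data.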


NumPy implementation of Batch Gradient Descent

Cost Function

Our cost function is the residual sum of squares (RSS). If you like, it can be modified slightly to be the mean squared error (MSE) by dividing by the number of instances in the set (since this is a constant, it does not change which coefficients minimize the cost). As a reminder, with feature matrix X and target vector y, that equation can be written like this:

$$\mathrm{RSS}(\theta) = (y - X\theta)^{T}(y - X\theta)$$
Our Cost Function Residual Sum of Squares (RSS), Image by Author
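Here is a small NumPy sketch of that cost function, just to show that the matrix form boils down to a dot product of the residual vector with itself. The function name `rss` and the three-row toy data set are made up for illustration.

```python
import numpy as np

def rss(theta, X, y):
    """Residual sum of squares: (y - X theta)^T (y - X theta)."""
    residuals = y - X @ theta
    return residuals @ residuals

# Toy data: the first column of ones acts as the intercept term.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 7.0])
theta = np.array([0.0, 2.0])

cost = rss(theta, X, y)
print(cost)            # residuals are [0, 0, 1], so this prints 1.0
print(cost / len(y))   # dividing by the number of instances gives the MSE
```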

We start by choosing a random vector of coefficients theta and taking the partial derivative of our cost function (often denoted by “J”; we retain RSS here for clarity) with respect to…
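The overall loop can be sketched in a few lines of NumPy. This version is an assumption-laden illustration rather than the article's exact code: it uses the MSE form of the cost (RSS divided by the number of instances) so that a fixed learning rate behaves the same regardless of sample size, and the function name, learning rate, iteration count, and toy data are all hypothetical.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.1, n_iterations=1000, seed=0):
    """Minimize MSE = RSS / n by repeatedly stepping against the gradient."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = rng.normal(size=n_features)  # random starting coefficients
    for _ in range(n_iterations):
        residuals = y - X @ theta
        # Gradient of MSE with respect to theta: -(2 / n) * X^T (y - X theta)
        gradient = (-2.0 / n_samples) * (X.T @ residuals)
        theta = theta - learning_rate * gradient
    return theta

# Toy data made up for this sketch: y is roughly 4 + 3x plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
y = 4.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

theta = batch_gradient_descent(X, y)
print(theta)
```

"Batch" here means every update uses the full data set, which is why the gradient is computed from all of `X` and `y` on each iteration.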


A NumPy implementation based on the normal equation

A sample Linear Regression Fit (Image by Author)

These days, it’s easy to fit pretty much any model you can think of with one library or another, but how much do you really learn by calling .fit() and .predict()? While it’s certainly more practical to use a framework like Python’s statsmodels or scikit-learn for the normal use case, when you’re learning data science it makes a lot of sense to get a feel for how these models actually work. Below we show how to use NumPy to implement a basic linear regression model from the ground up. Let’s get started!
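As a preview of where this is headed, here is a minimal sketch that assumes the standard normal equation, XᵀX·theta = Xᵀy. The random toy data and variable names are illustrative, and the system is solved with np.linalg.solve instead of an explicit matrix inverse, which is a numerical-stability choice on our part.

```python
import numpy as np

# Toy data made up for this sketch: y is roughly 3 + 2x plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=100)

# Add a column of ones so the first coefficient acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Predictions are just the feature matrix times the coefficients.
y_hat = X @ theta

print(theta)  # [intercept, slope]
```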

It’s All About the Coefficients

Think back to your…


Our Titular Hero’s Famous Bridges

If you’re just getting started with programming then you might not have heard of Project Euler. If you haven’t, first go check it out! You’ll find that it’s a super cool series of problems that mix math and computer science. The website lists the intended audience as:

“students for whom the basic curriculum is not feeding their hunger to learn, adults whose background was not primarily mathematics but had an interest in things mathematical, and professionals who want to keep their problem solving and mathematics on the cutting edge.”

I’ll also add two stipulations to form the intended audience for…


Pivot tables are a bit like Vim. (No, wait, don’t go! They’re much easier!) Pivot tables are like Vim in the sense that you probably know they are quite a powerful tool, but the syntax is a bit confusing at first, and if you’ve been using groupby forever, then, you know, what’s the point? Experienced Vim users will tell you with not-so-borderline evangelical zeal that the advantage of Vim is that you can edit at the speed that you think, really becoming one with your code. Well, pivot tables are a lot like that for data. The groupby verb itself…

Jake Miller Brooks

Data Scientist, lifelong learner, background in Housing Finance, Transportation and Infrastructure.
