Binary Classification from Scratch in Python

There is no shortage of articles, videos and tutorials on logistic regression for classification. It’s a classic subject in Machine Learning and is usually a stepping stone before moving on to more complex algorithms. This article aims to show you logistic regression through another lens: a case where we can solve for the weights with a formula and pass them to a model that returns the predicted probability. I provide links to the code and solution in the article.

The problem that Logistic Regression aims to tackle is estimating the probability that an observation, described by a set of features, belongs to a certain class. By training on examples where we know which class each observation actually belongs to (this would be the label, or target variable), our model will have a good idea of what a new observation’s class will be. Exactly how the model is trained is not always discussed, but I want to discuss a case where a simple rule can determine those probabilities. …
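To make the idea of a model that turns weights and features into a predicted probability concrete, here is a minimal sketch in plain Python. The function names, the weights and the bias are hypothetical values chosen for illustration; they are not the formula-derived weights from the article.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Predicted probability that the observation belongs to the positive class."""
    # Linear combination of the features, then squashed by the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and bias, for illustration only.
weights = [0.8, -0.4]
bias = 0.0

# One observation with two features.
p = predict_proba(weights, bias, [1.0, 2.0])
print(p)
```

With these illustrative numbers the linear term cancels to zero, so the model is undecided between the two classes. Larger positive values of the linear term push the probability toward 1, and larger negative values push it toward 0.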

Photo by JESHOOTS.COM on Unsplash

Testing candidates for a Data Scientist position gives a hiring organization a great sense of how well those candidates can do job-related tasks and manage time effectively. The skills that Data Scientists need to succeed vary by company, or even by team within a company, so the tests should be tailored accordingly. In general though, Data Science is a process that includes many steps and independent skills that aggregate to something greater than the sum of its parts. …

Photo by Arisa Chattasa on Unsplash

Your data science project has been getting a lot of attention and now you have been invited to present the topic to executive leadership. The anxiety radiating from your teammates and direct leadership tells you that there is something different about this meeting.

No worries. Having been in this situation before, I can give a few pointers so you can bring your “A-game” when talking data science to C-level executives. (Note: “C-level” refers to members of a company’s organization whose titles start with “C”, such as the Chief Executive Officer, or CEO.)

Start with these:

  • Executives usually have a long career with modest beginnings and are completely reasonable people. If you were promoted to CEO today, would you become totally unreasonable? No. …

Interpretable Machine Learning That Isn’t OLS

Photo by Max Ostrozhinskiy on Unsplash


  • Introduction
  • California Housing Dataset Example
  • Conclusion
  • References


The interpretable side of machine learning has always been interesting to me. I think it is important to be able to plainly state (to some degree) what the model is doing. Some of the most explainable machine learning models are also the weakest in terms of accuracy, so we are forced to make decisions in order to strike a balance.

This article focuses on the RuleFit algorithm, written in Python, to predict a continuous target variable. This topic has been touched on by other authors, and while you can find methods to explain predictions made by more black-box algorithms, the examples usually cover classification problems. …

Photo by Tyler Franta on Unsplash

As the demand for people with a data science skillset has soared, companies have looked for ways to fill that demand. One way is for companies to go out and recruit people who encapsulate all things data science, which usually includes proficiency in a coding language, probably Python, because it is so widely used in data science. Another way is to look within the company, see who already has the skillset, and make those people your data scientists. Companies can also look within their walls, find somebody who is close to what they want in a Data Scientist, and train them. To speed up the growth of their internal data science workforce, companies sometimes purchase commercial tools that promise to deliver data science solutions quickly. …

Use Python to Simplify All of the Prep Work for Modeling with Text Data

Photo by Mike Benna on Unsplash

GitHub link


Skip ahead to the actual Pipeline section if you are more interested in that than learning about the quick motivation behind it: Text Pre Process Pipeline (halfway through the blog).

I’ve been looking at performing machine learning on text data but there are some data preprocessing steps that are unique to text data that I was not used to. Because of that, my Python code included a lot of transformation steps where I would wrangle with the data, fit a transformation, then transform the training data, transform the testing data, and then repeat this process for every type of transformation I wanted to do. I remembered reading that Python had a convenient way to wrap up transformations but never had a reason to look into it before now. Usually, I would perform something like a standardization scaling to numeric data or some dummy variable creation and that would be it. …
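The fit, transform-train, transform-test routine described above can be wrapped into a single object. The sketch below is a hand-rolled, pure-Python illustration of that idea, not the article's pipeline: the class names `Lowercase`, `BagOfWords` and `SimplePipeline` and the toy documents are all hypothetical. In practice, scikit-learn's `Pipeline` class provides this kind of chaining.

```python
class Lowercase:
    """Stateless step: nothing to learn, it only normalizes case."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        return [d.lower() for d in docs]


class BagOfWords:
    """Stateful step: learns a vocabulary from the training documents only."""

    def fit(self, docs):
        words = sorted({w for d in docs for w in d.split()})
        self.vocab_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        rows = []
        for d in docs:
            row = [0] * len(self.vocab_)
            for w in d.split():
                if w in self.vocab_:  # words unseen during fit are ignored
                    row[self.vocab_[w]] += 1
            rows.append(row)
        return rows


class SimplePipeline:
    """Chains steps so each one is fit once and reused on new data."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, docs):
        for step in self.steps:
            docs = step.fit(docs).transform(docs)
        return docs

    def transform(self, docs):
        for step in self.steps:
            docs = step.transform(docs)
        return docs


# Toy documents, for illustration only.
train_docs = ["Good movie", "Bad movie"]
test_docs = ["good plot"]

pipe = SimplePipeline([Lowercase(), BagOfWords()])
X_train = pipe.fit_transform(train_docs)  # fit every step on training data
X_test = pipe.transform(test_docs)        # reuse the fitted steps on test data
print(X_train, X_test)
```

The point of the design is that the test data never influences what the steps learn: the vocabulary comes from the training documents, and the test documents are only transformed with it.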


Casey Whorton

Data Scientist | British Bake-Off Connoisseur | Recovering Insomniac | Heavy Metal Music Advocate
