Photo by Hans Eiskonen on Unsplash

When someone shows you who they are, believe them the first time. — Maya Angelou

You reach out your hand, grasp the door handle, turn it and push open the door. As it gently swings open and comes to a rest, you see three people: two are laughing and talking with each other and one is staring at their phone with an angry look. Does this quick information tell you anything about how to communicate with the people in this room? Obviously, after you learn more about them you will have a better idea of how to approach them. This…


Photo by Sigmund on Unsplash

Here, I will share with you two different methods for applying custom functions to groups of data in pandas. There are many out-of-the-box aggregate and filtering functions available for us to use already, but those don’t always do everything we want. Sometimes, when I want to know “what was the second most recent observation per month” or “what is the difference between two weighted averages by product type”, I know I need to define a custom function to do the job.
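As a rough illustration of the "second most recent observation per month" question, here is a minimal sketch of one way to apply a custom function to groups. The toy data, the column names and the helper `second_most_recent` are made up for this example and are not taken from the article; it shows only one approach (`groupby` followed by `apply`), not necessarily either of the two methods the article covers.

```python
import pandas as pd

# Hypothetical toy data: one row per observation.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-05", "2021-01-20", "2021-01-28",
        "2021-02-03", "2021-02-17",
    ]),
    "value": [10, 20, 30, 40, 50],
})

# Month label used as the grouping key (assumed format: YYYY-MM).
df["month"] = df["date"].dt.strftime("%Y-%m")


def second_most_recent(group):
    """Return the value of the second most recent row in a group.

    Returns None when the group has fewer than two rows.
    """
    ordered = group.sort_values("date", ascending=False)
    if len(ordered) < 2:
        return None
    return ordered["value"].iloc[1]


# Apply the custom function to each month's rows.
result = df.groupby("month")[["date", "value"]].apply(second_most_recent)
print(result)
```

Selecting the `date` and `value` columns before calling `apply` keeps the grouping column out of the data passed to the function, so the helper only sees what it needs.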

Sample Data (not again…)

Usually, I try not to use this toy dataset because it has been done so many times, but it…


Photo by William Krause on Unsplash

An airport in Florida is closer to the Detroit airport than one in Hyderabad is, and we know that because we measure the distances using latitude and longitude (Hyderabad is a huge city in India). But how do we say one shopping basket’s contents are closer to another’s? Or that one forest is more similar to another in terms of the animals that live in it? We can treat these as comparisons between sets and measure the similarity (or dissimilarity) between them using Jaccard’s coefficient (we’ll use “coefficient” and “similarity score” interchangeably). For large datasets, this can be a big task, so…
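For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below is a plain-Python illustration with made-up shopping baskets; the function name and the choice to return 1.0 for two empty sets are assumptions for this example, and it does not attempt whatever approach the article uses for large datasets.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical shopping baskets.
basket_1 = {"bread", "milk", "eggs"}
basket_2 = {"bread", "milk", "butter"}

# Two shared items out of four distinct items overall.
score = jaccard_similarity(basket_1, basket_2)
print(score)  # 0.5
```

The matching dissimilarity (Jaccard distance) is simply `1 - score`.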


Binary Classification from Scratch in Python

There is no shortage of articles, videos and tutorials on logistic regression for classification. It’s a classic subject in Machine Learning and is usually a stepping stone before moving on to more complex algorithms. What this article aims to do is show you logistic regression from another lens, where we can solve for a formulaic solution to the weights that we pass to a model that returns the predicted probability. I provide links to the code and solution in the article.

The problem that Logistic Regression aims to tackle is that of finding the probability of an observation of a…
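To make "a model that returns the predicted probability" concrete, here is a minimal sketch of the standard logistic (sigmoid) model given a set of weights. The weights, bias and feature values are invented for illustration, and the sketch does not reproduce the article's formula for solving for the weights.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic model: squash a weighted sum of features into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and a single observation with two features.
weights = [0.8, -0.4]
bias = 0.1
observation = [1.0, 2.0]

# The weighted sum here is 0.1 + 0.8 - 0.8 = 0.1, so the probability
# lands slightly above one half.
p = predict_probability(weights, bias, observation)
print(p)
```

With all weights and the bias set to zero the model returns exactly 0.5, which is a quick sanity check for any implementation.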


Photo by JESHOOTS.COM on Unsplash

Testing candidates for a Data Scientist position gives a hiring organization a great sense of how well those candidates can do job-related tasks and manage time effectively. The skills that Data Scientists need to succeed vary by company or even by team within a company, so the tests should be tailored accordingly. In general though, Data Science is a process that includes many steps and independent skills that aggregate to something greater than the sum of its parts. …


Photo by Arisa Chattasa on Unsplash

Your data science project has been getting a lot of attention and now you have been invited to present the topic to executive leadership. The anxiety radiating from your teammates and direct leadership tells you that there is something different about this meeting.

No worries. Having been in this situation before, I can give a few pointers so you can bring your “A-game” when talking data science to C-level executives. (Note: “C-level” refers to members of a company whose titles start with “C”, such as the Chief Executive Officer, or CEO.)

Start with these:

  • Executives usually have a long career…


Interpretable Machine Learning That Isn’t OLS

Photo by Max Ostrozhinskiy on Unsplash

Contents

  • Introduction
  • California Housing Dataset Example
  • Conclusion
  • References

Introduction

The interpretable side of machine learning has always been interesting to me. I think it is important to be able to plainly state (to some degree) what the model is doing. Some of the most explainable machine learning models are also the weakest in terms of accuracy, so we are forced to make decisions in order to strike a balance.

This article focuses on the RuleFit algorithm, written in Python, to predict a continuous target variable. This topic has been touched on by other authors, and while you can find methods to explain…


Photo by Tyler Franta on Unsplash

As the demand for people with a data science skillset has soared, companies have looked for ways to fill that demand. One way is for companies to go out and recruit people who encapsulate all things data science, which usually includes proficiency in a coding language, probably Python, since it is a popular language in data science. Another way is to look within your company, see who has the skillset, and make those people your data scientists. Companies can also look within their walls and find somebody who is close to what they want in…


Use Python to Simplify All of the Prep Work for Modeling with Text Data

Photo by Mike Benna on Unsplash

GitHub link

Introduction

Skip ahead to the actual Pipeline section if you are more interested in that than learning about the quick motivation behind it: Text Pre Process Pipeline (halfway through the blog).

I’ve been looking at performing machine learning on text data but there are some data preprocessing steps that are unique to text data that I was not used to. Because of that, my Python code included a lot of transformation steps where I would wrangle with the data, fit a transformation, then transform the training data, transform the testing data, and then repeat this process for every type of transformation…
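The repeated "fit, transform the training data, transform the testing data" pattern can be sketched in plain Python as follows. Everything here is hypothetical (the `Lowercase`, `VocabularyCounter` and `SimplePipeline` classes and the sample sentences are invented for illustration); it is not the article's pipeline, only a toy version of the idea of chaining steps so that each one is fit once on the training data and then reused on the test data.

```python
class Lowercase:
    """Stateless step: nothing to learn, just lowercase each document."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        return [doc.lower() for doc in docs]


class VocabularyCounter:
    """Learn a vocabulary on fit, then count those words on transform."""

    def fit(self, docs):
        self.vocab_ = sorted({word for doc in docs for word in doc.split()})
        return self

    def transform(self, docs):
        return [[doc.split().count(word) for word in self.vocab_] for doc in docs]


class SimplePipeline:
    """Chain steps so each is fit on the output of the previous one."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, docs):
        for step in self.steps:
            docs = step.fit(docs).transform(docs)
        return docs

    def transform(self, docs):
        # Reuse what was learned during fit_transform; no refitting.
        for step in self.steps:
            docs = step.transform(docs)
        return docs


train_docs = ["The cat sat", "The dog ran"]
test_docs = ["A cat ran"]

pipeline = SimplePipeline([Lowercase(), VocabularyCounter()])
train_features = pipeline.fit_transform(train_docs)
test_features = pipeline.transform(test_docs)
print(pipeline.steps[1].vocab_)
print(train_features)
print(test_features)
```

Words in the test data that were not seen during fitting (such as "a" here) are simply ignored, which mirrors how the test set should never influence what the transformations learn.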

Casey Whorton

Data Scientist | British Bake-Off Connoisseur | Recovering Insomniac | Heavy Metal Music Advocate
