By John Sizemore, Vice President - Data Intelligence & Analytics

I’m often told that Machine Learning sounds complicated, but it doesn’t have to be. If I were asked to explain ML in 20 words or fewer, this is what it would sound like:

Understand the problem. Clean up the data. Investigate relationships. Engineer the dataset. Build the model. Tune to high performance.

At its core, ML is pretty straightforward, but it does need to follow a process. Here’s a more in-depth breakdown of the stages that can help you turn your data into proactive insights:

  • Understand - We can't improve what we don't understand, so our solutions are always grounded in a deep understanding of a process and the data related to that process.
  • Clean - The real world is messy, and data is almost never what we've been told. To get data ready for both analysis and (eventually) machine learning, we have to clean and process it.
  • Investigate - Before we can teach a machine what is important in a dataset, we have to understand it ourselves. Investigating data is really about building a deeper understanding of a dataset: its correlations, relationships, patterns, and so on. It's rare that complex processes have simple solutions, but it's often relatively simple analysis that sets us on the path to a solution.
  • Engineer - Machines are not smarter than humans; they are just great at fast math. But to learn best, they must be taught in very specific ways. This step is about prepping a dataset to train a model in the best way possible, as well as about bringing new information to the model to give it the best chance of seeing the signal we want.
  • Build & Tune - This is the fun part: creating, testing, and tuning predictive models. This stage includes retraining models as new data becomes available, as well as assessing model performance over time and doing maintenance work to make sure the model continues to deliver value.
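To make the stages above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic "sensor" data, the function names, and the choice of a one-feature ridge regression as the model are all hypothetical, not a description of any particular production pipeline.

```python
import random
import statistics

# Understand: the (hypothetical) problem is to predict a reading y from an input x.


def make_raw_data(n=200, seed=0):
    """Generate synthetic data: y is roughly linear in x, with noise and gaps."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 5.0 + rng.gauss(0, 1)
        if rng.random() < 0.1:  # real-world data is messy: some inputs are missing
            x = None
        rows.append((x, y))
    return rows


def clean(rows):
    """Clean: drop rows with a missing input."""
    return [(x, y) for x, y in rows if x is not None]


def investigate(rows):
    """Investigate: Pearson correlation between x and y."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def engineer(rows, mean, stdev):
    """Engineer: standardize x using statistics taken from the training split."""
    return [((x - mean) / stdev, y) for x, y in rows]


def build(rows, penalty):
    """Build: closed-form one-feature ridge regression (intercept unpenalized)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    return slope, my - slope * mx


def mse(model, rows):
    """Mean squared error of a (slope, intercept) model on the given rows."""
    slope, intercept = model
    return statistics.fmean([(slope * x + intercept - y) ** 2 for x, y in rows])


rows = clean(make_raw_data())
r = investigate(rows)

# Hold out the last 20% of rows for validation.
split = int(0.8 * len(rows))
train_raw, valid_raw = rows[:split], rows[split:]
train_x = [x for x, _ in train_raw]
mean, stdev = statistics.fmean(train_x), statistics.pstdev(train_x)
train = engineer(train_raw, mean, stdev)
valid = engineer(valid_raw, mean, stdev)

# Tune: try several penalties and keep the one with the lowest validation error.
results = {p: mse(build(train, p), valid) for p in (0.0, 0.1, 1.0, 10.0, 100.0)}
best_penalty = min(results, key=results.get)
best_mse = results[best_penalty]

print(f"correlation={r:.3f} best_penalty={best_penalty} validation_mse={best_mse:.3f}")
```

In practice each stage is far more involved than a single function, and retraining and monitoring would sit on top of this, but the order of the steps is the same.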

Don’t let complex terminology overwhelm you when it comes to using ML. All it takes is 20 words and 1 open mind.
