ML Series - Getting Started with ML using Amazon SageMaker - Part 1

Table of Contents

Introduction
What is Machine Learning?
Dataset and Model
What is Amazon SageMaker?
Summary

Introduction

How long until AI surpasses humans and becomes SUPERINTELLIGENT? If the rate of development is a positive number, it is inevitable. WE'RE DOOMED. I KNOW, RIGHT? There is a chance it might happen by the end of this century.

By now, everyone must be aware of the buzzwords AI, ML and DL. But I'm certain most people don't understand what's under the hood and how everything works.

I believe that every human on this planet should be a part of the process of creating an entity that will affect everyone's lives. And you never know, we might end up in a UTOPIA. Because we're HUMANS and would never go down without a fight.

This blog series will help readers get started with Machine Learning in a simple and practical manner. Since this series is aimed at newbies who are just getting started with ML, I've decided to use SageMaker, a managed service for ML which abstracts away the intricate details and is easy to set up and use. So let's get started with MACHINE LEARNING!

What is Machine Learning?

I believe in learning by always asking the right questions. So the question is "How can computers learn to solve problems without being explicitly programmed?" Enter Machine Learning.

Wikipedia defines Machine Learning as -

Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.

So, to put it simply, we have data and we need to create a model that learns from it. You must be wondering - Learn what? Well, it depends on the use-case. There are four types of Machine Learning -

Supervised learning

Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

The main requirement in this case is labeled data. We have pairs of input-output data and the task is to model the relationship between the inputs and the outputs. The most common use-cases for Supervised learning are prediction (the output is a number, also known as regression) and classification (the output is a category).
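To make "learning from input-output pairs" a bit more concrete, here is a minimal sketch of a classifier in plain Python. It uses 1-nearest-neighbour, which is just one simple supervised technique I picked for illustration. The features (weight and ear length) and all the numbers are made up.

```python
# Labeled training data: (input features, output label) pairs.
# The features are hypothetical: (weight in kg, ear length in cm).
training_pairs = [
    ((4.0, 6.5), "CAT"),
    ((3.5, 7.0), "CAT"),
    ((5.0, 6.0), "CAT"),
    ((20.0, 11.0), "DOG"),
    ((30.0, 12.5), "DOG"),
    ((25.0, 10.0), "DOG"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features, pairs=training_pairs):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(pairs, key=lambda pair: squared_distance(pair[0], features))
    return label


# Inference on new, unseen inputs.
print(predict((4.2, 6.8)))
print(predict((27.0, 11.5)))
```

The "model" here is nothing more than the stored examples plus a distance rule, but the workflow is the same as with fancier models: labeled pairs go in, and a function that maps new inputs to labels comes out.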

Unsupervised learning

Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it.

In contrast to Supervised learning, the basic principle behind Unsupervised learning is that the model tries to find patterns in unlabeled data and groups the data based on these patterns. The most common use-cases are Clustering and Anomaly detection.
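Here is a rough sketch of clustering in plain Python, using k-means on one-dimensional data. k-means is only one of many clustering algorithms, and the weights and starting centers below are invented for the example. Notice that no labels are involved anywhere.

```python
def k_means_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given initial centers (plain k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical animal weights in kg, with no labels attached.
weights = [3.5, 4.0, 5.0, 20.0, 25.0, 30.0]
centers, clusters = k_means_1d(weights, centers=[0.0, 10.0])
print(centers)
print(clusters)
```

The algorithm never learns the words "cat" or "dog". It only discovers that the data falls into two groups; naming those groups is up to us.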

Reinforcement learning

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

Imagine we have an agent in an environment. The agent can perform actions, and it receives a reward from the environment for each action it performs (if you haven't already noticed, that's how humans function, except the reward circuit is a part of our brain). The key part here is to define the reward function, and to strike a balance between exploration (performing actions which might have a small or negative reward in order to gather new information about the environment) and exploitation (performing the actions which have the maximum reward based on current knowledge).
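The exploration vs exploitation trade-off can be sketched with a toy "multi-armed bandit" and an epsilon-greedy agent. This is one simple strategy among many, and the reward values, noise level and epsilon below are assumptions chosen for illustration.

```python
import random


def run_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms give noisy rewards."""
    rng = random.Random(seed)
    n_actions = len(true_rewards)
    estimates = [0.0] * n_actions  # agent's current guess of each action's reward
    counts = [0] * n_actions       # how often each action has been taken

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random action to learn more about it.
            action = rng.randrange(n_actions)
        else:
            # Exploitation: take the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment answers with a noisy reward.
        reward = true_rewards[action] + rng.gauss(0, 0.5)

        # Update the running average of the reward for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical environment: the third action is the best one on average.
estimates, counts = run_bandit([0.2, 0.5, 2.0])
print(estimates)
print(counts)
```

With epsilon set to 0 the agent would only ever exploit and could get stuck on the first action it tried. With epsilon set to 1 it would explore forever and never cash in on what it has learned.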

Semi-supervised learning

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training.

It's a combination of Supervised learning and Unsupervised learning. It's similar to providing a direction to Unsupervised learning using Supervised learning. The dataset is similar to Unsupervised learning except it includes a small amount of labeled data as well.
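One way to picture "providing a direction" is self-training (also called pseudo-labeling): start from the few labeled points, guess labels for the unlabeled ones, then refit using those guesses. The sketch below does this with a nearest-centroid rule on made-up one-dimensional data. It is just one semi-supervised technique, not the only one.

```python
def mean(values):
    return sum(values) / len(values)


def self_train(labeled, unlabeled, rounds=3):
    """Pseudo-label 1-D points by nearest class centroid, then refit the centroids."""
    # Group the small labeled set by class.
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    centroids = {label: mean(values) for label, values in by_label.items()}

    pseudo_labels = []
    for _ in range(rounds):
        # Step 1: guess a label for every unlabeled point.
        pseudo_labels = [
            min(centroids, key=lambda label: abs(x - centroids[label]))
            for x in unlabeled
        ]
        # Step 2: refit the centroids on labeled + pseudo-labeled points.
        groups = {label: list(values) for label, values in by_label.items()}
        for x, label in zip(unlabeled, pseudo_labels):
            groups[label].append(x)
        centroids = {label: mean(values) for label, values in groups.items()}

    return centroids, pseudo_labels


# Hypothetical weights in kg: two labeled examples, six unlabeled ones.
labeled = [(4.0, "CAT"), (25.0, "DOG")]
unlabeled = [3.5, 5.0, 6.0, 18.0, 20.0, 30.0]
centroids, pseudo_labels = self_train(labeled, unlabeled)
print(centroids)
print(pseudo_labels)
```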

Dataset and Model

For this series we'll focus on Supervised Learning.

Supervised learning uses labeled datasets to train models, which we can then use for either prediction or classification. The primary prerequisite here is that we must have labeled data to train the model/algorithm on. Once trained, the model can be used for inference on new data.

So, we need a dataset to train on. I have decided to use the Cats vs Dogs dataset. It is a set of labeled images with either a cat or dog as the subject. Our goal is to classify the image as either a "DOG" or a "CAT".

CAT	DOG
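In code, a labeled image dataset is often just a list of (image path, label) pairs. Here is a small sketch that builds such a list. Note that the file naming convention (`cat.0.jpg`, `dog.0.jpg`) and the `train/` folder are assumptions made for the example; check how your copy of the dataset is actually organized.

```python
from pathlib import PurePath


def label_from_filename(path):
    """Derive "CAT" or "DOG" from a file name such as 'cat.12.jpg' (assumed naming)."""
    prefix = PurePath(path).name.split(".")[0].lower()
    if prefix not in ("cat", "dog"):
        raise ValueError(f"cannot infer a label from {path!r}")
    return prefix.upper()


# Hypothetical file paths, for illustration only.
files = ["train/cat.0.jpg", "train/dog.0.jpg", "train/cat.1.jpg"]

# The labeled dataset: one (input, output) pair per image.
dataset = [(path, label_from_filename(path)) for path in files]
print(dataset)
```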

Now for the model we will use a Neural Network. I've always been fascinated by these. xD. Imagine it as a black box which learns the relationship between the input data and the output data through training. And voilà, you have a model which can classify dogs and cats!
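If you want a peek inside the black box, here is a minimal sketch of the smallest possible "network": a single artificial neuron trained with gradient descent. A real image classifier has many layers and millions of weights, and takes pixels as input, so treat this purely as a toy. The single input feature (a normalized weight between 0 and 1), the data and the hyperparameters are all made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(inputs, targets, learning_rate=1.0, epochs=5000):
    """Fit one sigmoid neuron (weight w, bias b) with full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        # Forward pass: the neuron's current predictions.
        predictions = [sigmoid(w * x + b) for x in inputs]
        # Gradients of the average cross-entropy loss with respect to w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(predictions, targets, inputs)) / n
        grad_b = sum(p - y for p, y in zip(predictions, targets)) / n
        # Nudge the parameters in the direction that reduces the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data: 0 means CAT, 1 means DOG.
inputs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
targets = [0, 0, 0, 1, 1, 1]
w, b = train_neuron(inputs, targets)


def classify(x):
    """Turn the neuron's output probability into a label."""
    return "DOG" if sigmoid(w * x + b) > 0.5 else "CAT"


print(classify(0.2))
print(classify(0.85))
```

Nobody told the neuron where the boundary between cats and dogs lies. It found suitable values for `w` and `b` on its own by repeatedly reducing its error on the labeled examples, and that is all "training" means.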

What is Amazon SageMaker?

AWS provides its Machine Learning as a Service offering under the name SageMaker. SageMaker allows you to build, train and deploy Machine Learning models in the cloud. It provides services for data labeling, data ingestion and data processing, Jupyter-notebook-based ML development environments (Studio and Studio Lab), fully managed training, automatic model tuning, and model deployment and inference.

You don't need to manage the underlying hardware required for building and training the models, which makes it a great option for someone getting started with Machine Learning. Anyone from ML experts to Business Analysts (with no coding expertise) can use SageMaker.