What is machine learning?
Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.
For example, one kind of algorithm is a classification algorithm. It can put data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code. It’s the same algorithm but it’s fed different training data so it comes up with different classification logic.
This machine learning algorithm is a black box that can be re-used for lots of different classification problems
“Machine learning” is an umbrella term covering lots of these kinds of generic algorithms.
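To see what "same algorithm, different training data" can look like, here is a minimal sketch. It uses a very simple nearest-neighbor rule as a stand-in for a classification algorithm; the function name `nearest_neighbor_classify`, the features, and all of the numbers are made up for illustration and are not taken from a real data set.

```python
import math

def nearest_neighbor_classify(training_data, features):
    """Return the label of the training example closest to `features`.

    training_data is a list of (feature_tuple, label) pairs.
    """
    closest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return closest[1]

# Hypothetical data set 1: (ink coverage, number of closed loops) per handwritten digit.
digit_examples = [
    ((0.20, 0), "1"),
    ((0.55, 2), "8"),
    ((0.45, 1), "0"),
]

# Hypothetical data set 2: (links in message, words in capital letters) per email.
email_examples = [
    ((9, 40), "spam"),
    ((0, 2), "not-spam"),
    ((1, 0), "not-spam"),
]

# The same function handles both problems; only the training data changes.
digit_label = nearest_neighbor_classify(digit_examples, (0.50, 2))
email_label = nearest_neighbor_classify(email_examples, (7, 35))
print(digit_label, email_label)
```

Nothing in the function knows about digits or emails. Real classifiers are more sophisticated than this, but the idea of swapping the data while keeping the code is the same.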
Two kinds of Machine Learning Algorithms
You can think of machine learning algorithms as falling into one of two main categories — supervised learning and unsupervised learning. The difference is simple but really important.
Supervised Learning
Let’s say you are a real estate agent. Your business is growing, so you hire a bunch of new trainee agents to help you out. But there’s a problem: you can glance at a house and have a pretty good idea of what it is worth, but your trainees don’t have your experience so they don’t know how to price their houses.
To help your trainees (and maybe free yourself up for a vacation), you decide to write a little app that can estimate the value of a house in your area based on its size, neighborhood, etc., and on what similar houses have sold for.
So for 3 months, you write down every house sale in your city. For each house, you write down a bunch of details: number of bedrooms, size in square feet, neighborhood, etc. But most importantly, you write down the final sale price.
Using that training data, we want to create a program that can estimate how much any other house in your area is worth.
This is called supervised learning. You knew how much each house sold for, so in other words, you knew the answer to the problem and could work backward from there to figure out the logic.
To build your app, you feed your training data about each house into your machine learning algorithm. The algorithm is trying to figure out what kind of math needs to be done to make the numbers work out.
This is kind of like having the answer key to a math test with all the arithmetic symbols erased:
Oh no! A devious student erased the arithmetic symbols from the teacher’s answer key!
From this, can you figure out what kind of math problems were on the test? You know you are supposed to “do something” with the numbers on the left to get each answer on the right.
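As a toy illustration of the erased-symbols puzzle, the sketch below tries a few candidate operations and keeps the ones that reproduce every answer. The rows in `answer_key` are invented for this example, and the brute-force search is only an analogy: real learning algorithms typically adjust numeric weights rather than test a fixed list of operators.

```python
import operator

# Candidate operations the erased symbol could have been.
CANDIDATES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def find_matching_operations(examples):
    """Return the symbols that reproduce every (left, right, answer) example."""
    return [
        symbol
        for symbol, func in CANDIDATES.items()
        if all(func(left, right) == answer for left, right, answer in examples)
    ]

# Hypothetical answer-key rows with the symbol erased: left ? right = answer
answer_key = [(2, 4, 8), (3, 5, 15), (6, 2, 12)]

matches = find_matching_operations(answer_key)
print(matches)
```

With these made-up rows, only multiplication fits all three answers, so the "logic" was recovered purely from examples.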
In supervised learning, you are letting the computer work out that relationship for you. And once you know what math was required to solve this specific set of problems, you could answer any other problem of the same type!
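Here is a minimal sketch of that idea applied to house prices. It assumes, for simplicity, that price depends only on size and that the relationship is a straight line, which is one of the simplest possible choices. The sale records are made up, and `fit_line` and `estimate_price` are hypothetical helper names.

```python
def fit_line(sizes, prices):
    """Least-squares fit of price = slope * size + intercept."""
    n = len(sizes)
    mean_size = sum(sizes) / n
    mean_price = sum(prices) / n
    covariance = sum(
        (s - mean_size) * (p - mean_price) for s, p in zip(sizes, prices)
    )
    variance = sum((s - mean_size) ** 2 for s in sizes)
    slope = covariance / variance
    intercept = mean_price - slope * mean_size
    return slope, intercept

# Made-up training data: size in square feet and final sale price.
sizes_sqft = [1000, 1500, 2000, 2500]
sale_prices = [200_000, 290_000, 410_000, 500_000]

# "Training": work backward from known answers to the math that fits them.
slope, intercept = fit_line(sizes_sqft, sale_prices)

def estimate_price(size_sqft):
    """Estimate the price of a house that was not in the training data."""
    return slope * size_sqft + intercept

print(round(estimate_price(1800)))
```

A real app would use more details per house (bedrooms, neighborhood, and so on) and far more sales, but the shape is the same: known answers go in, and a reusable pricing rule comes out.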
Unsupervised Learning
Let’s go back to our original example with the real estate agent. What if you didn’t know the sale price for each house? Even if all you know is the size, location, etc. of each house, it turns out you can still do some really cool stuff. This is called unsupervised learning.
Even if you aren’t trying to predict an unknown number (like price), you can still do interesting things with machine learning
This is kind of like someone giving you a list of numbers on a sheet of paper and saying “I don’t really know what these numbers mean but maybe you can figure out if there is a pattern or grouping or something. Good luck!”
So what could you do with this data? For starters, you could have an algorithm that automatically identified different market segments in your data. Maybe you’d find out that home buyers in the neighborhood near the local college really like small houses with lots of bedrooms, but home buyers in the suburbs prefer 3-bedroom houses with lots of square footage. Knowing about these different kinds of customers could help direct your marketing efforts.
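One common way to find groupings like these is clustering. The sketch below is a bare-bones version of k-means, assuming two segments and describing each house only by bedrooms and size; the houses are invented, and starting from the first `k` points is a simplification (real implementations usually choose starting centers randomly).

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns a cluster index for each point."""
    # Simplifying assumption: use the first k points as the starting centers.
    centers = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centers[c]))
            for point in points
        ]
        # Step 2: move each center to the average of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments

# Made-up houses: (bedrooms, size in thousands of square feet). No prices needed.
houses = [(4, 1.1), (3, 2.8), (5, 1.2), (3, 3.0), (4, 1.0), (3, 3.2)]

segments = k_means(houses, k=2)
print(segments)
```

The algorithm is never told which houses are near the college and which are in the suburbs. It only sees that small houses with many bedrooms sit close together and that large 3-bedroom houses form a second group.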
Another cool thing you could do is automatically identify any outlier houses that were way different than everything else. Maybe those outlier houses are giant mansions and you can focus your best salespeople on those areas because they have bigger commissions.
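A simple way to flag outliers, sketched below, is to measure how many standard deviations each house is from the average. This is just one possible approach; the sizes are made up, and both the `find_outliers` name and the threshold of 2 are arbitrary choices for this example.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]

# Made-up house sizes in square feet; one is much larger than the rest.
house_sizes = [1200, 1500, 1400, 1600, 1300, 9000]

outliers = find_outliers(house_sizes)
print(outliers)
```

Like the clustering case, this needs no sale prices or labels, only the raw details of each house.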
Supervised learning is what we’ll focus on for the rest of this post, but that’s not because unsupervised learning is any less useful or interesting. In fact, unsupervised learning is becoming increasingly important as the algorithms get better because it can be used without having to label the data with the correct answer.
Side note: There are lots of other types of machine learning algorithms. But this is a pretty good place to start.
Source: Medium.com
About us: TMA Solutions was established in 1997 to provide quality software outsourcing services to leading companies worldwide. We are one of the largest software outsourcing companies in Vietnam with 2,500 engineers.
Visit us at https://www.tmasolutions.com/