Tuesday, August 4, 2020

Machine Learning is Fun! (Part 2)

That’s cool, but does being able to estimate the price of a house really count as “learning”?

As a human, your brain can approach most any situation and learn how to deal with that situation without any explicit instructions. If you sell houses for a long time, you will instinctively have a “feel” for the right price for a house, the best way to market that house, the kind of client who would be interested, etc. The goal of Strong AI research is to be able to replicate this ability with computers.

But current machine learning algorithms aren’t that good yet. They only work when focused on a very specific, limited problem. Maybe a better definition for “learning” in this case is “figuring out an equation to solve a specific problem based on some example data”.

Unfortunately “Machine Figuring out an equation to solve a specific problem based on some example data” isn’t really a great name. So we ended up with “Machine Learning” instead.

Of course if you are reading this 50 years in the future and we’ve figured out the algorithm for Strong AI, then this whole post will all seem a little quaint. Maybe stop reading and go tell your robot servant to go make you a sandwich, future human.

Let’s write that program!

So, how would you write the program to estimate the value of a house like in our example above? Think about it for a second before you read further.

If you didn’t know anything about machine learning, you’d probably try to write out some basic rules for estimating the price of a house like this:

def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # In my area, the average house costs $200 per sqft
  price_per_sqft = 200

  if neighborhood == "hipsterton":
    # but some areas cost a bit more
    price_per_sqft = 400
  elif neighborhood == "skid row":
    # and some areas cost less
    price_per_sqft = 100

  # start with a base price estimate based on how big the place is
  price = price_per_sqft * sqft

  # now adjust our estimate based on the number of bedrooms
  if num_of_bedrooms == 0:
    # Studio apartments are cheap
    price = price - 20000
  else:
    # places with more bedrooms are usually
    # more valuable
    price = price + (num_of_bedrooms * 1000)

  return price

If you fiddle with this for hours and hours, you might end up with something that sort of works. But your program will never be perfect and it will be hard to maintain as prices change.

Wouldn’t it be better if the computer could just figure out how to implement this function for you? Who cares what exactly the function does as long as it returns the correct number:

def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = <computer, plz do some math for me>

  return price

One way to think about this problem is that the price is a delicious stew and the ingredients are the number of bedrooms, the square footage and the neighborhood. If you could just figure out how much each ingredient impacts the final price, maybe there’s an exact ratio of ingredients to stir in to make the final price.

That would reduce your original function (with all those crazy if’s and else’s) down to something really simple like this:

def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # a little pinch of this
  price += num_of_bedrooms * .841231951398213

  # and a big pinch of that
  price += sqft * 1231.1231231

  # maybe a handful of this
  price += neighborhood * 2.3242341421

  # and finally, just a little extra salt for good measure
  price += 201.23432095

  return price

Notice the magic numbers: .841231951398213, 1231.1231231, 2.3242341421, and 201.23432095. These are our weights. If we could just figure out the perfect weights to use that work for every house, our function could predict house prices!
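To make the role of the weights a bit more concrete, here is a minimal sketch that moves them out of the function body and into a parameter. The `weights` tuple, its ordering, and the idea that `neighborhood` has already been encoded as a number are assumptions made for illustration; the function above hard-codes its weights instead.

```python
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood, weights):
    # weights is a hypothetical 4-tuple: one weight per input, plus a
    # constant term (the "extra salt"). neighborhood is assumed to be
    # a number here, not a string like "hipsterton".
    w_bedrooms, w_sqft, w_neighborhood, w_constant = weights

    price = 0
    price += num_of_bedrooms * w_bedrooms
    price += sqft * w_sqft
    price += neighborhood * w_neighborhood
    price += w_constant
    return price


# Made-up weights and a made-up house, just to show the call shape.
example_price = estimate_house_sales_price(3, 2000, 1, (2.0, 100.0, 10.0, 5.0))
```

Written this way, "finding the best function" becomes "finding the best four numbers to pass in".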

A dumb way to figure out the best weights would be something like this:

Step 1:

Start with each weight set to 1.0:

def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # a little pinch of this
  price += num_of_bedrooms * 1.0

  # and a big pinch of that
  price += sqft * 1.0

  # maybe a handful of this
  price += neighborhood * 1.0

  # and finally, just a little extra salt for good measure
  price += 1.0

  return price

Step 2:

Run every house you know about through your function and see how far off the function is at guessing the correct price for each house.

For example, if the first house really sold for $250,000, but your function guessed it sold for $178,000, you are off by $72,000 for that single house.

Now add up the squared amount you are off for each house you have in your data set. Let’s say that you had 500 home sales in your data set and the square of how much your function was off for each house was a grand total of $86,123,373. That’s how “wrong” your function currently is.

Now, take that sum total and divide it by 500 to get an average of how far off you are for each house (strictly speaking, the average of the squared errors). Call this average error amount the cost of your function.

If you could get this cost to be zero by playing with the weights, your function would be perfect. It would mean that in every case, your function perfectly guessed the price of the house based on the input data. So that’s our goal — get this cost to be as low as possible by trying different weights.
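Step 2 can be sketched in a few lines of Python. This is only an illustration under some assumptions: each house is a hypothetical `(num_of_bedrooms, sqft, neighborhood)` tuple with the neighborhood already encoded as a number, the helper names `predict` and `cost` are made up here, and the example data is invented.

```python
def predict(house, weights):
    # house: hypothetical (num_of_bedrooms, sqft, neighborhood) tuple.
    # weights: (w_bedrooms, w_sqft, w_neighborhood, w_constant).
    num_of_bedrooms, sqft, neighborhood = house
    w_bedrooms, w_sqft, w_neighborhood, w_constant = weights
    return (num_of_bedrooms * w_bedrooms
            + sqft * w_sqft
            + neighborhood * w_neighborhood
            + w_constant)


def cost(houses, actual_prices, weights):
    # Add up the squared amount we are off for each house...
    total_squared_error = 0.0
    for house, actual_price in zip(houses, actual_prices):
        error = predict(house, weights) - actual_price
        total_squared_error += error ** 2
    # ...then divide by the number of houses to get the average.
    return total_squared_error / len(houses)


# Invented data: two houses and the prices they "really" sold for.
houses = [(3, 2000, 1), (2, 800, 2)]
actual_prices = [200000, 80000]

starting_cost = cost(houses, actual_prices, (1.0, 1.0, 1.0, 1.0))
perfect_cost = cost(houses, actual_prices, (0.0, 100.0, 0.0, 0.0))
```

In this toy data the prices happen to be exactly $100 per square foot, so the second set of weights gives a cost of zero, while the all-ones starting weights give a very large cost.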

Step 3:

Repeat Step 2 over and over with every single possible combination of weights. Whichever combination of weights makes the cost closest to zero is what you use. When you find the weights that work, you’ve solved the problem!
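Here is a rough sketch of Step 3. Weights are real numbers, so trying literally every combination is not possible; as a stand-in, this version only tries combinations drawn from a small list of candidate values. That list, the helper names, and the example data are all assumptions made for illustration.

```python
from itertools import product


def predict(house, weights):
    # house: hypothetical (num_of_bedrooms, sqft, neighborhood) tuple,
    # with neighborhood assumed to be encoded as a number.
    num_of_bedrooms, sqft, neighborhood = house
    w_bedrooms, w_sqft, w_neighborhood, w_constant = weights
    return (num_of_bedrooms * w_bedrooms + sqft * w_sqft
            + neighborhood * w_neighborhood + w_constant)


def cost(houses, actual_prices, weights):
    # Average of the squared errors, as in Step 2.
    total = sum((predict(h, weights) - p) ** 2
                for h, p in zip(houses, actual_prices))
    return total / len(houses)


def brute_force_weights(houses, actual_prices, candidate_values):
    # Try every combination of the candidate values across the four
    # weights and keep whichever combination has the lowest cost.
    best_weights, best_cost = None, float("inf")
    for weights in product(candidate_values, repeat=4):
        current_cost = cost(houses, actual_prices, weights)
        if current_cost < best_cost:
            best_weights, best_cost = weights, current_cost
    return best_weights, best_cost


# Invented data, constructed so that one grid point fits it exactly.
houses = [(1, 1000, 0), (2, 1500, 1), (3, 1800, 0), (0, 500, 2)]
actual_prices = [101000, 153000, 183000, 52000]

best_weights, best_cost = brute_force_weights(
    houses, actual_prices, candidate_values=[0.0, 100.0, 1000.0])
```

Even this tiny grid means 3 × 3 × 3 × 3 = 81 cost evaluations, and the count grows quickly as you add candidate values or weights, which is why this really is a dumb way to do it.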

Source: Medium.com



