What is machine learning? Why does it matter? Should you get ready to welcome our robot overlords? If you’re curious about computer advances, or are thinking about getting into programming, you really need to know about this powerful technology. Let’s jump in and get you acquainted!
Learning the ropes
The goal is for computers to successfully generalize like people do, without getting too caught up in the details. Princeton calls it “programming by example.” The programmer codes the system to complete a simple task on its own, giving it the ability to analyze feedback and adjust (within narrow parameters). This involves three basic tasks, sketched in code just after the list:
- Look at data. First, the computer system needs to analyze data. This is often called representation, because the data must be properly classified with the right characteristics for the goal of the machine learning system. For example, it would be a waste of time programming a system with data points that describe the weight of apples when it really needs to study the color of apples.
- Learn a lesson. With the right sort of data present, the computer needs to score that data and come to some sort of conclusion. This is usually called evaluation, because the computer must know what the data points indicate, and be able to evaluate that data accordingly. For example, a system may link specific data points that correspond with the color and taste of apples. The apples are then rated by taste, based on the data the system currently has.
- Apply the lesson to new data. The system can now take a look at new data and come to its own conclusions, a process known as optimization. For example, it can pick out the apples in a bushel that are most likely to taste good based on their color. Of course, the system may not do well at first, but if it keeps on receiving data about the apples that taste good—perhaps by asking the humans who are eating the apples—then it gains a lot more information to compare taste and color. Eventually, the system can become a pro at immediately identifying the tastiest apples, and can do it far faster than any human could, all without any additional programming.
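To make those three tasks concrete, here’s a deliberately tiny sketch in Python. The apple numbers and the redness cut-off rule are invented for illustration; a real system would use far richer data and a proper learning algorithm.

```python
# A toy version of the three tasks, using made-up apple data.
# The numbers and the threshold rule are purely hypothetical.

# 1. Representation: describe each apple by the characteristic we care
#    about (redness on a 0-1 scale) plus feedback from human tasters.
tasted_apples = [
    {"redness": 0.9, "tasty": True},
    {"redness": 0.8, "tasty": True},
    {"redness": 0.4, "tasty": False},
    {"redness": 0.3, "tasty": False},
]

# 2. Evaluation: learn the simplest possible lesson, a redness cut-off
#    halfway between the tasty and not-tasty averages.
tasty = [a["redness"] for a in tasted_apples if a["tasty"]]
not_tasty = [a["redness"] for a in tasted_apples if not a["tasty"]]
cutoff = (sum(tasty) / len(tasty) + sum(not_tasty) / len(not_tasty)) / 2

# 3. Optimization: apply the lesson to apples nobody has tasted yet,
#    refining the cut-off as more taste reports come in.
new_apples = [0.85, 0.5, 0.2]
predictions = [redness > cutoff for redness in new_apples]
print(round(cutoff, 2), predictions)  # 0.6 [True, False, False]
```

The point isn’t the math; it’s the shape of the loop: represent the apples, learn a rule from the feedback, and keep applying (and refining) that rule on apples nobody has tasted yet.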
As you can probably tell, there is a lot of terminology associated with machine learning – and, annoyingly, it’s different from similar terminology used in statistics and other data-studying fields. We’re going to avoid using it when possible to help make things clearer, but if you want to dive into a class or course, you’ll find a lot of new vocabulary.
The heart of machine learning: Data sets and algorithms
So, what makes machine learning tick? First, data needs to be represented in accurate ways. This is actually a very challenging step, because in the age of Big Data there’s an incredible amount of information out there, and it needs to be classified in just the right way for proper use. In other words, you need to create an alphabet for the computer to learn before it can start reading on its own, and that takes a lot of work! How to properly pick features is the subject of much discussion in this corner of the programming world.
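As a rough illustration of what that representation step can look like, here’s a hypothetical snippet that turns raw apple records into numeric features, keeping only the characteristic that matters for the goal at hand:

```python
# A hypothetical example of "representation": turning raw descriptions
# of apples into numeric feature values a learning algorithm can use.
raw_apples = [
    {"variety": "gala",   "color": "red",   "weight_g": 180},
    {"variety": "granny", "color": "green", "weight_g": 160},
]

# If the goal is predicting taste from color, encode color as a number
# and simply leave out features (like weight) that don't serve the goal;
# choosing those features well is the hard part described above.
color_scale = {"green": 0.0, "yellow": 0.5, "red": 1.0}
features = [[color_scale[a["color"]]] for a in raw_apples]
print(features)  # [[1.0], [0.0]]
```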
Second, even with properly classified data, the computer still needs to evaluate accurately and change according to what it learns. This is where the algorithms come in, and boy are they complicated. We don’t want to drown you in a sea of coding woes, so instead let’s talk about the two primary groups of algorithms used: supervised and unsupervised (there’s a quick sketch of both after the list).
- Supervised learning: These algorithms are well-defined and generally have right and wrong answers. You can think of them like True/False choices on a test. They study data that comes with the answers already attached, filing the results into good and bad baskets and eventually learning exactly what data is associated with each basket. These sharp-eyed algorithms are often used to predict results based on small details that humans might not notice.
- Unsupervised learning: This type of algorithm is more like a fill-in-the-blank question. There’s no “right answer” programmed in, so the system instead looks at the available data and sees what it can find. This is very useful when it comes to pattern recognition and finding root causes long before a human would be able to.
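If you’re curious what those two flavors look like in code, here’s a minimal sketch. It leans on the popular scikit-learn library (our choice for illustration, not the only option) and the same invented apple data:

```python
# Both flavors on tiny, invented apple data: [redness, sweetness] per apple.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

apples = [[0.9, 0.8], [0.8, 0.9], [0.3, 0.2], [0.2, 0.1]]

# Supervised: we supply the "right answers" (1 = tasty, 0 = not),
# and the model learns to reproduce them on apples it hasn't seen.
labels = [1, 1, 0, 0]
clf = DecisionTreeClassifier().fit(apples, labels)
print(clf.predict([[0.85, 0.7]]))  # -> [1]

# Unsupervised: no answers supplied; the algorithm just looks for
# structure, here grouping the apples into two clusters on its own.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(apples)
print(clusters)  # e.g. [1 1 0 0] (which cluster gets which number is arbitrary)
```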
A lot of that happens in what used to be called a “black box,” although the University of Washington calls it “black art,” referencing the twisty, private coding paths that programmers use to implement effective machine learning, which often involve unorthodox or even random methods. We’re sure that teachers of human students can relate.
These algorithms are typically closely guarded secrets, and successful algorithms are hoarded and constantly tweaked for improvements. However, there are exceptions. Microsoft Azure, for example, has a cloud service that allows you to borrow from a library of available algorithms to create your own basic machine learning for apps or web services you may be developing.
Out in the open
- Google: Google uses some advanced machine learning in its search engine algorithm, mostly to understand user intent. Google wants its search engine to be able to tell, through trial and error, what users really want when they search using colloquial phrases.
- Data filtering: Think about your spam filter. How does it tell which emails are spam and which aren’t? That’s right, it’s a tiny little machine learning system right there in your email (there’s a toy version sketched out after this list).
- Marketing: Online stores and social media use these systems to predict what you may be interested in buying, based on what you and others have liked, retweeted, bought in the past, and so on.
- Healthcare: Advanced machine learning systems can be used to diagnose patients based on symptoms, spot problems with medication, and more. This is a new field, but one filled with potential.
- Identity theft prevention: Computers can look at purchases and credit report data, then compare them with past results to figure out how likely it is that a credit card or financial account has been compromised.
- Smart devices: A simple machine learning device is the Nest Thermostat, which learns your temperature patterns and starts mimicking them on its own. A more complex version may boast facial or voice recognition software that learns identifying characteristics.
- Self-driving cars: All self-driving car projects use some form of machine learning to better understand and respond to road conditions, which is why all those trial runs are so important.
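Speaking of that spam filter, here’s a rough idea of what such a system can look like under the hood. It’s a toy example with made-up training emails, again using scikit-learn for illustration, and not how any particular email provider actually does it:

```python
# A toy spam filter, in the supervised spirit of the filters described
# above. The training emails and labels here are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",
    "claim your free money",
    "meeting notes for tuesday",
    "lunch with the team tomorrow",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()  # represent each email as word counts
model = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)

new_email = ["free prize waiting for you"]
print(model.predict(vectorizer.transform(new_email)))  # -> [1], i.e. spam
```

Real filters train on millions of messages and far more features, but the loop is the same: represent the mail, evaluate it, and keep learning from the messages that keep arriving.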
Ultimately, machine learning is a road that could eventually lead to realistic AI. The goal of teaching computers to teach themselves always points toward artificial intelligence. Of course, we’re not nearly there yet. Computers are very good at analyzing data, but they need strict parameters and can’t branch out on their own. Machine learning is the very first baby step in helping computer systems figure out how to do this.