
Machine learning is a discipline that uses computer algorithms to extract useful knowledge from data.

Why does machine learning matter?

Machine learning plays a critical role in digital transformation. Across industries, organizations seek to leverage the digital revolution for more revenue or lower costs. Machine learning makes it possible for teams to work smarter, do things faster, and make previously impossible tasks routine.

How do organizations use machine learning?

Machine learning can:

  • Predict a future value
  • Estimate a probability
  • Infer an unknown
  • Classify an object
  • Group similar objects together
  • Detect associations
  • Identify outliers

Organizations put these capabilities to work in numerous ways. For example, a retailer can use machine learning to predict the volume of traffic in a store on a given day and use that prediction to optimize staffing. A bank can use machine learning to infer the current market value of a home (based on its size, characteristics, and neighborhood); in turn, this lowers the cost of appraisals and expedites mortgage processing.
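To make the bank example more concrete, here is a minimal sketch of one possible technique, k-nearest-neighbors regression. It estimates a home's value as the average sale price of the k most similar past sales. The features (square footage and bedrooms), the toy sales figures, the distance weighting, and the function name `estimate_value` are all hypothetical. A real valuation model would use many more features and far more data.

```python
import math

# Hypothetical past sales: (square_feet, bedrooms, sale_price)
SALES = [
    (1400, 3, 250_000),
    (1600, 3, 270_000),
    (1700, 4, 300_000),
    (2400, 4, 410_000),
    (3000, 5, 520_000),
]

def estimate_value(square_feet, bedrooms, k=3):
    """Estimate a price as the mean of the k most similar past sales (k-NN regression)."""
    def distance(sale):
        sf, br, _ = sale
        # Assumption for illustration: 500 sq ft counts about as much as one bedroom
        return math.hypot((sf - square_feet) / 500, br - bedrooms)

    nearest = sorted(SALES, key=distance)[:k]
    return sum(price for _, _, price in nearest) / k

print(round(estimate_value(1650, 3)))  # → 273333
```

For a 1,650 sq ft, three-bedroom home, the three closest sales are the 1,600, 1,400, and 1,700 sq ft homes, so the estimate is their average price.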

Is machine learning new?

No. Some machine learning techniques date back to the 1940s. As in any field, researchers continuously innovate. However, widely used methods such as logistic regression and decision trees are more than 50 years old.

If machine learning isn't new, why is there so much interest today?

Machine learning algorithms need a lot of data and computing power to produce useful results. Today, we have more data than ever, and computing power is pervasive and cheap. Machine learning algorithms are better than ever and widely available in open source software. Some well-publicized recent successes for machine learning add to the "buzz."

How does machine learning make impossible tasks routine?

Machine learning produces knowledge that organizations build into applications that can process millions of transactions at a fraction of the cost of manual handling. This capability makes it possible for businesses to do things that would be prohibitively expensive if performed by humans.

For example, consider an application that handles incoming email traffic to a customer service center. The app uses text mining, the application of machine learning to text. With it, the app automatically responds to some emails and routes others to specialists. Hiring human analysts to read every incoming email would be extraordinarily expensive. Machine learning makes it possible for the center to offer email as a communications channel at an acceptable cost.
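The routing step can be framed as text classification. The sketch below uses a multinomial naive Bayes classifier, one common text-mining technique. It is not necessarily what any particular product uses. The queue names `billing` and `technical`, the training emails, and the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled emails: (text, destination queue)
TRAINING = [
    ("my invoice shows a double charge", "billing"),
    ("please refund the charge on my card", "billing"),
    ("the app crashes when I log in", "technical"),
    ("error message after the latest update", "technical"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count words per class for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return the class with the highest log posterior, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace smoothing keeps unseen words from zeroing out the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("please refund my double charge", model))  # → billing
print(classify("the app shows an error", model))          # → technical
```

In a real system, the classifier's confidence could decide whether to send an automatic reply or route the email to a specialist. That threshold logic is left out here.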

How does machine learning work?

There are many different types of machine learning algorithms, and each type works differently. Many of them, however, follow the same general pattern. They begin with an initial hypothetical model, measure how well this model fits a set of data, and improve the model iteratively. This training process continues until the algorithm can find no further improvement or the user stops the process.
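The training loop described above can be sketched with gradient descent, one common way to improve a model iteratively. This minimal example assumes the model is a straight line, y = w*x + b, and measures fit with mean squared error. The data points, learning rate, tolerance, and iteration cap are hypothetical choices for illustration.

```python
# Hypothetical training data that roughly follows y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def mean_squared_error(w, b):
    """How well the current model y = w*x + b fits the data (lower is better)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(learning_rate=0.02, tolerance=1e-9, max_iterations=10_000):
    w, b = 0.0, 0.0                          # initial hypothetical model
    loss = mean_squared_error(w, b)
    n = len(xs)
    for _ in range(max_iterations):          # the cap stands in for "the user stops the process"
        # Gradient of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step each parameter against its gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        new_loss = mean_squared_error(w, b)
        if loss - new_loss < tolerance:      # no meaningful improvement left
            break
        loss = new_loss
    return w, b

w, b = train()
print(f"y = {w:.3f}x + {b:.3f}")
```

For this data, the exact least-squares line is y = 1.96x + 1.10, and the loop stops once it is very close to those values. Gradient descent is only one option. Other algorithms improve the model differently, for example by adding splits to a decision tree.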

What is the difference between statistics and machine learning?

Researchers use statistical techniques to test the hypothesis that data conforms to a known mathematical model, such as a linear model. Machine learning algorithms, on the other hand, seek to learn patterns that do not necessarily conform to a known mathematical form.

Statisticians developed tools such as linear regression many years ago, when researchers worked with small data sets and performed computations by hand. Adrien-Marie Legendre published the first description of the least squares method, a standard approach in regression analysis, in 1805.
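The least squares method chooses the line that minimizes the sum of squared differences between observed and predicted values. For simple linear regression (one input variable), the answer has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below computes this directly. The function name and data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = least_squares_fit([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 7.1, 8.8])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # → slope=1.96, intercept=1.10
```

This calculation is light enough to do by hand for a small data set, which is consistent with the era in which regression was developed.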

Today, practitioners have data and computing power that were not available to classical statisticians. Academics once debated the relative merits of machine learning and statistical techniques. Today, however, most data scientists freely use methods from both disciplines.

What is the difference between machine learning and artificial intelligence (AI)?

Machine learning is part of the artificial intelligence ecosystem, but AI includes additional capabilities, such as sensors, devices that interact with the natural world, and computer-based reasoning.

Autonomous vehicles offer an excellent example of applied AI. An autonomous vehicle has machine learning components built into it. It also includes sensors that capture and encode data about the world, a "brain" that reasons and makes decisions, and devices that instruct the wheels to turn, the engine to accelerate, and so forth.

What is the difference between machine learning and deep learning?

Deep learning is a subfield of machine learning. Neural networks are a class of machine learning models that represent knowledge as a set of mathematical functions. These functions are organized in a directed graph and arranged in layers. Neural networks with multiple "hidden" layers (layers between the input and the output) are called "deep" neural networks. Deep learning is useful because it performs well on tasks such as image and speech recognition, where other machine learning techniques perform poorly.
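To illustrate the layered structure, here is a minimal sketch of a forward pass through a tiny network with two hidden layers. Each layer computes weighted sums of its inputs. The hidden layers then apply a ReLU function, which replaces negative values with zero. The weights below are hypothetical values chosen by hand. In a real network they would be learned during training, usually with a library rather than plain Python.

```python
def relu(v):
    """Hidden-layer activation: replace negative values with zero."""
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.5]),  # hidden layer 1
    ([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.0, 0.1]),          # hidden layer 2
    ([[1.0, -2.0]], [0.0]),                                     # output layer
]

def forward(x):
    """Pass the input through each layer in order; each layer feeds the next."""
    *hidden, output = LAYERS
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    weights, biases = output
    return dense(x, weights, biases)

print([round(v, 4) for v in forward([2.0, 1.0])])  # → [-0.7]
```

Stacking more hidden layers makes the network "deeper". Each extra layer lets the network build more complex functions out of the outputs of the previous one.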


What is the difference between machine learning and data science?

Machine learning is a technology; data science is a discipline. Data scientists use machine learning to build predictive applications.

Who creates machine learning algorithms?

Researchers and practitioners in business, government, and academia create or enhance machine learning algorithms. They publish papers that describe the benefits of each innovation.

A machine learning algorithm is only useful once it is implemented in software. Most algorithm developers implement their algorithms in freely available open source software, which encourages broader adoption by the community.

What languages do data scientists use for machine learning?

There are machine learning libraries available for many different computer languages, including C and Java. However, the most popular languages among data scientists are Python, R, and Scala.

Does Cloudera offer tools for machine learning?

Yes. Cloudera Enterprise Data Hub includes Apache Spark, a distributed in-memory engine for high-performance data processing. Spark includes a machine learning library called MLlib, which implements many widely used machine learning techniques.

Cloudera also offers Cloudera Data Science Workbench (CDSW), a self-service platform for data science. CDSW supports on-demand provisioning and access for data scientists who want to work in Python, R, or Scala.