7 Machine Learning Algorithms to Know: A Beginner's Guide

Written by Coursera • Updated on

Machine learning algorithms power many services in the world today. Here are seven to know as you look to start your career.

[Featured image] Two business intelligence analysts comb through data in an office.

Machine learning (ML) can do everything from analyzing X-rays to predicting stock market prices to recommending binge-worthy television shows. With such a wide range of applications, it’s little surprise that the global machine learning market is projected to grow from $21.17 billion in 2022 to $209.91 billion by 2029, according to Fortune Business Insights [1].

At the core of machine learning are algorithms, which are trained to become the machine learning models used to power some of the most impactful innovations in the world today. In this article, you will learn about seven of the most important ML algorithms to know as you begin your own machine learning journey and explore the different learning styles used to turn ML algorithms into ML models. 

Top machine learning algorithms to know

Machine learning algorithms are the fundamental building blocks for machine learning models. From classification to regression, here are seven algorithms you need to know as you begin your machine learning career:

1. Linear regression

Linear regression is a supervised learning algorithm used to predict and forecast values within a continuous range, such as sales numbers or prices. 

Originating from statistics, linear regression performs a regression task: it models the relationship between an input value (X) and an output value (Y) as a straight line with a constant slope, then uses that line to predict a numeric value or quantity. Linear regression uses labeled data to make predictions by establishing a line of best fit, or “regression line,” that is approximated from a scatter plot of data points. As a result, linear regression is used for predictive modeling rather than categorization.
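To make the line of best fit concrete, here is a minimal sketch of ordinary least squares for a single input variable, written in plain Python. The function names (`fit_line`, `predict`) and the advertising and sales figures are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Read a prediction off the regression line."""
    return slope * x + intercept


# Hypothetical labeled data: advertising spend (X) and resulting sales (Y)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(6.0, slope, intercept)
print(slope, intercept, forecast)
```

Because the made-up data lies exactly on a line, the fitted slope and intercept recover that line; real data would scatter around it.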

2. Logistic regression

Logistic regression, or “logit regression,” is a supervised learning algorithm used for binary classification, such as deciding whether an image fits into one class or another. 

Originating from statistics, logistic regression technically predicts the probability that an input can be categorized into a single primary class. In practice, however, this can be used to group outputs into one of two categories (“the primary class” or “not the primary class”). This is achieved by setting a cutoff for binary classification, such as putting any output from 0 to 0.49 in one group and any output from 0.50 to 1.00 in the other.

As a result, logistic regression in machine learning is typically used for binary categorization rather than predictive modeling. 
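The probability-then-cutoff idea can be sketched as follows. This is a simplified illustration that fits a one-input logistic model with plain gradient descent; the function names, the learning rate, the number of epochs, and the study-hours data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, labels, learning_rate=0.1, epochs=2000):
    """Fit a weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def classify(x, w, b, threshold=0.5):
    """Turn the predicted probability into one of two classes."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, passed)
print(classify(1.0, w, b), classify(4.0, w, b))
```

The model itself only outputs a probability; the `threshold` argument is what turns it into a binary decision.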

3. Naive Bayes 

Naive Bayes is a set of supervised learning algorithms used to create predictive models for either binary or multi-class classification. Based on Bayes’ theorem, Naive Bayes treats each input feature as independent of the others (the “naive” assumption) and combines their conditional probabilities to estimate the likelihood of each classification.

For example, a program created to identify plants might use a naive Bayes algorithm to categorize images based on particular factors, such as perceived size, color, and shape. While each of these factors is independent of one another, the algorithm would note the likelihood of an object being a particular plant using the combined factors. 
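The plant example can be sketched with a small categorical Naive Bayes classifier. This is an illustrative version only: the training data is invented, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption added so that unseen feature values do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count how often each feature value appears with each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_naive_bayes(model, features):
    """Pick the class with the highest combined (log) probability."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior probability of the class
        for i, value in enumerate(features):
            seen = feature_counts[label][i][value]
            # Each feature contributes independently (add-one smoothing)
            score += math.log((seen + 1) / (count + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical observations: (size, color, shape)
samples = [
    ("small", "red", "round"), ("small", "pink", "round"),
    ("large", "orange", "round"), ("large", "orange", "oval"),
    ("medium", "green", "spiky"), ("small", "green", "spiky"),
]
labels = ["rose", "rose", "pumpkin", "pumpkin", "aloe vera", "aloe vera"]

model = train_naive_bayes(samples, labels)
print(predict_naive_bayes(model, ("large", "orange", "round")))
```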

4. Decision tree

A decision tree is a supervised learning algorithm used for classification and predictive modeling. 

Resembling a graphic flowchart, a decision tree begins with a root node, which asks a specific question of data and then sends it down a branch depending on the answer. These branches each lead to an internal node, which in turn asks yet another question of the data before directing it toward another branch depending on the answer. This continues until the data reaches an end node, also called a leaf node, that doesn’t branch any further. 

Decision trees are common in machine learning because they can handle complex data sets with relative simplicity.
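The root, internal, and leaf nodes described above can be sketched as nested dictionaries. The tree here is built by hand rather than learned from data, and its features, thresholds, and plant labels are hypothetical.

```python
def classify(node, sample):
    """Follow branches from the root until a leaf node is reached."""
    while "label" not in node:
        answer = sample[node["feature"]] <= node["threshold"]
        node = node["yes"] if answer else node["no"]
    return node["label"]


# Hypothetical hand-built tree; thresholds are made up for illustration
plant_tree = {
    "feature": "height_cm", "threshold": 60,             # root node
    "yes": {
        "feature": "leaf_thickness_mm", "threshold": 5,  # internal node
        "yes": {"label": "rose"},                        # leaf node
        "no": {"label": "aloe vera"},                    # leaf node
    },
    "no": {"label": "sunflower"},                        # leaf node
}

print(classify(plant_tree, {"height_cm": 40, "leaf_thickness_mm": 12}))
print(classify(plant_tree, {"height_cm": 150, "leaf_thickness_mm": 1}))
```

A training algorithm would choose the questions and thresholds automatically; this sketch only shows how a finished tree routes data to an answer.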

5. Random forest algorithm

A random forest algorithm uses an ensemble of decision trees for classification and predictive modeling. 

In a random forest, many decision trees (sometimes hundreds or even thousands) are each trained using a random sample of the training set (a method known as “bagging”). Afterward, researchers put the same data into each decision tree in the random forest and tally their end results. The most common result is then selected as the most likely outcome for the data set.

Although they can become complex and require significant time to train, random forests correct the common problem of “overfitting” that can occur with decision trees. Overfitting is when an algorithm fits its training data set too closely, which can negatively impact its accuracy when introduced to new data later.
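Bagging and vote tallying can be sketched in a few lines. To stay short, this example makes simplifying assumptions: each "tree" is a one-question decision stump, and the random feature selection that full random forests also use is left out. The data, function names, and the fixed random seed are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows):
    """Find the single (feature, threshold) question with the fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        for features, _ in rows:
            threshold = features[f]
            left = [label for x, label in rows if x[f] <= threshold]
            right = [label for x, label in rows if x[f] > threshold]
            left_label = majority(left)
            right_label = majority(right) if right else left_label
            errors = sum(
                1 for x, label in rows
                if (left_label if x[f] <= threshold else right_label) != label
            )
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def train_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Tally every tree's answer and return the most common one."""
    return majority([predict_stump(stump, x) for stump in forest])


# Hypothetical two-feature training set
rows = [
    ((1.0, 1.2), "red"), ((1.5, 0.8), "red"), ((2.0, 1.5), "red"), ((1.2, 2.0), "red"),
    ((6.0, 6.5), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue"), ((5.8, 6.2), "blue"),
]
forest = train_forest(rows)
print(predict_forest(forest, (0.9, 1.1)), predict_forest(forest, (7.0, 7.0)))
```

Because every stump sees a slightly different sample, an odd result from any single one tends to be outvoted by the rest.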

6. K-nearest neighbor (KNN) algorithm

K-nearest neighbor (KNN) is a supervised learning algorithm used for classification and predictive modeling.

True to its name, KNN algorithms classify an output by its proximity to other outputs on a graph. For example, if an output is closest to a cluster of blue points on a graph rather than a cluster of red points, then it would be classified as a member of the blue group. This approach means that KNN algorithms can be used to either classify known outcomes or predict the value of unknown ones.
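Both uses of KNN can be sketched in a few lines: a majority vote among the nearest points for classification, and an average of their values for prediction. The function names, the choice of k = 3, straight-line (Euclidean) distance, and the data are assumptions made for this example.

```python
import math
from collections import Counter


def nearest(training, query, k):
    """Return the k training items closest to the query point."""
    return sorted(training, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(training, query, k=3):
    """Classify by majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in nearest(training, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(training, query, k=3):
    """Predict a number by averaging the k nearest neighbors' values."""
    neighbors = nearest(training, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical labeled points on a graph
points = [
    ((1.0, 1.0), "blue"), ((1.5, 2.0), "blue"), ((2.0, 1.5), "blue"),
    ((6.0, 6.0), "red"), ((6.5, 7.0), "red"), ((7.0, 6.5), "red"),
]
# Hypothetical homes: (floor area in square meters,) and price in thousands
homes = [((50.0,), 150.0), ((60.0,), 180.0), ((70.0,), 210.0),
         ((120.0,), 360.0), ((130.0,), 390.0)]

print(knn_classify(points, (2.0, 2.0)))
print(knn_regress(homes, (65.0,)))
```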

7. K-means algorithm

K-means is an unsupervised learning algorithm used for clustering, which means grouping unlabeled data points by similarity rather than assigning them to known classes.

Much like KNN, K-means uses the proximity of a data point to a cluster of other data points to identify it. Each of the clusters is defined by a centroid, a real or imaginary center point for the cluster. K-means is useful on large data sets, though it can falter when handling outliers.
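The role of the centroid can be sketched with the standard two-step loop: assign each point to its nearest centroid, then move each centroid to the average of its points. The starting centroids are passed in by hand here to keep the example repeatable; the function name and the customer data are hypothetical.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket size)
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
centroids, clusters = k_means(customers, [customers[0], customers[3]])
print(centroids)
```

Note that no labels appear anywhere: the groups emerge from the distances alone.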

Training machine learning algorithms: four methods

Everyone learns differently – including machines. In this section, you will learn about four different learning styles used to train machine learning algorithms: supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning. 

Supervised learning

A supervised learning algorithm is trained on a labeled data set, which effectively gives it an answer key for cross-referencing its predictions and refining its system. As a result, supervised learning is best suited to tasks with a specific outcome in mind, such as classifying images.

For example, an algorithm meant to identify different plant types might be trained using images that are already labeled with their names (e.g., “rose,” “pumpkin,” or “aloe vera”). Through supervised learning, the algorithm would be able to identify the differentiating features for each plant classification effectively and eventually do the same with an unlabeled data set. 

Much as a teacher supervises their students in a classroom, the labeled data supervises the algorithm’s solutions and directs them toward the right answer.

Unsupervised learning 

An unsupervised learning algorithm is trained on an unlabeled data set, which it must analyze to identify distinctive features, structures, and anomalies. Unlike supervised learning, researchers use unsupervised learning when they don’t have a specific outcome in mind. Instead, they use the algorithm to cluster data and identify patterns, associations, or anomalies.

For example, a business might feed an unsupervised learning algorithm unlabeled customer data to segment its target market. Once it has established a clear customer segmentation, the business could then use this data to direct its future marketing efforts, like social media marketing.

Unsupervised learning is akin to a learner working out a solution themselves without the supervision of a teacher.

 

Reinforcement learning 

In reinforcement learning, a machine or AI agent attempts to accomplish a task, receives feedback as it does so, and then tries a new approach, repeating the cycle until it has devised the optimal solution. As a result, reinforcement learning is akin to the way that a child learns to navigate a new environment: first, they explore, then interact with it, and over time learn how to move through the space seamlessly.

Due to the feedback loops required to develop better and better strategies, reinforcement learning is often used in video game environments where conditions can be controlled and feedback is reliably given. Over time, the machine or AI learns through the accumulation of feedback until it achieves the optimal path to its goal. 
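This feedback loop can be sketched with tabular Q-learning, one common reinforcement learning method (chosen here as an example; the article does not name a specific one). The environment is a hypothetical five-cell corridor where the agent starts at cell 0 and is rewarded only on reaching the last cell. The parameter values and the fixed random seed are assumptions.

```python
import random

ACTIONS = (-1, +1)  # move left or move right


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Learn action values for a corridor whose goal is the last cell."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random move
            else:
                best = max(q[state])       # exploit: use the best known move
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state = min(max(state + ACTIONS[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0  # the feedback signal
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each non-goal cell, report the move with the highest learned value."""
    return ["right" if row[1] > row[0] else "left" for row in q[:-1]]


q_table = train_q_learning()
print(greedy_policy(q_table))
```

Early episodes wander more or less at random; as rewards accumulate in the table, the agent increasingly heads straight for the goal.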

Semi-supervised learning

Semi-supervised learning (SSL) trains algorithms using a small amount of labeled data alongside a larger amount of unlabeled data. Semi-supervised learning is often used to categorize large amounts of unlabeled data because labeling all of the data by hand might be infeasible or too difficult.

Typically, a researcher using SSL would first train an algorithm with a small amount of labeled data before training it with a large amount of unlabeled data. For example, an SSL algorithm analyzing speech might first be trained on labeled soundbites before being trained on unlabeled sounds, which are likely to vary in pitch and style from the labeled data. 
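The two-stage process can be sketched with self-training, one simple SSL technique (an assumption for this example, since several others exist). A nearest-centroid classifier is first fitted on a few labeled values, then it labels the unlabeled values itself and is refitted on the combined set. The pitch values, labels, and function names are hypothetical.

```python
def centroids_from(labeled):
    """Compute the average value for each label."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled, unlabeled, rounds=5):
    """Fit on labeled data, then repeatedly pseudo-label and refit."""
    centroids = centroids_from(labeled)
    for _ in range(rounds):
        pseudo_labeled = [(x, predict(centroids, x)) for x in unlabeled]
        centroids = centroids_from(labeled + pseudo_labeled)
    return centroids


# Hypothetical voice pitches in Hz: two labeled clips and many unlabeled ones
labeled = [(100.0, "low"), (300.0, "high")]
unlabeled = [90.0, 110.0, 120.0, 130.0, 250.0, 280.0, 310.0, 330.0, 340.0]

centroids = self_train(labeled, unlabeled)
print(centroids, predict(centroids, 150.0))
```

The unlabeled clips shift each centroid away from the single labeled example toward the center of its group, which is the benefit SSL aims for.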

Learn more about machine learning 

A career in machine learning begins with learning all you can about it. Even the best machine learning models need some training first, after all. 

To start your own training, you might consider taking Andrew Ng's beginner-friendly Machine Learning Specialization to master fundamental AI concepts and develop practical machine learning skills. DeepLearning.AI’s Deep Learning Specialization, meanwhile, teaches learners how to build and train deep neural networks.

Article sources

  1. Fortune Business Insights. “The global machine learning (ML) market is expected to grow from $21.17 billion in 2022 to $209.91 billion by 2029,” https://www.fortunebusinessinsights.com/machine-learning-market-102226. Accessed December 2, 2022.
