7 Machine Learning Algorithms to Know: A Beginner's Guide

Written by Coursera Staff • Updated on Nov 29, 2023

Machine learning algorithms power many services in the world today. Here are seven to know as you look to start your career.


Machine learning (ML) can do everything from analysing X-rays to predicting stock market prices to recommending binge-worthy television shows. With such a wide range of applications, it’s little surprise that the global machine learning market is projected to grow from 26.03 billion USD in 2023 to 225.91 billion USD by 2030, according to Fortune Business Insights [1].

At the core of machine learning are algorithms, which are trained to become the machine learning models used to power some of the most impactful innovations in the world today. In this article, you will learn about seven of the most important ML algorithms to know and explore the different learning styles used to turn ML algorithms into ML models.

Top machine learning algorithms to know

From classification to regression, here are seven algorithms you need to know:

1. Linear regression

Linear regression is a supervised learning algorithm used to predict and forecast values within a continuous range, such as sales numbers or prices.

Originating from statistics, linear regression performs a regression task. It models the relationship between an input value (X) and an output value (Y) as a straight line with a constant slope, then uses that line to predict a numeric value or quantity.

Linear regression uses labelled data to make predictions by establishing a line of best fit, or 'regression line', that is approximated from a scatter plot of data points. As a result, linear regression is used for predictive modelling rather than categorisation.
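The sketch below makes the line of best fit concrete. It fits a single-input regression line using ordinary least squares, the most common way of computing it. The function names and the advertising-spend figures are hypothetical and are included only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of X and Y divided by the variance of X.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a prediction off the regression line."""
    return slope * x + intercept


# Hypothetical data: advertising spend (X, in thousands) vs. sales (Y, thousands of units)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
print(f"forecast at spend = 6: {predict(slope, intercept, 6.0):.2f}")
```

Because the output is a number on a continuous scale, the same line can forecast values outside the observed range. This is why the technique suits predictive modelling.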

2. Logistic regression

Logistic regression, or 'logit regression', is a supervised learning algorithm used for binary classification, such as deciding whether or not an image belongs to a particular class.

Originating from statistics, logistic regression technically predicts the probability that an input belongs to a single primary class. In practice, however, this probability can be used to group outputs into one of two categories ('the primary class' or 'not the primary class'). This is achieved by setting a threshold for binary classification. For example, any output between 0 and 0.49 is put in one group, and any output between 0.50 and 1.00 is put in the other.

As a result, logistic regression in machine learning is typically used for binary categorisation rather than predictive modelling.
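The following sketch trains a logistic regression model and then applies a 0.5 threshold to its predicted probability. It assumes a single numeric input and uses plain batch gradient descent; libraries typically use more sophisticated optimisers. The data, the learning rate and the names `train_logistic` and `classify` are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss with respect to w * x + b
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x, threshold=0.5):
    """Return 1 ('the primary class') if the probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied vs. whether a student passed (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)
for h in (1.0, 4.5):
    print(f"{h} hours -> p = {sigmoid(w * h + b):.2f}, class = {classify(w, b, h)}")
```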

3. Naive Bayes

Naive Bayes is a set of supervised learning algorithms used to create predictive models for either binary or multiclass classification. Naive Bayes is based on Bayes’ theorem and works with conditional probabilities. It assumes that each feature is independent of the others given the class (the 'naive' assumption). It then combines the features' individual probabilities to estimate the likelihood of each classification.

For example, a program created to identify plants might use a Naive Bayes algorithm to categorise images based on particular factors, such as perceived size, colour, and shape. Although the algorithm treats each of these factors as independent, it estimates the likelihood of an object being a particular plant using the factors combined.
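Below is a minimal sketch of a categorical Naive Bayes classifier for the plant example. It assumes each plant is described by three hypothetical categorical features: size, colour and leaf shape. It applies add-one (Laplace) smoothing, which is a common but not mandatory choice. The training data is invented for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count how often each class, and each feature value within each class, occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    vocab = defaultdict(set)             # feature index -> distinct values seen in training
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_naive_bayes(model, features):
    """Pick the class with the highest P(class) * product of P(feature | class)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(features):
            # Naive assumption: features are independent given the class, so their
            # probabilities simply multiply. Add-one smoothing stops an unseen value
            # from forcing the whole product to zero.
            score *= (value_counts[(label, i)][value] + 1) / (count + len(vocab[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical plant data: (size, colour, leaf shape) -> plant
samples = [
    ("small", "red", "oval"), ("small", "red", "oval"), ("medium", "red", "oval"),
    ("large", "orange", "broad"), ("large", "orange", "broad"), ("medium", "orange", "broad"),
    ("medium", "green", "spiky"), ("small", "green", "spiky"), ("medium", "green", "spiky"),
]
labels = ["rose"] * 3 + ["pumpkin"] * 3 + ["aloe vera"] * 3
model = train_naive_bayes(samples, labels)
print(predict_naive_bayes(model, ("large", "orange", "broad")))
```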

4. Decision tree

A decision tree is a supervised learning algorithm used for classification and predictive modelling.

Resembling a graphic flowchart, a decision tree begins with a root node, which asks a specific question of the data and then sends it down a branch depending on the answer. These branches each lead to an internal node, which asks another question of the data before directing it toward another branch, depending on the answer. This continues until the data reaches an end node, also called a leaf node, that doesn’t branch any further.

Decision trees are common in machine learning because they are easy to interpret and can handle complex data sets with relative simplicity.
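The sketch below grows a small decision tree from numeric data. At each node it picks the question (a feature and a threshold) that best separates the classes, measuring this with Gini impurity as CART-style trees do. It is a simplified illustration rather than a production implementation. The plant measurements and function names are hypothetical, and the tree depth is capped at three levels as an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) question that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Grow the tree recursively; each leaf node stores the majority label of its rows."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    go_left = [row[f] <= t for row in rows]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g], max_depth - 1),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g], max_depth - 1),
    }


def predict_tree(node, row):
    """Start at the root node and follow branches until a leaf node is reached."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Hypothetical plant measurements: (height in cm, leaf width in cm)
rows = [(30, 2), (35, 3), (40, 2), (20, 25), (25, 30), (30, 28), (50, 4), (60, 5), (55, 3)]
labels = ["rose"] * 3 + ["pumpkin"] * 3 + ["aloe vera"] * 3
tree = build_tree(rows, labels)
print(predict_tree(tree, (32, 2)))
```

Capping the depth is one simple way to limit how closely a single tree can fit its training data.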

5. Random forest algorithm

A random forest algorithm uses an ensemble of decision trees for classification and predictive modelling.

In a random forest, many decision trees (sometimes hundreds or even thousands) are each trained on a random sample of the training set, drawn with replacement (a method known as 'bagging'). The algorithm then feeds the same input into each decision tree in the random forest and tallies their results. For classification, the most common result is selected as the most likely outcome. For predictive modelling of numeric values, the trees' results are typically averaged instead.

Random forests can become complex and time-consuming to train. In exchange, they reduce the problem of 'overfitting' that is common in individual decision trees. Overfitting occurs when an algorithm fits its training data set too closely, which can hurt its accuracy when it is later introduced to new data.
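The following sketch builds a small random forest from decision trees to show bagging and majority voting together. For brevity it makes several assumptions. Each tree considers one randomly chosen feature per split, the forest has 25 trees and a fixed random seed, and the plant measurements are hypothetical. Real libraries expose all of these as tunable settings.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, rng, n_features=1, max_depth=3):
    """Grow one decision tree, considering only a random subset of features at each split."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return {"leaf": majority}
    best, best_score = None, gini(labels)
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if score < best_score:
                    best, best_score = (f, t), score
    if best is None:
        return {"leaf": majority}
    f, t = best
    go_left = [row[f] <= t for row in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left_rows, left_labels, rng, n_features, max_depth - 1),
        "right": build_tree(right_rows, right_labels, rng, n_features, max_depth - 1),
    }


def predict_tree(node, row):
    """Follow branches from the root node down to a leaf node."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each tree is trained on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Feed the same input to every tree and return the most common vote."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical plant measurements: (height in cm, leaf width in cm)
rows = [(30, 2), (35, 3), (40, 2), (32, 4), (38, 3), (20, 25), (25, 30), (22, 28), (18, 26), (24, 27)]
labels = ["rose"] * 5 + ["pumpkin"] * 5
forest = train_forest(rows, labels)
print(predict_forest(forest, (45, 1)), predict_forest(forest, (15, 35)))
```

Each tree sees a slightly different sample, so their individual mistakes tend to differ. The vote smooths those mistakes out.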

6. K-nearest neighbour (KNN) algorithm

The K-nearest neighbour (KNN) algorithm is a supervised learning algorithm for classification and predictive modelling.

True to its name, a KNN algorithm classifies a new data point by its proximity to the K closest labelled points on a graph. For example, if a new point is closest to a cluster of blue points rather than a cluster of red points, it would be classified as a member of the blue group. The same approach lets KNN algorithms predict numeric values as well. Instead of taking a majority vote, they average the values of the nearest neighbours.
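Here is a minimal KNN sketch that uses Euclidean distance, although other distance measures are also common. It includes a classification function, which takes a majority vote, and a regression function, which averages the neighbours' values. The points, colours and prices are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_points, targets, query, k):
    """Return the targets of the k training points closest to `query` (Euclidean distance)."""
    ranked = sorted(zip(train_points, targets), key=lambda pair: math.dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train_points, labels, query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    return Counter(k_nearest(train_points, labels, query, k)).most_common(1)[0][0]


def knn_regress(train_points, values, query, k=3):
    """Predictive modelling of a number: mean of the k nearest neighbours' values."""
    return sum(k_nearest(train_points, values, query, k)) / k


# Hypothetical data: a blue cluster near (1, 1) and a red cluster near (5, 5)
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
colours = ["blue"] * 3 + ["red"] * 3
prices = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
print(knn_classify(points, colours, (1.5, 1.5)), knn_regress(points, prices, (1.5, 1.5)))
```

KNN has no separate training step. It simply stores the labelled points and does all its work at prediction time, which can become slow on very large data sets.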

7. K-Means algorithm

K-Means is an unsupervised learning algorithm used for clustering.

Much like KNN, K-Means uses proximity to group data points. Each of its K clusters is defined by a centroid, a real or imaginary centre point for the cluster. The algorithm assigns every data point to its nearest centroid and moves each centroid to the mean of the points assigned to it. It repeats these two steps until the clusters stop changing. K-Means is useful for clustering large data sets, though it can falter when handling outliers.
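The two alternating steps of K-Means, often called Lloyd's algorithm, can be sketched as follows. To make the result deterministic, this version seeds the centroids with a simple 'farthest point' rule. That rule is an assumption of the sketch, since many libraries use random or k-means++ initialisation instead. The points are hypothetical.

```python
import math


def mean(points):
    """Centroid of a group of points: the average of each coordinate."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, max_iters=100):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    # Farthest-point initialisation (an assumption for this sketch): start with the
    # first point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled data with two natural groups
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

An outlier far from both groups would drag its cluster's mean towards it. This illustrates why K-Means can falter on such data.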

Training machine learning algorithms: Four methods

Everyone learns differently—including machines. In this section, you will learn about four different learning styles used to train machine learning algorithms: supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning.

Supervised learning

Supervised learning uses a labelled data set to train an algorithm. The labels effectively act as an answer key against which the algorithm can check its predictions and refine its model. As a result, supervised learning is best suited to problems with a specific outcome in mind, such as classifying images.

For example, an algorithm meant to identify different plant types might be trained using images already labelled with their names (e.g., 'rose', 'pumpkin', or 'aloe vera'). Through supervised learning, the algorithm would be able to identify the differentiating features for each plant classification effectively and eventually do the same with an unlabelled data set.

Much as a teacher supervises their students in a classroom, the labelled data likewise supervises the algorithm’s solutions and directs them towards the right answer.

Unsupervised learning

Unsupervised learning uses an unlabelled data set to train an algorithm, which must analyse the data on its own to identify distinctive features, structures, and anomalies. Researchers use unsupervised learning when they don’t have a specific outcome in mind. Instead, they use the algorithm to cluster data and uncover patterns and associations.

For example, a business might feed an unsupervised learning algorithm unlabelled customer data to segment its target market. Once they have established a clear customer segmentation, the business could use this data to direct future marketing efforts, like social media marketing.

Unsupervised learning is akin to a learner working out a solution themselves without the supervision of a teacher.

Reinforcement learning

In reinforcement learning, a machine or artificial intelligence (AI) agent attempts to accomplish a task and receives feedback as it does so. It then iterates on its approach until it has devised the optimal solution. Reinforcement learning is therefore akin to how a child adapts to a new environment. First they explore it, then they interact with it, and over time they learn to adapt to the space seamlessly.

Due to the feedback loops required to develop better strategies, reinforcement learning is often used in video game environments where conditions can be controlled and feedback is reliably given. Over time, the machine or AI learns through the accumulation of feedback until it achieves the optimal path to its goal.
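Tabular Q-learning, sketched below, illustrates the loop of attempting, receiving feedback and iterating. It is one common reinforcement learning technique; the article does not name a specific one. The toy environment is hypothetical: an agent in a five-cell corridor earns a reward only for reaching the last cell. The learning rate (`alpha`), discount factor (`gamma`) and exploration rate (`epsilon`) are illustrative values, not recommendations.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning in a toy corridor: the agent starts in cell 0 and is
    rewarded with 1.0 only on reaching the last cell. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] estimates the discounted future reward of an action in a state.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random some of the time (or when estimates tie); otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Feedback: move the estimate towards reward + discounted value of the best next move.
            future = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(policy)
```

Early episodes are mostly random wandering. As feedback accumulates, the estimates favour moving right in every cell, which is the shortest path to the goal.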

Semi-supervised learning

Semi-supervised learning (SSL) trains algorithms using a small amount of labelled data alongside a larger amount of unlabelled data. It is often used to categorise large amounts of unlabelled data when labelling all of it would be infeasible or too difficult.

Typically, a researcher using SSL would first train an algorithm with a small amount of labelled data before training it with a large amount of unlabelled data. For example, an SSL algorithm analysing speech might first be trained on labelled soundbites before being trained on unlabelled sounds, likely to vary in pitch and style from the labelled data.
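One common way to combine a small labelled set with a larger unlabelled one is self-training, also called pseudo-labelling. The article does not prescribe a particular technique, so treat this as one illustrative option. The sketch uses a simple nearest-centroid classifier and a hypothetical confidence `margin`. Its invented two-dimensional vectors stand in for features extracted from soundbites, and the speaker labels are hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Nearest-centroid 'training': average the points of each class."""
    by_class = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    return {y: tuple(sum(axis) / len(ps) for axis in zip(*ps)) for y, ps in by_class.items()}


def self_train(labelled, labels, unlabelled, rounds=3, margin=1.0):
    """Train on the labelled data, pseudo-label confident unlabelled points, then retrain."""
    points, targets = list(labelled), list(labels)
    remaining = list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(points, targets)
        still_unlabelled = []
        for p in remaining:
            ranked = sorted((math.dist(p, c), y) for y, c in centroids.items())
            # Only trust a prediction if the nearest class is clearly closer than the next.
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                points.append(p)
                targets.append(ranked[0][1])
            else:
                still_unlabelled.append(p)
        remaining = still_unlabelled
    return fit_centroids(points, targets), remaining


# Hypothetical feature vectors: two labelled examples and several unlabelled ones
labelled = [(1.0, 1.0), (9.0, 9.0)]
labels = ["speaker_a", "speaker_b"]
unlabelled = [(2, 1), (1, 3), (3, 2), (8, 9), (9, 7), (7, 8), (5, 5)]
centroids, leftover = self_train(labelled, labels, unlabelled)
print(centroids, leftover)
```

The margin controls how cautious the pseudo-labelling is. In this data, the ambiguous point halfway between the two groups is never given a label.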

Learn more about machine learning

A career in machine learning begins with learning all you can about it. Even the best machine learning models need some training first, after all.

To start your own training, you might consider taking Andrew Ng's beginner-friendly Machine Learning Specialisation on Coursera to master fundamental AI concepts and develop practical machine learning skills. DeepLearning.AI’s Deep Learning Specialisation, meanwhile, teaches learners how to build and train deep neural networks.

Article sources

[1] Fortune Business Insights. “The global machine learning (ML) market is expected to grow from $21.17 billion in 2022 to $209.91 billion by 2029.” https://www.fortunebusinessinsights.com/machine-learning-market-102226. Accessed March 1, 2023.

