Explore what substring extraction in Excel is, why it’s important, and what functions you need to know to use this technique effectively.
![[Featured Image] A person sits at an office desk and uses Excel substring functions on two computer monitors.](https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://images.ctfassets.net/wp1lcwdav1p1/6oDG3IewQMs6eza52e3ko0/0c2197b092064a9089bd1a38e993d0db/GettyImages-1572438118.jpg?w=1500&h=680&q=60&fit=fill&f=faces&fm=jpg&fl=progressive&auto=format%2Ccompress&dpr=1&w=1000)
Extracting substrings in Excel allows you to organize, manage, and process your data in a way that makes downstream analysis more efficient.
A substring is part of a text string, and extracting substrings lets you pull out meaningful information and organize your data inputs.
Excel functions such as MID, LEFT, RIGHT, and FIND allow you to extract substrings effectively within your workbook.
You can choose to use Excel functions in isolation or in combination to process and manage your data.
Learn more about the key functions that can help you extract substrings and gain powerful insights into your data set. You can also start learning with the Excel Skills for Data Analytics and Visualization Specialization. In as little as four weeks, you can discover how to bring your data to life using advanced Excel functions, creative visualizations, and powerful automation features. By the end, you’ll have a shareable certificate to add to your professional profile.
An Excel substring is part of a text string. For example, this might be the “CA” in “San Diego, CA” or the “marksmith” in the email address “marksmith@gmail.com”. You can use substring techniques to turn real-world, unstructured data into organized, usable data, such as splitting full names into separate columns, extracting department codes from employee IDs, or pulling years from birthdates.
Excel doesn’t have a function called SUBSTRING. Instead, it offers several text functions that extract or clean up portions of your text, including MID, LEFT, RIGHT, FIND, LEN, SUBSTITUTE, REPT, TRIM, LOWER, and UPPER. By learning how each works, you can determine the best way to find and organize the information in your data set.
Being able to pull substrings from your text matters because real-world data rarely arrives perfectly formatted. Customer first and last names might be merged into one column, transaction records might be combined with timestamps, imported data might contain extraneous characters, or other structural problems might make your data set difficult to work with. Substring functions give you precise tools to parse this information accurately, improving your ability to analyze, report, and integrate your data.
Ctrl+F4 in Excel closes the selected workbook window without closing the entire Excel application. This is a quick way to close one workbook when you have several open, while leaving the others in place. To close the entire Excel application, use Alt+F4 instead.
Once you have a grasp on your raw data, you can use Excel functions individually or in combination to organize and extract relevant information. Starting with the basics, such as LEFT, RIGHT, MID, and FIND, can help you build a foundation to explore and manage your data.
When you want a substring from the start of a cell, like a prefix or country code, use LEFT. The base formula is:
=LEFT(text, [num_chars])
In this formula, “text” is the text string or a reference to the cell containing it. “Num_chars” is optional and specifies how many characters to extract. If omitted, the formula assumes one character.
For example, you might have a set of phone numbers, each in the format “XXX-YYY-ZZZZ,” where XXX represents the area code. To extract the area codes, you could use the formula: =LEFT(A2, 3)
With these specifications, the formula would return the first three characters, starting at the leftmost digit in cell A2.
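If you also work with Python (for example, alongside Excel for larger cleaning jobs), the minimal sketch below mimics LEFT so you can see its rules spelled out. The helper name `left` and the use of `ValueError` to stand in for Excel’s #VALUE! error are assumptions for illustration, not how Excel implements the function, and the phone number is a hypothetical sample.

```python
def left(text: str, num_chars: int = 1) -> str:
    """Mimic Excel's LEFT: return the first num_chars characters of text."""
    # Excel rejects a negative length; ValueError stands in for #VALUE! here.
    if num_chars < 0:
        raise ValueError("num_chars must be zero or greater")
    # Slicing past the end returns the whole string, just as LEFT returns
    # the full text when num_chars exceeds its length.
    return text[:num_chars]


# Hypothetical phone number in the "XXX-YYY-ZZZZ" format.
print(left("619-555-0123", 3))  # 619
print(left("619-555-0123"))     # 6 (num_chars defaults to one)
```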
The RIGHT function is essentially the opposite of LEFT: it extracts characters from the end of a string. The base formula is:
=RIGHT(text, [num_chars])
As with the LEFT function, “text” specifies the text string or a reference to the cell containing it, while “num_chars” is the optional number of characters to extract. If omitted, num_chars is assumed to be one.
For example, you might use RIGHT to extract the last four digits of Social Security numbers. If your data set stores them in the format “XXX-YY-ZZZZ,” you could use the formula: =RIGHT(A2, 4)
This formula would return the last four digits of the string in cell A2.
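Here is a comparable Python sketch of RIGHT, assuming a hypothetical helper named `right` and using `ValueError` in place of #VALUE!. The explicit zero check is a Python detail rather than an Excel one: a slice like `text[-0:]` returns the whole string, so a zero length has to be handled separately to return empty text as RIGHT does. The sample value is hypothetical.

```python
def right(text: str, num_chars: int = 1) -> str:
    """Mimic Excel's RIGHT: return the last num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be zero or greater")  # like #VALUE!
    # text[-0:] would return the whole string, so handle zero explicitly.
    if num_chars == 0:
        return ""
    # A length longer than the text returns the whole text, as in Excel.
    return text[-num_chars:]


# Hypothetical value in the "XXX-YY-ZZZZ" format.
print(right("123-45-6789", 4))  # 6789
```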
The MID function allows you to extract characters from any position within your string. Its raw syntax is:
=MID(text, start_num, num_chars)
In this formula, “text” is the text string or a reference to the cell containing it, “start_num” is the position of the first character you want to extract, and “num_chars” is the number of characters you want returned.
For example, say you have birthdate data in the format of DDMMYYYY (day, month, year) and you want to extract the month. You could use the formula: =MID(A2, 3, 2)
This formula returns two characters, starting at the third character in cell A2. If A2 contained the string “19021970”, corresponding to February 19, 1970, your formula cell would display “02” for February.
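The main thing to watch when translating MID is that Excel counts positions from 1, while Python counts from 0. The sketch below, built around a hypothetical helper named `mid`, shifts start_num by one before slicing; raising `ValueError` for invalid arguments is an assumption standing in for #VALUE!.

```python
def mid(text: str, start_num: int, num_chars: int) -> str:
    """Mimic Excel's MID: num_chars characters starting at start_num.

    start_num is 1-based, as in Excel, so it is shifted by one for
    Python's 0-based slicing.
    """
    if start_num < 1 or num_chars < 0:
        raise ValueError("start_num must be >= 1 and num_chars >= 0")
    begin = start_num - 1
    # A start position past the end of the text yields empty text.
    return text[begin:begin + num_chars]


birthdate = "19021970"  # DDMMYYYY: February 19, 1970
print(mid(birthdate, 3, 2))  # 02 (month)
print(mid(birthdate, 5, 4))  # 1970 (year)
```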
The FIND function locates one text string within another by returning the position of the character where it first appears. For example, =FIND("@", "user@example.com") returns 5, the position of the “@.” FIND is case-sensitive, and you can combine it with other functions to locate a position and then perform follow-up operations.
The base function is:
=FIND(find_text, within_text, [start_num])
With this formula, “find_text” is the text you want to find, “within_text” is the string containing the text information you’re looking for, and “start_num” is an optional character location to start your search.
Continuing the previous example, if you had a list of email addresses and wanted to locate the position of the “@” symbol, you could use the formula: =FIND("@", A2, 1)
This would indicate that you were searching for the “@” symbol in cell A2, starting with the first character.
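FIND is most useful in combination. A common pattern is =LEFT(A2, FIND("@", A2) - 1), which returns everything before the “@” in an email address. The Python sketch below mimics FIND’s 1-based, case-sensitive search and then reproduces that combination; the helper names `find` and `left` are hypothetical, and raising `ValueError` stands in for #VALUE!.

```python
def find(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Mimic Excel's FIND: 1-based, case-sensitive position of find_text."""
    if start_num < 1 or start_num > len(within_text):
        raise ValueError("start_num is out of range")  # like #VALUE!
    # str.find is case-sensitive too and returns -1 when nothing matches.
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        raise ValueError(f"{find_text!r} not found")  # like #VALUE!
    return index + 1  # convert Python's 0-based index to a 1-based position


def left(text: str, num_chars: int = 1) -> str:
    """Simplified stand-in for Excel's LEFT: the first num_chars characters."""
    return text[:num_chars]


print(find("@", "user@example.com"))  # 5

# Equivalent of =LEFT(A2, FIND("@", A2) - 1): the part before the "@".
email = "marksmith@gmail.com"
print(left(email, find("@", email) - 1))  # marksmith
```

Subtracting 1 matters here: FIND returns the position of the “@” itself, and you want only the characters before it.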
In addition to LEFT, RIGHT, MID, and FIND, a few additional functions can help you manage, manipulate, and edit your data to create cleaner data sets. Some useful functions to begin with include:
The LEN function returns the number of characters in a string. The base formula is:
=LEN(text)
Here, “text” is the cell reference or string. For example, =LEN(A2) returns the number of characters in the string in A2. If cell A2 contained “Washington, DC,” the LEN result would be 14, which counts each letter, the comma, and the space.
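LEN often supplies the arithmetic for other functions. For instance, =RIGHT(A2, LEN(A2) - FIND(",", A2) - 1) returns the text after “, ” by subtracting the comma’s position and the following space from the total length. The Python sketch below mirrors that arithmetic, with Python’s built-in `len` playing the role of LEN for plain text like this. The `find` and `right` helpers are simplified, hypothetical stand-ins, and the sketch assumes exactly one space follows the comma.

```python
def find(find_text: str, within_text: str) -> int:
    """Simplified stand-in for Excel's FIND: 1-based, case-sensitive."""
    index = within_text.find(find_text)
    if index == -1:
        raise ValueError(f"{find_text!r} not found")  # like #VALUE!
    return index + 1


def right(text: str, num_chars: int) -> str:
    """Simplified stand-in for Excel's RIGHT."""
    return text[-num_chars:] if num_chars > 0 else ""


city = "Washington, DC"
print(len(city))  # 14: twelve letters, a comma, and a space

# Equivalent of =RIGHT(A2, LEN(A2) - FIND(",", A2) - 1)
# 14 total - 11 (comma position) - 1 (space) = 2 characters to keep
print(right(city, len(city) - find(",", city) - 1))  # DC
```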
The SUBSTITUTE function replaces specific text in a string with a new value. Like the FIND function, the SUBSTITUTE function is case-sensitive. This formula is:
=SUBSTITUTE(text, old_text, new_text, [instance_num])
In this formula, “text” is the text string or cell reference, “old_text” is the text you want to replace, “new_text” is the replacement text, and “instance_num” optionally specifies which occurrence to replace. If it isn’t specified, all occurrences are replaced.
For example, =SUBSTITUTE(A2, "2025", "2026") returns the text from cell A2 with every instance of 2025 replaced by 2026.
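The optional instance_num argument is the part of SUBSTITUTE that takes the most thought, so the Python sketch below walks through it explicitly. The helper name `substitute` is hypothetical; returning the text unchanged when old_text is empty or when there are fewer occurrences than requested, and raising `ValueError` for an instance_num below 1, are assumptions meant to approximate Excel’s behavior. The sample cell contents are hypothetical.

```python
from typing import Optional


def substitute(text: str, old_text: str, new_text: str,
               instance_num: Optional[int] = None) -> str:
    """Mimic Excel's SUBSTITUTE: case-sensitive text replacement."""
    if old_text == "":
        return text  # nothing to match, so return the text unchanged
    if instance_num is None:
        # No instance given: replace every occurrence.
        return text.replace(old_text, new_text)
    if instance_num < 1:
        raise ValueError("instance_num must be 1 or greater")  # like #VALUE!
    # Walk forward to the requested occurrence (non-overlapping matches).
    pos, start = -1, 0
    for _ in range(instance_num):
        pos = text.find(old_text, start)
        if pos == -1:
            return text  # fewer occurrences than requested: unchanged
        start = pos + len(old_text)
    return text[:pos] + new_text + text[pos + len(old_text):]


note = "Budget 2025, forecast 2025"  # hypothetical cell contents
print(substitute(note, "2025", "2026"))     # Budget 2026, forecast 2026
print(substitute(note, "2025", "2026", 2))  # Budget 2025, forecast 2026
print(substitute("Apple apple", "apple", "pear"))  # Apple pear
```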
The REPT function repeats specified text a certain number of times. The base formula is:
=REPT(text, number_times)
In this formula, “text” is the text string or cell reference, and “number_times” is the number of times to repeat the text. For example:
=REPT("*", 5) would display *****.
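In Python, multiplying a string by an integer repeats it, so REPT maps over directly. The sketch below uses a hypothetical helper named `rept` and adds the rules Microsoft documents for REPT: a non-integer count is truncated, and a result longer than 32,767 characters is an error. `ValueError` stands in for #VALUE!. The padding line shows one common use, the equivalent of =REPT("0", 6 - LEN(A2)) & A2 for zero-padding a hypothetical ID.

```python
MAX_CELL_CHARS = 32767  # Excel's documented limit on REPT's result length


def rept(text: str, number_times: float) -> str:
    """Mimic Excel's REPT: repeat text number_times times."""
    times = int(number_times)  # non-integer counts are truncated
    if times < 0:
        raise ValueError("number_times must not be negative")  # like #VALUE!
    # Check the length before building the string to avoid huge allocations.
    if len(text) * times > MAX_CELL_CHARS:
        raise ValueError("result longer than 32,767 characters")
    return text * times


print(rept("*", 5))  # *****

# Equivalent of =REPT("0", 6 - LEN(A2)) & A2 to zero-pad an ID.
order_id = "42"  # hypothetical value
print(rept("0", 6 - len(order_id)) + order_id)  # 000042
```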
The TRIM function removes extraneous spaces from a string of text while keeping the single spaces between words. This can help you standardize your strings if you have irregular or messy inputs. The base function is:
=TRIM(text)
For example, =TRIM(A2) removes all spaces at the beginning and end of the contents of A2 and reduces any run of spaces between words to a single space.
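Microsoft’s documentation describes TRIM as removing the regular space character (ASCII 32), not tabs or nonbreaking spaces. The Python sketch below follows that rule, using a hypothetical helper named `trim`. It deliberately splits on `" "` rather than calling `str.split()` with no argument, because the no-argument form would also split on tabs and other whitespace. The messy sample value is hypothetical.

```python
def trim(text: str) -> str:
    """Mimic Excel's TRIM for the regular space character (ASCII 32)."""
    # Leading, trailing, and repeated spaces produce empty strings in the
    # split result; dropping them collapses every run to a single space.
    return " ".join(part for part in text.split(" ") if part)


messy = "   Mark    Smith  "  # hypothetical messy import
print(repr(trim(messy)))  # 'Mark Smith'
```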
Converting all of your text strings to lowercase or uppercase can help with organization and make case-sensitive functions like FIND easier to use. The base formulas for these functions are:
=LOWER(text)
=UPPER(text)
For example, =LOWER(A2) returns the string in A2 converted to lowercase, while =UPPER(A2) returns it converted to uppercase.
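Changing case pays off most when you pair it with FIND. Lowercasing the text first, as in =FIND("smith", LOWER(A2)), turns FIND into a case-insensitive search. In the Python sketch below, the built-in `str.lower` and `str.upper` play the roles of LOWER and UPPER for plain English text like this. The `find` helper is a simplified, hypothetical stand-in, and the cell value is a hypothetical sample.

```python
def find(find_text: str, within_text: str) -> int:
    """Simplified stand-in for Excel's case-sensitive FIND (1-based)."""
    index = within_text.find(find_text)
    if index == -1:
        raise ValueError(f"{find_text!r} not found")  # like #VALUE!
    return index + 1


name = "Mark SMITH"  # hypothetical cell contents
print(name.lower())  # mark smith  (like =LOWER(A2))
print(name.upper())  # MARK SMITH  (like =UPPER(A2))

# find("smith", name) would raise, because the search is case-sensitive.
# Lowercasing the text first, like =FIND("smith", LOWER(A2)), finds it.
print(find("smith", name.lower()))  # 6
```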
When you begin working with your data in Excel, taking time to ensure it is clean, complete, and saved properly can help avoid costly mistakes and allow for streamlined data processing later on. Some tips to keep in mind:
Always create a backup copy of the original data.
Check spelling and grammar to ensure values are consistent.
Remove duplicate rows by filtering for unique values.
Find and replace text that is messy or unnecessary.
Change the case of your text to standardize search processes (when appropriate).
Fix numbers, number signs, dates, and times as appropriate to make the analysis easier.
Merge, split, and rearrange columns in a way that makes sense for your data import.
If you’re working with the same type of data repeatedly, you might explore how to create a macro or write code to automate the process. In some cases, this might involve automatically filling in worksheet cells, creating and formatting tables, and automating repetitive tasks.
Substring functions are typically fast and flexible, and they don’t require add-ins, making them an accessible way to manage your data without a programming background. Plus, when your formula references cells rather than containing the text itself, the output updates automatically whenever the cell contents change. You can also copy a formula across many rows or columns, which helps with larger manipulation tasks.
However, if your data is very complex or has irregular formatting, your formulas may be limited in their capacity to address more complicated data processing tasks. When necessary, combining Excel with programming languages such as SQL, R, or Python can help you more comprehensively process and clean complex data sets.
In Excel, Ctrl+H opens the Replace dialog, which lets you replace text or numbers. You can also open it by going to Home > Editing > Find & Select > Replace. From there, you can type the string you want to find, the value you want to replace it with, and where you want to search (the current sheet or the entire workbook).
While learning how to extract substrings is an important part of mastering Excel, this skill becomes even more powerful when combined with other Excel and data analytics techniques. When getting started, consider building a strong foundation by taking the time to learn how to appropriately enter your data, apply cell borders, apply cell shading, create simple formulas, and put your data in a table.
After you’ve mastered these basics, you can move on to more advanced Excel skills, such as sorting and filtering your data and using Quick Analysis to get an overview of it. Once you understand how to enter, store, and clean your data, including extracting substrings, you can move to data analytics topics in areas such as descriptive, diagnostic, predictive, and prescriptive analytics. The right analytical techniques to explore will depend on your data types, objectives, and available resources.
Editorial Team
Coursera’s editorial team is comprised of highly experienced professional editors, writers, and fact...
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.