Big data refers to large data sets that can be studied to reveal patterns, trends, and associations. The vast number of data collection avenues that exist means that data now comes in larger quantities, is gathered much more quickly, and exists in a wider variety of formats than ever before (characteristics called the three Vs of big data: volume, velocity, and variety). This new, larger, and more complex data is collectively called big data.
Though there is no threshold that separates big data from traditional data, big data is generally considered to be “big” because it cannot be processed effectively and quickly enough by older data analysis tools.
Big data can come from:
Smart (Internet of Things) devices: A connection to the internet enables companies to collect data through devices like smart home systems, robotic vacuum cleaners, smart TVs, and wearable fitness trackers.
Social media: Likes, shares, posts, comments, how long you spend looking at a post—all of this information is considered insightful data about people’s behavior, sentiment, and preferences.
Websites: Companies or other website owners can track page visits, visitors’ general locations, how long audiences spend on a page, which links are clicked most, and cursor movement.
Business transactions: Data can come from customers as they purchase products, online and in person. Price, time of purchase, payment methods, and other details can inform a business about customer demand for their products.
Machinery: Even without an internet connection, machines like road cameras, sensors, and medical equipment can record information.
Healthcare: The healthcare system is full of data. Data analysts can use aggregated information on healthcare records, insurance, and patient summaries to drive new insights and enhance patient care.
Government: City, state, and federal governments can use data from many sources—auto traffic information, agricultural yields, weather tracking systems, demographic information from censuses, to name a few—to make policy decisions.
Emerging information technology has allowed data to be collected, stored, and analyzed at unprecedented scales. The internet continues to be adopted by new users in the US and across the globe, and developing technologies have allowed internet integration into many different products, creating numerous new sources of data. The millions of people watching Netflix, using Google, or buying products online every day contribute to the increasing volume and sophistication of big data.
Big data can be used by almost any entity to make decisions about its operations. A business, for example, can analyze the data it collects to better understand customer preferences. Big data in healthcare systems can be used to find common symptoms of diseases, or to decide how many staff members to schedule on a hospital floor at any given time. Governments may use traffic data to plan new roads, or track crime rates or terrorism risks to adjust their response accordingly.
Data analysts and other professionals who work with big data may use the following tools and methods:
Predictive analytics: Analysts can use data to predict the likelihood of events or trends in the future by using predictive models and machine learning technology.
Data mining: Data mining refers to a process that combs through vast amounts of data to find patterns, trends, and correlations. Finding relationships between data points is key to helping organizations make decisions.
Machine learning: Machine learning, a form of artificial intelligence in which systems learn from data and improve with experience, helps to predict trends and find patterns in large sets of data. Machine learning can be useful in adapting to new data influxes.
Deep learning: Deep learning is a subset of machine learning that is based on artificial neural networks and mimics the learning process of the human brain. Deep learning is often used in speech and text recognition, and computer vision technology.
Data warehouses: Data warehouses store large amounts of historical data. The data is typically cleaned and organized, and can be accessed at a later date to be analyzed.
Hadoop: Hadoop is an open-source software framework used to store and process vast amounts of data across clusters of computers. Because it scales easily and can store various types of data at once, it has become a go-to platform for processing big data.
Apache Spark: Apache Spark is an open-source framework for large-scale data processing that includes built-in libraries for machine learning. In many cases, it can analyze large sets of data more quickly than Hadoop.
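To make the idea of processing data across a cluster more concrete, here is a minimal sketch in plain Python of the split, process, and combine pattern that frameworks like Hadoop and Spark apply at much larger scale. It does not use either framework's actual API. The purchase records, the partition layout, and the function names (map_partition, merge_counts, count_purchases) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical purchase records, already split into partitions the way a
# cluster might spread them across several machines.
partitions = [
    ["laptop", "phone", "laptop"],
    ["phone", "tablet"],
    ["laptop", "tablet", "tablet"],
]


def map_partition(records):
    """Map step: count products within one partition, independently of the others."""
    return Counter(records)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into a single count."""
    return left + right


def count_purchases(parts):
    """Count purchases per product by mapping each partition, then merging."""
    # Each partition is handled on its own, so on a real cluster this step
    # could run on different machines at the same time.
    partial_counts = [map_partition(p) for p in parts]
    return reduce(merge_counts, partial_counts, Counter())


totals = count_purchases(partitions)
print(totals.most_common())
```

Because each partition is counted without looking at the others, the work can be divided among as many machines as there are partitions, which is roughly why this style of processing scales to very large data sets.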
Data-related professions (data analysts and scientists, AI and machine learning specialists, and big data specialists) took the top three positions in the World Economic Forum’s list of top job roles with increasing demand across industries in 2020 [1]. Here’s a closer look at the jobs that use big data in different capacities.
Data analyst: A data analyst works to gather, clean, and interpret data and create data models. Data analysts can work in a variety of different industries, including business, science, and healthcare.
Data engineer: Data engineers work to create and maintain data infrastructure. This can include data warehouses, data pipelines, and other forms of organizing data that analysts can use to make predictions or other interpretations.
Data scientist: A data scientist generally uses mathematical or statistical knowledge to build algorithms, models, and other analytical tools to help organize and interpret data.
Business intelligence analyst: Business intelligence analysts parse business data like sales information or customer engagement metrics to form insights into how a business is performing.
Operations analyst: Operations analysts gather data about operational issues in businesses or other organizations. Operations analysts can use data to find solutions to issues in production, staffing, or other aspects of running a business.
Marketing analyst: Marketing researchers or analysts harvest information about current or potential customers, market conditions, or competitor activities. The data collected is then used to understand how a business can respond through marketing tactics or product adjustments.
Learning to incorporate big data into your career can bring you fresh insights into your work, and data is likely to keep growing in importance. Several online courses can help you get started.
Learn to navigate your way around big data and get a grasp on Hadoop with UC San Diego’s course on Big Data.
Familiarize yourself with the basics of machine learning with a course from Stanford University.
Find out how to scale data science and machine learning for big data using Apache Spark.
1. World Economic Forum. "The Future of Jobs Report 2020." http://www3.weforum.org/docs/WEF_Future_of_Jobs_2020.pdf. Accessed March 27, 2021.