This course offers an opportunity to learn the fundamental concepts and emerging technologies in data storage and data governance. It balances theory and practice and covers Structured Query Language (SQL) and two flavors of NoSQL database: the MongoDB document store and the Neo4j graph database. It also includes a brief introduction to big data management, including Hadoop, MapReduce, and Apache Spark. By the end of this Part 2 course on data analytics, you will have a foundational understanding of the theory and applications of database management in support of data analytics, data mining, machine learning, and artificial intelligence.
This module first presents an overview of the SQL Data Definition Language (DDL), which is used to define a relational data model. It examines schema creation, table creation, and the DROP and ALTER commands, illustrating each syntax with explicit examples. The module then turns to the SQL Data Manipulation Language (DML), which is used to retrieve, update, insert, and delete data. The focus is on INSERT statements for adding data to tables and on simple SELECT statements. More complex SELECT statements, along with the DELETE and UPDATE statements, are covered in later modules.
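As a minimal sketch of the DDL and DML statements described above, the following script uses Python's standard-library `sqlite3` module with an in-memory SQLite database. The `wine` table and its columns are hypothetical, and SQLite's dialect differs in places from other systems (for example, it has no CREATE SCHEMA statement), so treat this as an illustration rather than the course's own example.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE TABLE defines a relation with typed columns and constraints.
cur.execute("""
    CREATE TABLE wine (
        wine_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        region  TEXT,
        price   REAL
    )
""")

# DDL: ALTER TABLE adds a new column to the existing table.
cur.execute("ALTER TABLE wine ADD COLUMN vintage INTEGER")

# DML: INSERT adds tuples; the ? placeholders keep values out of the SQL text.
cur.executemany(
    "INSERT INTO wine (wine_id, name, region, price, vintage) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Chablis", "Burgundy", 28.0, 2019),
        (2, "Barolo", "Piedmont", 55.0, 2016),
        (3, "Rioja", "La Rioja", 19.5, 2018),
    ],
)
conn.commit()

# DML: a simple SELECT with a WHERE filter and a sort order.
rows = cur.execute(
    "SELECT name, price FROM wine WHERE price < 30 ORDER BY name"
).fetchall()
print(rows)

# DDL: DROP TABLE removes the table definition together with all of its data.
cur.execute("DROP TABLE wine")
conn.close()
```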
What's included
1 video, 10 readings, 7 assignments
1 video•Total 1 minute
Meet Your Faculty•1 minute
10 readings•Total 113 minutes
Course Introduction•2 minutes
Syllabus - Data Management for Analytics Part 2•10 minutes
Academic Integrity•1 minute
What is SQL?•15 minutes
SQL Data Definition Language (DDL)•5 minutes
A DDL example•20 minutes
DROP and ALTER command•10 minutes
SQL INSERT statement•15 minutes
SQL SELECT statement•30 minutes
Module 1 Summary•5 minutes
7 assignments•Total 13 minutes
Check Your Prior Knowledge•3 minutes
Assess Your Learning: What is SQL?•1 minute
Assess Your Learning: SQL Data Definition Language (DDL)•2 minutes
Assess Your Learning: A DDL Example•2 minutes
Assess Your Learning: DROP and ALTER Command•2 minutes
Assess Your Learning: SQL INSERT Statement•1 minute
Assess Your Learning: SQL SELECT statement•2 minutes
Structured Query Language, Part 2
Module 2•2 hours to complete
Module details
This module continues the discussion of the SQL DML SELECT statement. It introduces the aggregate functions COUNT, SUM, AVG, VARIANCE, MIN, and MAX, which summarize information from database tuples. This is followed by the GROUP BY/HAVING clause, which applies aggregate functions to subgroups of tuples. The module then discusses join queries, which combine data from multiple tables. Inner join queries use a WHERE clause that matches one or more columns from two tables. Left outer, right outer, and full outer joins keep all the tuples of one or both tables in the result, regardless of whether they have matching tuples in the other table. All queries in this module use the Wine database in the online playground and can be executed there.
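A rough sketch of aggregate functions, GROUP BY/HAVING, and inner and outer joins, again using Python's built-in `sqlite3`. The `winery` and `wine` tables below are hypothetical stand-ins, not the course's Wine database. SQLite has no built-in VARIANCE function, so it is omitted, and only the left outer join is shown because RIGHT and FULL OUTER JOIN require SQLite 3.39.0 or later.

```python
import sqlite3

# Hypothetical two-table schema loosely modeled on a wine database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winery (winery_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT,
                       winery_id INTEGER REFERENCES winery(winery_id), price REAL);
    INSERT INTO winery VALUES (1, 'Domaine A', 'Burgundy'),
                              (2, 'Cantina B', 'Piedmont'),
                              (3, 'Bodega C', 'Rioja');
    INSERT INTO wine VALUES (10, 'Chablis', 1, 28.0),
                            (11, 'Pinot Noir', 1, 42.0),
                            (12, 'Barolo', 2, 55.0);
""")

# Aggregate functions summarize the whole table into a single tuple.
totals = conn.execute(
    "SELECT COUNT(*), SUM(price), AVG(price), MIN(price), MAX(price) FROM wine"
).fetchone()

# GROUP BY applies the aggregates per winery; HAVING then filters the groups.
per_winery = conn.execute("""
    SELECT winery_id, COUNT(*) AS n_wines, AVG(price) AS avg_price
    FROM wine
    GROUP BY winery_id
    HAVING COUNT(*) >= 2
""").fetchall()

# Inner join written with a WHERE clause: only matching pairs of tuples appear.
inner = conn.execute("""
    SELECT winery.name, wine.name
    FROM winery, wine
    WHERE winery.winery_id = wine.winery_id
    ORDER BY wine.wine_id
""").fetchall()

# Left outer join: every winery is kept; unmatched wine columns become NULL (None).
outer = conn.execute("""
    SELECT winery.name, wine.name
    FROM winery LEFT OUTER JOIN wine ON winery.winery_id = wine.winery_id
    ORDER BY winery.winery_id, wine.wine_id
""").fetchall()

print(totals, per_winery, inner, outer, sep="\n")
conn.close()
```

Bodega C has no wines, so it appears only in the outer join result, paired with `None`.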
What's included
1 video, 6 readings, 6 assignments
1 video•Total 4 minutes
Aggregate Functions•4 minutes
6 readings•Total 85 minutes
Queries with Aggregate Functions•25 minutes
Queries with GROUP BY/HAVING•10 minutes
Queries with ORDER BY•10 minutes
Inner Joins•20 minutes
Outer Joins•15 minutes
Module 2 Summary•5 minutes
6 assignments•Total 11 minutes
Check Your Prior Knowledge•2 minutes
Assess Your Learning: Queries with Aggregate Functions•2 minutes
Assess Your Learning: Queries with GROUP BY/HAVING•1 minute
Assess Your Learning: Queries with ORDER BY•2 minutes
Assess Your Learning: Inner Joins•2 minutes
Assess Your Learning: Outer Joins•2 minutes
Structured Query Language, Part 3
Module 3•3 hours to complete
Module details
This module presents more complex SQL queries. It introduces nested queries, in which a complete SELECT-FROM block appears in the WHERE clause of another query. The subquery (inner block) is nested in the outer block, and nesting can span multiple levels. For a non-correlated nested query, the query optimizer usually flattens the query into multiple queries and executes them sequentially from the innermost to the outermost level. The module also examines correlated nested queries, in which the inner block uses one or more columns of a table defined in the outer block. Such a query cannot be flattened in this simple way: conceptually, the inner subquery must be evaluated once for each tuple of the outer table. The operators >= ALL and > ANY are also discussed: the former can be used to find the highest or largest values, whereas the latter can be used to exclude the lowest or smallest values. All queries in this module use the Wine database in the online playground and can be executed there. Finally, the module examines the DELETE and UPDATE statements, which delete or modify data, and concludes with a brief discussion of SQL views.
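The sketch below illustrates a non-correlated nested query, a correlated nested query, UPDATE, DELETE, and a view in SQLite through Python's `sqlite3`, using a hypothetical `wine` table. SQLite does not accept the ALL/ANY quantifiers, so the "highest price per region" query is written with NOT EXISTS instead. Because the hypothetical `price` column is NOT NULL and each region's subquery always includes the outer row itself, it returns the same rows that `w1.price >= ALL (SELECT w2.price FROM wine w2 WHERE w2.region = w1.region)` would on a system that supports ALL.

```python
import sqlite3

# Hypothetical table; price is NOT NULL so the NOT EXISTS form matches >= ALL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT,
                       region TEXT, price REAL NOT NULL);
    INSERT INTO wine VALUES (1, 'Chablis', 'Burgundy', 28.0),
                            (2, 'Pinot Noir', 'Burgundy', 42.0),
                            (3, 'Barolo', 'Piedmont', 55.0),
                            (4, 'Barbera', 'Piedmont', 18.0);
""")

# Non-correlated nested query: the inner block does not reference the outer
# block, so it can be evaluated once and its result reused.
above_avg = conn.execute("""
    SELECT name FROM wine
    WHERE price > (SELECT AVG(price) FROM wine)
    ORDER BY name
""").fetchall()

# Correlated nested query: the inner block references w1.region from the outer
# block, so conceptually it is re-evaluated for each outer tuple.
top_per_region = conn.execute("""
    SELECT w1.region, w1.name FROM wine AS w1
    WHERE NOT EXISTS (SELECT 1 FROM wine AS w2
                      WHERE w2.region = w1.region AND w2.price > w1.price)
    ORDER BY w1.region
""").fetchall()

# UPDATE modifies matching tuples; DELETE removes those matching its WHERE clause.
conn.execute("UPDATE wine SET price = price + 5 WHERE region = 'Piedmont'")
conn.execute("DELETE FROM wine WHERE price < 25")

# A view is a named query; selecting from it re-runs the query on current data.
conn.execute("""
    CREATE VIEW burgundy_wine AS
    SELECT name, price FROM wine WHERE region = 'Burgundy'
""")
burgundy = conn.execute("SELECT name FROM burgundy_wine ORDER BY price").fetchall()
remaining = conn.execute("SELECT name, price FROM wine ORDER BY wine_id").fetchall()

print(above_avg, top_per_region, burgundy, remaining, sep="\n")
conn.close()
```

After the update, Barbera costs 23.0, so the DELETE removes it while Chablis (28.0) survives.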
What's included
2 videos, 10 readings, 10 assignments
2 videos•Total 8 minutes
Nested Query - Correlated Query•4 minutes
ALL/ANY/EXISTS/NOT EXISTS•4 minutes
10 readings•Total 135 minutes
Nested Queries•15 minutes
Nested Correlated Queries•20 minutes
Queries with ALL/ANY•15 minutes
EXISTS/NOT EXISTS functions•10 minutes
Subqueries in SELECT/FROM•10 minutes
Set Operations•15 minutes
DELETE Statement•15 minutes
UPDATE Statement•15 minutes
SQL Views•15 minutes
Module 3 Summary•5 minutes
10 assignments•Total 19 minutes
Check Your Prior Knowledge•3 minutes
Assess Your Learning: Nested Queries•2 minutes
Assess Your Learning: Nested Correlated Queries•2 minutes
Assess Your Learning: Queries with ALL/ANY•2 minutes
Assess Your Learning: EXISTS/NOT EXISTS Functions•2 minutes
Assess Your Learning: Subqueries in SELECT/FROM•1 minute
Assess Your Learning: Set Operations•2 minutes
Assess Your Learning: DELETE Statement•2 minutes
Assess Your Learning: UPDATE Statement•2 minutes
Assess Your Learning: SQL Views•1 minute
Extension to Relational Database Management Systems
Module 4•1 hour to complete
Module details
This module introduces several extensions to relational database management systems (RDBMSs). It starts by reviewing the core components of the relational model and its limitations. It then explores ways to extend relational databases, beginning with a thorough review of triggers and stored procedures, two key mechanisms that make an RDBMS active. The module concludes with recursive queries, a powerful extension to the SQL language.
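The following sketch shows a row-level trigger and a recursive query (WITH RECURSIVE) in SQLite via Python's `sqlite3`. The tables, the trigger name, and the region hierarchy are hypothetical. SQLite does not support stored procedures, so they are not shown here; that part would need a system such as PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE price_log (wine_id INTEGER, old_price REAL, new_price REAL);

    -- Hypothetical trigger: the DBMS runs this action automatically
    -- after each row whose price actually changes.
    CREATE TRIGGER log_price_change
    AFTER UPDATE OF price ON wine
    FOR EACH ROW
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_log VALUES (OLD.wine_id, OLD.price, NEW.price);
    END;

    INSERT INTO wine VALUES (1, 'Chablis', 28.0), (2, 'Barolo', 55.0);
    UPDATE wine SET price = 30.0 WHERE wine_id = 1;

    -- Hypothetical hierarchy for the recursive query: each region has a parent.
    CREATE TABLE region (name TEXT PRIMARY KEY, parent TEXT);
    INSERT INTO region VALUES ('Europe', NULL), ('France', 'Europe'),
                              ('Burgundy', 'France'), ('Chablis AOC', 'Burgundy');
""")

# The trigger wrote one log row for the single price change.
log = conn.execute("SELECT * FROM price_log").fetchall()

# Recursive query: start from one region (anchor member), then repeatedly
# join to its parent (recursive member) until no parent remains.
ancestors = conn.execute("""
    WITH RECURSIVE chain(name, depth) AS (
        SELECT name, 0 FROM region WHERE name = 'Chablis AOC'
        UNION ALL
        SELECT region.parent, chain.depth + 1
        FROM region JOIN chain ON region.name = chain.name
        WHERE region.parent IS NOT NULL
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()

print(log, ancestors, sep="\n")
conn.close()
```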
What's included
4 readings, 4 assignments
4 readings•Total 60 minutes
Limitations of the relational model•10 minutes
Active Relational Database Management System Extensions: Triggers and Stored Procedures•25 minutes
Recursive SQL Queries•20 minutes
Module 4 Summary•5 minutes
4 assignments•Total 8 minutes
Check Your Prior Knowledge•2 minutes
Assess Your Learning: Limitations of the relational model•3 minutes
Assess Your Learning: Active Relational Database Management System Extensions: Triggers and Stored Procedures•2 minutes
Assess Your Learning: Recursive SQL Queries•1 minute
NoSQL Databases: MongoDB
Module 5•1 hour to complete
Module details
This module presents an overview of the NoSQL movement and distributed systems, and introduces the MongoDB NoSQL database. MongoDB is a document store, suited to storing documents such as resumes, legal documents, and books. It does not require a predefined schema: documents are grouped into collections, and each document is a set of labeled attributes (field-value pairs) representing a semi-structured item, so documents in the same collection can have different fields.
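As a conceptual sketch only, the plain Python below models a collection as a list of JSON-like documents and runs a small map-reduce aggregation over it. It does not use MongoDB or its driver; the documents, field names, and the `map_reduce` helper are hypothetical, chosen to show that documents in one collection can have different fields and how the map, group, and reduce phases fit together.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

# Hypothetical "collection": a list of JSON-like documents (Python dicts).
# Documents in the same collection need not share the same fields.
documents: list[dict[str, Any]] = [
    {"_id": 1, "type": "resume", "name": "J. Doe", "skills": ["SQL", "Python"], "pages": 2},
    {"_id": 2, "type": "contract", "parties": ["Party A", "Party B"], "pages": 14},
    {"_id": 3, "type": "book", "title": "Intro to Databases", "pages": 320},
    {"_id": 4, "type": "resume", "name": "A. Smith", "skills": ["Neo4j"], "pages": 1},
    {"_id": 5, "type": "book", "title": "Untitled draft"},  # no "pages" field
]


def map_reduce(
    docs: Iterable[dict[str, Any]],
    mapper: Callable[[dict[str, Any]], list[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Hypothetical helper: map each document to (key, value) pairs,
    group the values by key, then reduce each group to a single value."""
    groups: dict[str, list[int]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):  # map phase
            groups[key].append(value)  # group (shuffle) phase
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def pages_by_type(doc: dict[str, Any]) -> list[tuple[str, int]]:
    # Emit nothing for documents that lack either field.
    if "type" in doc and "pages" in doc:
        return [(doc["type"], doc["pages"])]
    return []


pages_per_type = map_reduce(documents, pages_by_type, lambda _key, values: sum(values))
print(pages_per_type)  # {'resume': 3, 'contract': 14, 'book': 320}
```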
What's included
5 readings, 5 assignments
5 readings•Total 70 minutes
The NoSQL movement•20 minutes
Key-Value Stores and Distributed Systems•10 minutes
Document Stores and MongoDB•20 minutes
Aggregation with MapReduce•15 minutes
Module 5 Summary•5 minutes
5 assignments•Total 7 minutes
Check Your Prior Knowledge•1 minute
Assess Your Learning: The NoSQL movement•2 minutes
Assess Your Learning: Key-Value Stores and Distributed Systems•1 minute
Assess Your Learning: Document Stores and MongoDB•2 minutes
Assess Your Learning: Aggregation with MapReduce•1 minute
NoSQL Databases: Neo4j Graph Database
Module 6•1 hour to complete
Module details
This module continues the discussion of NoSQL databases, introducing graph theory and the Neo4j graph database. Neo4j applies graph theory to information storage: its data consists of nodes and edges (relationships), both of which can store information as properties. Graph databases are particularly useful for modeling social networks such as X (formerly Twitter) and Facebook. In a sense, a graph database is a hyper-relational database in which join tables are replaced by more interesting, semantically meaningful relationships that can be navigated (graph traversal) and/or queried with graph pattern matching.
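To make graph traversal concrete, here is a plain-Python sketch of a small property graph and a breadth-first traversal along one relationship type. It is not Neo4j's API; the node names, the `FOLLOWS` relationship type, and the `reachable_within` helper are hypothetical. In Cypher, a comparable question would be written as a variable-length pattern such as `MATCH (a {name: 'alice'})-[:FOLLOWS*1..2]->(p) RETURN DISTINCT p`, whereas here the traversal is spelled out by hand.

```python
from collections import deque
from typing import Any

# Hypothetical social graph: nodes and relationships both carry properties.
nodes: dict[str, dict[str, Any]] = {
    "alice": {"label": "Person", "city": "Boston"},
    "bob": {"label": "Person", "city": "Austin"},
    "carol": {"label": "Person", "city": "Denver"},
    "dave": {"label": "Person", "city": "Seattle"},
    "erin": {"label": "Person", "city": "Boston"},
}
# Each relationship: (source, relationship type, target, relationship properties).
relationships: list[tuple[str, str, str, dict[str, Any]]] = [
    ("alice", "FOLLOWS", "bob", {"since": 2020}),
    ("bob", "FOLLOWS", "carol", {"since": 2021}),
    ("carol", "FOLLOWS", "dave", {"since": 2022}),
    ("alice", "FOLLOWS", "erin", {"since": 2023}),
]


def reachable_within(start: str, rel_type: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal along one relationship type, up to max_hops away.

    Returns each reachable node (other than start) mapped to its hop distance.
    """
    adjacency: dict[str, list[str]] = {}
    for src, rtype, dst, _props in relationships:
        if rtype == rel_type:
            adjacency.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {node: d for node, d in hops.items() if node != start}


result = reachable_within("alice", "FOLLOWS", 2)
for name, distance in result.items():
    print(name, nodes[name]["city"], distance)
```

Dave is three hops from Alice, so the two-hop limit excludes him.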
What's included
5 readings, 4 assignments
5 readings•Total 42 minutes
A Brief Introduction to Graph Theory•5 minutes
Graph-based Databases•10 minutes
Neo4j and Cypher Query Language•25 minutes
Module 6 Summary•1 minute
Congratulations!•1 minute
4 assignments•Total 5 minutes
Check Your Prior Knowledge•1 minute
Assess Your Learning: A Brief Introduction to Graph Theory•1 minute
Assess Your Learning: Graph-based Databases•1 minute
Assess Your Learning: Neo4j and Cypher Query Language•2 minutes
Build toward a degree
This course is part of the following degree program(s) offered by Northeastern University. If you are admitted and enroll, your completed coursework may count toward your degree, and your progress can transfer with you.¹
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized by completing this content that may count towards degree requirements, considering any existing credits you may have. Click on a specific course for more information.
Founded in 1898, Northeastern is a global research university with a distinctive, experience-driven approach to education and discovery. The university is a leader in experiential learning, powered by the world’s most far-reaching cooperative education program. The spirit of collaboration guides a use-inspired research enterprise focused on solving global challenges in health, security, and sustainability.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may also offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but it also means that you will not be able to purchase a Certificate experience.
What will I get if I purchase the Certificate?
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.