Big Data Engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases. Three information-packed courses cover popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, as well as the Apache Spark analytics engine for large-scale data processing.
You start with an overview of various categories of NoSQL (Not only SQL) data repositories, and then work hands-on with several of them, including IBM Cloudant, MongoDB, and Cassandra. You’ll perform various data management tasks, such as creating & replicating databases, inserting, updating, deleting, querying, indexing, aggregating & sharding data. Next, you’ll gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in-depth working knowledge of Apache Spark, Spark DataFrames, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes. In the final course, you will learn to work with Spark Structured Streaming and Spark ML for performing Extract, Transform, and Load (ETL) processing and machine learning tasks.
This specialization is suitable for beginners in the fields of NoSQL and Big Data – whether you are, or are preparing to be, a Data Engineer, Software Developer, IT Architect, Data Scientist, or IT Manager.
Applied Learning Project
The emphasis in this specialization is on learning by doing. As such, each course includes hands-on labs to practice & apply the NoSQL and Big Data skills you learn during lectures.
In the first course, you will work hands-on with several NoSQL databases – MongoDB, Apache Cassandra, and IBM Cloudant – to perform a variety of tasks: creating databases, adding documents, querying data, utilizing the HTTP API, performing Create, Read, Update & Delete (CRUD) operations, limiting & sorting records, indexing, aggregation, replication, using the CQL shell, keyspace operations, & other table operations.
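To give a flavor of these labs, here is a minimal Python sketch of basic MongoDB CRUD, indexing, and aggregation. It assumes a MongoDB server reachable on localhost and uses hypothetical database, collection, and field names; the course labs use their own datasets and environments.

```python
# Minimal MongoDB CRUD sketch (hypothetical database/collection names);
# assumes a MongoDB server is reachable at localhost:27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["training"]          # hypothetical database
movies = db["movies"]            # hypothetical collection

# Create: insert a document
movies.insert_one({"title": "Inception", "year": 2010, "rating": 8.8})

# Read: query with a filter, then sort and limit the results
for doc in movies.find({"year": {"$gte": 2000}}).sort("rating", -1).limit(5):
    print(doc)

# Update: modify a matching document
movies.update_one({"title": "Inception"}, {"$set": {"rating": 8.9}})

# Index and aggregate
movies.create_index("year")
pipeline = [{"$group": {"_id": "$year", "avg_rating": {"$avg": "$rating"}}}]
print(list(movies.aggregate(pipeline)))

# Delete: remove the document
movies.delete_one({"title": "Inception"})
```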
In the next course, you’ll launch a Hadoop cluster using Docker and run MapReduce jobs. You’ll explore working with Spark using Jupyter notebooks on a Python kernel. You’ll build your Spark skills using DataFrames and Spark SQL, and scale your jobs using Kubernetes.
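As a preview of the Spark portion, the following sketch shows the DataFrame API and Spark SQL side by side. It assumes a local PySpark installation in a Jupyter notebook and uses a small hypothetical dataset in place of the lab data.

```python
# Minimal PySpark sketch (hypothetical data); runs locally in a Jupyter
# notebook with the pyspark package installed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-basics").getOrCreate()

# Build a small DataFrame in place of the lab's dataset
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)

# DataFrame API
df.filter(df.age > 30).show()

# Spark SQL over the same data via a temporary view
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30 ORDER BY age").show()

spark.stop()
```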
In the final course, you will use Spark for ETL processing and for machine learning model training and deployment using IBM Watson.
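The sketch below illustrates the kind of ETL-plus-training workflow involved, using Spark ML’s pipeline API. The file path and column names are hypothetical, and the IBM Watson deployment step covered in the course is not shown here.

```python
# Minimal Spark ML sketch (hypothetical file path and column names);
# the course adds model deployment with IBM Watson on top of this.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("etl-and-ml").getOrCreate()

# Extract & transform: load a CSV and drop incomplete rows (hypothetical path)
raw = spark.read.csv("data/measurements.csv", header=True, inferSchema=True)
clean = raw.dropna()

# Assemble feature columns and train a simple regression model
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(clean)

# Load: inspect predictions and write the cleaned data back out as Parquet
model.transform(clean).select("features", "prediction").show(5)
clean.write.mode("overwrite").parquet("data/measurements_clean.parquet")

spark.stop()
```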