Duke University
Applied Python Data Engineering Specialization
Elevate your coding skills with data engineering. Use big data for decision-making, analysis, AI, and machine learning.

Taught in English

Instructors: Kennedy Behrman, Matt Harrison, Noah Gift

Specialization - 3 course series

Get in-depth knowledge of a subject
3.6 (18 reviews)

Intermediate level

5 months at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Create scalable big data pipelines (Hadoop, Spark, Snowflake, Databricks) for efficient data handling.

  • Build machine learning workflows (PySpark, MLflow) on Databricks for seamless model development and deployment.

  • Implement DataOps/DevOps to streamline data engineering processes.

  • Formulate and communicate data-driven insights and narratives through impactful visualizations with Python and data storytelling.

Details to know

Recently updated!

January 2024

Spark, Hadoop, and Snowflake for Data Engineering

Course 1 · 29 hours · 3.7 (22 ratings)

What you'll learn

  • Create scalable data pipelines (Hadoop, Spark, Snowflake, Databricks) for efficient data handling.

  • Optimize data engineering with clustering and scaling to boost performance and resource use.

  • Build ML solutions (PySpark, MLflow) on Databricks for seamless model development and deployment.

  • Implement DataOps and DevOps practices for continuous integration and deployment (CI/CD) of data-driven applications, including automating processes.

Skills you'll gain

Category: Big Data
Category: Python Programming
Category: Information Engineering
Category: Apache Hadoop
Category: Apache Spark

Virtualization, Docker, and Kubernetes for Data Engineering

Course 2 · 27 hours · 3.3 (12 ratings)

What you'll learn

  • Master virtualization, containerization, and Docker, including Dockerfile creation and multi-container orchestration with Compose and Airflow.

  • Develop expertise in Kubernetes core concepts, cluster architecture, and deployment using cloud environments, GitHub Codespaces, and AI-driven tools.

  • Work through realistic data scenarios: containerizing and deploying applications, and addressing production issues with cloud orchestration and SRE practices.

Skills you'll gain

Category: Cloud-Based Integration
Category: Containerization
Category: Virtualization
Category: Kubernetes
Category: Docker (Software)

Data Visualization with Python

Course 3 · 9 hours

What you'll learn

  • Apply Python, spreadsheets, and BI tooling proficiently to create visually compelling and interactive data visualizations.

  • Formulate and communicate data-driven insights and narratives through impactful visualizations and data storytelling.

  • Assess and select the most suitable visualization tools and techniques to address organizational data needs and objectives.

Skills you'll gain

Category: Business Communication
Category: Data Analysis
Category: Python Programming
Category: Cloud Applications
Category: Data Visualization

Instructors

Kennedy Behrman
Duke University
7 Courses · 37,740 learners
