Coursera

Modern Data Architecture & Lakehouse Engineering Specialization

Design and Build Modern Data Platforms.

Learn to architect, secure, and optimize cloud-based lakehouse systems for enterprise analytics.

Instructor: Hurix Digital

Get in-depth knowledge of a subject
Intermediate level

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Architect and provision secure, resilient cloud data infrastructure using Infrastructure as Code and disaster recovery best practices.

  • Build lakehouse platforms with transactional integrity, automated pipelines, and seamless integration of diverse data sources.

  • Optimize data system performance through strategic partitioning, query tuning, security controls, and systematic benchmarking.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated: February 2026

See how employees at top companies are mastering in-demand skills

[Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oréal]

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 13 course series

Engineer Cloud Data for Resiliency & ROI

Course 1, 2 hours

What you'll learn

  • Infrastructure as Code automates data platform deployments, replacing manual processes with version-controlled, repeatable systems.

  • Cost optimization uses performance benchmarking and data analysis to identify efficient compute/storage configs for specific workloads.

  • Business continuity requires proactive disaster recovery with automated failover and continuous replication for strict recovery goals.

  • Successful cloud data engineering balances performance, cost, and reliability through strategic design and continuous monitoring.
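
The cost-optimization idea in the bullets above can be sketched in a few lines: benchmark each candidate configuration, then pick the cheapest one that still meets the workload's latency target. The configuration names and numbers here are illustrative, not real cloud SKUs.

```python
# Toy cost-optimization pass over benchmark results: keep only the
# configurations that meet the p95 latency target, then take the
# cheapest of those.
def cheapest_config(benchmarks, max_p95_ms):
    """Return the lowest-cost config whose p95 latency meets the target."""
    eligible = [b for b in benchmarks if b["p95_ms"] <= max_p95_ms]
    if not eligible:
        raise ValueError("no configuration meets the latency target")
    return min(eligible, key=lambda b: b["cost_per_hour"])

benchmarks = [
    {"name": "small-hdd",  "cost_per_hour": 0.10, "p95_ms": 900},
    {"name": "medium-ssd", "cost_per_hour": 0.35, "p95_ms": 220},
    {"name": "large-ssd",  "cost_per_hour": 0.90, "p95_ms": 120},
]

best = cheapest_config(benchmarks, max_p95_ms=300)
```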

Skills you'll gain

Category: Disaster Recovery
Category: Business Continuity
Category: Business Continuity Planning
Category: Cloud Deployment
Category: Data Architecture
Category: Infrastructure as Code (IaC)
Category: Benchmarking
Category: Data Warehousing
Category: Cloud Computing Architecture
Category: Terraform
Category: Capacity Management
Category: AWS CloudFormation
Category: IT Infrastructure
Category: Data Infrastructure
Category: Performance Analysis
Category: Cost Management
Category: Automation

Build & Analyze Your Data Lakehouse


Course 2, 2 hours

What you'll learn

  • External tables let query engines access distributed files without duplication, reshaping large-scale analytics design.

  • Choosing Delta, Iceberg, or Hudi requires evaluating schema changes, time travel needs, and performance goals.

  • Lakehouse architecture merges data lake flexibility with warehouse reliability using metadata and ACID support.

  • Automated ingestion with staging and transformation layers ensures consistent, high-quality data across analytics systems.
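
The external-table idea above can be shown in miniature: the "table" is just a pointer to files, and the engine scans them in place rather than copying data in. The file names and contents below are stand-ins for objects in cloud storage.

```python
import csv
import io

# Stand-ins for CSV part-files sitting in object storage.
files = {
    "part-0.csv": "id,amount\n1,10\n2,20\n",
    "part-1.csv": "id,amount\n3,30\n",
}

def scan_external_table(file_map):
    """Read every part-file in place, yielding rows without duplication."""
    for content in file_map.values():
        yield from csv.DictReader(io.StringIO(content))

total = sum(int(row["amount"]) for row in scan_external_table(files))
```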

Transform, Analyze, and Optimize Your Data

Course 3, 3 hours

What you'll learn

  • Batch data transformation converts raw semi-structured data into analysis-ready formats that support enterprise decisions.

  • Workload analysis guides database design by linking access patterns and query frequency to performance and cost gains.

  • Migration choices must rely on performance testing and quantitative analysis to ensure ROI-driven transformations.

  • System performance depends on storage, queries, and hardware, requiring holistic technical and business evaluation.
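
The batch-transformation bullet above — raw semi-structured data into analysis-ready rows — can be sketched with the standard library. The event shape and field names are illustrative.

```python
import json

# Hypothetical batch step: flatten nested JSON event records into
# flat, analysis-ready rows.
raw_events = [
    '{"user": {"id": 1, "country": "BR"}, "amount": 12.5}',
    '{"user": {"id": 2, "country": "FR"}, "amount": 40.0}',
]

def flatten(line):
    event = json.loads(line)
    return {
        "user_id": event["user"]["id"],
        "country": event["user"]["country"],
        "amount": event["amount"],
    }

rows = [flatten(line) for line in raw_events]
```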

Skills you'll gain

Category: Data Architecture
Category: Operational Databases
Category: Apache Cassandra
Category: Apache Hive
Category: Database Design
Category: Azure Synapse Analytics
Category: Data Transformation
Category: Amazon Redshift
Category: Database Management
Category: Data Wrangling

Unify, Reconcile, and Tune Data Systems

Course 4, 3 hours

What you'll learn

  • SQL MERGE offers atomic sync that maintains consistency in CDC pipelines with minimal overhead.

  • Field-level conflict analysis needs clear business rules and source-of-truth hierarchies for reliable reconciliation.

  • Integration performance improves through measurement, bottleneck detection, and targeted tuning, not large redesigns.

  • Sustainable data systems balance quality, speed, and reliability through ongoing monitoring and iterative improvement.
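
The atomic-sync pattern in the first bullet can be demonstrated with the standard library: SQLite has no MERGE statement, but its INSERT ... ON CONFLICT DO UPDATE upsert captures the same idea of applying CDC changes in one atomic statement. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'old@example.com')")

# A CDC batch containing one update (id 1) and one insert (id 2);
# the upsert applies both paths atomically per row.
changes = [(1, "new@example.com"), (2, "second@example.com")]
conn.executemany(
    """INSERT INTO customers (id, email) VALUES (?, ?)
       ON CONFLICT(id) DO UPDATE SET email = excluded.email""",
    changes,
)
rows = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
```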

Skills you'll gain

Category: Stored Procedure
Category: Performance Testing
Category: Data Pipelines
Category: Data Manipulation
Category: Data Integration
Category: Operational Databases
Category: Database Design
Category: Application Performance Management
Category: Data Cleansing
Category: Data Quality
Category: Performance Measurement
Category: Systems Integration
Category: Performance Metric
Category: Data Integrity
Category: Performance Tuning
Category: Consolidation
Category: Data Governance
Category: Data Validation
Category: SQL
Category: Performance Improvement

Secure Data: Mask, Monitor, and Audit

Course 5, 2 hours

What you'll learn

  • Data protection requires layered security controls that balance privacy with operational utility.

  • Proactive monitoring and anomaly detection are essential for identifying security threats before they escalate into breaches.

  • Compliance frameworks provide structured approaches to evaluating and strengthening organizational security postures.

  • Effective data governance integrates technical controls with policy frameworks to create comprehensive protection strategies.
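
The privacy-versus-utility balance in the first bullet shows up concretely in field-level masking: hide the identifying part of a value while keeping enough of it (the domain, the last digits) for operational use. These masking rules are a toy sketch, not a compliance-grade scheme.

```python
def mask_email(email):
    """Keep the first character and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def mask_card(number):
    """Standard last-four display: mask all but the final 4 digits."""
    return "*" * (len(number) - 4) + number[-4:]

masked_email = mask_email("alice@example.com")
masked_card = mask_card("4111111111111111")
```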

Skills you'll gain

Category: Security Management

Provision Secure Cloud Data Infrastructure

Course 6, 2 hours

What you'll learn

  • Security by design applies layered defenses across storage, identity, and networks from the start of infrastructure setup.

  • Infrastructure as Code ensures consistent, auditable security settings that reduce errors and support compliance needs.

  • The principle of least privilege must be embedded into every access control decision, granting only necessary permissions to specific resources.

  • Secure networks rely on segmentation with private subnets and controls to protect systems from public exposure.
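
Least privilege, from the third bullet, reduces to a default-deny check: a request is allowed only if a grant names that exact principal, action, and resource. The principals and resource names below are hypothetical.

```python
# Explicit grants; anything not listed is denied by default.
GRANTS = {
    ("etl-job", "read",  "s3://raw-zone"),
    ("etl-job", "write", "s3://staging-zone"),
}

def is_allowed(principal, action, resource):
    """Default-deny: only an exact (principal, action, resource) grant passes."""
    return (principal, action, resource) in GRANTS

read_ok = is_allowed("etl-job", "read", "s3://raw-zone")
write_raw = is_allowed("etl-job", "write", "s3://raw-zone")
```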

Skills you'll gain

Category: Cloud Security
Category: Identity and Access Management
Category: Network Security
Category: Encryption
Category: Infrastructure as Code (IaC)
Category: Data Security
Category: Data Infrastructure
Category: Data Management
Category: Security Controls
Category: Private Cloud
Category: Data Integrity
Category: Cloud Storage
Category: Cloud Infrastructure
Category: Infrastructure Security

Apply Data Lake Transactions & Versioning

Course 7, 2 hours

What you'll learn

  • Transactional storage layers ensure data lake reliability, supporting concurrent operations and maintaining integrity.

  • Version control in data lakes enables auditing, compliance, time-travel queries, and error recovery for production systems.

  • Schema evolution strategies help data systems adapt to business changes while maintaining backward compatibility.

  • Converting raw files to transactional formats is a key pattern supporting both analytics and operational reliability.
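
The versioning and time-travel bullets can be reduced to one core idea: every commit appends an immutable snapshot, so any past state stays queryable. This toy class is not Delta's or Iceberg's actual protocol, just the concept.

```python
class VersionedTable:
    """Append-only versions; reading an old version is 'time travel'."""

    def __init__(self):
        self.versions = []  # each entry is a full immutable snapshot

    def commit(self, rows):
        self.versions.append(list(rows))
        return len(self.versions) - 1  # the new version number

    def read(self, version=None):
        if version is None:
            version = len(self.versions) - 1  # latest by default
        return self.versions[version]

table = VersionedTable()
v0 = table.commit([{"id": 1, "qty": 5}])
v1 = table.commit([{"id": 1, "qty": 7}])

latest = table.read()
as_of_v0 = table.read(version=v0)  # error recovery / audit query
```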

Skills you'll gain

Category: SQL
Category: Data Pipelines
Category: Data Lakes

Evaluate Storage for Data Warehousing Success

Course 8, 2 hours

What you'll learn

  • Storage format choice strongly affects query performance and should match workload needs, not general assumptions.

  • Column storage suits read-heavy analytics, while row storage performs better for transactional and write-focused workloads.

  • Benchmarking with real datasets and queries offers the best basis for sound storage architecture decisions.

  • Compression and ingestion speed must be balanced carefully to align performance with business priorities.
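
The row-versus-column tradeoff in the second bullet is visible even in plain Python: summing one field touches every record object in a row layout, but only a single contiguous list in a columnar layout. The data below is illustrative.

```python
# Row store: one dict per record (good for transactional writes).
row_store = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Column store: one list per field (good for analytic scans).
column_store = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [10.0, 25.0, 5.0],
}

# Same query, two layouts: the columnar scan reads only one array.
row_total = sum(r["amount"] for r in row_store)
col_total = sum(column_store["amount"])
```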

Skills you'll gain

Category: Performance Testing
Category: Amazon Redshift
Category: Scalability
Category: Data Processing
Category: Star Schema
Category: Analysis
Category: Snowflake Schema
Category: Data-Driven Decision-Making
Category: Apache Hive
Category: Data Warehousing
Category: Data Architecture
Category: Query Languages
Category: Data Storage
Category: Data Storage Technologies
Category: Technical Communication

Build & Transform Data Pipelines

Course 9, 2 hours

What you'll learn

  • Modular pipeline design enables maintainable, scalable data systems that can adapt to changing business requirements.

  • Integration of complementary tools (Spark, dbt, Airflow) creates more robust and efficient data processing workflows than single-tool approaches.

  • Proper separation of concerns between ingestion, transformation, and loading stages reduces complexity and improves debugging capabilities.

  • Automation and orchestration are essential for reliable, production-grade data systems that minimize manual intervention and human error.
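
The separation-of-concerns bullet can be sketched as three small functions: each stage is testable and replaceable on its own, then composed into a pipeline. The records and cleaning rules are illustrative.

```python
def ingest():
    """Ingestion stage: return raw records from a (stubbed) source."""
    return ["  Alice ", "BOB", ""]

def transform(records):
    """Transformation stage: normalize casing and drop empty records."""
    cleaned = (r.strip().title() for r in records)
    return [r for r in cleaned if r]

def load(records, sink):
    """Loading stage: write to the destination and report row count."""
    sink.extend(records)
    return len(records)

warehouse = []
loaded = load(transform(ingest()), warehouse)
```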

Skills you'll gain

Category: Maintainability
Category: Data Processing
Category: Data Integration
Category: Data Warehousing
Category: Data Pipelines
Category: Cloud Deployment
Category: Data Cleansing
Category: Apache Airflow
Category: Cloud Computing
Category: Extract, Transform, Load

Unify Diverse Data Sources

Course 10, 1 hour

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
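
Offset management, from the third bullet, comes down to bookkeeping: persist the last offset consumed so a restart resumes exactly where it left off, neither skipping nor re-reading records. The stream contents below are illustrative.

```python
# A source stream of (offset, payload) pairs.
stream = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]

def poll(stream, committed_offset, batch_size=2):
    """Return the next batch after committed_offset, plus the new offset."""
    batch = [r for r in stream if r[0] > committed_offset][:batch_size]
    new_offset = batch[-1][0] if batch else committed_offset
    return batch, new_offset

offset = -1  # nothing consumed yet
first, offset = poll(stream, offset)
second, offset = poll(stream, offset)  # resumes after the committed offset
```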

Skills you'll gain

Category: Data Pipelines
Category: Authentications
Category: Data Integration
Category: Databases
Category: Database Systems
Category: Enterprise Security
Category: Relational Databases
Category: Data Infrastructure
Category: Restful API
Category: Real Time Data
Category: Apache Kafka

Map Data Flows Fast

Course 11, 1 hour

What you'll learn

  • Visual data-flow documentation is key to system clarity and forms the basis for sound pipeline design and team communication.

  • Complete data flow diagrams must show the full journey from sources through transforms to final destinations.

  • Structured diagram creation follows steps: find sources, map processes, set destinations, and check connections.

  • Good data flow visuals connect technical work with business needs, enabling stakeholder alignment and decisions.
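
The "check connections" step above can even be automated: represent the diagram as a graph and verify that every source reaches a destination. The node names here are hypothetical.

```python
# A data flow graph: node -> list of downstream nodes.
flows = {
    "crm_db":      ["staging"],
    "clickstream": ["staging"],
    "staging":     ["warehouse"],
    "warehouse":   [],
}
destinations = {"warehouse"}

def reaches_destination(node, flows, destinations, seen=None):
    """Depth-first check that a node can reach some destination."""
    seen = seen or set()
    if node in destinations:
        return True
    seen.add(node)
    return any(
        reaches_destination(n, flows, destinations, seen)
        for n in flows.get(node, []) if n not in seen
    )

complete = all(
    reaches_destination(src, flows, destinations)
    for src in ("crm_db", "clickstream")
)
```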

Skills you'll gain

Category: Data Literacy
Category: Data Presentation
Category: Data Store
Category: Data Mapping
Category: Diagram Design
Category: Data Visualization
Category: Data Flow Diagrams (DFDs)
Category: Dataflow
Category: Data Transformation
Category: Data Pipelines
Category: Data Processing
Category: Technical Communication

Optimize Spark Performance: Analyze & Accelerate

Course 12, 1 hour

What you'll learn

  • Performance optimization is a systematic process requiring analysis of data access patterns, not random configuration changes.

  • Strategic partitioning minimizes expensive network shuffles and is the foundation of scalable Spark applications.

  • Intelligent caching of reusable intermediate datasets can dramatically reduce computation costs and improve job reliability.

  • The Spark UI provides actionable insights that guide optimization decisions and enable data-driven performance improvements.
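
The partitioning bullet rests on one property: rows with the same key land in the same partition, which is what lets a keyed join or aggregation run without a cross-partition shuffle. This is a plain-Python sketch of hash partitioning, not Spark's implementation; the records are illustrative.

```python
def partition_by_key(rows, key, num_partitions):
    """Hash-partition rows so equal keys always share a partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts

rows = [{"user": u, "clicks": c}
        for u, c in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]]
parts = partition_by_key(rows, "user", num_partitions=4)

# Both rows for user "a" must land in exactly one partition.
a_partitions = [i for i, p in enumerate(parts)
                if any(r["user"] == "a" for r in p)]
```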

Skills you'll gain

Category: Apache Spark
Category: Performance Tuning
Category: Data Processing
Category: PySpark
Category: Data Pipelines
Category: Systems Analysis

Optimize Query Performance for Data Success

Course 13, 2 hours

What you'll learn

  • Proactive performance monitoring prevents system failures and ensures consistent user experience across production environments.

  • Systematic diagnosis of query bottlenecks requires understanding both query logic efficiency and underlying resource limitations.

  • Strategic resource allocation combines technical optimization with business requirements to maintain service level agreements.

  • Continuous performance analysis creates a feedback loop that improves system reliability over time.
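
Proactive monitoring, per the first bullet, starts with something simple: flag queries whose latency breaches the SLA threshold, then diagnose those first. The query names and timings below are illustrative.

```python
# A (stubbed) query latency log.
query_log = [
    {"query": "daily_sales",  "ms": 420},
    {"query": "churn_report", "ms": 5100},
    {"query": "login_lookup", "ms": 35},
]

SLA_MS = 1000  # service-level latency target

# Queries breaching the SLA are candidates for bottleneck diagnosis.
slow_queries = [q["query"] for q in query_log if q["ms"] > SLA_MS]
```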

Skills you'll gain

Category: Capacity Management
Category: Application Performance Management
Category: Query Languages
Category: Database Management
Category: Continuous Monitoring
Category: Operational Databases
Category: Service Level
Category: System Monitoring
Category: Performance Tuning
Category: Performance Testing

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
406 Courses · 34,235 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."