Elastic ETL | Ahmed Yar

01 | The Problem

Static cloud infrastructure forces a choice between high cost (over-provisioning) and high latency (under-provisioning). This project implements infrastructure elasticity: the cluster grows and shrinks its compute resources in response to live data pressure.

02 | System Architecture

SOURCE: Python Producer → INGESTION: Apache Kafka → PROCESSING: Spark Master with workers W1, W2, W3, W4

Executors on workers W2–W4 scale up when a backlog persisting for 1s is detected in the 'cs_student_logs' topic.
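Spark's dynamic allocation requests new executors once tasks have been pending for `spark.dynamicAllocation.schedulerBacklogTimeout` (1s by default). That default plausibly matches the 1s backlog described above. The sketch below is a minimal illustration under stated assumptions. The configuration keys are real Spark settings. The values (a minimum of 1 and a maximum of 4 executors, one per worker) are assumptions. `simulate_ramp` and `to_submit_args` are hypothetical helpers. `simulate_ramp` models Spark's documented ramp-up, where the first request fires after the backlog timeout, later requests fire every sustained timeout, and each round asks for twice as many executors (1, 2, 4, ...). It ignores Spark's additional cap based on the number of pending tasks.

```python
# Real Spark dynamic-allocation keys; the values are illustrative assumptions,
# not the project's confirmed settings.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Spark 3.x can track shuffle files instead of needing an external shuffle service.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",  # assumed baseline (W1)
    "spark.dynamicAllocation.maxExecutors": "4",  # assumed: one executor per worker W1-W4
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",  # Spark's default
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "1s",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",  # Spark's default
}


def to_submit_args(conf):
    """Hypothetical helper: turn a conf dict into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def simulate_ramp(backlog_seconds, min_executors=1, max_executors=4,
                  backlog_timeout=1, sustained_timeout=1):
    """Simplified model of Spark's round-based executor requests.

    Returns the executor count after `backlog_seconds` of continuously
    pending tasks. Round 1 fires after `backlog_timeout`, later rounds every
    `sustained_timeout`, and each round adds twice as many executors as the
    previous one, capped at `max_executors`.
    """
    executors = min_executors
    if backlog_seconds < backlog_timeout:
        return executors  # backlog too short: no scale-up yet
    rounds = 1 + int((backlog_seconds - backlog_timeout) // sustained_timeout)
    to_add = 1
    for _ in range(rounds):
        if executors >= max_executors:
            break
        executors = min(executors + to_add, max_executors)
        to_add *= 2  # exponential ramp: 1, 2, 4, ...
    return executors
```

Under these assumed values, a 1s backlog adds one executor, a 2s backlog reaches all four, and executors idle for 60s are released again.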

03 | Technology Stack

Category	Technology	Role
Ingestion	Apache Kafka	High-throughput message buffer
Processing	Spark 3.5.0	10s tumbling-window aggregation
Elasticity	Spark Dynamic Allocation	Real-time executor scaling
Environment	Docker, Linux	Containerized cluster management
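A tumbling window assigns every event to exactly one fixed, non-overlapping interval. With 10s windows, events at 0s and 9.9s share a window, and an event at 10s starts the next one. The pure-Python sketch below illustrates the idea without a cluster. The `(timestamp, key)` event shape is a hypothetical stand-in for records from 'cs_student_logs', and the PySpark line in the comment uses a hypothetical column named `event_time`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Roughly equivalent in PySpark (column names are hypothetical):
#   df.groupBy(F.window("event_time", "10 seconds"), "key").count()

WINDOW_SECONDS = 10  # matches the 10s tumbling window in the table


def window_start(ts: float, size: int = WINDOW_SECONDS) -> int:
    """Align an epoch timestamp (seconds) to the start of its window."""
    return int(ts // size) * size


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    size: int = WINDOW_SECONDS) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key); windows never overlap."""
    counts: Counter = Counter()
    for ts, key in events:
        counts[(window_start(ts, size), key)] += 1
    return dict(counts)


events = [(0.0, "a"), (9.9, "a"), (10.0, "a"), (15.0, "b")]
print(tumbling_counts(events))
```

Spark's streaming version also needs a watermark to decide when a window is complete. That detail is omitted here.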

04 | Performance Impact

62.5% Cost Reduction

300% Spike Capacity

In testing, the elastic cluster used 36 instance-hours/day, compared with 96 instance-hours/day for a statically provisioned cluster, while maintaining sub-second latency.
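The headline figures follow from simple arithmetic. The 62.5% reduction is (96 − 36) / 96. The remaining inputs are assumptions, not stated results. One reading of the 96 instance-hours/day is a static cluster of 4 workers (W1–W4) running 24 hours. One reading of "300% spike capacity" is scaling from 1 worker to 4. The helper names below are hypothetical.

```python
def instance_hours(instances: int, hours: float = 24.0) -> float:
    """Instance-hours consumed by a fixed number of instances."""
    return instances * hours


def percent_reduction(baseline: float, new: float) -> float:
    """How much smaller `new` is than `baseline`, as a percentage."""
    return (baseline - new) / baseline * 100


def percent_increase(baseline: float, new: float) -> float:
    """How much larger `new` is than `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100


static = instance_hours(4)  # assumed: 4 workers always on -> 96.0
elastic = 36.0              # measured value reported above
print(percent_reduction(static, elastic))  # 62.5
print(percent_increase(1, 4))              # 300.0, assuming 1 -> 4 workers
```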