Apache Spark Structured Streaming in Production
Wire Kafka-aligned sources, stateful operators, and checkpoint folders without hand-waving the failure modes.
Big Data + Apache Spark · KR delivery spine
Hwaseong-si crews pair with our facilitators to keep Spark clusters explainable: streaming joins, Iceberg branches, and listener buses documented like production incidents—not slide fiction.
Trusted by teams at VoltMesh Logistics, HanRiver Analytics Co-op, and Paju Fabrication IoT for streaming drills.
2.6M rows replayed in anonymized skew drills
54 cohorts since 2019 with published change logs
9.2 facilitator clarity (internal pulse, /10)
18 racks time-shared for hybrid lab windows
42 employer-authored constraints in rubrics
Quiet channel
No countdown widgets—just a concise memo on watermark changes, rack firmware, and syllabus diffs whenever we cut a new intake.
Catalog surface
Eight Spark-forward builds with filters for level, duration, study format, and certificate paths. Cards stay horizontal so you scan like a capacity planner, not a carousel shopper.
Showing 8 of 8 programs
Wire Kafka-aligned sources, stateful operators, and checkpoint folders without hand-waving the failure modes.
ACID batches meet streaming merges: schema enforcement, vacuum cadence, and time travel queries you can defend in review.
Shuffle partitions, AQE toggles, and skew hints taught as measurable experiments—not folklore copied from forums.
Move driver/executor placement to K8s pod templates with resource profiles tuned for Hwaseong-class bare-metal racks.
Feature stores fed by nightly Spark jobs: point-in-time correctness, idempotent writes, and drift alarms wired to notebooks.
Hidden partitioning, branch snapshots, and compaction jobs orchestrated with realistic table maintenance budgets.
OpenTelemetry exporters, Spark listener buses, and executor log stitching so on-call engineers sleep slightly better.
Theta sketches, HLL, and count-min sketches inside guarded UDFs with reproducible accuracy/error trade-off notebooks.
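To ground the approximate-counting card above: a count-min sketch trades a bounded overestimate for constant memory. This pure-Python version is an illustrative stand-in (the class name, width/depth defaults, and salted-hash scheme are ours, not course material); a production job would more likely wrap a library such as Apache DataSketches inside the guarded UDFs the card describes.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: estimates never undercount, only overcount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One bucket per row, derived from a row-salted hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the row-wise minimum
        # is always >= the true count.
        return min(self.table[row][col] for row, col in self._buckets(item))


cms = CountMinSketch()
for word in ["spark"] * 5 + ["kafka"] * 2:
    cms.add(word)
print(cms.estimate("spark"))  # always >= the true count of 5
```

The accuracy/error trade the card mentions falls out of the two parameters: wider rows lower the collision (overcount) probability, deeper tables tighten the minimum.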
Execution spine
Frame the failure budget
Replay with captured plans
Instrument listener buses
Publish diffable notebooks
Retro with employer rubric
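A step like "Replay with captured plans" only works if the job's tuning knobs are pinned in the submission rather than scattered across notebooks. As a hedged sketch: the `--conf` keys below are real Spark SQL settings for adaptive execution and skew handling, but the script name and values are illustrative, not course defaults.

```shell
# Illustrative replay submission; values are placeholders, not recommendations.
spark-submit \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.skewJoin.enabled=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.sql.shuffle.partitions=400 \
  replay_job.py
```

Pinning these per run is what turns "AQE toggles and skew hints" into the measurable experiments the catalog promises: flip one flag, replay the captured plan, diff the metrics.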
Signal wall
“The Structured Streaming labs forced us to log watermark decisions we had been hiding in Slack threads. Checkpoint forensics alone justified the Mesh Foundry tier.”
“Iceberg branch snapshots finally clicked after the week-two compaction drill—still slower at PySpark than I want, but honest about why.”
“Mentors answered my skew hint question with a live explain plan instead of a canned deck.”
“Finance wanted receipts for every performance claim; the listener bus module gave us metric exports they could trace.”
“Career coach helped me rewrite bullet points around the Delta Lake governance project without inventing titles I never held.”
Partnerships
Micro FAQ
Do we need our own cluster for week one?
No. Week one runs against our shared lab; you only need Docker-capable hardware and VPN access we provision after enrollment confirmation.
How do refunds work?
See the Refund pathway page for eligibility windows, partial refund cases, and processing timelines under Korean consumer guidance.
Can one seat be shared across a team?
Seats are per named participant so feedback stays attributable; we can invoice multiple cost centers on request.
Artifacts
Includes week-by-week cluster expectations, VPN prerequisites, and the refund pathway summary so finance can sign without chasing us for footnotes.