Software Engineering Technical Lead
WEKA
WEKA provides a software platform that powers high-performance data infrastructure, enabling organizations to accelerate innovation with modern data architecture.
Tel Aviv, Israel
Posted 2mo ago
datacenter, storage, software
At WEKA, we are building NeuralMesh™, the world's first intelligent, adaptive mesh storage system, purpose-built for the age of AI. To ensure our platform remains unbreakable in the world's largest AI and GPU clusters, we don't simply test our code. We build an adversarial distributed system as complex and sophisticated as the product itself.
The Quality Testing & Reliability group is not a traditional QA team. We are a high-octane engineering force that treats reliability as a first-class software problem. We build the systems, frameworks, and infrastructure that prove our platform's correctness at scale — and we move with the urgency and ambition of a category-defining company.
We are looking for a Technical Lead to drive the architectural direction and engineering excellence of this group. This is a senior, deeply hands-on role for a technology leader who can own the technical roadmap, mentor a team of elite engineers, and build the infrastructure that challenges WEKA's platform to its theoretical limits.
What You'll Lead
- Define and own the technical architecture of the group's distributed testing and reliability platform - designing for massive scale, real-world workload simulation, and adversarial failure injection
- Lead efforts involving multiple engineers: set technical standards, run architecture reviews, drive design decisions, and mentor engineers as they grow
- Build the systems that orchestrate millions of concurrent IO operations, inject chaos at the infrastructure layer (latency, packet loss, hardware failures), and expose the hardest-to-find race conditions and consistency bugs
- Advance AI-driven approaches to test automation: intelligent scenario generation, LLM-augmented root-cause analysis, and autonomous validation pipelines
- Drive observability and reliability engineering across the group - building telemetry pipelines that track P99 latency, jitter, and system health, turning quality into a quantitative discipline
- Collaborate deeply with Core R&D, Storage Kernel, and Infrastructure teams - translating architectural knowledge into targeted reliability strategies
- Establish engineering practices - design docs, production-grade code reviews, testing philosophy, and cross-team technical alignment
What You Bring
- Strong software engineering background - Python expertise is essential; ability to read, debug, and reason about C++, Rust, or Go is a significant advantage
- Deep understanding of distributed systems: concurrency, consistency models, fault tolerance, and large-scale system behavior under stress
- Background in one or more of: storage systems, networking (TCP/IP, RDMA), cloud infrastructure, database internals, or high-performance backend systems
- Experience building large-scale infrastructure platforms, internal developer platforms, or reliability engineering systems
Leadership
- Proven track record leading complex technical initiatives from architecture through delivery
- Experience mentoring and growing engineers - raising the technical bar of a team, not just directing work
- Ability to drive technical alignment across teams, communicate tradeoffs clearly, and make high-quality architectural decisions at speed
- Comfortable operating at both the strategic and hands-on level - you write code, review designs, and shape roadmaps
- Previous experience in people management roles is an advantage
Mindset
- You approach quality through the lens of Site Reliability Engineering: you care about MTTD (mean time to detect), observability, and building self-healing systems
- You have a "hacker" instinct - you don't just find bugs; you find the architectural flaws that allowed them to exist
- You are an early adopter of AI tools and excited about applying LLMs and generative AI to accelerate engineering velocity
Big Advantages
- Experience with storage systems, file systems, or high-performance distributed environments
- Background in chaos engineering, fault injection, or simulation systems
- Familiarity with observability tooling and performance engineering at scale
- Experience building testing or reliability platforms as first-class engineering products
- Prior experience as a Team Lead in a high-growth infrastructure company
Why This Role Is Different
Most engineering leadership roles manage delivery.
This role builds the system that proves the product.
You will lead one of the most technically demanding groups in the company - solving hard problems in distributed systems correctness, adversarial infrastructure design, and AI-augmented validation. You will have real influence on how one of the industry's most advanced storage platforms is hardened, scaled, and trusted by the world's leading AI organizations.
If you want to lead engineers who are building the future of infrastructure reliability, this role was built for you.