6-Month Engineering Roadmap

3 phases

4 GitHub projects

6 months

Phase 1: C++ systems foundation (Weeks 1-6)

Learn

Modern C++ (C++17/20): memory ownership, RAII, smart pointers

Resource: "A Tour of C++" by Bjarne Stroustrup. 2 chapters/week.

CMake build system: how to build, link, and structure C++ projects

You'll need this before any real project compiles correctly.

Multithreading: std::thread, mutexes, condition variables, atomics

Everything in inference is concurrent. This is non-negotiable.
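To make the mutex and condition-variable item concrete, here is a minimal sketch of a fixed-size thread pool. The names `ThreadPool`, `submit`, and `count_with_pool` are hypothetical and chosen for illustration; a production pool would also need exception handling and a bounded queue.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool (illustrative sketch, hypothetical names).
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                // The predicate guards against spurious wakeups.
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reachable once stopping_ is set
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs `tasks` atomic increments on `threads` workers and returns the final count.
int count_with_pool(int tasks, std::size_t threads) {
    std::atomic<int> counter{0};
    {
        ThreadPool pool(threads);
        for (int i = 0; i < tasks; ++i) pool.submit([&counter] { ++counter; });
    }  // the destructor drains the queue and joins
    return counter.load();
}
```

The task runs outside the lock on purpose: holding the mutex during `task()` would serialize the whole pool.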

Sockets and HTTP from scratch: BSD socket API in C++

Don't use a library yet. Write raw send/recv. Understand the protocol.
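A detail that raw `send`/`recv` teaches quickly is that both calls may transfer fewer bytes than requested. The sketch below shows one way to loop until a buffer is fully sent or received. The helper names `send_all` and `recv_exact` are hypothetical, and the code assumes blocking POSIX sockets.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <string>

// Sends the whole buffer, looping because send() may accept only part of it.
// Returns true on success, false on error. Assumes a blocking socket.
bool send_all(int fd, const char* data, std::size_t len) {
    std::size_t sent = 0;
    while (sent < len) {
        const ssize_t n = ::send(fd, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

// Reads until `len` bytes have arrived, the peer closes, or an error occurs.
// The returned string may be shorter than `len` in the last two cases.
std::string recv_exact(int fd, std::size_t len) {
    std::string out;
    char buf[4096];
    while (out.size() < len) {
        const std::size_t want = std::min(sizeof buf, len - out.size());
        const ssize_t n = ::recv(fd, buf, want, 0);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) break;  // 0 means orderly shutdown, <0 means error
        out.append(buf, static_cast<std::size_t>(n));
    }
    return out;
}
```

A quick way to try these without a network is `socketpair(AF_UNIX, SOCK_STREAM, 0, fds)`: write a request with `send_all` on one end and read it back with `recv_exact` on the other.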

Project 1: HTTP server in C++

tokoro: HTTP/1.1 server in C++

Build a multi-threaded HTTP server that serves static files and handles concurrent connections. No libraries: raw POSIX sockets. Implement: TCP accept loop, HTTP parser, thread pool, keep-alive.
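As a starting point for the HTTP parser piece, the sketch below parses only the request line (method, target, version). The `RequestLine` struct and `parse_request_line` function are hypothetical names. Headers, bodies, and keep-alive handling are left out on purpose.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Parsed form of "METHOD SP request-target SP HTTP-version".
struct RequestLine {
    std::string method;
    std::string target;
    std::string version;
};

// Parses one request line without its trailing CRLF.
// Returns std::nullopt when the line is malformed or the version is unsupported.
std::optional<RequestLine> parse_request_line(std::string_view line) {
    const auto sp1 = line.find(' ');
    if (sp1 == std::string_view::npos || sp1 == 0) return std::nullopt;

    const auto sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string_view::npos || sp2 == sp1 + 1) return std::nullopt;

    const std::string_view version = line.substr(sp2 + 1);
    if (version != "HTTP/1.1" && version != "HTTP/1.0") return std::nullopt;

    return RequestLine{std::string(line.substr(0, sp1)),
                       std::string(line.substr(sp1 + 1, sp2 - sp1 - 1)),
                       std::string(version)};
}
```

For example, `parse_request_line("GET /index.html HTTP/1.1")` yields method `GET` and target `/index.html`, while a line with a missing target or an unknown version returns `std::nullopt`.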

Tags: C++, systems, github ship

Why this matters: every inference server is a networked C++ process. This teaches you what's below frameworks like NestJS, and the project is immediately legible to AI infrastructure engineers.

Milestones by week 6

Server handles 1000 concurrent connections

HTTP parser handles chunked encoding

README with architecture diagram and benchmark results

Published to GitHub with CI via GitHub Actions

Phase 2: AI inference internals (Weeks 7-16)

Learn

How LLM inference works: tokenization, attention, KV cache, batching

Read the llama.cpp source. Understand every struct. Don't just run it.

GGUF format: how model weights are quantized and stored on disk

Open a .gguf file in a hex editor. Map the header. Then load it in code.
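After mapping the header by hand, the "load it in code" step can begin with something like the sketch below. It reads only the fixed-size fields at the start of the file. It assumes GGUF version 2 or later (where both counts are 64-bit) and a little-endian host. The names `GgufHeader` and `read_gguf_header` are hypothetical and are not part of llama.cpp.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Fixed-size fields that follow the 4-byte "GGUF" magic.
struct GgufHeader {
    std::uint32_t version;
    std::uint64_t tensor_count;
    std::uint64_t metadata_kv_count;
};

// Reads magic + version + two counts (24 bytes) from a binary stream.
// Assumption: GGUF v2+ layout and a little-endian host.
std::optional<GgufHeader> read_gguf_header(std::istream& in) {
    char raw[24];
    if (!in.read(raw, sizeof raw)) return std::nullopt;          // file too short
    if (std::memcmp(raw, "GGUF", 4) != 0) return std::nullopt;   // wrong magic

    GgufHeader h{};
    std::memcpy(&h.version, raw + 4, 4);
    std::memcpy(&h.tensor_count, raw + 8, 8);
    std::memcpy(&h.metadata_kv_count, raw + 16, 8);
    if (h.version < 2) return std::nullopt;  // v1 used 32-bit counts
    return h;
}

// Convenience overload for bytes already in memory (handy for tests).
std::optional<GgufHeader> read_gguf_header(const std::string& bytes) {
    std::istringstream in(bytes);
    return read_gguf_header(in);
}
```

With a real file, open it as `std::ifstream f(path, std::ios::binary)` and pass `f` to the stream overload. The metadata key-value pairs and tensor descriptors follow these 24 bytes.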

Profiling C++: perf, Valgrind, gprof, cache miss analysis

You can't optimize what you haven't measured. This skill separates real systems engineers from the rest.

SIMD basics: SSE2/AVX2 intrinsics for vectorized float math

Optional but powerful. llama.cpp uses SIMD heavily. Even reading it builds intuition.
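For a first taste of intrinsics, here is a sketch of a float dot product that handles four lanes at a time with SSE and falls back to scalar code elsewhere. The function name `dot` is hypothetical. This is far simpler than the quantized kernels in llama.cpp, and AVX2 would follow the same pattern with 8-wide `__m256` registers.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2, also pulls in the SSE float intrinsics
#endif

// Dot product of two float arrays of length n.
// Uses 4-wide SSE multiply-add when available, scalar code otherwise.
float dot(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__SSE2__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        const __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  // horizontal reduction of the 4 partial sums
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i) sum += a[i] * b[i];  // remaining tail (or everything, without SSE)
    return sum;
}
```

Because the vector path sums in a different order than a plain loop, results on non-integer data can differ in the last bits. That is worth keeping in mind when comparing against a scalar reference.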

Project 2: inference server in C++

vahan: LLM inference server in C++

Build an HTTP server, using your tokoro base or cpp-httplib, that loads a GGUF model via llama.cpp as a library, accepts prompts, streams tokens via Server-Sent Events (SSE), and queues concurrent requests. Expose /generate and /health endpoints. Add latency metrics.
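The token-streaming part of this project comes down to writing small text frames onto a response whose Content-Type is `text/event-stream`. Below is a sketch of the framing only. The function name `sse_event` is hypothetical, and the wiring into llama.cpp or an HTTP library is not shown.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Frames one payload as a Server-Sent Events message:
// every payload line becomes "data: <line>\n", and a blank line ends the event.
std::string sse_event(std::string_view payload) {
    std::string out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t nl = payload.find('\n', start);
        out += "data: ";
        out += payload.substr(
            start, nl == std::string_view::npos ? nl : nl - start);
        out += '\n';
        if (nl == std::string_view::npos) break;
        start = nl + 1;
    }
    out += '\n';  // blank line terminates the event
    return out;
}
```

For example, `sse_event("hello")` produces `data: hello\n\n`. Splitting on newlines matters because a generated token may itself contain a line break, which would otherwise end the event early.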

Tags: C++, AI infra, github ship

Project 3: inference benchmarking CLI

drishti: inference benchmark CLI

A CLI tool in C++ or Python that stress-tests any OpenAI-compatible inference endpoint. Measures time to first token (TTFT), throughput, and p50/p95/p99 latency under concurrent load. Outputs structured JSON reports. Genuinely useful to the community.
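The p50/p95/p99 figures can be computed from raw latency samples in a few lines. The sketch below uses the nearest-rank definition, which is one of several common percentile definitions and is used here as an assumption. The function name `percentile` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Returns 0.0 for empty input.
// `samples` is taken by value so the caller's order is left untouched.
double percentile(std::vector<double> samples, double p) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const double rank =
        std::ceil(p * static_cast<double>(samples.size()) / 100.0);
    const std::size_t idx =
        rank < 1.0 ? 0 : static_cast<std::size_t>(rank) - 1;
    return samples[std::min(idx, samples.size() - 1)];
}
```

With 100 samples of 1 ms through 100 ms, this returns 50 for p50, 95 for p95, and 99 for p99. Tail percentiles only become meaningful with enough samples, so the report should record the sample count next to them.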

Tags: C++, AI infra, tooling, github ship

Milestones by week 16

vahan streams Llama 3.2 3B locally over HTTP

Handles 4 concurrent requests with queuing

drishti publishes benchmark report for 3 popular inference endpoints

Blog post: "What I learned reading the llama.cpp source"

Phase 3: Specialise and signal (Weeks 17-26)

Go deep on one track

Contribute a real PR to llama.cpp, vLLM, or whisper.cpp

Not a docs fix. A bug fix, a perf improvement, or a missing feature. One merged PR > 10 side projects.

If targeting ElevenLabs: build real-time voice pipeline

STT -> LLM -> TTS, full duplex, WebSocket, interruption handling. Hard latency budget: <800ms.

If targeting Anthropic: study PagedAttention, speculative decoding

Read the vLLM paper. Then implement a toy speculative decoder in C++.

Project 4: the flagship

shabda: real-time voice AI pipeline

End-to-end: microphone input -> Whisper STT -> local LLM -> TTS -> speaker output. WebSocket server in C++. Full duplex with interruption handling. Latency budget per stage. Deployed as a demo anyone can run. This is the project that gets you the recruiter email.

Tags: C++, AI infra, real-time, flagship

Signal the work publicly

Write 1 technical post/week on LinkedIn about what you're building

Not summaries of articles. Your own findings, failures, benchmarks. "I ran X and found Y."

Publish 2 long-form blog posts on your personal site

"How I built a streaming inference server in C++" and "Benchmarking open-source LLM inference"

All 4 projects starred, documented, and CI-passing on GitHub

Recruiters look at GitHub before they look at your resume. Make it easy for them.

Milestones by week 26

One merged PR in a major open-source inference project

shabda demo runs end-to-end in under 800ms

500+ GitHub stars across all projects combined

Resume updated: inference infra as the lead skill

First recruiter outreach from a target company
