<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Modern C++ // dev</title><description>High-signal modern C++ content for senior engineers.</description><link>https://moderncpp.dev/</link><item><title>Unlocking ARM NEON Performance: A Deep Dive into Parallel Prefix Sums</title><link>https://moderncpp.dev/articles/arm-neon-parallel-prefix-sum-deep-dive/</link><guid isPermaLink="true">https://moderncpp.dev/articles/arm-neon-parallel-prefix-sum-deep-dive/</guid><description>I measured the prefix sum on a Haswell i7-4790 with GCC 15.2.1 and `-O3 -march=native`. The scalar version hit 7.1 GB/s on 1MB arrays and held steady at 6.0–6.2 GB/s as I pushed to 1GB. ARM NEON on th</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate></item><item><title>Building Fast, Immutable String-to-Float Array Maps in C++: A &apos;ConstMap&apos; Inspired Approach</title><link>https://moderncpp.dev/articles/cpp-immutable-string-to-array-map-perf/</link><guid isPermaLink="true">https://moderncpp.dev/articles/cpp-immutable-string-to-array-map-perf/</guid><description>I built an immutable string-to-float-array map because I got tired of losing 6.8 nanoseconds every time I touched `std::unordered_map`. On an i7-4790 with GCC 15.2 and -O2, the immutable map delivers </description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate></item><item><title>Beyond Sequential Consistency: Practical Performance Gains with C++ Weaker Memory Orderings</title><link>https://moderncpp.dev/articles/cpp-weaker-memory-orderings-performance-guide/</link><guid isPermaLink="true">https://moderncpp.dev/articles/cpp-weaker-memory-orderings-performance-guide/</guid><description>For years I avoided thinking about memory orderings. I&apos;d write `std::atomic&lt;T&gt;` with the default `std::memory_order_seq_cst`, and it worked. On x86-64, the CPU&apos;s strong memory model does the heavy lif</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate></item><item><title>GCC 16.1 and C++20 Modules: A Practical Guide to Bridging the Tooling Gap</title><link>https://moderncpp.dev/articles/gcc16-cpp20-modules-adoption-practical/</link><guid isPermaLink="true">https://moderncpp.dev/articles/gcc16-cpp20-modules-adoption-practical/</guid><description>C++20 modules promised to fix the compile-time tax that header files impose. Five years on, GCC 16.1 is ready. The promise is sound. The tooling is not.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate></item><item><title>When JSON Schema Crashes Your Inference Server: Regex DoS in C++</title><link>https://moderncpp.dev/articles/json-schema-regex-sandboxing-in-cpp26-inference-servers/</link><guid isPermaLink="true">https://moderncpp.dev/articles/json-schema-regex-sandboxing-in-cpp26-inference-servers/</guid><description>Feed `std::regex` a pathological pattern and a crafted input, and watch it spiral. Input length 16 characters: 4.48 milliseconds. Input length 18: 18 milliseconds. Input length 20: over a second. The </description><pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate></item><item><title>LLVM&apos;s Flat-Buffer Tree for IR Dominators: O(1) Reads vs O(n) Moves</title><link>https://moderncpp.dev/articles/llvm-flat-buffer-tree/</link><guid isPermaLink="true">https://moderncpp.dev/articles/llvm-flat-buffer-tree/</guid><description>Compiler optimization passes live and die on tree traversal. LLVM&apos;s dominator analysis alone queries ancestor relationships thousands of times per function. A real C++ translation unit with heavy temp</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate></item><item><title>Compile-Time Unsigned Overflow Detection in C++: From `if`-checks to `constexpr` Assertions</title><link>https://moderncpp.dev/articles/constexpr-unsigned-overflow-detection/</link><guid isPermaLink="true">https://moderncpp.dev/articles/constexpr-unsigned-overflow-detection/</guid><description>Unsigned overflow wraps. That&apos;s the contract. C++ won&apos;t catch it, the CPU won&apos;t trap it. But your size calculation that wraps to a smaller buffer? That&apos;s a memory corruption vulnerability, and it&apos;s yo</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate></item><item><title>Beyond Binary Search: When Interpolation Beats Quaternary, Radix, and SIMD</title><link>https://moderncpp.dev/articles/interleaved-simd-search-alternatives-binary-vs-interpolation/</link><guid isPermaLink="true">https://moderncpp.dev/articles/interleaved-simd-search-alternatives-binary-vs-interpolation/</guid><description>You have a sorted array. Need to search it. `std::lower_bound`: O(log n), predictable, robust. This is the default solution. But my industry experience breeds skepticism. Every few months, a new paper</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Parallel execution for loops: C++26&apos;s work-stealing scheduler under the hood</title><link>https://moderncpp.dev/articles/deep-dive-cpp26-parallel-algorithms-proposal/</link><guid isPermaLink="true">https://moderncpp.dev/articles/deep-dive-cpp26-parallel-algorithms-proposal/</guid><description>I&apos;ve spent enough time staring at `perf stat` output to recognize a pattern: OpenMP&apos;s dynamic scheduler measures **55,593 ops/sec** on Zipf-distributed task costs with 512 tasks. That&apos;s about 18 micro</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Kernel Fusion on CPU: What llama.cpp&apos;s RMS_NORM + MUL Fusion Teaches Us About LLM Performance</title><link>https://moderncpp.dev/articles/llama-cpp-fusion-rms-norm/</link><guid isPermaLink="true">https://moderncpp.dev/articles/llama-cpp-fusion-rms-norm/</guid><description>Llama.cpp&apos;s PR #22423 landed a kernel fusion for RMS_NORM + MUL in the ggml CPU backend a few weeks ago. The speedup: 1.60×. Consistently. Across dimension sizes, thread counts, even hardware variatio</description><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate></item><item><title>P3373R2: The Case for a Standardized Low-Latency I/O API</title><link>https://moderncpp.dev/articles/wg21-p3373r2-low-latency-io-api/</link><guid isPermaLink="true">https://moderncpp.dev/articles/wg21-p3373r2-low-latency-io-api/</guid><description>Here&apos;s the uncomfortable truth: modern C++ standard library I/O becomes a bottleneck at scale. Traditional POSIX APIs introduce 1–10 microseconds of latency per operation due to syscall overhead and k</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Compile-Time String Substitution with C++26 Reflection</title><link>https://moderncpp.dev/articles/barry-revzin-meta-substitute/</link><guid isPermaLink="true">https://moderncpp.dev/articles/barry-revzin-meta-substitute/</guid><description>You&apos;ve probably formatted a string in C++ more times than you&apos;ve thought about how it works. Write a format string, pass some values; the library parses at runtime. It&apos;s a solved problem, shipped, sta</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate></item><item><title>C++26 Move Semantics: What&apos;s New Since CppCon 2025 Basics Talk</title><link>https://moderncpp.dev/articles/cpp26-move-semantics-cppcon-2025-deep-dive/</link><guid isPermaLink="true">https://moderncpp.dev/articles/cpp26-move-semantics-cppcon-2025-deep-dive/</guid><description>If you watched Ben Saks&apos;s CppCon 2025 &apos;Back to Basics: Move Semantics&apos; talk, you know what moves are and why the compiler calls them. That talk is solid. C++26 doesn&apos;t contradict it. What it does is t</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate></item><item><title>C++23 std::stacktrace: Never Debug Blind Again</title><link>https://moderncpp.dev/articles/c23-stdstacktrace-never-debug-blind-again/</link><guid isPermaLink="true">https://moderncpp.dev/articles/c23-stdstacktrace-never-debug-blind-again/</guid><description>I have written the `#ifdef` tower — `backtrace()` on Linux, `CaptureStackBackTrace()` on Windows, `dladdr` for symbol resolution on one side, `dbghelp.dll` on the other — more times than I care to adm</description><pubDate>Tue, 07 Apr 2026 00:00:00 GMT</pubDate></item><item><title>C++26 Is Finalized: What Shipped, What Didn&apos;t, and What It Means</title><link>https://moderncpp.dev/articles/cpp26-what-shipped/</link><guid isPermaLink="true">https://moderncpp.dev/articles/cpp26-what-shipped/</guid><description>C++26 was voted out in March 2026. Contracts, reflection, and sender/receiver all shipped. I fed every major feature through GCC 15 and Clang 21 to see what actually compiles today.</description><pubDate>Fri, 03 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Designing a SIMD Algorithm from Scratch</title><link>https://moderncpp.dev/articles/designing-a-simd-algorithm-from-scratch/</link><guid isPermaLink="true">https://moderncpp.dev/articles/designing-a-simd-algorithm-from-scratch/</guid><description>I manually unrolled a byte-counting loop with four independent accumulators — the textbook ILP optimization — and it ran 2.08x *slower* than the plain loop. The plain loop that GCC had quietly autovec</description><pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate></item><item><title>C++ Profiles: What, Why, and How at using std::cpp 2026</title><link>https://moderncpp.dev/articles/c-profiles-what-why-and-how-at-using-stdcpp-2026/</link><guid isPermaLink="true">https://moderncpp.dev/articles/c-profiles-what-why-and-how-at-using-stdcpp-2026/</guid><description>I spent a week after using std::cpp 2026 trying to answer one question about profiles: do the checks cost anything? The proposal sounds good on paper — opt-in safety enforcement per translation unit, </description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate></item><item><title>C++20 Modules: The Tooling Gap</title><link>https://moderncpp.dev/articles/c20-modules-the-tooling-gap/</link><guid isPermaLink="true">https://moderncpp.dev/articles/c20-modules-the-tooling-gap/</guid><description>GCC 15 — six years after C++20 — still does not enable modules with -std=c++20. The tooling gap is real, and it is not closing fast.</description><pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Contracts in C++26: What They Check, What They Cost, When to Use Them</title><link>https://moderncpp.dev/articles/contracts-cpp26/</link><guid isPermaLink="true">https://moderncpp.dev/articles/contracts-cpp26/</guid><description>That&apos;s GCC 15.2.1, `-O2 -std=c++26`, on an i7-4790 at 3.6 GHz. The function under test does one multiply. One precondition is checked.</description><pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Anatomy of llama.cpp: How 105K Stars of C++ Runs LLMs on Your Laptop</title><link>https://moderncpp.dev/articles/anatomy-of-llamacpp/</link><guid isPermaLink="true">https://moderncpp.dev/articles/anatomy-of-llamacpp/</guid><description>I spent a week reading llama.cpp&apos;s source. Not the GitHub README, not the model card — the actual C that runs when you type `./llama-cli -m llama-7b-q4.gguf`. What I found is one of the better-enginee</description><pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Profile-Guided Optimization Made Our Code Slower</title><link>https://moderncpp.dev/articles/profile-guided-optimization/</link><guid isPermaLink="true">https://moderncpp.dev/articles/profile-guided-optimization/</guid><description>That&apos;s the whole story. I took a virtual-dispatch interpreter loop — the textbook PGO target — instrumented it, trained it on a representative workload, and recompiled. Both GCC 15.2.1 and Clang 21.1.</description><pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Lock-Free Queue Implementations Compared: Correctness, Performance, and the Bugs You&apos;ll Ship</title><link>https://moderncpp.dev/articles/lock-free-queue-comparison/</link><guid isPermaLink="true">https://moderncpp.dev/articles/lock-free-queue-comparison/</guid><description>A `std::mutex`-protected `std::deque` is 12% faster than moodycamel::ConcurrentQueue when contention is low.</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate></item><item><title>std::expected on Bare Metal: Error Handling Without Exceptions</title><link>https://moderncpp.dev/articles/std-expected-bare-metal/</link><guid isPermaLink="true">https://moderncpp.dev/articles/std-expected-bare-metal/</guid><description>The `-fno-exceptions` build flag and `int` return codes. Every embedded C++ codebase I&apos;ve worked on has both, and the pattern is always the same: return an error code, take an output pointer, hope the</description><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Cache-Line Archaeology: Finding and Fixing False Sharing in Production</title><link>https://moderncpp.dev/articles/cache-line-false-sharing/</link><guid isPermaLink="true">https://moderncpp.dev/articles/cache-line-false-sharing/</guid><description>Your threads are doing independent work on independent data, and yet adding a second thread makes everything six times slower. This is false sharing, and it hides in struct layouts and thread-local counters across more production codebases than anyone wants to admit.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate></item></channel></rss>