Wednesday, 13 May 2026
Six critical CVEs hit dnsmasq, GitHub Copilot moves to usage-based pricing, and DuckDB gains a native client-server protocol
Today's Lead
dnsmasq-discuss
CERT Releases Six Critical CVEs for dnsmasq
Simon Kelley, the sole maintainer of dnsmasq, announced six CVEs covering serious security vulnerabilities in nearly all non-ancient versions of the widely deployed DNS and DHCP software. Patches are available in version 2.92rel2, with full technical details published at thekelleys.org.uk/dnsmasq/CVE/. Kelley explicitly attributes the increased pace of bug discovery to AI-based security research tools, a theme that echoes across this month's broader security landscape: the same week, Apple, Google, Microsoft, and Oracle are patching record volumes of vulnerabilities after participating in Project Glasswing, Anthropic's AI vulnerability-discovery capability. Dnsmasq is embedded in an enormous range of infrastructure: home routers, Android devices, Linux distributions using NetworkManager, and embedded systems throughout networking hardware. The exposure surface is correspondingly wide, and because many deployments receive updates only through vendor firmware channels rather than direct package management, the effective patch latency for a significant fraction of affected devices will be measured in months or years rather than days. Kelley also noted a deliberate policy decision: rather than adopting the long-embargo coordination model common in enterprise vulnerability disclosure, he is prioritizing rapid releases, with version 2.93 targeted for release within a week. This approach trades coordinated vendor notification time for faster availability of patches to administrators who can apply them directly, a reasonable tradeoff given that the AI-discovery pipeline that found these bugs likely means attacker-side awareness is also accelerating.
Also today
Krebs on Security
Patch Tuesday, May 2026: AI Is Finding Bugs Faster Than Vendors Can Ship Patches
Microsoft's May Patch Tuesday addressed 118 security vulnerabilities, the first such release in nearly two years with no actively exploited zero-days in the batch. Sixteen earned Microsoft's 'critical' designation, including a stack-based buffer overflow in Windows Netlogon (CVE-2026-41089) that grants SYSTEM privileges on domain controllers with no user interaction required, a critical DNS client RCE (CVE-2026-41096), and an Entra ID authentication bypass via forged credentials (CVE-2026-41103). The broader story this cycle is the AI-driven volume surge across the entire industry: Apple shipped an iOS 15 update patching 52 vulnerabilities, with fixes backported as far as the iPhone 6s; Google fixed 127 Chrome vulnerabilities in a single update; Oracle addressed 450 flaws in its quarterly patch cycle and announced a shift to monthly critical updates; and Mozilla released Firefox 150 with fixes for 271 vulnerabilities, a direct result of running the browser through an AI evaluation pipeline. The connecting thread is 'Project Glasswing,' a vulnerability-discovery capability developed by Anthropic and made available to a cohort of major software vendors. The AI tooling appears to be genuinely effective at surfacing bugs that manual review misses, which is producing a distinctive pattern: companies that have participated in the program are releasing outsized patch volumes, at an accelerating cadence, for flaws in code that has been shipping for years. For security teams, the operational implication is that the steady-state patch backlog across the software industry is being worked through much faster than historical averages predicted — a net positive, but one that places real pressure on patch management processes calibrated for the old disclosure rhythm.
Read →
GitHub Blog
GitHub Copilot Restructures Individual Plans with Flex Allotments and a New Max Tier
GitHub is moving all Copilot individual plans to usage-based billing effective June 1, 2026, with a restructured lineup: Free (limited usage), Pro ($10/month, $15 in total included usage), Pro+ ($39/month, $70 total), and a new Max tier ($100/month, $200 total). The pricing architecture introduces a 'flex allotment' concept on top of fixed base credits — a variable additional usage pool that GitHub can adjust over time as model pricing and efficiency evolve. Code completions and next-edit suggestions remain unlimited on all paid plans and don't consume credits; the usage-based pool applies to chat and agentic workflows. The flex allotment framing is worth examining carefully: it means the effective value of paid plans is explicitly not fixed, and GitHub reserves the right to reduce the variable component as economics shift. The base credit component (1:1 with the subscription price) is guaranteed not to change, but the bonus usage that makes the plans attractive is, by design, ephemeral. For individual developers currently on monthly Pro or Pro+ plans, the transition is automatic — no action required. For teams evaluating which tier to commit to, the Max plan at $100/month targets sustained high-volume agentic work, implying GitHub's internal modeling suggests that developers running long multi-step Copilot workflows regularly consume well above the Pro+ allotment. The shift represents GitHub's formal acknowledgment that the move to agentic coding workflows fundamentally changes the unit economics of AI developer tooling, and that pricing structures calibrated for completion and chat interactions no longer map cleanly onto the emerging mode of use.
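To make the base/flex split concrete, here is the arithmetic implied by the launch numbers above (a minimal sketch: the flex figure is simply total minus base, and GitHub says the flex component can change):

```go
package main

import "fmt"

// plan models the restructured Copilot tiers as described in the post.
// The base credit is pegged 1:1 to the subscription price; the flex
// allotment is the variable bonus GitHub reserves the right to adjust.
type plan struct {
	name  string
	price float64 // USD/month, and also the guaranteed base credit
	total float64 // total included usage at launch
}

func main() {
	for _, p := range []plan{{"Pro", 10, 15}, {"Pro+", 39, 70}, {"Max", 100, 200}} {
		flex := p.total - p.price // the ephemeral component
		fmt.Printf("%-4s base $%3.0f + flex $%3.0f = $%3.0f included\n",
			p.name, p.price, flex, p.total)
	}
}
```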
Read →
GitHub (Cactus Compute)
Needle: A 26M Parameter Tool-Calling Model Distilled from Gemini
Cactus Compute released Needle, an MIT-licensed 26M-parameter model specialized for single-shot function calling, achieving 6,000 tokens/second prefill and 1,200 tokens/second decode on consumer devices. The architectural claim is notable: the entire model uses attention and gating with no MLP/FFN layers anywhere — a deliberate design choice grounded in the observation that function calling is fundamentally a retrieval-and-assembly task (match query to tool name, extract argument values, emit JSON) rather than a reasoning task that needs factual memory stored in FFN weights. If the relevant facts are in context — which they always are in tool-calling scenarios, where tool schemas are provided in the prompt — the FFN's role as a knowledge store becomes unnecessary. The model was pretrained on 200B tokens across 16 TPU v6e chips (27 hours) and post-trained on 2B tokens of Gemini-synthesized function-calling data (45 minutes), covering 15 tool categories. On single-shot function-calling benchmarks, it outperforms FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M — models 10–14x larger — though the authors acknowledge those models have broader conversational capacity. The practical framing from commenters is telling: a 26M model fast enough to run on a phone could serve as a lightweight router that decides which tool to call and dispatches accordingly, reserving expensive large-model calls for cases that genuinely require reasoning. The no-FFN finding may also generalize to other retrieval-augmented settings where context contains all necessary information, pointing toward a class of smaller, faster specialized models for agentic pipelines where the bottleneck is decision latency rather than raw intelligence.
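That router pattern is easy to picture in code. The sketch below is hypothetical (the release doesn't document Needle's runtime API, so pickTool is a keyword-matching stand-in for on-device inference), but it shows the dispatch-or-escalate control flow:

```go
package main

import (
	"fmt"
	"strings"
)

// ToolCall is the structured output a single-shot function-calling model
// emits: a tool name plus arguments ready to serialize as JSON.
type ToolCall struct {
	Name string
	Args map[string]string
}

// pickTool stands in for an on-device Needle inference call (hypothetical;
// real inference would match the query against the tool schemas supplied
// in context). This stub keyword-matches to keep the example runnable.
func pickTool(query string, tools []string) (ToolCall, bool) {
	for _, tool := range tools {
		if strings.Contains(strings.ToLower(query), tool) {
			return ToolCall{Name: tool, Args: map[string]string{"query": query}}, true
		}
	}
	return ToolCall{}, false
}

// handle routes a query: cheap local dispatch when a tool matches,
// escalation to an expensive large-model call only when none does.
func handle(query string, tools []string) {
	if call, ok := pickTool(query, tools); ok {
		fmt.Printf("dispatch %s(%v)\n", call.Name, call.Args)
		return
	}
	fmt.Println("escalate to large model for reasoning")
}

func main() {
	tools := []string{"weather", "calendar", "timer"}
	handle("what's the weather in Oslo?", tools)
	handle("plan a three-city rail itinerary", tools)
}
```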
Read →
DuckDB Blog
Quack: DuckDB Introduces a Native Client-Server Protocol
DuckDB has shipped Quack, an HTTP-based client-server protocol that allows multiple DuckDB instances to communicate across processes and machines — filling what has been a notable gap in DuckDB's capabilities for networked and multi-process database access. The protocol is designed with DuckDB's characteristic emphasis on simplicity and performance: bulk data transfers benchmarked at 60 million rows in under 5 seconds, and concurrent writes sustaining 5,434 transactions per second. Security defaults are sane: the server binds to localhost by default and generates a random authentication token on startup, requiring explicit configuration to expose it over a network. DuckDB's architecture has historically been explicitly single-process and single-node — optimized for analytics workloads where a single machine pulls in data, analyzes it, and outputs results. Quack doesn't change that model so much as add a coordination layer: multiple DuckDB instances can now share data, coordinate writes, and serve results to remote clients over HTTP without requiring a separate database server. For data engineering workflows that currently involve awkward handoffs through Parquet files or intermediary services, Quack provides a more direct path. The announcement also signals DuckDB's continued expansion from a read-optimized analytics engine toward a more general-purpose embeddable database, a trajectory that has been apparent in its growing write performance and broader SQL coverage, and that Quack accelerates by making networked use cases first-class.
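To give a feel for the shape of the thing, here is a minimal client sketch. The endpoint path, port, and header name are assumptions for illustration (the announcement documents HTTP transport, a localhost-only default bind, and a random startup token, not the exact wire format):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical endpoint and port; Quack binds to localhost by default.
	req, err := http.NewRequest(http.MethodPost,
		"http://127.0.0.1:8080/query",
		strings.NewReader("SELECT count(*) FROM events"))
	if err != nil {
		panic(err)
	}
	// The server generates a random token at startup; this header name
	// is an assumption, not Quack's documented auth scheme.
	req.Header.Set("Authorization", "Bearer <token-printed-at-startup>")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status, string(body))
}
```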
Read →
Google DeepMind
Google DeepMind Reimagines the Mouse Pointer for the AI Era
Google DeepMind published a concept and early demo for a Gemini-powered cursor that understands context and intent rather than acting purely as a spatial targeting device. The core interaction model: a user points at any on-screen element and speaks a shorthand instruction ('Fix this,' 'Summarize,' 'Send to John'), and the AI captures the visual and semantic context of whatever the cursor is pointing at, interprets the instruction in that context, and acts across applications without the user switching to a separate AI interface or constructing a detailed prompt. The vision represents a meaningful departure from the current 'AI in a sidebar' paradigm, where users must manually copy content into a chat interface, construct context, and transfer outputs back. By embedding AI interpretation at the pointing layer, the model turns the cursor itself into the context-gathering mechanism — every click or hover becomes an implicit declaration of what the user is focused on. The demo covers application-crossing scenarios: pointing at an email and saying 'schedule this' navigates to a calendar; pointing at code and saying 'explain' opens inline documentation. This sits alongside Google's parallel 'Googlebook' announcement (a new reading-focused product) and the broader arc of Gemini integration into Android and Chrome OS, suggesting a coordinated push toward Gemini as an OS-level intelligence layer rather than a chat application. Whether the implementation can match the concept, particularly on the consent and context-boundary questions that arise when an AI has ambient access to everything on screen, remains to be seen.
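In pseudocode terms (everything here is illustrative; DeepMind has published a concept and demo, not an API), the pointing layer reduces to capturing context at invocation time and pairing it with the spoken instruction:

```go
package main

import "fmt"

// pointerContext is what the cursor implicitly gathers at the moment of
// invocation -- the "implicit declaration of focus" the concept describes.
// All types and behavior here are illustrative, not a published API.
type pointerContext struct {
	app      string // application under the cursor
	semantic string // e.g. the accessibility-tree node being pointed at
}

// act pairs the captured context with a spoken shorthand instruction and
// routes to a cross-application action; in the demo, interpretation is
// done by Gemini rather than a hard-coded switch like this one.
func act(ctx pointerContext, instruction string) {
	switch instruction {
	case "schedule this":
		fmt.Printf("draft a calendar event from %s content: %q\n", ctx.app, ctx.semantic)
	case "explain":
		fmt.Printf("open inline documentation for: %q\n", ctx.semantic)
	default:
		fmt.Println("send (context, instruction) to the model for interpretation")
	}
}

func main() {
	act(pointerContext{app: "mail", semantic: "Lunch Thursday?"}, "schedule this")
}
```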
Read →
Cloudflare Blog
When 'Idle' Isn't Idle: How a Linux Kernel Optimization Became a QUIC Death Spiral
Cloudflare's networking team published a detailed postmortem on a bug in their open-source QUIC implementation (quiche) where the CUBIC congestion controller's window would become permanently stuck at two packets — never recovering from congestion collapse even after packet loss ceased entirely. The root cause traces back to a 2017 Linux kernel fix for CUBIC's behavior after application idle periods: rather than resetting the congestion epoch (which distorts the growth curve), the kernel fix shifts the epoch start time forward by the idle duration, preserving the growth curve's shape. When this logic was ported to quiche in 2020, a subtle difference in how QUIC and TCP measure 'idle' introduced a bug. At minimum congestion window (2 packets), bytes-in-flight drops to zero after every RTT as the window drains and the next burst hasn't been sent yet. The ported code interpreted this zero as an idle period, measured the 'idle duration' from the last packet sent time (roughly one RTT in the past), and applied that inflated delta to advance the recovery start time — often pushing it into the future. With the recovery start time perpetually in the future, the controller treated every outgoing packet as being in recovery, skipped window growth for all of them, and stayed locked at the minimum window. The fix: track the time of the last ACK processed (the actual moment bytes-in-flight hit zero) and use that as the idle-start measurement point rather than the last packet sent time. Three lines of code. The incident is a textbook case of how subtle the behavioral contracts between abstraction layers can be: a correct optimization in one context (the kernel TCP CA_EVENT_TX_START callback timing) creates a latent bug when ported to a context with different measurement semantics (userspace QUIC's on_packet_sent timing). The 61% test failure rate that initially surfaced this — and the fact that it triggered only after a specific sequence of congestion-avoidance state, minimum cwnd, and zero bytes-in-flight — illustrates why corner-case congestion controller behavior is notoriously difficult to test.
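A condensed sketch of the two measurement choices (quiche itself is Rust; the field names and numbers below are illustrative, not the actual code):

```go
package main

import (
	"fmt"
	"time"
)

// cubicState holds just the fields relevant to the idle-compensation bug.
// Names are illustrative; quiche's actual (Rust) fields differ.
type cubicState struct {
	bytesInFlight    int
	lastPacketSent   time.Time // what the buggy port measured idle from
	lastAckProcessed time.Time // the actual moment bytes-in-flight hit zero
	recoveryStart    time.Time // packets sent before this count as "in recovery"
}

// onPacketSent applies the kernel-derived idle compensation: shift the
// recovery start forward by the idle gap so CUBIC's growth curve keeps
// its shape instead of being reset.
func onPacketSent(cc *cubicState, now time.Time) {
	if cc.bytesInFlight > 0 {
		return // not coming out of an idle period
	}
	// Buggy port: idle := now.Sub(cc.lastPacketSent). At minimum cwnd
	// that timestamp is ~1 RTT old, so the delta is inflated and
	// recoveryStart can land in the future -- after which every packet
	// looks like it is in recovery and window growth never happens.
	idle := now.Sub(cc.lastAckProcessed) // the three-line fix, in essence
	cc.recoveryStart = cc.recoveryStart.Add(idle)
}

func main() {
	now := time.Now()
	cc := &cubicState{
		lastPacketSent:   now.Add(-100 * time.Millisecond), // ~1 RTT ago
		lastAckProcessed: now.Add(-1 * time.Millisecond),
		recoveryStart:    now.Add(-30 * time.Millisecond),
	}
	buggy := cc.recoveryStart.Add(now.Sub(cc.lastPacketSent)) // inflated delta
	onPacketSent(cc, now)                                     // fixed measurement
	fmt.Println("buggy recoveryStart in future?", buggy.After(now))            // true
	fmt.Println("fixed recoveryStart in future?", cc.recoveryStart.After(now)) // false
}
```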
Read →
Latent Space
The End of Finetuning: A Bifurcation Between Mainstream and Frontier AI Engineering
Prompted by OpenAI's deprecation of their finetuning APIs, Latent Space's AINews examines what has become a visible bifurcation in the AI engineering industry. For the majority of practitioners, finetuning has been quietly superseded by long-context prompting and retrieval-augmented generation — approaches that are cheaper, faster to iterate on, and require no infrastructure to maintain model weights. OpenAI's decision to sunset its finetuning APIs is both a reflection of this trend and an acceleration of it, removing a reason to invest in the workflow at all. At the same time, the analysis makes clear that frontier companies such as Cursor and Cognition have moved in the opposite direction, significantly increasing their investment in RLFT (reinforcement-learning finetuning) on open-weight models rather than relying on API-based finetuning. The divergence makes sense when you look at what each approach optimizes: for most applications, long prompts and good retrieval get 80% of the way to customized behavior without operational complexity; for systems where model behavior needs to be precisely tuned for a specific agentic workflow or coding-assistant use case, RLFT on open models provides the control and cost structure that API finetuning never did. The piece also notes a broader trend: the tools that appeared important for AI engineering in 2024–2025 are being triaged by the market, with Sora an earlier casualty and finetuning APIs now following. The open question is whether this consolidation, toward long prompts and context injection as the dominant paradigm, reflects genuine technical convergence or a temporary equilibrium that frontier RLFT work will eventually disrupt.
Read →
Martin Fowler
What Is Code? Programming as Conceptual Modeling in the Age of LLMs
Unmesh Joshi, writing on Martin Fowler's site, makes a case that code has always served two distinct purposes simultaneously: instructions to a machine and a conceptual model of the problem domain. The article's argument becomes most interesting when it reaches the present: as LLMs increasingly generate the syntactic layer of code, the activity that remains distinctively human — and that determines the quality of the eventual machine instructions — is the development of a shared vocabulary that accurately represents the problem domain. Well-named abstractions, precise domain concepts, and a coherent conceptual model are not just aesthetically pleasing; they are the context in which LLM-generated code either becomes correct or accumulates subtle misalignment. The piece positions programming languages less as tools for instructing machines and more as thinking tools: they force developers to make their mental models precise enough to be executable, and that precision is the irreplaceable contribution. The argument has direct implications for how teams should approach AI-assisted development: the investment in getting domain vocabulary right, building shared conceptual models, and reviewing code for conceptual coherence rather than just syntactic correctness becomes more valuable as code generation becomes cheaper, not less. The article is more philosophical than technical, but it addresses a question that has become increasingly live — 'what is a programmer actually doing?' — with more precision than most takes that frame the answer purely in terms of prompt engineering or output review.
Read →
Trail of Bits
Trail of Bits Forks the Go Toolchain to Bring AFL++-Grade Fuzzing to Go
Trail of Bits released gosentry, a fork of the Go toolchain that replaces Go's native fuzzing engine with LibAFL while preserving the standard testing.F harness interface — meaning existing Go fuzz tests run under gosentry without modification. The gap gosentry addresses is real: Go's native fuzzer has lagged the Rust/C/C++ fuzzing ecosystem significantly, lacking support for grammar-based fuzzing (Nautilus), struct-aware fuzzing over composite types, and detection of several important bug classes including integer overflows, goroutine leaks, data races, and execution timeouts. Gosentry adds all of these, plus coverage report generation from existing campaigns and configurable crash-on-log-critical behavior for codebases that log errors rather than panic. The real-world validation is compelling: Trail of Bits ran gosentry on blockchain infrastructure targets and disclosed a set of protocol mismatches and state inconsistencies in Optimism's kona-protocol and op-revm — classes of bugs that grammar-based differential fuzzing is well-suited to find and that the native Go fuzzer would have struggled to reach. For Go teams working on protocol implementations, parsers, or any code where input structure matters, gosentry offers a substantially stronger fuzzing capability without requiring harness rewrites. The project is available on GitHub and integrates into existing CI pipelines through the same interface as go test -fuzz. Trail of Bits' decision to fork the toolchain rather than build a separate framework reflects a deliberate ergonomics choice — reducing the activation energy for adoption by making the upgrade path as frictionless as possible for teams with existing fuzz coverage.
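For concreteness, this is the harness shape in question: a standard testing.F fuzz test, which the post says runs unchanged under either the native engine or gosentry. The parseMessage target below is a stand-in invented for illustration.

```go
// parser_test.go -- runs with `go test -fuzz=FuzzParse` under the native
// toolchain, and per the announcement, unchanged under gosentry.
package parser

import (
	"encoding/json"
	"testing"
)

// message and parseMessage are a hypothetical fuzz target; the point is
// the testing.F harness shape, not the parser itself.
type message struct {
	Op string `json:"op"`
}

func parseMessage(data []byte) (*message, error) {
	var m message
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func FuzzParse(f *testing.F) {
	f.Add([]byte(`{"op":"ping"}`)) // seed corpus entry
	f.Fuzz(func(t *testing.T, data []byte) {
		// Rejecting malformed input is fine; panics, data races, and
		// (under gosentry) overflows, leaks, or timeouts are findings.
		if _, err := parseMessage(data); err != nil {
			return
		}
	})
}
```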
Read →