Sunday, 12 April 2026

Small AI models match frontier models in cybersecurity research, Berkeley exposes systematic AI benchmark manipulation, and Cirrus Labs joins OpenAI's agent infrastructure team

Today's Lead

AISLE Blog

Small AI Models Match Frontier Models in Cybersecurity — The Moat Is the System, Not the Model

AISLE's research challenges the assumption that advanced AI cybersecurity requires expensive frontier models, demonstrating that small, open-weights models can detect critical vulnerabilities just as effectively and at a fraction of the cost. The study introduces the "jagged frontier" concept, showing that AI security performance is unpredictable across different vulnerability types, with capability gains not scaling smoothly with model size. A 3.6-billion-parameter open model detected a critical FreeBSD buffer overflow for just $0.11 per million tokens — the same vulnerability that Mythos, a much larger and more expensive system, had found. The implication is sharp: the competitive moat in AI cybersecurity is not exclusive model access, but superior system orchestration, integrated security expertise, and strong relationships with open-source maintainers.

Read →

Also today

RDI Berkeley

Berkeley Researchers Show Eight Major AI Agent Benchmarks Can Be Completely Gamed

Researchers at UC Berkeley have revealed that eight major AI agent benchmarks — including SWE-bench, WebArena, and OSWorld — can be bypassed through simple exploits that achieve near-perfect scores without solving actual tasks. The team identified seven recurring vulnerability patterns: inadequate isolation between the agent and evaluator, publicly accessible ground-truth answers, and unsafe evaluation logic that can be exploited by a sufficiently motivated agent. The findings carry serious implications for investment decisions and published AI capability claims, as current leaderboard rankings may be measuring an AI's ability to find benchmark loopholes rather than genuine task performance. As models grow more capable, they may independently discover and exploit these reward-hacking strategies, making the urgency of benchmark reform acute.

Read →

Cirrus Labs

Cirrus Labs Joins OpenAI to Build Agent Infrastructure

Cirrus Labs, a bootstrapped DevOps tooling company founded in 2017, has been acquired by OpenAI and will join its Agent Infrastructure team. Over nine years, Cirrus Labs built a reputation in the developer community for innovative CI/CD solutions including the first SaaS CI platform supporting Linux, Windows, and macOS simultaneously, and Tart, a widely-used virtualisation solution for Apple Silicon. Founder Fedor Korotkov framed the move as a natural evolution of the company's mission — from helping engineers build software efficiently to helping AI agents do the same. The acquisition underscores OpenAI's strategic focus on agentic infrastructure: robust sandboxed execution environments, build systems, and virtualisation are increasingly critical components as AI agents move from demos to production workflows.

Read →

The Register

South Korea Mandates Universal Basic Mobile Data Access After Carrier Security Breaches

South Korea's Ministry of Science has announced a universal basic data scheme requiring SK Telecom, KT, and LG Uplus to provide all 7+ million subscribers with unlimited downloads at 400 kbps after exhausting their standard allowances, alongside subsidised 5G plans under ₩20,000 (roughly $15) and expanded benefits for seniors. The initiative treats digital connectivity as a public utility rather than a commercial good. Notably, the programme arrives in the wake of catastrophic security failures across all three major carriers — including SK Telecom's massive subscriber data leak, a 3TB breach at LG Uplus, and KT's femtocell vulnerabilities — positioning the mandate simultaneously as a public good and a trust-rebuilding mechanism for the telecommunications sector.

Read →

Zenodo

Atomic-Scale Memory on Fluorographane Achieves 447 TB/cm² at Zero Retention Energy

Researchers have demonstrated a non-volatile memory technology using single-layer fluorographane — a fluorinated graphene derivative — that achieves storage densities of 447 terabytes per square centimetre while requiring zero energy to retain data. The innovation exploits the bistable orientation of fluorine atoms bonded to a graphene lattice, creating thermally and quantum-mechanically stable memory states: theoretical bit-flip rates are so low (10⁻⁶⁵ and 10⁻⁷⁶ per second respectively) that spontaneous data loss is effectively eliminated. Prototype measurements using scanning-probe technology demonstrate areal densities five orders of magnitude beyond existing storage technologies, with a roadmap toward full-scale implementations capable of 25 petabytes per second throughput. The research positions this approach as a potential long-term solution to the storage bottleneck facing AI training infrastructure.

Read →

SQLite

SQLite 3.53.0: ALTER TABLE Constraints, New JSON Functions, and a CLI Overhaul

SQLite 3.53.0 delivers a notable collection of developer-facing improvements, headlined by ALTER TABLE support for adding and removing NOT NULL and CHECK constraints — a long-requested capability that previously required awkward table-rebuild workarounds. The release also adds json_array_insert() and its jsonb equivalent, a fix for a critical WAL-mode corruption bug, and a new REINDEX EXPRESSIONS statement for self-healing stale expression indexes. The query planner receives performance improvements for EXCEPT, INTERSECT, and UNION operations and multi-way joins, while the CLI gets a comprehensive overhaul including improved result formatting and better Unicode support. Note: SQLite 3.52.0 was withdrawn, making this an especially large accumulated release.

Read →

Vercel Blog

Vercel Made Turborepo 96% Faster in Eight Days Using AI Agents, Sandboxes, and Boring Engineering

Vercel reduced Turborepo's task graph computation time from 8.1 seconds to 716 milliseconds on a 1,000-package monorepo — a 91% improvement — through eight days of AI-assisted performance work. The process combined unattended overnight coding agents (which produced some genuine wins, but missed obvious benchmarking opportunities), human-led profiling with flame graphs, and a key insight: converting performance profiles from Chrome Trace JSON to Markdown dramatically improved AI agent suggestion quality by making the data greppable and human-readable simultaneously. Vercel Sandboxes provided reproducible benchmarking environments free of background system noise, which proved essential when gains dropped below 2% and variance became the dominant signal. The result illustrates a template for human-AI collaboration on performance engineering: agents for breadth and pattern-matching, humans for judgment on what to pursue and when to change strategy.

Read →

Google Security Blog

Google Brings Rust to Pixel 10's Cellular Modem with Memory-Safe DNS Parser

Google is integrating a memory-safe Rust DNS parser (hickory-proto) into Pixel 10's cellular modem firmware, marking the first deployment of a memory-safe language in Pixel baseband code. The decision targets DNS specifically because it processes untrusted network data inside a complex binary protocol — a profile historically associated with memory-safety vulnerabilities — within a modem that contains tens of megabytes of executable code and has a privileged position in the device's trust model. Integration required non-trivial embedded systems work: adding no_std support to the library, implementing custom memory allocation via FFI, and resolving compiler optimisation conflicts, adding 371KB to the final firmware image. Google frames this as a foundational step toward broader adoption of memory-safe languages across security-critical firmware, following a similar trajectory to its successful Rust integration in the Android kernel.

Read →

aphyr.com

The Accountability Void: Why AI Customer Service Systems Are Designed to Lie Without Consequences

Kyle Kingsbury (aphyr) argues that companies are deploying LLMs in customer-facing roles not because they are reliable, but because they are cheap and diffuse accountability. The core structural problem is that an AI system can produce a plausible-sounding falsehood — a non-existent return policy, a fabricated coverage clause — and no single party is clearly responsible: not the vendor, not the operator, not the model provider. As AI agents gain purchasing and negotiating autonomy, Kingsbury predicts a vendor arms race of manipulation tactics analogous to SEO spam, systematically degrading the experience for ordinary users while wealthier customers access human support through premium tiers. The post argues that the real cost of AI deployment in commerce is being socialised onto users who have no recourse.

Read →

brennan.day

The End of Eleventy: A Case Study in Open Source Monetisation Gone Wrong

Eleventy, the widely-used static site generator created by Zach Leatherman, has been rebranded as "Build Awesome" by Font Awesome following a Kickstarter campaign that hit its $40,000 goal almost immediately. The rebrand introduces paid tiers and premium collaborative editing features, pivoting the product toward non-technical users who want website builders rather than the developer-focused SSG community that built Eleventy's ecosystem. Brennan Dugan argues this replicates the failed trajectories of Gatsby and Stackbit — both of which attempted to commercialise SSG tooling through premium feature extraction before eventually requiring acquisition or shutdown. The piece surfaces a recurring tension in open-source sustainability: monetisation strategies that make sense on paper often fail because they misalign with the values and workflows of the communities that made the tool worth monetising in the first place.

Read →