Tuesday, 24 March 2026

Streaming experts bring 400B LLMs to iPhones, GPT-5.4 Pro solves a frontier math problem, and CanisterWorm targets Iran through supply chain attacks

Today's Lead

Simon Willison

Streaming Experts: Running Massive AI Models on Limited Hardware

Simon Willison discusses 'streaming experts,' a technique that allows enormous language models to run on consumer hardware by dynamically loading model weights from SSD rather than keeping them entirely in RAM. The approach enables a 397B-parameter Qwen model to run on an iPhone 17 Pro at 0.6 tokens/second, and a trillion-parameter Kimi K2.5 model on a MacBook Pro with 96GB RAM. Researchers are running autoresearch loops to find further optimizations, which suggests the technique still has headroom for local AI inference.
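For readers who want the idea in code, here is a toy sketch of loading weights on demand. It is not the implementation Willison describes: the `ExpertStore` class, the LRU eviction policy, the file layout, and the tiny 4x4 "experts" are all assumptions made for illustration. It assumes a router has already picked a few experts per token, so only those experts need to be in RAM at once.

```python
import os
import tempfile
from array import array
from collections import OrderedDict


class ExpertStore:
    """Hypothetical store: at most `capacity` experts in RAM, the rest on disk."""

    def __init__(self, directory, dim, capacity):
        self.directory = directory
        self.dim = dim
        self.capacity = capacity
        self.cache = OrderedDict()  # expert_id -> flat weights, oldest first
        self.disk_reads = 0

    def _path(self, expert_id):
        return os.path.join(self.directory, f"expert_{expert_id}.bin")

    def save(self, expert_id, weights):
        with open(self._path(expert_id), "wb") as f:
            array("d", weights).tofile(f)

    def load(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        weights = array("d")
        with open(self._path(expert_id), "rb") as f:
            weights.fromfile(f, self.dim * self.dim)
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return weights


def moe_forward(store, x, routed):
    """Sum gate * (W_expert @ x) over the routed (expert_id, gate) pairs."""
    dim = store.dim
    out = [0.0] * dim
    for expert_id, gate in routed:
        w = store.load(expert_id)
        for row in range(dim):
            acc = 0.0
            for col in range(dim):
                acc += w[row * dim + col] * x[col]
            out[row] += gate * acc
    return out


def demo():
    dim, num_experts = 4, 8
    with tempfile.TemporaryDirectory() as tmp:
        store = ExpertStore(tmp, dim, capacity=2)
        for e in range(num_experts):
            # Toy weights: expert e is (e + 1) times the identity matrix.
            store.save(e, [(e + 1.0) if r == c else 0.0
                           for r in range(dim) for c in range(dim)])
        x = [1.0, 2.0, 3.0, 4.0]
        out = moe_forward(store, x, [(0, 0.5), (3, 0.5)])  # two disk reads
        moe_forward(store, x, [(0, 0.5), (3, 0.5)])        # served from cache
        moe_forward(store, x, [(5, 1.0)])                  # third read, evicts expert 0
        return out, store.disk_reads, sorted(store.cache)
```

The trade-off the sketch makes visible is the one in the story: RAM use is bounded by `capacity`, and every cache miss costs a disk read, which is why throughput on a phone ends up well under one token per second.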

Read →

Also today

Epoch AI

GPT-5.4 Pro Solves a Frontier Math Open Problem

GPT-5.4 Pro has solved a frontier mathematics open problem from the FrontierMath benchmark concerning Ramsey theory and hypergraphs. The problem required constructing hypergraphs with no partitions larger than n and determining the maximum number of vertices achievable. GPT-5.4 Pro's solution improved existing lower bounds through a novel construction that matched the known upper bound — a rare feat in Ramsey theory — and was independently verified by the problem's author and other advanced AI models.

Read →

Krebs on Security

CanisterWorm Springs Wiper Attack Targeting Iran

The cybercrime group TeamPCP deployed CanisterWorm, a self-propagating worm that spread through two compromised open-source security tools, the Trivy and KICS vulnerability scanners. The malware activates a destructive data-wiping payload when it detects Iran's timezone or the Farsi language and, where a Kubernetes cluster is present, destroys data across the entire cluster. The group orchestrates its infrastructure via blockchain-based ICP canisters, making it resistant to takedown, and has been conducting credential theft and extortion campaigns against cloud environments since December 2025.

Read →

RocksDB Blog

RocksDB Discovers AMD CPU Bug Through Unit Testing

RocksDB's random number generation unit tests unexpectedly uncovered a hardware bug in newer AMD processors: the RDSEED instruction was returning 0 while falsely reporting success far more often than statistically possible, under specific conditions involving multiple cores and heavy memory load. AMD acknowledged the bug and released a microcode fix, while Meta developed a Linux kernel patch as a workaround. The discovery underscores the value of rigorous low-level testing — even CPU instruction guarantees can have bugs.
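A rough sketch of the kind of statistical check that can catch this class of bug follows. It is not RocksDB's actual test: the function names, the 1e-9 threshold, and the simulated faulty source are assumptions for illustration. The idea is that a uniform 64-bit source returns 0 with probability 2^-64 per draw, so even one zero in a few thousand draws is extremely unlikely.

```python
import math
import random


def zeros_are_plausible(samples, bits=64, alpha=1e-9):
    """Return False when the number of zero values is wildly unlikely
    for a uniform `bits`-bit source."""
    n = len(samples)
    k = sum(1 for s in samples if s == 0)
    if k == 0:
        return True
    # Union bound: P(at least k zeros) <= C(n, k) * p**k, with p = 2**-bits.
    log_bound = math.log(math.comb(n, k)) - k * bits * math.log(2)
    return log_bound > math.log(alpha)


def healthy_source(rng):
    return rng.getrandbits(64)


def faulty_source(rng, zero_rate=0.01):
    # Hypothetical stand-in for the reported failure mode:
    # the call "succeeds" but sometimes hands back 0.
    if rng.random() < zero_rate:
        return 0
    return rng.getrandbits(64)


rng = random.Random(1)  # fixed seed so the demo is repeatable
healthy = [healthy_source(rng) for _ in range(10_000)]
faulty = [faulty_source(rng) for _ in range(10_000)]
print(zeros_are_plausible(healthy), zeros_are_plausible(faulty))
```

The check needs no knowledge of the hardware. It only compares what came back against what a correct generator could plausibly produce, which is roughly why an ordinary unit test was able to surface a CPU-level fault.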

Read →

Cloudflare Blog

Cloudflare Launches Gen 13 Servers with 2x Throughput via Hardware-Software Co-Design

Cloudflare announced its Gen 13 server platform, which pairs AMD EPYC Turin processors (192 cores) with FL2, a complete Rust rewrite of its request handling layer. Earlier benchmarks showed that FL1 (NGINX/LuaJIT) struggled badly with Turin's reduced per-core L3 cache, but FL2's leaner memory access patterns eliminated the bottleneck. The result: 2x the compute throughput of Gen 12, 50% better performance-per-watt, and 60% higher rack-level throughput, all without regressing latency SLAs.

Read →

The Register

GitHub Struggles to Maintain Three Nines Availability

GitHub has been failing to meet its own 99.9% uptime SLA: outages in February 2026 affected Actions, pull requests, and Copilot, and in some cases incident notifications lagged by as much as 50 minutes. Notably, GitHub's uptime dropped below 90% at one point in 2025. As a critical chokepoint in the global software supply chain, GitHub's reliability struggles have an outsized impact on development teams worldwide.
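To put "three nines" in concrete terms, the arithmetic below converts an availability percentage into a downtime budget. The 30-day window is an assumption for illustration, since the summary does not say how the SLA period is measured, and `downtime_budget_minutes` is a hypothetical helper, not anything GitHub publishes.

```python
from fractions import Fraction


def downtime_budget_minutes(availability_percent, period_days):
    """Minutes of downtime allowed in `period_days` at the given availability.

    `availability_percent` is passed as a string such as "99.9" so the
    arithmetic stays exact until the final conversion to float.
    """
    allowed_fraction = (100 - Fraction(availability_percent)) / 100
    return float(allowed_fraction * period_days * 24 * 60)


print(downtime_budget_minutes("99.9", 30))  # 43.2 minutes
print(downtime_budget_minutes("90", 30))    # 4320.0 minutes, i.e. 72 hours
```

Under that assumption, 99.9% leaves about 43 minutes of downtime per 30 days, while 90% would allow three full days.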

Read →

GitHub

LocalStack Shifts to Commercial Model, Archives Open-Source Repository

LocalStack has archived its open-source GitHub repository and consolidated development into a single commercial image. The project now directs users to a commercial offering with a free Hobby tier for non-commercial use. This mirrors a broader trend of developer infrastructure tools transitioning from fully open-source to commercial SaaS models as their operational complexity grows.

Read →

GitHub Blog

GitHub Expands Security Coverage with AI-Powered Detections

GitHub Code Security is introducing AI-powered vulnerability detections to extend coverage beyond CodeQL into ecosystems like Shell/Bash, Dockerfiles, Terraform (HCL), and PHP. The system surfaces findings directly in pull requests alongside automated Copilot Autofix suggestions. In internal testing, over 170,000 findings were processed in 30 days with 80% positive developer feedback, and Autofix has already resolved more than 460,000 security alerts — cutting average resolution time roughly in half.

Read →

TechCrunch

Cyberattack on Vehicle Breathalyzer Company Leaves Drivers Stranded

A cyberattack targeting a U.S. vehicle breathalyzer (ignition interlock) manufacturer compromised its backend systems, disabling the devices in vehicles across multiple states and leaving thousands of drivers unable to start their cars. The incident highlights the physical-world consequences of attacks on connected automotive supply chains, where a single vendor compromise can directly strand users.

Read →

Skylar B. Payne

If DSPy Is So Great, Why Isn't Anyone Using It?

This article diagnoses DSPy's low adoption despite solving real AI engineering problems: teams simply don't realize they need it until they've already built a worse version themselves. The author maps out seven maturity stages in AI system engineering — from raw API calls to optimized prompt pipelines — and argues that DSPy's core abstractions (Signatures, Modules, Optimizers) are what teams inevitably converge on. The takeaway: adopting DSPy early means deliberate architecture rather than accreted complexity.

Read →