Saturday, 09 May 2026
ChatGPT 5.5 Pro solves a Fields Medalist's open research problems in two hours, Google quietly revives Web Environment Integrity as 'Cloud Fraud Defence', and a new Linux kernel vulnerability allows unprivileged users to corrupt arbitrary cached files
Today's Lead
Gowers's Weblog
A Fields Medalist Let ChatGPT 5.5 Pro Loose on His Open Research Problems. It Solved Them.
Tim Gowers — a Fields Medal-winning mathematician at Cambridge and one of the most prominent critics-turned-cautious-optimists in the AI-and-mathematics debate — posted a detailed account of ChatGPT 5.5 Pro solving open problems from his current research in additive combinatorics. The core result: in roughly two hours of back-and-forth, the model improved a bound he had been working on from exponential to polynomial dependence, demonstrating what Gowers describes as genuine innovation rather than recombination of known results. He is careful to hedge — he has not fully verified the proof, and the back-and-forth involved real human steering — but his reading is that the model was constructing original mathematical insight, not retrieving a known technique. The implications Gowers draws are concrete and uncomfortable: the traditional 'gentle problems' that PhD supervisors assign to students as onboarding exercises are no longer viable as training grounds. If a model can solve them in two hours, they cannot function as the low-stakes environments for building mathematical intuition that they were designed to be. His tentative conclusion is that the future of mathematics involves collaborative problem-solving with AI as a genuine research partner — but that this future also disrupts the graduate education pipeline in ways nobody has worked out how to address. The broader significance is the source: Gowers is not a credulous observer, and his willingness to describe this as research-level capability rather than surface-level pattern matching is the kind of expert attestation that changes how the mathematical community will receive claims about AI capability in the field.
Also today
Private Captcha
Google Cloud Fraud Defence Is Just Web Environment Integrity Repackaged
In 2023, Google proposed Web Environment Integrity (WEI) — a browser API that would require websites to verify that users were running 'approved' browser environments before granting access. The proposal was killed after swift, unified backlash from browser vendors, privacy advocates, and developers who recognized it as a mechanism that would let platform owners gatekeep access based on device attestation, effectively ending the open web's guarantee that any compliant client could reach any server. Private Captcha's analysis shows that Google Cloud Fraud Defence, launched quietly this year, uses device attestation technology that is mechanically identical to WEI: it requires users to scan a QR code with a Google Play Services-enabled device to prove they are on a 'trusted' platform, generating a certificate that the website can verify with Google's API. The security case for the approach is weak on its own terms — the mechanism is vulnerable to straightforward workarounds like pointing a compliant phone's camera at a screen showing the QR code, or bulk-purchasing certified devices for bot operations. The privacy case is worse: the hardware identifiers involved are persistent, enabling cross-site correlation of which certified devices access which services. The practical impact is already visible: de-Googled Android users running distributions like GrapheneOS or CalyxOS without Google Play Services are being blocked from services that adopt Cloud Fraud Defence, including Google's own reCAPTCHA — which explains this week's related HN story (878 points) about reCAPTCHA breaking on those devices. What WEI failed to achieve through a standards process, Google is achieving incrementally through its cloud services business.
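The gatekeeping mechanics are easy to make concrete from the website's side. A minimal sketch in Go, with the verifier endpoint, response shape, and token header all hypothetical stand-ins rather than the actual Cloud Fraud Defence API: the site forwards an attestation token to the platform owner and refuses service unless the verdict says "certified device". Any client that cannot mint a token, a de-Googled phone included, fails closed regardless of how it actually behaves.

```go
// Sketch of the server-side gate implied by device attestation. The
// verifier URL, response field, and token header are hypothetical
// stand-ins, not the real Cloud Fraud Defence API.
package main

import (
	"encoding/json"
	"net/http"
	"net/url"
)

const verifierURL = "https://attestation.example.googleapis.com/v1/verify" // hypothetical

type verdict struct {
	CertifiedDevice bool `json:"certifiedDevice"` // assumed response field
}

// requireAttestation gates every request on a platform-owner verdict.
func requireAttestation(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := r.Header.Get("X-Attestation-Token") // assumed transport
		if token == "" {
			http.Error(w, "no attestation token", http.StatusForbidden)
			return
		}
		// Ask the platform owner whether the token came from a
		// certified device. Clients that cannot mint a token (no Play
		// Services) never get past this line, whatever their behavior.
		resp, err := http.Post(verifierURL+"?token="+url.QueryEscape(token),
			"application/json", nil)
		if err != nil {
			http.Error(w, "verifier unreachable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		var v verdict
		if json.NewDecoder(resp.Body).Decode(&v) != nil || !v.CertifiedDevice {
			http.Error(w, "uncertified device", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	http.ListenAndServe(":8080", requireAttestation(mux))
}
```

Note where the control sits: the site never learns why a device fails verification, only that the platform owner said no.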
Read →
Jeff Kaufman
AI Is Breaking Two Vulnerability Cultures Simultaneously
Jeff Kaufman's post (310 points on HN) argues that AI is simultaneously disrupting both of the dominant cultural frameworks security researchers use for managing vulnerability disclosure — and that the disruption runs in opposite directions. The first culture is coordinated disclosure: the standard practice of giving vendors a 90-day embargo window before publishing a vulnerability, so they have time to patch before attackers can exploit the public information. This 90-day window was calibrated around the assumption that discovering the same vulnerability independently requires significant human effort. AI collapses that assumption: if a model can be prompted to find the same class of bug in a codebase within hours of a CVE being published, the practical embargo is zero — the disclosure itself is an attack signal, not advance warning. The second culture is the informal 'vulnerability broker' market, where researchers sell unreported vulnerabilities to governments, intelligence agencies, and commercial exploit vendors. AI erodes this market from the supply side: if finding vulnerabilities becomes cheap and automatable, the scarcity premium that makes individual exploits worth six or seven figures disappears. Kaufman's proposed solution for the first problem — move to very short or immediate disclosure, on the theory that AI accelerates patching as much as it accelerates exploitation — is underexplored and may be more wishful than realistic for the organizations currently patching on 90-day cycles. But the structural observation is correct and its implications are still being worked out by the security community.
Read →
retr0.zip
CVE-2026-31431: A Logic Flaw That Chains Kernel Subsystems Into a 4-Byte Root Write
CVE-2026-31431, dubbed 'Copy Fail,' is a Linux kernel local privilege escalation that chains together three kernel subsystems — the page cache, scatterlists, and the authencesn AEAD crypto template — to produce a deterministic 4-byte write into cached file data without triggering memory safety checks. The attack works by exploiting a 2017 performance optimization that, under specific conditions, places pages into a scatterlist position they are not supposed to occupy. Memory sanitizers like KASAN do not detect the write because the memory access itself is architecturally valid — the logic error is in which page ends up at that address, not in how the write to that page is performed. The practical exploitation path targets world-readable files: an unprivileged user can corrupt /etc/passwd to insert a new root-privileged account, or corrupt other cached executables and configuration files with the same 4-byte primitive. The fix is a one-line revert of the 2017 optimization. The disclosure write-up is technically dense and worth reading for the subsystem interaction model it demonstrates — the vulnerability isn't in any one component but in a subtle emergent property of how three components interact under a specific sequence of operations, which is exactly the class of bug that static analysis and fuzzing are least likely to find.
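The class of bug deserves a concrete illustration, because it explains the sanitizer blind spot. The following toy Go program (no relation to the actual kernel internals) performs a 4-byte write that every bounds check approves, just as KASAN approves the kernel write; the defect is in which buffer the index selects, not in the access itself:

```go
// Toy illustration only: every access below is in-bounds and passes
// Go's runtime bounds checks, just as the kernel write passes KASAN,
// yet the wrong data gets corrupted.
package main

import "fmt"

func main() {
	// Pretend each slot is a cached page belonging to a different file.
	pages := [][]byte{
		[]byte("page of some scratch file"),
		[]byte("page of /etc/passwd"), // must never be written here
	}

	// The logic error: a miscomputed index selects the wrong page.
	// Correct code would have computed 0.
	idx := 1

	// A perfectly "valid" 4-byte write to the wrong object. No memory
	// sanitizer has anything to complain about.
	copy(pages[idx][:4], "evil")

	fmt.Printf("%s\n", pages[0]) // untouched
	fmt.Printf("%s\n", pages[1]) // "evil of /etc/passwd"
}
```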
Read →
Joachim Schipper
Stop MITM on the First SSH Connection, on Any VPS or Cloud Provider
The standard Trust On First Use (TOFU) approach to SSH host key verification has a well-known gap: the first connection to a newly provisioned VM is inherently unauthenticated. If an attacker can MITM that connection — trivial in a cloud environment where the network path between client and VM is controlled by the provider — they can intercept credentials or inject commands before any trust is established. Joachim Schipper's post proposes a cloud-init-based solution that works across any provider supporting user data. The approach injects a short-lived temporary SSH host key at provision time through cloud-init; the client uses this key exclusively to retrieve and verify the machine's permanent long-term host keys over a channel that is cryptographically bound to the provisioning process, rather than to the network path. The permanent private keys are never exposed through the temporary channel, and the temporary key is rotated out after the bootstrapping exchange. The practical advantage is that it requires no provider-specific tooling — any cloud-init-compatible provider is supported, which covers the major IaaS providers and most VPS offerings. The gap being addressed is real and underappreciated: most developers accept TOFU on first connection and rely on the practical obscurity of being an uninteresting target. In enterprise or sensitive environments, that assumption is worth hardening.
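A minimal sketch of the flow, assuming a provider that passes user data to cloud-init verbatim; the helper names, paths, and the omitted rotation step are illustrative rather than Schipper's exact tooling. The client mints a throwaway ed25519 host key, seeds it through cloud-init's ssh_keys directive, and trusts it for exactly one connection, the one that retrieves the permanent public keys:

```go
// Sketch: pre-seed a temporary SSH host key through cloud-init, then
// use it to fetch the VM's permanent host keys over an authenticated
// channel. The permanent private halves never leave the VM.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// makeTempHostKey generates a throwaway ed25519 keypair on the client.
func makeTempHostKey(dir string) (priv, pub string, err error) {
	keyPath := dir + "/temp_host_key"
	if err := exec.Command("ssh-keygen", "-t", "ed25519", "-N", "", "-q", "-f", keyPath).Run(); err != nil {
		return "", "", err
	}
	p, err := os.ReadFile(keyPath)
	if err != nil {
		return "", "", err
	}
	q, err := os.ReadFile(keyPath + ".pub")
	if err != nil {
		return "", "", err
	}
	return string(p), strings.TrimSpace(string(q)), nil
}

// userData builds cloud-init config that installs the temporary key as
// the VM's sshd host key at first boot (cloud-init's ssh_keys module).
func userData(priv string) string {
	indented := "    " + strings.ReplaceAll(strings.TrimRight(priv, "\n"), "\n", "\n    ")
	return "#cloud-config\nssh_keys:\n  ed25519_private: |\n" + indented + "\n"
}

// fetchPermanentKeys connects trusting only the temporary key and reads
// the permanent public host keys for pinning in known_hosts.
func fetchPermanentKeys(host, pub, dir string) ([]string, error) {
	known := dir + "/known_hosts"
	if err := os.WriteFile(known, []byte(host+" "+pub+"\n"), 0o600); err != nil {
		return nil, err
	}
	out, err := exec.Command("ssh",
		"-o", "UserKnownHostsFile="+known,
		"-o", "StrictHostKeyChecking=yes",
		"root@"+host, "cat /etc/ssh/ssh_host_*_key.pub").Output()
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, l := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		lines = append(lines, host+" "+l)
	}
	return lines, nil
}

func main() {
	dir, err := os.MkdirTemp("", "ssh-bootstrap")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	priv, pub, err := makeTempHostKey(dir)
	if err != nil {
		panic(err)
	}

	// 1. Pass this as user data when provisioning the VM.
	fmt.Println(userData(priv))

	// 2. After boot: lines, _ := fetchPermanentKeys(vmIP, pub, dir),
	//    append to ~/.ssh/known_hosts, then have the VM rotate the
	//    temporary key out (rotation step not shown).
	_ = pub
}
```

The point of the construction is that the one connection TOFU would have left unauthenticated is instead verified against a key the client itself generated and injected.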
Read →
Anthropic
Anthropic: Training Claude on Why Ethics Work, Not Just What's Ethical
Anthropic published research describing a training methodology shift: rather than training Claude to imitate ethical behavior by example, they trained it to understand the underlying reasons why specific behaviors are better than their alternatives — essentially teaching constitutional reasoning rather than behavioral pattern matching. The measurable results are significant: misaligned behavior fell substantially while generalization across novel ethical scenarios improved, using 28× fewer training tokens than direct scenario-specific matching. The mechanism works by emphasizing explanations of ethical principles during training and using diverse data featuring narratives of aligned AI behavior, rather than pairing input scenarios with correct-output labels. The deeper implication is about robustness: a model that has learned 'do X in situation Y' is brittle when Y varies in unforeseen ways; a model that understands why X is better than alternatives can reason about variants of Y it has never seen. This is the same distinction that makes deontological versus consequentialist reasoning behave differently under edge cases — and applying it to model training rather than philosophical debate produces an empirically testable prediction. The research connects to the broader interpretability program at Anthropic: models that understand reasons for their training signal, rather than correlating inputs to outputs, may produce more stable representations of their goals that are detectable through interpretability tools.
Read →
Let's Encrypt
Let's Encrypt Halted All Certificate Issuance for 2.5 Hours During Root Transition
Let's Encrypt stopped issuing certificates for approximately 2.5 hours on May 8 after detecting a potential problem with the cross-signed certificate chain during its ongoing transition from the Generation X root (ISRG Root X1) to the Generation Y root (ISRG Root Y1). The incident was resolved by reverting certificate issuance to the Generation X root, restoring normal operations at 21:03 UTC. Let's Encrypt issues roughly 6–7 million certificates per day, so a 2.5-hour outage represents a meaningful disruption to automated renewal pipelines — particularly for any operator running renewals on short windows or close to expiry. The Generation X to Generation Y root transition is a planned long-term migration driven by the shift toward newer cryptographic primitives; cross-signed chains are how Let's Encrypt maintains compatibility with older clients that trust the legacy root hierarchy while migrating to the new one. The fragility exposed here is in the cross-signing step: a newly issued cross-signed intermediate that fails validation interrupts the entire issuance pipeline, not just the transition-specific path. The incident was detected and resolved within Let's Encrypt's own monitoring rather than via widespread renewal failures, which is the intended operational outcome — but it is a reminder that infrastructure-level PKI transitions carry real outage risk even for well-operated CAs.
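The blast radius follows from how clients validate: the path runs from the leaf through the cross-signed intermediate to whichever root the client trusts, so one malformed intermediate invalidates every leaf beneath it. A sketch of that check with Go's crypto/x509, with the PEM filenames as placeholders:

```go
// Sketch of how a client validates a cross-signed chain: the leaf is
// checked through the intermediate to a trusted root. If the
// cross-signed intermediate is bad, Verify fails for every leaf issued
// under it. PEM filenames are placeholders.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

func load(path string) *x509.Certificate {
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(data)
	if block == nil {
		panic("no PEM block in " + path)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}
	return cert
}

func main() {
	leaf := load("leaf.pem") // the subscriber certificate

	roots := x509.NewCertPool() // what the client already trusts
	roots.AddCert(load("isrg-root-x1.pem"))

	inters := x509.NewCertPool() // served by the site alongside the leaf
	inters.AddCert(load("cross-signed-intermediate.pem"))

	if _, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
	}); err != nil {
		// One bad cross-signed intermediate and every chain through it
		// dies here, which is why issuance had to stop.
		fmt.Println("chain invalid:", err)
		return
	}
	fmt.Println("chain valid")
}
```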
Read →
btxx.org
Serving a Website on a Raspberry Pi Zero Running Entirely in RAM
A practical experiment in running a website on a Raspberry Pi Zero v1.3 — the original 512MB model — using diskless Alpine Linux entirely from RAM. The setup eliminates the SD card as a failure point (a meaningful concern for always-on Pi deployments where SD cards wear out from writes), and uses darkhttpd for web serving and Dropbear for SSH, keeping the total footprint well within the 512MB RAM envelope. Alpine's local backup utility (lbu) handles the diskless configuration, saving the system state to a network location at shutdown rather than to local storage. TLS termination is handled by a separate budget VPS rather than on the Pi itself, with traffic proxied through a socat relay on a single exposed port. The project is partly a technical exercise and partly a commentary on how low the floor is for running infrastructure: a device that costs under $20, consumes under a watt, and fits in a pocket is sufficient to serve a static website to real traffic — the primary constraint is bandwidth, not compute. The approach generalizes to any use case where persistence is managed externally and the device is stateless at the kernel level.
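The relay is the one piece worth sketching. The post uses socat; an equivalent single-port forwarder in Go, with the certificate paths and the Pi's address as placeholders, shows how thin the VPS layer is: terminate TLS, dial the Pi, copy bytes both ways.

```go
// A Go equivalent of the post's socat relay: terminate TLS on the VPS
// and shovel bytes to the Pi over one port. Addresses and certificate
// paths are placeholders.
package main

import (
	"crypto/tls"
	"io"
	"log"
	"net"
)

const piAddr = "10.0.0.2:8080" // the Pi's single exposed port (placeholder)

func main() {
	cert, err := tls.LoadX509KeyPair("fullchain.pem", "privkey.pem")
	if err != nil {
		log.Fatal(err)
	}
	ln, err := tls.Listen("tcp", ":443",
		&tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			pi, err := net.Dial("tcp", piAddr)
			if err != nil {
				return
			}
			defer pi.Close()
			go io.Copy(pi, c) // client -> Pi
			io.Copy(c, pi)    // Pi -> client
		}(client)
	}
}
```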
Read →
poppastring
What We Lost the Last Time Code Got Cheap — and What We're Losing Again
The central argument is a historical analogy: when offshore outsourcing became cheap enough to move code production out of the building in the early 2000s, many organizations learned — slowly and painfully — that the real cost of software was never writing the code, but understanding it. Knowledge of why a system was built a specific way, what constraints shaped its architecture, and what invariants it depends on couldn't be transferred with a spec document. AI-generated code presents the same trap in a more extreme form: the code can be syntactically correct, pass tests, and look reasonable to a reviewer who didn't write it, while being completely devoid of the design intent that makes a system maintainable. The outsourcing analogy breaks at one important point: offshore teams were composed of humans who could ask questions, participate in postmortems, and accumulate institutional knowledge over time. AI-generated code has no such trajectory — each generation is equally disconnected from the why. The author's recommendation is to treat code comprehension tooling as a first-class engineering investment rather than a nice-to-have: documentation tools, code archaeology systems, and anything that encodes rationale alongside implementation. The post resonates with the broader theme visible across this week's HN discussions about AI-generated code: the quality questions are real, but the comprehension and ownership questions may be more consequential.
Read →
Blain Smith
Just Use Go: The Case for Boring Pragmatism in Backend Language Choice
A concise pragmatic argument for Go as the default choice for backend services, grounded in Go's deliberate rejection of features rather than its accumulation of them. The case rests on three pillars: the standard library is comprehensive enough to build production HTTP services without frameworks (net/http, encoding/json, database/sql, and testing are all there); the concurrency model (goroutines at approximately 2KB each, compared to OS threads at ~8MB) makes high-concurrency services tractable without async/await ceremony; and the deployment story is a single statically compiled binary with no runtime dependency. The author's framing is that Go's 'boring' design is a feature rather than a limitation — the language has enough opinion baked in that teams stop relitigating formatting, error handling style, and package structure and start solving the actual problem. The implicit argument against alternatives: JavaScript frameworks add framework-shaped complexity before you've written a line of business logic; Rails and Django are optimized for a specific application shape that diverges quickly from anything non-standard; Rust's correctness guarantees come at an onboarding and compile-time cost that most backend services don't need. Go is conspicuously not the answer to every problem — it is a weak choice for compute-heavy workloads where Rust or C++ matter, and it lacks the expressiveness that makes functional languages elegant for certain domains. But for the median backend service, the 'boring' choice has a genuine engineering case.
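The standard-library claim is easy to demonstrate. A complete JSON endpoint with routing, encoding, and the server itself, nothing imported from outside the stdlib (the handler and payload are illustrative; the method-qualified route pattern requires Go 1.22+):

```go
// A complete service with nothing outside the standard library:
// routing, JSON encoding, and the server itself.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type health struct {
	Status string `json:"status"`
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(health{Status: "ok"})
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```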
Read →