Thursday, 26 March 2026

ARC-AGI-3 launches interactive AGI benchmarks, GitHub Copilot begins training on Free and Pro user code, and electric motorcycles are riddled with critical security vulnerabilities

Today's Lead

ARC Prize

ARC-AGI-3 Introduces Interactive Reasoning Benchmark for AI Agents

ARC-AGI-3 introduces an interactive reasoning benchmark designed to evaluate how well AI agents learn and adapt in novel environments — shifting from static puzzles to dynamic challenges. It measures skill acquisition, long-horizon planning with limited feedback, and experience-driven adaptation. Unlike previous ARC benchmarks that tested pattern recognition on fixed grids, ARC-AGI-3 focuses on agents that can explore unknown environments, dynamically build world models, and acquire goals on the fly — setting a new bar for what it means to progress toward artificial general intelligence.

Read →

Also today

GitHub Blog

GitHub Copilot Will Use Free and Pro User Code for AI Training Starting April 24

Starting April 24, 2026, GitHub will use interaction data — inputs, outputs, code snippets, and surrounding context — from Copilot Free, Pro, and Pro+ users to train its AI models. Copilot Business and Enterprise users are not affected. Users who previously opted out retain their preference; others can opt out via the Privacy settings. GitHub cites meaningful improvements from using Microsoft employee data (e.g., increased code suggestion acceptance rates) as justification for the policy change. The data may be shared with GitHub affiliates including Microsoft, but not with third-party AI model providers.

Read →

Persephone Karnstein (Personal Blog)

Zero Days: Electric Motorcycles Are a Security Nightmare

A security researcher exposed critical vulnerabilities in Zero Motorcycles' connected systems, including hardcoded SHA-512 hashes for firmware authentication, credentials exposed in the mobile app, and a fully unauthenticated CAN bus accessible via the OBD-II port. Practical exploits demonstrated included Bluetooth pairing attacks and arbitrary firmware injection into a safety-critical vehicle. Remediation began only after 13+ months and escalation through CERT/CC. The findings highlight fundamental cryptographic failures across an entire product line and serve as a warning for the broader connected-vehicle industry.

Read →

Mario Zechner (Personal Blog)

Thoughts on Slowing Down When Using AI Agents

Mario Zechner — creator of the Pi agent framework used by OpenClaw — argues that AI coding agents produce brittle and unmaintainable systems when deployed without sufficient human oversight. The core problem: agents repeat mistakes without learning, lack holistic architectural understanding, and accumulate technical debt faster than humans can review. Where a human developer might introduce a handful of compounding mistakes per day, an orchestrated agent army has no natural bottleneck. Zechner advocates deliberately slowing development by hand-writing anything that defines system architecture, setting daily limits on agent-generated code, and preserving the human loop — not to oppose AI, but to stay in control of what's actually being built.

Read →

David Hühnlein (bugs.xdavidhu.me)

Running Tesla Model 3's Computer on a Desk Using Salvaged Parts

A security researcher built a working Tesla Model 3 bench setup for ~$400–500 using a salvaged MCU (media control unit), Autopilot computer, and touchscreen sourced from eBay crash listings. Once assembled and powered, the system booted the Tesla OS and exposed SSH and diagnostic API services. The biggest practical hurdle was sourcing a specialized Rosenberger connector, which required purchasing an entire dashboard wiring harness. The setup serves as an affordable, isolated environment for vehicle software security research and bug bounty hunting.

Read →

Trail of Bits

Trail of Bits Releases Dimensional Analysis Plugin for Claude Code

Trail of Bits released a Claude Code plugin that uses LLMs to annotate codebases with dimensional types and detect unit mismatches in arithmetic-heavy projects. Rather than asking the model to find bugs directly, it uses the LLM as an annotation engine — labeling values with their physical dimensions (e.g., D18{price}, D18{1}) and then flagging mismatches mechanically. In testing against real audit findings, the plugin achieved 93% recall compared to 50% for baseline prompts. It's particularly useful for smart contracts and blockchain systems where arithmetic errors carry critical security consequences. The plugin is available via the Claude plugin marketplace.
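
The annotate-then-check split is the interesting part: once every value carries a dimension label, mismatch detection is ordinary type checking with no LLM in the loop. A minimal sketch of that idea (the `Dim` class and labels below are hypothetical illustrations, not the plugin's actual representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dim:
    """A value tagged with a dimension label, e.g. 'price' or '1' (dimensionless)."""
    value: int
    dim: str

    def __add__(self, other: "Dim") -> "Dim":
        # Addition is only meaningful between values of the same dimension.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} + {other.dim}")
        return Dim(self.value + other.value, self.dim)

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying by a dimensionless factor preserves the dimension;
        # anything else yields a compound dimension.
        if other.dim == "1":
            return Dim(self.value * other.value, self.dim)
        if self.dim == "1":
            return Dim(self.value * other.value, other.dim)
        return Dim(self.value * other.value, f"{self.dim}*{other.dim}")

price = Dim(3000, "price")
qty = Dim(2, "1")
total = price * qty      # dimension stays 'price'

try:
    price + qty          # adding a price to a bare count is flagged mechanically
except TypeError as e:
    print(e)
```

In the plugin's design, the LLM's only job is assigning labels like these; the flagging step is deterministic, which is why recall can be measured cleanly against audit findings.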

Read →

Electrek

Sodium-Ion EV Battery Delivers 11-Minute Charging and 450 km Range

BAIC's Aurora sodium-ion battery achieves over 170 Wh/kg energy density and 4C charging — enough to fully recharge in approximately 11 minutes — while delivering a 450 km CLTC driving range. The battery also performs reliably in extreme temperatures, maintaining over 92% capacity at -20°C, making it viable across climates. Sodium-ion avoids lithium and cobalt, which are both expensive and geopolitically constrained. Global sodium-ion shipments reached 9 GWh in 2025 (up 150% year-over-year), with industry projections exceeding 1,000 GWh within four years.
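
As a rough sanity check on the charging figure (assuming the usual convention that an nC rate moves one full capacity per 1/n hours): a constant 4C rate corresponds to about 15 minutes for a full charge, so the quoted ~11 minutes likely reflects a higher peak rate or a partial state-of-charge window.

```python
# Back-of-envelope C-rate arithmetic (assumption: C-rate = full capacity per hour).
def charge_minutes(c_rate: float) -> float:
    """Minutes to transfer one full capacity's worth of charge at a constant C-rate."""
    return 60.0 / c_rate

print(charge_minutes(4.0))  # a constant 4C rate implies 15.0 minutes for 0-100%
```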

Read →

Drew DeVault (drewdevault.com)

Drew DeVault Forks Vim to Preserve a Version Free of AI-Generated Code

Drew DeVault announced Vim Classic, a fork of Vim based on version 8.2.0148, created to maintain a version of the editor untouched by AI-assisted contributions. Citing concerns about generative AI's environmental impact, labor exploitation, and role in spreading misinformation, DeVault decided to fork rather than accept AI-generated patches upstream. The fork welcomes bug fixes and security updates but keeps changes minimal and deliberate. The announcement sparked significant discussion in the community about open source governance and contributor screening in the age of AI-generated code.

Read →

ngrok Blog

Quantization from the Ground Up: Running LLMs on a Laptop

This technical explainer walks through quantization — the technique that compresses large language models by converting 32-bit floating-point weights to lower-precision integers. Using Qwen 3.5 9B as a reference, the article shows that 8-bit quantization cuts model size by ~66% with negligible quality loss and faster inference, while 4-bit quantization reduces it further at a 5–10% accuracy trade-off but enables running capable models on personal laptops. The piece covers the mechanics of scale factors, zero-point calibration, and per-channel quantization, making it a useful primer for developers who want to self-host models without relying on cloud inference.
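
The scale-factor and zero-point mechanics can be sketched in a few lines. This is a generic asymmetric (affine) quantizer under textbook assumptions, not the article's exact code; per-channel quantization applies the same math with a separate scale and zero point per output channel:

```python
import numpy as np

def quantize(w: np.ndarray, num_bits: int = 8):
    """Affine quantization: map float weights onto unsigned integers using a
    scale factor and zero point derived from the tensor's min/max range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = round(qmin - w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover approximate floats; per-weight error stays within ~one step (scale).
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)   # stand-in for a weight tensor
q, scale, zp = quantize(w, num_bits=8)
w_hat = dequantize(q, scale, zp)
max_err = float(np.abs(w - w_hat).max())       # bounded by roughly one step
```

The storage saving falls out directly: each float32 weight becomes one uint8 plus a shared scale and zero point, and dropping to 4 bits halves that again at the cost of a coarser step size, which is where the quoted accuracy trade-off comes from.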

Read →

Jeff Johnson (lapcatsoftware.com)

Apple Systematically Closes Bug Reports by Pressuring Developers to Verify Unfixed Issues

Developer Jeff Johnson documents Apple's practice of pressuring developers to verify bug fixes in beta releases within tight deadlines — then closing reports if they don't respond, regardless of whether the issue was actually fixed. In one case, Apple demanded Johnson verify a three-year-old privacy bug within two weeks; the bug remained present in the final public release. Johnson argues this policy artificially deflates Apple's open bug count, masking systemic software quality problems rather than resolving them. The post resonated widely with Apple developers who have observed similar patterns.

Read →