The AI Engineer 06-01-2026
Nvidia in talks to buy AI21, Meta to acquire Manus for $2B, OpenAI's talent war
📣 Headlines
• AI mega-rounds, led by OpenAI’s $40B raise, dominated 2025 funding and concentrated capital in frontier-model and infrastructure plays; VCs expect bigger rounds and fewer winners in 2026, per a rundown of 2025’s largest funding rounds.
• MIT Technology Review projects 2026 will bring more Chinese open-source foundation models, tougher regulatory fights, and AI-driven commerce and discovery, per its 2026 AI forecast.
• Nvidia is reportedly in advanced talks to buy AI21 Labs for $2–3B, potentially folding Jamba/Maestro into its enterprise stack, according to SiliconANGLE’s report.
• Elon Musk says xAI will scale its ‘Colossus’ cluster toward 2GW of power as it expands the data center in Mississippi, to support larger GPU training runs.
• Meta has agreed to acquire AI-agent startup Manus for about $2B to integrate agent workflows across its apps, according to the deal report.
• India has given X 72 hours to curb obscene Grok-generated images as probes widen internationally, per Rest of World’s report.
• OpenAI’s talent war intensified as employees reportedly average about $1.5M in annual stock-based compensation, per Futurism’s report.
• A new critique argues frontier-model growth is building a data-center-and-data-labor “empire” with significant environmental and governance costs, in The Intercept’s “AI’s Imperial Agenda”.
🔧 Company Engineering Blogs
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture (huggingface.co). Falcon-H1-Arabic introduces a hybrid Mamba-Transformer architecture with context lengths of up to 256K tokens and Arabic dialect coverage
🔭 LLMs Year-End Takes
The State Of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com). Progress, reasoning models, RLVR, GRPO, inference scaling, tool use, architectures, benchmarks, and 2026 predictions in LLMs by Sebastian Raschka
2025: The Year in LLMs (simonwillison.net). Overview of 2025 LLM trends: reasoning, agents, coding, CLI tools, Chinese open-weight models, MCP, browser AI, and Gemini competition
Intellectual Progress in 2025 (beren.io). Reflections on 2025 in AI: scaling, alignment, Zyphra growth, blogging impact, and near-term AGI timing across ecosystems
Ten Years of AI in Review (weightythoughts.com). Ten years of AI milestones, diffusion models, and the rise of agentic systems, with Google, OpenAI, and Chinese AI in focus
🌍 Openness, Trust & Misuse
From Drift to Snap: Instruction Violation as a Phase Transition (lesswrong.com). Long-dialogue activations in Llama-70B reveal a sharp switch to violation around turn 10, with high-entropy compliance and low-entropy failure attractors
Wie offen sind offene LLMs? [How open are open LLMs?] (stuker.com). Open weights vs. open source, EU AI Act transparency, and the geopolitics driving open LLMs, assessed with the Model Openness Framework (MOF) and AI openness indices
Why ChatGPT can’t be trusted with breaking news (garymarcus.substack.com). LLMs reveal nostalgia for past data, misstate breaking news, and rely on human edits; Marcus critiques real-time reasoning gaps and military planning risks
Grok is enabling mass sexual harassment on Twitter (seangoedecke.com). Grok's image model fulfills nonconsensual, sexualized deepfake requests on Twitter, raising questions about safety tradeoffs at AI labs
More Testing of alter.systems (contrapositivediary.com). Jeff Duntemann tests alter.systems AI, finds mixed results on bios, book summaries, and copyright-aware responses
Worth Reading – X Users Have the Power to Edit Any Image Without Permission (mikemcbrideonline.com). X's image-editing Grok tool enables quick edits to photos, prompting safety concerns and calls for AI guardrails and regulation
🎙️ Voice & Media
Music Video Generation with AI (binwang.me). Intro to AI-based music video generation using Suno, Wan2GP, InfiniTalk; prompts, GPUs, and cloud platforms
My first video clone (richardcoyne.com). Explores video and audio cloning of human content using HeyGen, voice synthesis, and past posts on cyborgs and avatars
Using whisper.el to convert speech to text and save it to the currently clocked task in Org Mode or elsewhere (sachachua.com). Emacs whisper.el integration for speech-to-text notes in Org Mode using F9, server/local modes, and capture templates
A Computer Scientist in a Business School: Fighting Fire with Fire: Scalable Oral Exams with an ElevenLabs Voice AI Agent (williballenthin.com). A computer scientist teaching in a business school uses an ElevenLabs voice AI agent to run oral exams at scale
2026-01 (kevindangoor.com). SoTA open-source TTS and macOS audio tools, with mentions of Chatterbox and AudioPriorityBar projects
Building Eva: A Voice AI Companion with My Daughter (n9o.xyz). Teresa and her dad build Eva, a pocket-sized, Portuguese-speaking voice AI on a Raspberry Pi Zero 2 W with a PiSugar Whisplay HAT, OpenAI TTS, and Claude-driven vibe coding
Setting Up Home Assistant Voice Assistant With Local Mic/Speaker (oneofone.dev). Setting up Home Assistant voice assistant with local mic/speaker using Wyoming Rhasspy components in Docker
🔌 MCP & Tool Protocols
How Code Mode Builds on MCP for Agent Tooling (nordicapis.com). Code mode turns MCP tool definitions into a typed API (TypeScript) for LLMs to write and run code in a sandbox
How to Keep MCPs Useful in Agentic Pipelines (towardsdatascience.com). How MCPs connect tools to LLMs, how much tool descriptions vary, and a Master-MCP proposal for standardized management
How the Model Context Protocol works: A Visualization (thomasgauvin.com). Visual guide to Model Context Protocol (MCP) via logs, JSON-RPC 2.0, and a real-time Interceptor proxy using Cloudflare Workers and Durable Objects
I Misunderstood Skills Entirely (poppastring.com). Explores MCP vs. Skills: MCP enables real actions; Skills codify processes and domain knowledge in markdown formats
MCPs for Developers Who Think They Don’t Need MCPs (oreilly.com). MCPs expand beyond IDEs to cross-tool workflows (Slack, GitHub, Jira, Chrome) with goose and Repomix, benefiting frontend, design, finance, and legal teams
First few days with Codex CLI (amanhimself.dev). Explores Codex CLI for local AI-assisted workflows in Obsidian, including AGENTS.md, SKILL.md, MCP, Playwright MCP, and Linear-to-Obsidian syncing
Why You Don't Need the Nuxt MCP When You Use Claude Code (alexop.dev). Nuxt Content v3 integration using Claude Code subagents to avoid MCP context bloat and keep docs current (llms.txt) with a Nuxt 4 setup
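The MCP visualization post above frames MCP traffic as JSON-RPC 2.0 messages. Here is a minimal sketch of a client-side `tools/call` round trip, assuming a hypothetical tool named `search_docs` and a hand-written response. It is not any SDK's API, and real clients also handle initialization, notifications, and transports:

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC 2.0 matches responses to requests by "id".
_ids = count(1)


def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP-style tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


def parse_response(raw: str, expected_id: int) -> dict:
    """Return the result of a JSON-RPC 2.0 response, raising on errors or ID mismatch."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("not a response to this request")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


# Hypothetical tool name and arguments, for illustration only.
req = make_tool_call("search_docs", {"query": "Durable Objects"})
req_id = json.loads(req)["id"]
print(req)

# A hand-written reply shaped like an MCP tool result (a list of text content blocks).
reply = json.dumps({
    "jsonrpc": "2.0",
    "id": req_id,
    "result": {"content": [{"type": "text", "text": "3 matches"}]},
})
print(parse_response(reply, req_id))
```

Keeping ID matching and error handling in one function mirrors how an interceptor or proxy can log both halves of each exchange.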
📅 Everyday AI Workflows
Coming around on the utility of LLMs (msfjarvis.dev). Harsh Shandilya details using OpenCode with oh-my-opencode, MCP servers, and agents for OpenAI/Anthropic LLMs to tackle coding tasks
Blog/2026-01-03/Claude Management (wiki.roshangeorge.dev). Explores using multiple Claude workspaces and a selector tool to optimize AI assistant usage across wiki editing, video processing, and research tasks
How I'm Using AI at the End of 2025 (tdhopper.com). Overview of using ChatGPT Plus, Claude Code, Nano Banana Pro, Whisper, NotebookLM, and other AI tools for chat, image generation, coding, writing, and learning in late 2025
How I'm Using AI Day to Day (2026) (juanalonso.com). Using Claude Code agents for autonomous problem solving across coding, production issues, observability, CI, data analysis, and docs
Slowing Down AI On Purpose (danielabaron.me). Deliberately slowing AI to design, reason, and plan in markdown before coding, using design docs and incremental steps
👨‍💻 AI Coding Practice
Joining a new crew amidst a sea change (therealadam.com). Notes on joining a new team, TypeScript and Python, Cursor and Claude Code, plus model-assisted coding considerations
2025 in Code (marcusvorwaller.com). A year of AI-assisted coding spanning Go, Python, Rust, React, SQLite, and multi-agent systems like Melville and Endless2, exploring Sidecar, todo, Muse, and other tools
How I Use Claude Code to Do My Entire Job as a Software Engineer in 2025 (ryanxcharles.com). How Claude Code and AI agents write production Rust and TypeScript code, with testing, linting, and oversight by a software engineer
Statement on my AI useage for software development (jodavaho.io). AI-assisted software development likened to CNC machining, using Claude Code CLI for test generation and design-focused workflows
Vibe Coding Vibes (snugug.com). Explores AI-assisted coding, tool limitations, and personal rules for using agentic IDEs like Antigravity and Gemini 3 in building a TTRPG web app
🧩 Prompting & Composition
2025-12-30 12:42 (aicode.danvoronov.com). Updates on prompting and tooling for LLM programming, multi-agent workflows with Claude, OpenAI Codex, and Gemini, plus Mysti and TMUX-like multi-terminal concepts
Riot llllm considers poetry (bookmaniac.org). Tinkering with llm prompts (Claude 4.5 Sonnet) for poetry, coding, and journaling with a DIY persona and superpowers toolkit
★ Oneshot prompting (perrotta.dev). Oneshot prompting, LLMs, and Git TUIs (gitui, tig, lazygit) linked into a Neovim/gut hunk workflow using multiple AI assistants
Exploring Foundation Models: Prompt Stacking with Result Builders (rudrank.com). Declarative prompt composition with Swift Result Builders (PromptBuilder, InstructionsBuilder) for Foundation Models, focusing on type-safe, modular prompt construction in Swift
One-shot prompting a game: Guess the emoji! (sumsar.net). One-shot prompting to create a Guess the Emoji game using a single prompt and ChatGPT 5.2, by Rasmus Bååth
🔎 Retrieval & RAG
Arcaneum via RDRs (chris.wensel.net). Specification-first LLM development with RDRs guides Arcaneum, a Python toolset for indexing PDFs, code, and Markdown for semantic and full-text search
DocSummarizer Part 5 - lucidRAG: Multi-Document RAG Web Application (mostlylucid.net). lucidRAG web app combines vector search and knowledge graphs for multi-document Q&A using DocSummarizer, GraphRAG, HTMX, Alpine.js, and DuckDB with ONNX embeddings
Constrained Fuzziness: A Control System Pattern for Probabilistic Components (mostlylucid.net). Constrained Fuzziness blends LLM proposals with deterministic constraints, budgets, and gates to ensure safe, bounded outputs in multi-domain systems
Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc. (exopriors.com). Semantic search over alignment documents with Claude Code, vector mixing, debiasing, and public/private embeddings for arXiv, Hacker News, and more
A Box of Many Inputs (allenpike.com). Explores AI-powered browser omnibox design, local classifiers, and Dia, Atlas, and Comet; discusses Roger Rabbit queries and search vs chat in AI browsers
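Several of the retrieval posts above rest on vector search. A dependency-free sketch of cosine-similarity top-k retrieval, assuming the embeddings are already computed (the three-dimensional vectors and document IDs below are made up for illustration):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query and keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real model outputs (hypothetical values).
index = {
    "pdf-intro": [0.9, 0.1, 0.0],
    "code-readme": [0.1, 0.8, 0.2],
    "notes-rag": [0.7, 0.2, 0.1],
}
for doc_id, score in top_k([1.0, 0.0, 0.0], index):
    print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is fine for small corpora; at hundreds of gigabytes, systems typically switch to approximate nearest-neighbour indexes.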
🤝 Agents in Production
Teaching Agents about Performance insights (calendar.perfplanet.com). Performance AI assistant digs into DevTools internals to translate trace telemetry into actionable insights for agents and workflows
Is hallucination-free AI code possible? (kucharski.substack.com). Explores how to verify AI-generated code using Lean proofs, code checks, and qualitative validations across gravity models and beyond
Software Engineering Has Expanded: Choosing Between Rules, Prompts, Chains, and Agents (joanfihu.com). Four problem-solving modes in software engineering: hard rules, single prompt, chains, and agents, with emphasis on verification as uncertainty rises
Intent-based Collaboration Environments (continuations.com). Explores AI-native IDEs and Intent-based Collaboration Environments for coding, engineering, and science using Cursor and Visual Studio as references
Two AI Agents Walk Into a Room (nibzard.com). Two AI agents, Poseidon and Athena, loop through a log-based dialogue that frames identity as a process, with predictions for multi-agent futures; built with Claude Code and Zhipu GLM-4.7
AI Builder with n8n – Create Agents and Voice Agents (edwarddonner.com). AI building with n8n and ElevenLabs to create autonomous agents, RAG knowledge bases, and voice-enabled sales demos
The Truth About Agents in Production (thedataexchange.media). Panel on Agents in Production covers coding agents, observability, type safety, MCP, and multi-agent patterns with Colvin, Dhinakaran, Jones, and Liu
🧪 Applied LLM Projects
Parameter-efficient fine-tuning in tinygrad (dxuuu.xyz). Parameter-efficient fine-tuning with LoRA in tinygrad on Llama 3.2 1B, exploring implementation and inference
Beating BERT? Small LLMs vs Fine-Tuned Encoders for Classification (alex-jacobs.com). 32 experiments compare small LLMs to fine-tuned encoders like BERT/DeBERTa for classification, revealing nuanced performance and throughput insights
DGX Spark: Hello World (svnscha.de). NVIDIA's DGX Spark is a personal AI supercomputer with 128GB of memory; exploring Ollama, LibreChat, ComfyUI, and Stable Diffusion courses on Blackwell hardware
Stolze-Smith (blog.zdsmith.com). Stolze-Smith: a compact shorthand system with JSON modeling, OR-Tools optimization, and SVG rendering using Claude and LLMs
Free movies (and LLM-aided viz design) (richardbrath.wordpress.com). LLM-driven viz of 800 National Film Registry movies, exploring themes, clustering, and interactive, title-based design with Python, D3, and UMAP, plus Team LLM reflections
artgrabber: a small little discord bot (writing.natwelch.com). A personal exploration of building a Dropbox-backed art archive bot using GitHub Copilot/Agents to post images to Discord with per-image reactions
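The tinygrad post applies LoRA. Here is a framework-free sketch of the idea, using tiny hand-picked matrices rather than tinygrad tensors. The frozen weight W stays untouched, and a trainable low-rank update B·A scaled by alpha/r is added to its output. Because B starts at zero, the adapted layer initially reproduces the base layer exactly. The class name `LoRALinear` is hypothetical:

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight: list[list[float]], r: int, alpha: float, seed: int = 0):
        out_dim, in_dim = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, out_dim x in_dim
        self.scale = alpha / r
        # A: r x in_dim with small random init; B: out_dim x r with zero init.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(out_dim)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        # Rank-r update applied as two small products; B @ A is never materialized.
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer = LoRALinear(w, r=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # B is zero, so this equals W x
```

The trainable parameter count is r * (in_dim + out_dim) rather than in_dim * out_dim, which is where the parameter efficiency comes from.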
🏗️ Architectures & Training
Zhang et al. (2024) TinyLlama (adrian.idv.hk). TinyLlama 1.1B trains on SlimPajama-derived data and outperforms open models of comparable size; uses the Llama 2 architecture, FSDP, FlashAttention, and xFormers; reports 24K tokens/s per A100-40G
#311 Stefano Ermon: Why Diffusion Language Models Will Define the Next Generation of LLMs (aneyeonai.libsyn.com). Stefano Ermon explains diffusion language models and parallel inference for faster, scalable LLMs, with code, agents, and real-time AI applications
VL-JEPA: Why Predicting Embeddings Beats Generating Tokens for Vision-Language AI (rewire.it). VL-JEPA predicts embeddings for vision-language tasks, achieving 50% fewer parameters and 2.85x faster decoding with adaptive selective decoding
Train Your Large Model on Multiple GPUs with Tensor Parallelism (machinelearningmastery.com). Tensor parallelism for large transformers on multi-GPU systems using PyTorch, with TP plans, DTensor, and 2D parallelism concepts
DeepSeek’s mHC Explained: Manifold-Constrained Hyper-Connections (aipapersacademy.com). Explains DeepSeek’s mHC: manifold-constrained hyper-connections that stabilize and enhance residual streams in LLMs using doubly stochastic mixing via Sinkhorn–Knopp
Limits of the Transformer Architecture and a QCD-like Alternative (symmetrybroken.com). A physics-inspired critique of transformers, exploring UV/IR limits, QCD analogies, and multi-scale architectures for cognition using Python-like pseudocode
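The mHC explainer mentions doubly stochastic mixing via Sinkhorn–Knopp. Here is a minimal sketch of that normalization on its own, not DeepSeek's code. It alternately rescales the rows and columns of a positive matrix until both sum to one. Exponentiating raw scores first to guarantee positivity is an assumption of this sketch:

```python
import math


def sinkhorn_knopp(scores: list[list[float]], iters: int = 100) -> list[list[float]]:
    """Map a square score matrix to an approximately doubly stochastic matrix."""
    m = [[math.exp(s) for s in row] for row in scores]  # make every entry positive
    n = len(m)
    for _ in range(iters):
        # Row normalization: each row sums to 1.
        row_sums = [sum(row) for row in m]
        m = [[v / rs for v in row] for row, rs in zip(m, row_sums)]
        # Column normalization: each column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


mix = sinkhorn_knopp([[2.0, 0.1, -1.0], [0.5, 1.5, 0.0], [-0.3, 0.2, 1.0]])
print([round(sum(row), 4) for row in mix])                             # row sums
print([round(sum(mix[i][j] for i in range(3)), 4) for j in range(3)])  # column sums
```

Each output is a convex mixture of its inputs, and each input's total weight sums to one. This intuition explains why such a constraint can keep a residual stream from growing or shrinking unboundedly across layers.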
🧠 Reasoning, RL & Search
Does RL Actually Make LLMs Smarter? A Critical Look at Reinforcement Learning for Reasoning (rewire.it). Six RLVR algorithms fail to beat their base models at large pass@k; RL optimizes search, not reasoning, with distillation offering transfer gains
What drives LLM bail? A small Mech Interp study (lesswrong.com). Mechanistic study of LLM bail using Gemma-3-12B, SAEs, LIME-like directions, and prompt ablation in a two-turn setup
GRPO++: Tricks for Making RL Actually Work (cameronrwolfe.substack.com). Tricks to improve GRPO RL training at scale for LLM reasoning with verifiable rewards and PPO-derived techniques
Scaling Latent Reasoning via Looped Language Models (arxiv.org). Looped language models reuse shared layers iteratively in latent space, building reasoning depth into pre-training instead of relying on extra generated reasoning tokens
@adlrocha - Beyond Benchmaxxing: Why the Future of AI is Inference-Time Search (adlrocha.substack.com). Explores AI benchmarks, inference-time search, and tool-assisted autonomous agents using frameworks like SWE-bench, GPQA, GSM-Symbolic, and RLVR
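GRPO, which comes up in the Raschka and Wolfe pieces, replaces PPO's learned value baseline with a group baseline. It samples several completions per prompt, scores them with a verifiable reward, and normalizes each reward against the group's mean and standard deviation. Here is a minimal sketch of just that advantage step, with made-up binary rewards and a small epsilon assumed for numerical safety. Implementations differ on sample vs. population std, and some variants drop the std division entirely:

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (r_i - mean(r)) / (std(r) + eps) within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1.0 if a verifier accepts them (hypothetical).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_advantages(rewards)])

# A group where every answer gets the same reward carries no learning signal.
print(group_advantages([1.0, 1.0, 1.0]))
```

The second call shows why uniformly solved (or uniformly failed) prompts contribute nothing to the update, a behavior that several GRPO training tricks target.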
📚 Academic Research
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning (arxiv:cs). RL trains a multimodal agent to interleave image/text search and cropping for tough vision questions. Introduces the HR-MMSearch benchmark and beats proprietary models, advancing tool-using VLMs
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization (arxiv:cs). Framework auto-generates LLM agents (tools, prompts, configs) and improves them via in-context practice plus scalable RL. Hits SOTA on GAIA/WebWalkerQA, reducing setup costs for teams
Fast-weight Product Key Memory (arxiv:cs). Adds fast-weight Product Key Memory to transformers, updated by gradient descent during inference to store episodic facts. Improves long-context perplexity and extends needle retrieval to 128K tokens
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition (arxiv:cs). Shows tokenizer-transplant supply-chain attacks: a single “breaker token” stays benign in a donor model yet sabotages a composed base model. Impacts merging, decoding, and safety
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process (arxiv:cs). Uses sparse autoencoders on step-level activations to discover latent “reasoning vectors” like reflection and backtracking. Enables targeted steering of reasoning behaviors without retraining the model