
The AI Engineer

November 25, 2025

The AI Engineer 25-11-2025

Yann LeCun leaves Meta, quantum physicists de-censor DeepSeek R1 Slim, AI market heats up (again)

📣 Headlines

• Yann LeCun is leaving Meta to create an independent AI research entity, pursuing visual-learning-based 'advanced machine intelligence' and world models outside the dominant LLM paradigm.

• Quantum physicists at Multiverse Computing have released DeepSeek R1 Slim, a 55% smaller tensor-network-based variant of DeepSeek R1 that removes built-in censorship to more freely answer sensitive questions.

• A German court has ruled that OpenAI's training of ChatGPT on song lyrics infringes copyright, hinting that future generative AI models may be forced into explicit licensing and revenue-sharing arrangements with rights holders.

• Top AI leaders are warning of an overheated market, with Google DeepMind co-founder Demis Hassabis and Google CEO Sundar Pichai both cautioning that massive valuations, energy demands, and hype mean no major company would be immune if an AI bubble bursts.

• Even amid bubble concerns, capital is surging into AI infrastructure as Databricks reportedly seeks fresh funding at a $130B+ valuation on the strength of its AI data and agent platform, while infrastructure-focused founders argue that compliant, scalable data and tooling layers will drive the next phase of enterprise AI deployment in regulated sectors.

• The UAE’s MBZUAI has launched a Silicon Valley lab dedicated to building open-source foundation models, using sovereign compute and cross-border partnerships to position itself as a global rival to OpenAI and DeepMind.

• Policy experts are scrutinising how AI should interact with nuclear decision-making, with new analysis warning that automation bias and overreliance on models in strategic warning systems could be more dangerous than fully autonomous launch scenarios.

• Roblox is treating its child-safety challenges as an opportunity for AI-driven content moderation within its massive gaming platform. At the same time it is moving to mandate facial age verification and age-based chat controls for all users worldwide, tightening the link between AI safety systems and real-world identity checks.



🔧 Company Engineering Blogs


LyftLearn Evolution: Rethinking ML Platform Architecture (eng.lyft.com). LyftLearn evolves to a hybrid architecture, using SageMaker for offline training and Kubernetes for online serving, with compatibility layers and AWS integration

Background Coding Agents: Context Engineering (Part 2) (engineering.atspotify.com). Context engineering for autonomous coding agents using Claude Code, prompts, and limited tools to migrate large codebases at Spotify

Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization (engineering.fb.com). Meta's Zoomer automates AI performance profiling and optimization across training and inference with Kineto, DCGM, Strobelight, and dyno telemetry

Solving Real-Time AI Classification for Agentforce: How Single-Token Prediction Delivers 30x Faster Agent Responses (engineering.salesforce.com). HyperClassifier: a small language model delivering 30x faster Agentforce classifications with single-token prediction
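The single-token trick is worth internalising: instead of generating a whole text response, map each class label to one vocabulary token and read the classification straight off the first decode step's logits. A minimal sketch of the idea (the token ids and logits below are made-up stand-ins, not Salesforce's actual setup):

```python
import math

# Toy single-token classifier: map each intent label to one vocabulary
# token and pick the label whose token has the highest logit on the
# first decode step. The logits are stand-ins for real model output.
LABELS = {"billing": 7, "refund": 11, "other": 3}  # label -> token id (hypothetical)

def classify(first_step_logits):
    """Pick the best label and a softmax confidence over label tokens only."""
    scores = {name: first_step_logits[tok] for name, tok in LABELS.items()}
    m = max(scores.values())
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    best = max(scores, key=scores.get)
    return best, exps[best] / sum(exps.values())

logits = {7: 2.1, 11: 5.3, 3: 0.4}  # pretend first-step model output
label, conf = classify(logits)
print(label)  # -> refund
```

Because only a single forward pass and a single token are needed, latency collapses to roughly the model's time-to-first-token, which is the kind of saving the post attributes its 30x speedup to.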

Evolving GitHub Copilot’s next edit suggestions through custom model training (github.blog). NES next-edit suggestions in GitHub Copilot trained with custom data, RL, and in-editor UX for VS Code

🌍 Applied AI Use Cases

Generative AI in the Real World: The LLMOps Shift with Abi Aryan (oreilly.com). LLMOps shift from MLOps, FinOps, agentic systems, observability, and data engineering with Abi Aryan on AI hardware, costs, and architecture

How to use NotebookLM: A practical guide with examples (geshan.com.np). Guide to using NotebookLM for research with sources, audio/video overviews, mind maps, quizzes, and a real job-search example

Artificial Insurance (findthethread.blog). AI adoption skepticism, RAG, and insurer risk as GenAI use expands with cautions on hallucinations and product safety

📦 Open Models and Infrastructure

Olmo 3 is a fully open LLM (simonwillison.net). Ai2's Olmo 3 32B-scale open LLM with OlmoTrace data tracing and full training data release

Olmo 3: America’s truly open reasoning models (interconnects.ai). Olmo 3: AI2 unveils open 7B and 32B base models focused on reasoning and instruct capabilities with post-training flows and RL Zero strategies

How LLM Inference Works (arpitbhayani.me). Explains LLM inference, tokenization, embeddings, transformers, KV cache, precision, and serving frameworks like vLLM and TensorRT-LLM

Pretraining at home: 20B tokens from 222 hours to 12 (hackbot.dad). How to pretrain a 1B Llama-3.2 model on 20B tokens in under 12 hours using BF16, Flash Attention, and distributed data parallel training

AI Infrastructure on Consumer Hardware (glukhov.org). Deploy self-hosted AI infrastructure on consumer GPUs with Ollama, vLLM, and LocalAI using LoRA fine-tuning and Open-Source models


🕹️ Agents and Multi-Model Systems

Agent design is still hard (simonwillison.net). Agent design is hard as abstractions fail; reinforcement and testing prove tricky for tools, prompts, and synchronization

"LLM Council" (languagelog.ldc.upenn.edu). LLM Council gathers multiple LLMs via OpenRouter to compare, review, and rank outputs for final synthesis

How to Perform Agentic Information Retrieval (towardsdatascience.com). Agentic information retrieval using RAG, keyword search, and AI agents with GPT-5-like tooling and vector stores in Python
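The core loop of agentic retrieval is simpler than it sounds: give the model a set of retrieval tools and let it route each query to the right one. Here is a toy version with the tool-choice rule hard-coded instead of LLM-driven; the documents and the routing heuristic are invented purely for illustration:

```python
# Minimal sketch of agentic retrieval routing. A real system lets an
# LLM choose the tool; a hypothetical rule stands in for that decision
# here so the control flow is visible. Documents are made up.
DOCS = [
    "KServe autoscaling configuration for LLM inference",
    "Prometheus alert rules for GPU memory",
    "Chunking strategies for retrieval-augmented generation",
]

def keyword_search(query):
    """Exact word match, good for identifiers and error strings."""
    terms = set(query.lower().split())
    return [d for d in DOCS if terms & set(d.lower().split())]

def vector_search(query):
    """Stand-in for embedding similarity: rank by shared-word count."""
    terms = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:1]

def agent_retrieve(query):
    # the "agent" step: quoted, code-like queries go to keyword search
    if '"' in query or "_" in query:
        return keyword_search(query.replace('"', ""))
    return vector_search(query)

print(agent_retrieve("chunking strategies"))
```

The win over plain RAG is that the router can also decide to retrieve again, reformulate, or stop, rather than doing one fixed lookup per question.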


📚 RAG and Web Retrieval

Building Production-Grade RAG Systems: Kubernetes, Autoscaling & LLMs (aboullaite.me). Production-grade RAG with Kubernetes, autoscaling, KServe, KEDA, Prometheus, Grafana, and OpenTelemetry in Java on Kubernetes

Advanced RAG: LongRAG, Self-RAG and GraphRAG Explained (glukhov.org). LongRAG, Self-RAG and GraphRAG: advanced RAG variants using long-context chunks, meta-reflection, and knowledge graphs with Python-style pseudocode

RAG Explained: Origins and Fundamentals (mostlylucid.net). RAG fundamentals, origins, three-phase workflow, and concrete C# examples for retrieval-augmented generation

RAG Architecture and Internals: How It Really Works (mostlylucid.net). RAG architecture deep dive: indexing, chunking, embeddings, vector stores, retrieval, LLM internals, KV caches, and practical token management with C# examples
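For readers who want the skeleton without the C#, the indexing-and-retrieval core shared by all these RAG posts fits in a page of Python. The embed() below is a deterministic bag-of-words hash, a stand-in for a real embedding model, purely so the pipeline (chunk, embed, store, cosine top-k) runs end to end:

```python
import math

# Skeleton of RAG indexing and retrieval with a toy "embedding".
def chunk(text, size=5):
    """Split into fixed-size word chunks (real systems usually overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, dim=8):
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0  # toy hash, not a real model
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

corpus = "KV caches store past keys and values so decoding stays fast"
index = [(c, embed(c)) for c in chunk(corpus)]

def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(qv, entry[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(retrieve("KV caches keys"))  # the best-matching chunk feeds the prompt
```

Everything the advanced variants add, long-context chunks, self-reflection, graph traversal, is layered on top of this same retrieve-then-generate core.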

“WAG (Web-Augmented Generation) for Not Quite Dummies” on the Pure AI Web Site (jamesmccaffreyblog.com). WAG uses web search to augment LLMs like GPT and Llama for recent information in a web-enabled workflow


🧪 AI Research Frontiers

'Unified FP8: Moving Beyond Mixed Precision for Stable and Accelerated MoE RL' (lmsys.org). Unified FP8 for RL sampling and training in MoE models boosts stability and speed, aligning training/rollout with TE, Megatron, and SLIME tools

How Relevance Models Foreshadowed Transformers for NLP (towardsdatascience.com). RM1 relevance modelling to Transformers: tracing attention’s roots with Lavrenko and Croft, Python demo for RM1

Group Relative Policy Optimization (GRPO) (cameronrwolfe.substack.com). GRPO explains Group Relative Policy Optimization as a simplified, scalable RL optimizer for LRMs, contrasting RLHF PPO with verifiable rewards in math/code domains
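The heart of GRPO is small enough to fit in a few lines: sample a group of completions per prompt, score each with a verifiable reward (did the maths check out, did the tests pass), and normalise rewards within the group, so no learned value network is needed. A sketch with invented rewards:

```python
import math

# Group-relative advantage, the piece GRPO swaps in for PPO's critic.
# Rewards here are made-up stand-ins for e.g. a math-answer checker.
def group_advantages(rewards):
    """advantage_i = (r_i - mean(group)) / std(group)"""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard the all-equal-rewards case
    return [(r - mean) / std for r in rewards]

# 4 sampled completions for one prompt: two correct, two wrong
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_advantages(rewards)
print(advs)  # correct completions get positive advantage, wrong negative
```

These advantages then weight the usual clipped policy-gradient update; dropping the value network is what makes GRPO cheap enough to scale to large reasoning models.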

How Language Models Actually Think (thedataexchange.media). Emmanuel Ameisen of Anthropic discusses LLM hallucinations, internal reasoning, and practical engineering tips for developers

NeurIPS Spotlight 2: Diffusion Models Meet Lie Groups (quantumformalism.substack.com). Diffusion models on Lie group representations explored; GitHub implementation; QF Academy bootcamp links and AI-content quality notes

What’s the deal with RL and forecasting? (newsletter.danielpaleka.com). Forecasting with RL, reasoning plus tool use, retrieval vs. reasoning, datasets, Polymarket profits, DeepSeek-R1, GRPO, CommonCrawl, Kalshi


📚 Academic Research

OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs (arxiv:cs). OrdMoE turns MoE routers’ internal expert scores into a free preference signal, ranking experts to generate ordered responses for alignment. It offers scalable, zero-label preference training for large multimodal models

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards (arxiv:cs). EvoLMM self-improves multimodal reasoning using only raw images, with proposer–solver agents rewarding internal consistency instead of human labels. It demonstrates practical, fully unsupervised RL pipelines for upgrading existing vision-language models

SkyRL-Agent (arxiv:cs). SkyRL-Agent provides a high-throughput, asynchronous RL framework for multi-turn tool-using LLM agents, plus an SWE agent hitting strong SWE-Bench scores. It’s directly relevant for engineers training production-grade code and research agents

Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration (arxiv:cs). BeMyEyes connects a lightweight vision model as perceiver with a powerful text-only LLM reasoner through supervised multi-agent collaboration. It offers a practical blueprint for cheaply adding multimodal capabilities without retraining giant VLMs

MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation (arxiv:cs). MammothModa2 couples autoregressive semantic planning with diffusion-based image generation in one unified multimodal system. Engineers gain an industry-scale, end-to-end recipe for high-fidelity generation, editing, and strong visual understanding in a single architecture

👋 Before you go...

I've got a big favor to ask - keeping Blaze running isn't hugely expensive, but the costs do add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month.

If you're getting value from Blaze, checking this out would mean the absolute world. But if you can't contribute, no worries - the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.
