The AI Engineer 18-11-2025
OpenAI GPT-5.1 updates, Google's AI privacy tools, rising AI infrastructure in Europe
📣 Headlines
• The AI firm Anthropic says Chinese state hackers used its Claude model to automate cyber espionage against around 30 organisations, a claim met with skepticism over accuracy and motives.
• OpenAI unveiled the GPT‑5.1 update, adding Instant, Thinking and Auto modes with improved tone and adaptive reasoning, and is piloting group conversations in ChatGPT across Japan, New Zealand, South Korea and Taiwan.
• Fidji Simo, OpenAI's CEO of Applications, outlined plans to make ChatGPT more useful and commercial—covering Pulse, APIs, ads and enterprise offerings, signalling stronger monetization as compute constraints loom.
• Google detailed Private AI Compute, a cloud-based system securing Pixel AI features using Ironwood TPUs, SEV‑SNP and IP‑blinding relays, highlighting new on-device/cloud hybrid privacy tooling.
• KubeCon NA showcased an AI-native platform engineering revival and an emphasis on CNCF conformance and observability, while the Model Context Protocol (MCP) trend is driving integration of models into CMS, CRM and analytics (https://news.crunchbase.com/ai/boring-tech-2025-mcp-rise-angerer-storyblok/).
• Cyware expanded its Quarterback AI with an AI Fabric to unify threat intelligence across security workflows and automation, reflecting growing LLM-driven security tooling adoption.
• Cloud and energy infrastructure for AI scaled up as Microsoft and Google revealed $16bn+ European AI data centre deals, while Exowatt pitched low‑cost solar‑thermal rock storage to power AI data centres.
🔧 Company Engineering Blogs
SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds (deepmind.google). SIMA 2 uses Gemini-powered reasoning to play, reason, converse, and self-improve across 3D virtual worlds and Genie-generated environments
Open Source Is Good for the Environment (engineering.fb.com). Open hardware and AI-driven methods for reducing emissions, featuring OCP Summit announcements and Meta’s net-zero roadmap
TypeScript, Python, and the AI feedback loop changing software development (github.blog). AI-driven shift in language use, with TypeScript, Python, Bash rising as AI tools shape choices and productivity
Building for an Open Future - our new partnership with Google Cloud (huggingface.co). Hugging Face and Google Cloud partner to empower open models with Vertex AI, GKE, Cloud Run, and Inference Endpoints for secure, scalable deployment
🌍 Industry & Models
Paper AI Tigers (gleech.org). Overview of open Chinese LLMs' frontier performance, costs, self-hosting, quantization, and market dynamics across 2024–2025
Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs (tech.instacart.com). Instacart uses LLMs for query understanding: taxonomy classification, query rewrites, and semantic role labeling with offline/real-time hybrid systems
What 375 AI Builders Actually Ship (tomtunguz.com). Production AI matures with open source adoption, agent data access over chat, and focus on data, verification, and evaluation
Tracking the Evolution of R and Python Tools for GenAI (r-consortium.org). Insights into building GenAI apps with R and Python using MCP, R packages like ellmer, and REST-based cross-language tooling
🧠 Learning & Behavior
LLMs Are Randomized Algorithms (towardsdatascience.com). Explores randomized algorithms in LLMs, adversarial design, temperature settings, and practical usage tips with Motwani’s influence
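A minimal sketch of the temperature knob that piece discusses: dividing logits by a temperature before the softmax turns a deterministic argmax decoder into a randomized one (the logits below are illustrative, not from the article).

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id from logits after temperature scaling.

    temperature -> 0 approaches greedy argmax decoding;
    higher temperatures flatten the distribution and add randomness.
    """
    rng = rng or np.random.default_rng()
    if temperature <= 0:
        return int(np.argmax(logits))           # deterministic limit
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))

# Illustrative logits over a 5-token vocabulary
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0.2))  # almost always token 0
print(sample_next_token(logits, temperature=1.5))  # noticeably more varied
```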
Thinking through how pretraining vs RL learn (dwarkesh.com). Explores Bits/FLOP, Bits/Sample, RL vs pretraining efficiency, pass rates, curriculum learning, self-play, and RLVR implications
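A rough, hedged illustration of the Bits/Sample comparison: pretraining supplies roughly the cross-entropy loss in bits per token, while a binary RL reward can convey at most the entropy of that reward per rollout; the numbers here are made up for illustration, not taken from the post.

```python
import math

def bits_per_pretraining_token(loss_nats):
    # cross-entropy loss in nats converted to bits per token
    return loss_nats / math.log(2)

def bits_per_binary_reward(pass_rate):
    # entropy of a Bernoulli(pass_rate) reward: the most one rollout can teach
    p = pass_rate
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(bits_per_pretraining_token(2.8))   # ~4 bits per token (illustrative loss)
print(bits_per_binary_reward(0.2))       # ~0.72 bits per whole rollout
```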
Obvious ways RL can fail (newsletter.danielpaleka.com). RL limitations in math, games, and writing; reward hacking issues; need for footholds and careful reward design; examples in forecasting and planning
What Are World Models? AI's Path to Understanding Reality (rewire.it). Explores world models in AI, from Dreamer and PlaNet to Genie and Sora, with DeepMind, OpenAI, and NVIDIA's Cosmos as context
🛠️ Local Models & Tools
Building AI with AI (speakerdeck.com). Ines Montani explores using LLMs to build AI systems, highlighting open-source tools like spaCy and Prodigy for AI development
An AI By Any Other Name (hackaday.com). Tiny-diffusion demo shows diffusion basics with Shakespearean output from 10.7M-parameter model
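For intuition on the diffusion basics the demo covers, here is a generic sketch of the standard DDPM-style forward-noising step q(x_t | x_0); it is not the tiny-diffusion code itself, and the shapes are placeholders.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # standard linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) I)."""
    noise = noise if noise is not None else torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.randn(4, 64)       # stand-in for clean data or embeddings
xt = q_sample(x0, t=500)      # halfway through the noising process
print(xt.shape)
```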
A Neat Little Thing I Learned (lukaswerner.com). Local-model LLMs for description generation: reordering prompts to improve coherence and reduce verbosity on a 1B-scale setup (gemma3) for mark
Building a "Lawyer GPT" for Your Blog - Part 1: Introduction & Architecture (mostlylucid.net). Building a C#-driven RAG writing assistant for a blog using embeddings, vector databases, GPUs, and local LLMs like ONNX/llama.cpp
🔎 RAG & Retrieval
How to Evaluate Retrieval Quality in RAG Pipelines (Part 3): DCG@k and NDCG@k (towardsdatascience.com). Graded retrieval metrics DCG@k and NDCG@k for RAG pipelines; Python implementations, IDCG, and practical examples
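A compact version of the graded metrics the article implements; the relevance grades below are illustrative, and the article's own code may use a different gain function.

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with the standard log2(rank + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k = DCG@k / IDCG@k, where IDCG uses the ideal (sorted) ordering."""
    idcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# Graded relevance of retrieved chunks, in ranked order (illustrative)
grades = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(grades, k=5), 3))
```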
Graph RAG: Elevating AI with Dynamic Knowledge Graphs (stackabuse.com). Graph RAG combines dynamic knowledge graphs with entity linking and graph traversal to enhance RAG in LLMs using Python, NetworkX, and spaCy
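A minimal sketch of the entity-linking-plus-traversal idea, assuming spaCy's small English model is installed; a real Graph RAG system would add relation extraction and feed the traversal results back into the prompt.

```python
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed

def build_entity_graph(documents):
    """Link entities that co-occur in the same document."""
    graph = nx.Graph()
    for doc_id, text in enumerate(documents):
        ents = {ent.text for ent in nlp(text).ents}
        for a, b in itertools.combinations(sorted(ents), 2):
            graph.add_edge(a, b, doc_id=doc_id)
    return graph

def expand_query_entities(graph, entity, hops=1):
    """Graph traversal: collect nodes within `hops` of a query entity."""
    if entity not in graph:
        return set()
    return set(nx.single_source_shortest_path_length(graph, entity, cutoff=hops))

docs = ["Acme Corp hired Jane Doe in Berlin.", "Jane Doe published a paper on RAG."]
g = build_entity_graph(docs)
print(expand_query_entities(g, "Jane Doe"))
```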
Building Production-Grade RAG Systems: Architecture Deep Dive (aboullaite.me). Production-grade RAG architecture in Java with Spring WebFlux, Redis caching, Weaviate/OpenSearch retrievers, KServe vLLM, and SSE streaming
How to Build Your Own Local RAG System (tuhrig.de). Build a private local RAG system with Confluence content, Java tooling, Python Flask embedding service, and local embeddings
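The post pairs Java tooling with a small Python embedding service; a hedged sketch of what such a Flask endpoint can look like (the model name and route here are assumptions, not the article's).

```python
from flask import Flask, jsonify, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed local embedding model

@app.route("/embed", methods=["POST"])
def embed():
    """Return one embedding vector per input text."""
    texts = request.get_json()["texts"]
    vectors = model.encode(texts, normalize_embeddings=True)
    return jsonify({"embeddings": [v.tolist() for v in vectors]})

if __name__ == "__main__":
    app.run(port=5000)
```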
Hands-On: kb-bridge for Context-Aware Knowledge Base Search (egpivo.github.io). Hands-On guide to kb-bridge for context-aware knowledge base search using FastMCP, Python tools, and multi-stage retrieval
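A minimal sketch of exposing a knowledge-base search tool over MCP with FastMCP; the tool name, signature, and toy corpus are placeholders, not kb-bridge's actual interface.

```python
from fastmcp import FastMCP

mcp = FastMCP("kb-search-demo")

@mcp.tool()
def search_kb(query: str, top_k: int = 5) -> list[str]:
    """Placeholder knowledge-base search; a real server would call a retriever here."""
    corpus = [
        "MCP lets clients call tools exposed by servers.",
        "FastMCP wraps the MCP protocol in Python.",
    ]
    return [doc for doc in corpus if query.lower() in doc.lower()][:top_k]

if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio by default
```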
🤖 Agents & Hallucinations
Semantic Intelligence: Part 7 - The Real Thing! Experimenting with Directed Synthetic Evolution (mostlylucid.net). Directed Synthetic Evolution using multi-agent LLMs (Overseer, Generator, Triage, Evaluator) for self-improving code with RAG memory and auto-evolution in Python
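A heavily simplified skeleton of the Overseer/Generator/Triage/Evaluator loop described in the post; the `call_llm` stub and the prompts are assumptions for illustration, not the article's implementation.

```python
def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a real LLM client; returns canned text so the loop runs."""
    return f"[{role} output for: {prompt[:40]}...]"

def evolve(task: str, generations: int = 3) -> str:
    """Generate code, then repeatedly critique, score, plan, and revise it."""
    candidate = call_llm("generator", f"Write code for: {task}")
    for _ in range(generations):
        critique = call_llm("triage", f"List defects in:\n{candidate}")
        score = call_llm("evaluator", f"Score 0-10:\n{candidate}\nIssues:\n{critique}")
        plan = call_llm("overseer", f"Given score {score} and the issues, plan the next revision.")
        candidate = call_llm("generator", f"Revise the code.\nPlan:\n{plan}\nCode:\n{candidate}")
    return candidate

print(evolve("parse a CSV file"))
```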
Agent Hallucination by Hand ✍️ (byhand.ai). Hands-on lecture on LLMs, RAGs, agents, and hallucination mitigation inside Excel, with math intuition and by-hand exercises
Solving Amazon's Infinite Shelf Space Problem (worksonmymachine.ai). Explores infinite shelf space, Latent Library, and browse-first generation using LLMs to create and navigate hallucinated books and citations
🔬 Research & Deep Dives
The Anatomy of the Least Squares Method, Part Four (thepalindrome.org). Explores modeling GPT-2 activations with least squares regression in Python using PyTorch and statsmodels
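A hedged sketch of the fitting step: regressing one activation dimension on the others with statsmodels OLS; the data here is random noise standing in for real GPT-2 activations.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
acts = rng.standard_normal((512, 8))   # stand-in for GPT-2 activations (tokens x dims)

X = sm.add_constant(acts[:, 1:])       # predict dim 0 from the remaining dims
y = acts[:, 0]
fit = sm.OLS(y, X).fit()

print(fit.params[:3])                  # least-squares coefficients
print(round(fit.rsquared, 3))          # variance explained by the fit
```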
Paper Review: HunyuanImage 3.0 Technical Report (andlukyane.com). Review of HunyuanImage 3.0: Mixture-of-Experts architecture, multimodal training, and SSAE evaluation for text–image generation
A User’s Guide to FlexAttention in Flash Attention CuTe DSL (research.colfax-intl.com). FlexAttention in Flash Attention CuTe DSL: score_mod and mask_mod customization, block sparsity, and PyTorch/CuTe DSL integration
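For readers more familiar with the PyTorch side, a minimal sketch of the score_mod/mask_mod interface that the CuTe DSL port mirrors; it assumes a recent PyTorch with torch.nn.attention.flex_attention (CPU support varies by version), and the bias and shapes are illustrative.

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

device = "cuda" if torch.cuda.is_available() else "cpu"
B, H, S, D = 1, 4, 128, 64
q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))

def rel_bias(score, b, h, q_idx, kv_idx):
    # score_mod: add a simple relative-position bias to each attention score
    return score - 0.01 * (q_idx - kv_idx).abs()

def causal(b, h, q_idx, kv_idx):
    # mask_mod: keep only keys at or before the query position
    return q_idx >= kv_idx

block_mask = create_block_mask(causal, B, H, S, S, device=device)  # drives block sparsity
out = flex_attention(q, k, v, score_mod=rel_bias, block_mask=block_mask)
print(out.shape)   # (B, H, S, D)
```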
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294) (datascienceathome.podbean.com). Compressing 4 tokens into vectors to cut AI costs by 44% with open science proofs and code
📚 Academic Research
Optimizing Mixture of Block Attention (arxiv:cs). Introduces MoBA improvements and FlashMoBA CUDA kernel for small-block sparse attention, boosting long-context LLM training/inference efficiency (up to 14.7×). Engineers gain scalable sparse-attention tools now
Fast and Expressive Multi-Token Prediction with Probabilistic Circuits (arxiv:cs). MtPC uses probabilistic circuits parameterized by LLM embeddings for multi-token joint prediction, enabling speculative decoding and up to 5.47× throughput improvements for generation engineers today
Instella: Fully Open Language Models with Stellar Performance (arxiv:cs). Instella delivers fully open 3B LLMs trained on public data on AMD MI300X; Instella-Long/Math show strong performance, enabling reproducible high-quality open-weight models for production use
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism (arxiv:cs). MarsRL jointly trains Solver, Verifier, Corrector with agentic RL and pipeline parallelism, improving multi-agent reasoning (AIME scores), reducing training inefficiencies for practical reasoning systems deployment
STAGE: A Symbolic Tensor grAph GEnerator for distributed AI system co-design (arxiv:cs). STAGE synthesizes symbolic tensor graphs modeling distributed LLM workloads across configurations, enabling scalable pre-deployment optimizations and hardware design-space exploration up to 32K GPUs with fidelity