
The AI Engineer

October 14, 2025

Generative AI newsletter


📣 Headlines

• Anthropic’s Petri launches as an autonomous-agent safety lab that stress-tests 14 leading LLMs for behaviors like deception, power-seeking, and tool misuse to empirically map risk boundaries.

• Samsung unveils a 7M-parameter Tiny Recursive Model that uses recursive reasoning and adaptive halting to beat much larger LLMs on Sudoku, Maze-Hard, and ARC-AGI puzzles.

• Atlassian upgrades Rovo AI with skills, Canvas, personal memory, and Studio enhancements, plus new developer tooling across Jira, Confluence, and Bitbucket to build enterprise agents and RAG workflows.

• Windows Copilot can now generate Word, Excel, PowerPoint, and PDF files from chat and connect to Outlook/Gmail accounts, pushing deeper OS-level automation for knowledge work.

• The UK regulator granted Google “strategic market status”, opening the door to mandated changes in search, ads, and AI answer surfaces, with reports indicating Google may be forced to alter UK search.

• Former UK PM Rishi Sunak became a senior adviser to both Microsoft and Anthropic, underscoring tighter links between AI governance and frontier lab strategy.

• In seismology, AI models detect smaller earthquakes faster, improving phase picking and subsurface imaging in noisy environments for volcano and fault monitoring.

• With publishers locking data and policies shifting, the AI training “gold rush” is slowing, forcing model builders to renegotiate licensing, synthetic data strategy, and provenance controls.

🔧 Company Engineering Blogs

About Palantir (blog​.palantir​.com) . Palantir explains data ownership, privacy-by-design, governance, and ethics in AI, with ICE contract context and European data sovereignty

Introducing the Gemini 2.5 Computer Use model (deepmind​.google) . Gemini 2.5 Computer Use model enables UI-interacting agents via Gemini API with low latency for web and mobile tasks

From Single-Node to Multi-GPU Clusters: How Discord Made Distributed Compute Easy for ML Engineers (discord​.com) . Discord details building a Ray-based ML platform with CLI, Dagster + KubeRay orchestration, and X-Ray observability for multi-GPU training

Engineering Real-Time Multimodal AI Pipelines: Scaling File Processing to 50M Daily Uploads (engineering​.salesforce​.com) . Real-time multimodal AI pipelines for 50M daily uploads: file processing, validation, base64 grounding, and cross-platform prompts

How to build reliable AI workflows with agentic primitives and context engineering (github​.blog) . Three-layer agentic framework using Markdown prompts, agentic primitives, and context engineering to build reliable AI workflows with Copilot CLI and APM

💻 Local Inference and Open Source

Running Llama 3.1 8B Locally (LangChain and SQLite) (confessionsofadataguy​.com) . Local Llama 3.1 8B with Ollama, LangChain, SQLite; Python uv toolchain; RAG indexing with FAISS; terminal chatbot on a laptop

R port of llama2.c (thierrymoudiki​.github​.io) . R port of llama2.c with Shiny app, installation steps, and API access for educational use

Kumru LLM (medium​.com/vngrs) . Kumru LLM: a 7.4B Turkish decoder-only model trained from scratch for in-house deployment with 8,192 context, 300B tokens, and 16GB GPUs

Integrating Ollama with Python: REST API and Python Client Examples (glukhov​.org) . Connecting Python apps to Ollama via REST API and Python client for chat, generate, and thinking models like qwen3
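The REST side of the integration above needs only the standard library. A hedged sketch, assuming an Ollama server on its default port 11434 and a locally pulled model tagged "qwen3":

```python
# Calling Ollama's local /api/chat endpoint with only the stdlib.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_payload(model: str, user_msg: str) -> dict:
    # /api/chat takes a messages list; stream=False returns one JSON body
    # instead of a stream of chunks.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": False,
    }

def chat(model: str, user_msg: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, user_msg)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Usage (requires a running Ollama server):
#   print(chat("qwen3", "Say hello in one word."))
```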

Fine-Tuning Gemma 3n for Speech Transcription (debuggercafe​.com) . Fine-tuning Gemma 3n for German speech transcription using Unsloth and evaluating with WER

🏗️ Production Agents and Strategy

The Infrastructure for Production AI (thedataexchange​.media) . Zhen Lu discusses AI-first clouds, production use cases, GPU reliability, and agent-driven software at The Data Exchange

I'm Writing a Book on Production-Grade Agentic AI (And You Can Read It Now) (aroussi​.com) . Explores production-grade agentic AI, memory management, orchestration, observability, and deployment patterns with LeanPub chapter-by-chapter release

AI Apps -> Agent Labs (akashbajwa​.co) . Agent Labs vs Model Labs: product-first AI apps, RL incentives, data moat, and developers' shift toward vertical integration

dead framework theory (aifoc​.us) . Explores how React dominates as platform, LLMs and training data create a dead framework effect, and implications for new frameworks, tools, and browser features

Debugging DSPy token usage and prompts (danielcorin​.com) . Debugging DSPy token usage, prompts and LM configurations across Gemini, GPT-5, and OpenAI APIs

📏 LLM Evaluation and Benchmarks

Who watches the watchers? LLM on LLM evaluations (stackoverflow​.blog) . LLMs judge LLM outputs at scale using golden datasets, teacher models, and ProLLM; StackOverflow data informs evaluation benchmarks

Importance of offline evaluation to guide model choice (tech​.olx​.com) . OLX compares open embedding models with internal Item2Vec using MTEB benchmarks, fine-tuning, and offline evaluation for multilingual recall

Inspect AI (alexdong​.com) . Inspect AI: exploring a Petri Alignment plugin, Inspect AI scaffolding, and extending evaluation workflows with typed, well-documented code

Comparison: Qwen3:30b vs GPT-OSS:20b (glukhov​.org) . Tech benchmark comparison of Qwen3:30b, Qwen3:30b-instruct, Qwen3:30b-thinking vs GPT-OSS:20b across speed, context windows, and token benchmarks

🧭 Smarter RAG and Reranking

Meta Superintelligence's surprising first paper (paddedinputs​.substack​.com) . MSI's REFRAG enables 30x faster TTFT in RAG by using chunk embeddings and a lightweight RL policy to expand select chunks

Using Language Engineering to Build a Smarter RAG for Code (tomassetti​.me) . Using parsers and symbol resolvers to build a smarter RAG for code with LangChain4J and a Parser-based CodeSplitter

Why using a reranker? (zansara​.dev) . RAG with bi-encoders and cross-encoders, reranking strategies, distillation, late interaction (ColBERT), listwise reranking, caching, and hybrid architectures
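The bi-encoder/cross-encoder split discussed above boils down to a two-stage pipeline: a cheap scorer over the whole corpus for recall, then an expensive scorer over a handful of candidates for precision. A runnable toy sketch, with simple stand-ins for both scorers rather than real encoder models:

```python
# Two-stage retrieve-then-rerank. The stand-in "bi-encoder" scores word
# overlap (cheap, misses phrase structure); the stand-in "cross-encoder"
# reads query and doc jointly and rewards exact phrase containment.
def bi_encoder_score(query: str, doc: str) -> float:
    qw = set(query.lower().split())
    dw = set(doc.lower().split())
    return len(qw & dw) / len(qw)

def cross_encoder_score(query: str, doc: str) -> float:
    if query.lower() in doc.lower():
        return 1.0
    return bi_encoder_score(query, doc)

def search(query: str, corpus: list[str],
           recall_k: int = 3, final_k: int = 1) -> list[str]:
    # Stage 1: rank the full corpus cheaply, keep recall_k candidates.
    candidates = sorted(corpus,
                        key=lambda d: bi_encoder_score(query, d),
                        reverse=True)[:recall_k]
    # Stage 2: rerank only those candidates with the expensive scorer.
    return sorted(candidates,
                  key=lambda d: cross_encoder_score(query, d),
                  reverse=True)[:final_k]
```

The design point the article makes survives even in this toy: the costly scorer only ever sees `recall_k` documents, so its per-pair expense stays off the critical path of full-corpus search.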

Why did Meta’s superintelligence team publish an obscure paper? (tornikeo​.com) . Meta's MSI publishes REFRAG, a fast retrieval-augmented generation method that speeds RAG 30x without accuracy loss for business-scale document search

Cross Talk (joecooper​.me) . Markov text generation, DeBERTa-based reranking, and OCR-like text sorting for multiturn chat on a 3090, OpenSubtitles data, and bespoke quality-control models

What Problem Is Traditional RAG Solving? (gojiberries​.io) . Traditional RAG uses pre-chunked text and embedding-based search for fast, small-evidence reasoning on uniform, time-neutral corpora

🧠 Long-Context and KV Caches

From 2K to 2M+ Tokens: The Long-Context Frontiers of GenAI (medium​.datadriveninvestor​.com) . Long-context LLMs, Lost-in-the-Middle, RAG, position engineering, prompt compression, LIFT, and agentic RAG for reliable reasoning

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 (twimlai​.com) . Long-context transformers with Jacob Buckman; windowed attention, grouped query attention, latent space attention, Power Retention, and Vidrial/PowerCoder open-source projects

KV Cache Optimization via Multi-Head Latent Attention (pyimagesearch​.com) . KV Cache optimization with Multi-Head Latent Attention (MLA) reduces KV cache memory in transformers for long-context inference
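The memory win behind MLA is easy to see with back-of-the-envelope sizing: standard multi-head attention caches full per-head keys and values for every token and layer, while MLA caches one compressed latent per token and layer. The figures below are illustrative, not any specific model's configuration.

```python
# Back-of-the-envelope KV-cache sizing: standard MHA vs an MLA-style
# compressed latent cache (fp16, so 2 bytes per element).
def kv_cache_gb(n_layers: int, seq_len: int,
                per_token_floats: int, bytes_per_elem: int = 2) -> float:
    return n_layers * seq_len * per_token_floats * bytes_per_elem / 1e9

n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 128_000

# Standard MHA: keys AND values for every head at every layer.
standard = kv_cache_gb(n_layers, seq_len, 2 * n_heads * head_dim)
# MLA-style: one 512-dim latent per token per layer (illustrative dim).
latent = kv_cache_gb(n_layers, seq_len, 512)

print(f"standard MHA cache: {standard:.1f} GB, MLA latent cache: {latent:.1f} GB")
```

With these illustrative numbers the full cache is roughly 67 GB at 128K context, versus about 4 GB for the latent cache, which is why MLA matters for long-context inference.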

The Best Choice for AI Inference: vLLM (terrytangyuan​.github​.io) . vLLM enables open-source, memory-efficient LLM inference with KV-Cache, PagedAttention, and multi-parallelism; llm-d orchestrates distribution on OpenShift AI

🔬 Training, Internals, and Theory

Replacing RL w/ Parameter-based Evolutionary Strategies (lesswrong​.com) . Parameter-based evolutionary strategies (ES) scale to billion-parameter models for fine-tuning LLMs, using distributional weight perturbations and reward normalization

LLM Poisoning [1/3] - Reading the Transformer's Thoughts (synacktiv​.com) . Explores Transformer internals, FFN key–value memory, trigger detection in pre-down MLP activations, and causal tracing for hidden knowledge in LLMs

Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas​.com) . Explores Karpathy's 2015 RNN post, contrasts vanilla RNNs with LLMs, discusses byte-level inputs, training via truncated BPTT, and PyTorch vs Lua Torch implementations

modded-nanogpt world record: Decoupling embedding size from model dimension (snimu​.github​.io) . Modded-NanoGPT uses multiple input embeddings with learned layer-wise weights to decouple embedding size from model dimension

Book Review: Time Series Forecasting using Foundation Models (sujitpal​.blogspot​.com) . Book review surveys seven Foundation Models for time series forecasting, with zero-shot, fine-tuning, probabilistic forecasts, anomaly detection, and a capstone project

📚 Academic Research

To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models (arxiv:cs) . Analyzes ViT attention sinks to reveal high-norm visual tokens guiding LLM reasoning in LVLMs and proposes training-free and training-based utilization methods

Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging (arxiv:cs) . Tiny-R1V: a 3B lightweight multimodal model using LIPO reinforcement learning and AMM model merging for unified reasoning across tasks

Spotlight on Token Perception for Multimodal Reinforcement Learning (arxiv:cs) . Visually-Perceptive Policy Optimization (VPPO) reweights and focuses updates on tokens with high visual dependency for multimodal RLVR in LVLMs

ASPO: Asymmetric Importance Sampling Policy Optimization (arxiv:cs) . ASPO corrects importance sampling in OSRL for LLMs by flipping IS ratios of positive-advantage tokens and introducing soft dual-clipping

On the Representations of Entities in Auto-regressive Large Language Models (arxiv:cs) . Entity mentions, multi-token encoding, and relational knowledge in autoregressive LLMs via task vectors and the Entity Lens

👋 Before you go

I've got a big favor to ask. Keeping Blaze running isn't expensive, but it all adds up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves: vote on new topics, features, and curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free. Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

Have an idea for how Blaze could be better? Please visit the feedback form to let us know. To update your preferences or to unsubscribe, please go to blaze.email/unsubscribe.
