
The AI Engineer

October 14, 2025

Generative AI newsletter


📣 Headlines

• Anthropic’s Petri launches as an autonomous-agent safety lab that stress-tests 14 leading LLMs for behaviors like deception, power-seeking, and tool misuse to empirically map risk boundaries.

• Samsung unveils a 7M-parameter Tiny Recursive Model that uses recursive reasoning and adaptive halting to beat much larger LLMs on Sudoku, Maze-Hard, and ARC-AGI puzzles.

• Atlassian upgrades Rovo AI with skills, Canvas, personal memory, and Studio enhancements, plus new developer tooling across Jira, Confluence, and Bitbucket to build enterprise agents and RAG workflows.

• Windows Copilot can now generate Word, Excel, PowerPoint, and PDF files from chat and connect to Outlook/Gmail accounts, pushing deeper OS-level automation for knowledge work.

• The UK regulator granted Google “strategic market status”, opening the door to mandated changes in search, ads, and AI answer surfaces, with reports indicating Google may be forced to alter UK search.

• Former UK PM Rishi Sunak became a senior adviser to both Microsoft and Anthropic, underscoring tighter links between AI governance and frontier lab strategy.

• In seismology, AI models detect smaller earthquakes faster, improving phase picking and subsurface imaging in noisy environments for volcano and fault monitoring.

• With publishers locking data and policies shifting, the AI training “gold rush” is slowing, forcing model builders to renegotiate licensing, synthetic data strategy, and provenance controls.

🔧 Company Engineering Blogs

About Palantir (blog​.palantir​.com) . Palantir explains data ownership, privacy-by-design, governance, and ethics in AI, with ICE contract context and European data sovereignty

Introducing the Gemini 2.5 Computer Use model (deepmind​.google) . Gemini 2.5 Computer Use model enables UI-interacting agents via Gemini API with low latency for web and mobile tasks

From Single-Node to Multi-GPU Clusters: How Discord Made Distributed Compute Easy for ML Engineers (discord​.com) . Discord details building a Ray-based ML platform with CLI, Dagster + KubeRay orchestration, and X-Ray observability for multi-GPU training

Engineering Real-Time Multimodal AI Pipelines: Scaling File Processing to 50M Daily Uploads (engineering​.salesforce​.com) . Real-time multimodal AI pipelines for 50M daily uploads: file processing, validation, base64 grounding, and cross-platform prompts

How to build reliable AI workflows with agentic primitives and context engineering (github​.blog) . Three-layer agentic framework using Markdown prompts, agentic primitives, and context engineering to build reliable AI workflows with Copilot CLI and APM

💻 Local Inference and Open Source

Running Llama 3.1 8B Locally (LangChain and SQLite) (confessionsofadataguy​.com) . Local Llama 3.1 8B with Ollama, LangChain, SQLite; Python uv toolchain; RAG indexing with FAISS; terminal chatbot on a laptop

R port of llama2.c (thierrymoudiki​.github​.io) . R port of llama2.c with Shiny app, installation steps, and API access for educational use

Kumru LLM (medium​.com/vngrs) . Kumru LLM: a 7.4B Turkish decoder-only model trained from scratch for in-house deployment with 8,192 context, 300B tokens, and 16GB GPUs

Integrating Ollama with Python: REST API and Python Client Examples (glukhov​.org) . Connecting Python apps to Ollama via REST API and Python client for chat, generate, and thinking models like qwen3
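The REST side of the integration above needs only the standard library. A hedged sketch, assuming an Ollama server on its default port 11434 and a locally pulled model tagged "qwen3":

```python
# Calling Ollama's local /api/chat endpoint with only the stdlib.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_payload(model: str, user_msg: str) -> dict:
    # /api/chat takes a messages list; stream=False returns one JSON body
    # instead of a stream of chunks.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": False,
    }

def chat(model: str, user_msg: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, user_msg)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Usage (requires a running Ollama server):
#   print(chat("qwen3", "Say hello in one word."))
```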

Fine-Tuning Gemma 3n for Speech Transcription (debuggercafe​.com) . Fine-tuning Gemma 3n for German speech transcription using Unsloth and evaluating with WER

🏗️ Production Agents and Strategy

The Infrastructure for Production AI (thedataexchange​.media) . Zhen Lu discusses AI-first clouds, production use cases, GPU reliability, and agent-driven software at The Data Exchange

I'm Writing a Book on Production-Grade Agentic AI (And You Can Read It Now) (aroussi​.com) . Explores production-grade agentic AI, memory management, orchestration, observability, and deployment patterns with LeanPub chapter-by-chapter release

AI Apps -> Agent Labs (akashbajwa​.co) . Agent Labs vs Model Labs: product-first AI apps, RL incentives, data moat, and developers' shift toward vertical integration

dead framework theory (aifoc​.us) . Explores how React dominates as platform, LLMs and training data create a dead framework effect, and implications for new frameworks, tools, and browser features

Debugging DSPy token usage and prompts (danielcorin​.com) . Debugging DSPy token usage, prompts and LM configurations across Gemini, GPT-5, and OpenAI APIs

📏 LLM Evaluation and Benchmarks

Who watches the watchers? LLM on LLM evaluations (stackoverflow​.blog) . LLMs judge LLM outputs at scale using golden datasets, teacher models, and ProLLM; StackOverflow data informs evaluation benchmarks

Importance of offline evaluation to guide model choice (tech​.olx​.com) . OLX compares open embedding models with internal Item2Vec using MTEB benchmarks, fine-tuning, and offline evaluation for multilingual recall

Inspect AI (alexdong​.com) . Inspect AI: exploring a Petri Alignment plugin, Inspect AI scaffolding, and extending evaluation workflows with typed, well-documented code

Comparison: Qwen3:30b vs GPT-OSS:20b (glukhov​.org) . Tech benchmark comparison of Qwen3:30b, Qwen3:30b-instruct, Qwen3:30b-thinking vs GPT-OSS:20b across speed, context windows, and token benchmarks

🧭 Smarter RAG and Reranking

Meta Superintelligence's surprising first paper (paddedinputs​.substack​.com) . MSI's REFRAG enables 30x faster TTFT in RAG by using chunk embeddings and a lightweight RL policy to expand select chunks

Using Language Engineering to Build a Smarter RAG for Code (tomassetti​.me) . Using parsers and symbol resolvers to build a smarter RAG for code with LangChain4J and a Parser-based CodeSplitter

Why using a reranker? (zansara​.dev) . RAG with bi-encoders and cross-encoders, reranking strategies, distillation, late interaction (ColBERT), listwise reranking, caching, and hybrid architectures
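The bi-encoder/cross-encoder split discussed above boils down to a two-stage pipeline: a cheap scorer over the whole corpus for recall, then an expensive scorer over a handful of candidates for precision. A runnable toy sketch, with simple stand-ins for both scorers rather than real encoder models:

```python
# Two-stage retrieve-then-rerank. The stand-in "bi-encoder" scores word
# overlap (cheap, misses phrase structure); the stand-in "cross-encoder"
# reads query and doc jointly and rewards exact phrase containment.
def bi_encoder_score(query: str, doc: str) -> float:
    qw = set(query.lower().split())
    dw = set(doc.lower().split())
    return len(qw & dw) / len(qw)

def cross_encoder_score(query: str, doc: str) -> float:
    if query.lower() in doc.lower():
        return 1.0
    return bi_encoder_score(query, doc)

def search(query: str, corpus: list[str],
           recall_k: int = 3, final_k: int = 1) -> list[str]:
    # Stage 1: rank the full corpus cheaply, keep recall_k candidates.
    candidates = sorted(corpus,
                        key=lambda d: bi_encoder_score(query, d),
                        reverse=True)[:recall_k]
    # Stage 2: rerank only those candidates with the expensive scorer.
    return sorted(candidates,
                  key=lambda d: cross_encoder_score(query, d),
                  reverse=True)[:final_k]
```

The design point the article makes survives even in this toy: the costly scorer only ever sees `recall_k` documents, so its per-pair expense stays off the critical path of full-corpus search.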

Why did Meta’s superintelligence team publish an obscure paper? (tornikeo​.com) . Meta's MSI publishes REFRAG, a fast retrieval-augmented generation method that speeds RAG 30x without accuracy loss for business-scale document search

Cross Talk (joecooper​.me) . Markov text generation, DeBERTa-based reranking, and OCR-like text sorting for multiturn chat on a 3090, OpenSubtitles data, and bespoke quality-control models

What Problem Is Traditional RAG Solving? (gojiberries​.io) . Traditional RAG uses pre-chunked text and embedding-based search for fast, small-evidence reasoning on uniform, time-neutral corpora

🧠 Long-Context and KV Caches

From 2K to 2M+ Tokens: The Long-Context Frontiers of GenAI (medium​.datadriveninvestor​.com) . Long-context LLMs, Lost-in-the-Middle, RAG, position engineering, prompt compression, LIFT, and agentic RAG for reliable reasoning

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 (twimlai​.com) . Long-context transformers with Jacob Buckman; windowed attention, grouped query attention, latent space attention, Power Retention, and Vidrial/PowerCoder open-source projects

KV Cache Optimization via Multi-Head Latent Attention (pyimagesearch​.com) . KV Cache optimization with Multi-Head Latent Attention (MLA) reduces KV cache memory in transformers for long-context inference
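The memory win behind MLA is easy to see with back-of-the-envelope sizing: standard multi-head attention caches full per-head keys and values for every token and layer, while MLA caches one compressed latent per token and layer. The figures below are illustrative, not any specific model's configuration.

```python
# Back-of-the-envelope KV-cache sizing: standard MHA vs an MLA-style
# compressed latent cache (fp16, so 2 bytes per element).
def kv_cache_gb(n_layers: int, seq_len: int,
                per_token_floats: int, bytes_per_elem: int = 2) -> float:
    return n_layers * seq_len * per_token_floats * bytes_per_elem / 1e9

n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 128_000

# Standard MHA: keys AND values for every head at every layer.
standard = kv_cache_gb(n_layers, seq_len, 2 * n_heads * head_dim)
# MLA-style: one 512-dim latent per token per layer (illustrative dim).
latent = kv_cache_gb(n_layers, seq_len, 512)

print(f"standard MHA cache: {standard:.1f} GB, MLA latent cache: {latent:.1f} GB")
```

With these illustrative numbers the full cache is roughly 67 GB at 128K context, versus about 4 GB for the latent cache, which is why MLA matters for long-context inference.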

The Best Choice for AI Inference: vLLM (terrytangyuan​.github​.io) . vLLM enables open-source, memory-efficient LLM inference with KV-Cache, PagedAttention, and multi-parallelism; llm-d orchestrates distribution on OpenShift AI

🔬 Training, Internals, and Theory

Replacing RL w/ Parameter-based Evolutionary Strategies (lesswrong​.com) . Parameter-based evolutionary strategies (ES) scale to billion-parameter models for fine-tuning LLMs, using distributional weight perturbations and reward normalization

LLM Poisoning [1/3] - Reading the Transformer's Thoughts (synacktiv​.com) . Explores Transformer internals, FFN key–value memory, trigger detection in pre-down MLP activations, and causal tracing for hidden knowledge in LLMs

Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas​.com) . Explores Karpathy's 2015 RNN post, contrasts vanilla RNNs with LLMs, discusses byte-level inputs, training via truncated BPTT, and PyTorch vs Lua Torch implementations

modded-nanogpt world record: Decoupling embedding size from model dimension (snimu​.github​.io) . Modded-NanoGPT uses multiple input embeddings with learned layer-wise weights to decouple embedding size from model dimension

Book Review: Time Series Forecasting using Foundation Models (sujitpal​.blogspot​.com) . Book review surveys seven Foundation Models for time series forecasting, with zero-shot, fine-tuning, probabilistic forecasts, anomaly detection, and a capstone project

📚 Academic Research

To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models (arxiv:cs) . Analyzes ViT attention sinks to reveal high-norm visual tokens guiding LLM reasoning in LVLMs and proposes training-free and training-based utilization methods

Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging (arxiv:cs) . Tiny-R1V: a 3B lightweight multimodal model using LIPO reinforcement learning and AMM model merging for unified reasoning across tasks

Spotlight on Token Perception for Multimodal Reinforcement Learning (arxiv:cs) . Visually-Perceptive Policy Optimization (VPPO) reweights and focuses updates on tokens with high visual dependency for multimodal RLVR in LVLMs

ASPO: Asymmetric Importance Sampling Policy Optimization (arxiv:cs) . ASPO corrects importance sampling in OSRL for LLMs by flipping IS ratios of positive-advantage tokens and introducing soft dual-clipping

On the Representations of Entities in Auto-regressive Large Language Models (arxiv:cs) . Entity mentions, multi-token encoding, and relational knowledge in autoregressive LLMs via task vectors and the Entity Lens

👋 Before you go

I've got a big favor to ask. Keeping Blaze running isn't expensive, but it all adds up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves: vote on new topics, features, and curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free. Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

Have an idea for how Blaze could be better? Please visit the feedback form to let us know. To update your preferences or to unsubscribe, please go to blaze.email/unsubscribe.
