📣 Headlines
           
            •  Anthropic lets Claude remember previous interactions, adding persistent memory, an optional incognito mode, and cross‑export to rival assistants to streamline enterprise workflows.
            
            •  OpenAI's first AI chip could be launched in 2026 to reduce reliance on Nvidia/AMD, while a new Microsoft–OpenAI deal hints at IPO prospects and deeper collaboration.
            
            •  Oracle's new product is power, as the company unveils cloud hardware and AI chips to scale training/inference via OCI and OpenAI-aligned partnerships.
            
            •  On‑device AI accelerated with Arm’s Lumex compute subsystem for smartphones/PCs, Firefox for iOS summarization (local on A17 Pro, cloud on older devices), and Apple’s iPhone Air/17 lineup with AI‑forward hardware.
            
            •  Agentic AI moved from concept to practice as enterprises explore delegating multistep workflows with expert oversight (https://news.crunchbase.com/ai/agentic-ai-evolution-wong-hron-thomson-reuters/). Startups launched security agents: AegisAI to neutralize email threats in real time, Lookout’s Smishing AI for mobile social engineering, and Miru’s unified cyber investigations copilot.
            
            •  U.S. AI policy heated up: regulators are probing AI companionship platforms, California advanced frontier model risk disclosure rules, and a proposal seeks a multi‑year federal regulatory waiver and sandbox for AI firms.
            
            •  Microsoft expanded Fabric with a native graph database and real‑time geospatial maps powered by LinkedIn tech, integrated with OneLake for unified analytics.
            
            •  RL training markets surged as Mercor targets a $10B+ valuation on a $450M run rate, linking model providers with domain experts for reinforcement learning workflows.
            
            🔧 Company Engineering Blogs
           
             Jupyter Agents: training LLMs to reason with notebooks
            
             (huggingface.co)
            
            . Jupyter Agent builds a data science workflow inside notebooks using Qwen models, scaffolding, QA generation, and E2B execution pipelines
            
             Accelerating scientific discovery with AI-powered empirical software
            
             (research.google)
            
            . Google Research presents an AI-powered system, built on Gemini, that writes, optimizes, and empirically evaluates scientific software across genomics, public health, geospatial analysis, neuroscience, and time-series forecasting
            
             Scientific frontiers of agentic AI
            
             (amazon.science)
            
            . Explores open questions for agentic AI (embedding languages, context, negotiation, common sense, and privacy), drawing on embeddings, context windows, and behavioral economics insights
            
            🧠 Model Architecture & Optimization: Qwen3-Next, MoE, Tokenization, Test-Time Compute
           
             Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!
            
             (simonwillison.net)
            
            . Qwen3-Next-80B-A3B-Instruct and Thinking models; 80B total parameters with 3B active per token; OpenRouter deployment; llm-openrouter plugin; pelican SVG prompt; performance claims
            
             lecture three
            
             (aarnphm.xyz)
            
            . Lecture three on tokenizers, LLMs, alignment, sparse autoencoders, residual streams, and speculative decoding for efficient inference
            
             assignment three reports.
            
             (aarnphm.xyz)
            
            . Discussion of replacing one-hot cross-entropy, 2D GEMMs, batching, tokenization, and optimization techniques for large vocabulary sizes (V)
            
             Qwen 3 Next
            
             (sibellavia.lol)
            
            . Qwen3-Next-80B models with hybrid Gated DeltaNet, ultra-sparse MoE (512 experts), YaRN context up to 1,000,000 tokens, and multi-token prediction
            
             LLM-driven Evolutionary Search to squeeze even more value out of Test-Time Compute
            
             (alexdong.com)
            
            . LLM-driven evolutionary search uses islands, contextual feedback, and critique through role separation to optimize test-time compute
            
            ⚡ Deterministic & Efficient LLM Inference and Serving
           
             Defeating Nondeterminism in LLM Inference
            
             (simonwillison.net)
            
            . Nondeterminism in LLM inference arises mainly from varying load and batch size; the post proposes batch-invariant kernels in PyTorch to achieve determinism
            
             Speculative cascades — A hybrid approach for smarter, faster LLM inference
            
             (research.google)
            
            . Speculative cascades combine cascades and speculative decoding with a deferral rule to speed LLM inference and improve cost–quality trade-offs
            
             Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
            
             (andlukyane.com)
            
            . Decentralized RL post-training with SAPO sharing rollouts across a swarm for LM fine-tuning and reward-based learning
            
             The Rise of Multimodal LLMs and Efficient Serving with vLLM
            
             (pyimagesearch.com)
            
            . Multimodal LLMs (LLaVA, GPT-4V, BakLLaVA) and vLLM enable OpenAI-compatible vision–language inference and efficient deployment
            
             Defeating Nondeterminism in LLM Inference – Thinking Machines Lab
            
             (jmason.ie)
            
            . Defeating nondeterminism in LLM inference by examining sampling, temperature effects, and deterministic behavior across stacks and libraries
            
             🚀 Not for the Faint-Hearted: Diving Deep into GPT-OSS
            
             (visokio.com)
            
            . GPT-OSS 20B & 120B open-weight models tested across llama.cpp, vLLM, Hugging Face, and LM Studio, from MacBooks to H100 GPUs, in Omniscope workflows
            
            🤖 Agentic Systems & RL: Frameworks, Evals, and Enterprise Patterns
           
             Exploring Active Agent, or can we build AI features the Rails way?
            
             (evilmartians.com)
            
            . Rails-style AI abstractions with Active Agent: agents, prompts, callbacks, templates, and battle-tested Rails examples
            
             Lessons learned from a 100 blog posts on AI
            
             (frontierai.substack.com)
            
            . Big-picture AI trends: economics of inference, token costs vs. volume, open-loop agents, evals, data quality, context management, and UX in AI apps
            
             Generalists Can Also Dig Deep
            
             (towardsdatascience.com)
            
            . Generalist Ida Silfverskiöld on AI agents, RAG, evals, and design choices in agentic systems
            
             Verlog: A Multi-turn RL framework for LLM agents
            
             (blog.ml.cmu.edu)
            
            . Verlog introduces multi-turn RL for long-horizon LLM agents with turn-level abstraction, fixed-turn batching, dual discounting GAE, and critic pre-training
            
             Beyond the Chatbot: What Actually Works in Enterprise AI
            
             (thedataexchange.media)
            
            . RAG systems evolution, evaluation as IP, embeddings, enterprise security, agent workflows, multi-modality, small models, and AI-enabled coding tools
            
            🛠️ Applied LLMs: RAG, Data Pipelines, and AI in Science
           
             Text analytics in Data Pipelines using AI
            
             (medium.com/@ed.bullen)
            
            . Databricks AI Query workflows for ETL pipelines; using LLMs to classify, rate sentiment, and justify results on Amazon Reviews data
            
             Single-cell analysis and infectious disease forecasting: Google's new AI scientist
            
             (blog.stephenturner.us)
            
            . AI systems generate and test new methods for single-cell RNA-seq batch integration and COVID-19 forecasting, surpassing some benchmarks
            
             Stumbling into AI: Part 3—RAG
            
             (rmoff.net)
            
            . Explains Retrieval-Augmented Generation (RAG) using embeddings, vector stores (ChromaDB), Ollama, and Llama models with Kafka release notes as example
            
             Benchmarking AI & ML on local CPU/GPUs: an end-to-end Python project
            
             (allaboutdata.substack.com)
            
            . Benchmarking AI/ML on local CPU/GPU with Python: XGBoost, Ollama, CUDA, uv, Altair, Streamlit dashboard and Docker-free workflow
            
            📚 Academic Research
           
             Inpainting-Guided Policy Optimization for Diffusion Large Language Models
            
             (arxiv:cs)
            
            . Inpainting-guided RL for diffusion LLMs improves exploration, using partial ground-truth reasoning to boost GRPO, with synthetic traces and entropy filtering
            
             Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
            
             (arxiv:cs)
            
            . Unified multimodal learning: encoder–decoder paradigm with long-context captions, UAE framework, Unified-GRPO RL, and Unified-Bench benchmark
            
             AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
            
             (arxiv:cs)
            
            . AgentGym-RL trains LLM agents for multi-turn decision making with RL, using ScalingInter-RL to balance exploration and exploitation across diverse environments
            
             Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
            
             (arxiv:cs)
            
            . MuSe: efficient multipole-based attention for transformers via dual semantic clustering and dipole corrections
            
             RewardDance: Reward Scaling in Visual Generation
            
             (arxiv:cs)
            
            . RewardDance: scalable reward modeling for visual generation using yes-token probability, enabling large RMs and CoT integration
            
            👋 Before you go
           
            I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
            
- Real say in how Blaze evolves: vote on new topics, features, and curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
 
            If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            