📣 Headlines
• Anthropic lets Claude remember previous interactions, adding persistent memory, an optional incognito mode, and cross‑export to rival assistants to streamline enterprise workflows.
• OpenAI's first AI chip could be launched in 2026 to reduce reliance on Nvidia/AMD, while a new Microsoft–OpenAI deal hints at IPO prospects and deeper collaboration.
• Oracle's new product is power, unveiling cloud hardware and AI chips to scale training/inference via OCI and OpenAI-aligned partnerships.
• On‑device AI accelerated with Arm's Lumex compute subsystem for smartphones/PCs, Firefox for iOS summarization (local on A17 Pro, cloud on older devices), and Apple's iPhone Air/17 lineup with AI‑forward hardware.
• Agentic AI moved from concept to practice as enterprises explore delegating multistep workflows with expert oversight (https://news.crunchbase.com/ai/agentic-ai-evolution-wong-hron-thomson-reuters/). Startups launched security agents: AegisAI to neutralize email threats in real time, Lookout's Smishing AI for mobile social engineering, and Miru's unified cyber investigations copilot.
• U.S. AI policy heated up: regulators probe AI companionship platforms, California advanced frontier model risk disclosure rules, and a proposal seeks a multi‑year federal regulatory waiver and sandbox for AI firms.
• Microsoft expanded Fabric with a native graph database and real‑time geospatial maps powered by LinkedIn tech, integrated with OneLake for unified analytics.
• RL training markets surged as Mercor targets a $10B+ valuation on a $450M run rate, linking model providers with domain experts for reinforcement learning workflows.
🔧 Company Engineering Blogs
Jupyter Agents: training LLMs to reason with notebooks (huggingface.co). Jupyter Agent builds a data science workflow inside notebooks using Qwen models, scaffolding, QA generation, and E2B execution pipelines.
Accelerating scientific discovery with AI-powered empirical software (research.google). Google Research presents an AI-powered system, built on Gemini, that writes, optimizes, and empirically evaluates scientific software across genomics, public health, geospatial analysis, neuroscience, and time-series forecasting.
Scientific frontiers of agentic AI (amazon.science). Agentic AI research explores language, context, negotiation, common sense, and privacy, drawing on embeddings, context windows, and behavioral economics insights.
🧠 Model Architecture & Optimization: Qwen3-Next, MoE, Tokenization, Test-Time Compute
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?! (simonwillison.net). Qwen3-Next-80B-A3B-Instruct and Thinking models: 80B parameters with 3B active per token, OpenRouter deployment, the llm-openrouter plugin, the pelican SVG prompt, and performance claims.
lecture three (aarnphm.xyz). Lecture three covers tokenizers, LLMs, alignment, sparse autoencoders, residual streams, and speculative decoding for efficient inference.
assignment three reports (aarnphm.xyz). Discussion of replacing one-hot cross-entropy, 2D GEMMs, batching, tokenization, and optimization techniques for large vocabularies.
Qwen 3 Next (sibellavia.lol). Qwen3-Next-80B models with hybrid Gated DeltaNet, ultra-sparse MoE (512 experts), YaRN context up to 1,000,000 tokens, and multi-token prediction.
LLM-driven Evolutionary Search to squeeze even more value out of Test-Time Compute (alexdong.com). LLM-driven evolutionary search uses islands, contextual feedback, and critique through role separation to optimize test-time compute.
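The island-based loop such a search relies on can be sketched in a few lines. Everything below (the `score`, `mutate`, and island parameters) is a hypothetical toy objective, not the author's implementation:

```python
import random

random.seed(1)

def evolve(score, mutate, seeds, islands=2, generations=3):
    """Toy island-model evolutionary search: each island keeps its own
    population, and each generation mutates that island's best candidate."""
    pops = [list(seeds) for _ in range(islands)]
    for _ in range(generations):
        for pop in pops:
            parent = max(pop, key=score)   # exploit the current best
            pop.append(mutate(parent))     # explore a nearby variant
    # Return the best candidate found across all islands.
    return max((c for pop in pops for c in pop), key=score)

# Toy objective: get as close to 10 as possible, starting far away.
best = evolve(score=lambda x: -abs(x - 10),
              mutate=lambda x: x + random.choice([-1, 1]),
              seeds=[0, 5])
```

In the article's setting the candidates are LLM outputs, `mutate` is a critique-and-revise prompt, and `score` comes from contextual feedback rather than a closed-form objective.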
⚡ Deterministic & Efficient LLM Inference and Serving
Defeating Nondeterminism in LLM Inference (simonwillison.net). Nondeterminism in LLM inference arises mainly from varying load and batch size; the paper proposes batch-invariant kernels in PyTorch to achieve determinism.
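The mechanism is easy to reproduce outside a GPU: floating-point addition is not associative, so a reduction whose grouping changes with batch size can give different answers for identical inputs. A minimal sketch, where the chunk size stands in for kernel launch configuration (this is an illustration, not the paper's code):

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

def sum_in_chunks(values, chunk):
    """Sum within fixed-size chunks, then combine the partial sums --
    mimicking a reduction whose grouping depends on batch size."""
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

a = sum_in_chunks(xs, 16)    # "small batch" grouping
b = sum_in_chunks(xs, 1024)  # "large batch" grouping
# a and b can differ in the last bits even though the inputs are identical;
# batch-invariant kernels fix the grouping so results never depend on load.
```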
Speculative cascades — A hybrid approach for smarter, faster LLM inference (research.google). Speculative cascades combine cascades and speculative decoding with a deferral rule to speed LLM inference and improve cost–quality trade-offs.
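The deferral idea at the heart of a cascade can be sketched with stand-in models. The confidence threshold below is a hypothetical simplification; the paper's deferral rule additionally interacts with speculative drafting:

```python
def cascade_generate(small, large, prompt, threshold=0.8, steps=5):
    """Small model proposes each token with a confidence score; the
    deferral rule sends low-confidence positions to the large model."""
    out = list(prompt)
    for _ in range(steps):
        token, confidence = small(out)
        if confidence < threshold:       # deferral rule
            token, _ = large(out)        # fall back to the big model
        out.append(token)
    return "".join(out)

# Stand-in "models" returning (token, confidence) pairs.
small = lambda ctx: ("a", 0.9)
large = lambda ctx: ("b", 1.0)
```

With `threshold=0.8` the confident small model is never deferred (`cascade_generate(small, large, "x")` yields `"xaaaaa"`); raising the threshold above 0.9 routes every step to the large model instead.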
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing (andlukyane.com). Decentralized RL post-training with SAPO, sharing rollouts across a swarm for LM fine-tuning and reward-based learning.
The Rise of Multimodal LLMs and Efficient Serving with vLLM (pyimagesearch.com). Multimodal LLMs (LLaVA, GPT-4V, BakLLaVA) and vLLM enable OpenAI-compatible vision–language inference and efficient deployment.
Defeating Nondeterminism in LLM Inference – Thinking Machines Lab (jmason.ie). Defeating nondeterminism in LLM inference by examining sampling, temperature effects, and deterministic behavior across stacks and libraries.
🚀 Not for the Faint-Hearted: Diving Deep into GPT-OSS (visokio.com). GPT-OSS 20B and 120B open-weight models tested across llama.cpp, vLLM, Hugging Face, and LM Studio, from MacBooks to H100 GPUs, in Omniscope workflows.
🤖 Agentic Systems & RL: Frameworks, Evals, and Enterprise Patterns
Exploring Active Agent, or can we build AI features the Rails way? (evilmartians.com). Rails-style AI abstractions with Active Agent: agents, prompts, callbacks, templates, and battle-tested Rails examples.
Lessons learned from 100 blog posts on AI (frontierai.substack.com). Big-picture AI trends: economics of inference, token costs vs. volume, open-loop agents, evals, data quality, context management, and UX in AI apps.
Generalists Can Also Dig Deep (towardsdatascience.com). Generalist Ida Silfverskiöld on AI agents, RAG, evals, and design choices in agentic systems.
Verlog: A Multi-turn RL framework for LLM agents (blog.ml.cmu.edu). Verlog introduces multi-turn RL for long-horizon LLM agents with turn-level abstraction, fixed-turn batching, dual-discounting GAE, and critic pre-training.
Beyond the Chatbot: What Actually Works in Enterprise AI (thedataexchange.media). The evolution of RAG systems, evaluation as IP, embeddings, enterprise security, agent workflows, multi-modality, small models, and AI-enabled coding tools.
🛠️ Applied LLMs: RAG, Data Pipelines, and AI in Science
Text analytics in Data Pipelines using AI (medium.com/@ed.bullen). Databricks AI Query workflows for ETL pipelines: using LLMs to classify, rate sentiment, and justify results on Amazon Reviews data.
Single-cell analysis and infectious disease forecasting: Google's new AI scientist (blog.stephenturner.us). AI systems generate and test new methods for single-cell RNA-seq batch integration and COVID-19 forecasting, surpassing some benchmarks.
Stumbling into AI: Part 3—RAG (rmoff.net). Explains Retrieval-Augmented Generation (RAG) using embeddings, vector stores (ChromaDB), Ollama, and Llama models, with Kafka release notes as the example corpus.
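The retrieve-then-prompt loop that post walks through can be shown end to end with a toy bag-of-words "embedding" in place of a real model. The post itself uses ChromaDB and Ollama; everything below, including the release-note snippets, is an illustrative stand-in:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical release-note snippets standing in for a vector store.
docs = [
    "Kafka 3.7 adds JBOD support in KRaft mode",
    "The release notes list deprecated MirrorMaker settings",
]
context = retrieve("what changed in KRaft?", docs)[0]
prompt = f"Answer using only this context:\n{context}\nQ: what changed in KRaft?"
```

A real pipeline swaps `embed` for a model-served embedding, `docs` for a vector store, and sends `prompt` to the LLM; the shape of the loop stays the same.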
Benchmarking AI & ML on local CPU/GPUs: an end-to-end Python project (allaboutdata.substack.com). Benchmarking AI/ML on local CPU/GPU with Python: XGBoost, Ollama, CUDA, uv, Altair, a Streamlit dashboard, and a Docker-free workflow.
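A minimal timing harness of the kind such a project needs might look like this (a generic sketch, not the article's code):

```python
import time

def benchmark(fn, *args, repeats=5):
    """Run fn several times and return the best wall-clock time in seconds;
    taking the minimum reduces noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Example: time a small pure-Python workload.
workload = lambda n: sum(i * i for i in range(n))
seconds = benchmark(workload, 100_000)
```

For GPU workloads the same harness needs a synchronization call (e.g. a CUDA device sync) before reading the clock, since kernel launches are asynchronous.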
📚 Academic Research
Inpainting-Guided Policy Optimization for Diffusion Large Language Models (arxiv:cs). Inpainting-guided RL for diffusion LLMs improves exploration, using partial ground-truth reasoning to boost GRPO, with synthetic traces and entropy filtering.
Can Understanding and Generation Truly Benefit Together -- or Just Coexist? (arxiv:cs). Unified multimodal learning: an encoder–decoder paradigm with long-context captions, the UAE framework, Unified-GRPO RL, and the Unified-Bench benchmark.
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning (arxiv:cs). AgentGym-RL trains LLM agents for multi-turn decision making using RL, with ScalingInter-RL balancing exploration and exploitation across diverse environments.
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining (arxiv:cs). MuSe: efficient multipole-based attention for transformers via dual semantic clustering and dipole corrections.
RewardDance: Reward Scaling in Visual Generation (arxiv:cs). RewardDance: scalable reward modeling for visual generation using yes-token probability, enabling large RMs and CoT integration.
👋 Before you go
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries — the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.