📣 Headlines
• Anthropic lets Claude remember previous interactions, adding persistent memory, an optional incognito mode, and cross‑export to rival assistants to streamline enterprise workflows.
• OpenAI's first AI chip could be launched in 2026 to reduce reliance on Nvidia/AMD, while a new Microsoft–OpenAI deal hints at IPO prospects and deeper collaboration.
• Oracle's new product is power, unveiling cloud hardware and AI chips to scale training/inference via OCI and OpenAI-aligned partnerships.
• On‑device AI accelerated with Arm's Lumex compute subsystem for smartphones/PCs, Firefox for iOS summarization (local on A17 Pro, cloud on older devices), and Apple's iPhone Air/17 lineup with AI‑forward hardware.
• Agentic AI moved from concept to practice as enterprises explore delegating multistep workflows with expert oversight (https://news.crunchbase.com/ai/agentic-ai-evolution-wong-hron-thomson-reuters/). Startups launched security agents: AegisAI to neutralize email threats in real time, Lookout's Smishing AI for mobile social engineering, and Miru's unified cyber investigations copilot.
• U.S. AI policy heated up: regulators probe AI companionship platforms, California advanced frontier model risk disclosure rules, and a proposal seeks a multi‑year federal regulatory waiver and sandbox for AI firms.
• Microsoft expanded Fabric with a native graph database and real‑time geospatial maps powered by LinkedIn tech, integrated with OneLake for unified analytics.
• RL training markets surged as Mercor targets a $10B+ valuation on a $450M run rate, linking model providers with domain experts for reinforcement learning workflows.
🔧 Company Engineering Blogs
Jupyter Agents: training LLMs to reason with notebooks (huggingface.co). Jupyter Agent builds a data science workflow inside notebooks using Qwen models, scaffolding, QA generation, and E2B execution pipelines.
Accelerating scientific discovery with AI-powered empirical software (research.google). Google Research presents an AI-powered system, built on Gemini, that writes, optimizes, and empirically evaluates scientific software across genomics, public health, geospatial analysis, neuroscience, and time-series forecasting.
Scientific frontiers of agentic AI (amazon.science). Agentic AI research explores language, context, negotiation, common sense, and privacy, drawing on embeddings, context windows, and behavioral economics insights.
🧠 Model Architecture & Optimization: Qwen3-Next, MoE, Tokenization, Test-Time Compute
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?! (simonwillison.net). Qwen3-Next-80B-A3B-Instruct and Thinking models: 80B parameters with 3B active per token, OpenRouter deployment, the llm-openrouter plugin, the pelican SVG prompt, and performance claims.
lecture three (aarnphm.xyz). Lecture three covers tokenizers, LLMs, alignment, sparse autoencoders, residual streams, and speculative decoding for efficient inference.
assignment three reports (aarnphm.xyz). Discussion of replacing one-hot cross-entropy, 2D GEMMs, batching, tokenization, and optimization techniques for large vocabularies.
Qwen 3 Next (sibellavia.lol). Qwen3-Next-80B models with hybrid Gated DeltaNet, ultra-sparse MoE (512 experts), YaRN context up to 1,000,000 tokens, and multi-token prediction.
LLM-driven Evolutionary Search to squeeze even more value out of Test-Time Compute (alexdong.com). LLM-driven evolutionary search uses islands, contextual feedback, and critique through role separation to optimize test-time compute.
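The island-based loop such a search relies on can be sketched in a few lines. Everything below (the `score`, `mutate`, and island parameters) is a hypothetical toy objective, not the author's implementation:

```python
import random

random.seed(1)

def evolve(score, mutate, seeds, islands=2, generations=3):
    """Toy island-model evolutionary search: each island keeps its own
    population, and each generation mutates that island's best candidate."""
    pops = [list(seeds) for _ in range(islands)]
    for _ in range(generations):
        for pop in pops:
            parent = max(pop, key=score)   # exploit the current best
            pop.append(mutate(parent))     # explore a nearby variant
    # Return the best candidate found across all islands.
    return max((c for pop in pops for c in pop), key=score)

# Toy objective: get as close to 10 as possible, starting far away.
best = evolve(score=lambda x: -abs(x - 10),
              mutate=lambda x: x + random.choice([-1, 1]),
              seeds=[0, 5])
```

In the article's setting the candidates are LLM outputs, `mutate` is a critique-and-revise prompt, and `score` comes from contextual feedback rather than a closed-form objective.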
⚡ Deterministic & Efficient LLM Inference and Serving
Defeating Nondeterminism in LLM Inference (simonwillison.net). Nondeterminism in LLM inference arises mainly from varying load and batch size; the paper proposes batch-invariant kernels in PyTorch to achieve determinism.
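The mechanism is easy to reproduce outside a GPU: floating-point addition is not associative, so a reduction whose grouping changes with batch size can give different answers for identical inputs. A minimal sketch, where the chunk size stands in for kernel launch configuration (this is an illustration, not the paper's code):

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

def sum_in_chunks(values, chunk):
    """Sum within fixed-size chunks, then combine the partial sums --
    mimicking a reduction whose grouping depends on batch size."""
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

a = sum_in_chunks(xs, 16)    # "small batch" grouping
b = sum_in_chunks(xs, 1024)  # "large batch" grouping
# a and b can differ in the last bits even though the inputs are identical;
# batch-invariant kernels fix the grouping so results never depend on load.
```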
Speculative cascades — A hybrid approach for smarter, faster LLM inference (research.google). Speculative cascades combine cascades and speculative decoding with a deferral rule to speed LLM inference and improve cost–quality trade-offs.
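The deferral idea at the heart of a cascade can be sketched with stand-in models. The confidence threshold below is a hypothetical simplification; the paper's deferral rule additionally interacts with speculative drafting:

```python
def cascade_generate(small, large, prompt, threshold=0.8, steps=5):
    """Small model proposes each token with a confidence score; the
    deferral rule sends low-confidence positions to the large model."""
    out = list(prompt)
    for _ in range(steps):
        token, confidence = small(out)
        if confidence < threshold:       # deferral rule
            token, _ = large(out)        # fall back to the big model
        out.append(token)
    return "".join(out)

# Stand-in "models" returning (token, confidence) pairs.
small = lambda ctx: ("a", 0.9)
large = lambda ctx: ("b", 1.0)
```

With `threshold=0.8` the confident small model is never deferred (`cascade_generate(small, large, "x")` yields `"xaaaaa"`); raising the threshold above 0.9 routes every step to the large model instead.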
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing (andlukyane.com). Decentralized RL post-training with SAPO, sharing rollouts across a swarm for LM fine-tuning and reward-based learning.
The Rise of Multimodal LLMs and Efficient Serving with vLLM (pyimagesearch.com). Multimodal LLMs (LLaVA, GPT-4V, BakLLaVA) and vLLM enable OpenAI-compatible vision–language inference and efficient deployment.
Defeating Nondeterminism in LLM Inference – Thinking Machines Lab (jmason.ie). Defeating nondeterminism in LLM inference by examining sampling, temperature effects, and deterministic behavior across stacks and libraries.
🚀 Not for the Faint-Hearted: Diving Deep into GPT-OSS (visokio.com). GPT-OSS 20B and 120B open-weight models tested across llama.cpp, vLLM, Hugging Face, and LM Studio, from MacBooks to H100 GPUs, in Omniscope workflows.
🤖 Agentic Systems & RL: Frameworks, Evals, and Enterprise Patterns
Exploring Active Agent, or can we build AI features the Rails way? (evilmartians.com). Rails-style AI abstractions with Active Agent: agents, prompts, callbacks, templates, and battle-tested Rails examples.
Lessons learned from 100 blog posts on AI (frontierai.substack.com). Big-picture AI trends: economics of inference, token costs vs. volume, open-loop agents, evals, data quality, context management, and UX in AI apps.
Generalists Can Also Dig Deep (towardsdatascience.com). Generalist Ida Silfverskiöld on AI agents, RAG, evals, and design choices in agentic systems.
Verlog: A Multi-turn RL framework for LLM agents (blog.ml.cmu.edu). Verlog introduces multi-turn RL for long-horizon LLM agents with turn-level abstraction, fixed-turn batching, dual-discounting GAE, and critic pre-training.
Beyond the Chatbot: What Actually Works in Enterprise AI (thedataexchange.media). The evolution of RAG systems, evaluation as IP, embeddings, enterprise security, agent workflows, multi-modality, small models, and AI-enabled coding tools.
🛠️ Applied LLMs: RAG, Data Pipelines, and AI in Science
Text analytics in Data Pipelines using AI (medium.com/@ed.bullen). Databricks AI Query workflows for ETL pipelines: using LLMs to classify, rate sentiment, and justify results on Amazon Reviews data.
Single-cell analysis and infectious disease forecasting: Google's new AI scientist (blog.stephenturner.us). AI systems generate and test new methods for single-cell RNA-seq batch integration and COVID-19 forecasting, surpassing some benchmarks.
Stumbling into AI: Part 3—RAG (rmoff.net). Explains Retrieval-Augmented Generation (RAG) using embeddings, vector stores (ChromaDB), Ollama, and Llama models, with Kafka release notes as the example corpus.
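The retrieve-then-prompt loop that post walks through can be shown end to end with a toy bag-of-words "embedding" in place of a real model. The post itself uses ChromaDB and Ollama; everything below, including the release-note snippets, is an illustrative stand-in:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical release-note snippets standing in for a vector store.
docs = [
    "Kafka 3.7 adds JBOD support in KRaft mode",
    "The release notes list deprecated MirrorMaker settings",
]
context = retrieve("what changed in KRaft?", docs)[0]
prompt = f"Answer using only this context:\n{context}\nQ: what changed in KRaft?"
```

A real pipeline swaps `embed` for a model-served embedding, `docs` for a vector store, and sends `prompt` to the LLM; the shape of the loop stays the same.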
Benchmarking AI & ML on local CPU/GPUs: an end-to-end Python project (allaboutdata.substack.com). Benchmarking AI/ML on local CPU/GPU with Python: XGBoost, Ollama, CUDA, uv, Altair, a Streamlit dashboard, and a Docker-free workflow.
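A minimal timing harness of the kind such a project needs might look like this (a generic sketch, not the article's code):

```python
import time

def benchmark(fn, *args, repeats=5):
    """Run fn several times and return the best wall-clock time in seconds;
    taking the minimum reduces noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Example: time a small pure-Python workload.
workload = lambda n: sum(i * i for i in range(n))
seconds = benchmark(workload, 100_000)
```

For GPU workloads the same harness needs a synchronization call (e.g. a CUDA device sync) before reading the clock, since kernel launches are asynchronous.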
📚 Academic Research
Inpainting-Guided Policy Optimization for Diffusion Large Language Models (arxiv:cs). Inpainting-guided RL for diffusion LLMs improves exploration, using partial ground-truth reasoning to boost GRPO, with synthetic traces and entropy filtering.
Can Understanding and Generation Truly Benefit Together -- or Just Coexist? (arxiv:cs). Unified multimodal learning: an encoder–decoder paradigm with long-context captions, the UAE framework, Unified-GRPO RL, and the Unified-Bench benchmark.
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning (arxiv:cs). AgentGym-RL trains LLM agents for multi-turn decision making using RL, with ScalingInter-RL balancing exploration and exploitation across diverse environments.
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining (arxiv:cs). MuSe: efficient multipole-based attention for transformers via dual semantic clustering and dipole corrections.
RewardDance: Reward Scaling in Visual Generation (arxiv:cs). RewardDance: scalable reward modeling for visual generation using yes-token probability, enabling large RMs and CoT integration.
👋 Before you go
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries — the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.