📣 Headlines
           
• Meta introduced Ray-Ban Display smart glasses with an in-lens screen and AI assistant, plus a Neural Band wrist controller, signaling a more capable HUD-class wearable platform.

• Zoom unveiled AI Companion 3.0 with agentic capabilities spanning meetings, CX, marketing, sales, and frontline workflows.

• In healthcare, Akido Labs' ScopeAI runs appointments and drafts diagnoses under physician review, while more clinicians turn to ChatGPT for second opinions, raising benefit and privacy questions.

• As attackers weaponize AI, CrowdStrike pushed scaling defensive AI and backed Terra Security’s agentic offensive platform via its accelerator, with Nvidia and AWS support.

• To curb GPU dependency, the industry is pursuing alternatives and open networks as firms seek to escape the 'Nvidia tax', highlighted by Upscale AI’s $100M seed for open-standards AI networking.

• Biosecurity spotlight: researchers used AI-designed DNA to create bacteriophages that infected and killed E. coli, demonstrating real-world bioactivity from AI-generated genomes.

• Materials discovery advance: MIT’s SCIGEN steers diffusion models to generate candidate quantum materials with target lattice geometries (e.g., kagome, Archimedean).

• Microsoft expanded its data stack as Fabric adds a LinkedIn-derived native graph engine and real-time geospatial maps integrated with OneLake.
            
            🔧 Company Engineering Blogs
           
Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals (deepmind.google). Gemini 2.5 Deep Think reaches gold-medal level at the 2025 ICPC World Finals, solving 10 of 12 problems with advanced reasoning and reinforcement-learning techniques.

Meet the GitHub MCP Registry: The fastest way to discover MCP Servers (github.blog). GitHub introduces the MCP Registry to centralize MCP server discovery for Copilot, agents, and MCP-enabled tools.

Scaleway on Hugging Face Inference Providers 🔥 (huggingface.co). Scaleway joins Hugging Face Inference Providers, enabling serverless inference with Scaleway API keys and HF routing.
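
For illustration, a minimal sketch of routing a request through Hugging Face Inference Providers with the huggingface_hub client; the provider string and model ID below are assumptions, not details from the announcement:

```python
# Sketch: serverless chat completion routed through an inference provider.
# Assumes `huggingface_hub` is installed and HF_TOKEN is set in the environment.
from huggingface_hub import InferenceClient

# provider="scaleway" and the model ID are illustrative assumptions.
client = InferenceClient(provider="scaleway")

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what an inference provider does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```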
            
Learn Your Way: Reimagining textbooks with generative AI (research.google). Google Research explores Learn Your Way, using generative AI to produce multimodal, personalized educational materials and measure learning efficacy.
            
            🤖 Agentic systems: data, operations, and real workflows
           
Supporting our AI overlords: Redesigning data systems to be Agent-first (muratbuffalo.blogspot.com). Agent-first data systems: LLM agent workloads, agentic speculation, multi-query optimization, memory stores, and neurosymbolic collaboration in DBMS redesign.

Clouded Judgement 9.19.25 - The AI Shift: Static Software vs. Living AI Systems (cloudedjudgement.substack.com). AI products evolve like living systems, requiring continuous evaluation, observability, and hot-swappable models and prompts.

Why Digital Work is the Perfect Training Ground for AI Agents (thedataexchange.media). Upwork CTO Andrew Rabinovich explains Uma, RLEF, RAG with knowledge graphs, and human-in-the-loop evaluation for AI agents in digital work.

What happens when coding agents stop feeling like dialup? (martinalderson.com). Discusses AI coding agents' reliability and token speeds, OpenRouter data, Claude Code, Cerebras Code, Gemini CLI, and the implications for developer workflow and pricing.
            
            ⚙️ LLM performance engineering: inference, profiling, and embeddings
           
Lessons from the trenches: why llama.cpp works best (today) (visokio.com). llama.cpp beats vLLM for running GPT-OSS models locally, with reliability and interactive capabilities highlighted.
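
As a rough illustration of the local-inference setup being compared, here is a minimal llama-cpp-python sketch; the GGUF path and parameters are placeholders, not the author's benchmark harness:

```python
# Sketch: local GGUF inference through llama.cpp's Python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/gpt-oss-20b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```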
            
Scaled dot-product attention profiling (aarnphm.xyz). Profiles scaled dot-product attention with naive and SDPA implementations and TensorBoard tracing, using uv, Modal, and PyTorch on CPU/CUDA.
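
A minimal sketch of the kind of comparison such profiling involves: a naive attention implementation versus torch.nn.functional.scaled_dot_product_attention under torch.profiler. Shapes are arbitrary and this is not the author's exact harness:

```python
# Sketch: compare naive attention vs. PyTorch's fused SDPA under the profiler.
import torch
import torch.nn.functional as F
from torch.profiler import profile, ProfilerActivity

def naive_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

B, H, S, D = 4, 8, 1024, 64  # arbitrary batch / heads / sequence / head dim
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

with profile(activities=[ProfilerActivity.CPU]) as prof:
    naive_attention(q, k, v)
    F.scaled_dot_product_attention(q, k, v)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```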
            
How to Reduce the costs of Running LLMs by 10-15x [Investigations] (artificialintelligencemadesimple.substack.com). Techniques for cost-efficient LLM inference: batching, compiler graphs, FlashAttention, quantization, KV caches, sparse architectures, MoE, and speculative decoding.
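
As one concrete example of the batching lever, a minimal vLLM sketch that pushes a batch of prompts through continuous batching; the model name is a placeholder, and quantized or MoE checkpoints would be configured similarly:

```python
# Sketch: batched offline inference with vLLM's continuous batching.
# The model name is a placeholder; real deployments would tune
# gpu_memory_utilization, max_num_seqs, quantization, etc.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=64)

prompts = [f"Write a one-line summary of article {i}." for i in range(32)]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```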
            
Qwen-8B Embeddings: Near-SOTA Performance at 600x the Speed (alexdong.com). Qwen-8B embeddings enable near-SOTA text classification roughly 600x faster than LLM classifiers, achieving MAP ~0.944 on Kaggle with a simple MLP.
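
A minimal sketch of the embed-then-classify pattern described (a frozen embedding model plus a small classifier head); the checkpoint name and the toy dataset are assumptions, not the post's setup:

```python
# Sketch: frozen text embeddings + a small MLP classifier head.
# The checkpoint and toy data are illustrative, not the article's benchmark.
from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPClassifier

encoder = SentenceTransformer("Qwen/Qwen3-Embedding-8B")  # assumed checkpoint

texts = ["refund not processed", "love the new feature", "app crashes on login"]
labels = ["billing", "praise", "bug"]

X = encoder.encode(texts, normalize_embeddings=True)
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(X, labels)

print(clf.predict(encoder.encode(["charged twice this month"], normalize_embeddings=True)))
```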
            
            🛠️ Hands-on builds and experiments: vLLM, Android RAG, diffusion, and personal projects
           
Summer 2025 in Review (bengubler.com). A summer 2025 recap of AI projects, tokenizers, and the WebGPU shading library shade, plus dataset tooling and a LessWrong piece.

How I Built the Database of my Dreams (blog.apiad.net). BeaverDB: a Pythonic, SQLite-backed multi-modal data store for vectors, text, lists, queues, pub-sub, and more.

Setting Up LLaVA/BakLLaVA with vLLM: Backend and API Integration (pyimagesearch.com). A guide to setting up vLLM with CUDA for LLaVA/BakLLaVA, offline Python inference, and OpenAI-compatible API serving.
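
For context, querying a locally served vLLM endpoint typically looks like the sketch below, using the OpenAI-compatible API; the port, model ID, and image URL are placeholders:

```python
# Sketch: call a local vLLM OpenAI-compatible server (e.g. started with
# `vllm serve <model>`); model ID, port, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```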
            
Running a RAG powered language model on Android using MediaPipe (darrylbayliss.net). A step-by-step guide to using MediaPipe to run a RAG-powered language model on Android with Gemma, embeddings, and a local vector store.

arkaine - an experiment in AI tooling (hlfshell.ai). Arkaine: an AI tooling framework for agents with tool calling, contexts, a PythonEnv backend, Spellbook, and lessons learned.
            
Diffusion models: image generation (konradb.substack.com). DIY diffusion image generation with Flux, Hugging Face diffusers, and prompt automation in Colab.
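
A minimal sketch of the diffusers workflow the post walks through; the checkpoint and settings follow the standard FLUX.1-schnell example rather than the author's exact Colab:

```python
# Sketch: text-to-image with Hugging Face diffusers and a Flux checkpoint.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

image = pipe(
    "a watercolor fox reading a newsletter",
    num_inference_steps=4,   # schnell is distilled for few steps
    guidance_scale=0.0,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("fox.png")
```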
            
            🔎 RAG in production: evaluation, selective retrieval, and vector stores
           
RAG talk recap from DevConf.US 2025 (major.io). RAG with LLMs explained through a Fellowship metaphor, covering failures, strategies, and practical lessons for production systems.

Evaluating Your RAG Solution (towardsdatascience.com). RAG pipeline construction with OpenAlex abstracts, a FAISS vector store, LangChain, and DeepEval for retriever and generator evaluation.
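
The retriever half of such a pipeline often reduces to something like the sketch below, where toy vectors stand in for embedded OpenAlex abstracts and the LangChain/DeepEval wiring is omitted:

```python
# Sketch: a minimal FAISS retriever over toy document embeddings.
import faiss
import numpy as np

dim = 128
doc_vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(doc_vectors)        # cosine similarity via inner product

index = faiss.IndexFlatIP(dim)
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)   # top-5 nearest "abstracts"
print(ids[0], scores[0])
```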
            
Deciding When Not to Retrieve: Adaptive RAG, Part 2 (blog.reachsumit.com). Selective retrieval in adaptive RAG: pre-generation decisions using external features and popularity-based triggers.
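
To make the idea concrete, here is a toy pre-generation gate that skips retrieval for popular (presumably well-memorized) entities; the popularity table and threshold are invented for illustration and are not the post's method:

```python
# Toy sketch: decide before generation whether to retrieve at all,
# using an invented entity-popularity signal as the trigger.
POPULARITY = {"python": 0.95, "paris": 0.90, "obscure-1987-rfc": 0.02}  # fake scores

def should_retrieve(question: str, threshold: float = 0.5) -> bool:
    """Retrieve only when a recognized entity looks unpopular/rare."""
    scores = [p for term, p in POPULARITY.items() if term in question.lower()]
    if not scores:
        return True            # unknown topic: play it safe and retrieve
    return min(scores) < threshold

print(should_retrieve("What year was Python released?"))     # False: skip retrieval
print(should_retrieve("Summarize obscure-1987-rfc for me."))  # True: retrieve
```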
            
How do vector databases work? (hclimente.github.io). Vector embeddings, cosine similarity, UMAP visualizations, and HNSW-based vector databases (Qdrant) for RAG with LLMs.
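
A minimal in-memory Qdrant sketch of the workflow the post describes: create a collection with cosine distance, upsert vectors, and search. The 4-dimensional vectors are toy stand-ins for real embeddings:

```python
# Sketch: in-memory Qdrant collection with cosine distance.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.9, 0.1, 0.0, 0.0], payload={"text": "cats"}),
        PointStruct(id=2, vector=[0.0, 0.1, 0.9, 0.1], payload={"text": "databases"}),
    ],
)
hits = client.search(collection_name="docs", query_vector=[0.85, 0.2, 0.0, 0.1], limit=1)
print(hits[0].payload)  # expected: {'text': 'cats'}
```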
            
            🧪 Rethinking learning: test-time diffusion, layer-wise decoding, and RL efficiency
           
Deep researcher with test-time diffusion (research.google). TTD-DR uses test-time diffusion with self-evolution and retrieval-denoising to draft and revise long-form research reports.
            
Making LLMs more accurate by using all of their layers (research.google). SLED decoding uses all of an LLM's layers to align outputs with factual knowledge, without external data or fine-tuning.
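
The general intuition (project intermediate hidden states through the output head and mix their next-token predictions) can be sketched with a small Hugging Face model; this is an illustrative, logit-lens-style mix, not Google's exact SLED procedure:

```python
# Illustrative sketch: project every layer's hidden state through the LM head
# and average the resulting next-token distributions. This mirrors the spirit
# of layer-aware decoding but is NOT the exact SLED algorithm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

head = model.get_output_embeddings()  # the (tied) LM head
# Skip the embedding-layer output; project each layer's final position.
per_layer_probs = torch.stack(
    [head(h[:, -1, :]).softmax(dim=-1) for h in out.hidden_states[1:]]
)
mixed = per_layer_probs.mean(dim=0)   # uniform mix across layers (illustrative)
print(tok.decode(mixed.argmax(dim=-1)))
```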
            
Prediction is hard, especially about the future (strangeloopcanon.com). Forecasting with tiny LLMs: the Varro RL environment, GSPO training, semantic similarity, and daily headline predictions.
            
The Extreme Inefficiency of RL for Frontier Models (tobyord.com). A new scaling paradigm: RL's information efficiency versus pre-training, long-horizon tasks, token entropy, METR/HCAST, the o1 and o3 models, and latency and inference costs.
            
The Shift to Reinforcement Learning Greatly Reduces Learning-Efficiency (tobyord.com). RL training learns far less per hour than pre-training, impacting scalability, generality, and frontier-task efficiency in AI systems.
            
            📚 Academic Research
           
LLM-I: LLMs are Naturally Interleaved Multimodal Creators (arxiv:cs). LLMs orchestrate tools like online image search, diffusion generation, code execution, and image editing for interleaved multimodal creation.

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation (arxiv:cs). Self-guided training for autoregressive image generation improves visual understanding and FID for LlamaGen models.

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems (arxiv:stat). Hierarchical self-attention for multi-scale, multi-modal data using entropy-minimizing mechanics and dynamic-programming-accelerated transformers.

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer (arxiv:cs). Manzano proposes a unified multimodal framework with a hybrid image tokenizer, a shared vision encoder, dual adapters, and a unified LLM for text and image token generation.

AToken: A Unified Tokenizer for Vision (arxiv:cs). AToken: a unified transformer-based visual tokenizer for images, videos, and 3D with 4D rotary embeddings and adversarial-free training.
            
            👋 Before you go
           
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
            
- Real say in how Blaze evolves — vote on new topics, features, and topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            