📣 Headlines
           
• Meta introduced Ray-Ban Display smart glasses with an in-lens screen and AI assistant, plus a Neural Band wrist controller, signaling a more capable HUD-class wearable platform.

• Zoom unveiled AI Companion 3.0 with agentic capabilities spanning meetings, CX, marketing, sales, and frontline workflows.

• In healthcare, Akido Labs' ScopeAI runs appointments and drafts diagnoses under physician review, while more clinicians turn to ChatGPT for second opinions, raising benefit and privacy questions.

• As attackers weaponize AI, CrowdStrike pushed scaling defensive AI and backed Terra Security’s agentic offensive platform via its accelerator, with Nvidia and AWS support.

• To curb GPU dependency, the industry is pursuing alternatives and open networks as firms seek to escape the 'Nvidia tax', highlighted by Upscale AI’s $100M seed for open-standards AI networking.

• Biosecurity spotlight: researchers used AI-designed DNA to create bacteriophages that infected and killed E. coli, demonstrating real-world bioactivity from AI-generated genomes.

• Materials discovery advance: MIT’s SCIGEN steers diffusion models to generate candidate quantum materials with target lattice geometries (e.g., kagome, Archimedean).

• Microsoft expanded its data stack as Fabric adds a LinkedIn-derived native graph engine and real-time geospatial maps integrated with OneLake.
            
            🔧 Company Engineering Blogs
           
Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals (deepmind.google). Gemini 2.5 Deep Think reaches gold-medal level at the 2025 ICPC World Finals, solving 10 of 12 problems with advanced reasoning and reinforcement-learning techniques.

Meet the GitHub MCP Registry: The fastest way to discover MCP Servers (github.blog). GitHub introduces the MCP Registry to centralize MCP server discovery for Copilot, agents, and MCP-enabled tools.

Scaleway on Hugging Face Inference Providers 🔥 (huggingface.co). Scaleway joins Hugging Face Inference Providers, enabling serverless inference with Scaleway API keys and HF routing.
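
For illustration, a minimal sketch of routing a request through Hugging Face Inference Providers with the huggingface_hub client; the provider string and model ID below are assumptions, not details from the announcement:

```python
# Sketch: serverless chat completion routed through an inference provider.
# Assumes `huggingface_hub` is installed and HF_TOKEN is set in the environment.
from huggingface_hub import InferenceClient

# provider="scaleway" and the model ID are illustrative assumptions.
client = InferenceClient(provider="scaleway")

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what an inference provider does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```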
            
Learn Your Way: Reimagining textbooks with generative AI (research.google). Google Research explores Learn Your Way, using generative AI to produce multimodal, personalized educational materials and measure learning efficacy.
            
            🤖 Agentic systems: data, operations, and real workflows
           
Supporting our AI overlords: Redesigning data systems to be Agent-first (muratbuffalo.blogspot.com). Agent-first data systems: LLM agent workloads, agentic speculation, multi-query optimization, memory stores, and neurosymbolic collaboration in DBMS redesign.

Clouded Judgement 9.19.25 - The AI Shift: Static Software vs. Living AI Systems (cloudedjudgement.substack.com). AI products evolve like living systems, requiring continuous evaluation, observability, and hot-swappable models and prompts.

Why Digital Work is the Perfect Training Ground for AI Agents (thedataexchange.media). Upwork CTO Andrew Rabinovich explains Uma, RLEF, RAG with knowledge graphs, and human-in-the-loop evaluation for AI agents in digital work.

What happens when coding agents stop feeling like dialup? (martinalderson.com). Discusses AI coding agents' reliability and token speeds, OpenRouter data, Claude Code, Cerebras Code, Gemini CLI, and the implications for developer workflow and pricing.
            
            ⚙️ LLM performance engineering: inference, profiling, and embeddings
           
Lessons from the trenches: why llama.cpp works best (today) (visokio.com). llama.cpp beats vLLM for running GPT-OSS models locally, with reliability and interactive capabilities highlighted.
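
As a rough illustration of the local-inference setup being compared, here is a minimal llama-cpp-python sketch; the GGUF path and parameters are placeholders, not the author's benchmark harness:

```python
# Sketch: local GGUF inference through llama.cpp's Python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/gpt-oss-20b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```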
            
Scaled dot-product attention profiling (aarnphm.xyz). Profiles scaled dot-product attention with naive and SDPA implementations and TensorBoard tracing, using uv, Modal, and PyTorch on CPU/CUDA.
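
A minimal sketch of the kind of comparison such profiling involves: a naive attention implementation versus torch.nn.functional.scaled_dot_product_attention under torch.profiler. Shapes are arbitrary and this is not the author's exact harness:

```python
# Sketch: compare naive attention vs. PyTorch's fused SDPA under the profiler.
import torch
import torch.nn.functional as F
from torch.profiler import profile, ProfilerActivity

def naive_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

B, H, S, D = 4, 8, 1024, 64  # arbitrary batch / heads / sequence / head dim
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

with profile(activities=[ProfilerActivity.CPU]) as prof:
    naive_attention(q, k, v)
    F.scaled_dot_product_attention(q, k, v)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```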
            
How to Reduce the costs of Running LLMs by 10-15x [Investigations] (artificialintelligencemadesimple.substack.com). Techniques for cost-efficient LLM inference: batching, compiler graphs, FlashAttention, quantization, KV caches, sparse architectures, MoE, and speculative decoding.
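
As one concrete example of the batching lever, a minimal vLLM sketch that pushes a batch of prompts through continuous batching; the model name is a placeholder, and quantized or MoE checkpoints would be configured similarly:

```python
# Sketch: batched offline inference with vLLM's continuous batching.
# The model name is a placeholder; real deployments would tune
# gpu_memory_utilization, max_num_seqs, quantization, etc.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=64)

prompts = [f"Write a one-line summary of article {i}." for i in range(32)]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```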
            
Qwen-8B Embeddings: Near-SOTA Performance at 600x the Speed (alexdong.com). Qwen-8B embeddings enable near-SOTA text classification roughly 600x faster than LLM classifiers, achieving MAP ~0.944 on Kaggle with a simple MLP.
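
A minimal sketch of the embed-then-classify pattern described (a frozen embedding model plus a small classifier head); the checkpoint name and the toy dataset are assumptions, not the post's setup:

```python
# Sketch: frozen text embeddings + a small MLP classifier head.
# The checkpoint and toy data are illustrative, not the article's benchmark.
from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPClassifier

encoder = SentenceTransformer("Qwen/Qwen3-Embedding-8B")  # assumed checkpoint

texts = ["refund not processed", "love the new feature", "app crashes on login"]
labels = ["billing", "praise", "bug"]

X = encoder.encode(texts, normalize_embeddings=True)
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(X, labels)

print(clf.predict(encoder.encode(["charged twice this month"], normalize_embeddings=True)))
```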
            
            🛠️ Hands-on builds and experiments: vLLM, Android RAG, diffusion, and personal projects
           
Summer 2025 in Review (bengubler.com). A summer 2025 recap of AI projects, tokenizers, and the WebGPU shading library shade, plus dataset tooling and a LessWrong piece.

How I Built the Database of my Dreams (blog.apiad.net). BeaverDB: a Pythonic, SQLite-backed multi-modal data store for vectors, text, lists, queues, pub-sub, and more.

Setting Up LLaVA/BakLLaVA with vLLM: Backend and API Integration (pyimagesearch.com). A guide to setting up vLLM with CUDA for LLaVA/BakLLaVA, offline Python inference, and OpenAI-compatible API serving.
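
For context, querying a locally served vLLM endpoint typically looks like the sketch below, using the OpenAI-compatible API; the port, model ID, and image URL are placeholders:

```python
# Sketch: call a local vLLM OpenAI-compatible server (e.g. started with
# `vllm serve <model>`); model ID, port, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```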
            
Running a RAG powered language model on Android using MediaPipe (darrylbayliss.net). A step-by-step guide to using MediaPipe to run a RAG-powered language model on Android with Gemma, embeddings, and a local vector store.

arkaine - an experiment in AI tooling (hlfshell.ai). Arkaine: an AI tooling framework for agents with tool calling, contexts, a PythonEnv backend, Spellbook, and lessons learned.
            
Diffusion models: image generation (konradb.substack.com). DIY diffusion image generation with Flux, Hugging Face diffusers, and prompt automation in Colab.
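
A minimal sketch of the diffusers workflow the post walks through; the checkpoint and settings follow the standard FLUX.1-schnell example rather than the author's exact Colab:

```python
# Sketch: text-to-image with Hugging Face diffusers and a Flux checkpoint.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

image = pipe(
    "a watercolor fox reading a newsletter",
    num_inference_steps=4,   # schnell is distilled for few steps
    guidance_scale=0.0,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("fox.png")
```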
            
            🔎 RAG in production: evaluation, selective retrieval, and vector stores
           
RAG talk recap from DevConf.US 2025 (major.io). RAG with LLMs explained through a Fellowship metaphor, covering failures, strategies, and practical lessons for production systems.

Evaluating Your RAG Solution (towardsdatascience.com). RAG pipeline construction with OpenAlex abstracts, a FAISS vector store, LangChain, and DeepEval for retriever and generator evaluation.
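
The retriever half of such a pipeline often reduces to something like the sketch below, where toy vectors stand in for embedded OpenAlex abstracts and the LangChain/DeepEval wiring is omitted:

```python
# Sketch: a minimal FAISS retriever over toy document embeddings.
import faiss
import numpy as np

dim = 128
doc_vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(doc_vectors)        # cosine similarity via inner product

index = faiss.IndexFlatIP(dim)
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)   # top-5 nearest "abstracts"
print(ids[0], scores[0])
```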
            
Deciding When Not to Retrieve: Adaptive RAG, Part 2 (blog.reachsumit.com). Selective retrieval in adaptive RAG: pre-generation decisions using external features and popularity-based triggers.
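
To make the idea concrete, here is a toy pre-generation gate that skips retrieval for popular (presumably well-memorized) entities; the popularity table and threshold are invented for illustration and are not the post's method:

```python
# Toy sketch: decide before generation whether to retrieve at all,
# using an invented entity-popularity signal as the trigger.
POPULARITY = {"python": 0.95, "paris": 0.90, "obscure-1987-rfc": 0.02}  # fake scores

def should_retrieve(question: str, threshold: float = 0.5) -> bool:
    """Retrieve only when a recognized entity looks unpopular/rare."""
    scores = [p for term, p in POPULARITY.items() if term in question.lower()]
    if not scores:
        return True            # unknown topic: play it safe and retrieve
    return min(scores) < threshold

print(should_retrieve("What year was Python released?"))     # False: skip retrieval
print(should_retrieve("Summarize obscure-1987-rfc for me."))  # True: retrieve
```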
            
How do vector databases work? (hclimente.github.io). Vector embeddings, cosine similarity, UMAP visualizations, and HNSW-based vector databases (Qdrant) for RAG with LLMs.
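
A minimal in-memory Qdrant sketch of the workflow the post describes: create a collection with cosine distance, upsert vectors, and search. The 4-dimensional vectors are toy stand-ins for real embeddings:

```python
# Sketch: in-memory Qdrant collection with cosine distance.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.9, 0.1, 0.0, 0.0], payload={"text": "cats"}),
        PointStruct(id=2, vector=[0.0, 0.1, 0.9, 0.1], payload={"text": "databases"}),
    ],
)
hits = client.search(collection_name="docs", query_vector=[0.85, 0.2, 0.0, 0.1], limit=1)
print(hits[0].payload)  # expected: {'text': 'cats'}
```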
            
            🧪 Rethinking learning: test-time diffusion, layer-wise decoding, and RL efficiency
           
Deep researcher with test-time diffusion (research.google). TTD-DR uses test-time diffusion with self-evolution and retrieval-denoising to draft and revise long-form research reports.
            
Making LLMs more accurate by using all of their layers (research.google). SLED decoding uses all of an LLM's layers to align outputs with factual knowledge, without external data or fine-tuning.
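
The general intuition (project intermediate hidden states through the output head and mix their next-token predictions) can be sketched with a small Hugging Face model; this is an illustrative, logit-lens-style mix, not Google's exact SLED procedure:

```python
# Illustrative sketch: project every layer's hidden state through the LM head
# and average the resulting next-token distributions. This mirrors the spirit
# of layer-aware decoding but is NOT the exact SLED algorithm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

head = model.get_output_embeddings()  # the (tied) LM head
# Skip the embedding-layer output; project each layer's final position.
per_layer_probs = torch.stack(
    [head(h[:, -1, :]).softmax(dim=-1) for h in out.hidden_states[1:]]
)
mixed = per_layer_probs.mean(dim=0)   # uniform mix across layers (illustrative)
print(tok.decode(mixed.argmax(dim=-1)))
```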
            
Prediction is hard, especially about the future (strangeloopcanon.com). Forecasting with tiny LLMs: the Varro RL environment, GSPO training, semantic similarity, and daily headline predictions.
            
The Extreme Inefficiency of RL for Frontier Models (tobyord.com). A new scaling paradigm: RL's information efficiency versus pre-training, long-horizon tasks, token entropy, METR/HCAST, the o1 and o3 models, and latency and inference costs.
            
The Shift to Reinforcement Learning Greatly Reduces Learning-Efficiency (tobyord.com). RL training learns far less per hour than pre-training, impacting scalability, generality, and frontier-task efficiency in AI systems.
            
            📚 Academic Research
           
LLM-I: LLMs are Naturally Interleaved Multimodal Creators (arxiv:cs). LLMs orchestrate tools like online image search, diffusion generation, code execution, and image editing for interleaved multimodal creation.

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation (arxiv:cs). Self-guided training for autoregressive image generation improves visual understanding and FID for LlamaGen models.

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems (arxiv:stat). Hierarchical self-attention for multi-scale, multi-modal data using entropy-minimizing mechanics and dynamic-programming-accelerated transformers.

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer (arxiv:cs). Manzano proposes a unified multimodal framework with a hybrid image tokenizer, a shared vision encoder, dual adapters, and a unified LLM for text and image token generation.

AToken: A Unified Tokenizer for Vision (arxiv:cs). AToken: a unified transformer-based visual tokenizer for images, videos, and 3D with 4D rotary embeddings and adversarial-free training.
            
            👋 Before you go
           
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
            
- Real say in how Blaze evolves — vote on new topics, features, and topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            