Generative AI newsletter

                        August 22, 2025

   Blaze Email

            📣 Headlines

            • OpenAI faces legal challenges as it subpoenas Meta in Musk's takeover bid case, while simultaneously expanding into India with a new Delhi office and launching ChatGPT Go at ₹399/month.

            • ChatGPT and Claude are entering U.S. government operations, raising concerns about security and governance, while Salesforce launches Agentforce for Public Sector with FedRAMP-certified AI agents for citizen services.

            • Google unveiled the Pixel 10 lineup featuring Tensor G5 chips, Gemini Nano integration, and Pro Res Zoom up to 100x, alongside new Gemini-powered smart home speakers with TV pairing and Matter support.

            • New research reveals AI models prefer AI-generated content over human-written descriptions, with GPT-4, GPT-3.5, and Llama showing systematic bias that could impact hiring and educational decisions.

            • Cohere released Command A Reasoning, a 111-billion-parameter enterprise-focused LLM supporting 23 languages with advanced tool use and multi-GPU deployment capabilities for customer service applications.

            • Texas AG launches probe into Meta and Character.AI over misleading mental health claims, while Australian reports show teens being hospitalized after harmful interactions with AI chatbots.

            • Masayoshi Son's SoftBank surged $11 billion in two weeks on AI infrastructure bets, boosting holdings in Nvidia and TSMC amid the ongoing AI boom.

            • Film schools are embracing AI tools like ChatGPT and RunwayML for screenwriting courses, while Japanese novelist Rie Qudan won a prize for a novel composed using ChatGPT assistance.

            🔧 Company Engineering Blogs

            How Cursor AI Cut Legacy Code Coverage Time by 85% (engineering.salesforce.com). Cursor AI reduces legacy code coverage effort from 26 to 4 engineer days per module, achieving 80% coverage across 76 repos, with AI-generated tests, iterative class-by-class analysis, and human oversight.

            New Nemotron Nano 2 Open Reasoning Model Tops Leaderboard and Delivers 6x Higher Throughput (huggingface.co). Nemotron Nano 2 9B delivers 6x throughput on edge reasoning with Hybrid Transformer–Mamba architecture, thinking budget, pruning to 9B, post-training alignment, and vLLM deployment.

            Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution (machinelearning.apple.com). WinoIdentity benchmark expands WinoBias with 25 demographic markers across 10 attributes, 245,700 prompts; analyzes intersectional bias via Coreference Confidence Disparity across LLMs.

            How we built a high quality Q&A assistant (medium.com/airtable-eng). Airtable Omni Q&A: LLM-driven multi-step reasoning, contextual schema exploration, planning and replanning, hybrid search with RAG, inline citations, token-efficient ID encoding, eval suites, and production-scale latency/cost optimizations.

            From massive models to mobile magic: The tech behind YouTube real-time generative AI effects (research.google). YouTube real-time AI effects on mobile: distilling large generative models with PTI inversion, UNet-MobileNet student, on-device MediaPipe pipelines, 30fps latency, 6–10 ms GPUs, datasets with Monk Skin Tone scaling, and effects like Never Blink, Toon 2, Risen zombie.

            📈 AI Industry, Economics & Future

            Will Giant Companies Always Have a Monopoly on Top AI Models? (aclu.org). ACLU analysis examines data sourcing, pre-training costs, model scaling, DL training stages, data curation, Common Crawl, GDPR-like concerns, RLHF, SFT, retrieval-augmented generation, multi-modal data, distributed training, and emergence in frontier LLMs.

            Is this the moment when the Generative AI bubble finally deflates? (garymarcus.substack.com). Generative AI hype, LLM economics, GPT-5 expectations, Altman imagery, market enthusiasm decline, ROI concerns, gurus' reputations, practical use cases, and real-world value debates.

            Humans aren't going anywhere (bitsondata.dev). GenAI, AGI, open models, retrieval augmentation, domain-specific workflows, memory in AI, energy use, Nvidia valuation, OpenAI, LLaMA, Mistral, Phi 2, OpenOrca, RAG, zero-backend memory, enterprise ROI.

            How I learn about generative AI (blog.pamelafox.org). Pamela Fox outlines how she learns generative AI: foundational books and videos, LLM concepts, PyTorch, transformer basics, vector search indexes, RAG on Azure, practical projects, and sharing through talks and study sessions.

            🛠️ Applications & Development Tools

            The Modern Data Toolbox (technology.doximity.com). Hybrid data toolbox blends LLMs, traditional ML, and statistics for real-time fraud detection, enhanced product discovery, and synthetic data generation with privacy-preserving techniques.

            Small hallucinations, big problems (kucharski.substack.com). Bayesian reasoning on LLM hallucinations in rare-event detection, 1% false positives, base rates, ELISA analogy for crises, Western Blot follow-ups, human-in-the-loop implications.

            Smarter Model Tuning: An AI Agent with LangGraph + Streamlit That Boosts ML Performance (towardsdatascience.com). Automating model tuning with LangGraph, Streamlit, and Gemini 2.0 Flash to improve regression and classification performance using graph-based nodes, LLM prompts, and a Streamlit UI.

            Leverage LLM for Next-Gen Recommender Systems: The Evolution of Recommender Systems and Rise of LLMs (lfaidata.foundation). Evolution from rule-based to ML pipelines; LLM-driven embedding-based, generative, and hybrid recommender forms; instruction tuning, LoRA, in-context learning; evaluation, explainability, fairness; GenAI Commons.

            🔍 RAG, Context Engineering & Evaluation

            "RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma (latent.space). Jeff Huber of Chroma discusses modern AI workloads, vector databases, context engineering, Retrieval-Augmented Language Models, Context Rot, Generative Benchmarking, and practical deployment tips for production search systems.

            ragnar 0.2 (tidyverse.org). Ragnar 0.2 introduces a tidy R package for building trustworthy RAG pipelines, embedding with OpenAI models, creating a duckdb store, and retrieving via semantic and BM25 scoring.

            How to Create Powerful LLM Applications with Context Engineering (towardsdatascience.com). Context engineering, prompt structuring, context window management, RAG vs keyword search, context compression, BM25 retrieval, evaluation via A/B testing, observability, prompt management tools, manual context inspection, LLM application reliability.

            Evaluating RAG, aka Optimizing the Optimization (blog.n8n.io). RAG performance, evaluation metrics, and n8n integrations: document relevance, context recall/precision, groundedness, hallucination types, HHEM model, and OpenAI-based evaluation workflows.

            🔧 Model Architecture & Technical Analysis

            llama.cpp guide: running gpt-oss with llama.cpp (simonwillison.net). Guide to running gpt-oss with llama.cpp on macOS using llama-server, including ggml gpt-oss-20b-GGUF, homebrew setup, model cache, port 8080, and performance notes on M2 Macs.

            How LLMs See Images, Audio, and More (blog.bytebytego.com). Tokenization of multi-modal data: image patches via patch embeddings, VQ-VAE/VQ-GAN patterns, CLIP-style embeddings, and audio codecs like EnCodec and SoundStream; ASR-based tokens, multi-scale hierarchies, tradeoffs in efficiency, quality, and semantic preservation.

            Unboxing the Black Box: Understanding LLMs with Reverse Mechanistic Localization (journal.hexmos.com). Reverse Mechanistic Localization (RML) explained via querying masked language models like DistilBERT, tracking token influences, attention maps, and top predictions, with a Colab workflow.

            GPT-oss from the Ground Up (cameronrwolfe.substack.com). OpenAI's GPT-oss: MoE transformers, GPT-oss-20b/120b, 131k token context, Harmony prompt format, MXFP4 quantization, pre-normalization RMSNorm, MoE routing, agentic workflows, health benchmarks, o3/o4-mini comparisons.

            A look through the Seven Years of Transformers [Guest] (artificialintelligencemadesimple.substack.com). DeepSeek V3/R1 with MLA and MoE; GQA vs MLA memory tradeoffs; Mistral, Gemma, Qwen3, Kimi K2 scales; architecture vs data; sliding window attention; Fractals in intelligence; 8pm EST live streams.

            The Illustrated GPT-OSS (newsletter.languagemodels.co). GPT-OSS open-source LLM from OpenAI; mixture-of-experts MoE architecture; tokenization notes; reasoning modes (low/medium/high); tool usage, attention visuals, and system/developer messages; comparisons to GPT-2, DeepSeek, Qwen, Kimi; tokenization of emoji, Arabic, etc.; architecture diagrams and course reference.

            📚 Academic Research

            Intern-S1: A Scientific Multimodal Foundation Model (arxiv:cs). Shanghai AI Laboratory releases 241B-parameter multimodal MoE model specialized for scientific domains, achieving SOTA performance on molecular synthesis and crystal prediction tasks.

            ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents (arxiv:cs). Tsinghua and Zhipu AI achieve 48.1% accuracy on OSWorld benchmark using distributed RL infrastructure for training desktop automation agents at scale.

            Thyme: Think Beyond Images (arxiv:cs). Chinese tech consortium develops multimodal LLM that autonomously generates executable code for image processing and mathematical computations during reasoning tasks.

            Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR (arxiv:cs). UCLA and Microsoft researchers solve entropy collapse in RLVR training, achieving 18.3% improvement on competition-level mathematical reasoning benchmarks.

            Efficient Mixed-Precision Large Language Model Inference with TurboMind (arxiv:cs). Shanghai AI Lab delivers up to 61% lower latency and 156% higher throughput in LLM inference through hardware-optimized mixed-precision techniques.

            Controlling Multimodal LLMs via Reward-guided Decoding (arxiv:cs). Mila and Meta FAIR introduce first reward-guided decoding method for multimodal LLMs, enabling real-time control over visual grounding precision and recall.

            MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework (arxiv:cs). Ant Group creates specialized medical research agent using knowledge graphs and custom retrieval, outperforming larger proprietary models on medical benchmarks.

            👋 Before you go

            I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

             Real say in how Blaze evolves — vote on new topics, features, topic curation ideas

             First dibs on merch (details still cooking)

             That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

            If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries; the newsletters keep coming either way, and you can follow along on Patreon for free.

            Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

