📣 Headlines
           
• OpenAI faces legal challenges as it subpoenas Meta in Musk's takeover bid case, while simultaneously expanding into India with a new Delhi office and launching ChatGPT Go at ₹399/month.
            
• ChatGPT and Claude are entering U.S. government operations, raising concerns about security and governance, while Salesforce launches Agentforce for Public Sector with FedRAMP-certified AI agents for citizen services.
            
• Google unveiled the Pixel 10 lineup featuring Tensor G5 chips, Gemini Nano integration, and Pro Res Zoom up to 100x, alongside new Gemini-powered smart home speakers with TV pairing and Matter support.
            
• New research reveals AI models prefer AI-generated content over human-written descriptions, with GPT-4, GPT-3.5, and Llama showing systematic bias that could impact hiring and educational decisions.
            
• Cohere released Command A Reasoning, a 111-billion-parameter enterprise-focused LLM supporting 23 languages with advanced tool use and multi-GPU deployment capabilities for customer service applications.
            
• Texas AG launches probe into Meta and Character.AI over misleading mental health claims, while Australian reports show teens being hospitalized after harmful interactions with AI chatbots.
            
• Masayoshi Son's SoftBank surged $11 billion in two weeks on AI infrastructure bets, boosting holdings in Nvidia and TSMC amid the ongoing AI boom.
            
• Film schools are embracing AI tools like ChatGPT and RunwayML for screenwriting courses, while Japanese novelist Rie Qudan won a prize for a novel composed using ChatGPT assistance.
            
            🔧 Company Engineering Blogs
           
How Cursor AI Cut Legacy Code Coverage Time by 85% (engineering.salesforce.com). Cursor AI reduces legacy code coverage effort from 26 to 4 engineer days per module, achieving 80% coverage across 76 repos, with AI-generated tests, iterative class-by-class analysis, and human oversight.
            
New Nemotron Nano 2 Open Reasoning Model Tops Leaderboard and Delivers 6x Higher Throughput (huggingface.co). Nemotron Nano 2 9B delivers 6x higher throughput on edge reasoning with a hybrid Transformer–Mamba architecture, a thinking budget, pruning to 9B, post-training alignment, and vLLM deployment.
            
Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution (machinelearning.apple.com). WinoIdentity benchmark expands WinoBias with 25 demographic markers across 10 attributes, 245,700 prompts; analyzes intersectional bias via Coreference Confidence Disparity across LLMs.
            
How we built a high quality Q&A assistant (medium.com/airtable-eng). Airtable Omni Q&A: LLM-driven multi-step reasoning, contextual schema exploration, planning and replanning, hybrid search with RAG, inline citations, token-efficient ID encoding, eval suites, and production-scale latency/cost optimizations.
            
From massive models to mobile magic: The tech behind YouTube real-time generative AI effects (research.google). YouTube real-time AI effects on mobile: distilling large generative models with PTI inversion, a UNet-MobileNet student, on-device MediaPipe pipelines, 30 fps performance, 6–10 ms GPU latency, datasets with Monk Skin Tone scaling, and effects like Never Blink, Toon 2, and Risen zombie.
            
            📈 AI Industry, Economics & Future
           
Will Giant Companies Always Have a Monopoly on Top AI Models? (aclu.org). ACLU analysis examines data sourcing, pre-training costs, model scaling, deep learning training stages, data curation, Common Crawl, GDPR-like concerns, RLHF, SFT, retrieval-augmented generation, multi-modal data, distributed training, and emergence in frontier LLMs.
            
Is this the moment when the Generative AI bubble finally deflates? (garymarcus.substack.com). Generative AI hype, LLM economics, GPT-5 expectations, Altman imagery, declining market enthusiasm, ROI concerns, gurus' reputations, practical use cases, and real-world value debates.
            
Humans aren't going anywhere (bitsondata.dev). GenAI, AGI, open models, retrieval augmentation, domain-specific workflows, memory in AI, energy use, Nvidia valuation, OpenAI, LLaMA, Mistral, Phi 2, OpenOrca, RAG, zero-backend memory, enterprise ROI.
            
How I learn about generative AI (blog.pamelafox.org). Pamela Fox outlines how she learns generative AI: foundational books and videos, LLM concepts, PyTorch, transformer basics, vector search indexes, RAG on Azure, practical projects, and sharing through talks and study sessions.
            
            🛠️ Applications & Development Tools
           
The Modern Data Toolbox (technology.doximity.com). Hybrid data toolbox blends LLMs, traditional ML, and statistics for real-time fraud detection, enhanced product discovery, and synthetic data generation with privacy-preserving techniques.
            
Small hallucinations, big problems (kucharski.substack.com). Bayesian reasoning on LLM hallucinations in rare-event detection, 1% false positives, base rates, ELISA analogy for crises, Western Blot follow-ups, human-in-the-loop implications.
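
For readers who want to see why base rates matter here, the following is a minimal sketch of the Bayes' rule arithmetic, not code from the linked post. The 1% false positive rate comes from the summary above; the 0.1% prevalence, the perfect sensitivity, and the function name `posterior_given_flag` are assumptions chosen purely for illustration.

```python
def posterior_given_flag(prevalence, sensitivity, false_positive_rate):
    """Return P(event is real | detector flagged it) using Bayes' rule."""
    # Probability mass of flags raised on genuine events.
    true_flags = sensitivity * prevalence
    # Probability mass of flags raised on non-events (false alarms).
    false_flags = false_positive_rate * (1.0 - prevalence)
    return true_flags / (true_flags + false_flags)


if __name__ == "__main__":
    # Assumed inputs: a rare event (0.1% of cases), a detector that never
    # misses a real event, and the 1% false positive rate from the summary.
    p = posterior_given_flag(
        prevalence=0.001,
        sensitivity=1.0,
        false_positive_rate=0.01,
    )
    print(f"Chance a flagged case is real: {p:.1%}")
```

Under these assumed numbers only about 9% of flagged cases are real, which is the usual argument for a confirmatory second step or a human in the loop.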
            
Smarter Model Tuning: An AI Agent with LangGraph + Streamlit That Boosts ML Performance (towardsdatascience.com). Automating model tuning with LangGraph, Streamlit, and Gemini 2.0 Flash to improve regression and classification performance using graph-based nodes, LLM prompts, and a Streamlit UI.
            
Leverage LLM for Next-Gen Recommender Systems: The Evolution of Recommender Systems and Rise of LLMs (lfaidata.foundation). Evolution from rule-based to ML pipelines; LLM-driven embedding-based, generative, and hybrid recommender forms; instruction tuning, LoRA, in-context learning; evaluation, explainability, fairness; GenAI Commons.
            
            🔍 RAG, Context Engineering & Evaluation
           
"RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma (latent.space). Jeff Huber of Chroma discusses modern AI workloads, vector databases, context engineering, Retrieval-Augmented Language Models, Context Rot, Generative Benchmarking, and practical deployment tips for production search systems.
            
ragnar 0.2 (tidyverse.org). Ragnar 0.2 introduces a tidy R package for building trustworthy RAG pipelines, embedding with OpenAI models, creating a duckdb store, and retrieving via semantic and BM25 scoring.
            
How to Create Powerful LLM Applications with Context Engineering (towardsdatascience.com). Context engineering, prompt structuring, context window management, RAG vs keyword search, context compression, BM25 retrieval, evaluation via A/B testing, observability, prompt management tools, manual context inspection, LLM application reliability.
            
Evaluating RAG, aka Optimizing the Optimization (blog.n8n.io). RAG performance, evaluation metrics, and n8n integrations: document relevance, context recall/precision, groundedness, hallucination types, HHEM model, and OpenAI-based evaluation workflows.
            
            🔧 Model Architecture & Technical Analysis
           
llama.cpp guide: running gpt-oss with llama.cpp (simonwillison.net). Guide to running gpt-oss with llama.cpp on macOS using llama-server, including ggml gpt-oss-20b-GGUF, homebrew setup, model cache, port 8080, and performance notes on M2 Macs.
            
How LLMs See Images, Audio, and More (blog.bytebytego.com). Tokenization of multi-modal data: image patches via patch embeddings, VQ-VAE/VQ-GAN patterns, CLIP-style embeddings, and audio codecs like EnCodec and SoundStream; ASR-based tokens, multi-scale hierarchies, tradeoffs in efficiency, quality, and semantic preservation.
            
Unboxing the Black Box: Understanding LLMs with Reverse Mechanistic Localization (journal.hexmos.com). Reverse Mechanistic Localization (RML) explained via querying masked language models like DistilBERT, tracking token influences, attention maps, and top predictions, with a Colab workflow.
            
GPT-oss from the Ground Up (cameronrwolfe.substack.com). OpenAI's GPT-oss: MoE transformers, GPT-oss-20b/120b, 131k token context, Harmony prompt format, MXFP4 quantization, pre-normalization RMSNorm, MoE routing, agentic workflows, health benchmarks, o3/o4-mini comparisons.
            
A look through the Seven Years of Transformers [Guest] (artificialintelligencemadesimple.substack.com). DeepSeek V3/R1 with MLA and MoE; GQA vs MLA memory tradeoffs; Mistral, Gemma, Qwen3, Kimi K2 scales; architecture vs data; sliding window attention; fractals in intelligence; 8pm EST live streams.
            
The Illustrated GPT-OSS (newsletter.languagemodels.co). GPT-OSS open-source LLM from OpenAI; mixture-of-experts (MoE) architecture; tokenization notes, including emoji and Arabic; reasoning modes (low/medium/high); tool usage, attention visuals, and system/developer messages; comparisons to GPT-2, DeepSeek, Qwen, Kimi; architecture diagrams and course reference.
            
            📚 Academic Research
           
Intern-S1: A Scientific Multimodal Foundation Model (arxiv:cs). Shanghai AI Laboratory releases 241B-parameter multimodal MoE model specialized for scientific domains, achieving SOTA performance on molecular synthesis and crystal prediction tasks.
            
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents (arxiv:cs). Tsinghua and Zhipu AI achieve 48.1% accuracy on OSWorld benchmark using distributed RL infrastructure for training desktop automation agents at scale.
            
Thyme: Think Beyond Images (arxiv:cs). Chinese tech consortium develops multimodal LLM that autonomously generates executable code for image processing and mathematical computations during reasoning tasks.
            
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR (arxiv:cs). UCLA and Microsoft researchers solve entropy collapse in RLVR training, achieving 18.3% improvement on competition-level mathematical reasoning benchmarks.
            
Efficient Mixed-Precision Large Language Model Inference with TurboMind (arxiv:cs). Shanghai AI Lab delivers up to 61% lower latency and 156% higher throughput in LLM inference through hardware-optimized mixed-precision techniques.
            
Controlling Multimodal LLMs via Reward-guided Decoding (arxiv:cs). Mila and Meta FAIR introduce first reward-guided decoding method for multimodal LLMs, enabling real-time control over visual grounding precision and recall.
            
MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework (arxiv:cs). Ant Group creates specialized medical research agent using knowledge graphs and custom retrieval, outperforming larger proprietary models on medical benchmarks.
            
            👋 Before you go
           
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
            
- Real say in how Blaze evolves: vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            