The AI Engineer 28-10-2025
OpenAI's Atlas browser gets mixed reviews, Google advances 'vibe coding', and a call for AI superintelligence safety.
📣 Headlines
• OpenAI launched the Atlas browser built around ChatGPT with an agent mode for paid users, but early hands-on coverage questioned its real-world value and suggested it may primarily serve OpenAI's data-collection aims.
• Google updated AI Studio with 'vibe coding' natural-language prompts, secret variables, visual UI editing, and one-click Cloud Run deploys, signaling a push toward low-friction LLM app dev workflows.
• Anchor raised $6M to launch b0.dev, a cloud-based agentic browser automation platform with planning-first execution, targeting headless web workflows powered by LLM planners.
• Uniphore secured $260M from Nvidia, Snowflake, Databricks, AMD and others to scale its Business AI Cloud for enterprise data, models, guards, and agent orchestration.
• Pegasystems reported growing earnings and ACV as its agentic AI tools like Pega Blueprint and GenAI Blueprint gain traction, underlining demand for orchestrated enterprise agents.
• A coalition of public figures urged a [prohibition on developing AI superintelligence until robust safety and public buy-in exist](https://futurism.com/artificial-intelligence/steve-wozniak-steve-bannon-letter-ai-superintelligence), a call echoed by additional signatories including Harry and Meghan.
• China unveiled a wind-powered, offshore underwater data center near Shanghai with ocean cooling and 24 MW capacity, with further details on seabed modules and reduced land/water usage highlighted in additional reporting.
• MIT researchers advanced neuromorphic, magnesium-based ionic devices for brain-inspired, energy-efficient AI hardware, pointing toward lower-power alternatives to conventional accelerators.
🔧 Company Engineering Blogs
Inside the AIPCon 8 Demos Transforming Manufacturing, Insurance, and Construction (blog.palantir.com). Four Palantir AIPCon 8 demos—Ursa Major, Epirus, Acrisure, and Thomas Cavanagh—show production-ready AI transforming MES, MRP speed, underwriting, and central operating systems
Scaling Privacy Infrastructure for GenAI Product Innovation (engineering​.fb​.com). Meta scales Privacy Aware Infrastructure (PAI) for GenAI with data lineage, policy enforcement, and on-device/off-cloud processing using PrivacyLib in AI glasses
Multi-Table Predictions in Data Cloud: Enabling Machine Learning Across Related Data Objects (engineering​.salesforce​.com). Multi-DMO in Data Cloud enables cross-object predictions with UI/UX refinements, SQL performance tuning, and 99.999% reliability
The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model (github​.blog). GitHub Copilot’s new custom model boosts code completions with 20% more accepted characters and faster throughput
Streaming datasets: 100x More Efficient (huggingface​.co). Streaming datasets enable 100x faster data loading with a simple streaming API, dedupe storage (Xet), Parquet CDC, and scalable pipelines
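For the streaming-datasets entry above, a minimal sketch of the 🤗 Datasets streaming API; the dataset name and column are just examples, not taken from the post:

```python
from datasets import load_dataset

# Stream a dataset instead of downloading it first; rows arrive lazily as you iterate.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Streaming datasets support lazy shuffle/map/filter before iteration.
ds = ds.shuffle(seed=42, buffer_size=10_000)

for i, example in enumerate(ds):
    print(example["text"][:80])
    if i == 2:
        break
```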
🧠 Memory & Context
LLM Hallucinations in Practical Code Generation — Phenomena, Mechanism, and Mitigation (gwolf​.org). Six LLMs evaluated on Python code generation from CoderEval, taxonomy of hallucinations, root causes, and RAG mitigation
The Memory Problem: Why LLMs Sometimes Forget Your Conversation (blog​.bytebytego​.com). Context windows and RAG limit LLM memory; tokens, attention, and stateless design explained with examples and GPU constraints
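To make the context-window limit concrete, a small sketch that counts tokens with tiktoken and truncates chat history to fit a budget; the encoding name and budget are assumptions, and real systems summarize or retrieve rather than simply drop old turns:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI chat models

def fit_history(messages, budget=8000):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):              # walk from newest to oldest
        n = len(enc.encode(msg["content"]))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))

history = [{"role": "user", "content": "Summarize our earlier discussion."}]
print(len(fit_history(history)))
```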
Making sense of KV Cache optimizations, Ep. 2: Token-level (zansara​.dev). Token-level KV cache optimizations including selection, budget allocation, merging, quantization, and low-rank decomposition techniques
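As a toy illustration of token-level selection, the sketch below keeps only the cached positions that have received the most attention mass (a simplified, H2O-style heuristic, not the article's exact method):

```python
import torch

def evict_kv(keys, values, attn_weights, budget):
    """Keep the `budget` cached positions that received the most attention.

    keys, values:  (seq_len, head_dim) cached tensors for one head
    attn_weights:  (num_queries, seq_len) attention probabilities observed so far
    """
    scores = attn_weights.sum(dim=0)                  # total attention each position received
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices.sort().values
    return keys[keep], values[keep]

k = torch.randn(128, 64)
v = torch.randn(128, 64)
attn = torch.softmax(torch.randn(16, 128), dim=-1)
k_small, v_small = evict_kv(k, v, attn, budget=32)
print(k_small.shape)  # torch.Size([32, 64])
```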
How does prompt caching work? (zansara​.dev). KV caching in decoder-only Transformers speeds autoregressive generation by reusing past keys and values
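A minimal demonstration of the KV-cache reuse described above, using Hugging Face transformers' `past_key_values` with GPT-2 (any decoder-only model works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = tok("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    # First pass: compute and keep the keys/values for the whole prompt.
    out = model(**prompt, use_cache=True)
    past = out.past_key_values

    # Next step: feed only the newly chosen token; cached K/V are reused,
    # so the prompt is never re-processed.
    next_token = out.logits[:, -1:].argmax(dim=-1)
    out = model(input_ids=next_token, past_key_values=past, use_cache=True)

print(out.logits.shape)  # (1, 1, vocab_size): only the new position is scored
```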
Meta’s new free transformer (kiledjian​.com). Meta's Free Transformer introduces a latent variable layer enabling working memory for improved planning in generation
📚 Retrieval & RAG
Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI (towardsdatascience​.com). Context engineering and semantic layers govern data, enabling agentic AI with GraphRAG, knowledge graphs, and memory tools
Building an Open-Source RAG System (elimbi​.com). Self-hosted RAG workflow using Puppeteer/Cheerio, site detection, adaptive chunking, local embeddings with Xenova MiniLM, vector search, and OpenRouter for Llama-3.2-3B-instruct
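The post's pipeline is JavaScript (Puppeteer/Cheerio, Xenova MiniLM); as a rough Python analogue of its embed-and-retrieve core, assuming sentence-transformers and a plain NumPy cosine search:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # same family as Xenova's MiniLM ONNX port

def chunk(text, size=500, overlap=100):
    """Naive fixed-size character chunking; the article does adaptive chunking per site."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

docs = chunk(open("page.txt").read())            # assumed input file with scraped page text
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query, k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                        # cosine similarity (vectors are unit-norm)
    return [docs[i] for i in np.argsort(-scores)[:k]]

context = "\n\n".join(retrieve("What does the site say about pricing?"))
# `context` would then be sent to an instruct model (the post uses Llama-3.2-3B via OpenRouter).
```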
How I’m Building a Context-Aware Retriever to Boost RAG Quality (Part 1: Introduction) (egpivo​.github​.io). Context-aware retriever for RAG: MCP server, keyword/semantic fusion, reflection, expansion, and reranking with FastMCP, jina reranker, and query expander
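One common way to implement the keyword/semantic fusion step mentioned above is reciprocal rank fusion; a generic sketch, not necessarily the author's exact formula:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids (e.g. BM25 order and embedding-search order).

    Each list contributes 1 / (k + rank) per document; k=60 is the usual default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]      # e.g. BM25 order
semantic_hits = ["doc1", "doc9", "doc3"]     # e.g. embedding-similarity order
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7'] -- docs ranked well by both lists float to the top
```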
“We’re Good at Search”… Just Not the Kind That the AI era Demands - a Provocation (aarontay​.substack​.com). Librarians confront AI-powered search: relevancy, retrieval, evaluation, and the shift toward vector embeddings and retrieval augmented generation
Creating an OpenAI Vector Store for RAG Systems (jamesmccaffreyblog​.com). OpenAI vector store demo for RAG using the Responses API and vector stores, with sample Python code to create and delete a store
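The shape of the create/delete calls, as a hedged sketch against the current openai-python SDK (in older SDK versions the same methods sit under `client.beta.vector_stores`):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create a vector store to back file_search in the Responses API.
store = client.vector_stores.create(name="rag-demo-store")
print(store.id)

# ... upload files and attach them to the store here ...

# Clean up when the demo is done.
client.vector_stores.delete(store.id)
```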
⚙️ Inference & Systems
Accelerating Hybrid Inference in SGLang with KTransformers CPU Kernels (lmsys​.org). KTransformers CPU kernels, AMX/AVX-512, NUMA-aware tensor parallelism, CUDA Graphs, Expert Deferral, SGLang integration for MoE hybrid inference
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch (gilesthomas​.com). Retro RNNs in PyTorch: TBPTT, NextByteDataset, batchify, and training loop with Karpathy-inspired LSTM
lecture nine (aarnphm​.xyz). Lecture nine covers PyTorch and JAX, including NumPy+Autograd, JAX jit, memory handling, static_argnums, and normalization techniques
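A small illustration of the `static_argnums` point from the lecture notes: arguments that determine output shapes (like `k` below) must be marked static so `jax.jit` compiles a specialization per value instead of failing:

```python
from functools import partial

import jax
import jax.numpy as jnp

@partial(jax.jit, static_argnums=(1,))
def top_k_mean(x, k):
    # k fixes an output shape, so it must be a compile-time constant under jit.
    values, _ = jax.lax.top_k(x, k)
    return values.mean()

x = jnp.arange(10.0)
print(top_k_mean(x, 3))   # mean of the 3 largest values -> 8.0
print(top_k_mean(x, 5))   # triggers a second compilation for k=5 -> 7.0
```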
Building Gemma3 from scratch in Rust (lucas-montes​.com). Rust-based recreation of Gemma3 components: Linear, TransformerBlock, GQA, RMSNorm, RoPE, masking, and Safetensors loading
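The post builds these blocks in Rust; for reference, RMSNorm (one of the listed components) is only a few lines, shown here as a PyTorch sketch rather than the author's Rust (Gemma's own variant additionally scales by 1 + weight):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square LayerNorm as used in Gemma/LLaMA-style blocks (no mean-centering)."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 5, 64)
print(RMSNorm(64)(x).shape)  # torch.Size([2, 5, 64])
```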
NVIDIA DGX Spark vs Mac Studio vs RTX-4080: Ollama Performance Comparison (glukhov​.org). Ollama performance across NVIDIA DGX Spark, Mac Studio, and RTX 4080 for GPT-OSS 120b (65GB MoE model) with CPU/GPU offloading
Building the RoPE operation for Tenstorrent hardware (clehaxze​.tw). RoPE rotation kernel for Tenstorrent GGML backend; on-the-fly trig functions, SFPU tiling, and NeoX variant
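The kernel itself targets Tenstorrent's SFPU, but the underlying rotation is simple; a reference NumPy version of the interleaved-pair formulation for comparison (the NeoX variant mentioned in the post pairs dimension i with i + head_dim/2 instead):

```python
import numpy as np

def rope(x, base=10000.0):
    """Reference rotary position embedding: rotate consecutive dim pairs (2i, 2i+1).

    x: (seq_len, head_dim) with head_dim even.
    """
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]                  # (seq, 1)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = pos * inv_freq                            # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.randn(16, 64)
print(rope(q).shape)  # (16, 64)
```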
🧪 Evaluation & Agents
Beyond Benchmarks: Testing Open-Source LLMs in Multi-Agent Workflows by Ollie Southwell (blog​.scottlogic​.com). Open-source LLMs tested in multi-agent ESG analysis workflows using Infer ESG, LM Studio, and various models (DeepSeek-R1-0528, LFM2, GPT-OSS 20B, Qwen3-30B-A3B, Gemma, GPT-4o) on local and AWS infrastructure
Exploring the multi-dimensional refusal subspace in reasoning models (lesswrong​.com). Explores multi-dimensional refusal subspace in LLMs using DIM, probes, BigBench-style dataset, SSR-inspired harm data, MINCOS, SVD, and weight ablation on Qwen3/12B-14B models
PPO for LLMs: A Guide for Normal People (cameronrwolfe​.substack​.com). Overview of RL basics, policy gradients, VPG, TRPO, PPO, GRPO, GAE, and their application to LLMs with token-level actions and reward models
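The heart of PPO covered in the guide is the clipped surrogate objective; a minimal PyTorch rendering at the token level, assuming log-probs and advantages come from the policy and a GAE estimator:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss, averaged over tokens.

    logp_new:   log-probs of the taken tokens under the current policy
    logp_old:   log-probs under the policy that generated the rollout
    advantages: per-token advantage estimates (e.g. from GAE)
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

logp_old = -torch.rand(8)                       # fake per-token log-probs
logp_new = logp_old + 0.05 * torch.randn(8)
adv = torch.randn(8)
print(ppo_clip_loss(logp_new, logp_old, adv))
```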
AI Agents: The case for Eval Driven Development (sdarchitect​.blog). Eval Driven Development for AI agents using EDD, evals, MCP protocol, A2A, drift, bias, compliance, observability, and continuous testing with Andrew Ng and Hamel Husain quotes
A response to everyone bashing evals (pashpashpash​.substack​.com). Evaluations 2.0: converging benchmarks and RL environments, verifiers as the new meta, Terminal-Bench and Environments Hub
💡 Theory & Perspectives
Note on Beyond the Machine Extras via Frank Chimero (joshbeckman​.org). Generative models over AI, common intelligence, and open development; mentions Holly Herndon, Mat Dryhurst, LLMs, GPT, and symbolic milestones
AI Winter is Coming… Or Is It? (blog.apiad.net). Pragmatic view on AI hype, market correction, and shift to practical AI adoption, with open-source models, frontier AI costs, and JEPA/Neuro-Symbolic AI ideas
Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain (andlukyane​.com). Biologically inspired BDH: graph-based neuron units, Hebbian learning, and Transformer-level performance with GPU efficiency
Patterns and Anti-Patterns for Building with LLMs (marvelousmlops​.io). Seven sins of AI app development with penances, RAG design, context engineering, multi-agent caution, and practical prompts
The continual learning problem (jessylin​.com). Memory layers enable continual learning with sparse updates, outperforming full finetuning and LoRA on TriviaQA and NaturalQuestions
LLMs as Ultra-High-Level Programming Languages (philoserf​.com). Explores LLMs as ultra-high-level programming languages, comparing abstraction, determinism, and non-deterministic execution with SQL, 4GLs, and garbage collection
📚 Academic Research
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents (arxiv:cs). MTraining enables efficient distributed training of ultra-long-context LLMs (up to 512K tokens) via dynamic sparse attention. It achieves up to 6× throughput speedups—vital for scaling generative models
CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation (arxiv:cs). Topoformer augments self-attention with spatial querying and reweighting to induce topographic organization in Transformers. It preserves performance, improves interpretability, and shows alignment with human fMRI language maps
Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference (arxiv:cs). Adamas uses Hadamard transform, bucketization, and compact KV caches to preselect attention candidates, matching full-attention accuracy with much lower compute. It delivers large inference speedups for long-context generative deployment
Head Pursuit: Probing Attention Specialization in Multimodal Transformers (arxiv:cs). Head Pursuit ranks attention heads by concept relevance to expose specialization in multimodal transformers. Targeted edits of as few as 1% of heads can controllably suppress or enhance model concepts
Multi-Agent Evolve: LLM Self-Improve through Co-evolution (arxiv:cs). Multi-Agent Evolve (MAE) instantiates Proposer/Solver/Judge agents from one LLM to self-generate tasks and co-evolve via RL without human labels. It yields measurable reasoning improvements, enabling scalable self-improvement