The AI Engineer 28-10-2025
OpenAI's Atlas browser gets mixed reviews, Google advances 'vibe coding', and a call for AI superintelligence safety.
📣 Headlines
• OpenAI launched the Atlas browser built around ChatGPT with an agent mode for paid users, but early hands-on coverage questioned its real-world value and suggested it may primarily serve OpenAI's data-collection aims.
• Google updated AI Studio with 'vibe coding' natural-language prompts, secret variables, visual UI editing, and one-click Cloud Run deploys, signaling a push toward low-friction LLM app dev workflows.
• Anchor raised $6M to launch b0.dev, a cloud-based agentic browser automation platform with planning-first execution, targeting headless web workflows powered by LLM planners.
• Uniphore secured $260M from Nvidia, Snowflake, Databricks, AMD and others to scale its Business AI Cloud for enterprise data, models, guards, and agent orchestration.
• Pegasystems reported growing earnings and ACV as its agentic AI tools like Pega Blueprint and GenAI Blueprint gain traction, underlining demand for orchestrated enterprise agents.
• A coalition of public figures urged a [prohibition on developing AI superintelligence until robust safety and public buy-in exist](https://futurism.com/artificial-intelligence/steve-wozniak-steve-bannon-letter-ai-superintelligence), a call echoed by additional signatories including Harry and Meghan.
• China unveiled a wind-powered, offshore underwater data center near Shanghai with ocean cooling and 24 MW capacity, with further details on seabed modules and reduced land/water usage highlighted in additional reporting.
• MIT researchers advanced neuromorphic, magnesium-based ionic devices for brain-inspired, energy-efficient AI hardware, pointing toward lower-power alternatives to conventional accelerators.
🔧 Company Engineering Blogs
Inside the AIPCon 8 Demos Transforming Manufacturing, Insurance, and Construction (blog.palantir.com). Four Palantir AIPCon 8 demos—Ursa Major, Epirus, Acrisure, and Thomas Cavanagh—show production-ready AI transforming MES, MRP speed, underwriting, and central operating systems
Scaling Privacy Infrastructure for GenAI Product Innovation (engineering​.fb​.com). Meta scales Privacy Aware Infrastructure (PAI) for GenAI with data lineage, policy enforcement, and on-device/off-cloud processing using PrivacyLib in AI glasses
Multi-Table Predictions in Data Cloud: Enabling Machine Learning Across Related Data Objects (engineering​.salesforce​.com). Multi-DMO in Data Cloud enables cross-object predictions with UI/UX refinements, SQL performance tuning, and 99.999% reliability
The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model (github​.blog). GitHub Copilot’s new custom model boosts code completions with 20% more accepted characters and faster throughput
Streaming datasets: 100x More Efficient (huggingface​.co). Streaming datasets enable 100x faster data loading with a simple streaming API, dedupe storage (Xet), Parquet CDC, and scalable pipelines
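For the streaming-datasets entry above, a minimal sketch of the 🤗 Datasets streaming API; the dataset name and column are just examples, not taken from the post:

```python
from datasets import load_dataset

# Stream a dataset instead of downloading it first; rows arrive lazily as you iterate.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Streaming datasets support lazy shuffle/map/filter before iteration.
ds = ds.shuffle(seed=42, buffer_size=10_000)

for i, example in enumerate(ds):
    print(example["text"][:80])
    if i == 2:
        break
```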
🧠 Memory & Context
LLM Hallucinations in Practical Code Generation — Phenomena, Mechanism, and Mitigation (gwolf​.org). Six LLMs evaluated on Python code generation from CoderEval, taxonomy of hallucinations, root causes, and RAG mitigation
The Memory Problem: Why LLMs Sometimes Forget Your Conversation (blog​.bytebytego​.com). Context windows and RAG limit LLM memory; tokens, attention, and stateless design explained with examples and GPU constraints
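To make the context-window limit concrete, a small sketch that counts tokens with tiktoken and truncates chat history to fit a budget; the encoding name and budget are assumptions, and real systems summarize or retrieve rather than simply drop old turns:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI chat models

def fit_history(messages, budget=8000):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):              # walk from newest to oldest
        n = len(enc.encode(msg["content"]))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))

history = [{"role": "user", "content": "Summarize our earlier discussion."}]
print(len(fit_history(history)))
```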
Making sense of KV Cache optimizations, Ep. 2: Token-level (zansara​.dev). Token-level KV cache optimizations including selection, budget allocation, merging, quantization, and low-rank decomposition techniques
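As a toy illustration of token-level selection, the sketch below keeps only the cached positions that have received the most attention mass (a simplified, H2O-style heuristic, not the article's exact method):

```python
import torch

def evict_kv(keys, values, attn_weights, budget):
    """Keep the `budget` cached positions that received the most attention.

    keys, values:  (seq_len, head_dim) cached tensors for one head
    attn_weights:  (num_queries, seq_len) attention probabilities observed so far
    """
    scores = attn_weights.sum(dim=0)                  # total attention each position received
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices.sort().values
    return keys[keep], values[keep]

k = torch.randn(128, 64)
v = torch.randn(128, 64)
attn = torch.softmax(torch.randn(16, 128), dim=-1)
k_small, v_small = evict_kv(k, v, attn, budget=32)
print(k_small.shape)  # torch.Size([32, 64])
```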
How does prompt caching work? (zansara​.dev). KV caching in decoder-only Transformers speeds autoregressive generation by reusing past keys and values
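A minimal demonstration of the KV-cache reuse described above, using Hugging Face transformers' `past_key_values` with GPT-2 (any decoder-only model works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = tok("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    # First pass: compute and keep the keys/values for the whole prompt.
    out = model(**prompt, use_cache=True)
    past = out.past_key_values

    # Next step: feed only the newly chosen token; cached K/V are reused,
    # so the prompt is never re-processed.
    next_token = out.logits[:, -1:].argmax(dim=-1)
    out = model(input_ids=next_token, past_key_values=past, use_cache=True)

print(out.logits.shape)  # (1, 1, vocab_size): only the new position is scored
```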
Meta’s new free transformer (kiledjian​.com). Meta's Free Transformer introduces a latent variable layer enabling working memory for improved planning in generation
📚 Retrieval & RAG
Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI (towardsdatascience​.com). Context engineering and semantic layers govern data, enabling agentic AI with GraphRAG, knowledge graphs, and memory tools
Building an Open-Source RAG System (elimbi​.com). Self-hosted RAG workflow using Puppeteer/Cheerio, site detection, adaptive chunking, local embeddings with Xenova MiniLM, vector search, and OpenRouter for Llama-3.2-3B-instruct
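The post's pipeline is JavaScript (Puppeteer/Cheerio, Xenova MiniLM); as a rough Python analogue of its embed-and-retrieve core, assuming sentence-transformers and a plain NumPy cosine search:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # same family as Xenova's MiniLM ONNX port

def chunk(text, size=500, overlap=100):
    """Naive fixed-size character chunking; the article does adaptive chunking per site."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

docs = chunk(open("page.txt").read())            # assumed input file with scraped page text
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query, k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                        # cosine similarity (vectors are unit-norm)
    return [docs[i] for i in np.argsort(-scores)[:k]]

context = "\n\n".join(retrieve("What does the site say about pricing?"))
# `context` would then be sent to an instruct model (the post uses Llama-3.2-3B via OpenRouter).
```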
How I’m Building a Context-Aware Retriever to Boost RAG Quality (Part 1: Introduction) (egpivo​.github​.io). Context-aware retriever for RAG: MCP server, keyword/semantic fusion, reflection, expansion, and reranking with FastMCP, jina reranker, and query expander
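One common way to implement the keyword/semantic fusion step mentioned above is reciprocal rank fusion; a generic sketch, not necessarily the author's exact formula:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids (e.g. BM25 order and embedding-search order).

    Each list contributes 1 / (k + rank) per document; k=60 is the usual default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]      # e.g. BM25 order
semantic_hits = ["doc1", "doc9", "doc3"]     # e.g. embedding-similarity order
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7'] -- docs ranked well by both lists float to the top
```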
“We’re Good at Search”… Just Not the Kind That the AI era Demands - a Provocation (aarontay​.substack​.com). Librarians confront AI-powered search: relevancy, retrieval, evaluation, and the shift toward vector embeddings and retrieval augmented generation
Creating an OpenAI Vector Store for RAG Systems (jamesmccaffreyblog​.com). OpenAI vector store demo for RAG using the Responses API and vector stores, with sample Python code to create and delete a store
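The shape of the create/delete calls, as a hedged sketch against the current openai-python SDK (in older SDK versions the same methods sit under `client.beta.vector_stores`):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create a vector store to back file_search in the Responses API.
store = client.vector_stores.create(name="rag-demo-store")
print(store.id)

# ... upload files and attach them to the store here ...

# Clean up when the demo is done.
client.vector_stores.delete(store.id)
```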
⚙️ Inference & Systems
Accelerating Hybrid Inference in SGLang with KTransformers CPU Kernels (lmsys​.org). KTransformers CPU kernels, AMX/AVX-512, NUMA-aware tensor parallelism, CUDA Graphs, Expert Deferral, SGLang integration for MoE hybrid inference
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch (gilesthomas​.com). Retro RNNs in PyTorch: TBPTT, NextByteDataset, batchify, and training loop with Karpathy-inspired LSTM
lecture nine (aarnphm​.xyz). Lecture nine covers PyTorch and JAX, including NumPy+Autograd, JAX jit, memory handling, static_argnums, and normalization techniques
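A small illustration of the `static_argnums` point from the lecture notes: arguments that determine output shapes (like `k` below) must be marked static so `jax.jit` compiles a specialization per value instead of failing:

```python
from functools import partial

import jax
import jax.numpy as jnp

@partial(jax.jit, static_argnums=(1,))
def top_k_mean(x, k):
    # k fixes an output shape, so it must be a compile-time constant under jit.
    values, _ = jax.lax.top_k(x, k)
    return values.mean()

x = jnp.arange(10.0)
print(top_k_mean(x, 3))   # mean of the 3 largest values -> 8.0
print(top_k_mean(x, 5))   # triggers a second compilation for k=5 -> 7.0
```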
Building Gemma3 from scratch in Rust (lucas-montes​.com). Rust-based recreation of Gemma3 components: Linear, TransformerBlock, GQA, RMSNorm, RoPE, masking, and Safetensors loading
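The post builds these blocks in Rust; for reference, RMSNorm (one of the listed components) is only a few lines, shown here as a PyTorch sketch rather than the author's Rust (Gemma's own variant additionally scales by 1 + weight):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square LayerNorm as used in Gemma/LLaMA-style blocks (no mean-centering)."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 5, 64)
print(RMSNorm(64)(x).shape)  # torch.Size([2, 5, 64])
```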
NVIDIA DGX Spark vs Mac Studio vs RTX-4080: Ollama Performance Comparison (glukhov​.org). Ollama performance across NVIDIA DGX Spark, Mac Studio, and RTX 4080 for GPT-OSS 120b (65GB MoE model) with CPU/GPU offloading
Building the RoPE operation for Tenstorrent hardware (clehaxze​.tw). RoPE rotation kernel for Tenstorrent GGML backend; on-the-fly trig functions, SFPU tiling, and NeoX variant
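The kernel itself targets Tenstorrent's SFPU, but the underlying rotation is simple; a reference NumPy version of the interleaved-pair formulation for comparison (the NeoX variant mentioned in the post pairs dimension i with i + head_dim/2 instead):

```python
import numpy as np

def rope(x, base=10000.0):
    """Reference rotary position embedding: rotate consecutive dim pairs (2i, 2i+1).

    x: (seq_len, head_dim) with head_dim even.
    """
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]                  # (seq, 1)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = pos * inv_freq                            # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.randn(16, 64)
print(rope(q).shape)  # (16, 64)
```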
🧪 Evaluation & Agents
Beyond Benchmarks: Testing Open-Source LLMs in Multi-Agent Workflows by Ollie Southwell (blog​.scottlogic​.com). Open-source LLMs tested in multi-agent ESG analysis workflows using Infer ESG, LM Studio, and various models (DeepSeek-R1-0528, LFM2, GPT-OSS 20B, Qwen3-30B-A3B, Gemma, GPT-4o) on local and AWS infrastructure
Exploring the multi-dimensional refusal subspace in reasoning models (lesswrong​.com). Explores multi-dimensional refusal subspace in LLMs using DIM, probes, BigBench-style dataset, SSR-inspired harm data, MINCOS, SVD, and weight ablation on Qwen3/12B-14B models
PPO for LLMs: A Guide for Normal People (cameronrwolfe​.substack​.com). Overview of RL basics, policy gradients, VPG, TRPO, PPO, GRPO, GAE, and their application to LLMs with token-level actions and reward models
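The heart of PPO covered in the guide is the clipped surrogate objective; a minimal PyTorch rendering at the token level, assuming log-probs and advantages come from the policy and a GAE estimator:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss, averaged over tokens.

    logp_new:   log-probs of the taken tokens under the current policy
    logp_old:   log-probs under the policy that generated the rollout
    advantages: per-token advantage estimates (e.g. from GAE)
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

logp_old = -torch.rand(8)                       # fake per-token log-probs
logp_new = logp_old + 0.05 * torch.randn(8)
adv = torch.randn(8)
print(ppo_clip_loss(logp_new, logp_old, adv))
```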
AI Agents: The case for Eval Driven Development (sdarchitect​.blog). Eval Driven Development for AI agents using EDD, evals, MCP protocol, A2A, drift, bias, compliance, observability, and continuous testing with Andrew Ng and Hamel Husain quotes
A response to everyone bashing evals (pashpashpash​.substack​.com). Evaluations 2.0: converging benchmarks and RL environments, verifiers as the new meta, Terminal-Bench and Environments Hub
💡 Theory & Perspectives
Note on Beyond the Machine Extras via Frank Chimero (joshbeckman​.org). Generative models over AI, common intelligence, and open development; mentions Holly Herndon, Mat Dryhurst, LLMs, GPT, and symbolic milestones
AI Winter is Coming… Or Is It? (blog.apiad.net). Pragmatic view on AI hype, market correction, and shift to practical AI adoption, with open-source models, frontier AI costs, and JEPA/Neuro-Symbolic AI ideas
Paper Review: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain (andlukyane​.com). Biologically inspired BDH: graph-based neuron units, Hebbian learning, and Transformer-level performance with GPU efficiency
Patterns and Anti-Patterns for Building with LLMs (marvelousmlops​.io). Seven sins of AI app development with penances, RAG design, context engineering, multi-agent caution, and practical prompts
The continual learning problem (jessylin​.com). Memory layers enable continual learning with sparse updates, outperforming full finetuning and LoRA on TriviaQA and NaturalQuestions
LLMs as Ultra-High-Level Programming Languages (philoserf​.com). Explores LLMs as ultra-high-level programming languages, comparing abstraction, determinism, and non-deterministic execution with SQL, 4GLs, and garbage collection
📚 Academic Research
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents (arxiv:cs). MTraining enables efficient distributed training of ultra-long-context LLMs (up to 512K tokens) via dynamic sparse attention. It achieves up to 6× throughput speedups—vital for scaling generative models
CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation (arxiv:cs). Topoformer augments self-attention with spatial querying and reweighting to induce topographic organization in Transformers. It preserves performance, improves interpretability, and shows alignment with human fMRI language maps
Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference (arxiv:cs). Adamas uses Hadamard transform, bucketization, and compact KV caches to preselect attention candidates, matching full-attention accuracy with much lower compute. It delivers large inference speedups for long-context generative deployment
Head Pursuit: Probing Attention Specialization in Multimodal Transformers (arxiv:cs). Head Pursuit ranks attention heads by concept relevance to expose specialization in multimodal transformers. Targeted edits of as few as 1% of heads can controllably suppress or enhance model concepts
Multi-Agent Evolve: LLM Self-Improve through Co-evolution (arxiv:cs). Multi-Agent Evolve (MAE) instantiates Proposer/Solver/Judge agents from one LLM to self-generate tasks and co-evolve via RL without human labels. It yields measurable reasoning improvements, enabling scalable self-improvement