The AI Engineer 02-12-2025
AI's transformative effect on work, political manipulation, and advances in cybersecurity
📣 Headlines
• Three years after ChatGPT's debut, analyses of its global impact on work, education, and local AI ecosystems and the broader AI hype saturating art, media, and industry show the technology moving from novelty to entrenched infrastructure.
• New legal and platform controversies—from an OpenAI court filing that links a teen's suicide to alleged ChatGPT rule violations to Elon Musk's Grokipedia project that rewrites Wikipedia with partisan, extremist-sourced narratives—underscore the risks of unregulated generative AI systems.
• AI is intensifying political manipulation, with AI-funded PACs pouring money into deregulation‑friendly candidates in the U.S. elections while networks of Asia‑based pro‑Trump accounts on X monetize rage‑bait and potential foreign influence operations at scale.
• Leading technologists signal AI will transform—not eliminate—work, as Nvidia's CEO predicts AI will increase productivity but also workloads while MIT researchers argue adoption costs will shift humans toward higher‑expertise tasks rather than wholesale job loss.
• Enterprise AI infrastructure continues to mature, with Kovant pitching itself as an orchestration nerve center for fleets of agentic AI systems in industrial operations and NetApp raising guidance on the back of demand for its AI‑optimized storage platforms and data engine.
• Music labels are moving from litigation to licensing as Warner Music signs a deal allowing opt‑in use of artists' voices on AI song generator Suno, following a settlement that aims to turn the tool into an artist‑friendly, licensed AI music platform.
• Cybersecurity leaders are leaning on AI both as a weapon and a shield, with Clover Security raising major funding to automate vulnerability remediation while IT executives warn that detecting AI‑generated deepfake attacks will be a top priority by 2026.
• Defense and dual‑use AI are under scrutiny as record defense‑tech funding rounds chase autonomous weapons and sensing platforms, even as high‑profile systems from Anduril reportedly stumble in tests and combat, revealing technical and reliability gaps.
🔧 Company Engineering Blogs
Why developers still flock to Python: Guido van Rossum on readability, AI, and the future of programming (github​.blog). Guido van Rossum discusses Python’s readability, its pivotal role in AI, and the language’s evolution and ecosystem
Transformers v5: Simple model definitions powering the AI ecosystem (huggingface​.co). Transformers v5 brings simplified, modular model definitions, training at scale, improved inference, and quantization, with a PyTorch focus and ecosystem interoperability
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures (machinelearning​.apple​.com). Conjugate moment measures and convex potentials to sample and map from a log-concave distribution using input-convex neural networks
Using LLMs to improve Amazon product listings (amazon​.science). LLMs adapt to catalogue structures via prompt tuning to improve attribute quality and multilingual coverage at scale
📰 Conferences & Community
Oxford researchers to explore metrics and fairness at NeurIPS 2025 (oii​.ox​.ac​.uk). Oxford Internet Institute researchers present at NeurIPS 2025 in San Diego, exploring metrics, fairness, multilingual and community-centered AI perspectives with papers on benchmarking, bias mitigation, multilingual LLM judging, and Queer in AI workshop
Announcing the NeurIPS 2025 Best Paper Awards (blog​.neurips​.cc). NeurIPS 2025 best paper awards announced across seven papers spanning diffusion, LLMs, RL, and benchmarking
512KB Club: philna.sh was added or updated (philna​.sh). Phil Nash showcases software development, speaking, and open‑source work, including Langflow at IBM and on-stage coding
🌍 AI Landscape & Theory
Three years on, ChatGPT still isn't what it was cracked up to be – and it probably never will be (garymarcus​.substack​.com). Skeptic argues ChatGPT won’t achieve AGI; discusses hype, ROI, reliability, and need for structured systems
A Brief History of Large Language Models (koenvangilst​.nl). From ChatGPT to AI agents with tools, a user-centric timeline of LLM evolution, reasoning time, retrieval, and end-to-end workflows
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonw​.substack​.com). Opus 4.5, prompt reliability, robustness to prompt injection, and Nano Banana Pro image generation with Grounding via Google Search
LLMs Navigate Knowledge, They Do Not Create (libidosciendi​.substack​.com). Columbia professor Vishal Misra’s formal models argue LLMs navigate knowledge, not create new frameworks, with RAG origins and Bayesian reasoning
Ilya Sutskever – We're moving from the age of scaling to the age of research (dwarkesh​.com). Ilya Sutskever discusses the shift from scaling to research, model generalization, RL efficiency, value functions, pre-training limits, and implications for AI alignment
🤖 Agents & Applied LLMs
Fara-7B: An efficient agentic model for computer use (github​.com). Fara-7B: a 7B agentic LLM for automated computer use, built on Qwen2.5-VL-7B, with a synthetic data pipeline and Playwright-based web automation
Evaluating Answers with Large Language Models: How InferESG and RAGAS Helped by Ana Fonseca (blog​.scottlogic​.com). Open-source and proprietary LLMs evaluated via InferESG and RAGAS for ESG report QA, focusing on Qwen3-30b and GPT-OSS-20b
#528: Python apps with LLM building blocks (talkpython​.fm). Vincent Warmerdam discusses treating LLMs as APIs in Python apps, with caching, validation, and structured outputs using DiskCache, LLM library, Pydantic, and Marimo notebooks
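The "LLMs as APIs" pattern from the episode boils down to never trusting raw model output: parse it, validate it, and only then hand it to the rest of the app. The episode uses Pydantic for this; a stdlib dataclass stands in below to keep the sketch dependency-free (the schema and function names are illustrative, not from the show):

```python
import json
from dataclasses import dataclass

@dataclass
class Summary:
    title: str
    bullet_points: list

def parse_llm_output(raw: str) -> Summary:
    """Treat LLM output as untrusted JSON: parse, then validate shape."""
    data = json.loads(raw)  # raises on malformed JSON
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if not isinstance(data.get("bullet_points"), list):
        raise ValueError("bullet_points must be a list")
    return Summary(title=data["title"], bullet_points=data["bullet_points"])

raw = '{"title": "Weekly digest", "bullet_points": ["item one", "item two"]}'
summary = parse_llm_output(raw)
print(summary.title)  # Weekly digest
```

With Pydantic the validation collapses into a model class, and a cache layer (DiskCache in the episode) sits in front of the LLM call so identical prompts never hit the API twice.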
Building the Knowledge Layer Your Agents Need (thedataexchange​.media). Graph RAG, knowledge graphs, AI agents, and memory with Philip Rathle of Neo4j, exploring production patterns and governance
Community Benchmarks for AI Coding Tools (nesbitt​.io). Discusses community benchmarks for AI coding tools across Ruby, Elixir, Go, Rust; proposes maintainer-defined tests to improve model guidance
🖥️ Local & Offline LLM Builds
So you wanna build a local RAG? (blog​.yakkomajuri​.com). Local RAG setup using Postgres pgvector, Sentence Transformers, llama.cpp LLM, Docling, and open-source benchmarks
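The retrieval half of a local RAG is small enough to sketch. A real build would store sentence-transformers embeddings in Postgres/pgvector as the post describes; here a bag-of-words vector stands in for the embedding model so the example stays dependency-free, and the documents are made up:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a sentence-transformers embedding: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "llama.cpp runs quantized models on CPU",
    "pgvector adds vector similarity search to Postgres",
    "Docling converts PDFs into structured text",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # Rank documents by similarity to the query; top-k become LLM context.
    qv = embed(query)
    ranked = sorted(index, key=lambda de: cosine(qv, de[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

print(retrieve("vector search in Postgres"))
```

Swapping `embed` for a real model and `index` for a pgvector table gives you the shape of the post's setup: embed once at ingest, rank at query time, feed the winners to llama.cpp.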
Building an offline AI stoic chatbot (abishekmuthian​.com). Offline Epictetus chatbot on ARM devices using Gemma-3-270m, fine-tuning with 4-bit QLoRA, Python environments, and JSONL datasets
how prompt caching works - paged attention and prefix caching plus practical tips (sankalp​.bearblog​.dev). Explains vLLM's paged attention and prefix caching, KV-cache reuse across requests, and practical prompts optimization in Python-like terms
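The core idea behind prefix caching is easy to show in miniature: per-token KV state is keyed by the token prefix that produced it, so requests sharing a prefix reuse work instead of recomputing it. This toy (not vLLM's actual code; the class and "KV state" are stand-ins) counts how much per-token work actually happens:

```python
class PrefixCache:
    def __init__(self):
        self.store = {}    # prefix tuple -> fake "KV state"
        self.computed = 0  # counts per-token work actually performed

    def kv_for(self, tokens):
        state = []
        for i in range(len(tokens)):
            prefix = tuple(tokens[: i + 1])
            if prefix not in self.store:
                # Simulate attention work for one new token.
                self.computed += 1
                self.store[prefix] = state + [hash(prefix)]
            state = self.store[prefix]
        return state

cache = PrefixCache()
cache.kv_for(["sys", "You", "are", "helpful", "Q1"])
cache.kv_for(["sys", "You", "are", "helpful", "Q2"])
print(cache.computed)  # 6: four shared prefix tokens computed once, plus Q1 and Q2
```

This is why the post's practical tip is to put the stable part of a prompt (system message, few-shot examples) first: the shared prefix is what gets reused across requests.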
Optimizing Token Generation in llama.cpp's CUDA Backend (am17an​.bearblog​.dev). Optimizing token generation in llama.cpp CUDA backend with kernel fusion and concurrent streams for faster TG workloads
Switching from Ollama to llama-swap + llama.cpp on NixOS: the power user's choice 🦙 (nijho​.lt). Power-user guide to switching from Ollama to llama-swap + llama.cpp on NixOS for multi-GPU, RAM offloading, and declarative model management
🧪 Alignment & Emergent Behaviors
Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos (arxiv​.org). Diffusion models for images show emergent temporal propagation in videos, analyzed with CV techniques and diffusion-based pipelines
Circuit discovery through chain of thought using policy gradients (lesswrong​.com). Policy gradients and integrated gradients enable tracing chain-of-thought in RL-augmented circuit discovery for LLMs, using subgraph z variables to patch edges
Training Models to Detect Activation Steering: Results and Implications (lesswrong​.com). Fine-tuned deeper-llm to detect activation steering; 85% accuracy on unseen concepts; discusses Hawthorne effect, RLHF suppression, and introspection in AI models
How to Explore to Scale RL Training of LLMs on Hard Problems? (blog​.ml​.cmu​.edu). Explores POPE: guided on-policy exploration for RL scaling on hard LLM problems using offline human guidance and limited transfer effects
Even superhuman AI forecasters are only as good as your questions (newsletter​.danielpaleka​.com). Explores RL on forecasting with LLMs, questions for decision-making, and challenges in building superhuman forecasters
🔩 LLM Internals & Optimization
Verifying LLM Inference with Token-DiFR (adamkarvonen​.github​.io). Token-DiFR verifies LLM inference by measuring divergence from a reference with a shared seed, enabling detection of quantization, sampling bugs, and tampering
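The verification idea is simple to illustrate: two inference stacks sharing a sampling seed should emit identical tokens, so any divergence rate above zero flags quantization drift, sampling bugs, or tampering. A toy version (names and the fake "samplers" are illustrative, not Token-DiFR's implementation):

```python
import random

def sample_tokens(token_fn, seed, n):
    # Deterministic sampling: same seed + same logits => same tokens.
    rng = random.Random(seed)
    return [token_fn(i, rng) for i in range(n)]

def divergence_rate(ref, other):
    mismatches = sum(1 for a, b in zip(ref, other) if a != b)
    return mismatches / len(ref)

def faithful_fn(i, rng):
    return rng.randrange(100)

def tampered_fn(i, rng):
    t = rng.randrange(100)          # consume the rng so streams stay aligned
    return 0 if i % 10 == 0 else t  # but override every 10th token

reference = sample_tokens(faithful_fn, seed=42, n=1000)
faithful  = sample_tokens(faithful_fn, seed=42, n=1000)
tampered  = sample_tokens(tampered_fn, seed=42, n=1000)

print(divergence_rate(reference, faithful))  # 0.0
print(divergence_rate(reference, tampered))  # nonzero: tampering is visible
```

The real method measures divergence against a trusted reference run, which is what lets it distinguish benign numerical noise from a provider silently serving a different model.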
The Q, K, V Matrices (arpitbhayani​.me). How Q, K, V projections drive self-attention with Python numpy examples and dimension choices
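The mechanism the article walks through fits in a few lines of numpy: project the input into query, key, and value spaces, score queries against keys, softmax, and mix the values. A minimal single-head sketch (dimensions chosen for illustration, not taken from the article):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the input into query, key, and value spaces.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product scores: how much each token attends to each other.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted mix of value vectors.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of value vectors, which is why the choice of d_k (and the sqrt scaling) matters for keeping the softmax from saturating.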
Why We’ve Been Optimizing the Wrong Thing in LLMs for Years (towardsdatascience​.com). Shift to Multi-Token Prediction (MTP) in LLMs enables foresight, faster inference, and improved reasoning with parallel token prediction
KV Cache Optimization via Tensor Product Attention (pyimagesearch​.com). Tensor Product Attention (TPA) reduces KV cache size for LLM inference using low-rank factorization and RoPE integration
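A back-of-envelope sketch shows where the saving comes from: instead of caching a full (seq_len, d) key matrix per head, cache rank-r factors. The exact factorization in TPA differs (it factorizes per-token and integrates RoPE); the numbers below are illustrative only:

```python
import numpy as np

seq_len, d, r = 4096, 128, 16
full_entries = seq_len * d              # full K cache entries per head
factored_entries = seq_len * r + r * d  # factors: (seq_len, r) and (r, d)
print(full_entries, factored_entries)   # 524288 vs 67584, ~7.8x smaller

# Reconstructing an approximate K from the factors is a single matmul:
rng = np.random.default_rng(0)
A = rng.normal(size=(seq_len, r))
B = rng.normal(size=(r, d))
K_approx = A @ B
print(K_approx.shape)  # (4096, 128)
```

The trade is a small reconstruction matmul per attention call in exchange for a cache that scales with r rather than d, which is what makes longer contexts fit in memory.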
📚 Academic Research
LFM2 Technical Report (arxiv:cs). LFM2 releases compact open-weight models optimized for fast CPU and edge inference. Engineers gain multimodal, speech and retrieval variants with support for ExecuTorch, llama.cpp, vLLM
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation (arxiv:cs). MapReduce LoRA and RaTE train separate preference experts then merge them, enabling generative models to optimize multiple reward dimensions without alignment tax. Improves cross-modal alignment
Latent Collaboration in Multi-Agent Systems (arxiv:cs). LatentMAS lets multiple LLM agents communicate via shared hidden-state memory instead of natural language, preserving information while cutting token usage and latency. Boosts multi-domain reasoning
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models (arxiv:cs). HSA-UltraLong combines sparse attention with sliding windows to handle contexts up to sixteen million tokens while matching full-attention performance on normal lengths. Enables ultra-long RAG
Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs (arxiv:cs). Behavior-Equivalent Tokens compress long natural-language system prompts into a single learned token that reproduces almost identical downstream behavior. Dramatically cuts inference cost and context usage
👋 Before you go...
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month.
If you're getting value from Blaze, checking it out would mean the absolute world. And if you can't contribute, no worries: the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.