The AI Engineer 02-12-2025
AI's transformative effect on work, political manipulation, and advances in cybersecurity
📣 Headlines
• Three years after ChatGPT's debut, analyses of its global impact on work, education, and local AI ecosystems and the broader AI hype saturating art, media, and industry show the technology moving from novelty to entrenched infrastructure.
• New legal and platform controversies—from an OpenAI court filing that links a teen's suicide to alleged ChatGPT rule violations to Elon Musk's Grokipedia project that rewrites Wikipedia with partisan, extremist-sourced narratives—underscore the risks of unregulated generative AI systems.
• AI is intensifying political manipulation, with AI-funded PACs pouring money into deregulation‑friendly candidates in the U.S. elections while networks of Asia‑based pro‑Trump accounts on X monetize rage‑bait and potential foreign influence operations at scale.
• Leading technologists signal AI will transform—not eliminate—work, as Nvidia's CEO predicts AI will increase productivity but also workloads while MIT researchers argue adoption costs will shift humans toward higher‑expertise tasks rather than wholesale job loss.
• Enterprise AI infrastructure continues to mature, with Kovant pitching itself as an orchestration nerve center for fleets of agentic AI systems in industrial operations and NetApp raising guidance on the back of demand for its AI‑optimized storage platforms and data engine.
• Music labels are moving from litigation to licensing as Warner Music signs a deal allowing opt‑in use of artists' voices on AI song generator Suno, following a settlement that aims to turn the tool into an artist‑friendly, licensed AI music platform.
• Cybersecurity leaders are leaning on AI both as a weapon and a shield, with Clover Security raising major funding to automate vulnerability remediation while IT executives warn that detecting AI‑generated deepfake attacks will be a top priority by 2026.
• Defense and dual‑use AI are under scrutiny as record defense‑tech funding rounds chase autonomous weapons and sensing platforms, even as high‑profile systems from Anduril reportedly stumble in tests and combat, revealing technical and reliability gaps.
🔧 Company Engineering Blogs
Why developers still flock to Python: Guido van Rossum on readability, AI, and the future of programming (github​.blog). Guido van Rossum discusses Python’s readability, its pivotal role in AI, and the language’s evolution and ecosystem
Transformers v5: Simple model definitions powering the AI ecosystem (huggingface​.co). Transformers v5 brings simplified, modular model definitions, training at scale, improved inference, and quantization, with a PyTorch focus and ecosystem interoperability
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures (machinelearning​.apple​.com). Conjugate moment measures and convex potentials to sample and map from a log-concave distribution using input-convex neural networks
Using LLMs to improve Amazon product listings (amazon​.science). LLMs adapt to catalogue structures via prompt tuning to improve attribute quality and multilingual coverage at scale
📰 Conferences & Community
Oxford researchers to explore metrics and fairness at NeurIPS 2025 (oii​.ox​.ac​.uk). Oxford Internet Institute researchers present at NeurIPS 2025 in San Diego, exploring metrics, fairness, multilingual and community-centered AI perspectives with papers on benchmarking, bias mitigation, multilingual LLM judging, and Queer in AI workshop
Announcing the NeurIPS 2025 Best Paper Awards (blog​.neurips​.cc). NeurIPS 2025 best paper awards announced across seven papers spanning diffusion, LLMs, RL, and benchmarking
512KB Club: philna.sh was added or updated (philna​.sh). Phil Nash showcases software development, speaking, and open‑source work, including Langflow at IBM and on-stage coding
🌍 AI Landscape & Theory
Three years on, ChatGPT still isn't what it was cracked up to be – and it probably never will be (garymarcus​.substack​.com). Skeptic argues ChatGPT won’t achieve AGI; discusses hype, ROI, reliability, and need for structured systems
A Brief History of Large Language Models (koenvangilst​.nl). From ChatGPT to AI agents with tools, a user-centric timeline of LLM evolution, reasoning time, retrieval, and end-to-end workflows
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonw​.substack​.com). Opus 4.5, prompt reliability, robustness to prompt injection, and Nano Banana Pro image generation with Grounding via Google Search
LLMs Navigate Knowledge, They Do Not Create (libidosciendi​.substack​.com). Columbia professor Vishal Misra’s formal models argue LLMs navigate knowledge, not create new frameworks, with RAG origins and Bayesian reasoning
Ilya Sutskever – We're moving from the age of scaling to the age of research (dwarkesh​.com). Ilya Sutskever discusses the shift from scaling to research, model generalization, RL efficiency, value functions, pre-training limits, and implications for AI alignment
🤖 Agents & Applied LLMs
Fara-7B: An efficient agentic model for computer use (github​.com). Fara-7B: a 7B agentic LLM for automated computer use, built on Qwen2.5-VL-7B, with a synthetic data pipeline and Playwright-based web automation
Evaluating Answers with Large Language Models: How InferESG and RAGAS Helped by Ana Fonseca (blog​.scottlogic​.com). Open-source and proprietary LLMs evaluated via InferESG and RAGAS for ESG report QA, focusing on Qwen3-30b and GPT-OSS-20b
#528: Python apps with LLM building blocks (talkpython​.fm). Vincent Warmerdam discusses treating LLMs as APIs in Python apps, with caching, validation, and structured outputs using DiskCache, LLM library, Pydantic, and Marimo notebooks
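The "LLMs as APIs" pattern from the episode boils down to never trusting raw model output: parse it, validate it, and only then hand it to the rest of the app. The episode uses Pydantic for this; a stdlib dataclass stands in below to keep the sketch dependency-free (the schema and function names are illustrative, not from the show):

```python
import json
from dataclasses import dataclass

@dataclass
class Summary:
    title: str
    bullet_points: list

def parse_llm_output(raw: str) -> Summary:
    """Treat LLM output as untrusted JSON: parse, then validate shape."""
    data = json.loads(raw)  # raises on malformed JSON
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if not isinstance(data.get("bullet_points"), list):
        raise ValueError("bullet_points must be a list")
    return Summary(title=data["title"], bullet_points=data["bullet_points"])

raw = '{"title": "Weekly digest", "bullet_points": ["item one", "item two"]}'
summary = parse_llm_output(raw)
print(summary.title)  # Weekly digest
```

With Pydantic the validation collapses into a model class, and a cache layer (DiskCache in the episode) sits in front of the LLM call so identical prompts never hit the API twice.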
Building the Knowledge Layer Your Agents Need (thedataexchange​.media). Graph RAG, knowledge graphs, AI agents, and memory with Philip Rathle of Neo4j, exploring production patterns and governance
Community Benchmarks for AI Coding Tools (nesbitt​.io). Discusses community benchmarks for AI coding tools across Ruby, Elixir, Go, Rust; proposes maintainer-defined tests to improve model guidance
🖥️ Local & Offline LLM Builds
So you wanna build a local RAG? (blog​.yakkomajuri​.com). Local RAG setup using Postgres pgvector, Sentence Transformers, llama.cpp LLM, Docling, and open-source benchmarks
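The retrieval half of a local RAG is small enough to sketch. A real build would store sentence-transformers embeddings in Postgres/pgvector as the post describes; here a bag-of-words vector stands in for the embedding model so the example stays dependency-free, and the documents are made up:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a sentence-transformers embedding: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "llama.cpp runs quantized models on CPU",
    "pgvector adds vector similarity search to Postgres",
    "Docling converts PDFs into structured text",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # Rank documents by similarity to the query; top-k become LLM context.
    qv = embed(query)
    ranked = sorted(index, key=lambda de: cosine(qv, de[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

print(retrieve("vector search in Postgres"))
```

Swapping `embed` for a real model and `index` for a pgvector table gives you the shape of the post's setup: embed once at ingest, rank at query time, feed the winners to llama.cpp.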
Building an offline AI stoic chatbot (abishekmuthian​.com). Offline Epictetus chatbot on ARM devices using Gemma-3-270m, fine-tuning with 4-bit QLoRA, Python environments, and JSONL datasets
how prompt caching works - paged attention and prefix caching plus practical tips (sankalp​.bearblog​.dev). Explains vLLM's paged attention and prefix caching, KV-cache reuse across requests, and practical prompts optimization in Python-like terms
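The core idea behind prefix caching is easy to show in miniature: per-token KV state is keyed by the token prefix that produced it, so requests sharing a prefix reuse work instead of recomputing it. This toy (not vLLM's actual code; the class and "KV state" are stand-ins) counts how much per-token work actually happens:

```python
class PrefixCache:
    def __init__(self):
        self.store = {}    # prefix tuple -> fake "KV state"
        self.computed = 0  # counts per-token work actually performed

    def kv_for(self, tokens):
        state = []
        for i in range(len(tokens)):
            prefix = tuple(tokens[: i + 1])
            if prefix not in self.store:
                # Simulate attention work for one new token.
                self.computed += 1
                self.store[prefix] = state + [hash(prefix)]
            state = self.store[prefix]
        return state

cache = PrefixCache()
cache.kv_for(["sys", "You", "are", "helpful", "Q1"])
cache.kv_for(["sys", "You", "are", "helpful", "Q2"])
print(cache.computed)  # 6: four shared prefix tokens computed once, plus Q1 and Q2
```

This is why the post's practical tip is to put the stable part of a prompt (system message, few-shot examples) first: the shared prefix is what gets reused across requests.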
Optimizing Token Generation in llama.cpp's CUDA Backend (am17an​.bearblog​.dev). Optimizing token generation in llama.cpp CUDA backend with kernel fusion and concurrent streams for faster TG workloads
Switching from Ollama to llama-swap + llama.cpp on NixOS: the power user's choice 🦙 (nijho​.lt). Power-user guide to switching from Ollama to llama-swap + llama.cpp on NixOS for multi-GPU, RAM offloading, and declarative model management
🧪 Alignment & Emergent Behaviors
Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos (arxiv​.org). Diffusion models for images show emergent temporal propagation in videos, analyzed with CV techniques and diffusion-based pipelines
Circuit discovery through chain of thought using policy gradients (lesswrong​.com). Policy gradients and integrated gradients enable tracing chain-of-thought in RL-augmented circuit discovery for LLMs, using subgraph z variables to patch edges
Training Models to Detect Activation Steering: Results and Implications (lesswrong​.com). Fine-tuned deeper-llm to detect activation steering; 85% accuracy on unseen concepts; discusses Hawthorne effect, RLHF suppression, and introspection in AI models
How to Explore to Scale RL Training of LLMs on Hard Problems? (blog​.ml​.cmu​.edu). Explores POPE: guided on-policy exploration for RL scaling on hard LLM problems using offline human guidance and limited transfer effects
Even superhuman AI forecasters are only as good as your questions (newsletter​.danielpaleka​.com). Explores RL on forecasting with LLMs, questions for decision-making, and challenges in building superhuman forecasters
🔩 LLM Internals & Optimization
Verifying LLM Inference with Token-DiFR (adamkarvonen​.github​.io). Token-DiFR verifies LLM inference by measuring divergence from a reference with a shared seed, enabling detection of quantization, sampling bugs, and tampering
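The verification idea is simple to illustrate: two inference stacks sharing a sampling seed should emit identical tokens, so any divergence rate above zero flags quantization drift, sampling bugs, or tampering. A toy version (names and the fake "samplers" are illustrative, not Token-DiFR's implementation):

```python
import random

def sample_tokens(token_fn, seed, n):
    # Deterministic sampling: same seed + same logits => same tokens.
    rng = random.Random(seed)
    return [token_fn(i, rng) for i in range(n)]

def divergence_rate(ref, other):
    mismatches = sum(1 for a, b in zip(ref, other) if a != b)
    return mismatches / len(ref)

def faithful_fn(i, rng):
    return rng.randrange(100)

def tampered_fn(i, rng):
    t = rng.randrange(100)          # consume the rng so streams stay aligned
    return 0 if i % 10 == 0 else t  # but override every 10th token

reference = sample_tokens(faithful_fn, seed=42, n=1000)
faithful  = sample_tokens(faithful_fn, seed=42, n=1000)
tampered  = sample_tokens(tampered_fn, seed=42, n=1000)

print(divergence_rate(reference, faithful))  # 0.0
print(divergence_rate(reference, tampered))  # nonzero: tampering is visible
```

The real method measures divergence against a trusted reference run, which is what lets it distinguish benign numerical noise from a provider silently serving a different model.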
The Q, K, V Matrices (arpitbhayani​.me). How Q, K, V projections drive self-attention with Python numpy examples and dimension choices
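The mechanism the article walks through fits in a few lines of numpy: project the input into query, key, and value spaces, score queries against keys, softmax, and mix the values. A minimal single-head sketch (dimensions chosen for illustration, not taken from the article):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the input into query, key, and value spaces.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product scores: how much each token attends to each other.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted mix of value vectors.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of value vectors, which is why the choice of d_k (and the sqrt scaling) matters for keeping the softmax from saturating.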
Why We’ve Been Optimizing the Wrong Thing in LLMs for Years (towardsdatascience​.com). Shift to Multi-Token Prediction (MTP) in LLMs enables foresight, faster inference, and improved reasoning with parallel token prediction
KV Cache Optimization via Tensor Product Attention (pyimagesearch​.com). Tensor Product Attention (TPA) reduces KV cache size for LLM inference using low-rank factorization and RoPE integration
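A back-of-envelope sketch shows where the saving comes from: instead of caching a full (seq_len, d) key matrix per head, cache rank-r factors. The exact factorization in TPA differs (it factorizes per-token and integrates RoPE); the numbers below are illustrative only:

```python
import numpy as np

seq_len, d, r = 4096, 128, 16
full_entries = seq_len * d              # full K cache entries per head
factored_entries = seq_len * r + r * d  # factors: (seq_len, r) and (r, d)
print(full_entries, factored_entries)   # 524288 vs 67584, ~7.8x smaller

# Reconstructing an approximate K from the factors is a single matmul:
rng = np.random.default_rng(0)
A = rng.normal(size=(seq_len, r))
B = rng.normal(size=(r, d))
K_approx = A @ B
print(K_approx.shape)  # (4096, 128)
```

The trade is a small reconstruction matmul per attention call in exchange for a cache that scales with r rather than d, which is what makes longer contexts fit in memory.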
📚 Academic Research
LFM2 Technical Report (arxiv:cs). LFM2 releases compact open-weight models optimized for fast CPU and edge inference. Engineers gain multimodal, speech and retrieval variants with support for ExecuTorch, llama.cpp, vLLM
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation (arxiv:cs). MapReduce LoRA and RaTE train separate preference experts then merge them, enabling generative models to optimize multiple reward dimensions without alignment tax. Improves cross-modal alignment
Latent Collaboration in Multi-Agent Systems (arxiv:cs). LatentMAS lets multiple LLM agents communicate via shared hidden-state memory instead of natural language, preserving information while cutting token usage and latency. Boosts multi-domain reasoning
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models (arxiv:cs). HSA-UltraLong combines sparse attention with sliding windows to handle contexts up to sixteen million tokens while matching full-attention performance on normal lengths. Enables ultra-long RAG
Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs (arxiv:cs). Behavior-Equivalent Tokens compress long natural-language system prompts into a single learned token that reproduces almost identical downstream behavior. Dramatically cuts inference cost and context usage
👋 Before you go...
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month.
If you're getting value from Blaze, checking it out would mean the absolute world. And if you can't contribute, no worries: the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.