Topic

#LLM

60 articles on LLM: news, releases, guides and analysis from the SourceFeed engine.

The 1.6-Trillion Parameter Mirage: LongCat 2.0 and the MoE Memory Tax

LongCat 2.0 delivers 48B active parameter performance, but its massive 1.6T total footprint demands a brutal hardware reality check.

Rachel Goldstein

Popping the CPU-GPU Latency Bubble in Inference

Pipelined decoding techniques show that software optimization, not just raw hardware scaling, is the key to maximizing GPU utilization.

Article · 11h ago

Add Semantic Caching to Your LLM App with Redis

Build a Python layer that stores LLM responses by embedding and retrieves them by semantic similarity, so paraphrased questions skip the API entirely.

Tutorial · 15h ago

Qwen 3.6 27B Hits the Local Development Sweet Spot

The dense 27B model delivers frontier-class intelligence on local hardware without the compromises of lightweight mixtures of experts.

Article · 1d ago

How a Database Schema Error Triggered an Expensive AI Retry Storm

When deterministic database failures meet automatic task retries, non-idempotent LLM calls can quietly drain your entire cloud budget.

Article · 1d ago

HackerRank's open ATS scores your résumé by dice roll

The same PDF swings from 66 to 99 across runs, and the reason isn't a bug you can prompt away.

Article · 1d ago

Moving Off the Meter: The Reality of Self-Hosting Production LLMs

Swapping SaaS APIs for local hardware and free cloud tiers eliminates token fees but introduces a steep operational tax.

Article · 3d ago

The Open-Weights Gap Depends on What You Measure

A viral chart predicts open models reach parity by December 2026. Across 18 benchmarks, the honest answer is messier.

Article · 3d ago

Why Your AI Coding Agent Needs a Local Proxy

Local routers like Weave and 9Router cut API bills and bypass the context re-read tax.

Article · 4d ago

GPT-5.6 splits model tiers from version numbers

OpenAI's Sol, Terra and Luna preview behind a US-government gate, with a quiet agentic-safety regression worth watching.

News · 4d ago

The LLM Cost Cliff Your Budget Isn't Ready For

Per-token prices are collapsing, yet AI bills keep exploding. The two facts aren't a contradiction, and confusing them will wreck your business case.

Article · 4d ago

Prompt Injection Is the Least of Your AI Security Problems

Real-world attacks reveal that while frontier models can resist linguistic trickery, your glue code and infrastructure are wide open.

Article · 4d ago

Build a Multi-Agent Research Pipeline with CrewAI and Ollama

Assemble a three-agent CrewAI crew backed by a locally running Llama 3.1 model to autonomously produce structured, cited research reports — no OpenAI key required.

Tutorial · 4d ago

Why Developers are Trading Obsidian for Agent-Native Markdown Wikis

Traditional knowledge bases isolate your notes. A new wave of open-source, CLI-first tools connects your wiki directly to your LLM agents.

Article · 4d ago

The distillation attack no API can fully block

Anthropic's 28.8-million-query accusation against Alibaba exposes an uncomfortable truth: when you sell model outputs, you sell the training data too.

Article · 5d ago

Under the Hood of NeMo AutoModel: High-Performance MoE Fine-Tuning

NVIDIA's new library injects Expert Parallelism and DeepEP into Hugging Face's API, slashing memory use and training times.

Article · 6d ago
