Topic

#Inference

9 articles on Inference — news, releases, guides and analysis from the SourceFeed engine.

Popping the CPU-GPU Latency Bubble in Inference
Article · 12h ago · 2

Pipelined decoding techniques show that software optimization, not just raw hardware scaling, is the key to maximizing GPU utilization.

Emeka Okafor
OpenAI Jalapeno and the Shift to Custom Inference Silicon

Article · 3d ago · 7
The LLM Cost Cliff Your Budget Isn't Ready For

Article · 4d ago · 1
OpenAI's Jalapeño Chip Is a Bet on Inference Economics

News · 6d ago · 2
How OpenAI's Jalapeño Chip Changes Production LLM Serving

Article · 6d ago · 1
Serve an Open-Source LLM at Scale with vLLM on a Rented GPU Instance

Tutorial · 1w ago · 0
Running 70B Models on 4GB VRAM: The AirLLM Layer-Swap Hack

Article · 1w ago · 1
Unified x86 AI Acceleration: Inside the New ACE Specification

Article · 1w ago · 2
Xiaomi's MiMo-V2.5-Pro-UltraSpeed Pushes a 1T Model Past 1000 Tokens/Sec on Commodity GPUs

News · 3w ago · 5