Topic

#Gpu

10 articles on GPU: news, releases, guides and analysis from the SourceFeed engine.

Popping the CPU-GPU Latency Bubble in Inference
Article · 12h ago · 2

Pipelined decoding techniques show that software optimization, not just raw hardware scaling, is the key to maximizing GPU utilization.

Emeka Okafor
OpenAI Jalapeno and the Shift to Custom Inference Silicon

Article · 3d ago · 7
Serve an Open-Source LLM at Scale with vLLM on a Rented GPU Instance

Tutorial · 1w ago · 0
The Architecture of Monopoly: Inside NVIDIA's Supercomputing Hegemony

Article · 1w ago · 0
Running 70B Models on 4GB VRAM: The AirLLM Layer-Swap Hack

Article · 1w ago · 1
TPU vs GPU: The Architecture and Software Trade-offs

Article · 1w ago · 1
Disaggregating LLM Inference: Inside AMD's ATOM and ATOMesh Stack

Article · 1w ago · 0
NVIDIA's cuTile Brings Fearless Concurrency to GPU Kernels in Rust

Article · 1w ago · 5
xAI Is Becoming the Landlord of the AI Compute Stack — and That Matters for Developers

Article · 3w ago · 1
Xiaomi's MiMo-V2.5-Pro-UltraSpeed Pushes a 1T Model Past 1000 Tokens/Sec on Commodity GPUs

News · 3w ago · 5