LLM Multi-GPU Optimizer

A roofline-model estimator of decode throughput for a contiguous layer/block split across devices. Tensor-parallel estimates require calibration against measured hardware before they can be trusted.
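A minimal sketch of the roofline idea described above, under the usual assumption that single-session decode is memory-bandwidth-bound: each generated token streams every resident layer's weights once, so per-token time on a device is (weight bytes on that device) / (device bandwidth), and devices in a contiguous pipeline split add their times. All names and numbers below are illustrative, not the tool's actual API.

```python
def roofline_decode(layer_bytes, split, bandwidth_gbs):
    """Estimate single-session decode throughput for a contiguous split.

    layer_bytes   -- weight bytes per transformer layer (layers assumed uniform)
    split         -- layers assigned to each device, in pipeline order
    bandwidth_gbs -- memory bandwidth of each device, in GB/s
    Returns (tokens_per_sec, latency_ms_per_token).
    """
    assert len(split) == len(bandwidth_gbs)
    # Per-token time: sum of weight-streaming time on each device,
    # since a batch-size-1 pipeline visits the devices sequentially.
    seconds = sum(
        n_layers * layer_bytes / (bw * 1e9)
        for n_layers, bw in zip(split, bandwidth_gbs)
    )
    return 1.0 / seconds, seconds * 1e3

# Example: 32 layers of 400 MB each, split 24/8 between a 900 GB/s GPU
# and 50 GB/s host memory.
tps, ms = roofline_decode(400e6, [24, 8], [900.0, 50.0])
```

This also shows why the slow device dominates: the 8 CPU-resident layers cost far more per token than the 24 GPU-resident ones.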

Reported metrics (values shown are placeholders until a configuration is evaluated):

- Single-session throughput: 0.00 t/s
- Latency: 0.0 ms
- Layer split: 0
- Placement status: OK
- Best estimated split: 0.00 t/s
- GPU 1 + CPU only (baseline): 0.00 t/s
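The "Best estimated split" and "GPU 1 + CPU only" readouts above can be reproduced with a brute-force search over contiguous two-way splits, using the same bandwidth-bound cost model. This is a hedged sketch under assumed hardware numbers, not the optimizer's actual search routine.

```python
def best_two_way_split(n_layers, layer_bytes, gpu_bw, cpu_bw, gpu_capacity):
    """Best contiguous GPU/CPU split by estimated decode throughput.

    gpu_bw, cpu_bw -- memory bandwidths in bytes/s
    gpu_capacity   -- GPU memory budget for weights, in bytes
    Returns (gpu_layers, tokens_per_sec).
    """
    best_g, best_tps = 0, 0.0
    for g in range(n_layers + 1):
        if g * layer_bytes > gpu_capacity:
            break  # split no longer fits on the GPU
        seconds = (g * layer_bytes / gpu_bw
                   + (n_layers - g) * layer_bytes / cpu_bw)
        tps = 1.0 / seconds
        if tps > best_tps:
            best_g, best_tps = g, tps
    return best_g, best_tps

# Example: 32 layers of 400 MB, 900 GB/s GPU with a 10 GB weight budget,
# 50 GB/s host memory. The best split packs as many layers as fit on the GPU.
g, tps = best_two_way_split(32, 400e6, 900e9, 50e9, 10e9)
```

Because the GPU is strictly faster per byte, the optimum always fills the GPU to capacity; the search form generalizes when per-device bandwidths or capacities make the choice less obvious.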