Seminar: Efficient and Robust Scheduling for LLM Inference  Posted: 2025-08-08

Title: Efficient and Robust Scheduling for LLM Inference

Speaker: Zijie Zhou, Assistant Professor, Hong Kong University of Science and Technology

Host: Huan Xu, Professor, Antai College of Economics and Management, Shanghai Jiao Tong University

Time: Wednesday, August 13, 2025, 14:00-15:30

Venue: Room A507, Antai Building

Abstract:

Efficiently serving Large Language Model (LLM) requests requires carefully balancing memory constraints, variable request lengths, and uncertain output behaviors. In this talk, we present two complementary approaches to optimizing LLM inference scheduling. First, we study the problem of scheduling requests with heterogeneous prefill and decode lengths, where prefill corresponds to prompt processing and decode involves sequential token generation. We show that this problem is NP-hard due to batching, precedence constraints, and dynamic memory usage. While the competitive ratios of common heuristics (e.g., First-Come-First-Serve) grow with the memory limit, we propose a novel batching algorithm with a constant competitive ratio, alongside efficient variants (dynamic programming, local search) that outperform baselines. Second, we address the challenge of uncertain output lengths, where the decode length is initially unknown. We introduce a prediction-aware framework that assumes output lengths are predicted within intervals. A conservative approach (Amax) schedules requests based on upper bounds but lacks robustness to prediction errors. Instead, we propose Amin, an adaptive algorithm that dynamically refines estimates during inference, achieving a logarithmic competitive ratio and near-optimal performance in practice.
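To make the conservative (Amax-style) idea concrete, the toy sketch below greedily packs requests into batches in FCFS order while reserving memory for each request's predicted worst-case output. This is only an illustration of the setting, not the speaker's algorithm: the `Request` fields, the memory model (prefill plus upper-bound decode tokens as a KV-cache proxy), and the greedy packing rule are all simplifying assumptions made here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    prefill: int      # prompt length in tokens
    decode_lo: int    # predicted lower bound on output length
    decode_hi: int    # predicted upper bound on output length

def conservative_batches(requests: List[Request], memory_limit: int) -> List[List[int]]:
    """FCFS greedy batching under a conservative rule: reserve memory for
    each request's worst case (prefill + decode_hi), so no batch can
    overflow even if every output hits its upper bound."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, r in enumerate(requests):
        need = r.prefill + r.decode_hi  # worst-case memory footprint
        if current and used + need > memory_limit:
            batches.append(current)     # close the batch and start a new one
            current, used = [], 0
        current.append(i)
        used += need
    if current:
        batches.append(current)
    return batches

reqs = [Request(10, 5, 20), Request(8, 3, 12), Request(30, 10, 40)]
print(conservative_batches(reqs, 50))  # → [[0, 1], [2]]
```

The robustness issue the talk raises is visible even here: a loose upper bound (`decode_hi` far above the true output length) forces small batches and wasted memory, which motivates an adaptive scheme like Amin that refines the estimate as decoding proceeds.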

Speaker Bio:

Zijie Zhou is an Assistant Professor in the Department of Industrial Engineering & Decision Analytics (IEDA) at the Hong Kong University of Science and Technology (HKUST). He received his Ph.D. in 2025 from the MIT Operations Research Center (ORC) and the Laboratory for Information & Decision Systems (LIDS). His research focuses on online optimization, scheduling, queueing theory, and experiment design, with recent work applying operations research techniques to enhance LLM inference efficiency. Previously, Zijie was a Research Intern at Microsoft Research and Microsoft Azure (2024), where he worked on foundation models for LLM inference. He also interned at Oracle Labs (2023), designing robust booking and upgrade mechanisms for the hospitality industry.


All faculty and students are welcome!