Star Attention: Efficient LLM Inference Over Long Sequences
2 minute read · Published: December 04, 2024
Sections: Abstract · Introduction · Phase 1: Context Encoding · Phase 2: Query Encoding and Token Generation · Update of Cache and Output · Thoughts
Preface
1 minute read · Published: November 27, 2024
I am on a train from Beijing to Wuxin right now. Sometimes I believe one's life is just like traveling on a train…