▸ We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale.
▸ The group is open to all MIT affiliates; to participate, contact the organizers. We currently host biweekly seminars and plan to add hands-on sessions and research socials in the future.
▸ Our coffee ☕ + baked goods 🍰 are currently funded by generous donations from Phillip Isola and Yoon Kim.
▸ We are looking for sponsors to increase our seminar snack capacity, fund research socials, and reimburse speaker travel. Please contact the organizers if interested.
▸ Join our next seminar on Zoom (currently open to MIT students only): Click here to join
What batch size should you use for your model? What does the batch size tell you about your task? This post discusses one main aspect of scaling laws...
Mixture-of-Experts (MoE) layers are rumored to be a critical component in scaling up to trillion-parameter models. By routing tokens to specialized expert functions, they give a model the representational power of a much larger network while using only a fraction of its parameters for each prediction. We will discuss how MoE works and recent advances in this literature...
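To give a flavor of the routing idea before the seminar, here is a minimal top-k MoE layer sketch in PyTorch. This is purely illustrative and not the seminar's material: the class name `TinyMoE` and all sizes (`d_model`, `n_experts`, `k`) are made-up assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative top-k token-routing MoE layer (not from the seminar)."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.k = k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block; only k of them run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = gate.topk(self.k, dim=-1)       # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e               # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Usage: 8 tokens, each processed by only 2 of the 4 experts.
y = TinyMoE()(torch.randn(8, 64))
print(y.shape)  # torch.Size([8, 64])
```

The point of the sketch is the sparsity: the model holds all experts' parameters, but each token only pays the compute cost of its top-k experts.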
A brief overview of speculative decoding, detailing the roots of LLM inference slowdowns and how algorithm-level changes can improve generation speed...
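As a rough sketch of the core loop (not the post's own code): a small draft model proposes a few tokens cheaply, the large target model checks them all in a single forward pass, and the longest agreeing prefix is kept. The version below uses greedy acceptance rather than the exact rejection-sampling rule, and it assumes `target` and `draft` are HuggingFace-style causal LMs returning `.logits`; the function name and arguments are hypothetical.

```python
import torch

@torch.no_grad()
def speculative_decode_greedy(target, draft, prompt, n_new=32, gamma=4):
    """Greedy speculative decoding sketch (illustrative assumption, not the post's method)."""
    seq = prompt.clone()                                        # (1, t) token ids
    while seq.shape[1] < prompt.shape[1] + n_new:
        # 1) Draft model proposes gamma tokens autoregressively (cheap).
        proposal = seq
        for _ in range(gamma):
            logits = draft(proposal).logits[:, -1]
            proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=1)
        # 2) Target model scores the whole proposal in one forward pass (expensive, but only once).
        tgt_logits = target(proposal).logits
        tgt_next = tgt_logits[:, seq.shape[1] - 1:-1].argmax(-1)  # target's choice at each drafted position
        drafted = proposal[:, seq.shape[1]:]
        # 3) Accept the longest agreeing prefix, then take the target's token at the first disagreement.
        agree = (tgt_next == drafted)[0].long()
        n_accept = int(agree.cumprod(0).sum())
        seq = torch.cat([seq, drafted[:, :n_accept],
                         tgt_next[:, n_accept:n_accept + 1]], dim=1)
    return seq
```

The speedup comes from step 2: verifying gamma drafted tokens costs one target forward pass instead of gamma sequential ones, so each loop iteration can emit several tokens at roughly the cost of one.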