Scale ML

▸ We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale.
▸ The group is open to all MIT affiliates; to participate, contact the organizers. We currently host bi-weekly seminars, with hands-on sessions and research socials planned for the future.
▸ Our coffee ☕ + baked goods 🍰 are currently funded by generous donations from Phillip Isola and Yoon Kim.
▸ We are looking for sponsors to increase our seminar snack capacity, fund research socials, and reimburse speaker travel. Please contact the organizers if interested.

▸ Join our next seminar on Zoom (currently open to MIT students only): Click here to join

Discussion Schedule

  • TBD 1B-parameter model training (hands-on session) Aniruddha Nrusimha (MIT)
  • TBD How to scale models with Modula (hands-on session) Jeremy Bernstein (MIT)
  • 07/24 FineWeb: Creating a large dataset for pretraining LLMs Guilherme Penedo (Hugging Face)
  • 07/17 Hardware-aware Algorithms for Language Modeling Tri Dao (Princeton)
  • 07/10 LLM360: Towards Fully Transparent Open-Source LLMs Hongyi Wang (CMU)
  • 07/03 DeciMamba: Exploring the Length Extrapolation Potential of Mamba Assaf Ben-Kish (Tel-Aviv)
  • 04/17 Adapting LLMs with Reinforcement Learning Idan Shenfeld
  • 04/03 The Quest to build an (O)pen (L)anguage (Mo)del Luca Soldaini (AI2)
  • 03/20 Efficient Deep Learning with Sparsity: Algorithms, Systems, and Applications Zhijian Liu
  • 03/12 Building and Deploying Large Language Model Applications Efficiently and Verifiably Ying Sheng (Stanford)
  • 03/06 In-Context Language Learning and N-gram Heads Ekin Akyürek
  • 02/21 Neurons, norms and number systems Jeremy Bernstein
  • 11/28 Sparsity in Transformers Shobhita Sundaram
  • 11/01 Critical batch-size in deep learning Minyoung Huh (Jacob)
  • 10/18 Large-Scale RNNs in the era of Transformers Bailin Wang
  • 10/18 Tensor Program Synthesis Han Guo
  • 10/04 Mixture of Experts (MoEs) Jyo Pari
  • 09/13 Speculative Decoding Aniruddha Nrusimha

Critical batch-size in deep learning

What batch-size should you use for your model? What does the batch-size tell you about your task? This post discusses one of the main aspects of scaling laws...

November 1, 2023 · 23 min · Author: Minyoung Huh | Editor: N/A
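The full argument is in the post linked above. As a rough illustration only (not taken from the post itself), a common heuristic estimates the critical batch-size from the gradient noise scale, which can be measured by comparing gradient norms at a small and a large batch-size (McCandlish et al., 2018). The PyTorch sketch below assumes placeholder `model`, `loss_fn(model, batch)`, and batch objects.

```python
import torch

def grad_norm_sq(model, loss_fn, batch):
    """Squared L2 norm of the loss gradient measured on a single batch."""
    model.zero_grad()
    loss_fn(model, batch).backward()
    return sum(p.grad.pow(2).sum().item()
               for p in model.parameters() if p.grad is not None)

def simple_noise_scale(model, loss_fn, small_batch, big_batch, b_small, b_big):
    """Estimate the gradient noise scale B_simple = tr(Sigma) / |G|^2.

    Uses E[|G_B|^2] = |G|^2 + tr(Sigma) / B at two batch sizes. In practice
    both measurements should be averaged over many batches.
    """
    g_small = grad_norm_sq(model, loss_fn, small_batch)   # |G_{b_small}|^2
    g_big = grad_norm_sq(model, loss_fn, big_batch)       # |G_{b_big}|^2
    g_true_sq = (b_big * g_big - b_small * g_small) / (b_big - b_small)
    trace_sigma = (g_small - g_big) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g_true_sq   # roughly the critical batch-size
```

Roughly, batch-sizes below this estimate trade steps for compute almost one-for-one, while batch-sizes far above it see diminishing returns per additional example.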

Mixture of Experts (MoE)

MoEs are rumored to be a critical component in scaling up to trillion-parameter models. By routing tokens to specialized modular functions, they give a model the representational power of a much larger model than the one actually used for each prediction. We will discuss how MoE works and recent advances in the literature...

October 4, 2023 · 11 min · Author: Jyo Pari | Editor: Minyoung Huh
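As a minimal sketch of the routing idea described above (a generic top-k router with illustrative names, not the implementation of any particular paper), each token is scored by a small linear router, sent to its k highest-scoring expert MLPs, and the expert outputs are combined with the renormalized router weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer: a linear router scores each
    token, only the top-k experts run on it, and their outputs are mixed
    using the renormalized router weights."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)        # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores = self.router(x)                            # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)    # k experts per token
        gates = F.softmax(top_vals, dim=-1)                # weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e               # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 16 tokens of width 64, each handled by its 2 highest-scoring experts.
moe = TopKMoE(d_model=64)
y = moe(torch.randn(16, 64))                               # (16, 64)
```

Only k of the n_experts MLPs run for any given token, which is how such a layer can hold far more parameters than it spends compute on per prediction.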

Speculative decoding

A brief overview of speculative decoding, detailing the roots of LLM inference slowdowns and how algorithm-level changes can improve generation speed...

September 13, 2023 · 14 min · Author: Ani Nrusimha | Editor: Minyoung Huh
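As a minimal sketch of the idea (a greedy-decoding variant with placeholder `draft_model` and `target_model` that map token ids to logits, not the exact algorithm covered in the post), a small draft model proposes several tokens, the large target model verifies the whole proposal in a single forward pass, and only the agreeing prefix plus one corrected token is kept:

```python
import torch

@torch.no_grad()
def speculative_greedy_step(target_model, draft_model, tokens, k=4):
    """One step of greedy speculative decoding on a (1, seq_len) LongTensor.

    The cheap draft model proposes k tokens autoregressively; the expensive
    target model scores the whole continuation in one forward pass, and draft
    tokens are kept only while they match the target's greedy choice.
    """
    # 1) Draft: propose k tokens one at a time with the small model.
    draft = tokens
    for _ in range(k):
        next_tok = draft_model(draft)[:, -1].argmax(-1, keepdim=True)
        draft = torch.cat([draft, next_tok], dim=-1)

    # 2) Verify: a single target forward pass over prompt + all k proposals.
    target_next = target_model(draft).argmax(-1)      # target's greedy pick at each position

    # 3) Accept the longest prefix of draft tokens the target agrees with,
    #    then append the target's own next token (so we always gain >= 1 token).
    n_prompt = tokens.shape[1]
    accepted = tokens
    for i in range(k):
        proposed = draft[:, n_prompt + i]
        expected = target_next[:, n_prompt + i - 1]   # logits at position p predict token p + 1
        if not torch.equal(proposed, expected):
            break
        accepted = torch.cat([accepted, proposed.unsqueeze(-1)], dim=-1)
    correction = target_next[:, accepted.shape[1] - 1].unsqueeze(-1)
    return torch.cat([accepted, correction], dim=-1)
```

Because the target model runs once per k proposed tokens instead of once per token, the average accepted-prefix length sets the speedup, and with greedy verification the output matches what the target model alone would have generated.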