Recent Publications
Generative Reward Models
Dakota Mahan*, Duy Van Phung*, Rafael Rafailov*, Chase Blagden, Nathan Lile, Louis Castricato, Jan-Philipp Fränken, Chelsea Finn, Alon Albalak*
Preprint
Understanding and Improving Language Models Through a Data-Centric Lens
Alon Albalak
Dissertation, 2024
Paper
A Survey on Data Selection for Language Models
Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
TMLR, 2024
Paper Github
DataComp-LM: In search of the next generation of training sets for language models
Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, … 14 authors, … Alon Albalak, … 40 more authors
NeurIPS 2024, Datasets and Benchmarks Track
Preprint
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Shayne Longpre, Stella Biderman, Alon Albalak, … 20 more authors
Preprint
A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems
Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor
Preprint
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng*, Daniel Goldstein*, Quentin Anthony*, Alon Albalak, … 23 more authors
COLM 2024
Paper
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Antonis Antoniades, Xinyi Wang, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang
ICML 2024, Workshop on Foundation Models in the Wild
Preprint
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Alon Albalak, Colin Raffel, William Yang Wang
NeurIPS 2023
Paper Code Presentation
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak, Liangming Pan, Colin Raffel, William Yang Wang
NeurIPS 2023, Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models
Paper
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng*, Eric Alcaide*, Quentin Anthony*, Alon Albalak, … 26 more authors
EMNLP 2023
Paper Code
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
Liangming Pan, Alon Albalak, Xinyi Wang, William Yang Wang
EMNLP 2023
Paper Code
CausalDialogue: Modeling Utterance-level Causality in Conversations
Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang
ACL 2023
Paper
NeuPSL: Neural Probabilistic Soft Logic
Connor Pryor, Charles Dickens, Eriq Augustine, Alon Albalak, William Yang Wang, Lise Getoor
IJCAI 2023
Paper
Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains
Alon Albalak, Sharon Levy, William Yang Wang
EACL 2023, Demonstration Track
Paper Code
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue
Alon Albalak, Yi-Lin Tuan, Pegah Jandaghi, Connor Pryor, Luke Yoffe, Deepak Ramachandran, Lise Getoor, Jay Pujara, William Yang Wang
EMNLP 2022
Paper Code Benchmark Website
An Exploration of Methods for Zero-shot Transfer in Small Language Models
Alon Albalak, Akshat Shrivastava, Chinnadhurai Sankar, Adithya Sagar, Mike Ross
NeurIPS 2022, Workshop on Efficient Natural Language and Speech Processing
Paper
Making Something out of Nothing: Building Robust Task-oriented Dialogue Systems from Scratch
Zekun Li, Hong Wang, Alon Albalak, Yingrui Yang, Jing Qian, Shiyang Li, Xifeng Yan
Alexa Prize Taskbot Challenge 2022
Paper
D-REX: Dialogue Relation Extraction with Explanations
Alon Albalak, Varun Embar, Yi-Lin Tuan, Lise Getoor, William Yang Wang
ACL 2022, ConvAI Workshop
Paper Code
Efficient Learning Losses for Deep Hinge-Loss Markov Random Fields
Charles Dickens, Connor Pryor, Eriq Augustine, Alon Albalak, Lise Getoor
UAI 2022, 5th Workshop on Tractable Probabilistic Modeling
Paper
Modeling Disclosive Transparency in NLP Application Descriptions
Michael Saxon, Sharon Levy, Xinyi Wang, Alon Albalak, William Yang Wang
EMNLP 2021, Oral Presentation
Paper