A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 56
🦅 🐍 FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 2 days ago • 25
Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning Paper • 2303.02861 • Published Mar 6, 2023 • 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model Paper • 2406.04904 • Published Jun 7 • 4
AQLM+PV Collection Official AQLM quantizations for "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression": https://arxiv.org/abs/2405.14852 • 21 items • Updated 1 day ago • 15
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
Article Overview of natively supported quantization schemes in 🤗 Transformers Sep 12, 2023 • 10
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2 • 673
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16 • 3
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15 • 20
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 59
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 79
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 30
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11 • 12
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325
Load 4bit models 4x faster Collection Native bitsandbytes 4-bit pre-quantized models • 21 items • Updated 15 days ago • 43
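For context on how such pre-quantized 4-bit checkpoints are typically consumed, here is a minimal sketch using transformers and bitsandbytes. The repo id is a placeholder, not a model from this collection; NF4 with bfloat16 compute is one common configuration, not the only one:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings (NF4 quant type with bfloat16 compute
# and double quantization is a common choice, e.g. in QLoRA).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# "some-org/some-model" is a hypothetical repo id. When the checkpoint
# on the Hub is already serialized in 4-bit, loading skips the on-the-fly
# quantization pass, which is where the advertised speed-up comes from.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    quantization_config=quant_config,
    device_map="auto",
)
```

The same `from_pretrained` call works for full-precision checkpoints too; the quantization config simply tells transformers how to place and dequantize the weights at compute time.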
Flan-T5 release Collection The Flan-T5 release covers 4 checkpoints of different sizes. It also includes upgraded versions trained using Universal sampling • 7 items • Updated Jul 31 • 18
ALBERT release Collection The ALBERT release was done in two steps, over 4 checkpoints of different sizes each time. The first version is noted as "v1", the second as "v2". • 8 items • Updated Jul 31 • 5
BERT release Collection Regroups the original BERT models released by the Google team. Except for the models marked otherwise, the checkpoints support English. • 8 items • Updated Jul 31 • 18
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5 • 6
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace • 68 items • Updated Feb 13 • 13
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model Paper • 2312.11370 • Published Dec 18, 2023 • 19
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks Paper • 2312.08583 • Published Dec 14, 2023 • 9
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 8
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 4
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Paper • 2210.17323 • Published Oct 31, 2022 • 7
Papers to read - General Collection Papers I want to read, at some point. • 8 items • Updated Apr 9 • 4
Model Merging Collection Model merging is a very popular technique nowadays for LLMs. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12 • 211
ML for Tools Collection Collection of papers about ML for using tools! • 25 items • Updated Jan 17 • 9
🐒 Stable Diffusion LoRAs Collection Awesome LoRAs found on the hub - using only 🐵 • 7 items • Updated Jul 23 • 16
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 489
Tulu V2 Suite Collection The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated 22 days ago • 43
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 14
Sharded checkpoints Collection Useful sharded checkpoints that let users run inference / fine-tuning on a Google Colab without hitting CPU OOM issues. • 7 items • Updated Dec 9, 2023 • 5