A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 56
🦅 🐍 FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 2 days ago • 25
Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning Paper • 2303.02861 • Published Mar 6, 2023 • 1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model Paper • 2406.04904 • Published Jun 7 • 4
AQLM+PV Collection Official AQLM quantizations for "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression": https://arxiv.org/abs/2405.14852 • 21 items • Updated 1 day ago • 15
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
Article Overview of natively supported quantization schemes in 🤗 Transformers Sep 12, 2023 • 10
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2 • 673
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16 • 3
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15 • 20
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 59
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 79
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 30
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11 • 12
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325
Load 4bit models 4x faster Collection Native bitsandbytes 4-bit pre-quantized models • 21 items • Updated 15 days ago • 43
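For context on how such pre-quantized 4-bit checkpoints are typically consumed, here is a minimal sketch using transformers and bitsandbytes. The repo id is a placeholder, not a model from this collection; NF4 with bfloat16 compute is one common configuration, not the only one:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings (NF4 quant type with bfloat16 compute
# and double quantization is a common choice, e.g. in QLoRA).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# "some-org/some-model" is a hypothetical repo id. When the checkpoint
# on the Hub is already serialized in 4-bit, loading skips the on-the-fly
# quantization pass, which is where the advertised speed-up comes from.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    quantization_config=quant_config,
    device_map="auto",
)
```

The same `from_pretrained` call works for full-precision checkpoints too; the quantization config simply tells transformers how to place and dequantize the weights at compute time.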
Flan-T5 release Collection The Flan-T5 release covers 4 checkpoints of different sizes. It also includes upgraded versions trained using Universal sampling • 7 items • Updated Jul 31 • 18
ALBERT release Collection The ALBERT release was done in two steps, over 4 checkpoints of different sizes each time. The first version is noted as "v1", the second as "v2". • 8 items • Updated Jul 31 • 5
BERT release Collection Regroups the original BERT models released by the Google team. Except for the models marked otherwise, the checkpoints support English. • 8 items • Updated Jul 31 • 18
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5 • 6
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace • 68 items • Updated Feb 13 • 13
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model Paper • 2312.11370 • Published Dec 18, 2023 • 19
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks Paper • 2312.08583 • Published Dec 14, 2023 • 9
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 8
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 4
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Paper • 2210.17323 • Published Oct 31, 2022 • 7
Papers to read - General Collection Papers I want to read, at some point. • 8 items • Updated Apr 9 • 4
Model Merging Collection Model merging is a very popular technique nowadays for LLMs. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12 • 211
ML for Tools Collection Collection of papers about ML for using tools! • 25 items • Updated Jan 17 • 9
🐒 Stable Diffusion LoRAs Collection Awesome LoRAs found on the hub - using only 🐵 • 7 items • Updated Jul 23 • 16
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 489
Tulu V2 Suite Collection The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated 22 days ago • 43
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 14
Sharded checkpoints Collection Useful sharded checkpoints that let users run inference / fine-tuning on a Google Colab without hitting CPU OOM issues. • 7 items • Updated Dec 9, 2023 • 5