Omar Sanseviero

osanseviero

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML. 🦙

Articles

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

WWDC 24: Running Mistral 7B with Core ML

How we leveraged distilabel to create an Argilla 2.0 Chatbot

Welcome Gemma 2 - Google's new open LLM

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

🪆 Introduction to Matryoshka Embedding Models

Welcome Gemma - Google's new open LLM

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Inference for PROs

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Results of the Open Source AI Game Jam

Llama 2 is here - get it on Hugging Face

The Falcon has landed in the Hugging Face ecosystem

Hugging Face Machine Learning Demos on arXiv

What's new in Diffusers? 🎨

Announcing Evaluation on the Hub

An Introduction to Deep Reinforcement Learning

Welcome spaCy to the 🤗 Hub

Sentence Transformers in the 🤗 Hub

osanseviero's activity

upvoted a collection 1 day ago

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 1 day ago • 134

upvoted an article 2 days ago

Article

Fine-tuning Parler TTS on a Specific Language

4 days ago

• 13

upvoted a collection 2 days ago

jina-embeddings-v3

Multilingual multi-task general text embedding model • 6 items • Updated about 12 hours ago • 4

upvoted 3 papers 3 days ago

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published 7 days ago • 15

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 8 days ago • 58

Agent Workflow Memory

Paper • 2409.07429 • Published 8 days ago • 25

upvoted 2 articles 3 days ago

Article

Safetensors audited as really safe and becoming the default

May 23, 2023

• 3

Article

Introducing Community Tools

4 days ago

• 18

upvoted a paper 6 days ago

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 7 days ago • 39

upvoted 2 papers 7 days ago

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 14 days ago • 37

MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications

Paper • 2409.07314 • Published 8 days ago • 49

upvoted a collection 8 days ago

DataGemma Release

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 8 days ago • 53

upvoted an article 8 days ago

Article

StarCoder2 and The Stack v2

Feb 28

• 5

upvoted a paper 8 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 9 days ago • 51

upvoted 3 papers 9 days ago

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Paper • 2409.05152 • Published 11 days ago • 27

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Paper • 1908.10084 • Published Aug 27, 2019 • 4

Quantifying the Carbon Emissions of Machine Learning

Paper • 1910.09700 • Published Oct 21, 2019 • 8

upvoted 3 collections 9 days ago

Yi-Coder

4 items • Updated 15 days ago • 28

OLMoE

Artifacts for open mixture-of-experts language models. • 13 items • Updated 5 days ago • 18

RWKV v6

5 items • Updated 16 days ago • 8

upvoted 6 papers 10 days ago

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published 10 days ago • 43

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published 15 days ago • 70

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published 13 days ago • 23

Building Math Agents with Multi-Turn Iterative Preference Learning

Paper • 2409.02392 • Published 16 days ago • 14

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 14 days ago • 83

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 17 days ago • 94

upvoted 16 papers 11 days ago

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

Paper • 2409.03512 • Published 14 days ago • 25

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published 15 days ago • 27

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published 15 days ago • 53

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 16 days ago • 32

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published 17 days ago • 70

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published 21 days ago • 49

Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions

Paper • 2408.16245 • Published 22 days ago • 4

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Paper • 2408.16176 • Published 22 days ago • 7

SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Paper • 2408.16444 • Published 22 days ago • 7

InkubaLM: A small language model for low-resource African languages

Paper • 2408.17024 • Published 21 days ago • 10

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published 21 days ago • 44

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published 22 days ago • 92

Beyond Preferences in AI Alignment

Paper • 2408.16984 • Published 21 days ago • 1

MemLong: Memory-Augmented Retrieval for Long Text Modeling

Paper • 2408.16967 • Published 21 days ago • 1

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published 15 days ago • 42

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

Paper • 2409.03271 • Published 15 days ago • 2

upvoted a collection 12 days ago

Power-LM

Dense & MoE LLMs trained with power learning rate scheduler. • 3 items • Updated 8 days ago • 13

upvoted a collection 13 days ago

DeepSeek-V2.5

1 item • Updated 14 days ago • 19

upvoted a collection 15 days ago

Ruri: Japanese General Text Embeddings

18 items • Updated 7 days ago • 13

upvoted 2 articles 15 days ago

Article

The Environmental Impacts of AI -- Primer

16 days ago

• 23

Article

Meet Yi-Coder: A Small but Mighty LLM for Code

15 days ago

• 11

upvoted 2 articles 16 days ago

Article

Understanding Vector Quantization in VQ-VAE

23 days ago

• 9

Article

Selective fine-tuning of Language Models with Spectrum

17 days ago

• 25

upvoted a paper 16 days ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published 16 days ago • 74

upvoted 3 collections 16 days ago

Tinyllama-1.1B-v1.1

3 items • Updated Apr 2 • 4

Mini-MOEs - Mixture of Experts 2x, x4 and x8

Tiny but mighty. 1B, 1.1B, 2B (2x,x4,x8) MOE models. Suggest Q8 version, and review of original model page for template (!), usage & help. • 31 items • Updated Aug 9 • 4

ZeroGPU Spaces

ZeroGPU Spaces made by the community • 17 items • Updated Jun 6 • 217

upvoted a paper 16 days ago

FLUX that Plays Music

Paper • 2409.00587 • Published 19 days ago • 31

upvoted 3 papers 17 days ago

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Paper • 2408.15914 • Published 22 days ago • 21

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published 23 days ago • 32

The Future of Open Human Feedback

Paper • 2408.16961 • Published Aug 15 • 19

upvoted 3 papers 20 days ago

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published 21 days ago • 25

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published 21 days ago • 55

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published 23 days ago • 41