mayank-mishra posted an update Mar 9

Interesting! @joaogante, @tomaarsen, and @olivierdehaene might be interested in this too!

Nice blog!
@osanseviero we have been doing this in TGI and TEI for a while ;)
Padding-free implementations also make dynamic batching easier to implement and memory usage more predictable.
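
Roughly, the idea looks something like this (a simplified sketch of my own, not the actual TGI/TEI code; the cu_seqlens name follows flash-attn's varlen convention):

```python
import torch

# Three sequences of different lengths (token ids are arbitrary).
seqs = [torch.arange(1, 6), torch.arange(1, 3), torch.arange(1, 10)]

# Padded batching: every row is padded to the longest sequence,
# so memory scales with batch_size * max_len.
max_len = max(len(s) for s in seqs)
padded = torch.stack(
    [torch.nn.functional.pad(s, (0, max_len - len(s))) for s in seqs]
)
print(padded.shape)   # torch.Size([3, 9]) -> 27 slots, 11 of them padding

# Padding-free batching: concatenate the real tokens into one flat tensor and
# keep cumulative sequence lengths so attention stays within each sequence.
packed = torch.cat(seqs)
lengths = torch.tensor([len(s) for s in seqs])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
print(packed.shape)   # torch.Size([16]) -> only the 16 real tokens
print(cu_seqlens)     # tensor([ 0,  5,  7, 16])
```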


Yeah, it's just that people have not been using this for finetuning, where it can give considerable memory savings. I guess the issue is the core design of HF transformers.
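
For finetuning, the same trick applies at the data collation step. A hypothetical sketch (my own illustration, not the upcoming code mentioned below): examples are concatenated instead of padded, and position_ids restart at 0 for each example so a varlen attention kernel keeps them separate.

```python
import torch

# Hypothetical padding-free collator: concatenate examples instead of padding
# each one to max_len, and reset position_ids per example.
def pack_examples(examples):
    input_ids, labels, position_ids = [], [], []
    for ex in examples:
        ids = ex["input_ids"]
        input_ids.extend(ids)
        labels.extend(ex["labels"])
        position_ids.extend(range(len(ids)))  # positions restart at 0 per example
    return {
        "input_ids": torch.tensor([input_ids]),      # shape (1, total_tokens)
        "labels": torch.tensor([labels]),
        "position_ids": torch.tensor([position_ids]),
    }

batch = pack_examples([
    {"input_ids": [5, 6, 7], "labels": [6, 7, -100]},
    {"input_ids": [8, 9], "labels": [9, -100]},
])
print(batch["position_ids"])  # tensor([[0, 1, 2, 0, 1]])
```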

I am planning to release the code for this sometime soon :)

Really interesting, can't wait to see the code!


really nice blog


Thanks a lot @julien-c
means a lot coming from you :)