[MODELS] Discussion
What are the limits of using these models? How many API calls can I send per month?
How can I know which model I am using?
Out of all these models, Gemma, which was released most recently, has the newest information about .NET. However, I don't know which one gives the most accurate answers for coding.
Gemma seems really biased. With web search on, it claims it doesn't have access to recent information when asked about almost any recent event. But when I look up those same recent events on Google, the results are right there.
Apparently Gemma cannot code?
Gemma is just like Google's Gemini series models: it has very strong moral limits built in. Any operation that might relate to file manipulation, or access that goes too deep, gets censored and it refuses to reply.
So even if solutions for such things exist in its training data, they just get filtered out and ignored.
That said, I still haven't tested its coding accuracy on tasks unrelated to these kinds of "dangerous" operations.
Hi, I have a single machine with 10 H100 GPUs (0-9), each with 80 GB of GPU RAM. When I load the model onto 2 GPUs it works well, but when I switch to 3 GPUs (45 GB per GPU) or higher (tested for 3-9), the model loads, but during inference it either gives trash output like "…////" or raises an error saying the probability contains nan or inf values. I have tried device_map="auto", the empty-weights loading plus model dispatch with the Llama decoder layer pinned to one GPU, and custom device maps as well. I also tried many models, and all had the same issue. I used Ollama and was able to load the model and run inference on all 10 GPUs, so I don't think the issue is with the GPUs themselves.
I have also tried different generation arguments and found one thing: if you set do_sample=False you get the probability error, otherwise you get the output in "…////" form. If the model is small, you get some random Russian, Spanish, etc. words. I have also tried different dtypes: float16, bfloat16, float32 (no results, waited a long time). I am sharing my code as well; can you guys point me in the right direction? Thanks a lot.
import os
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

os.environ["TRANSFORMERS_CACHE"] = "/data/HF_models"
checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(model)

message = "Tell me a joke"
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
generation_args = {
    "max_new_tokens": 20,
    # "return_full_text": False,
    # "temperature": 0.4,
    # "do_sample": True,  # False worked
    # "top_p": 0.5,
}
print(pipe(message, **generation_args))
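One thing sometimes worth trying for the nan/garbage symptom on many-GPU sharding is capping per-GPU memory explicitly with `max_memory` (a real `from_pretrained` parameter) instead of letting `device_map="auto"` fill each card to the brim, which leaves no headroom for the KV cache and activations. A minimal sketch; the helper name, the 45GiB cap, and the CPU budget are my assumptions based on the numbers in the post, not a tested fix:

```python
def build_max_memory(n_gpus, per_gpu="45GiB", cpu="120GiB"):
    """Build a max_memory dict for from_pretrained(..., device_map="auto").

    Capping each GPU below its physical 80 GB reserves room for the
    KV cache and activations, which "auto" placement does not account for.
    """
    mem = {i: per_gpu for i in range(n_gpus)}
    mem["cpu"] = cpu  # spill-over budget for weights that do not fit on GPU
    return mem

print(build_max_memory(3))
# → {0: '45GiB', 1: '45GiB', 2: '45GiB', 'cpu': '120GiB'}
```

It would be passed alongside the existing arguments, roughly as `AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", max_memory=build_max_memory(10), torch_dtype=torch.bfloat16)`.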
Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistant
assistant\\\\
\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta
Temperature is too high, probably.
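The endless "assistant" repetition also commonly shows up when a Llama-3.1 Instruct checkpoint is prompted without its chat template, or when generation never sees the `<|eot_id|>` end-of-turn token as a stop condition. A hedged sketch of building generation kwargs that stop on either terminator; the helper name is mine, and the temperature value is just an assumed moderate default:

```python
def build_generation_args(tokenizer, max_new_tokens=256):
    """Generation kwargs for a Llama-3.1 Instruct checkpoint.

    Stops on the regular EOS token or <|eot_id|> (the end-of-turn
    marker used by the chat template), so the model does not keep
    emitting new "assistant" turns forever.
    """
    terminators = [tokenizer.eos_token_id]
    eot = tokenizer.convert_tokens_to_ids("<|eot_id|>")
    if eot is not None and eot != tokenizer.eos_token_id:
        terminators.append(eot)
    return {
        "max_new_tokens": max_new_tokens,
        "eos_token_id": terminators,
        "do_sample": True,
        "temperature": 0.6,  # assumed moderate value; lower it if output is noisy
        "top_p": 0.9,
    }
```

The prompt itself should also go through `tokenizer.apply_chat_template([...])` rather than being passed as a raw string, otherwise the Instruct model sees text in a format it was never trained to follow.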
Can you please share the conversation, if possible?
Hi, can we have the DeepSeek-V2.5 model?
I need community model features
Unable to download the "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" model: it gets stuck at 81%. No disk space issues on my side.
Qwen 2.5 72B is at open-weights SOTA level per Artificial Analysis:
https://x.com/ArtificialAnlys/status/1836822858695139523?t=Z-rFb-13NPEC2pDqZYjoPQ&s=19
Also seconding Mistral Large 2 and DeepSeek 2.5.