[MODELS] Discussion
What are the limits of using these models? How many API calls can I send per month?
How can I know which model I am using?
Out of all these models, Gemma, which was released most recently, has the newest information about .NET. However, I don't know which one gives the most accurate answers for coding.
Gemma seems really biased. With web search on, it claims it doesn't have access to recent information when asked about almost any recent event. But when I look up those same recent events on Google, the results are right there.
Apparently Gemma cannot code?
Gemma is just like Google's Gemini series models: it has very strong moral limits built in. Any operation that might relate to file manipulation, or access that goes too deep, gets censored and it refuses to reply.
So even if solutions for such things exist in its training data, they just get filtered out and ignored.
That said, I still haven't tested its coding accuracy on tasks unrelated to these kinds of "dangerous" operations.
Hi, I have a single machine with 10 H100 GPUs (0-9), each with 80 GB of GPU RAM. When I load the model onto 2 GPUs it works well, but when I switch to 3 GPUs (45 GB per GPU) or higher (tested for 3-9), the model loads, but during inference it either gives trash output like "…////" or raises an error saying the probability contains nan or inf values. I have tried device_map="auto", the empty-weights loading plus model dispatch with the Llama decoder layer pinned to one GPU, and custom device maps as well. I also tried many models, and all had the same issue. I used Ollama and was able to load the model and run inference on all 10 GPUs, so I don't think the issue is with the GPUs themselves.
I have also tried different generation arguments and found one thing: if you set do_sample=False you get the probability error, otherwise you get the output in "…////" form. If the model is small, you get some random Russian, Spanish, etc. words. I have also tried different dtypes: float16, bfloat16, float32 (no results, waited a long time). I am sharing my code as well; can you guys point me in the right direction? Thanks a lot.
import os
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

os.environ["TRANSFORMERS_CACHE"] = "/data/HF_models"
checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(model)

message = "Tell me a joke"
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
generation_args = {
    "max_new_tokens": 20,
    # "return_full_text": False,
    # "temperature": 0.4,
    # "do_sample": True,  # False worked
    # "top_p": 0.5,
}
print(pipe(message, **generation_args))
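One thing sometimes worth trying for the nan/garbage symptom on many-GPU sharding is capping per-GPU memory explicitly with `max_memory` (a real `from_pretrained` parameter) instead of letting `device_map="auto"` fill each card to the brim, which leaves no headroom for the KV cache and activations. A minimal sketch; the helper name, the 45GiB cap, and the CPU budget are my assumptions based on the numbers in the post, not a tested fix:

```python
def build_max_memory(n_gpus, per_gpu="45GiB", cpu="120GiB"):
    """Build a max_memory dict for from_pretrained(..., device_map="auto").

    Capping each GPU below its physical 80 GB reserves room for the
    KV cache and activations, which "auto" placement does not account for.
    """
    mem = {i: per_gpu for i in range(n_gpus)}
    mem["cpu"] = cpu  # spill-over budget for weights that do not fit on GPU
    return mem

print(build_max_memory(3))
# → {0: '45GiB', 1: '45GiB', 2: '45GiB', 'cpu': '120GiB'}
```

It would be passed alongside the existing arguments, roughly as `AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", max_memory=build_max_memory(10), torch_dtype=torch.bfloat16)`.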
Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistant
assistant\\\\
\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta
Temperature is too high, probably.
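The endless "assistant" repetition also commonly shows up when a Llama-3.1 Instruct checkpoint is prompted without its chat template, or when generation never sees the `<|eot_id|>` end-of-turn token as a stop condition. A hedged sketch of building generation kwargs that stop on either terminator; the helper name is mine, and the temperature value is just an assumed moderate default:

```python
def build_generation_args(tokenizer, max_new_tokens=256):
    """Generation kwargs for a Llama-3.1 Instruct checkpoint.

    Stops on the regular EOS token or <|eot_id|> (the end-of-turn
    marker used by the chat template), so the model does not keep
    emitting new "assistant" turns forever.
    """
    terminators = [tokenizer.eos_token_id]
    eot = tokenizer.convert_tokens_to_ids("<|eot_id|>")
    if eot is not None and eot != tokenizer.eos_token_id:
        terminators.append(eot)
    return {
        "max_new_tokens": max_new_tokens,
        "eos_token_id": terminators,
        "do_sample": True,
        "temperature": 0.6,  # assumed moderate value; lower it if output is noisy
        "top_p": 0.9,
    }
```

The prompt itself should also go through `tokenizer.apply_chat_template([...])` rather than being passed as a raw string, otherwise the Instruct model sees text in a format it was never trained to follow.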
Can you please share the conversation, if possible?
Hi, can we have the DeepSeek-V2.5 model?
I need community model features
Unable to download the "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" model: it gets stuck at 81%. No disk space issues on my side.
Qwen 2.5 72B is at open-weights SOTA level per Artificial Analysis:
https://x.com/ArtificialAnlys/status/1836822858695139523?t=Z-rFb-13NPEC2pDqZYjoPQ&s=19
Also seconding Mistral Large 2 and DeepSeek 2.5.