SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0

This model is a LoRA adaptation of arcee-ai/Llama-3.1-SuperNova-Lite on thesven/Reflective-MAGLLAMA-v0.1. This has been a simple experiment into reflection and the model appears to perform adequately, though I am unsure if it is a large improvement.

See axolotl config

axolotl version: 0.4.1

base_model: arcee-ai/Llama-3.1-SuperNova-Lite

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: SE6446/MAGllama_Sharegpt
    type: sharegpt
    conversation: chatml
  
dataset_prepared_path: /workspace/data/last_run_prepared
val_set_size: 0.05
output_dir: /workspace/data/outputs/out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false


hub_model_id: SE6446/Llama-3.1-SuperNova-Lite-Reflections-3
hub_strategy: every_save
use_auth_token: true

wandb_project: Bojangles
wandb_entity:
wandb_watch:
wandb_name: run-6
wandb_log_model: checkpoint

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00015

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: 
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
tokens:
  - <thinking>
  - </thinking>
  - <reflection>
  - </reflection>
  - <output>
  - </output>

Instructions

Using hf pipeline

You must use the tokenizer provided with the model as the COT tokens are unique special tokens. It should work on most inference engines that can run llama 3.1

from transformers import pipeline

pipe = pipeline("text-generation", "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0", device_map="auto",trust_remote_code=True)

sys_prompt = "You are an AI assistant who reflects before answering the user." #If you put 'reflect' it will typically do so. If you want to vary the character just append it under this.
user_prompt = "Explain the difference between Newtonian and Keplerian orbits for a five year old." #Classic

messages = [
    {
        "role": "system",
        "content": sys_prompt,
    },
    {"role": "user", "content": user_prompt}
]

prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = prompt + "<thinking>" #Though not necessary, putting <thinking> under the new line does ensure it reflects. Testing revealed not doing this could cause it to rarely disobey the tokens. Which is bad.
# prompt = "<|im_start|>assistant\n[sys prompt]<|im_end|><|im_start|>user\n[user input]<|im_end|><|im_start|>assistant\n<thinking>" should do the trick if you like it old school.

text = pipe(prompt, max_new_tokens=1000) #max_new_tokens needs to be decently high so it may adequatley perform it's reflection AND output a concise answer.
print(text[0]['generated_text'])

Training details

It achieves the following results on the evaluation set:

Loss: 0.6365

Training procedure

I trained it as a LoRA not only because it is cheap, but because it tries to preserve as much of the original parameters as possible. I just wanted it to get used to COT.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.00015
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 8
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 2

Training results

Training Loss	Epoch	Step	Validation Loss
2.7211	0.0049	1	1.4048
0.6381	0.5	103	0.6583
0.4985	1.0049	206	0.6320
0.4992	1.5049	309	0.6365

Framework versions

PEFT 0.12.0
Transformers 4.45.0.dev0
Pytorch 2.3.1+cu121
Datasets 2.21.0
Tokenizers 0.19.1

SE6446
/

Llama-3.1-SuperNova-Lite-Reflection-V1.0