    Fine-Tuning Llama 3.2 11B for Extractive Question Answering

    By Yeek.io · November 26, 2024

    Large Language Models (LLMs) are powerful tools that can perform a wide range of natural language processing tasks. However, because of their generic and broadly focused training, they may not always perform optimally on specific tasks. Fine-tuning is a technique that lets us adapt a pre-trained LLM to a particular task, such as extractive question answering, without altering the original weights. In this article, we'll explore how to fine-tune Llama 3.2 11B using the Q-LoRA technique and demonstrate its performance boost on the SQuAD v2 dataset.

    What’s LoRA?

    LoRA (Low-Rank Adaptation) is a technique for adding new weights to an existing model to modify its behavior without altering the original weights. It works by adding small "adapter" weights that modify the output of certain layers; only these adapters are updated during training while the original weights stay frozen. By freezing the original weights, LoRA ensures that the model retains its pre-trained knowledge while gaining new, task-specific capabilities through the adapters.
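
    To make the mechanism concrete, here is a minimal sketch of the LoRA update on a single linear layer (an illustration of the math only, not the actual peft implementation):

    import torch

    def lora_linear(x, W, A, B, alpha, r):
        # W is the frozen pre-trained weight (out x in); A (r x in) and B (out x r)
        # are the small trainable adapters. The effective weight is
        # W + (alpha / r) * B @ A, but W itself never changes.
        return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

    x = torch.randn(2, 4096)            # a batch of activations
    W = torch.randn(4096, 4096)         # frozen layer weight
    A = torch.randn(128, 4096) * 0.01   # adapter A starts small and random
    B = torch.zeros(4096, 128)          # B starts at zero, so the update begins as a no-op
    y = lora_linear(x, W, A, B, alpha=256, r=128)

    Because the rank r is much smaller than the layer dimensions, the adapters add only a tiny fraction of new parameters on top of the frozen model.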

    Defining the Experiment

    In this experiment, we'll fine-tune Llama 3.2 11B for extractive question answering using the SQuAD v2 dataset. The goal is to train the model to extract the specific span of text that directly answers a user's question, without summarizing or rephrasing.

    System Setup

    This experiment was run on Google Colab with an A100 GPU. The code is written in Python and uses the Hugging Face Transformers library.

    Installing Packages

    !pip install -U transformers peft bitsandbytes datasets trl evaluate bert_score
    

    Loading Data

    We'll use the SQuAD v2 dataset for training and evaluation.

    from datasets import load_dataset
    
    ds = load_dataset("squad_v2")
    print(ds)
    

    Output:

    DatasetDict({
        train: Dataset({
            options: ['id', 'title', 'context', 'question', 'answers'],
            num_rows: 130319
        })
        validation: Dataset({
            options: ['id', 'title', 'context', 'question', 'answers'],
            num_rows: 11873
        })
    })
    

    Data Preparation

    We'll split the dataset into training, validation, and test sets and convert the samples into a conversation format suitable for Llama.

    num_training_samples = 15000
    num_test_samples = 750
    num_validation_samples = 1000
    
    # Carve the test set out of the training split; use the official validation
    # split for validation.
    training_samples = ds['train'].select(range(num_training_samples))
    test_samples = ds['train'].select(range(num_training_samples, num_training_samples + num_test_samples))
    validation_samples = ds['validation'].select(range(num_validation_samples))
    
    def convert_squad_sample_to_llama_conversation(sample):
        ...  # body elided in the original; see the sketch below
        return {"text": sample_conversation, "messages": messages, "answer": answer}
    
    conversation_training_samples = training_samples.map(convert_squad_sample_to_llama_conversation)
    conversation_test_samples = test_samples.map(convert_squad_sample_to_llama_conversation)
    conversation_validation_samples = validation_samples.map(convert_squad_sample_to_llama_conversation)
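
    The body of convert_squad_sample_to_llama_conversation is elided above. A minimal sketch of what it might look like, assuming a simple extraction prompt and the tokenizer's chat template (the prompt wording and the unanswerable-question sentinel are our assumptions, and the function relies on the tokenizer loaded in the next section):

    def convert_squad_sample_to_llama_conversation(sample):
        # Hypothetical reconstruction; the article does not show the original body.
        texts = sample["answers"]["text"]
        # SQuAD v2 includes unanswerable questions; fall back to a fixed sentinel.
        answer = texts[0] if len(texts) > 0 else "The context does not contain an answer."
        messages = [
            {"role": "system",
             "content": "Answer the question using an exact span from the context below.\n\n"
                        "Context: " + sample["context"]},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": answer},
        ]
        # Render the full conversation into one training string via the Llama chat template.
        sample_conversation = tokenizer.apply_chat_template(messages, tokenize=False)
        return {"text": sample_conversation, "messages": messages, "answer": answer}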
    

    Model Preparation

    We'll load the Llama 3.2 11B model with 4-bit quantization and set up the LoRA config.

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
    
    model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"
    tokenizer.pad_token = tokenizer.eos_token
    
    # Load the model with 4-bit NF4 quantization so it fits on a single A100.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto"
    )
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.use_cache = False
    
    from peft import LoraConfig
    # Rank-128 adapters on every attention and MLP projection; alpha = 2 * rank
    # is a common scaling choice.
    rank = 128
    alpha = rank * 2
    peft_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=['k_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
    )
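
    As an optional sanity check, you can wrap the model with the adapters and print the trainable-parameter count (shown purely for illustration; in the actual run below, SFTTrainer applies peft_config itself):

    from peft import get_peft_model

    # Attach the LoRA adapters and report how few parameters will receive gradients.
    peft_model = get_peft_model(model, peft_config)
    peft_model.print_trainable_parameters()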
    

    Training

    We'll use the SFTTrainer from the trl library to train the model.

    from transformers import TrainingArguments
    from trl import SFTTrainer
    
    # Checkpoint directory (the name here is our choice; the original is not shown).
    model_checkpoint_path = "./llama-3.2-11b-squadv2-checkpoints"
    
    training_arguments = TrainingArguments(
        output_dir=model_checkpoint_path,
        optim='paged_adamw_32bit',
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,  # effective batch size of 32
        log_level='debug',
        evaluation_strategy="steps",
        save_strategy='steps',
        logging_steps=8,
        eval_steps=8,
        save_steps=8,
        learning_rate=1e-4,
        fp16=True,
        num_train_epochs=4,
        max_steps=120,  # max_steps caps the run well before 4 full epochs
        warmup_ratio=0.1,
        load_best_model_at_end=True,
        overwrite_output_dir=True,
        lr_scheduler_type='linear',
    )
    
    trainer = SFTTrainer(
        model=model,
        train_dataset=conversation_training_samples,
        eval_dataset=conversation_test_samples,
        peft_config=peft_config,
        dataset_text_field='text',
        max_seq_length=1024,
        tokenizer=tokenizer,
        args=training_arguments
    )
    trainer.train()
    

    Evaluation

    We'll evaluate the model using the BERTScore and exact-match metrics.

    from evaluate import load
    
    bert_model = "microsoft/deberta-v2-xxlarge-mnli"
    bertscore = load("bertscore")
    exact_match_metric = load("exact_match")
    
    def get_bulk_predictions(pipe, samples):
        ...  # body elided in the original; see the sketch below
    
    def get_base_and_tuned_bulk_predictions(samples):
        ...  # body elided in the original; see the sketch below
    
    conversation_validation_samples = conversation_validation_samples.map(get_base_and_tuned_bulk_predictions, batched=True, batch_size=20)
    
    base_predictions = conversation_validation_samples['base_prediction']
    answers = conversation_validation_samples['answer']
    base_validation_bert_score = bertscore.compute(predictions=base_predictions, references=answers, lang="en", model_type=bert_model, device="cuda:0")
    baseline_exact_match_score = exact_match_metric.compute(predictions=base_predictions, references=answers)
    
    trained_predictions = conversation_validation_samples['trained_prediction']
    trained_validation_bert_score = bertscore.compute(predictions=trained_predictions, references=answers, lang="en", model_type=bert_model, device="cuda:0")
    tuned_exact_match_score = exact_match_metric.compute(predictions=trained_predictions, references=answers)
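
    Both prediction helpers are elided above. A rough sketch under stated assumptions: base_pipe and trained_pipe are hypothetical text-generation pipelines built from the base and fine-tuned models, and each sample's messages list ends with the gold answer, which we strip before prompting:

    from transformers import pipeline

    # Hypothetical pipelines; the article does not show how these are constructed.
    base_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    trained_pipe = pipeline("text-generation", model=trainer.model, tokenizer=tokenizer)

    def get_bulk_predictions(pipe, samples):
        # Prompt with the conversation up to (but not including) the gold answer.
        prompts = [
            tokenizer.apply_chat_template(m[:-1], tokenize=False, add_generation_prompt=True)
            for m in samples["messages"]
        ]
        outputs = pipe(prompts, max_new_tokens=64, return_full_text=False)
        return [o[0]["generated_text"].strip() for o in outputs]

    def get_base_and_tuned_bulk_predictions(samples):
        # Run each batch through both models so they can be compared row by row.
        return {
            "base_prediction": get_bulk_predictions(base_pipe, samples),
            "trained_prediction": get_bulk_predictions(trained_pipe, samples),
        }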
    

    Results

    Training took around 1 hour on an A100 GPU. The results show a significant improvement in the model's performance on the validation set.

    Metric       Base Model   Tuned Model
    BERT Score   0.6469       0.7505
    Exact Match  0.116        0.418
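
    Note that bertscore.compute returns per-example precision, recall, and F1 lists, while exact_match returns a single value; the BERT Score figures above are presumably mean F1 (our assumption):

    import numpy as np

    # Aggregate the per-example BERTScore F1 values into the single numbers reported above.
    print("base F1: ", np.mean(base_validation_bert_score["f1"]))
    print("tuned F1:", np.mean(trained_validation_bert_score["f1"]))
    print("base EM: ", baseline_exact_match_score["exact_match"])
    print("tuned EM:", tuned_exact_match_score["exact_match"])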

    Conclusion

    This article demonstrated how to fine-tune Llama 3.2 11B for extractive question answering using the Q-LoRA technique. The results show a significant improvement in the model's performance on the validation set, with an increase in both BERT Score and exact match. This technique can be applied to other tasks and models, and we hope this article serves as a comprehensive guide for future research and applications.
