Hugging Face Trainer
1. Hugging Face Trainer
Original config (mostly the TrainingArguments defaults):
    # =============================================================================
    # Training Configuration
    # =============================================================================
    output_dir: null
    do_train: false
    do_eval: false
    do_predict: false
    num_train_epochs: 3.0
    max_steps: -1
    resume_from_checkpoint: null

    # =============================================================================
    # Evaluation Configuration
    # =============================================================================
    eval_steps: null
    eval_delay: 0
    # Options: 'no', 'steps', 'epoch'
    eval_strategy: 'no'

    # =============================================================================
    # Batch & Gradient Configuration
    # =============================================================================
    per_device_train_batch_size: 8
    per_device_eval_batch_size: 8
    auto_find_batch_size: false
    gradient_accumulation_steps: 1
    gradient_checkpointing: false

    # =============================================================================
    # Optimizer & Learning Rate
    # =============================================================================
    # Written with a decimal point: PyYAML reads a bare 5e-5 as a string
    learning_rate: 5.0e-5
    weight_decay: 0.0
    lr_scheduler_type: linear
    warmup_ratio: 0.0
    warmup_steps: 0

    # =============================================================================
    # Logging Configuration
    # =============================================================================
    # Options: 'debug', 'info', 'warning', 'error', 'critical', 'passive'
    log_level: passive
    logging_dir: null
    log_on_each_node: true
    # Options: 'no', 'steps', 'epoch'
    logging_strategy: steps
    logging_first_step: false
    logging_steps: 500

    # =============================================================================
    # Model Saving Configuration
    # =============================================================================
    # Options: 'no', 'steps', 'epoch', 'best'
    save_strategy: steps
    save_steps: 500
    save_total_limit: null
    save_only_model: false

    # =============================================================================
    # Random Seeds
    # =============================================================================
    seed: 42
    data_seed: null

    # =============================================================================
    # Hardware & Performance
    # =============================================================================
    use_ipex: false
    bf16: false
    fp16: false
    tf32: null
    torch_compile: false
    torch_compile_backend: 'inductor'
    use_liger_kernel: false

    # =============================================================================
    # DataLoader Configuration
    # =============================================================================
    dataloader_drop_last: false
    dataloader_num_workers: 0
    dataloader_prefetch_factor: null
    dataloader_pin_memory: true
    dataloader_persistent_workers: false
    remove_unused_columns: true
    label_names: null

    # =============================================================================
    # Experiment Tracking
    # =============================================================================
    run_name: null
    report_to: null

    # =============================================================================
    # Hugging Face Hub Integration
    # =============================================================================
    push_to_hub: false
    hub_model_id: null
    # Options: 'end', 'every_save', 'checkpoint', 'all_checkpoints'
    hub_strategy: 'every_save'
    hub_revision: null
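With these values, how often logging and checkpointing fire depends on how many optimizer steps the run has. The sketch below estimates that from the settings above (per_device_train_batch_size: 8, gradient_accumulation_steps: 1, num_train_epochs: 3.0, save_steps: 500). The helper names and the 10,000-sample dataset are assumptions for illustration, not part of transformers, and the exact rounding inside Trainer can differ between versions.

```python
import math


def estimate_steps(num_samples, per_device_batch_size=8, num_devices=1,
                   gradient_accumulation_steps=1, num_train_epochs=3.0):
    """Rough count of optimizer steps (hypothetical helper, not a Trainer API)."""
    # Batches per epoch; the last partial batch is kept (dataloader_drop_last: false)
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
    # One optimizer step per gradient_accumulation_steps batches
    steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    total_steps = math.ceil(steps_per_epoch * num_train_epochs)
    return steps_per_epoch, total_steps


def periodic_steps(total_steps, every=500):
    """Steps at which a 'steps' strategy with interval `every` fires."""
    return list(range(every, total_steps + 1, every))


# Assumed dataset size of 10,000 samples on a single device
steps_per_epoch, total_steps = estimate_steps(num_samples=10_000)
print(steps_per_epoch, total_steps)                 # 1250 3750
print(len(periodic_steps(total_steps, every=500)))  # 7
```

With logging_steps and save_steps both at 500, a run shorter than 500 steps never reaches a periodic logging or checkpoint step, so short experiments usually need smaller intervals.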
A commonly used config:
    # =============================================================================
    # Training Configuration
    # =============================================================================
    output_dir: ./output
    do_train: true
    do_eval: false
    do_predict: false
    num_train_epochs: 1.0
    max_steps: -1
    resume_from_checkpoint: null

    # =============================================================================
    # Evaluation Configuration
    # =============================================================================
    eval_steps: null
    eval_delay: 0
    # Options: 'no', 'steps', 'epoch'
    eval_strategy: 'no'

    # =============================================================================
    # Batch & Gradient Configuration
    # =============================================================================
    per_device_train_batch_size: 8
    per_device_eval_batch_size: 8
    auto_find_batch_size: false
    gradient_accumulation_steps: 1
    gradient_checkpointing: true

    # =============================================================================
    # Optimizer & Learning Rate
    # =============================================================================
    learning_rate: 1.0e-5
    weight_decay: 0.0
    lr_scheduler_type: cosine
    warmup_ratio: 0.03
    warmup_steps: 0

    # =============================================================================
    # Logging Configuration
    # =============================================================================
    # Options: 'debug', 'info', 'warning', 'error', 'critical', 'passive'
    log_level: passive
    logging_dir: ./logs
    log_on_each_node: false
    # Options: 'no', 'steps', 'epoch'
    logging_strategy: steps
    logging_first_step: true
    logging_steps: 1

    # =============================================================================
    # Model Saving Configuration
    # =============================================================================
    # Options: 'no', 'steps', 'epoch', 'best'
    save_strategy: steps
    save_steps: 500
    save_total_limit: null
    save_only_model: false

    # =============================================================================
    # Random Seeds
    # =============================================================================
    seed: 42
    data_seed: null

    # =============================================================================
    # Hardware & Performance
    # =============================================================================
    use_ipex: false
    bf16: true
    fp16: false
    tf32: true
    torch_compile: false
    torch_compile_backend: 'inductor'
    use_liger_kernel: true

    # =============================================================================
    # DataLoader Configuration
    # =============================================================================
    dataloader_drop_last: false
    dataloader_num_workers: 8
    dataloader_prefetch_factor: 2
    dataloader_pin_memory: true
    dataloader_persistent_workers: false
    remove_unused_columns: true
    label_names: null

    # =============================================================================
    # Experiment Tracking
    # =============================================================================
    run_name: null
    report_to: wandb

    # =============================================================================
    # Hugging Face Hub Integration
    # =============================================================================
    push_to_hub: true
    hub_model_id: null
    # Options: 'end', 'every_save', 'checkpoint', 'all_checkpoints'
    hub_strategy: 'all_checkpoints'
    hub_revision: null
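The warmup_ratio: 0.03 and lr_scheduler_type: cosine settings above determine the learning-rate curve. The sketch below reproduces that shape in plain Python: a linear ramp to the peak, then a half-cosine decay to zero. The function names and the 1,000-step run length are assumptions for illustration. The real schedule is built by Trainer's scheduler, and this only approximates its formula.

```python
import math


def warmup_steps_from_ratio(total_steps, warmup_ratio=0.03, warmup_steps=0):
    """An explicit warmup_steps > 0 takes precedence over warmup_ratio."""
    return warmup_steps if warmup_steps > 0 else math.ceil(total_steps * warmup_ratio)


def cosine_lr(step, total_steps, peak_lr=1e-5, warmup_ratio=0.03):
    """Learning rate at `step` (hypothetical helper, not a transformers API)."""
    warmup = warmup_steps_from_ratio(total_steps, warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 1000  # assumed run length in optimizer steps
for step in (0, 15, 30, 515, 1000):
    print(step, f"{cosine_lr(step, total):.2e}")
```

For a 1,000-step run the ratio gives 30 warmup steps, so the peak of 1e-5 is reached at step 30 and the rate falls to half the peak midway through the decay phase.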
Some methods you may need to modify:
    from transformers import (
        AutoConfig,
        AutoModel,
        AutoModelForSequenceClassification,
        AutoTokenizer,
        PrinterCallback,
        Trainer,
        TrainingArguments,
    )

    # Load model and tokenizer
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # train_dataset and eval_dataset are assumed to be tokenized datasets prepared elsewhere
    training_args = TrainingArguments(output_dir="./output")
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        processing_class=tokenizer,
    )

    # Count parameters by building the architecture from its config (no pretrained weights are loaded)
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")
    config_model = AutoModel.from_config(config)
    model_size = sum(t.numel() for t in config_model.parameters())
    print(f"Model size: {model_size / 1000**2:.1f}M parameters")

    # methods you can use
    trainer.pop_callback(PrinterCallback)     # detach a callback and return it
    trainer.remove_callback(PrinterCallback)  # detach a callback without returning it
    trainer.get_num_trainable_parameters()
    trainer.get_learning_rates()              # needs the optimizer to exist already
    trainer.floating_point_ops(inputs=inputs)
    trainer.init_hf_repo()
    trainer.create_model_card()
    trainer.push_to_hub()

    # methods that you may need to modify (override them in a Trainer subclass)
    # inputs, logs, start_time, optimizer, num_training_steps, num_items_in_batch and
    # return_outputs are placeholders that only show each signature
    train_dataloader = trainer.get_train_dataloader()
    eval_dataloader = trainer.get_eval_dataloader()
    trainer.create_optimizer()
    trainer.create_scheduler(num_training_steps=num_training_steps, optimizer=optimizer)
    trainer.log(logs=logs, start_time=start_time)
    trainer.training_step(model=model, inputs=inputs, num_items_in_batch=num_items_in_batch)
    trainer.compute_loss(
        model=model, inputs=inputs, return_outputs=return_outputs, num_items_in_batch=num_items_in_batch
    )
    trainer.save_model()
    trainer.evaluate()
    trainer.evaluation_loop(dataloader=eval_dataloader, description="Evaluation")
    trainer.predict(test_dataset=eval_dataset)
    trainer.prediction_step(model=model, inputs=inputs, prediction_loss_only=False)

    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
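Modifying one of these methods usually means overriding it in a subclass. As a minimal sketch, the class below overrides compute_loss to apply per-class weights to the cross-entropy loss. The class name WeightedLossTrainer and the class_weights argument are hypothetical, not part of transformers, and the sketch assumes a classification model whose output exposes .logits and a batch that contains a "labels" key. It ignores num_items_in_batch, so loss normalization under gradient accumulation is not handled.

```python
import torch
from torch import nn
from transformers import Trainer


class WeightedLossTrainer(Trainer):
    """Trainer that applies per-class weights to the cross-entropy loss."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # class_weights is a hypothetical extra argument, not a Trainer parameter
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Copy so the caller's batch dict is left untouched
        inputs = dict(inputs)
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weight = None
        if self.class_weights is not None:
            weight = torch.as_tensor(self.class_weights, dtype=logits.dtype, device=logits.device)
        loss = nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1), weight=weight
        )
        return (loss, outputs) if return_outputs else loss
```

It takes the same arguments as Trainer plus one extra keyword, for example WeightedLossTrainer(model=model, args=training_args, train_dataset=train_dataset, class_weights=[1.0, 3.0]).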