DataGenerationConfig

Configuration for generating synthetic data with a model. The generated data is used to fine-tune an LLM.

Key                     Type           Description
fewshotExamples         int            Number of few-shot examples used to prompt the model.
frequencyPenalty        float          Penalty applied to tokens based on how frequently they have already appeared.
documentationCharLimit  int            Character limit for documentation.
examplesPerTarget       int            Number of examples per target.
descriptionCol          str            Name of the description column.
temperature             float          Sampling temperature for the model.
verifyResponse          bool           Whether to verify the response.
model                   str            Model to use for data generation.
seed                    Optional[int]  Seed for random number generation.
promptCol               str            Name of the input prompt column.
oversample              bool           Whether to oversample the data.
concurrency             int            Number of concurrent processes.
generationInstructions  str            Instructions for the data generation model.
idCol                   str            Name of the identifier column.
subsetSize              Optional[int]  Size of the subset to use for generation.
completionCol           str            Name of the output completion column.
tokenBudget             int            Token budget for generation.
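
To make the shape of this configuration concrete, the sketch below mirrors the listed keys and types in a plain Python dataclass. It is an illustration only: the class name `DataGenerationConfigSketch`, every value shown, and the choice to default the two `Optional[int]` keys to `None` are assumptions, not the library's actual class, defaults, or recommended settings.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DataGenerationConfigSketch:
    """Hypothetical mirror of the documented keys; not the real class."""
    fewshotExamples: int
    frequencyPenalty: float
    documentationCharLimit: int
    examplesPerTarget: int
    descriptionCol: str
    temperature: float
    verifyResponse: bool
    model: str
    promptCol: str
    oversample: bool
    concurrency: int
    generationInstructions: str
    idCol: str
    completionCol: str
    tokenBudget: int
    # Assumption: the Optional[int] keys may be left unset.
    seed: Optional[int] = None
    subsetSize: Optional[int] = None


# Every value below is a placeholder chosen for illustration only.
example_config = DataGenerationConfigSketch(
    fewshotExamples=3,
    frequencyPenalty=0.0,
    documentationCharLimit=2000,
    examplesPerTarget=5,
    descriptionCol="description",
    temperature=0.7,
    verifyResponse=True,
    model="example-model",  # placeholder, not a real model name
    promptCol="prompt",
    oversample=False,
    concurrency=4,
    generationInstructions="Write one realistic request per target.",
    idCol="id",
    completionCol="completion",
    tokenBudget=100_000,
    seed=42,
)

# Convert to a plain dict, e.g. for serializing to JSON or YAML.
config_dict = asdict(example_config)
```

Keeping the column-name keys (`idCol`, `promptCol`, `completionCol`, `descriptionCol`) next to the sampling keys (`temperature`, `frequencyPenalty`, `seed`) in one object makes it easier to record exactly which settings produced a given synthetic dataset.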