DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

Key                     Type           Description
concurrency             int            Number of concurrent processes.
descriptionCol          str            Name of the description column.
verifyResponse          bool           Whether to verify the response.
oversample              bool           Whether to oversample the data.
seed                    Optional[int]  Seed for random number generation.
idCol                   str            Name of the identifier column.
documentationCharLimit  int            Character limit for documentation.
promptCol               str            Name of the input prompt column.
completionCol           str            Name of the output completion column.
temperature             float          Sampling temperature for the model.
examplesPerTarget       int            Number of examples per target.
frequencyPenalty        float          Penalty for frequency of token appearance.
model                   str            Model to use for data generation.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
generationInstructions  str            Instructions for the data generation model.
subsetSize              Optional[int]  Size of the subset to use for generation.
tokenBudget             int            Token budget for generation.
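
The keys listed above could be populated roughly as in the sketch below. The class `DataGenerationConfigSketch` is a hypothetical stand-in written only to mirror the documented key names and types; it is not the library's own class. Every value, including the model name and column names, is an illustrative placeholder and not a documented default.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DataGenerationConfigSketch:
    """Hypothetical stand-in mirroring the documented keys and types."""
    concurrency: int
    descriptionCol: str
    verifyResponse: bool
    oversample: bool
    seed: Optional[int]
    idCol: str
    documentationCharLimit: int
    promptCol: str
    completionCol: str
    temperature: float
    examplesPerTarget: int
    frequencyPenalty: float
    model: str
    fewshotExamples: int
    generationInstructions: str
    subsetSize: Optional[int]
    tokenBudget: int


# All values are placeholders chosen for illustration (assumptions).
config = DataGenerationConfigSketch(
    concurrency=4,
    descriptionCol="description",
    verifyResponse=True,
    oversample=False,
    seed=42,                      # fixed seed for repeatable runs
    idCol="id",
    documentationCharLimit=2000,
    promptCol="prompt",
    completionCol="completion",
    temperature=0.7,
    examplesPerTarget=5,
    frequencyPenalty=0.0,
    model="example-model",        # placeholder model name
    fewshotExamples=3,
    generationInstructions="Write one question per description.",
    subsetSize=None,              # None leaves the optional key unset
    tokenBudget=100_000,
)

# Convert to a plain dict, e.g. for serializing to JSON or YAML.
payload = asdict(config)
```

A dataclass is used here only because it keeps the key names and types visible in one place; the two `Optional[int]` keys (`seed`, `subsetSize`) accept `None`.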