In February 2026, Gretel AI — one of the most prominent companies in synthetic data generation — wound down. Its GitHub organization was archived and marked as no longer maintained. The platform that thousands of teams relied on for generating synthetic datasets, including training data for LLM fine-tuning, effectively ceased operations.
If you were a Gretel user, you're now looking for alternatives. If you weren't, the shutdown still matters — because it reveals something important about where the training data market is heading and where it's failing.
What Gretel Did
Gretel built tools for generating synthetic data. Their platform covered a wide range of use cases: creating privacy-safe versions of tabular datasets, generating synthetic text for model training, and producing QA pairs with automated scoring for conformance, quality, toxicity, bias, and accuracy.
They had integrations with major cloud platforms including Azure AI Foundry. They offered both open-source libraries and commercial services. For teams building LLMs or fine-tuning models, Gretel was one of the few platforms that could generate structured training data at scale.
Their Navigator product was specifically designed for creating custom question-answer pairs — the exact format needed for supervised fine-tuning. For many teams, Gretel was the closest thing to an automated training data pipeline that existed.
And now it's gone.
Why This Matters
The timing is significant. We're in a moment where enterprise fine-tuning demand is accelerating rapidly. Organizations across insurance, legal, healthcare, and financial services are realizing that generic foundation models don't understand their domain-specific documents, terminology, and reasoning patterns. Fine-tuning on proprietary data is the path to models that actually work for specialized tasks.
But fine-tuning requires training data. Specifically, it requires high-quality instruction-response pairs that teach the model how to reason about your domain. Creating this data has always been the bottleneck — and Gretel was one of the few companies offering an automated solution.
With Gretel gone, the market has a gap. And the remaining options all have significant limitations.
The Current Landscape
If you need training data for LLM fine-tuning in March 2026, here's what's available:
Manual Annotation Services
Scale AI, Appen
Human-created training data, often of genuinely high quality. But the process takes weeks to months, costs $5–15 per pair, and requires uploading your confidential documents to their platform for annotators to work with. For regulated industries or sensitive data, this creates an unacceptable confidentiality problem.
Open-Source Tools
Unsloth synthetic dataset notebooks
Free and runs on your hardware, solving the privacy problem. But it's a Python notebook — you need technical skills. There's no quality review workflow, no domain-specific question strategies, no cross-document synthesis, and no way to systematically catch bad pairs before they contaminate your training set.
Cloud AI APIs
OpenAI, Google, Anthropic
Can generate training pairs with the right prompting, but you're uploading documents to a cloud service. Quality depends on prompt engineering skill. No structured pipeline, no quality scoring, no review workflow. And generating thousands of pairs through an API gets expensive quickly.
NVIDIA NeMo
Enterprise synthetic data pipelines
Enterprise-grade and well-engineered for agentic AI and conversational workflows. But it's a toolkit, not a product — you need an ML engineering team to implement and operate it. Requires NVIDIA hardware and infrastructure expertise most organizations don't have in-house.
None of these options provide what Gretel offered at its best: an accessible, structured platform for generating quality training data. And none of them are specifically designed for the use case that's growing fastest — turning proprietary enterprise documents into domain-specific fine-tuning datasets.
What the Market Actually Needs
The teams we talk to — in insurance, legal, compliance, and enterprise operations — describe the same problem consistently. They have large collections of domain-specific documents. They know fine-tuning would make their AI dramatically better. But the process of turning those documents into training data is either too expensive, too slow, too technically complex, or requires them to compromise on data privacy.
What they need is a system that can ingest their actual documents — not generic web text, but insurance policies, legal contracts, regulatory filings, internal manuals, codebases — and produce structured training pairs that capture the domain's reasoning patterns.
Not just simple factual extraction ("What does Section 3 cover?") but conditional reasoning ("If the policyholder's revenue exceeds $50,000, does this coverage still apply?"), exclusion logic ("What is explicitly NOT covered under this provision?"), and cross-document synthesis ("How do these two policies interact when both contain overlapping liability clauses?").
The pairs need quality control — not every AI-generated question is worth training on. Ephemeral data should be filtered out because it doesn't teach domain knowledge. Boilerplate and duplicate content should be detected and excluded before any pairs are generated. And a human expert should be able to review every pair and decide whether it meets the bar.
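To make the filtering step concrete, here's a toy sketch of boilerplate and duplicate screening using normalized hashing. The marker strings and thresholds are invented for illustration; this is not FORGE's implementation, just the general technique:

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences hash to the same value."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def filter_chunks(chunks: list[str]) -> list[str]:
    """Drop normalized duplicates and chunks matching obvious
    boilerplate markers, before any pair generation happens."""
    seen: set[str] = set()
    kept: list[str] = []
    boilerplate_markers = ("all rights reserved", "page intentionally left blank")
    for chunk in chunks:
        norm = normalize(chunk)
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen:
            continue  # duplicate content, already kept once
        if any(marker in norm for marker in boilerplate_markers):
            continue  # boilerplate, teaches the model nothing
        seen.add(digest)
        kept.append(chunk)
    return kept
```

A production system would add fuzzy near-duplicate detection (shingling, MinHash), but even this exact-match pass removes a surprising amount of repeated policy boilerplate.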
Finally, the entire process should run locally. If you're working with confidential insurance policies or legal contracts, the training data pipeline should have the same privacy guarantees as the documents themselves.
Where FORGE Fits
This is the specific problem we built FORGE to solve.
FORGE is an add-on to Librarian, our local AI document assistant. Librarian handles the document ingestion — parsing PDFs, Word documents, Excel files, and code across 20+ languages, then chunking and indexing them in a local vector database with semantic search.
FORGE reads those indexed chunks and generates training pairs automatically. But not randomly. It uses a structured approach:
Document Classification
Identifies whether each chunk is narrative prose, source code, tabular data, or legal/regulatory text — and adapts the question generation strategy accordingly.
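A rough heuristic version of that classification looks like the sketch below. The keyword lists and thresholds are invented for illustration; FORGE's actual classifier is more sophisticated than pattern matching:

```python
import re

def classify_chunk(text: str) -> str:
    """Toy heuristic: route a chunk to one of four content types so the
    question-generation strategy can adapt. Illustrative only."""
    lines = text.splitlines()
    # Tabular: most lines carry column delimiters
    delimited = sum(1 for ln in lines if "|" in ln or "\t" in ln)
    if lines and delimited / len(lines) > 0.5:
        return "tabular"
    # Code: language keywords plus structural punctuation
    if re.search(r"\b(def |class |function |return |import )", text) and (
        "{" in text or ":" in text
    ):
        return "code"
    # Legal/regulatory: section references and defined-term language
    if re.search(r"\b(Section \d|shall|hereinafter|pursuant to)\b", text):
        return "legal"
    return "narrative"
```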
Multi-Angle Generation
Creates up to three question types per qualifying passage: factual recall, conditional application, and exclusion reasoning.
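The three angles can be thought of as three prompt templates applied to the same passage. The template wording here is hypothetical, not FORGE's actual prompts:

```python
# Illustrative prompt templates for the three question angles.
ANGLES = {
    "factual": "Write a question answerable directly from the passage below.",
    "conditional": (
        "Write a question that applies a condition or threshold "
        "from the passage to a hypothetical scenario."
    ),
    "exclusion": (
        "Write a question asking what the passage explicitly "
        "excludes or does not cover."
    ),
}

def build_prompts(passage: str) -> dict[str, str]:
    """Return one generation prompt per question angle for a passage."""
    return {
        angle: f"{template}\n\nPassage:\n{passage}"
        for angle, template in ANGLES.items()
    }
```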
Cross-Document Synthesis
Identifies relationships between files — shared terminology, overlapping provisions, dependency chains — and generates pairs that require integrating information from multiple sources. This is the capability that's hardest to replicate manually and that no other automated tool currently offers.
Quality Scoring
Evaluates every pair on multiple criteria before it reaches the human reviewer. Pairs that reference ephemeral data, lack specificity, or don't faithfully reflect the source material are flagged or auto-rejected.
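In miniature, a scoring gate looks something like this. The criteria here are toy rule-based checks with invented term lists; a production system would use an LLM judge for most of them:

```python
from dataclasses import dataclass

@dataclass
class Pair:
    instruction: str
    response: str
    source: str  # the chunk the pair was generated from

def score_pair(pair: Pair) -> dict[str, bool]:
    """Toy criteria: each returns True if the pair passes. Illustrative only."""
    ephemeral_terms = ("as of today", "current price", "this week")
    return {
        # References to ephemeral data don't teach durable domain knowledge
        "not_ephemeral": not any(
            t in pair.instruction.lower() for t in ephemeral_terms
        ),
        # Very short questions tend to be vague
        "specific": len(pair.instruction.split()) >= 6,
        # Crude faithfulness check: response vocabulary overlaps the source
        "grounded": any(
            word in pair.source.lower()
            for word in pair.response.lower().split()[:5]
        ),
    }

def gate(pair: Pair) -> str:
    """Pairs passing every automated check go to the human reviewer."""
    return "review" if all(score_pair(pair).values()) else "auto-reject"
```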
Human Review
Every pair passes through an Accept, Reject, or Edit interface. The system allows a maximum of two generation attempts per chunk before auto-rejecting — preventing the pipeline from spending resources on low-quality source material.
The output is a clean JSON dataset in Alpaca format (instruction, input, response), ready to drop into any fine-tuning pipeline — Unsloth, Hugging Face, or any SFT toolchain.
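For reference, a single record in that layout looks like the sketch below, written one JSON object per line (JSONL), which most SFT loaders accept. The sample pair content is hypothetical, modeled on the insurance examples above; note that the original Stanford Alpaca release names the third key "output", so check what your toolchain expects:

```python
import json

# One training pair in the (instruction, input, response) layout.
pairs = [
    {
        "instruction": "Does this coverage apply if the policyholder's revenue exceeds $50,000?",
        "input": "Section 4.2: Coverage is limited to insureds with annual revenue under $50,000.",
        "response": "No. Section 4.2 limits coverage to insureds with annual revenue under $50,000.",
    }
]

# Write one JSON object per line (JSONL).
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```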
And everything — ingestion, generation, review, and export — runs locally on your hardware. Documents never leave your machine.
Beyond Training Data
One thing Gretel never offered was the next step: actually fine-tuning the model.
FORGE doesn't stop at dataset generation. Once you have a reviewed, quality-gated dataset, FORGE can pull a base model from Hugging Face, run LoRA/QLoRA fine-tuning on your local GPU, and deploy the resulting model directly into Librarian — all within the same system.
The full pipeline: raw documents → ingestion and indexing (Librarian) → pair generation → quality scoring → human review → Alpaca export → LoRA/QLoRA fine-tuning → deployment back into Librarian.
No separate tools. No cloud services. No Python scripts to stitch together. One system, from raw documents to a deployed domain-specific model.
Who This Is For
FORGE isn't for everyone. It requires an NVIDIA GPU with at least 8 GB of VRAM for training (24 GB+ recommended for serious work with larger models). It's designed for professionals and teams who work with domain-specific documents and need AI that genuinely understands their field — not a general-purpose chatbot that gives surface-level answers.
If you're an InsurTech company building AI tools for the insurance industry, a law firm that needs AI trained on your contract library, a compliance team that wants AI fluent in your regulatory framework, or an enterprise with proprietary documentation that a generic model can't handle — FORGE is built for your use case.
If you were using Gretel for domain-specific training data and need an alternative that runs locally and goes beyond data generation into actual fine-tuning, we'd welcome the chance to show you what FORGE produces.
Getting Started
FORGE is available as an add-on to any Librarian subscription. You can start with a free 30-day trial of Librarian, index your documents, and see the quality of the search and retrieval before adding FORGE for training data generation and fine-tuning.
If you want to see what FORGE produces before committing, send us a few sample documents. We'll generate a demo dataset and send you the JSON output — no commitment, no cost. You can evaluate the pair quality yourself before deciding.
Librarian is a private, local AI document assistant. FORGE is the fine-tuning add-on that turns your documents into domain-specific AI models. Everything runs on your machine. Start a free trial →
Want to see FORGE in action on your own documents? Contact us for a free demo dataset →