
Training Your Agent

Understand how training works, from crawling and chunking your content to embedding and indexing it.

How Training Works

Training is the process of converting your raw content (web pages, files, Q&A pairs) into a searchable knowledge base that powers your agent's responses.

The Training Pipeline

Content Extraction

Fetchply crawls your sources and extracts their text. HTML tags, scripts, and styles are stripped so that only readable text remains.

Chunking

Long documents are split into chunks of up to 512 tokens (about 400 words). Each chunk is a self-contained piece of information that the agent can retrieve and reference.
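To get a feel for how many chunks a page produces, the sketch below applies the ~400-words-per-chunk approximation with ceiling division. It is a back-of-the-envelope estimate only, and the function name is hypothetical: the real limit is counted in tokens, and the exact split points Fetchply chooses are not described here, so actual chunk counts can differ.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope chunk estimate. Assumes ~400 words per chunk, the
# approximate equivalent of the 512-token limit; real counts may differ.
WORDS_PER_CHUNK=400

estimate_chunks() {
  local words=$1
  # Ceiling division: leftover words still need a chunk of their own.
  echo $(( (words + WORDS_PER_CHUNK - 1) / WORDS_PER_CHUNK ))
}
```

For example, `estimate_chunks 1000` prints 3 (two full chunks plus a partial one), and `estimate_chunks "$(wc -w < page.txt)"` estimates the count for a saved text file (`page.txt` is a placeholder).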

Embedding

Each chunk is converted into a 1024-dimensional vector (an embedding) using a Voyage AI embedding model. These vectors capture the semantic meaning of the text: passages about similar concepts produce vectors that lie close together.

Chunks are embedded in batches of 128 for efficiency.
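The batch arithmetic is plain ceiling division. The sketch below (the function name is hypothetical) shows how many batches a given number of chunks implies at a batch size of 128, and how large the final batch is.

```shell
#!/usr/bin/env bash
# Illustrative batch arithmetic for a batch size of 128 chunks.
BATCH_SIZE=128

batch_plan() {
  local chunks=$1
  local batches=$(( (chunks + BATCH_SIZE - 1) / BATCH_SIZE ))
  # Final batch size: the remainder, or a full batch if chunks divide evenly.
  local last=$(( chunks % BATCH_SIZE ))
  if [ "$chunks" -gt 0 ] && [ "$last" -eq 0 ]; then last=$BATCH_SIZE; fi
  echo "$batches batches, last batch has $last chunks"
}
```

For example, `batch_plan 300` prints `3 batches, last batch has 44 chunks`: two full batches of 128 and a final batch of 44.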

Indexing

Vectors are stored in a vector database. When a visitor asks a question, the question is embedded in the same way, Fetchply searches the database for the most similar stored vectors, and the corresponding text chunks are passed to the AI as context for its response.
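The retrieval step can be illustrated with toy data. The sketch below ranks a few hypothetical chunks against a query vector by cosine similarity using `awk` and returns the top K. The three-dimensional vectors, the chunk names, and the choice of cosine similarity are all assumptions for illustration: real embeddings have 1024 dimensions, and the similarity metric used by Fetchply's vector database is not stated here.

```shell
#!/usr/bin/env bash
# Toy nearest-neighbour search by cosine similarity (metric assumed).
# Vectors are comma-separated strings of equal length.

cosine() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i] }
    # Guard against zero vectors, which have no direction.
    if (na == 0 || nb == 0) { print 0; exit }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# top_k K QUERY ID:VECTOR...  prints the K most similar chunk IDs, best first.
top_k() {
  local k=$1 query=$2
  shift 2
  local entry
  for entry in "$@"; do
    # Emit "score id" so the lines can be sorted numerically by score.
    printf '%s %s\n' "$(cosine "$query" "${entry#*:}")" "${entry%%:*}"
  done | LC_ALL=C sort -rn | head -n "$k" | cut -d' ' -f2
}
```

For example, `top_k 2 "1,0,1" "refunds:0.9,0.1,0.8" "shipping:0.1,1,0" "pricing:0.5,0.5,0.5"` prints `refunds` and then `pricing`, the two chunks whose vectors point most nearly in the query's direction. A production vector database uses an index to avoid scoring every stored vector, but the ranking idea is the same.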

Training Duration

Content Size      Estimated Time
10–50 pages       1–2 minutes
50–200 pages      2–5 minutes
200+ pages        5–15 minutes

You can navigate away from the training page; training continues in the background, and you'll be notified when it completes.

Monitoring Progress

The training page shows real-time progress:

  • Pages discovered and crawled
  • Chunks created and embedded
  • Errors encountered (if any)
  • Estimated time remaining

Starting Training via API

Trigger training
curl -X POST https://fetchply.com/api/v1/agents/YOUR_AGENT_ID/train \
  -H "Authorization: Bearer fp_your_api_key"

Only one training job can run per agent at a time. Requesting a new training job while one is still running returns a 409 Conflict response.
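If you trigger training from a script, it helps to treat 409 Conflict as "busy" rather than as a hard failure. The sketch below reads the status code with curl's `-w '%{http_code}'` option and retries a few times while another job is running. The `AGENT_ID` and `FETCHPLY_API_KEY` environment variable names, the helper function names, the retry count, and the delay are illustrative assumptions; since the exact success status is not specified, any 2xx code is treated as success.

```shell
#!/usr/bin/env bash
# Sketch: trigger training and retry while another job is running (409).
# AGENT_ID, FETCHPLY_API_KEY, function names, retries, and delay are assumptions.

API_BASE="https://fetchply.com/api/v1"

# POSTs to the train endpoint and prints only the HTTP status code
# ("000" if no response was received).
request_training() {
  curl -s -o /dev/null -w '%{http_code}' -X POST \
    "${API_BASE}/agents/${AGENT_ID}/train" \
    -H "Authorization: Bearer ${FETCHPLY_API_KEY}"
}

# Maps a status code to started, busy, or error.
classify_status() {
  case "$1" in
    2??) echo started ;;
    409) echo busy ;;
    *)   echo error ;;
  esac
}

# Usage: start_training [attempts] [delay_seconds]
start_training() {
  local attempts=${1:-5} delay=${2:-60} i=1 status
  while [ "$i" -le "$attempts" ]; do
    status=$(request_training)
    case "$(classify_status "$status")" in
      started) echo "Training started (HTTP $status)"; return 0 ;;
      busy)    echo "Attempt $i/$attempts: training already running" >&2 ;;
      *)       echo "Request failed (HTTP $status)" >&2; return 1 ;;
    esac
    i=$((i + 1))
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  echo "Gave up: a training job is still running" >&2
  return 1
}
```

After exporting `AGENT_ID` and `FETCHPLY_API_KEY`, running `start_training 10 30` makes up to 10 attempts, 30 seconds apart. Any non-2xx, non-409 response (for example an authentication error) stops immediately, since retrying would not help.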

Source-Scoped Training

When you add a single URL or upload a file, Fetchply can train just that source without rebuilding the entire knowledge base. This is called additive training: it is faster than a full rebuild and leaves your existing knowledge in place.

Full retraining, which rebuilds the entire knowledge base from all of your sources, runs when you click Retrain or when automatic retraining is enabled.
