Website Sources
Crawl your website to extract and index content — Fetchply automatically follows links and processes every page.
Adding Website Sources
Website crawling is the primary way to build your agent's knowledge base. Enter a URL, and Fetchply crawls the entire site.
Navigate to Sources
Open your agent's dashboard and go to the Sources tab.
Add a Website URL
Click Add Source and enter your website URL (e.g., https://yoursite.com).
Fetchply will:
- Start from the URL you provide
- Follow internal links to discover all pages
- Extract readable text content from each page
- Skip images, videos, and non-HTML files
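The link-discovery behavior above can be sketched in a few lines. This is an illustrative example, not Fetchply's actual implementation: the function name `discover_internal_links` and the extension skip-list are assumptions chosen for the sketch.

```python
from urllib.parse import urljoin, urlparse

# Common non-HTML assets to skip (illustrative list, not Fetchply's).
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".pdf", ".zip")

def discover_internal_links(base_url, hrefs):
    """Resolve hrefs found on a page and keep only same-domain HTML links."""
    base_domain = urlparse(base_url).netloc
    internal = set()
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc != base_domain:
            continue  # external link: outside the scope of a site crawl
        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue  # non-HTML asset
        internal.add(absolute.split("#")[0])  # drop in-page fragments
    return sorted(internal)
```

For example, given `https://yoursite.com/docs/` and the hrefs `["/about", "pricing", "https://other.com/x", "logo.png"]`, only the two same-domain HTML pages survive the filter.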
Monitor Crawling Progress
Watch the real-time progress as pages are discovered and processed. The dashboard shows:
- Pages found vs. pages crawled
- Current page being processed
- Any errors encountered
Adding Individual URLs
You can also add specific URLs one at a time using the Train URL feature. This is useful for:
- Adding pages from external sites
- Indexing a single blog post or article
- Adding content from sites you don't want fully crawled
Individual URLs are not limited to a specific domain — you can index any publicly accessible web page.
How the Crawler Works
- Respects robots.txt — the crawler follows standard robots exclusion rules
- Text-only extraction — images, scripts, and styles are stripped; only readable content is indexed
- Deduplication — pages with identical content are not indexed twice
- Content hashing — when retraining, unchanged pages are skipped to save time
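The deduplication and content-hashing behaviors can be combined into one pass over the crawled pages. The sketch below is a simplified illustration under assumed data shapes (`pages` as a URL-to-text dict, `previous_hashes` as the stored hashes from the last crawl), not Fetchply's internal code:

```python
import hashlib

def pages_to_index(pages, previous_hashes):
    """Return the URLs that actually need (re)indexing.

    pages: dict mapping url -> extracted text from this crawl.
    previous_hashes: dict mapping url -> sha256 hex digest from the last crawl;
                     updated in place for the next retrain.
    """
    seen_digests = set()  # deduplication: identical content indexed only once
    to_index = []
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue  # duplicate of a page already seen in this crawl
        seen_digests.add(digest)
        if previous_hashes.get(url) == digest:
            continue  # unchanged since the last crawl: skip on retrain
        to_index.append(url)
        previous_hashes[url] = digest
    return to_index
```

On a retrain, only pages whose hash changed are returned, which is what makes re-crawling a large site fast when most content is static.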
JavaScript-rendered pages: The crawler processes HTML content directly. Pages that rely heavily on client-side JavaScript rendering may not be fully crawled. Consider using file uploads for such content.
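The robots exclusion behavior described above can be reproduced with Python's standard-library parser. The user-agent string `FetchplyBot` below is a placeholder for illustration, not a documented Fetchply identifier:

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt that disallows the /admin/ section.
rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /admin/".splitlines())

# A crawler honoring these rules may fetch /docs/ but not /admin/.
rp.can_fetch("FetchplyBot", "https://yoursite.com/docs/page")      # allowed
rp.can_fetch("FetchplyBot", "https://yoursite.com/admin/settings") # disallowed
```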
Managing Website Sources
From the Sources tab, you can:
- View all crawled URLs and their status
- Remove individual pages from the knowledge base
- Re-crawl a specific source to update content
- See crawl logs with detailed timing and error information