Wuzzy is a decentralized web crawling and search system built on the AO (Actor Oriented) protocol. The system consists of two main components, the Nest and the Crawler.
The Wuzzy system uses a distributed architecture where:
The Nest is the central hub that:
Crawlers are autonomous processes that:
Both components use an Access Control List (ACL) system with roles:
owner: Full administrative access
admin: Administrative access
The Nest provides the following handlers for document indexing, search, and crawler management.
Indexes a document in the search database.
Action: Index-Document
Required Roles: owner, admin, Index-Document
Parameters:
document-url (string): The URL of the document to index, used as document-id
document-last-crawled-at (string): The date header from the relay device response
document-content-type (string): MIME type of the document
data (string): The content of the document
document-title (string, optional): Title of the document
document-description (string, optional): Description/summary of the document
Response:
Example:
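A minimal sketch of how an Index-Document request could be assembled from the parameters listed above. It assumes that every parameter except data travels as a message tag and that data is the message body; the `build_index_document_message` helper and the plain-dict message shape (`target`, `tags`, `data`) are hypothetical and only illustrate the parameter layout, not a Wuzzy or AO client API.

```python
from typing import Optional


def build_index_document_message(
    nest_id: str,
    url: str,
    last_crawled_at: str,
    content_type: str,
    content: str,
    title: Optional[str] = None,
    description: Optional[str] = None,
) -> dict:
    """Assemble an Index-Document message as a plain dict (hypothetical shape)."""
    # Required parameters, assumed here to be sent as tags.
    tags = [
        {"name": "Action", "value": "Index-Document"},
        {"name": "document-url", "value": url},
        {"name": "document-last-crawled-at", "value": last_crawled_at},
        {"name": "document-content-type", "value": content_type},
    ]
    # Optional parameters are only added when the caller supplies them.
    if title is not None:
        tags.append({"name": "document-title", "value": title})
    if description is not None:
        tags.append({"name": "document-description", "value": description})
    # The document content goes in the message body.
    return {"target": nest_id, "tags": tags, "data": content}


# Usage with placeholder values (the process ID and URL are made up).
message = build_index_document_message(
    nest_id="NEST_PROCESS_ID",
    url="arns://example",
    last_crawled_at="Mon, 01 Jan 2024 00:00:00 GMT",
    content_type="text/html",
    content="<html><body>Hello</body></html>",
    title="Example page",
)
```

The sender would still need one of the roles listed under Required Roles for the Nest to accept the message.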
Removes a document from the search index.
Action: Remove-Document
Required Roles: owner, admin, Remove-Document
Parameters:
document-id (string): The ID of the document to remove, typically its URL
Response:
Searches the document index for matching content.
Action: Search
Required Roles: None (public)
Parameters:
query (string): The search query
search-type (string, optional): Search algorithm to use (simple or bm25, defaults to simple)
Response:
Example:
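A minimal sketch of a Search request, assuming query and search-type are sent as message tags. The `build_search_message` helper and the dict layout are hypothetical; only the action name, the parameter names, and the two search-type values come from the description above.

```python
from typing import Optional

# The two search algorithms named in the parameter description.
SEARCH_TYPES = ("simple", "bm25")


def build_search_message(
    nest_id: str, query: str, search_type: Optional[str] = None
) -> dict:
    """Assemble a Search message as a plain dict (hypothetical shape)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    tags = [
        {"name": "Action", "value": "Search"},
        {"name": "query", "value": query},
    ]
    # search-type is optional; when omitted, the Nest defaults to "simple".
    if search_type is not None:
        if search_type not in SEARCH_TYPES:
            raise ValueError(f"search-type must be one of {SEARCH_TYPES}")
        tags.append({"name": "search-type", "value": search_type})
    return {"target": nest_id, "tags": tags}


# Usage with a placeholder process ID.
default_search = build_search_message("NEST_PROCESS_ID", "permaweb")
bm25_search = build_search_message("NEST_PROCESS_ID", "permaweb", "bm25")
```

Because Search requires no roles, any process or user could send this message.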
Registers an existing crawler with the Nest.
Action: Add-Crawler
Required Roles: owner, admin, Add-Crawler
Parameters:
crawler-id (string): The process ID of the crawler to add
crawler-name (string, optional): Name for the crawler, defaults to "My Wuzzy Crawler"
Response:
Removes a crawler from the Nest.
Action: Remove-Crawler
Required Roles: owner, admin, Remove-Crawler
Parameters:
crawler-id (string): The process ID of the crawler to remove
Response:
The Crawler provides handlers for managing crawl tasks and processing web content.
Requests immediate crawling of a specific URL.
Action: Request-Crawl
Required Roles: owner, admin, Request-Crawl
Parameters:
url (string): The URL to crawl immediately
Response:
Adds URLs to the crawl task queue.
Action: Add-Crawl-Tasks
Required Roles: owner, admin, Add-Crawl-Tasks
Parameters:
data (string): Newline-separated list of URLs to crawl
Response:
Example:
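A minimal sketch of building the newline-separated data payload for Add-Crawl-Tasks. The `build_add_crawl_tasks_message` helper and the dict layout are hypothetical; trimming whitespace and dropping blank or repeated entries is a convenience assumed here, not documented Crawler behavior.

```python
from typing import Iterable


def build_add_crawl_tasks_message(crawler_id: str, urls: Iterable[str]) -> dict:
    """Assemble an Add-Crawl-Tasks message as a plain dict (hypothetical shape)."""
    cleaned = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and repeats so the payload stays one URL per line.
        if url and url not in cleaned:
            cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {
        "target": crawler_id,
        "tags": [{"name": "Action", "value": "Add-Crawl-Tasks"}],
        # The handler expects a newline-separated list of URLs in data.
        "data": "\n".join(cleaned),
    }


# Usage with placeholder values.
tasks = build_add_crawl_tasks_message(
    "CRAWLER_PROCESS_ID",
    ["https://example.com", " arns://example ", "", "https://example.com"],
)
```

The same payload format would apply to Remove-Crawl-Tasks, with only the Action value changed.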
Removes URLs from the crawl task queue.
Action: Remove-Crawl-Tasks
Required Roles: owner, admin, Remove-Crawl-Tasks
Parameters:
data (string): Newline-separated list of URLs to remove
Response:
Configures which Nest the crawler should submit documents to.
Action: Set-Nest-Id
Required Roles: owner, admin, Set-Nest-Id
Parameters:
nest-id (string): The process ID of the target Nest
Response:
Triggers the crawler's scheduled processing cycle.
Action: Cron
Required Roles: owner, admin, Cron
Parameters: None
Behavior:
Note: This is typically called by a scheduler, not manually.
Both components include built-in state management handlers:
Both components support ACL management:
Updates user roles and permissions.
Action: Update-Roles
Required Roles: owner, admin
Parameters:
Grant and/or Revoke operations
Retrieves current role assignments.
Action: Get-Roles
Required Roles: owner, admin, Get-Roles
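The Required Roles lines throughout this reference can be read as a lookup table. The sketch below assumes that holding any one of the listed roles is sufficient and that a handler with no required roles is public. The `REQUIRED_ROLES` table covers only a few handlers for brevity, and `is_authorized` is a hypothetical helper rather than the ACL code used by the Nest or Crawler.

```python
from typing import Dict, Optional, Set

# Role requirements copied from the handler descriptions; None marks a public handler.
REQUIRED_ROLES: Dict[str, Optional[Set[str]]] = {
    "Index-Document": {"owner", "admin", "Index-Document"},
    "Search": None,
    "Update-Roles": {"owner", "admin"},
    "Get-Roles": {"owner", "admin", "Get-Roles"},
}


def is_authorized(acl: Dict[str, Set[str]], sender: str, action: str) -> bool:
    """Return True if sender holds at least one role required for action.

    Assumption: any single listed role grants access.
    """
    required = REQUIRED_ROLES[action]
    if required is None:
        return True  # public handler, no role check
    held = acl.get(sender, set())
    return not required.isdisjoint(held)


# Usage with made-up addresses.
acl = {
    "ALICE": {"owner"},
    "BOB": {"Index-Document"},
}
```

Under these assumptions, an address holding only the Index-Document role could index documents but could not call Update-Roles, which lists only owner and admin.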
http:// and https:// - Standard web protocols
arns:// - Arweave Name System URLs
ar:// - Direct Arweave transaction URLs

arns:// - Arweave Name System URLs
ar:// - Direct Arweave transaction URLs
Note: HTTP/HTTPS support may be limited in the Nest depending on configuration.
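A minimal sketch of checking a URL against the protocols listed above before submitting it for crawling or indexing. The `SUPPORTED_SCHEMES` set and the `is_supported_url` helper are hypothetical; the check looks only at the URL scheme and does not reflect any Nest-specific HTTP/HTTPS restrictions.

```python
from urllib.parse import urlsplit

# Schemes named in the protocol list.
SUPPORTED_SCHEMES = {"http", "https", "arns", "ar"}


def is_supported_url(url: str) -> bool:
    """Return True if the URL uses one of the listed schemes and names a target."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme; netloc holds the host, ArNS name, or transaction ID.
    return parts.scheme in SUPPORTED_SCHEMES and bool(parts.netloc)


# Usage with placeholder URLs.
candidates = ["https://example.com/page", "arns://example", "ar://TXID", "ftp://example.com"]
accepted = [u for u in candidates if is_supported_url(u)]
```

Filtering on the client side like this would only avoid sending requests that are expected to fail; the handlers still perform their own checks.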
All handlers include comprehensive error checking and will respond with assertion errors if:
Errors are returned as standard AO error responses with descriptive messages.