Indexing

Create a pipeline that indexes data from a source into the database, using a scraper to fetch the pages and a parser to extract their content.

class indexing.base.BaseIndexer(scraper, parser)

Abstract base class for indexing models.

index(sitemap_url: str) → dict

Abstract method to index content from a sitemap URL into a vector database.
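As a rough illustration of how a concrete indexer might tie the scraper and parser together, here is a minimal sketch. Everything in it is an assumption made for the example: `BaseIndexerSketch` only mirrors the documented constructor and `index` signature, and `FakeScraper`, `FakeParser`, `fetch`, `parse`, and `InMemoryIndexer` are hypothetical names that do not come from the library. The in-memory list stands in for the vector database.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseIndexerSketch(ABC):
    """Hypothetical stand-in mirroring the documented BaseIndexer interface."""

    def __init__(self, scraper, parser):
        self.scraper = scraper
        self.parser = parser

    @abstractmethod
    def index(self, sitemap_url: str) -> dict:
        """Index content reachable from sitemap_url and report the outcome."""


class FakeScraper:
    """Hypothetical scraper: returns canned pages instead of fetching anything."""

    def fetch(self, sitemap_url: str) -> List[bytes]:
        return [b"<p>first page</p>", b"<p>second page</p>"]


class FakeParser:
    """Hypothetical parser: strips the <p> tags used by the canned pages."""

    def parse(self, page: bytes) -> str:
        return page.decode("utf-8").replace("<p>", "").replace("</p>", "")


class InMemoryIndexer(BaseIndexerSketch):
    """Stores parsed documents in a list that stands in for the vector database."""

    def __init__(self, scraper, parser):
        super().__init__(scraper, parser)
        self.store: List[dict] = []

    def index(self, sitemap_url: str) -> dict:
        pages = self.scraper.fetch(sitemap_url)                 # 1. scrape
        documents = [self.parser.parse(p) for p in pages]       # 2. parse
        self.store.extend(                                      # 3. store
            {"source": sitemap_url, "text": doc} for doc in documents
        )
        return {"content": f"indexed {len(documents)} documents"}
```

Calling `InMemoryIndexer(FakeScraper(), FakeParser()).index("https://example.org/sitemap.xml")` runs the three steps in order and returns a small status dictionary; a real subclass would replace the fakes with the project's own scraper, parser, and database.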

abstract async from_pages_to_content(pages: List[ByteStream]) → List[Any]

Abstract method to convert HTML pages to content.

Parameters:

pages (List[ByteStream]) – The HTML pages to convert to content.

Returns:

The content extracted from the pages.

Return type:

List[Any]
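One possible shape for an implementation of from_pages_to_content is sketched below. It assumes each page exposes its raw bytes through a `data` attribute; the `ByteStream` dataclass here is a hypothetical stand-in for the type named in the signature, not the library's own class. The text extraction uses only the standard library's `html.parser`, which is a choice made for the example rather than the project's parser.

```python
import asyncio
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Any, List


@dataclass
class ByteStream:
    """Hypothetical stand-in for the ByteStream type; assumes a `data` field."""

    data: bytes


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


async def from_pages_to_content(pages: List[ByteStream]) -> List[Any]:
    """Return one plain-text string per HTML page, in input order."""
    content: List[Any] = []
    for page in pages:
        extractor = _TextExtractor()
        # Undecodable bytes are replaced rather than raising an error.
        extractor.feed(page.data.decode("utf-8", errors="replace"))
        extractor.close()
        content.append(" ".join(extractor.chunks))
    return content


if __name__ == "__main__":
    demo = [ByteStream(b"<html><body><h1>Title</h1><p>Hello world</p></body></html>")]
    print(asyncio.run(from_pages_to_content(demo)))
```

In a subclass, the same logic would live in a method that takes `self` and would typically delegate to the parser passed to the constructor.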

async add_content_to_db(db: Session, content: List[Any], source: str, embed: bool)

Add content to the database.

Parameters:
  • db (Session) – The database session to use.

  • content (List[Any]) – Content to add to the database.

  • source (str) – The source of the content.

  • embed (bool) – Whether to embed the content.

Returns:

A dictionary whose content entry holds a success message.

Return type:

dict
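To make the role of the four parameters concrete, here is a minimal sketch of what a method with this signature could do. It is not the library's implementation: `FakeSession` is a hypothetical stand-in for the real database session (only `add` and `commit` are modelled), `toy_embedding` is a placeholder rather than a real embedding model, and the row layout and the wording of the success message are assumptions.

```python
import asyncio
from typing import Any, List


class FakeSession:
    """Hypothetical stand-in for a database session (add + commit only)."""

    def __init__(self) -> None:
        self.pending: List[dict] = []
        self.rows: List[dict] = []

    def add(self, row: dict) -> None:
        self.pending.append(row)

    def commit(self) -> None:
        # Move staged rows into the "table" and clear the staging area.
        self.rows.extend(self.pending)
        self.pending.clear()


def toy_embedding(text: str) -> List[float]:
    """Placeholder vector (length, space count); NOT a real embedding model."""
    return [float(len(text)), float(text.count(" "))]


async def add_content_to_db(
    db: FakeSession, content: List[Any], source: str, embed: bool
) -> dict:
    """Stage one row per content item, commit once, and report the result."""
    for item in content:
        text = str(item)
        row = {"source": source, "text": text, "embedding": None}
        if embed:
            row["embedding"] = toy_embedding(text)
        db.add(row)
    db.commit()
    return {"content": f"{len(content)} items added from {source}"}


if __name__ == "__main__":
    session = FakeSession()
    print(asyncio.run(add_content_to_db(session, ["hello world"], "demo", True)))
```

Committing once after the loop keeps the write atomic from the caller's point of view; with `embed=False` the rows are stored with an empty embedding field so they can be embedded later.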