Indexing

Build a pipeline that indexes data from a source into the database using a scraper and a parser.

class indexing.base.BaseIndexer(scraper, parser)

Abstract base class for indexing models.
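The following minimal sketch mirrors the two documented abstract methods with Python's `abc` module to show how the interface is meant to be filled in. `ByteStream` is a simplified stand-in, the concrete subclass `TextIndexer` is hypothetical, and the scraper is assumed to be a callable returning a list of pages. None of this is the library's actual implementation.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ByteStream:
    """Hypothetical stand-in for the ByteStream type in the signatures."""
    data: bytes
    meta: dict = field(default_factory=dict)


class BaseIndexer(ABC):
    """Sketch of an abstract indexer that holds a scraper and a parser."""

    def __init__(self, scraper: Any, parser: Any) -> None:
        self.scraper = scraper
        self.parser = parser

    @abstractmethod
    def index(self, sitemap_url: str) -> dict:
        """Index the content reachable from a sitemap URL."""

    @abstractmethod
    async def from_pages_to_content(self, pages: List[ByteStream]) -> List[Any]:
        """Convert fetched pages into content items."""


class TextIndexer(BaseIndexer):
    """Hypothetical concrete subclass that decodes pages as UTF-8 text."""

    def index(self, sitemap_url: str) -> dict:
        # Assumption: the scraper is a callable returning a list of ByteStreams.
        pages = self.scraper(sitemap_url)
        # asyncio.run fails if called from inside an already running event loop.
        content = asyncio.run(self.from_pages_to_content(pages))
        return {"source": sitemap_url, "count": len(content)}

    async def from_pages_to_content(self, pages: List[ByteStream]) -> List[Any]:
        return [page.data.decode("utf-8") for page in pages]
```

Instantiating `BaseIndexer` directly raises `TypeError` because its abstract methods are unimplemented; only subclasses that override both can be created.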

abstract index(sitemap_url: str) → dict

Abstract method to index the content reachable from a sitemap URL into a vector database.
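`index` starts from a sitemap URL. As a hedged illustration of one step such a method might perform, the hypothetical helper `urls_from_sitemap` below extracts the page addresses from a sitemap in the standard sitemaps.org format using only the standard library. Fetching the sitemap over HTTP and handling sitemap index files are left out.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return the <loc> URLs listed in a <urlset> sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Each <url> entry stores its address in a namespaced <loc> child.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```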

abstract async from_pages_to_content(pages: List[ByteStream]) → List[Any]

Abstract method to convert fetched HTML pages into content.

Parameters:

pages (List[ByteStream]) – The HTML pages to convert.

Returns:

The content extracted from the pages.

Return type:

List[Any]
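A concrete subclass has to turn raw HTML into content. The sketch below assumes that "content" means the visible text of each page and uses the standard library's `html.parser`; `ByteStream` is a simplified stand-in holding only the page bytes, and the parser actually passed to `BaseIndexer` may work quite differently.

```python
import asyncio
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


@dataclass
class ByteStream:
    """Hypothetical stand-in: raw bytes of a fetched page."""
    data: bytes


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # depth of currently open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


async def from_pages_to_content(pages: List[ByteStream]) -> List[str]:
    """Sketch: decode each HTML page and return its visible text."""
    contents = []
    for page in pages:
        extractor = _TextExtractor()
        extractor.feed(page.data.decode("utf-8", errors="replace"))
        extractor.close()
        contents.append(" ".join(extractor.parts))
    return contents
```

Text fragments are stripped and joined with single spaces, so inline markup such as `Hel<b>lo</b>` comes out as `Hel lo`; a production parser would likely need smarter whitespace handling.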

async get_content_from_pdf(content: List[Any]) → List[Any]

Extract content from PDFs.
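How `get_content_from_pdf` extracts text is not documented here, and real PDF text extraction requires a dedicated library. As one plausible preliminary step, the hypothetical helper `split_pdfs` below separates PDF payloads from other items by checking for the `%PDF-` header that PDF files begin with; `ByteStream` is again a simplified stand-in.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

# PDF files start with this signature, e.g. b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


@dataclass
class ByteStream:
    """Hypothetical stand-in for a fetched document."""
    data: bytes


def split_pdfs(content: List[Any]) -> Tuple[List[ByteStream], List[Any]]:
    """Partition items into PDF byte streams and everything else (hypothetical)."""
    pdfs, others = [], []
    for item in content:
        if isinstance(item, ByteStream) and item.data.startswith(PDF_MAGIC):
            pdfs.append(item)
        else:
            others.append(item)
    return pdfs, others
```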

async add_content_to_db(db: Session, content: List[Any], source: str, user_uuid: str, language: str, embed: bool)

Add content to the database.

Parameters:

db (Session)
content (List[Any])
source (str)
user_uuid (str)
language (str)
embed (bool)

Returns:

content: Success message

Return type:

dict
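The signature suggests that each item is stored together with its `source`, `user_uuid` and `language`. The sketch below assumes exactly that, using an in-memory `sqlite3` connection in place of the `Session` object. The `documents` table, its columns and the `{"content": ...}` shape of the returned dict are assumptions, and the embedding step controlled by `embed` is not sketched.

```python
import asyncio
import sqlite3
from typing import Any, List


async def add_content_to_db(
    db: sqlite3.Connection,
    content: List[Any],
    source: str,
    user_uuid: str,
    language: str,
    embed: bool,
) -> dict:
    """Sketch: store each content item with its metadata and report success."""
    # Hypothetical schema; the real model behind `Session` is not documented.
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "text TEXT, source TEXT, user_uuid TEXT, language TEXT)"
    )
    rows = [(str(item), source, user_uuid, language) for item in content]
    db.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)
    db.commit()
    # `embed` is accepted for signature parity; embedding generation is omitted.
    return {"content": f"{len(rows)} items added"}
```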