IndexingPipeline¶

async indexing_api.chunk_rag_data()¶

Dummy endpoint for data chunking (RAG).

async indexing_api.crawl_data()¶

Dummy endpoint for data crawling.

async indexing_api.index_data(url: str, question: str, answer: str, language: str)¶

Upsert a single entry into the FAQ dataset.

Parameters:
  • url (str) – URL where the entry article can be found

  • question (str) – The FAQ question

  • answer (str) – The answer to the question

  • language (str) – The article language

Returns:

The article id, url, question, answer and language upon successful completion of the process

Return type:

dict
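
A minimal usage sketch for this endpoint. The base URL, route path, HTTP method, and query-parameter transport below are assumptions about how indexing_api is deployed, not part of the documented API; the sample entry values are likewise illustrative:

    import asyncio

    import httpx

    # Assumed base URL and route path; adjust to the actual deployment.
    BASE_URL = "http://localhost:8000"

    async def upsert_faq_entry() -> None:
        async with httpx.AsyncClient(base_url=BASE_URL) as client:
            # Pass the four documented arguments as query parameters.
            response = await client.post(
                "/index_data",
                params={
                    "url": "https://faq.bsv.admin.ch/example-article",
                    "question": "What is the reference retirement age?",
                    "answer": "The reference age is 65.",
                    "language": "en",
                },
            )
            response.raise_for_status()
            # Per the docstring, the response carries the article id, url,
            # question, answer, and language.
            print(response.json())

    asyncio.run(upsert_faq_entry())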

async indexing_api.index_faq_data(sitemap_url: str = 'https://faq.bsv.admin.ch/sitemap.xml', proxy: str = None, k: int = 0)¶

Add and index data for Autocomplete to the FAQ database. The data is obtained by scraping the website whose sitemap is sitemap_url.

Parameters:
  • sitemap_url (str, default ‘https://faq.bsv.admin.ch/sitemap.xml’) – the sitemap.xml URL of the website to scrape

  • proxy (str, optional) – Proxy URL if necessary

  • k (int, default 0) – Number of articles to scrape and log when testing the method.

Returns:

Confirmation message upon successful completion of the process

Return type:

str
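
A sketch of triggering a scraping run over HTTP; the route path, method, and base URL are again assumptions about the deployment. Setting k to a small value limits the run for testing, as documented above:

    import asyncio

    import httpx

    async def reindex_faq() -> None:
        # Scraping a full sitemap can take a while, so disable the
        # client-side timeout for this call.
        async with httpx.AsyncClient(
            base_url="http://localhost:8000", timeout=None
        ) as client:
            # Assumed route path; k=5 scrapes and logs only five articles.
            response = await client.post("/index_faq_data", params={"k": 5})
            response.raise_for_status()
            print(response.text)  # confirmation message

    asyncio.run(reindex_faq())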

async indexing_api.index_faq_vectordb()¶

Add and index test data for Autocomplete to the FAQ database.

Returns:

Confirmation message upon successful completion of the process

Return type:

str

async indexing_api.index_rag_vectordb()¶

Add and index test data for RAG to the embedding database.

Returns:

Confirmation message upon successful completion of the process

Return type:

str
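
Both test-data endpoints (index_faq_vectordb and index_rag_vectordb) take no arguments, so seeding the Autocomplete and RAG stores amounts to a pair of parameterless calls. A sketch, assuming route paths that mirror the function names:

    import asyncio

    import httpx

    async def seed_test_data() -> None:
        async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
            # Assumed route paths; each call returns a confirmation message.
            for path in ("/index_faq_vectordb", "/index_rag_vectordb"):
                response = await client.post(path)
                response.raise_for_status()
                print(f"{path}: {response.text}")

    asyncio.run(seed_test_data())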

async indexing_api.parse_faq_data()¶

Dummy endpoint for FAQ data parsing.

async indexing_api.parse_rag_data()¶

Dummy endpoint for data parsing (RAG).

async indexing_api.scrap_data()¶

Dummy endpoint for data scraping.