IndexingPipeline¶
- async indexing_api.chunk_rag_data()¶
Dummy endpoint for data chunking (RAG).
- async indexing_api.crawl_data()¶
Dummy endpoint for data crawling.
- async indexing_api.index_data(url: str, question: str, answer: str, language: str)¶
Upsert a single entry into the FAQ dataset.
- Parameters:
url (str) – URL where the entry article can be found
question (str) – The FAQ question
answer (str) – The question answer
language (str) – The article language
- Returns:
The article id, url, question, answer, and language upon successful completion of the process
- Return type:
dict
- async indexing_api.index_faq_data(sitemap_url: str = 'https://faq.bsv.admin.ch/sitemap.xml', proxy: str = None, k: int = 0)¶
Add and index data for Autocomplete to the FAQ database. The data is obtained by scraping the website referenced by sitemap_url.
- Parameters:
sitemap_url (str, default ‘https://faq.bsv.admin.ch/sitemap.xml’) – the sitemap.xml URL of the website to scrape
proxy (str, optional) – Proxy URL, if one is required
k (int, default 0) – Number of articles to scrape and log, for testing the method.
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
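The first step of this endpoint, discovering article URLs from the sitemap, can be sketched with the standard library. The function name and the `k` truncation are illustrative assumptions mirroring the `k` parameter above; the real scraper and its HTTP/proxy handling are not shown.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, required to match <loc> elements.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_sitemap_urls(sitemap_xml: str, k: int = 0) -> list[str]:
    """Return the page URLs listed in a sitemap.xml document.

    If k > 0, only the first k URLs are returned, mirroring the
    test/logging behaviour of the k parameter described above.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)]
    return urls[:k] if k > 0 else urls

# Minimal sitemap for illustration (not the live bsv.admin.ch sitemap).
example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://faq.bsv.admin.ch/de/article/1</loc></url>
  <url><loc>https://faq.bsv.admin.ch/fr/article/1</loc></url>
</urlset>"""
```

Each extracted URL would then be fetched (through `proxy` when set) and parsed into question/answer pairs before indexing.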
- async indexing_api.index_faq_vectordb()¶
Add and index test data for Autocomplete to the FAQ database.
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
- async indexing_api.index_rag_vectordb()¶
Add and index test data for RAG to the embedding database.
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
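Both test-indexing endpoints above share the same shape: embed some documents and upsert the vectors into a database that can later be queried by similarity. A minimal in-memory sketch, using toy precomputed vectors and cosine similarity rather than the actual embedding model or vector database:

```python
import math

class ToyVectorStore:
    """Hypothetical in-memory vector store illustrating add-then-query."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        # In the real pipeline the vector would come from an embedding model.
        self._items.append((vector, text))

    def query(self, vector: list[float], top_k: int = 1) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        # Rank stored texts by similarity to the query vector.
        ranked = sorted(self._items, key=lambda it: cosine(it[0], vector), reverse=True)
        return [text for _, text in ranked[:top_k]]
```

After indexing, a RAG or Autocomplete query embeds the user input and retrieves the nearest stored entries in the same way.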
- async indexing_api.parse_faq_data()¶
Dummy endpoint for FAQ data parsing.
- async indexing_api.parse_rag_data()¶
Dummy endpoint for data parsing (RAG).
- async indexing_api.scrap_data()¶
Dummy endpoint for data scraping.