Indexing API
- async indexing_api.init_indexing()
Initialize the database according to the configuration indexing_config specified in config.yaml.
- async indexing_api.upload_csv_rag(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)
Upload a CSV file containing RAG data to the database with optional embeddings. The function recognizes the following columns:
url: source URL of the document
text: Text content of the document
language (optional): Language of the document
embedding (optional): Embedding of the document
tags (optional): Tags of the document
organizations (optional): Organizations with access to the document
subtopics (optional): Subtopics of the document
summary (optional): Summary of the document
hyq (optional): Hypothetical queries associated with the document
hyq_declarative (optional): Declarative hypothetical queries associated with the document
doctype (optional): Type of the document
user_uuid (optional): UUID of the user who uploaded the file
- Parameters:
file (UploadFile) – The CSV file sent by the user
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
ResponseBody
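A minimal sketch of preparing a CSV file with some of the columns listed above, using only the Python standard library. The helper name `build_rag_csv` and the sample rows are illustrative assumptions, not part of indexing_api. How list-like fields such as tags or embedding are encoded inside a cell is not specified here, so the sketch sticks to plain-string columns.

```python
import csv
import io

# Column names taken from the list above; "url" and "text" are the only
# columns not marked optional.
REQUIRED_COLUMNS = ["url", "text"]
OPTIONAL_COLUMNS = ["language", "doctype", "summary"]  # subset, for illustration


def build_rag_csv(rows):
    """Serialize a list of dicts into CSV text with a header row.

    Hypothetical helper: raises ValueError if a required column is missing
    or empty in any row.
    """
    for i, row in enumerate(rows):
        for col in REQUIRED_COLUMNS:
            if not row.get(col):
                raise ValueError(f"row {i}: missing required column {col!r}")

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + OPTIONAL_COLUMNS,
        extrasaction="raise",  # reject columns outside the declared set
    )
    writer.writeheader()
    writer.writerows(rows)  # missing optional columns become empty cells
    return buffer.getvalue()


csv_text = build_rag_csv([
    {"url": "https://example.org/doc-1", "text": "First document.", "language": "en"},
    {"url": "https://example.org/doc-2", "text": "Zweites Dokument.", "language": "de"},
])
```

The resulting text can be written to a file and sent as the `file` upload. The HTTP route under which upload_csv_rag is exposed is not stated in this reference, so it is left out of the sketch.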
- async indexing_api.upload_csv_faq(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)
Upload a CSV file containing FAQ data to the database with optional embeddings. The function recognizes the following columns:
url: source URL of the information
text: Text content of the question
answer: Text content of the answer
language (optional): Language of the question and answer
text_embedding (optional): Embedding of the question text
tags (optional): Tags of the document
- Parameters:
file (UploadFile) – The CSV file sent by the user
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
ResponseBody
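Before uploading, the header row can be checked against the FAQ columns listed above. The following is a sketch of such a pre-upload check. The function name `check_faq_header` is hypothetical, and the endpoint's own validation may behave differently (for example, it may tolerate extra columns).

```python
import csv
import io

# Column names from the list above.
FAQ_REQUIRED = {"url", "text", "answer"}
FAQ_OPTIONAL = {"language", "text_embedding", "tags"}


def check_faq_header(csv_text):
    """Return (missing, unknown) column-name sets for a FAQ CSV header.

    Hypothetical pre-upload check, not part of indexing_api.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = {name.strip() for name in next(reader, [])}
    missing = FAQ_REQUIRED - header
    unknown = header - FAQ_REQUIRED - FAQ_OPTIONAL
    return missing, unknown


good = "url,text,answer,language\nhttps://example.org/faq/1,What is X?,X is Y.,en\n"
bad = "url,question,answer\nhttps://example.org/faq/1,What is X?,X is Y.\n"

good_result = check_faq_header(good)  # nothing missing, nothing unknown
bad_result = check_faq_header(bad)    # "text" is missing, "question" is unknown
```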
- async indexing_api.upload_csv_tags(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)
Upload a CSV file containing tags data to the database with optional embeddings. The function recognizes the following columns:
tags_en: Tag name in English
description_en: English description of the tag
description: Description of the tag
language: Language of the tag
embedding (optional): Embedding of the description
- Parameters:
file (UploadFile) – The CSV file sent by the user
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
ResponseBody
- async indexing_api.parse_pdf(file: fastapi.UploadFile = fastapi.File)
Parse a PDF file and return the text chunks as documents.
- Parameters:
file (UploadFile) – The PDF file sent by the user
- Returns:
A response body containing the text chunks of the PDF file.
- Return type:
Response
- async indexing_api.upload_pdf_rag(files: List[fastapi.UploadFile] = fastapi.File, embed: bool = True, user_uuid: str = None, conversation_uuid: str = None, language: str = 'de', db: Session = fastapi.Depends) → fastapi.Response
Upload one or more PDF files containing RAG data to the database.
- Parameters:
files (List[UploadFile]) – The PDF files sent by the user
embed (bool) – Whether to embed the data or not. Defaults to True.
user_uuid (str) – UUID of the user who uploaded the files
language (str) – Language of the documents
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
Response
- indexing_api.add_rag_data_from_csv(file_path: str = 'indexing/data/rag_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)
Add and index test data for RAG from CSV files with optional embeddings. The function recognizes the following columns:
url: source URL of the document
text: Text content of the document
language (optional): Language of the document
embedding (optional): Embedding of the document
tags (optional): Tags of the document
- Parameters:
file_path (str, optional) – Path to the CSV file containing the data. Defaults to "indexing/data/rag_test_data.csv".
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
- indexing_api.add_faq_data_from_csv(file_path: str = 'indexing/data/faq_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)
Add and index test data for FAQ from CSV files with optional embeddings. The function recognizes the following columns:
url: source URL of the information
text: Text content of the question
answer: Text content of the answer
language (optional): Language of the question and answer
embedding (optional): Embedding of the question
tags (optional): Tags of the document
- Parameters:
file_path (str, optional) – Path to the CSV file containing the data. Defaults to "indexing/data/faq_test_data.csv".
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
- indexing_api.embed_rag_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)
Embed all RAG data (documents) that have not been embedded yet.
- Parameters:
db (Session) – Database session
embed_empty_only (bool, optional) – Embed only data that has not been embedded yet. Defaults to True.
k (int, optional) – Number of documents to embed. Defaults to 0, which means all documents.
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
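The interaction between embed_empty_only and k can be sketched as a simple selection step. This is an interpretation of the parameter descriptions above, not the actual implementation: the `Doc` class and `select_for_embedding` function are hypothetical stand-ins, and the real function presumably queries the database rather than filtering a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Stand-in for a stored document row (hypothetical, not the real model)."""
    id: int
    text: str
    embedding: Optional[List[float]] = None


def select_for_embedding(docs, embed_empty_only=True, k=0):
    """Pick the rows to embed, following the documented parameter meanings."""
    selected = list(docs)
    if embed_empty_only:
        # Skip rows that already carry an embedding.
        selected = [d for d in selected if d.embedding is None]
    if k > 0:
        # k == 0 means "no limit"; a positive k caps the number of rows.
        selected = selected[:k]
    return selected


docs = [
    Doc(1, "a", embedding=[0.1, 0.2]),
    Doc(2, "b"),
    Doc(3, "c"),
    Doc(4, "d"),
]

pending = select_for_embedding(docs)                             # ids 2, 3, 4
first_two = select_for_embedding(docs, k=2)                      # ids 2, 3
everything = select_for_embedding(docs, embed_empty_only=False)  # ids 1 to 4
```

The same reading would apply to embed_faq_data, which takes the same two parameters.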
- indexing_api.embed_faq_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)
Embed all FAQ questions that have not been embedded yet.
- Parameters:
db (Session) – Database session
embed_empty_only (bool, optional) – Embed only data that has not been embedded yet. Defaults to True.
k (int, optional) – Number of questions to embed. Defaults to 0, which means all questions.
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
- async indexing_api.index_pdfs_from_sitemap(sitemap_url: str = 'https://www.ahv-iv.ch/de/Sitemap-DE', embed: bool = False, db: Session = fastapi.Depends)
Indexes PDFs from a given sitemap URL. The PDFs are scraped and their data is added to the embedding database. This function is specifically designed for the site "https://www.ahv-iv.ch".
- Parameters:
sitemap_url (str, optional) – The URL of the sitemap to scrape PDFs from. Defaults to "https://www.ahv-iv.ch/de/Sitemap-DE".
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
ResponseBody
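As a rough illustration of the link-discovery step, the sketch below collects PDF links from an HTML page with the standard library. It assumes the sitemap is an HTML page whose PDF links have a path ending in .pdf; the actual scraper may identify PDFs differently. The class name `PdfLinkCollector` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the sitemap URL.
        absolute = urljoin(self.base_url, href)
        # Compare on the path only, so query strings do not hide the suffix.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


# Made-up HTML standing in for a downloaded sitemap page.
sample_html = """
<ul>
  <li><a href="/p/1.01.d">Page</a></li>
  <li><a href="/files/memento-1.pdf">Memento 1</a></li>
  <li><a href="https://example.org/docs/Guide.PDF?download=1">Guide</a></li>
</ul>
"""

collector = PdfLinkCollector("https://www.ahv-iv.ch/de/Sitemap-DE")
collector.feed(sample_html)
pdf_urls = collector.pdf_urls
```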
- async indexing_api.index_html_from_sitemap(sitemap_url: str = 'https://eak.admin.ch/eak/de/home.sitemap.xml', embed: bool = False, db: Session = fastapi.Depends)
Indexes HTML from a given sitemap URL. The HTML pages are scraped and their data is added to the embedding database. This function is specifically designed for the site "https://eak.admin.ch".
- Parameters:
sitemap_url (str, optional) – The URL of the sitemap to scrape HTML from. Defaults to "https://eak.admin.ch/eak/de/home.sitemap.xml".
embed (bool, optional) – Whether to embed the data or not. Defaults to False.
db (Session) – Database session
- Returns:
A response body containing a confirmation message upon successful completion of the process.
- Return type:
ResponseBody
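For an XML sitemap, listing the pages to scrape amounts to reading the `<loc>` entries. The sketch below assumes the sitemap follows the sitemaps.org format with its standard namespace; the function name `extract_page_urls` and the sample document are illustrative, and the real indexer may fetch and filter URLs differently.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_page_urls(sitemap_xml):
    """Return the <loc> values of all <url> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # ignore empty <loc/> elements
    ]


# Made-up sitemap standing in for a downloaded sitemap.xml.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/de/home.html</loc></url>
  <url><loc> https://example.org/de/home/dokumentation.html </loc></url>
  <url><loc/></url>
</urlset>
"""

page_urls = extract_page_urls(sample_sitemap)
```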
- async indexing_api.index_faq_data(sitemap_url: str = 'https://faq.bsv.admin.ch/sitemap.xml', embed_question: bool = False, embed_answer: bool = False, k: int = 0, db: Session = fastapi.Depends)
Add and index data for Autocomplete to the FAQ database. The data is obtained by scraping the website sitemap_url.
- Parameters:
sitemap_url (str, default "https://faq.bsv.admin.ch/sitemap.xml") – The sitemap.xml URL of the website to scrape
k (int, default 0) – Number of articles to scrape and log, for testing the method
embed_question (bool, default False) – Whether to embed the question text
embed_answer (bool, default False) – Whether to embed the answer text
db (Session, optional) – Database session to use for upserting the extracted data
- Returns:
Confirmation message upon successful completion of the process
- Return type:
str
- async indexing_api.index_data(item: FaqQuestionItem, db: Session = fastapi.Depends)
Upsert a single entry to the FAQ dataset.
- Parameters:
item (FaqQuestionItem) – The question item to insert or update:
- id (int, optional): ID of the item, if an update is wanted
- url (str): URL where the entry article can be found
- question (str): The FAQ question
- answer (str): The answer to the question
- language (str): The article language
- source (str): Username of the user who inserted the data
db (Session) – Database session
- Return type:
dict
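The fields of the item can be sketched with a plain dataclass. `FaqQuestionItemSketch` is a stand-in that only mirrors the fields documented above; the real FaqQuestionItem class and its validation rules are not shown in this reference. The sketch also assumes that leaving id unset requests an insert and that setting it requests an update, which is how the field description reads.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class FaqQuestionItemSketch:
    """Stand-in mirroring the documented fields; not the real model class."""
    url: str
    question: str
    answer: str
    language: str
    source: str
    id: Optional[int] = None  # set only when updating an existing entry


def to_payload(item):
    """Serialize an item to JSON, omitting `id` when it is unset."""
    data = asdict(item)
    if data["id"] is None:
        del data["id"]
    return json.dumps(data, ensure_ascii=False, sort_keys=True)


# Illustrative values only.
new_item = FaqQuestionItemSketch(
    url="https://example.org/faq/42",
    question="What is X?",
    answer="X is Y.",
    language="en",
    source="editor-1",
)

insert_payload = to_payload(new_item)
update_payload = to_payload(replace(new_item, id=7, answer="X is Z."))
```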