Indexing API

async indexing_api.init_indexing()

Initialize the database according to the configuration indexing_config specified in config.yaml.

async indexing_api.upload_csv_rag(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a CSV file containing RAG data to the database with optional embeddings. The function recognizes the following columns:

  • url: Source URL of the document

  • text: Text content of the document

  • language (optional): Language of the document

  • embedding (optional): Embedding of the document

  • tags (optional): Tags of the document

  • organizations (optional): Organizations with access to the document

  • subtopics (optional): Subtopics of the document

  • summary (optional): Summary of the document

  • hyq (optional): Hypothetical queries associated with the document

  • hyq_declarative (optional): Declarative hypothetical queries associated with the document

  • doctype (optional): Type of the document

  • user_uuid (optional): UUID of the user who uploaded the file

Parameters:

  • file (UploadFile) – The CSV file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:
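
As an illustration of the column layout listed for upload_csv_rag, the following minimal sketch builds such a CSV in memory with the standard library. The helper name build_rag_csv and the sample rows are made up for this example, only a subset of the optional columns is written, and the way multi-valued cells such as tags are encoded is not specified here, so a single tag is used.

```python
import csv
import io

# Column names come from the list above; the row values are invented.
REQUIRED_COLUMNS = ["url", "text"]
OPTIONAL_COLUMNS = ["language", "tags"]


def build_rag_csv(rows):
    """Serialize document rows to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    fieldnames = REQUIRED_COLUMNS + OPTIONAL_COLUMNS
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Optional columns that a row does not provide become empty cells.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()


csv_text = build_rag_csv([
    {"url": "https://example.com/a", "text": "First document", "language": "en"},
    {"url": "https://example.com/b", "text": "Second document", "tags": "pension"},
])
print(csv_text)
```

The resulting text could then be saved to a file and sent as the file argument; the HTTP route for the upload is not given in this reference, so it is left out of the sketch.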


async indexing_api.upload_csv_faq(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a CSV file containing FAQ data to the database with optional embeddings. The function recognizes the following columns:

  • url: Source URL of the information

  • text: Text content of the question

  • answer: Text content of the answer

  • language (optional): Language of the question and answer

  • text_embedding (optional): Embedding of the question text

  • tags (optional): Tags of the document

Parameters:

  • file (UploadFile) – The CSV file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:
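
Before uploading an FAQ file it can help to check its header locally. The sketch below assumes that every column not marked "(optional)" in the list above (url, text, answer) is mandatory; that assumption, the helper name missing_faq_columns, and the sample data are not taken from the API itself.

```python
import csv
import io

# Assumption: columns without "(optional)" in the list above are required.
REQUIRED_FAQ_COLUMNS = {"url", "text", "answer"}


def missing_faq_columns(csv_text):
    """Return the sorted required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_FAQ_COLUMNS - header)


good = (
    "url,text,answer,language\n"
    "https://example.com/faq,How do I register?,Use the online form.,en\n"
)
bad = "url,text\nhttps://example.com/faq,How do I register?\n"

print(missing_faq_columns(good))  # []
print(missing_faq_columns(bad))   # ['answer']
```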


async indexing_api.upload_csv_tags(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a CSV file containing tags data to the database with optional embeddings. The function recognizes the following columns:

  • tags_en: Tag name in English

  • description_en: English description of the tag

  • description: Description of the tag

  • language: Language of the tag

  • embedding (optional): Embedding of the description

Parameters:

  • file (UploadFile) – The CSV file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:


async indexing_api.parse_pdf(file: fastapi.UploadFile = fastapi.File)

Parse a PDF file and return the text chunks as documents.


Parameters:

  • file (UploadFile) – The PDF file sent by the user

Returns:

A response body containing the text chunks of the PDF file.

Return type:
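
How parse_pdf splits the extracted text is not described in this reference. Purely to illustrate what "text chunks" can look like, the sketch below assumes fixed-size character windows with a small overlap; the function chunk_text and its parameters are hypothetical and may differ from the actual chunking strategy.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks.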


async indexing_api.upload_pdf_rag(files: List[fastapi.UploadFile] = fastapi.File, embed: bool = True, user_uuid: str = None, conversation_uuid: str = None, language: str = 'de', db: Session = fastapi.Depends) → fastapi.Response

Upload one or more PDF files containing RAG data to the database.

Parameters:

  • files (List[UploadFile]) – The PDF files sent by the user

  • embed (bool) – Whether to embed the data or not. Defaults to True.

  • user_uuid (str) – UUID of the user who uploaded the files

  • language (str) – Language of the documents

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type: fastapi.Response


indexing_api.add_rag_data_from_csv(file_path: str = 'indexing/data/rag_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)

Add and index test data for RAG from a CSV file with optional embeddings. The function recognizes the following columns:

  • url: Source URL of the document

  • text: Text content of the document

  • language (optional): Language of the document

  • embedding (optional): Embedding of the document

  • tags (optional): Tags of the document

Parameters:

  • file_path (str, optional) – Path to the CSV file containing the data. Defaults to “indexing/data/rag_test_data.csv”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

Confirmation message upon successful completion of the process.

Return type:


indexing_api.add_faq_data_from_csv(file_path: str = 'indexing/data/faq_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)

Add and index test data for FAQ from a CSV file with optional embeddings. The function recognizes the following columns:

  • url: Source URL of the information

  • text: Text content of the question

  • answer: Text content of the answer

  • language (optional): Language of the question and answer

  • embedding (optional): Embedding of the question

  • tags (optional): Tags of the document

Parameters:

  • file_path (str, optional) – Path to the CSV file containing the data. Defaults to “indexing/data/faq_test_data.csv”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

Confirmation message upon successful completion of the process.

Return type:


indexing_api.embed_rag_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)

Embed all RAG data (documents) that have not been embedded yet.

Parameters:

  • db (Session) – Database session

  • embed_empty_only (bool, optional) – Embed only data that have not been embedded yet. Defaults to True.

  • k (int, optional) – Number of documents to embed. Defaults to 0, which means all documents.

Returns:

Confirmation message upon successful completion of the process.

Return type:
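
The interplay of embed_empty_only and k can be pictured with a small in-memory model. This is a sketch under assumptions: rows are plain dicts, a missing embedding is represented as None, and the limit k is applied after filtering; the real function works against the database and may order or limit rows differently.

```python
def select_rows_to_embed(rows, embed_empty_only=True, k=0):
    """Pick the rows that an embedding pass would process (illustrative only)."""
    if embed_empty_only:
        # Keep only rows without an embedding.
        candidates = [row for row in rows if row.get("embedding") is None]
    else:
        candidates = list(rows)
    # k == 0 means "no limit", mirroring the documented default.
    return candidates if k == 0 else candidates[:k]


rows = [
    {"id": 1, "text": "a", "embedding": [0.1, 0.2]},
    {"id": 2, "text": "b", "embedding": None},
    {"id": 3, "text": "c", "embedding": None},
]
print([r["id"] for r in select_rows_to_embed(rows)])                          # [2, 3]
print([r["id"] for r in select_rows_to_embed(rows, k=1)])                     # [2]
print([r["id"] for r in select_rows_to_embed(rows, embed_empty_only=False)])  # [1, 2, 3]
```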


indexing_api.embed_faq_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)

Embed all FAQ questions that have not been embedded yet.

Parameters:

  • db (Session) – Database session

  • embed_empty_only (bool, optional) – Embed only data that have not been embedded yet. Defaults to True.

  • k (int, optional) – Number of questions to embed. Defaults to 0, which means all questions.

Returns:

Confirmation message upon successful completion of the process.

Return type:


async indexing_api.index_pdfs_from_sitemap(sitemap_url: str = '', embed: bool = False, db: Session = fastapi.Depends)

Indexes PDFs from a given sitemap URL. The PDFs are scraped and their data is added to the embedding database. This function is specifically designed for the site “”.

Parameters:

  • sitemap_url (str, optional) – The URL of the sitemap to scrape PDFs from. Defaults to “”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:
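
The first step of sitemap-based indexing, collecting the PDF links, might look like the following sketch. It parses a sitemap that follows the sitemaps.org schema with the standard library and keeps the entries whose URL ends in .pdf. The helper pdf_urls_from_sitemap, the sample XML, and the suffix-based filter are assumptions for illustration; the actual scraper may detect PDFs differently.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def pdf_urls_from_sitemap(sitemap_xml):
    """Return the <loc> URLs of a sitemap document that end in .pdf."""
    root = ET.fromstring(sitemap_xml)
    locations = root.findall("sm:url/sm:loc", SITEMAP_NS)
    urls = [loc.text.strip() for loc in locations if loc.text]
    # Compare case-insensitively so that ".PDF" is matched as well.
    return [url for url in urls if url.lower().endswith(".pdf")]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/guide.pdf</loc></url>
  <url><loc>https://example.com/index.html</loc></url>
  <url><loc>https://example.com/forms/FORM.PDF</loc></url>
</urlset>"""

print(pdf_urls_from_sitemap(sample))
```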


async indexing_api.index_html_from_sitemap(sitemap_url: str = '', embed: bool = False, db: Session = fastapi.Depends)

Indexes HTML from a given sitemap URL. The HTML pages are scraped and their data is added to the embedding database. This function is specifically designed for the site “”.

Parameters:

  • sitemap_url (str, optional) – The URL of the sitemap to scrape HTML from. Defaults to “”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:


async indexing_api.index_faq_data(sitemap_url: str = '', embed_question: bool = False, embed_answer: bool = False, k: int = 0, db: Session = fastapi.Depends)

Add and index data for Autocomplete to the FAQ database. The data is obtained by scraping the website at sitemap_url.

Parameters:

  • sitemap_url (str, default ‘’) – The sitemap.xml URL of the website to scrape

  • k (int, default 0) – Number of articles to scrape and log to test the method

  • embed_question (bool, default False) – Whether the system embeds the question text

  • embed_answer (bool, default False) – Whether the system embeds the answer text

  • db (Session, optional) – Database session to use for upserting the extracted

Returns:

Confirmation message upon successful completion of the process.

Return type:


async indexing_api.index_data(item: FaqQuestionItem, db: Session = fastapi.Depends)

Upsert a single entry to the FAQ dataset.

Parameters:

  • item (FaqQuestionItem) – The question item to insert or update:

    id (int, optional): ID of the item, if an update is wanted

    URL where the entry article can be found

    The FAQ question

    The answer to the question

    The article language

    Username of the user who inserted the data

  • db (Session) – Database session

Return type:
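
The upsert behaviour of index_data (update when an id is supplied, insert otherwise) can be sketched with an in-memory dictionary in place of the database. Everything here is illustrative: FaqItemSketch is a hypothetical stand-in for FaqQuestionItem, and apart from id its field names (url, question, answer, language) are guesses, since the real field names are not listed above.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class FaqItemSketch:
    # Hypothetical stand-in for FaqQuestionItem; only `id` is named in the text above.
    url: str
    question: str
    answer: str
    language: str
    id: Optional[int] = None


def upsert(store: Dict[int, dict], item: FaqItemSketch) -> int:
    """Insert the item, or overwrite the stored entry that has the same id."""
    # Without an id, allocate the next free integer key.
    key = item.id if item.id is not None else max(store, default=0) + 1
    record = asdict(item)
    record["id"] = key
    store[key] = record  # creates a new entry or replaces the existing one
    return key


store: Dict[int, dict] = {}
first = upsert(store, FaqItemSketch("https://example.com/faq", "Q?", "A", "en"))
upsert(store, FaqItemSketch("https://example.com/faq", "Q?", "A (revised)", "en", id=first))
print(len(store), store[first]["answer"])  # 1 A (revised)
```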
