Indexing API

async indexing_api.init_indexing()

Initialize the database according to the indexing_config configuration specified in config.yaml.

indexing_api.upload_csv_rag(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a CSV file containing RAG data to the database, with optional embeddings. The function recognizes the following columns:

  • url: source URL of the document

  • text: Text content of the document

  • language (optional): Language of the document

  • embedding (optional): Embedding of the document

  • tag (optional): Tag of the document

Parameters:
  • file (UploadFile) – The CSV file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:

ResponseBody
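
The column layout above can be illustrated with a small, hedged sketch that builds a CSV body in memory. The column names come from the list above; the helper name build_rag_csv, the sample rows, and the choice to leave out the embedding column are assumptions made for illustration and are not part of indexing_api.

```python
import csv
import io

# Column names taken from the list above; the embedding column is left out
# here because the sample rows carry no precomputed embeddings.
RAG_COLUMNS = ["url", "text", "language", "tag"]


def build_rag_csv(rows):
    """Serialize a list of dicts into CSV text using the RAG columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=RAG_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Optional columns that are missing are written as empty cells.
        writer.writerow({col: row.get(col, "") for col in RAG_COLUMNS})
    return buffer.getvalue()


# Made-up sample data, for illustration only.
sample_csv = build_rag_csv([
    {"url": "https://example.org/doc-1", "text": "First document.", "language": "en"},
    {"url": "https://example.org/doc-2", "text": "Zweites Dokument.", "language": "de", "tag": "demo"},
])
print(sample_csv)
```

A file with this content could then be sent as the file parameter; how the endpoint treats empty optional cells is not stated here and should be checked against the implementation.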

indexing_api.upload_csv_faq(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a CSV file containing FAQ data to the database, with optional embeddings. The function recognizes the following columns:

  • url: source URL of the information

  • text: Text content of the question

  • answer: Text content of the answer

  • language (optional): Language of the question and answer

  • embedding (optional): Embedding of the question

  • tag (optional): Tag of the document

Parameters:
  • file (UploadFile) – The CSV file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:

ResponseBody
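
Before uploading, a client may want to check that a FAQ file carries the expected header. The following is a minimal sketch under the assumption that the columns not marked optional above (url, text, answer) are required; the helper check_faq_header is hypothetical, and whether the real endpoint rejects or ignores unknown columns is not documented here.

```python
import csv
import io

# Assumption: columns not marked "(optional)" in the list above are required.
REQUIRED_FAQ_COLUMNS = {"url", "text", "answer"}
OPTIONAL_FAQ_COLUMNS = {"language", "embedding", "tag"}


def check_faq_header(csv_text):
    """Return (missing_required, unknown) sets of column names for a FAQ CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_FAQ_COLUMNS - header
    unknown = header - REQUIRED_FAQ_COLUMNS - OPTIONAL_FAQ_COLUMNS
    return missing, unknown


# Made-up sample inputs, for illustration only.
good = "url,text,answer,language\nhttps://example.org/faq,What is X?,X is Y.,en\n"
bad = "url,text,notes\nhttps://example.org/faq,What is X?,n/a\n"

print(check_faq_header(good))
print(check_faq_header(bad))
```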

async indexing_api.upload_pdf_rag(file: fastapi.UploadFile = fastapi.File, embed: bool = False, db: Session = fastapi.Depends)

Upload a PDF file containing RAG data to the database.

Parameters:
  • file (UploadFile) – The PDF file sent by the user

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:

ResponseBody

indexing_api.add_rag_data_from_csv(file_path: str = 'indexing/data/rag_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)

Add and index RAG test data from a CSV file, with optional embeddings. The function recognizes the following columns:

  • url: source URL of the document

  • text: Text content of the document

  • language (optional): Language of the document

  • embedding (optional): Embedding of the document

  • tag (optional): Tag of the document

Parameters:
  • file_path (str, optional) – Path to the CSV file containing the data. Defaults to “indexing/data/rag_test_data.csv”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

Confirmation message upon successful completion of the process

Return type:

str

indexing_api.add_faq_data_from_csv(file_path: str = 'indexing/data/faq_test_data.csv', embed: bool = False, db: Session = fastapi.Depends)

Add and index FAQ test data from a CSV file, with optional embeddings. The function recognizes the following columns:

  • url: source URL of the information

  • text: Text content of the question

  • answer: Text content of the answer

  • language (optional): Language of the question and answer

  • embedding (optional): Embedding of the question

  • tag (optional): Tag of the document

Parameters:
  • file_path (str, optional) – Path to the CSV file containing the data. Defaults to “indexing/data/faq_test_data.csv”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

Confirmation message upon successful completion of the process

Return type:

str

indexing_api.embed_rag_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)

Embed all RAG data (documents) that have not been embedded yet.

Parameters:
  • db (Session) – Database session

  • embed_empty_only (bool, optional) – Embed only data that have not been embedded yet. Defaults to True.

  • k (int, optional) – Number of documents to embed. Defaults to 0, which means all documents.

Returns:

Confirmation message upon successful completion of the process

Return type:

str
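
The interaction between embed_empty_only and k can be sketched as a selection step over in-memory rows. This is an illustrative reading of the parameter descriptions above, not the actual implementation: the function select_rows_to_embed, the dict-based rows, and the assumption that k simply truncates the candidate list in its existing order are all hypothetical.

```python
def select_rows_to_embed(rows, embed_empty_only=True, k=0):
    """Pick the rows that would be embedded under the documented parameters.

    rows: list of dicts with an "embedding" key (None when not embedded yet).
    embed_empty_only: keep only rows without an embedding when True.
    k: maximum number of rows to return; 0 means no limit.
    """
    if embed_empty_only:
        candidates = [row for row in rows if row.get("embedding") is None]
    else:
        candidates = list(rows)
    # Assumption: k == 0 selects everything, otherwise the first k candidates.
    return candidates if k == 0 else candidates[:k]


# Made-up sample rows, for illustration only.
rows = [
    {"id": 1, "embedding": None},
    {"id": 2, "embedding": [0.1, 0.2]},
    {"id": 3, "embedding": None},
]

print([row["id"] for row in select_rows_to_embed(rows)])
print([row["id"] for row in select_rows_to_embed(rows, k=1)])
print([row["id"] for row in select_rows_to_embed(rows, embed_empty_only=False)])
```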

indexing_api.embed_faq_data(db: Session = fastapi.Depends, embed_empty_only: bool = True, k: int = 0)

Embed all FAQ questions that have not been embedded yet.

Parameters:
  • db (Session) – Database session

  • embed_empty_only (bool, optional) – Embed only data that have not been embedded yet. Defaults to True.

  • k (int, optional) – Number of questions to embed. Defaults to 0, which means all questions.

Returns:

Confirmation message upon successful completion of the process

Return type:

str

async indexing_api.index_pdfs_from_sitemap(sitemap_url: str = 'https://www.ahv-iv.ch/de/Sitemap-DE', embed: bool = False, db: Session = fastapi.Depends)

Indexes PDFs from a given sitemap URL. The PDFs are scraped and their data is added to the embedding database. This function is specifically designed for the site “https://www.ahv-iv.ch”.

Parameters:
  • sitemap_url (str, optional) – The URL of the sitemap to scrape PDFs from. Defaults to “https://www.ahv-iv.ch/de/Sitemap-DE”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:

ResponseBody

async indexing_api.index_html_from_sitemap(sitemap_url: str = 'https://eak.admin.ch/eak/de/home.sitemap.xml', embed: bool = False, db: Session = fastapi.Depends)

Indexes HTML from a given sitemap URL. The HTML pages are scraped and their data is added to the embedding database. This function is specifically designed for the site “https://eak.admin.ch”.

Parameters:
  • sitemap_url (str, optional) – The URL of the sitemap to scrape HTML from. Defaults to “https://eak.admin.ch/eak/de/home.sitemap.xml”.

  • embed (bool, optional) – Whether to embed the data or not. Defaults to False.

  • db (Session) – Database session

Returns:

A response body containing a confirmation message upon successful completion of the process.

Return type:

ResponseBody
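
The first step of sitemap-based indexing, collecting page URLs from a sitemap.xml document, can be sketched with the standard library. This is an illustration only, assuming the sitemap follows the standard sitemaps.org urlset format; the helper extract_page_urls and the inline sample are hypothetical, and the actual scraper may fetch and parse the sitemap differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_page_urls(sitemap_xml):
    """Return the <loc> URLs listed in a sitemap.xml document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sitemap content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/de/home.html</loc></url>
  <url><loc>https://example.org/de/kontakt.html</loc></url>
</urlset>"""

print(extract_page_urls(SAMPLE_SITEMAP))
```

Each returned URL would then be downloaded and its text extracted before being stored; those steps depend on the target site's markup and are not shown.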

async indexing_api.index_faq_data(sitemap_url: str = 'https://faq.bsv.admin.ch/sitemap.xml', embed_question: bool = False, embed_answer: bool = False, k: int = 0, db: Session = fastapi.Depends)

Add and index autocomplete data in the FAQ database. The data is obtained by scraping the website whose sitemap is given by sitemap_url.

Parameters:
  • sitemap_url (str, default ‘https://faq.bsv.admin.ch/sitemap.xml’) – The sitemap.xml URL of the website to scrape

  • k (int, default 0) – Number of articles to scrape and log, used to test the method.

  • embed_question (bool, default False) – Whether to embed the question text

  • embed_answer (bool, default False) – Whether to embed the answer text

  • db (Session, optional) – Database session used to upsert the extracted data

Returns:

Confirmation message upon successful completion of the process

Return type:

str

async indexing_api.index_data(item: QuestionItem, db: Session = fastapi.Depends)

Upsert a single entry to the FAQ dataset.

Parameters:
  • item (QuestionItem) –

    The question item to insert or update:

      • id (int, optional): ID of the item, if an update is wanted

      • url (str): URL where the entry article can be found

      • question (str): The FAQ question

      • answer (str): The answer to the question

      • language (str): The article language

      • source (str): Username of the user who inserted the data

  • db (Session) – Database session

Return type:

dict
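
The upsert behaviour described above (insert when no id is given, update when it is) can be sketched with a stand-in data class and an in-memory store. The field names follow the item description above; the class QuestionItemSketch, the upsert helper, the dict-based store, and the id allocation scheme are all assumptions for illustration, and the real QuestionItem model and database logic may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QuestionItemSketch:
    """Stand-in for QuestionItem, using the fields listed above."""
    url: str
    question: str
    answer: str
    language: str
    source: str
    id: Optional[int] = None  # set only when an existing entry should be updated


def upsert(store, item):
    """Insert the item under a new id, or overwrite the entry with the same id."""
    if item.id is None:
        # Assumption: new ids are allocated as the highest existing id plus one.
        item.id = max(store, default=0) + 1
    store[item.id] = asdict(item)
    return store[item.id]


# Made-up sample data, for illustration only.
store = {}
created = upsert(store, QuestionItemSketch(
    url="https://example.org/faq/1", question="What is X?", answer="X is Y.",
    language="en", source="demo_user"))
updated = upsert(store, QuestionItemSketch(
    url="https://example.org/faq/1", question="What is X?", answer="X is Z.",
    language="en", source="demo_user", id=created["id"]))
print(updated)
```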