
Overview

This repository supports the retrieval of knowledge data as part of CIN Knowledge Discovery. It consists of several services, including the Semantic API, the Agent API, and the Ingestion Processor Lambda, which work together to process, analyze, and retrieve data.

Prerequisites

  • .NET
  • Terraform
  • Docker
  • Node.js and npm (for k6 testing)

Architecture

For detailed information about the design and functionality of the services, see the Knowledge Retrieval Design page on Confluence.

Components

Provisioning

  • Provisioner: Orchestrates the provisioning of Hxp environment resources using Terraform Cloud.
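
Terraform Cloud exposes an HTTP API for queuing runs (POST /api/v2/runs with a JSON:API payload). As a minimal sketch, assuming the Provisioner queues one run per workspace, the code below only builds such a request; the workspace ID, run message, and token are hypothetical placeholders, and the real Provisioner may also set variables, poll run status, or use a different client.

```typescript
// Sketch: build (but do not send) a Terraform Cloud "create run" request.
// Assumption: workspace ID, message, and token below are placeholders,
// not values used by the actual Provisioner.

interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const TFC_API_BASE = "https://app.terraform.io/api/v2";

function buildCreateRunRequest(workspaceId: string, message: string, token: string): RunRequest {
  // Terraform Cloud's API follows JSON:API, so the payload is wrapped in "data".
  const payload = {
    data: {
      type: "runs",
      attributes: { message },
      relationships: {
        workspace: { data: { type: "workspaces", id: workspaceId } },
      },
    },
  };
  return {
    url: `${TFC_API_BASE}/runs`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/vnd.api+json",
    },
    body: JSON.stringify(payload),
  };
}

// Usage with hypothetical values.
const req = buildCreateRunRequest("ws-EXAMPLE123", "Provision Hxp environment", "<token>");
// To actually queue the run:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Keeping request construction separate from the network call makes the payload easy to unit-test without Terraform Cloud credentials.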

Ingestion

  • Ingestion Events Writer Lambda: Delivers incoming events, containing objects along with their metadata, to an S3 bucket.
  • Objects Ingestor Lambda: Imports object data from the stage into the objects tables in Snowflake.
  • Embeddings Ingestor Lambda: Imports embedding data from the stage into the embeddings table in Snowflake.
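
The first ingestion step turns an incoming event into an S3 object that carries both the ingested object and its metadata. The sketch below illustrates that mapping under stated assumptions: the event fields and the date-partitioned key layout (events/yyyy/mm/dd/<eventId>.json) are hypothetical, chosen so that a downstream stage could load files by prefix. It returns the key and body rather than calling S3.

```typescript
// Hypothetical event shape; the real ingestion events may carry different fields.
interface IngestionEvent {
  eventId: string;
  objectId: string;
  timestamp: string; // ISO-8601, e.g. "2024-05-01T12:00:00Z"
  object: unknown; // the ingested object payload
  metadata: Record<string, string>;
}

interface S3Put {
  key: string;
  body: string;
  contentType: string;
}

// Assumed key layout: <prefix>/yyyy/mm/dd/<eventId>.json, partitioned by UTC date.
function toS3Put(event: IngestionEvent, prefix = "events"): S3Put {
  const d = new Date(event.timestamp);
  if (Number.isNaN(d.getTime())) {
    throw new Error(`Invalid timestamp: ${event.timestamp}`);
  }
  const yyyy = d.getUTCFullYear().toString();
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return {
    key: `${prefix}/${yyyy}/${mm}/${dd}/${event.eventId}.json`,
    // The object and its metadata travel together in a single JSON document.
    body: JSON.stringify({
      objectId: event.objectId,
      object: event.object,
      metadata: event.metadata,
    }),
    contentType: "application/json",
  };
}

// Usage with a hypothetical event: example.key is "events/2024/05/01/evt-1.json".
const example = toS3Put({
  eventId: "evt-1",
  objectId: "obj-1",
  timestamp: "2024-05-01T12:00:00Z",
  object: { title: "Quarterly report" },
  metadata: { source: "test" },
});
```

In a Lambda, the returned key, body, and content type could be passed to PutObjectCommand from @aws-sdk/client-s3 (Key, Body, ContentType) along with the bucket name.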

Knowledge Retrieval (aka RAG pipeline)

  • Prompt Processor Lambda: Processes the user's question and generates an answer.
  • Semantic API: Provides semantic search capabilities over the Snowflake vector database.
  • Agent API: Allows defining and configuring agents and submitting queries to them.
  • QnA API: Provides access to the questions submitted by users and the answers generated for them.
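
The retrieval half of a RAG pipeline ranks stored chunks by how close their embeddings are to the question's embedding, then hands the best matches to the model as context. The in-memory sketch below shows that idea using cosine similarity and top-k selection; the Chunk shape and the prompt template are hypothetical, and the actual Semantic API performs its search in the Snowflake vector database rather than in application code.

```typescript
// Hypothetical chunk shape: text plus its precomputed embedding vector.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k most similar.
function topK(query: number[], chunks: Chunk[], k: number): Array<Chunk & { score: number }> {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical prompt template: retrieved chunks become numbered context.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${contextText}\n\nQuestion: ${question}`;
}

// Usage with toy 3-dimensional embeddings (real embeddings are much larger).
const chunks: Chunk[] = [
  { id: "a", text: "Embeddings are stored in Snowflake.", embedding: [1, 0, 0] },
  { id: "b", text: "Terraform provisions resources.", embedding: [0, 1, 0] },
  { id: "c", text: "Vectors are compared by similarity.", embedding: [0.9, 0.1, 0] },
];
const hits = topK([1, 0, 0], chunks, 2); // ids: "a", then "c"
const prompt = buildPrompt("Where are embeddings stored?", hits);
```

Snowflake provides vector similarity functions such as VECTOR_COSINE_SIMILARITY, so the same ranking can be expressed in SQL close to the data; this sketch does not describe how the Semantic API actually queries Snowflake.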

Testing

  • Parquet Generator: Generates Parquet files for testing embedding ingestion.
  • Ingestion Generator: Generates ingestion events for testing ingestion.
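
Test data generators are most useful when a given seed always produces the same data, so that a failing run can be reproduced. The sketch below shows one way an ingestion event generator could do this, assuming a hypothetical event shape (eventId, objectId, timestamp, metadata) and using the small mulberry32 seeded PRNG; it is not the repository's Ingestion Generator.

```typescript
// Hypothetical event shape for generated test events.
interface TestIngestionEvent {
  eventId: string;
  objectId: string;
  timestamp: string;
  metadata: Record<string, string>;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const DAY_MS = 86400000;
const CONTENT_TYPES = ["application/pdf", "text/plain", "image/png"];

// Generate `count` events spread over one day after `start`; same seed, same events.
function generateEvents(
  count: number,
  seed: number,
  start: Date = new Date("2024-01-01T00:00:00Z"),
): TestIngestionEvent[] {
  const rand = mulberry32(seed);
  const events: TestIngestionEvent[] = [];
  for (let i = 0; i < count; i++) {
    const offsetMs = Math.floor(rand() * DAY_MS);
    events.push({
      eventId: `evt-${seed}-${i}`,
      objectId: `obj-${Math.floor(rand() * 1000000)}`,
      timestamp: new Date(start.getTime() + offsetMs).toISOString(),
      metadata: { contentType: CONTENT_TYPES[Math.floor(rand() * CONTENT_TYPES.length)] },
    });
  }
  return events;
}

// Usage: two calls with the same seed yield identical event lists.
const batch = generateEvents(5, 42);
```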

API Testing

The api-scenarios folder contains scenarios that demonstrate how to interact with the APIs. These scenarios can be used to test the APIs manually.

The tests folder also contains automated tests written with Playwright (API, end-to-end, and smoke tests) and k6 (performance tests).

Performance Tests

The performance-tests folder contains k6 scripts for running performance tests.