LLM App Architecture
The process of building a Large Language Model (LLM) application with Aurory AI can be broken down into three main phases: Plan, Build, and Run.
Identify a Single Problem to Solve: Begin by clearly defining the specific problem or use case your LLM application will address. This focus will guide the development process and ensure the application meets user needs.
Choose the LLM: Select the appropriate LLM that aligns with your project's requirements. Consider factors such as the model’s capabilities, size, and compatibility with your data and intended use case.
Customize the LLM: Tailor the chosen LLM to better fit your specific application. This customization may involve fine-tuning the model with domain-specific data or adjusting parameters to optimize performance.
Set Up the Application's Architecture: Establish the infrastructure needed to support your LLM application. This includes setting up data pipelines, integrating necessary APIs, and ensuring the architecture can handle the expected workload.
Conduct Online Evaluations and Implement Feedback: Deploy the application and conduct real-time evaluations to assess its performance. Gather user feedback and iterate on the design to refine and improve the application continually.
By following these steps, Aurory AI ensures a structured and effective approach to developing robust LLM applications that solve specific problems and meet user needs.
Aurory AI utilizes advanced Large Language Models (LLMs) to develop sophisticated applications. The architectural framework of these applications is divided into two main phases: data processing and application orchestration. Here’s a detailed look at the architecture:
The purpose of the Vector Database is to store and index embeddings for quick and efficient retrieval. At query time, the query embedding is compared against the stored embeddings, and the snippets most contextually relevant to the query are pulled from the database.
The Embedding Model generates vector representations of the data for efficient retrieval. A copy of each query is also sent to the embedding model; the resulting query embedding is matched against the embeddings stored in the vector database to enable quick, relevance-based retrieval.
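To make the retrieval step concrete, here is a minimal sketch of an in-memory vector store. The embed function is a toy stand-in for a real embedding model (it just hashes tokens into a fixed-size vector), and InMemoryVectorStore is a hypothetical class, not an Aurory AI API; a production system would use a trained embedding model and a dedicated vector database.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an embedding model: hash tokens into a fixed-size vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class InMemoryVectorStore:
    """Minimal vector database: index snippet embeddings, retrieve by cosine similarity."""
    def __init__(self):
        self.snippets = []   # raw text snippets
        self.vectors = []    # their embeddings

    def add(self, snippet: str) -> None:
        self.snippets.append(snippet)
        self.vectors.append(embed(snippet))

    def query(self, text: str, k: int = 3) -> list:
        q = embed(text)
        scores = [float(np.dot(q, v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.snippets[i] for i in top]

store = InMemoryVectorStore()
store.add("The vector database indexes embeddings for fast retrieval.")
store.add("Fine-tuning adapts the LLM to domain-specific data.")
print(store.query("How are embeddings retrieved?", k=1))
```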
The Data Filter ensures that only authorized and relevant data is processed by the LLM. It filters context snippets to be injected into the prompt, preventing the LLM from processing any unauthorized data and maintaining the integrity of the data being used.
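As an illustration, a data filter can be as simple as dropping any retrieved snippet whose source the requesting user is not allowed to see. The snippet structure and permission set below are assumptions for the sketch, not a prescribed schema.

```python
def filter_snippets(snippets, allowed_sources):
    """Keep only snippets whose source the requesting user is authorized to read."""
    return [s for s in snippets if s["source"] in allowed_sources]

snippets = [
    {"text": "Public product FAQ entry.", "source": "public_docs"},
    {"text": "Internal pricing memo.", "source": "internal_finance"},
]
# Only the public snippet survives and can be injected into the prompt.
print(filter_snippets(snippets, allowed_sources={"public_docs"}))
```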
The Initial Prompt formulates and optimizes prompts to guide the LLM in generating accurate responses. The process involves injecting the query into the initial prompt, which is then optimized to ensure the LLM produces the best possible output.
The Prompt Optimization Tool enhances the initial prompt to improve the quality of LLM responses. The optimized prompt is sent to the LLM, ensuring that the input provided to the model is refined and capable of generating high-quality results.
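The two steps above, building the initial prompt and then optimizing it, might look roughly like the sketch below. The template text and the length-budget "optimization" are illustrative assumptions; a real optimization tool could rewrite instructions, reorder context, or compress it against a token budget.

```python
PROMPT_TEMPLATE = """You are a helpful assistant. Answer using only the context below.

Context:
{context}

Question: {query}
Answer:"""

def build_prompt(query, context_snippets):
    """Inject the user query and the filtered context snippets into the initial prompt."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return PROMPT_TEMPLATE.format(context=context, query=query)

def optimize_prompt(prompt, max_chars=2000):
    """Stand-in for a prompt optimization tool: here it only enforces a length budget."""
    return prompt[:max_chars]

prompt = optimize_prompt(build_prompt(
    "What does the data filter do?",
    ["The data filter removes unauthorized snippets before prompting."],
))
print(prompt)
```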
The LLM Cache stores and retrieves LLM outputs to enhance the system's efficiency. Outputs are either stored in or pulled from the LLM cache, allowing for quick access and reducing the need for repeated computations.
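A cache like this can be keyed on a hash of the optimized prompt, so that identical prompts skip the model call entirely. The class below is a minimal in-process sketch; a production cache would typically live in a shared store such as Redis and include expiry.

```python
import hashlib

class LLMCache:
    """Store LLM outputs keyed by a hash of the prompt so repeated prompts skip the model."""
    def __init__(self):
        self._store = {}

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, output: str) -> None:
        self._store[self._key(prompt)] = output

cache = LLMCache()
prompt = "What is retrieval-augmented generation?"
answer = cache.get(prompt)
if answer is None:
    answer = "generated answer"   # placeholder for the real LLM call
    cache.put(prompt, answer)
print(answer)
```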
The Content Classifier or Filter ensures the safety and appropriateness of LLM outputs. It scans the outputs for any harmful or offensive content before delivering them to users, maintaining a safe and user-friendly environment.
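In its simplest form, an output filter checks the generated text against a blocklist before delivery; a production classifier would instead use a trained moderation model. The term list below is a placeholder.

```python
BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}   # placeholder blocklist

def is_safe(output: str) -> bool:
    """Rough safety check on LLM output before it is shown to the user."""
    lowered = output.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

print(is_safe("This is a perfectly ordinary answer."))   # True
```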
The LLM API and Hosting enable seamless interaction between the application and the LLM. The API hosts the model, processes optimized prompts, and delivers the final output, facilitating smooth and efficient communication within the system.
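Calling a hosted model usually amounts to an HTTP request carrying the optimized prompt. The endpoint URL and JSON shape below are purely illustrative and do not describe Aurory AI's actual API.

```python
import requests

def call_llm_api(prompt: str) -> str:
    """Send the optimized prompt to a hosted LLM endpoint and return the generated text."""
    response = requests.post(
        "https://llm.example.com/v1/generate",        # hypothetical endpoint
        json={"prompt": prompt, "max_tokens": 256},   # hypothetical payload shape
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]
```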
Identify a Single Problem to Solve: Define the specific problem or use case that the LLM application will address, ensuring a focused approach to development.
Choose the LLM: Select the most suitable LLM based on the application's requirements, considering factors like model capabilities, data compatibility, and performance needs.
Customize the LLM: Fine-tune the chosen LLM with domain-specific data to enhance its relevance and performance for the targeted use case. This may involve adjusting model parameters and incorporating specialized datasets.
Set Up the Application's Architecture: Establish the necessary infrastructure, including data pipelines, embedding models, and vector databases, to support the application's functionality and scalability.
Conduct Online Evaluations and Implement Feedback: Deploy the application, collect real-time user feedback, and iterate on the design and implementation to continuously improve the application's effectiveness and user experience.
The purpose of the Orchestration component is to manage the flow of data and queries within the application. It involves integrating various elements such as data pipelines, embedding models, APIs, plugins, LLM cache, logging, and validation to ensure smooth operation and coordination among different parts of the system.
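Put together, orchestration is essentially a pipeline that routes a query through retrieval, filtering, prompting, generation, and moderation. The sketch below stubs every stage with a trivial callable to show the control flow only.

```python
def handle_query(query: str) -> str:
    """Illustrative orchestration flow; each stage is stubbed with a trivial callable."""
    def retrieve(q): return ["Relevant snippet about the topic."]       # vector DB lookup
    def filter_auth(snips): return snips                                # data filter
    def build_prompt(q, snips): return f"Context: {snips}\nQ: {q}\nA:"  # prompt assembly
    def call_llm(p): return "Generated answer."                         # hosted model call
    def moderate(out): return out                                       # content classifier

    snippets = filter_auth(retrieve(query))
    prompt = build_prompt(query, snippets)
    return moderate(call_llm(prompt))

print(handle_query("How does orchestration work?"))
```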
The Playground component is designed to test and refine prompts using few-shot examples. Developers can interact with the playground to adjust prompts and observe the responses generated by the LLM, allowing for iterative improvements and fine-tuning of the application.
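A playground session often boils down to assembling few-shot examples ahead of the live query and comparing how the model responds as the examples change. The example pairs below are made up for illustration.

```python
FEW_SHOT_EXAMPLES = [
    ("Translate 'bonjour' to English.", "hello"),
    ("Translate 'gracias' to English.", "thank you"),
]

def playground_prompt(query: str) -> str:
    """Prepend few-shot examples to the live query so developers can iterate on prompts."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    return f"{shots}\n\nQ: {query}\nA:"

print(playground_prompt("Translate 'danke' to English."))
```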
Data Pipelines serve the purpose of integrating and managing data from various sources. Contextual data is provided by app developers and fed into the system to condition the LLM outputs, ensuring that the model's responses are relevant and accurate based on the latest information.
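One common pipeline step is splitting source documents into overlapping chunks before they are embedded and indexed. The chunk size and overlap below are arbitrary defaults for the sketch, not recommended values.

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split a source document into overlapping character chunks for embedding and indexing."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

print(len(chunk_document("Some contextual source document. " * 100)))
```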
APIs and Plugins are used to extend the functionality of the application. By integrating these components into the orchestration layer, the application gains enhanced capabilities, allowing it to perform a wider range of tasks and interact with external systems more effectively.
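A lightweight way to wire plugins into the orchestration layer is a registry that maps plugin names to callables, so the application can dispatch tool calls by name. The decorator and the stubbed weather plugin below are hypothetical.

```python
PLUGINS = {}

def register_plugin(name):
    """Register a callable under a name so the orchestration layer can dispatch to it."""
    def wrapper(fn):
        PLUGINS[name] = fn
        return fn
    return wrapper

@register_plugin("weather")
def weather_lookup(city: str) -> str:
    return f"(stubbed) forecast for {city}"   # a real plugin would call an external API

print(PLUGINS["weather"]("Berlin"))
```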
The Logging/LLMops component monitors and optimizes LLM operations. It involves logging queries, performance metrics, and system operations to provide insights into the system's behavior and guide further improvements, ensuring that the application runs efficiently and effectively.
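In practice this often starts with wrapping every model call so that prompt size, output size, and latency are logged for later analysis, as in the sketch below (standard-library logging only; the metric names are arbitrary).

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llmops")

def logged_call(prompt: str, call_llm) -> str:
    """Wrap a model call with timing and query logging for later analysis."""
    start = time.perf_counter()
    output = call_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("prompt_chars=%d output_chars=%d latency_ms=%.1f",
                len(prompt), len(output), latency_ms)
    return output

print(logged_call("Hello", lambda p: "stubbed model output"))
```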
Validation ensures the accuracy and reliability of the application. This process involves continuously testing LLM outputs and system performance to verify that the application is functioning correctly and meeting the desired standards of quality and precision.
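Automated checks can be as simple as asserting length limits and required phrases on each output, as in the hypothetical helper below; a real validation suite would add regression tests and scored evaluations.

```python
def validate_output(output: str, required_phrases, max_chars: int = 1000) -> bool:
    """Basic output validation: enforce a length limit and the presence of required phrases."""
    if len(output) > max_chars:
        return False
    lowered = output.lower()
    return all(phrase.lower() in lowered for phrase in required_phrases)

print(validate_output("Refunds are processed within 5 business days.", ["refund"]))   # True
```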
App Hosting is responsible for hosting the final application for user interaction. It handles the submission of queries by users, processes these queries through the LLM, and returns the outputs to users via the app hosting platform, facilitating seamless user experiences and interactions with the application.
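As a minimal sketch of the hosting layer, the Flask app below accepts a query over HTTP, runs it through a stubbed pipeline, and returns the answer; the route, payload shape, and port are assumptions for illustration, not Aurory AI's hosting setup.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/query", methods=["POST"])
def query_endpoint():
    """Accept a user query, run it through the (stubbed) LLM pipeline, return the answer."""
    user_query = request.get_json()["query"]
    answer = f"(stubbed) answer to: {user_query}"   # replace with the real orchestration call
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(port=8000)
```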
User Query: End users submit queries via the User Interface (UI).
Telemetry Service: Queries are logged and monitored for performance tracking.
Data Retrieval: Relevant data is retrieved from the vector database using the embedding model.
Response Generation: The LLM generates a response based on the optimized prompt.
Content Classification: Outputs are scanned for safety and appropriateness.
Response Delivery: The final output is delivered back to the user through the UI.
By following this structured approach, Aurory AI ensures the development of accurate, user-centric LLM applications. This architecture enables the creation of powerful tools that address specific problems, deliver strong performance, and improve the overall user experience.