LLM App Architecture
The process of building a Large Language Model (LLM) application with Aurory AI can be broken down into three main phases: Plan, Build, and Run.
Identify a Single Problem to Solve: Begin by clearly defining the specific problem or use case your LLM application will address. This focus will guide the development process and ensure the application meets user needs.
Choose the LLM: Select the appropriate LLM that aligns with your project's requirements. Consider factors such as the model’s capabilities, size, and compatibility with your data and intended use case.
Customize the LLM: Tailor the chosen LLM to better fit your specific application. This customization may involve fine-tuning the model with domain-specific data or adjusting parameters to optimize performance.
Set Up the Application's Architecture: Establish the infrastructure needed to support your LLM application. This includes setting up data pipelines, integrating necessary APIs, and ensuring the architecture can handle the expected workload.
Conduct Online Evaluations and Implement Feedback: Deploy the application and conduct real-time evaluations to assess its performance. Gather user feedback and iterate on the design to refine and improve the application continually.
By following these steps, Aurory AI ensures a structured and effective approach to developing robust LLM applications that solve specific problems and meet user needs.
Aurory AI utilizes advanced Large Language Models (LLMs) to develop sophisticated applications. The architectural framework of these applications is divided into two main phases: data processing and application orchestration. Here’s a detailed look at the architecture:
The purpose of the Vector Database is to store and index embeddings for quick and efficient retrieval. At query time, the query embedding is compared against the stored embeddings, and the snippets most contextually relevant to the query are pulled from the database.
The Embedding Model generates vector representations of the data for efficient retrieval. A copy of each query is also sent to the embedding model; the resulting query embedding is matched against the embeddings stored in the vector database to enable quick, relevance-based retrieval.
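To make the retrieval step concrete, here is a minimal sketch of an in-memory vector store. The embed function is a toy stand-in for a real embedding model (it just hashes tokens into a fixed-size vector), and InMemoryVectorStore is a hypothetical class, not an Aurory AI API; a production system would use a trained embedding model and a dedicated vector database.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an embedding model: hash tokens into a fixed-size vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class InMemoryVectorStore:
    """Minimal vector database: index snippet embeddings, retrieve by cosine similarity."""
    def __init__(self):
        self.snippets = []   # raw text snippets
        self.vectors = []    # their embeddings

    def add(self, snippet: str) -> None:
        self.snippets.append(snippet)
        self.vectors.append(embed(snippet))

    def query(self, text: str, k: int = 3) -> list:
        q = embed(text)
        scores = [float(np.dot(q, v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.snippets[i] for i in top]

store = InMemoryVectorStore()
store.add("The vector database indexes embeddings for fast retrieval.")
store.add("Fine-tuning adapts the LLM to domain-specific data.")
print(store.query("How are embeddings retrieved?", k=1))
```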
The Data Filter ensures that only authorized and relevant data is processed by the LLM. It filters context snippets to be injected into the prompt, preventing the LLM from processing any unauthorized data and maintaining the integrity of the data being used.
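As an illustration, a data filter can be as simple as dropping any retrieved snippet whose source the requesting user is not allowed to see. The snippet structure and permission set below are assumptions for the sketch, not a prescribed schema.

```python
def filter_snippets(snippets, allowed_sources):
    """Keep only snippets whose source the requesting user is authorized to read."""
    return [s for s in snippets if s["source"] in allowed_sources]

snippets = [
    {"text": "Public product FAQ entry.", "source": "public_docs"},
    {"text": "Internal pricing memo.", "source": "internal_finance"},
]
# Only the public snippet survives and can be injected into the prompt.
print(filter_snippets(snippets, allowed_sources={"public_docs"}))
```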
The Initial Prompt formulates and optimizes prompts to guide the LLM in generating accurate responses. The process involves injecting the query into the initial prompt, which is then optimized to ensure the LLM produces the best possible output.
The Prompt Optimization Tool enhances the initial prompt to improve the quality of LLM responses. The optimized prompt is sent to the LLM, ensuring that the input provided to the model is refined and capable of generating high-quality results.
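The two steps above, building the initial prompt and then optimizing it, might look roughly like the sketch below. The template text and the length-budget "optimization" are illustrative assumptions; a real optimization tool could rewrite instructions, reorder context, or compress it against a token budget.

```python
PROMPT_TEMPLATE = """You are a helpful assistant. Answer using only the context below.

Context:
{context}

Question: {query}
Answer:"""

def build_prompt(query, context_snippets):
    """Inject the user query and the filtered context snippets into the initial prompt."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return PROMPT_TEMPLATE.format(context=context, query=query)

def optimize_prompt(prompt, max_chars=2000):
    """Stand-in for a prompt optimization tool: here it only enforces a length budget."""
    return prompt[:max_chars]

prompt = optimize_prompt(build_prompt(
    "What does the data filter do?",
    ["The data filter removes unauthorized snippets before prompting."],
))
print(prompt)
```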
The LLM Cache stores and retrieves LLM outputs to enhance the system's efficiency. Outputs are either stored in or pulled from the LLM cache, allowing for quick access and reducing the need for repeated computations.
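A cache like this can be keyed on a hash of the optimized prompt, so that identical prompts skip the model call entirely. The class below is a minimal in-process sketch; a production cache would typically live in a shared store such as Redis and include expiry.

```python
import hashlib

class LLMCache:
    """Store LLM outputs keyed by a hash of the prompt so repeated prompts skip the model."""
    def __init__(self):
        self._store = {}

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, output: str) -> None:
        self._store[self._key(prompt)] = output

cache = LLMCache()
prompt = "What is retrieval-augmented generation?"
answer = cache.get(prompt)
if answer is None:
    answer = "generated answer"   # placeholder for the real LLM call
    cache.put(prompt, answer)
print(answer)
```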
The Content Classifier or Filter ensures the safety and appropriateness of LLM outputs. It scans the outputs for any harmful or offensive content before delivering them to users, maintaining a safe and user-friendly environment.
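In its simplest form, an output filter checks the generated text against a blocklist before delivery; a production classifier would instead use a trained moderation model. The term list below is a placeholder.

```python
BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}   # placeholder blocklist

def is_safe(output: str) -> bool:
    """Rough safety check on LLM output before it is shown to the user."""
    lowered = output.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

print(is_safe("This is a perfectly ordinary answer."))   # True
```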
The LLM API and Hosting enable seamless interaction between the application and the LLM. The API hosts the model, processes optimized prompts, and delivers the final output, facilitating smooth and efficient communication within the system.
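Calling a hosted model usually amounts to an HTTP request carrying the optimized prompt. The endpoint URL and JSON shape below are purely illustrative and do not describe Aurory AI's actual API.

```python
import requests

def call_llm_api(prompt: str) -> str:
    """Send the optimized prompt to a hosted LLM endpoint and return the generated text."""
    response = requests.post(
        "https://llm.example.com/v1/generate",        # hypothetical endpoint
        json={"prompt": prompt, "max_tokens": 256},   # hypothetical payload shape
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]
```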
Identify a Single Problem to Solve: Define the specific problem or use case that the LLM application will address, ensuring a focused approach to development.
Choose the LLM: Select the most suitable LLM based on the application's requirements, considering factors like model capabilities, data compatibility, and performance needs.
Customize the LLM: Fine-tune the chosen LLM with domain-specific data to enhance its relevance and performance for the targeted use case. This may involve adjusting model parameters and incorporating specialized datasets.
Set Up the Application's Architecture: Establish the necessary infrastructure, including data pipelines, embedding models, and vector databases, to support the application's functionality and scalability.
Conduct Online Evaluations and Implement Feedback: Deploy the application, collect real-time user feedback, and iterate on the design and implementation to continuously improve the application's effectiveness and user experience.
The purpose of the Orchestration component is to manage the flow of data and queries within the application. It involves integrating various elements such as data pipelines, embedding models, APIs, plugins, LLM cache, logging, and validation to ensure smooth operation and coordination among different parts of the system.
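Put together, orchestration is essentially a pipeline that routes a query through retrieval, filtering, prompting, generation, and moderation. The sketch below stubs every stage with a trivial callable to show the control flow only.

```python
def handle_query(query: str) -> str:
    """Illustrative orchestration flow; each stage is stubbed with a trivial callable."""
    def retrieve(q): return ["Relevant snippet about the topic."]       # vector DB lookup
    def filter_auth(snips): return snips                                # data filter
    def build_prompt(q, snips): return f"Context: {snips}\nQ: {q}\nA:"  # prompt assembly
    def call_llm(p): return "Generated answer."                         # hosted model call
    def moderate(out): return out                                       # content classifier

    snippets = filter_auth(retrieve(query))
    prompt = build_prompt(query, snippets)
    return moderate(call_llm(prompt))

print(handle_query("How does orchestration work?"))
```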
The Playground component is designed to test and refine prompts using few-shot examples. Developers can interact with the playground to adjust prompts and observe the responses generated by the LLM, allowing for iterative improvements and fine-tuning of the application.
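A playground session often boils down to assembling few-shot examples ahead of the live query and comparing how the model responds as the examples change. The example pairs below are made up for illustration.

```python
FEW_SHOT_EXAMPLES = [
    ("Translate 'bonjour' to English.", "hello"),
    ("Translate 'gracias' to English.", "thank you"),
]

def playground_prompt(query: str) -> str:
    """Prepend few-shot examples to the live query so developers can iterate on prompts."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    return f"{shots}\n\nQ: {query}\nA:"

print(playground_prompt("Translate 'danke' to English."))
```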
Data Pipelines serve the purpose of integrating and managing data from various sources. Contextual data is provided by app developers and fed into the system to condition the LLM outputs, ensuring that the model's responses are relevant and accurate based on the latest information.
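One common pipeline step is splitting source documents into overlapping chunks before they are embedded and indexed. The chunk size and overlap below are arbitrary defaults for the sketch, not recommended values.

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split a source document into overlapping character chunks for embedding and indexing."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

print(len(chunk_document("Some contextual source document. " * 100)))
```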
APIs and Plugins are used to extend the functionality of the application. By integrating these components into the orchestration layer, the application gains enhanced capabilities, allowing it to perform a wider range of tasks and interact with external systems more effectively.
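A lightweight way to wire plugins into the orchestration layer is a registry that maps plugin names to callables, so the application can dispatch tool calls by name. The decorator and the stubbed weather plugin below are hypothetical.

```python
PLUGINS = {}

def register_plugin(name):
    """Register a callable under a name so the orchestration layer can dispatch to it."""
    def wrapper(fn):
        PLUGINS[name] = fn
        return fn
    return wrapper

@register_plugin("weather")
def weather_lookup(city: str) -> str:
    return f"(stubbed) forecast for {city}"   # a real plugin would call an external API

print(PLUGINS["weather"]("Berlin"))
```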
The Logging/LLMops component monitors and optimizes LLM operations. It involves logging queries, performance metrics, and system operations to provide insights into the system's behavior and guide further improvements, ensuring that the application runs efficiently and effectively.
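In practice this often starts with wrapping every model call so that prompt size, output size, and latency are logged for later analysis, as in the sketch below (standard-library logging only; the metric names are arbitrary).

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llmops")

def logged_call(prompt: str, call_llm) -> str:
    """Wrap a model call with timing and query logging for later analysis."""
    start = time.perf_counter()
    output = call_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("prompt_chars=%d output_chars=%d latency_ms=%.1f",
                len(prompt), len(output), latency_ms)
    return output

print(logged_call("Hello", lambda p: "stubbed model output"))
```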
Validation ensures the accuracy and reliability of the application. This process involves continuously testing LLM outputs and system performance to verify that the application is functioning correctly and meeting the desired standards of quality and precision.
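Automated checks can be as simple as asserting length limits and required phrases on each output, as in the hypothetical helper below; a real validation suite would add regression tests and scored evaluations.

```python
def validate_output(output: str, required_phrases, max_chars: int = 1000) -> bool:
    """Basic output validation: enforce a length limit and the presence of required phrases."""
    if len(output) > max_chars:
        return False
    lowered = output.lower()
    return all(phrase.lower() in lowered for phrase in required_phrases)

print(validate_output("Refunds are processed within 5 business days.", ["refund"]))   # True
```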
App Hosting is responsible for hosting the final application for user interaction. It handles the submission of queries by users, processes these queries through the LLM, and returns the outputs to users via the app hosting platform, facilitating seamless user experiences and interactions with the application.
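As a minimal sketch of the hosting layer, the Flask app below accepts a query over HTTP, runs it through a stubbed pipeline, and returns the answer; the route, payload shape, and port are assumptions for illustration, not Aurory AI's hosting setup.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/query", methods=["POST"])
def query_endpoint():
    """Accept a user query, run it through the (stubbed) LLM pipeline, return the answer."""
    user_query = request.get_json()["query"]
    answer = f"(stubbed) answer to: {user_query}"   # replace with the real orchestration call
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(port=8000)
```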
User Query: End users submit queries via the User Interface (UI).
Telemetry Service: Queries are logged and monitored for performance tracking.
Data Retrieval: Relevant data is retrieved from the vector database using the embedding model.
Response Generation: The LLM generates a response based on the optimized prompt.
Content Classification: Outputs are scanned for safety and appropriateness.
Response Delivery: The final output is delivered back to the user through the UI.
By following this structured approach, Aurory AI ensures the development of accurate, user-centric LLM applications. This architecture enables the creation of powerful tools that address specific problems, deliver strong performance, and improve the overall user experience.