
Architecting and Building Production-Grade GenAI Systems - Part 2: Architectural Matters (Continued)

If you haven't read Part 1, please click here to get started.

Architectural matters (continued)

4. Caching and Optimization

Depending on your use case, interactions with the LLM can be repetitive. For example, users may want to retrieve a recent conversation or a result the LLM has already produced, much like re-running the same query against a database. In these cases we can improve system performance by adding a caching layer. We would implement a "Cache-Aside" pattern: check the cache to see if the chat or the answer to a specific question is already stored, and only go back to the LLM if it isn't. A minimal sketch of this pattern is shown below.
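Here is a minimal sketch of what Cache-Aside could look like in Python, assuming Redis as the cache and a hypothetical call_llm() wrapper around the model endpoint; connection details, key prefix and TTL are placeholders to be tuned for your application.

```python
# A minimal Cache-Aside sketch for LLM responses (illustrative only).
import hashlib

import redis

cache = redis.Redis(host="localhost", port=6379)  # placeholder connection details
CACHE_TTL_SECONDS = 3600  # how long a cached answer stays valid; tune per use case


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around the LLM endpoint (e.g. Azure OpenAI)."""
    raise NotImplementedError


def get_completion(prompt: str) -> str:
    # 1. Build a deterministic cache key from the prompt.
    key = "llm:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    # 2. Cache-aside: check the cache first.
    cached = cache.get(key)
    if cached is not None:
        return cached.decode("utf-8")

    # 3. On a miss, call the LLM and populate the cache for next time.
    answer = call_llm(prompt)
    cache.set(key, answer, ex=CACHE_TTL_SECONDS)
    return answer
```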

Since LLM interactions are billed by token, a high enough cache hit rate means the token spend you avoid should offset the cost of running the cache. If, on the other hand, your interactions are usually very different or you have no need to retrieve recent chats, the cache might not be worth it; this depends entirely on the application you are building. Cache retrieval is also very likely to be faster than running inference on the LLM, so responsiveness improves as well.
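As a rough back-of-the-envelope check, you can estimate the cache hit rate at which the avoided token spend starts to exceed the cost of the cache. All prices and volumes below are made up purely for illustration; plug in your own numbers.

```python
# Hypothetical break-even check: avoided LLM token spend vs. monthly cache cost.
LLM_COST_PER_1K_TOKENS = 0.01      # illustrative blended price, USD
AVG_TOKENS_PER_INTERACTION = 1500  # illustrative prompt + completion size
MONTHLY_INTERACTIONS = 200_000     # illustrative traffic
CACHE_MONTHLY_COST = 250.0         # illustrative managed cache tier, USD


def monthly_savings(hit_rate: float) -> float:
    """LLM spend avoided per month at a given cache hit rate."""
    hits = MONTHLY_INTERACTIONS * hit_rate
    return hits * AVG_TOKENS_PER_INTERACTION / 1000 * LLM_COST_PER_1K_TOKENS


for hit_rate in (0.05, 0.10, 0.25):
    savings = monthly_savings(hit_rate)
    verdict = "worth it" if savings > CACHE_MONTHLY_COST else "not worth it"
    print(f"hit rate {hit_rate:.0%}: saves ${savings:,.0f}/month "
          f"({verdict} vs ${CACHE_MONTHLY_COST:,.0f} cache cost)")
```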

For our purposes, let’s assume the cache will be beneficial for both cost and performance, and update our architecture:

5. Continuous Integration and Delivery

We of course want our solution to follow software delivery best practices, so at a minimum we want source control, a development environment, a testing workflow, and a continuous integration and continuous delivery (CI/CD) pipeline for the solution.

Once again, we will augment the architecture. I'm zooming in on the area where we are adding a development environment in its own subscription, as well as GitHub as our source code repository and Azure DevOps Pipelines, which can trigger testing, QA, and deployment pipelines.
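To illustrate what the testing stage of such a pipeline might run, here is a small pytest-style unit test. The cache_aside helper is hypothetical and inlined only to keep the sketch self-contained; in a real repository it would be imported from the application package.

```python
# Illustrative unit test: verify the cache-aside logic only calls the LLM on a miss.
from unittest.mock import MagicMock


def cache_aside(prompt, cache, llm_call, ttl=3600):
    """Hypothetical cache-aside helper under test."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    answer = llm_call(prompt)
    cache.set(prompt, answer, ex=ttl)
    return answer


def test_llm_is_only_called_on_cache_miss():
    cache = MagicMock()
    cache.get.side_effect = [None, "cached answer"]   # miss first, hit second
    llm_call = MagicMock(return_value="fresh answer")

    assert cache_aside("hello", cache, llm_call) == "fresh answer"
    assert cache_aside("hello", cache, llm_call) == "cached answer"
    llm_call.assert_called_once()  # the second request never reached the LLM
```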

6. Baseline Security

Just like any other software solution, we want to apply a baseline level of security across authentication, authorization, and networking.

In this scenario, we will assume this is an enterprise solution and the organization does not want its service endpoints exposed to the public internet. To that end, we will wrap the services inside a Virtual Network, deploy Private Endpoints for each service that supports them, implement Azure Firewall for the Virtual Network, and establish a private DNS zone for the private endpoints.
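To give a flavour of what provisioning one of these private endpoints could look like programmatically, here is a hedged sketch using the azure-mgmt-network SDK. All resource names, IDs and the region are placeholders, the exact model classes and group ID ("Gateway" for API Management) may vary by SDK and API version, and in practice you would more likely do this through Bicep, Terraform or the portal.

```python
# Illustrative sketch: create a private endpoint for API Management inside the VNet.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import (
    PrivateEndpoint,
    PrivateLinkServiceConnection,
    Subnet,
)

SUBSCRIPTION_ID = "<subscription-id>"               # placeholder
RESOURCE_GROUP = "rg-genai-prod"                    # placeholder
APIM_RESOURCE_ID = "<api-management-resource-id>"   # placeholder
SUBNET_ID = "<vnet-subnet-resource-id>"             # placeholder

network_client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create (or update) a private endpoint that exposes API Management on the VNet subnet.
poller = network_client.private_endpoints.begin_create_or_update(
    RESOURCE_GROUP,
    "pe-apim",
    PrivateEndpoint(
        location="eastus",                           # placeholder region
        subnet=Subnet(id=SUBNET_ID),
        private_link_service_connections=[
            PrivateLinkServiceConnection(
                name="apim-connection",
                private_link_service_id=APIM_RESOURCE_ID,
                group_ids=["Gateway"],               # sub-resource name; varies per service
            )
        ],
    ),
)
private_endpoint = poller.result()
print(f"Provisioned private endpoint: {private_endpoint.name}")
```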

And of course, we want to enable strong authentication practices for both users and the applications themselves. We will use Azure Active Directory (now known as Microsoft Entra ID) and managed identities to avoid having to manage credentials. If we do need to store any credentials, it will likely only be for ADF, and in that case we would keep them in a Key Vault.
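Here is a minimal sketch of how the application could pull that ADF-related credential from Key Vault using its managed identity via DefaultAzureCredential; the vault URL and secret name are placeholders.

```python
# Minimal sketch: fetch a secret from Key Vault using a managed identity.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up the managed identity when running inside Azure,
# so no connection strings or passwords need to ship with the application.
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://kv-genai-prod.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

# Retrieve the (hypothetical) ADF service principal secret stored in the vault.
adf_secret = client.get_secret("adf-service-principal-secret")
print(f"Retrieved secret '{adf_secret.name}' (value not printed).")
```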

Here’s the update:

Notice that I’m only showing the private endpoints for ADF and API Management for simplicity (and because they are the ones that will communicate with on-premises through the Azure Firewall), but the intention is to put a private endpoint in front of every service, disable all public endpoints, and have the services talk to each other through the VNet.

Next up, we're covering ethical considerations and compliance.
