WHY THIS MATTERS IN BRIEF
Third party AIs that act as go-betweens between your organisation and other AIs around the world could be a hacker’s dream come true.
A new concept is emerging in the Large Language Model (LLM) Ops workflow: Artificial Intelligence (AI) proxy middleware. And, as with everything that can be hacked and manipulated, it introduces yet another cyber threat into the AI models and environments that companies are becoming increasingly reliant on.
AI proxies are services that sit between an application and the model inference provider, such as OpenAI or Hugging Face. They consolidate important steps in the Generative AI developer workflow, including calling different models (LLaMA, GPT*, Mixtral) through a single API; monitoring usage, latency, and cost; caching and throttling inference requests; and managing API keys for inference providers.
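To make that concrete, the sketch below shows roughly what calling several models through one proxy might look like. It is illustrative only: it assumes a hypothetical OpenAI-compatible proxy at ai-proxy.internal that maps model names to the right provider and holds the real provider keys on the application's behalf.

```python
# Minimal sketch of calling different models through one AI proxy.
# The proxy endpoint and token below are hypothetical; the pattern assumes an
# OpenAI-compatible proxy that routes each model name to the right provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-proxy.internal/v1",  # the proxy, not the provider
    api_key="PROXY_TOKEN",                    # proxy credential, not an OpenAI key
)

for model in ("gpt-4o", "mixtral-8x7b", "llama-3-70b"):
    resp = client.chat.completions.create(
        model=model,  # the proxy maps this name to OpenAI, Mistral, Meta, etc.
        messages=[{"role": "user", "content": "Summarise our Q3 security report."}],
    )
    print(model, resp.choices[0].message.content[:80])
```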
The Future of AI and Cyber Security, a keynote by Matthew Griffin
AI proxy middleware sits between applications and model inference providers, yet this architecture may be the wrong solution to a set of valid problems. Instead, the capabilities these middlewares provide could be handled more gracefully by frameworks and protocols that avoid middleware altogether.
The problems AI proxies aim to solve are indeed significant: separating concerns, decoupling model-specific logic from application code, enabling applications to invoke different models with a consistent API surface, monitoring generative AI usage, latency, and cost, caching inference requests, and managing throttling and API keys for different inference providers. However, introducing a proxy middleware creates additional challenges.
A monolithic proxy folds in monitoring, observability, and caching, which are already well-established concerns in the software development workflow, each with dedicated systems. The extra service layer is itself a security risk, because encrypted requests and user-specific data must pass through it. The proxy also adds a second network hop to the LLM provider, potentially degrading performance and making debugging harder, and it typically does not support local models, which are becoming increasingly important as models get smaller and more efficient. Moreover, many proxy middlewares are closed, managed services from third-party providers, creating a critical external dependency without a failover strategy.
An open source AI framework and storage format can replace the AI proxy layer, providing a uniform API while connecting to relevant services to handle monitoring, caching, and key management separately. AIConfig, a config-driven framework, manages prompts, models, and inference settings as JSON-serializable configs. These configs can be version controlled, evaluated, monitored, and edited in a notebook-like playground, integrating directly into the developer workflow.
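As a rough idea of what such a JSON-serializable config might contain, the sketch below builds and saves one from Python. The field names and values are illustrative, not a definitive copy of the AIConfig schema.

```python
# Illustrative only: a simplified, hypothetical shape for a prompt/model config,
# saved as JSON so it can be version controlled alongside the application code.
import json

config = {
    "name": "support_assistant",
    "metadata": {
        "models": {
            "gpt-4o": {"temperature": 0.2, "max_tokens": 512},
            "mixtral-8x7b": {"temperature": 0.2},
        },
        "default_model": "gpt-4o",
    },
    "prompts": [
        {
            "name": "summarise_ticket",
            "input": "Summarise this support ticket: {{ticket_text}}",
            "metadata": {"model": "gpt-4o"},
        }
    ],
}

with open("support_assistant.aiconfig.json", "w") as f:
    json.dump(config, f, indent=2)
```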
The premise behind AIConfig is that prompts, models, and inference settings should be saved as config, not code, and that a common storage format, model-agnostic and multi-modal, allows for straightforward switching between different models. Breaking the monolithic service down into its constituent parts allows existing service providers to be used for inference, monitoring, caching, and key management (KMS). AIConfig stores and iterates on prompts separately from application code, providing a uniform API surface across any model and modality.
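A minimal usage sketch, assuming the aiconfig Python SDK's AIConfigRuntime.load, run, and get_output_text entry points, and carrying over the illustrative file and prompt names from the config sketch above:

```python
# Minimal sketch: load a saved config and run a named prompt through a uniform API,
# regardless of which model the prompt is bound to. Assumes the aiconfig Python SDK.
import asyncio
from aiconfig import AIConfigRuntime

async def main():
    config = AIConfigRuntime.load("support_assistant.aiconfig.json")
    # Resolve the template parameters and run the prompt; swapping the underlying
    # model is a config change, not an application code change.
    await config.run("summarise_ticket", params={"ticket_text": "VPN drops every hour."})
    print(config.get_output_text("summarise_ticket"))

asyncio.run(main())
```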
The framework offers callback handlers for usage tracking, integrating monitoring for generative AI into existing application monitoring services. Solutions like GPTCache for semantic caching can be integrated straightforwardly with a framework instead of a proxy. Existing KMS services can manage inference endpoint keys, addressing API key management.
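As a sketch of what such a callback handler might look like, assuming aiconfig exposes CallbackManager and CallbackEvent as described in its documentation (treat the exact import path and attributes as assumptions), with a standard logger standing in for whatever monitoring service the application already uses:

```python
# Hypothetical wiring of a usage-tracking callback into the framework,
# assuming aiconfig's CallbackManager / CallbackEvent interface.
import logging
from aiconfig import AIConfigRuntime, CallbackManager, CallbackEvent

logger = logging.getLogger("genai.usage")

async def track_usage(event: CallbackEvent) -> None:
    # Forward the event to the existing monitoring stack (Datadog, Prometheus, etc.)
    # instead of relying on a proxy's built-in dashboard.
    logger.info("aiconfig event: %s", event)

config = AIConfigRuntime.load("support_assistant.aiconfig.json")
config.callback_manager = CallbackManager([track_usage])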
In addition to these capabilities, frameworks enable critical generative AI workflow elements for building production applications. Evaluation is supported through dedicated config artifacts, defining evals and triggering eval runs as part of CI/CD whenever the config changes. A framework allows for collapsing experimentation and productionization into a single workflow, enabling local experimentation in a notebook-like playground for visual editing and rapid prototyping. Governance and version control ensure reproducibility and provenance of the generative AI components of an application, making AIConfig a comprehensive solution for modern AI development.
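One way the eval step could look in practice is a small script that CI runs whenever the config file changes. The prompt name and assertions below are purely illustrative; real evals would live in their own config artifact and cover many more cases.

```python
# Illustrative eval harness a CI job might run when the .aiconfig.json changes.
# Prompt names and checks are hypothetical, not the author's actual eval setup.
import asyncio
from aiconfig import AIConfigRuntime

async def run_evals() -> None:
    config = AIConfigRuntime.load("support_assistant.aiconfig.json")
    await config.run("summarise_ticket", params={"ticket_text": "VPN drops every hour."})
    output = config.get_output_text("summarise_ticket")
    # Simple smoke checks; fail the build if behaviour regresses.
    assert output, "expected a non-empty summary"
    assert "VPN" in output, "summary should mention the reported issue"

if __name__ == "__main__":
    asyncio.run(run_evals())
```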