Engineering · 12 min read

Bring Your Own Anthropic Key: Why We Don't Mark Up LLM Costs

Pentestas Team

Security Analyst

5/12/2026
TL;DR · Key insight

Explore how Pentestas integrates the 'Bring Your Own Key' model for Anthropic, ensuring cost transparency and robust security in AI-driven pentesting. Delve into our engineering choices and the implications for user experience and cost management.

Introduction to the 'Bring Your Own Key' Model

The 'Bring Your Own Key' (BYOK) model represents a paradigm shift in how AI services, particularly those involving sensitive data, are deployed. At Pentestas, we recognize the importance of maintaining client control over cryptographic keys, which is why we have implemented the BYOK approach. This model allows our clients to manage their keys, ensuring that they have complete ownership and oversight of their data's encryption. This is particularly relevant for AI-driven services that need to process confidential information without compromising on security.

Anthropic's language models play a central role in AI-driven pentesting, where they help identify potential vulnerabilities. That capability has to be paired with strict security controls. With the BYOK model, Pentestas clients use Anthropic's capabilities while keeping the keys within their own control, so sensitive data accessed during a pentest stays secure and private, in line with each client's security policies and compliance requirements.

Our decision to adopt the BYOK model for Anthropic was driven by the need to enhance security and trust in our services. Initially, we encountered challenges related to key management and integration. To address these, we developed a robust framework that facilitates seamless integration of client-provided keys into our AI services. This framework ensures that keys are used effectively without compromising the performance or functionality of the AI models. By doing so, we bridged the gap between enhanced security and the operational efficiency our clients demand.

A comparison with traditional models, where service providers manage encryption keys, highlights the advantages of the BYOK approach. In traditional setups, clients often relinquish control of their keys, which can be a significant security concern. The BYOK model, however, empowers clients to maintain control over their encryption keys, reducing the risk of unauthorized access. This model not only enhances security but also builds greater trust between Pentestas and our clients, as they are assured of the integrity and confidentiality of their data.

Engineering the Integration of Anthropic's API

Integrating Anthropic's API into our platform begins with securely establishing a connection. We utilize HTTPS to ensure that data transmitted between our systems and the API is encrypted. This is crucial for maintaining the confidentiality and integrity of the sensitive information exchanged. Our backend processes are designed to handle API endpoints efficiently, ensuring that each request is processed swiftly and accurately. To manage connections, we use a connection pool that reuses existing connections, minimizing latency and resource usage.
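As a rough illustration of the connection setup described above, the sketch below builds the URL and headers for a call made with a client-supplied key and refuses anything that is not HTTPS. The function name `build_request` and the parameter names are hypothetical and not part of the Pentestas codebase; the header names (`x-api-key`, `anthropic-version`) follow Anthropic's public API documentation.

```python
from urllib.parse import urlparse

ANTHROPIC_BASE_URL = "https://api.anthropic.com"


def build_request(client_api_key, path="/v1/messages", base_url=ANTHROPIC_BASE_URL):
    """Return (url, headers) for a call made with the client's own key.

    Hypothetical helper for illustration only.
    """
    # Refuse plaintext transport so the key never travels unencrypted.
    if urlparse(base_url).scheme != "https":
        raise ValueError("Anthropic API calls must use HTTPS")
    if not client_api_key:
        raise ValueError("A client-supplied API key is required")
    headers = {
        "x-api-key": client_api_key,        # the client's key, never a shared one
        "anthropic-version": "2023-06-01",  # API version header
        "content-type": "application/json",
    }
    return base_url + path, headers
```

The returned pair can be handed to whichever HTTP client and connection pool the backend already uses.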

Authentication is a cornerstone of our integration strategy. We manage API keys securely within our platform by encrypting them at rest using AES-256 encryption. The keys are stored in a secure vault and accessed only when necessary. Here is a snippet of how we handle key retrieval and decryption:

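import os

from cryptography.fernet import Fernet

def get_api_key():
    # retrieve_from_vault is our internal vault helper (not shown here).
    encrypted_key = retrieve_from_vault("anthropic_api_key")
    # Note: Fernet itself uses AES-128-CBC with HMAC-SHA256; an AES-256 setup
    # would use a primitive such as AESGCM with a 32-byte key instead.
    # os.environ[...] fails loudly if the vault key is not configured.
    cipher = Fernet(os.environ['VAULT_ENCRYPTION_KEY'])
    return cipher.decrypt(encrypted_key).decode('utf-8')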

API keys are never exposed outside the outbound request: our logging system automatically redacts any API key before a log line is written. We also monitor API usage and handle rate limits with exponential backoff, which keeps us from repeatedly hitting the limits set by Anthropic. For example, if we receive a 429 Too Many Requests response, the system waits progressively longer before each retry (1, 2, 4, then 8 seconds, and so on), which reduces the chance of consecutive rejections.
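A minimal sketch of log redaction using Python's standard `logging` module is shown below. It is not our production configuration: the class name `RedactKeysFilter`, the logger name, and the `sk-ant-` key pattern are assumptions made for illustration.

```python
import io
import logging
import re

# Assumed key shape for illustration: an "sk-ant-" prefix followed by
# URL-safe characters. Adjust the pattern to the keys you actually handle.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]+")


class RedactKeysFilter(logging.Filter):
    """Replace anything that looks like an API key before it is written."""

    def filter(self, record):
        # Render the message first so keys passed as format args are caught too.
        record.msg = KEY_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the record, just sanitized


def build_logger(stream):
    """Return a logger whose only handler redacts keys (hypothetical helper)."""
    logger = logging.getLogger("byok-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactKeysFilter())
    logger.handlers = [handler]
    return logger
```

Attaching the filter to the handler rather than to a single logger means every record that reaches that output is sanitized. Tracebacks attached through `exc_info` are formatted separately and would need their own handling.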

Testing and validation are critical components of our integration process. We employ a suite of automated tests to validate API interactions, including unit tests for individual functions and integration tests for end-to-end scenarios. These tests run in a staging environment that closely mirrors production, ensuring that when updates occur, they do not disrupt existing functionality. Our continuous integration pipeline triggers these tests, providing us with immediate feedback on the health of the integration.

Cost Transparency: No Markup on LLM Costs

At Pentestas, we believe in complete transparency, particularly when it comes to the costs associated with using Large Language Models (LLMs). Our philosophy is straightforward: users should pay exactly what the LLM services cost, without any markup. This approach is rooted in our commitment to ethical pricing and ensuring that our users have access to the best tools without financial barriers. We want our users to feel confident that every dollar spent is directly supporting the computational resources and capabilities they utilize, rather than unnecessary overhead.

To keep billing clear, we calculate and present costs from real-time usage metrics. Every time an LLM is accessed, the associated usage is tracked per session, down to the millisecond. Our system generates detailed invoices that break down each session's duration and cost. For example, a typical invoice entry may look like this:

Session ID: 12345
Start Time: 2023-10-15T08:00:00Z
End Time: 2023-10-15T08:45:00Z
Total Usage: 0.75 hours
Cost: $10.50
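The arithmetic behind an entry like the one above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the $14.00 hourly rate is simply the figure implied by the sample entry ($10.50 for 0.75 hours), not a published price. A zero `markup` models the pass-through pricing described in this section.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


def session_hours(start_iso, end_iso):
    """Duration between two ISO 8601 UTC timestamps, in hours."""
    # Replace the trailing "Z" so fromisoformat works on older Python versions.
    start = datetime.fromisoformat(start_iso.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_iso.replace("Z", "+00:00"))
    seconds = Decimal(int((end - start).total_seconds()))
    return seconds / Decimal(3600)


def pass_through_cost(hours, provider_rate_per_hour, markup=Decimal("0")):
    """Cost billed to the user; markup=0 means the provider's price is passed on unchanged."""
    cost = hours * provider_rate_per_hour * (Decimal(1) + markup)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


hours = session_hours("2023-10-15T08:00:00Z", "2023-10-15T08:45:00Z")
print(hours, pass_through_cost(hours, Decimal("14.00")))  # 0.75 10.50
```

`Decimal` is used instead of floats so that invoice totals round predictably to the cent.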

Transparency in billing is further emphasized by our open feedback channels. User feedback has been instrumental in shaping our pricing strategy. We actively encourage users to share their experiences and suggestions, which we analyze to refine our billing processes and cost structures. This iterative process ensures that our pricing remains fair and competitive, aligning with user expectations.

In comparison to some of our competitors, many of whom apply undisclosed fees or markups, we stand out by offering a no-nonsense billing approach. While others often bundle LLM costs with additional service fees, potentially obscuring the true cost of computation, Pentestas remains committed to transparency. Our approach not only builds trust but also allows our users to effectively manage their budgets, making informed decisions about their AI investments.

Security Considerations with BYOK

The Bring Your Own Key (BYOK) feature empowers users by giving them control over their own encryption keys. This autonomy means that sensitive data remains secure even if our systems are compromised, as we ourselves do not have access to these keys. This ensures that data breaches at our level do not expose user data, making it a compelling choice for users who prioritize security. A practical implementation of BYOK involves configuring the cloud environment to accept user-managed keys stored externally, for example, in AWS Key Management Service (KMS) or Azure Key Vault.

That control carries responsibility. Users must manage their keys diligently: losing a key can mean permanent data loss, and unauthorized access to a key can expose sensitive information. Mitigation strategies include rotating keys regularly, using multi-factor authentication, and maintaining a robust key backup policy. We provide guidelines for setting up secure key management practices, which include using strong, unique passwords and avoiding hard-coded keys in source code. Consider the following example of decrypting data with a key held in AWS KMS:

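import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def decrypt_data(ciphertext_blob):
    # Credentials and region come from the standard boto3 configuration chain.
    kms_client = boto3.client('kms')
    try:
        # For symmetric KMS keys, the key ID is read from the ciphertext metadata.
        response = kms_client.decrypt(CiphertextBlob=ciphertext_blob)
        return response['Plaintext']  # raw bytes
    except ClientError as e:
        # Log instead of print so failures reach the application's log pipeline.
        logger.error("Error decrypting data: %s", e)
        return None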

BYOK also enhances compliance with data protection regulations such as GDPR and CCPA, which emphasize user control over personal data. By allowing users to manage their own encryption keys, we align with regulatory requirements for data sovereignty and privacy. Furthermore, we educate users on best practices for key management. This includes offering webinars and resources that highlight the importance of securing keys and demonstrate practical steps for maintaining their security. Enhanced user knowledge plays a crucial role in preventing potential security incidents.

Case studies have shown that BYOK can avert security incidents. For instance, in a scenario where a user's account was compromised, the attacker could not access encrypted data because the encryption keys were stored externally and were not compromised. Such incidents underscore the importance of BYOK in an effective security strategy. We continually refine our platform to support these protective measures, ensuring our users have the tools they need to safeguard their data effectively.

Handling Plan Rate Limits

Understanding Anthropic's rate limits is crucial for developers working with their API. These limits are tied to the usage tier of the account that owns the key and are expressed as requests per minute and tokens per minute; the exact values depend on the plan. Exceeding a limit results in HTTP 429 errors, which interrupt workflows and degrade performance. Knowing these constraints is therefore the first step in designing efficient API interactions.

To manage these rate limits effectively, we employ strategies such as request queuing and exponential backoff. By queuing requests, we can release them steadily, preventing sudden spikes that could breach limits. Here's a simple example of how we might implement an exponential backoff in Python:

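import time
import requests

# Placeholder path for illustration; a real call also needs auth headers.
url = "https://api.anthropic.com/v1/resource"
max_retries = 5

for retry_attempts in range(max_retries):
    response = requests.get(url, timeout=30)
    if response.status_code != 429:
        break
    # Rate limited: wait 1, 2, 4, 8, then 16 seconds before the next attempt.
    wait_time = 2 ** retry_attempts
    time.sleep(wait_time)
else:
    # Every attempt returned 429, so surface the failure instead of hiding it.
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")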
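The retry loop above can be wrapped into a reusable helper. The sketch below is one possible shape, not our production code: `call_with_backoff` and its parameters are hypothetical names, and it adds random jitter so that many workers that hit a 429 at the same time do not all retry at the same moment. The `send` callable is assumed to return a `(status_code, body)` pair.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it stops returning HTTP 429 or retries run out.

    sleep and rng are injectable so the helper can be tested without waiting.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # no point sleeping after the final attempt
        # Exponential ceiling (1, 2, 4, ... seconds), capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # Full jitter: sleep a random fraction of the ceiling.
        sleep(ceiling * rng())
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```

Passing `sleep` and `rng` as parameters is a design choice for testability: a unit test can record the requested delays instead of actually pausing.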

Beyond handling rate limits in code, we also focus on dynamically allocating requests across our workloads. This involves prioritizing critical requests and deferring non-essential ones during peak times. Such allocation helps us maintain a balance between efficiency and compliance with rate limits. Users are kept informed through notifications and alerts when they approach these thresholds. Our platform’s dashboard offers real-time insights into usage patterns, allowing users to adjust their behavior proactively.
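One way to picture this kind of prioritization is a small priority queue with a per-window request budget. The sketch below is a simplified, single-process illustration under assumed names (`RequestScheduler`, the three priority levels, the 80% alert ratio); it is not a description of our actual scheduler.

```python
import heapq
import itertools

# Lower number means higher priority (hypothetical levels).
CRITICAL, NORMAL, DEFERRABLE = 0, 1, 2


class RequestScheduler:
    """Release queued requests within a per-window budget, critical ones first."""

    def __init__(self, budget_per_window, alert_ratio=0.8):
        self.budget = budget_per_window
        self.alert_ratio = alert_ratio
        self.used = 0
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def near_limit(self):
        """True once usage crosses the alert threshold for this window."""
        return self.used >= self.budget * self.alert_ratio

    def release(self):
        """Return the requests allowed to run now, highest priority first."""
        released = []
        while self._heap and self.used < self.budget:
            priority, _, name = self._heap[0]
            # Past the alert threshold, hold non-essential work for the next window.
            if priority == DEFERRABLE and self.near_limit():
                break
            heapq.heappop(self._heap)
            self.used += 1
            released.append(name)
        return released

    def start_new_window(self):
        self.used = 0
```

The same `near_limit()` check that defers low-priority work could also drive the user-facing notification when usage approaches the threshold.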

Future Enhancements

Looking ahead, Pentestas plans to integrate more sophisticated algorithms for predictive rate limit handling. By analyzing historical data, we aim to anticipate usage spikes and adjust allocations dynamically, minimizing disruptions even further.

User Experience: Simplifying Complex Operations

In designing the user interface for key management, we focused on a few core principles: clarity, accessibility, and efficiency. Our goal was to make key management intuitive, even for users who are new to the concept. This meant employing a clean design that naturally guides the user through the process, without overwhelming them with technical details. We implemented visual cues and step-by-step instructions to ensure that users can achieve their goals swiftly and without confusion.

Continuous Improvement Through Feedback

We actively sought feedback from our users to refine our UI/UX. By listening to their experiences and challenges, we were able to make iterative improvements that significantly enhanced usability, resulting in a more satisfying user experience.

Balancing complexity and simplicity in user interactions is a delicate art. We aimed to provide users with the tools they need without overwhelming them with unnecessary options. For instance, while the interface offers advanced features for power users, it remains straightforward enough for beginners to navigate effortlessly. This balance was achieved through thoughtful design and robust user testing, ensuring that every feature serves a clear purpose.

To further assist our users, we developed several tools and features designed to enhance the overall user experience. One such tool is the keyValidator, which automatically checks the validity of user-provided keys, offering real-time feedback to prevent errors. This tool not only saves time but also reduces frustration by catching mistakes early in the process.
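A format check of the kind keyValidator performs might look like the sketch below. The key pattern (an `sk-ant-` prefix followed by at least 20 URL-safe characters) and the names `validate_key_format` and `ValidationResult` are assumptions made for illustration. A shape check like this only catches paste errors early; confirming that a key is actually live requires a real authenticated request.

```python
import re
from dataclasses import dataclass

# Assumed key shape for illustration only.
KEY_SHAPE = re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}")


@dataclass
class ValidationResult:
    ok: bool
    message: str


def validate_key_format(raw):
    """Give immediate feedback on a pasted key without calling the API."""
    if not raw:
        return ValidationResult(False, "Key is empty.")
    if raw != raw.strip():
        return ValidationResult(False, "Key has leading or trailing whitespace.")
    if not KEY_SHAPE.fullmatch(raw):
        return ValidationResult(False, "Key does not match the expected format.")
    return ValidationResult(True, "Key format looks valid.")
```

Returning a message alongside the boolean lets the interface show the user exactly what to fix.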

Scalability and Performance Considerations

Scaling the Bring Your Own Key (BYOK) model with Anthropic integration presents unique challenges. One of the primary concerns is efficiently managing increased loads without compromising system performance. As our user base grows, the need to handle larger datasets and more complex queries becomes paramount. At Pentestas, we have designed our infrastructure to accommodate these demands, but it requires constant evaluation and adaptation. We leverage load balancing and autoscaling groups to dynamically allocate resources as needed, ensuring that spikes in usage do not degrade performance.

Before implementing the BYOK model, we conducted extensive performance benchmarking to understand its impact. By comparing system parameters such as CPU usage and response time before and after integration, we identified key areas for optimization. For instance, our tests revealed that optimizing database queries reduced latency by up to 30%. This was achieved through indexing strategies and query refactoring, which are critical for maintaining fast response times under load.

-- Example query: users who logged in during the last 30 days
SELECT * FROM users WHERE last_login > NOW() - INTERVAL '30 days';
-- An index on last_login lets the range filter avoid a full table scan
CREATE INDEX idx_last_login ON users(last_login);

Our infrastructure enhancements include deploying a distributed cache layer to minimize database load. We also utilize a microservices architecture that allows for independent scaling of components based on demand. This modular approach enables us to isolate and address performance bottlenecks without affecting the entire system. Monitoring tools like Prometheus and Grafana are instrumental in providing real-time insights into system performance, helping us proactively troubleshoot issues.

Case Study: Seamless Scaling for Enterprise Client

One of our enterprise clients experienced a 50% increase in traffic overnight. Our scalable architecture allowed us to accommodate this surge seamlessly, with zero downtime. This success showcases the robustness of our BYOK implementation and infrastructure strategies.

Limitations and Future Directions

While our Bring Your Own Key (BYOK) model provides users with the flexibility to manage their own API keys, it currently presents some limitations. For example, the model does not yet fully support all Anthropic endpoints, which can restrict the range of functionalities available to users. Additionally, users have reported occasional latency issues when integrating their keys, particularly during peak usage times. These limitations indicate areas where we need to improve to enhance user experience and broaden the model's applicability.

To address these issues, we are actively working on several potential improvements. One area of focus is optimizing the integration process to make it more seamless and efficient. We're also looking to expand support for additional Anthropic endpoints, which will require considerable backend engineering to maintain performance standards. Furthermore, our team is exploring ways to better scale the platform's infrastructure to handle increased load, ensuring reliability and speed for all users.

User Feedback

Our users have expressed a desire for more detailed analytics on API key usage, enabling them to track consumption patterns and optimize costs. This feedback is instrumental in guiding our development roadmap as we strive to offer more comprehensive tools for user empowerment.

Looking forward, we are excited about the potential for new partnerships and integrations. Collaborating with other platforms could introduce valuable synergies, enhancing the functionality and appeal of our BYOK model. We also envision a future where cost transparency is elevated, allowing users to have a clear understanding of how their key usage translates into expenses. This aligns with our long-term vision to empower users by providing them with the tools and information needed to make informed decisions.

Alexander Sverdlov

Founder of Pentestas. Author of 2 information security books, cybersecurity speaker at the largest cybersecurity conferences in Asia and a United Nations conference panelist. Former Microsoft security consulting team member, external cybersecurity consultant at the Emirates Nuclear Energy Corporation.