OpenAI-compatible providers

Configure OpenAI-compatible LLM providers such as Mistral, DeepSeek, or any other provider that implements the OpenAI API format in agentgateway.

Overview

Many LLM providers offer APIs that are compatible with OpenAI’s API format. You can configure these providers in agentgateway by using the openai provider type with custom host, port, path, and authHeader overrides.
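
All of the examples on this page follow the same general shape. The following template summarizes that shape with illustrative placeholders; see the per-provider examples for concrete values.

    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: <backend-name>
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: <model-name>
          host: <provider-hostname>       # such as api.example.com
          port: 443                       # set together with host
          path: "/v1/chat/completions"    # optional; this is the default
      policies:
        auth:
          secretRef:
            name: <api-key-secret>        # Kubernetes secret with the provider API key
        tls:
          sni: <provider-hostname>        # must match the host value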

Note that when you specify a custom host override, agentgateway requires explicit TLS configuration for HTTPS endpoints, such as the policies.tls setting that the examples on this page use, or a BackendTLSPolicy. This differs from well-known providers (like OpenAI), where TLS is automatically enabled when you use the default hosts.
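
If you manage TLS with the Gateway API instead of the inline policies.tls setting, a BackendTLSPolicy might look like the following sketch. Whether the policy can target an AgentgatewayBackend directly is an assumption here; check the API reference for the supported target kinds.

    apiVersion: gateway.networking.k8s.io/v1alpha3
    kind: BackendTLSPolicy
    metadata:
      name: provider-tls
      namespace: agentgateway-system
    spec:
      targetRefs:
        # Assumption: the policy targets the AgentgatewayBackend by name.
        - group: agentgateway.dev
          kind: AgentgatewayBackend
          name: <backend-name>
      validation:
        # Validate the provider certificate against its hostname by using
        # the system CA bundle.
        hostname: <provider-hostname>
        wellKnownCACertificates: System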

Before you begin

Install and set up an agentgateway proxy.
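
The verification steps in each example send requests to $INGRESS_GW_ADDRESS, or to localhost:8080 for a port-forwarded gateway. The following sketch sets these up, assuming that your proxy Service is named agentgateway-proxy; adjust the Service name and port to your setup.

    # Look up the load balancer address of the gateway proxy.
    # The Service name agentgateway-proxy is an assumption.
    export INGRESS_GW_ADDRESS=$(kubectl get svc agentgateway-proxy -n agentgateway-system \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo $INGRESS_GW_ADDRESS

    # Alternative for local testing: forward local port 8080 to the proxy.
    # The target port (80) is an assumption; adjust to your gateway listener.
    kubectl port-forward svc/agentgateway-proxy -n agentgateway-system 8080:80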

Set up access to an OpenAI-compatible provider

Review the following examples for common OpenAI-compatible provider endpoints:

Mistral AI example

Set up OpenAI-compatible provider access to Mistral AI models.

  1. Get a Mistral AI API key.

  2. Save the API key in an environment variable.

    export MISTRAL_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Mistral AI API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: mistral-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $MISTRAL_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider and reference the AI API key secret that you created earlier.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: mistral
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: mistral-medium-2505
          host: api.mistral.ai
          port: 443 
          path: "/v1/chat/completions"
      policies:
        auth:
          secretRef:
            name: mistral-secret
        tls: 
          sni: api.mistral.ai
    EOF

    Review the following table to understand this configuration. For more information, see the API reference.

    Setting                    Description
    ai.provider.openai         Define the OpenAI-compatible provider.
    ai.provider.openai.model   The model to use, such as mistral-medium-2505.
    ai.provider.host           Required: The hostname of the OpenAI-compatible provider, such as api.mistral.ai.
    ai.provider.port           Required: The port number (typically 443 for HTTPS). Both host and port must be set together.
    ai.provider.path           Optional: Override the API path. Defaults to /v1/chat/completions if not specified.
    policies.auth              Configure the authentication token for the provider API. The example refers to the secret that you previously created.
    policies.tls.sni           The hostname for which to validate the server certificate. Must match the host value.
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. The following example sets up a route on the /mistral path to the AgentgatewayBackend that you previously created. The URLRewrite filter rewrites the hostname to api.mistral.ai, the host of the LLM provider. The request path does not need a rewrite, because the path setting in the AgentgatewayBackend already directs requests to /v1/chat/completions. You can optionally verify the route status afterwards, as shown after the resource.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: mistral
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /mistral
        filters:
          - type: URLRewrite
            urlRewrite:
              hostname: api.mistral.ai
        backendRefs:
        - name: mistral
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
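
    Optionally, confirm that the route was accepted before you send traffic. The Accepted and ResolvedRefs conditions are standard Gateway API route status fields.

    kubectl get httproute mistral -n agentgateway-system -o yaml

    In the output, check that the status.parents conditions report Accepted and ResolvedRefs as True.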
  6. Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the chat completion API.

    curl "$INGRESS_GW_ADDRESS/mistral" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Write a short haiku about artificial intelligence."
         }
       ]
     }' | jq
    curl "localhost:8080/mistral" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Write a short haiku about artificial intelligence."
         }
       ]
     }' | jq

    Example output:

    {
      "model": "mistral-medium-2505",
      "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 18,
        "total_tokens": 38
      },
      "choices": [
        {
          "message": {
            "content": "Silent circuits hum,\nLearning echoes through the void,\nWisdom without warmth.",
            "role": "assistant",
            "tool_calls": null
          },
          "index": 0,
          "finish_reason": "stop"
        }
      ],
      "id": "d05ef3973085435a8db8b51b580eeef8",
      "created": 1764614501,
      "object": "chat.completion"
    }

DeepSeek example

Set up OpenAI-compatible provider access to DeepSeek models.

  1. Get a DeepSeek API key.

  2. Save the API key in an environment variable.

    export DEEPSEEK_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your DeepSeek API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: deepseek-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $DEEPSEEK_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider and reference the AI API key secret that you created earlier.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: deepseek
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: deepseek-chat
          host: api.deepseek.com
          port: 443 
          path: "/v1/chat/completions"
      policies:
        auth:
          secretRef:
            name: deepseek-secret
        tls: 
          sni: api.deepseek.com
    EOF
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. Note that agentgateway automatically rewrites requests to the chat completion endpoint of the LLM provider for you, based on the host and path that you set in the AgentgatewayBackend resource.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: deepseek
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /deepseek
        backendRefs:
        - name: deepseek
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  6. Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the chat completion API.

    curl "$INGRESS_GW_ADDRESS/deepseek" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Write a short haiku about artificial intelligence."
         }
       ]
     }' | jq
    curl "localhost:8080/deepseek" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Write a short haiku about artificial intelligence."
         }
       ]
     }' | jq

    Example output:

    {
      "id": "chatcmpl-deepseek-12345",
      "object": "chat.completion",
      "created": 1727967462,
      "model": "deepseek-chat",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Neural networks learn,\nPatterns emerge from data streams,\nMind in silicon grows."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 17,
        "total_tokens": 37
      }
    }

Groq example

Set up OpenAI-compatible provider access to Groq for fast inference.

  1. Get a Groq API key.

  2. Save the API key in an environment variable.

    export GROQ_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Groq API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: groq-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $GROQ_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: groq
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: llama-3.3-70b-versatile
          host: api.groq.com
          port: 443
          path: "/openai/v1/chat/completions"
      policies:
        auth:
          secretRef:
            name: groq-secret
        tls:
          sni: api.groq.com
    EOF
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: groq
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /groq
        backendRefs:
        - name: groq
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  6. Send a request to verify the setup.

    curl "$INGRESS_GW_ADDRESS/groq" -H content-type:application/json  -d '{
       "model": "llama-3.3-70b-versatile",
       "messages": [
         {
           "role": "user",
           "content": "Explain quantum computing in one sentence."
         }
       ]
     }' | jq
    curl "localhost:8080/groq" -H content-type:application/json  -d '{
       "model": "llama-3.3-70b-versatile",
       "messages": [
         {
           "role": "user",
           "content": "Explain quantum computing in one sentence."
         }
       ]
     }' | jq

Together AI example

Set up OpenAI-compatible provider access to Together AI for open-source models.

  1. Get a Together AI API key.

  2. Save the API key in an environment variable.

    export TOGETHER_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Together AI API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: together-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $TOGETHER_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: together
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
          host: api.together.xyz
          port: 443
      policies:
        auth:
          secretRef:
            name: together-secret
        tls:
          sni: api.together.xyz
    EOF
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: together
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /together
        backendRefs:
        - name: together
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  6. Send a request to verify the setup.

    curl "$INGRESS_GW_ADDRESS/together" -H content-type:application/json  -d '{
       "model": "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
       "messages": [
         {
           "role": "user",
           "content": "What are the benefits of open-source AI models?"
         }
       ]
     }' | jq
    curl "localhost:8080/together" -H content-type:application/json  -d '{
       "model": "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
       "messages": [
         {
           "role": "user",
           "content": "What are the benefits of open-source AI models?"
         }
       ]
     }' | jq

Perplexity example

Set up OpenAI-compatible provider access to Perplexity for online search-augmented responses.

  1. Get a Perplexity API key.

  2. Save the API key in an environment variable.

    export PERPLEXITY_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Perplexity API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: perplexity-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $PERPLEXITY_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: perplexity
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: llama-3.1-sonar-large-128k-online
          host: api.perplexity.ai
          port: 443
      policies:
        auth:
          secretRef:
            name: perplexity-secret
        tls:
          sni: api.perplexity.ai
    EOF
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: perplexity
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /perplexity
        backendRefs:
        - name: perplexity
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  6. Send a request to verify the setup.

    curl "$INGRESS_GW_ADDRESS/perplexity" -H content-type:application/json  -d '{
       "model": "llama-3.1-sonar-large-128k-online",
       "messages": [
         {
           "role": "user",
           "content": "What are the latest developments in AI?"
         }
       ]
     }' | jq
    curl "localhost:8080/perplexity" -H content-type:application/json  -d '{
       "model": "llama-3.1-sonar-large-128k-online",
       "messages": [
         {
           "role": "user",
           "content": "What are the latest developments in AI?"
         }
       ]
     }' | jq

Fireworks AI example

Set up OpenAI-compatible provider access to Fireworks AI for fast inference on open-source models.

  1. Get a Fireworks AI API key.

  2. Save the API key in an environment variable.

    export FIREWORKS_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Fireworks AI API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: fireworks-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $FIREWORKS_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource to configure your LLM provider.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: fireworks
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: accounts/fireworks/models/llama-v3p1-70b-instruct
          host: api.fireworks.ai
          port: 443
          path: "/inference/v1/chat/completions"
      policies:
        auth:
          secretRef:
            name: fireworks-secret
        tls:
          sni: api.fireworks.ai
    EOF
  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: fireworks
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /fireworks
        backendRefs:
        - name: fireworks
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
  6. Send a request to verify the setup.

    curl "$INGRESS_GW_ADDRESS/fireworks" -H content-type:application/json  -d '{
       "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
       "messages": [
         {
           "role": "user",
           "content": "Explain the advantages of running LLMs on optimized inference engines."
         }
       ]
     }' | jq
    curl "localhost:8080/fireworks" -H content-type:application/json  -d '{
       "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
       "messages": [
         {
           "role": "user",
           "content": "Explain the advantages of running LLMs on optimized inference engines."
         }
       ]
     }' | jq
