CosmicAC Logo

Managed Inference Job

What a Managed Inference Job is, when to use one, and how you call it.

A Managed Inference Job runs an open-source language model inside a KubeVirt virtual machine instance (VMI) on your cluster and serves it behind an OpenAI-compatible API. You send requests the same way you would to any OpenAI-compatible endpoint.

When to use one

A Managed Inference Job fits when you want to call a language model over an API and let CosmicAC run it for you. You pick an open-source model, and CosmicAC serves it.

If you instead want direct control of a GPU to run your own code, a GPU Container Job is the better fit. It gives you a machine and a shell, and you set up the environment yourself.

What you get

  • An OpenAI-compatible API, so your existing clients and SDKs work without changes.
  • Open-source models, served on your own cluster with vLLM.

How you connect

You call the model in two ways. Send requests to the endpoint from any OpenAI-compatible client, or run inference directly with cosmicac-cli. Both authenticate with an API key.

For the steps to connect a client, see Connect to Managed Inference.

Next steps

On this page