Managed Inference Job

A Managed Inference Job runs an open-source language model inside a KubeVirt virtual machine instance (VMI) on your cluster and serves it behind an OpenAI-compatible API. You send requests the same way you would to any OpenAI-compatible endpoint.

When to use one

A Managed Inference Job fits when you want to call a language model over an API and let CosmicAC run it for you. You pick an open-source model, and CosmicAC serves it.

If you instead want direct control of a GPU to run your own code, a GPU Container Job is the better fit. It gives you a machine and a shell, and you set up the environment yourself.

What you get

An OpenAI-compatible API, so your existing clients and SDKs work without changes.
Open-source models, served on your own cluster with vLLM.

Managed Inference Job

When to use one

What you get

How you connect

Next steps

Get a Managed Inference API key

Connect to Managed Inference

GPU Container Job

On this page