API
One endpoint. OpenAI-compatible.
Stream completions from Llana 3.2 with the OpenAI client you already have. Bring your own SDK; we'll play along.
§ I — Quickstart
```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at the Kapllan base URL.
const client = new OpenAI({
  baseURL: "https://api.kapllan.ai/v1",
  apiKey: process.env.KAPLLAN_API_KEY,
});

// Request a streamed chat completion.
const stream = await client.chat.completions.create({
  model: "llana-3.2",
  stream: true,
  messages: [{ role: "user", content: "Why does ice float?" }],
});

// Print each content delta as it arrives.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```
§ II — Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat. Streaming optional. |
| POST | /v1/completions | Legacy text completions. Use chat unless you have a reason. |
| POST | /v1/embeddings | Multimodal embeddings via Llana-VL. |
| GET | /v1/models | List available model IDs and context windows. |
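If you are not using an SDK, the table above maps directly onto plain HTTP requests. The sketch below only builds the request description and does not send anything. `buildRequest` and `RequestSpec` are hypothetical names, the origin is inferred from the quickstart's `baseURL`, and the `Authorization: Bearer` header is assumed because that is what the OpenAI client sends for its `apiKey`.

```typescript
// Assumption: origin taken from the quickstart baseURL ("https://api.kapllan.ai/v1").
const ORIGIN = "https://api.kapllan.ai";

// Paths exactly as listed in the endpoints table.
type Endpoint =
  | "/v1/chat/completions"
  | "/v1/completions"
  | "/v1/embeddings"
  | "/v1/models";

// Hypothetical shape: everything fetch() needs, plus the URL.
interface RequestSpec {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildRequest(endpoint: Endpoint, apiKey: string, payload?: unknown): RequestSpec {
  // /v1/models is the only GET endpoint in the table; the rest are POST.
  const method = endpoint === "/v1/models" ? "GET" : "POST";
  // Assumption: Bearer auth, as sent by the OpenAI SDK.
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  const spec: RequestSpec = { url: `${ORIGIN}${endpoint}`, method, headers };
  if (method === "POST") {
    headers["Content-Type"] = "application/json";
    spec.body = JSON.stringify(payload ?? {});
  }
  return spec;
}

// A non-streaming chat request; pass it to fetch(chat.url, chat) to send it.
const chat = buildRequest("/v1/chat/completions", "sk-example", {
  model: "llana-3.2",
  messages: [{ role: "user", content: "Why does ice float?" }],
});
```

Keeping request construction separate from sending makes the mapping easy to inspect and test without network access.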
§ III — Models
| Model | Context | Description |
|---|---|---|
| llana-3.2 | 128K | Flagship reasoning model. Default choice. |
| llana-3.2-fast | 32K | Distilled, lower latency, ~80% of full quality. |
| llana-coder | 128K | Code-specialized variant. SWE-bench tuned. |
| llana-vl | 32K | Vision-language. Charts, diagrams, scans. |
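One way to use the model list in application code is a small selector that checks a request against the context window before sending it. This is a sketch under stated assumptions: `pickModel`, `Task`, and `CATALOG` are hypothetical names, "K" is read conservatively as 1,000 tokens, the window is assumed to cover prompt plus output, and the fallback rule (fast model to flagship) is an illustration rather than documented behavior.

```typescript
// Hypothetical task labels, one per model in the list above.
type Task = "general" | "low-latency" | "code" | "vision";

interface ModelInfo {
  id: string;
  contextTokens: number; // Assumption: "128K" = 128000 tokens, "32K" = 32000.
}

const CATALOG: Record<Task, ModelInfo> = {
  general: { id: "llana-3.2", contextTokens: 128000 },
  "low-latency": { id: "llana-3.2-fast", contextTokens: 32000 },
  code: { id: "llana-coder", contextTokens: 128000 },
  vision: { id: "llana-vl", contextTokens: 32000 },
};

function pickModel(task: Task, promptTokens: number, maxOutputTokens: number): string {
  // Assumption: prompt and output share the same context window.
  const needed = promptTokens + maxOutputTokens;
  const preferred = CATALOG[task];
  if (needed <= preferred.contextTokens) {
    return preferred.id;
  }
  // Illustrative fallback: the distilled model can hand off to the flagship.
  if (task === "low-latency" && needed <= CATALOG.general.contextTokens) {
    return CATALOG.general.id;
  }
  throw new RangeError(`${needed} tokens exceed the ${preferred.id} context window`);
}
```

For example, `pickModel("low-latency", 10000, 1000)` stays on `llana-3.2-fast`, while a 50,000-token prompt for the same task falls back to `llana-3.2`.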
§ IV — Pricing
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| llana-3.2 | $3.00 | $15.00 |
| llana-3.2-fast | $0.50 | $2.00 |
| llana-coder | $3.00 | $15.00 |
| llana-vl | $2.00 | $8.00 |
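To turn the pricing table into a per-request figure, multiply each token count by its rate. The sketch below assumes the prices are US dollars per one million tokens and that input and output are billed separately with no other fees. `PRICES` and `estimateCostUSD` are hypothetical names, not part of the API.

```typescript
// Rates copied from the pricing table. Assumption: USD per 1,000,000 tokens.
const PRICES: Record<string, { input: number; output: number }> = {
  "llana-3.2": { input: 3.0, output: 15.0 },
  "llana-3.2-fast": { input: 0.5, output: 2.0 },
  "llana-coder": { input: 3.0, output: 15.0 },
  "llana-vl": { input: 2.0, output: 8.0 },
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (price === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  // Sum first, divide once, to keep floating-point error small.
  return (inputTokens * price.input + outputTokens * price.output) / 1000000;
}
```

Under these assumptions, a request with 200,000 input tokens and 50,000 output tokens costs about $1.35 on `llana-3.2` and about $0.20 on `llana-3.2-fast`.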