Inference
Deploy and serve ML models in production.
The fastest LLM inference, 1,800+ tokens/sec on wafer-scale chips
Record-fast LLM inference on wafer-scale chips.
Serverless inference for open models.
Flat-rate serverless access to thousands of Hugging Face LLMs via one API
Production inference platform for open-weights LLMs
Gitee's serverless platform for model inference, hosting and apps.
Ultra-fast LLM inference on custom LPU silicon
Open-access AI cloud: serverless inference plus on-demand GPU rentals
Low-cost inference API for open-weight models from a major GPU cloud
Run AI applications on an efficient cloud.
Affordable model APIs and GPU cloud.
One API for hundreds of LLMs.
AI deployment network: serverless, dedicated and batch inference on open models
Distributed cloud offering low-cost LLM inference APIs and GPU compute.
Run any open AI model via API — no infrastructure
High-speed inference on custom AI chips.
Very fast inference on custom RDU hardware with an OpenAI-compatible API
Fast, low-cost inference APIs for 200+ open-source LLMs and multimodal models.
High-performance inference for open-weights LLMs
ByteDance's model-as-a-service platform for Doubao and other LLMs.
Baseten is in talks to raise US$1 billion at an US$11 billion valuation as inference money keeps flowing
Baseten, which rents Nvidia servers to companies running AI models, is in talks to raise US$1 billion at an US...
AI
1 Jun
Lablup open-sources MLXcel, an Apple-Silicon inference engine, under Apache 2.0
Lablup has released MLXcel, an open-source engine for running AI models on Apple Silicon, under the permissive...
KE
1 Jun