ONNX Runtime
The best 41 ONNX Runtime AI tools - Free & Paid
Explore 41 AI tools for ONNX Runtime
gpt-oss playground provides open-weight demos of gpt-oss-120b and 20b for infrastructure testing, distributed and on-device inference, benchmarking, API integration, and reproducible research, with adjustable reasoning levels and visible-reasoning for diagnostics. Demo-only; validate outputs.
Freemium
Unsloth Studio is a no-code web UI enabling local training, running, and exporting of open AI models like Qwen3.5 and NVIDIA Nemotron 3, simplifying experimentation for users without extensive technical expertise.
Free
Foundry Local runs AI models on-device using ONNX Runtime (CPU/GPU/NPU) to keep data local, offering an OpenAI-compatible API, Python/JS/C#/Rust SDKs, a model hub, and CLI tools for edge and enterprise deployments.
Free
ComfyOnline lets users run ComfyUI workflows online, automatically installing dependencies and models. It auto‑generates APIs for image, video, audio, and text generation, supports advanced services, LLMs, custom nodes, and scales with traffic.
Subscription - $70/mo
Open Operator is a user-friendly AI tool that allows users to view, run, and browse AI models directly in their web browser. Powered by Stagehand and BrowserBase, it offers a seamless experience for exploring AI predictions effortlessly.
OpenRouter gives one API key to access 300+ models from 60+ providers, SDK‑compatible, with visual routing, automated fall‑back, edge hosting, data‑policy controls, and agentic tools for building efficient autonomous workflows.
Freemium
OpenCode.ai is an open-source AI coding agent that runs directly in your terminal, IDE, or desktop. It connects to 75+ LLM providers, supports offline use, and enables multi-session collaboration for code review and debugging.
Free
RunningHub is a cloud IDE for ComfyUI workflows, enabling in‑browser design, editing, and GPU‑accelerated execution. It offers pre‑installed nodes, access to major diffusion and video models, training tools, API integration, and real‑time collaboration.
Free
Nebius AI Studio offers efficient model deployment with hosted open-source models, ultra-low latency, and scalable processing options. It simplifies AI model exploration through an intuitive interface while ensuring verified quality and performance for diverse applications.
Free trial
GPUX is a serverless inference platform that delivers 1‑second cold starts and GPU‑accelerated execution for models like Stable Diffusion XL, ESRGAN, and Whisper. It supports P2P and read‑write volume access for rapid, scalable deployment on NVIDIA RTX 4090 GPUs.
Freemium
ONVY consolidates biometric, lab, wearable, and environmental data via a single API, delivering AI‑driven coaching, personalized nudges, and adaptive nutrition or training plans. It offers GDPR/HIPAA‑compliant security, enterprise dashboards, modular integration, and continuous learning for preventi
Subscription
Onyxium consolidates image, language, and speech AI models for developers, designers, and teams. Customizable parameters and usage logs support tailored output, workflow tracking, and seamless embedding into applications, boosting efficiency throughout the development cycle.
Freemium
OnDemand AI Agents is a decentralized OS that lets users build, deploy, and scale AI agents without a dev team. It offers a no‑code workflow builder, an agent marketplace, secure model integration, an AI playground for testing, and enterprise‑grade security.
Freemium
OpenHuman is an open-source personal AI framework for private, on‑premises deployments and local model execution, providing an agent framework, prompt management, local speech (Whisper/Piper), integrations, Docker/one‑click deployment, and developer tooling.
Free
local.ai runs language models locally without GPUs. Its Rust backend keeps the binary under 10 MB and performs CPU inference with GGML quantization. A single‑click interface streams responses to a UI, while a model manager tracks, verifies, and resumes downloads.
Freemium
xTuring is an open‑source framework that lets developers and researchers build, fine‑tune, and deploy LLMs efficiently. It supports LoRA adapters, INT8 quantization, custom datasets, offers CLI and notebooks, and provides a unified API for multiple backends.
Freemium
ZETIC deploys TorchScript, TensorFlow, and ONNX models to mobile and embedded devices, quantizing for CPU, GPU, or NPU to reach up to a 60× speedup and a 50% size reduction. It supplies benchmarks and a 3‑line offline code snippet for privacy‑preserving AI.
Free
OpenCraft AI is a secure, multi‑model copilot that unifies GPT‑4, Claude, and Gemini. It preserves context across model switches, keeps uploaded files accessible, auto‑formats chats into reports or decks, and generates images with consistent voice tone for streamlined workflows.
Paid
LLMWare AI installs a lightweight client on PCs, providing instant access to 100+ AI models optimized for Intel and Qualcomm hardware. It supports RAG, auto‑tunes weights, runs locally without Wi‑Fi, and offers an admin console for monitoring, scaling, and audit logs.
Freemium
Openfabric is a decentralized layer‑one blockchain that lets AI developers, data providers, and infrastructure partners build, train, and deploy algorithms on a permissionless network. Its marketplace offers ready‑made tools, and token holders can stake for governance and liquidity.
Freemium
Onvo AI generates tailored charts and dashboards from AI prompts, so users do not need to write intricate queries. It ensures secure sharing, supports multiple data source integrations, and provides SDKs for embedding into products.
Free trial
Onyx.app is a conversational AI platform that combines chat-based search, configurable agents, and action orchestration for teams. It integrates with enterprise systems to automate tasks and manage knowledge, with deployment options for regulated industries.
Free trial - $20/mo
OfoxAI is a centralized AI gateway that streamlines access and management of AI models and inference endpoints. It enables multi-model orchestration, intelligent request routing, and built-in API management with security, observability, and MLOps integration for scalable, reliable deployments.
Freemium
Nexa AI offers an on‑device platform that lets developers deploy vision, audio, and text models to NPUs, GPUs, and CPUs with one line of code. The SDK supports day‑zero deployment, multimodal inference, and optimizations for mobile, automotive, and IoT devices.
Free
openmed is an on-device clinical AI platform for PHI/PII detection, de-identification, and healthcare NER, offering 1,000+ model variants, multilingual support, curated biomedical datasets, configurable privacy controls, and air-gapped macOS/iOS/server runtimes.
Supertonic is a lightning-fast, on-device text-to-speech (TTS) system built for local inference using ONNX Runtime, supporting 31 languages and offering a compact, open-weight model for edge deployment.
Free
Open Notebook is a self-hosted, open-source notebook for private LLM workflows, supporting over 16 AI providers. It enables multi-modal content management, vector search, and contextual chat with full data sovereignty for research and development teams.
Freemium
OpenFang.sh is an open-source agent operating system that orchestrates autonomous AI agents and capability packages across macOS, Linux, and Windows. It provides a secure, sandboxed runtime with built-in tools for tasks like research, monitoring, and automation, all managed through a native desktop
Freemium
Llama.cpp is an open-source tool for efficient inference of large language models. It runs open-source LLMs locally on a wide range of hardware.
Free
Opper is a unified AI gateway and agent control plane that routes requests across 200+ models and modalities, offering centralized model routing, automated fallbacks, budget caps, LLM observability, a multi-provider testing playground, OpenAI-compatible SDK, and enterprise privacy/compliance control
Usage Based
canirun.ai is a searchable database mapping AI models to compatible hardware, listing CPUs/GPUs (including Apple M-series and NVIDIA cards), model requirements, VRAM/memory needs, filters and comparisons to plan local inference, fine-tuning, or deployment.
Free
OminiGate.ai is a unified AI API gateway that provides an OpenAI-compatible endpoint for text, image, and video models, enabling seamless switching between providers like OpenAI and Anthropic with minimal code changes. It features intelligent routing, automatic failover, cost optimization, and enter
Subscription
KoboldCpp is a versatile AI text-generation tool that supports various GGML and GGUF models with an intuitive UI, native image generation, and enhanced performance via CUDA and CLBlast acceleration.
Free
Odysseus is a privacy-first, self-hosted AI workspace for running and serving local LLMs, autonomous agents, and multi-turn chat, offering model management, hardware-aware serving, built-in tools, persistent memory, research workflows, and integrations.
Free
Infinity is an AI‑native database offering hybrid search across dense/sparse embeddings, tensors, and full‑text with optional RRF, weighted‑sum, or ColBERT reranking. It delivers 0.1 ms latency, 15 k qps, supports strings, numerics, and vectors for LLM developers, data scientists, and AI engineers.
Freemium
OC Maker AI generates anime-style characters from text or image inputs, offering inpainting/outpainting, pose control, animation and video-to-video conversion, 3D/Live2D export, and fine-grained style editing for illustration, animation, and game asset workflows.
Free - $8