What is Pricepertoken.com?
LLM Pricing MCP Server for Claude Code & Cursor provides real-time LLM pricing, price-per-token data, benchmark scores, latency, and endpoint availability inside Claude Code, Cursor, Windsurf, and other MCP-enabled assistants. It exposes tools to query all models, filter them by author, context length, time to first token (TTFT), speed, or capabilities, and retrieve detailed model metadata and provider-specific slugs.
Built-in comparison tools let users compare model and provider pricing side by side and rank models by coding, math, or general intelligence benchmarks. Integration instructions cover Claude Code, Cursor, Claude Desktop, and Windsurf; setup is a simple MCP configuration entry and requires no API key.
The service supports workflows for cost-aware model selection, performance-driven benchmarking, and provider compatibility checks. Target users include AI developers, ML engineers, platform operators, and researchers who need model cost comparisons, benchmark rankings, and provider mapping.
Pricepertoken.com pricing
Freemium. Verify on the official pricing page.
Pricepertoken.com's key features
- MCP server integration exposing real-time pricing, benchmark, latency, and endpoint-availability data to MCP-enabled assistants
- get_all_models: retrieve pricing for all models with filtering by author, context length, TTFT, speed, and capabilities
- get_model: fetch detailed model information with optional provider override for provider-specific pricing
- compare_models: side-by-side comparison of multiple models with optional provider selection
- get_benchmarks: rank and retrieve models by specific benchmarks (coding, math, intelligence, etc.)
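MCP clients invoke server tools by sending JSON-RPC 2.0 `tools/call` requests. As a rough sketch of what an assistant might send for `get_all_models`, the request could be built as follows. The helper `build_tool_call` and the argument names (`author`, `min_context_length`) are assumptions for illustration only; the server's actual input schema is not described here and should be checked against its tool listing.

```python
import json
from itertools import count

# JSON-RPC expects each request to carry its own id.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names; the real tool schema may differ.
request = build_tool_call(
    "get_all_models",
    {"author": "anthropic", "min_context_length": 100_000},
)
print(json.dumps(request, indent=2))
```

In practice the assistant (Claude Code, Cursor, and so on) builds and sends this message itself, so the sketch is mainly useful for understanding what happens when a tool is invoked.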
Pricepertoken.com use cases
- Optimize API spending by using the MCP server's real-time price-per-token data and benchmark rankings to automatically select the lowest-cost model that meets latency and accuracy SLAs across providers
- Create a deployment guardrail that monitors endpoint availability and latency in real time, rerouting traffic to compatible providers and models when outages or SLA breaches are detected
- Develop continuous A/B benchmarking pipelines that compare models' cost per inference, throughput, and latency, then filter and rank the options to surface the best model for batch or low-latency workloads and feed the results into CI/CD for automated model selection
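The cost-aware selection described in these use cases can be sketched as a filter followed by a minimum-cost pick. This is a minimal illustration, not the service's own logic: the `ModelQuote` record, its field names, and the sample figures are all made up, and real data would come from the MCP tools.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelQuote:
    """One model/provider record; all field names are hypothetical."""
    model: str
    provider: str
    input_usd_per_mtok: float   # USD per million input tokens
    output_usd_per_mtok: float  # USD per million output tokens
    ttft_ms: float              # time to first token, in milliseconds
    coding_score: float         # benchmark score, higher is better
    available: bool = True      # whether the endpoint is currently up


def estimated_cost(q: ModelQuote, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted per-token prices."""
    return (input_tokens * q.input_usd_per_mtok
            + output_tokens * q.output_usd_per_mtok) / 1_000_000


def pick_cheapest(quotes: Sequence[ModelQuote], input_tokens: int,
                  output_tokens: int, max_ttft_ms: float,
                  min_coding_score: float) -> Optional[ModelQuote]:
    """Return the cheapest available quote that meets both thresholds."""
    eligible = [q for q in quotes
                if q.available
                and q.ttft_ms <= max_ttft_ms
                and q.coding_score >= min_coding_score]
    if not eligible:
        return None  # nothing satisfies the constraints
    return min(eligible,
               key=lambda q: estimated_cost(q, input_tokens, output_tokens))


# Made-up sample data for illustration only.
quotes = [
    ModelQuote("model-a", "provider-x", 3.0, 15.0, 400, 70),
    ModelQuote("model-b", "provider-y", 0.5, 1.5, 900, 55),
    ModelQuote("model-c", "provider-z", 1.0, 4.0, 350, 65),
    ModelQuote("model-d", "provider-x", 0.2, 0.8, 300, 68, available=False),
]
choice = pick_cheapest(quotes, input_tokens=2000, output_tokens=500,
                       max_ttft_ms=500, min_coding_score=60)
print(choice.model if choice else "no eligible model")
```

Skipping unavailable endpoints before ranking by cost also gives a simple form of the rerouting guardrail: when the preferred provider is down, the next cheapest eligible quote is chosen.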
Who is it for?
- Machine learning developers
- MCP assistants
- Cost-conscious project managers
- Performance engineers
- Provider evaluators