What is Pricepertoken.com?

Pricepertoken.com offers the LLM Pricing MCP Server for Claude Code & Cursor, which provides real-time LLM pricing, price-per-token data, benchmark scores, latency, and endpoint availability inside Claude Code, Cursor, Windsurf, and other MCP-enabled assistants. Its tools can query all models, filter them by author, context length, time to first token (TTFT), speed, or capabilities, and retrieve detailed model metadata and provider-specific slugs.

Built-in comparison tools let users compare model and provider pricing side by side and rank models by coding, math, or general intelligence benchmarks. Integration instructions cover Claude Code, Cursor, Claude Desktop, and Windsurf. Setup needs only a simple MCP configuration, and no API key is required.
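As a rough illustration of what a "simple MCP configuration" can look like, the sketch below builds an `mcpServers`-style JSON entry of the kind several MCP clients read. The listing does not give the real server address or say whether the server is remote or local, so the URL, the server name `pricepertoken`, and the helper `build_mcp_config` are all hypothetical, and the exact keys vary by client.

```python
import json

# Hypothetical placeholder: the real server address is not given in this listing.
SERVER_URL = "https://example.invalid/mcp"


def build_mcp_config(name: str, url: str) -> dict:
    """Return an `mcpServers`-style entry for a remote MCP server.

    Assumption: the client accepts a `url` key for remote servers.
    No API key or auth header is included, matching the
    "no API key required" claim above.
    """
    return {"mcpServers": {name: {"url": url}}}


config = build_mcp_config("pricepertoken", SERVER_URL)
print(json.dumps(config, indent=2))
```

The printed JSON would go into the client's MCP settings file. Check the official integration instructions for the actual address and file location.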

The service supports workflows for cost-aware model selection, performance-driven benchmarking, and provider compatibility checks. Target users include AI developers, ML engineers, platform operators, and researchers who need model cost comparisons, benchmark rankings, and provider mapping.

Pricepertoken.com pricing

Pricing model: Freemium
Ranges: $0
10 models are available at no cost: $0
Liquidai/lfm2-24b-a2b preview: $0.00/m input tokens
Liquidai/lfm2-8b-a1b: $0.01/m input tokens
Liquidai/lfm2-2.6b: $0.01/m input tokens
Azure OpenAI: $75.00/m input tokens vs. $150.00/m output tokens

Verify on the official pricing page.
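To show how per-token rates turn into a request cost, here is a minimal sketch. It assumes "/m" means "per one million tokens", which the listing does not state. The function name `request_cost` and the token counts are illustrative only.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in dollars, with prices quoted per one million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Listed Azure OpenAI rates: $75.00/m input, $150.00/m output.
# Illustrative request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(10_000, 2_000, 75.00, 150.00)
print(f"${cost:.2f}")  # prints "$1.05"
```

Input tokens contribute $0.75 and output tokens $0.30 here, so output pricing can dominate once responses get long.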

Pricepertoken.com's key features

MCP server integration exposing real-time pricing, benchmark, latency, and endpoint-availability data to MCP-enabled assistants
get_all_models: retrieve pricing for all models with filtering by author, context length, TTFT, speed, and capabilities
get_model: fetch detailed model information with optional provider override for provider-specific pricing
compare_models: side-by-side comparison of multiple models with optional provider selection
get_benchmarks: rank and retrieve models by specific benchmarks (coding, math, intelligence, etc.)
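MCP clients invoke tools like the four above through JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what those messages might look like. The tool names come from the feature list, but every argument key (`author`, `model`, `models`, `benchmark`) is a guess made for illustration, not the server's documented schema.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Argument keys below are hypothetical; consult the server's tool
# schemas (returned by `tools/list`) for the real parameter names.
requests = [
    tool_call("get_all_models", {"author": "liquidai"}),
    tool_call("get_model", {"model": "liquidai/lfm2-8b-a1b"}),
    tool_call("compare_models",
              {"models": ["liquidai/lfm2-8b-a1b", "liquidai/lfm2-2.6b"]}),
    tool_call("get_benchmarks", {"benchmark": "coding"}),
]

for message in requests:
    print(message)
```

In practice the assistant builds these messages itself, so users normally just ask a question in natural language.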

Pricepertoken.com use cases

Optimize API spending by using the MCP server's real-time price-per-token data and benchmark rankings to automatically select the lowest-cost model that meets latency and accuracy SLAs across providers
Create a deployment guardrail that monitors endpoint availability and latency in real time and reroutes traffic to compatible providers and models when outages or SLA breaches are detected
Develop continuous A/B benchmarking pipelines that compare models' cost per inference, throughput, and latency, then filter and rank the options to surface the best model for batch versus low-latency workloads, feeding the results into CI/CD for automated model selection
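The selection logic behind these use cases can be sketched as a filter followed by a minimum-price pick. This is one possible approach under stated assumptions, not the service's own method. The `ModelQuote` fields, the thresholds, and the sample rows are made up for illustration; real values would come from the MCP tools.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelQuote:
    """Hypothetical record combining price, latency, score, and availability."""
    slug: str
    input_price_per_m: float  # dollars per million input tokens
    ttft_ms: float            # time to first token, in milliseconds
    benchmark_score: float
    available: bool = True


def cheapest_meeting_sla(quotes: Iterable[ModelQuote],
                         max_ttft_ms: float,
                         min_score: float) -> Optional[ModelQuote]:
    """Return the cheapest available model within the latency and score limits."""
    eligible = [
        q for q in quotes
        if q.available and q.ttft_ms <= max_ttft_ms and q.benchmark_score >= min_score
    ]
    # `default=None` signals that no model satisfies the SLA.
    return min(eligible, key=lambda q: q.input_price_per_m, default=None)


# Made-up sample data, not real pricing or benchmark results.
quotes = [
    ModelQuote("model-a", 0.50, 300, 70.0),
    ModelQuote("model-b", 0.10, 900, 72.0),                   # cheap but too slow
    ModelQuote("model-c", 0.20, 250, 65.0),
    ModelQuote("model-d", 0.05, 200, 80.0, available=False),  # endpoint down
]

choice = cheapest_meeting_sla(quotes, max_ttft_ms=500, min_score=60.0)
print(choice.slug if choice else "no model meets the SLA")  # prints "model-c"
```

Rerunning the same function when an endpoint's `available` flag or latency changes gives a simple fallback: the next-cheapest eligible model is chosen automatically.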