Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching
8 points by sreenathmenon | 1 comment on Hacker News.
I built llmswap to solve a problem I kept hitting in hackathons - burning through API credits while testing the same prompts repeatedly during development. It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90%.

Key features:
- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails
- Zero configuration - works with environment variables

from llmswap import LLMClient

client = LLMClient(cache_enabled=True)
response = client.query("Explain quantum computing")
# Second identical query returns from cache instantly (free)

Caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications.

I built this from components of a hackathon project, and it's already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.

GitHub: https://ift.tt/AGBkUiI
PyPI: https://ift.tt/IhBwHSL
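For anyone curious what the caching layer described above might look like under the hood, here is a minimal sketch of a TTL- and size-bounded, context-keyed response cache. It is illustrative only: ResponseCache and cached_query are hypothetical names, not llmswap's actual API, and the only assumption about the client is that it exposes a query(prompt) call like the example above.

# Hypothetical sketch of the caching idea described in the post -- not
# llmswap's actual implementation. Responses are keyed by (context, prompt),
# expire after a TTL, and the oldest entry is evicted at the size limit.
import time
from collections import OrderedDict

class ResponseCache:
    def __init__(self, ttl_seconds=300, max_entries=128):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store = OrderedDict()  # (context, prompt) -> (timestamp, response)

    def get(self, prompt, context=None):
        key = (context, prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, response = entry
        if time.time() - ts > self.ttl:   # entry expired; drop it
            del self._store[key]
            return None
        self._store.move_to_end(key)      # mark as recently used
        return response

    def put(self, prompt, response, context=None):
        if len(self._store) >= self.max_entries:
            self._store.popitem(last=False)   # evict least recently used entry
        self._store[(context, prompt)] = (time.time(), response)

def cached_query(client, cache, prompt, context=None):
    """Return a cached response if available; otherwise call the API and cache it."""
    hit = cache.get(prompt, context)
    if hit is not None:
        return hit                        # cache hit: no API call, no cost
    response = client.query(prompt)       # cache miss: paid API call
    cache.put(prompt, response, context)
    return response

Keying on (context, prompt) is what provides the multi-user isolation the post mentions: two users sending the same prompt under different contexts never see each other's cached responses.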