New top story on Hacker News: Show HN: Burr – A framework for building and debugging GenAI apps faster

Show HN: Burr – A framework for building and debugging GenAI apps faster
20 by elijahbenizzy | 6 comments on Hacker News.
Hey HN, we're developing Burr (github.com/dagworks-inc/burr), an open-source python framework that makes it easier to build and debug GenAI applications. Burr is a lightweight library that can integrate with your favorite tools and comes with a debugging UI. If you prefer a video introduction, you can watch me build a chatbot here: https://www.youtube.com/watch?v=rEZ4oDN0GdU . Common friction points we’ve seen with GenAI applications include logically modeling application flow, debugging and recreating error cases, and curating data for testing/evaluation (see https://ift.tt/yzZ9wg1 ). Burr aims to make these easier. You can run Burr locally – see instructions in the repo. We talked to many companies about the pains they felt in building applications on top of LLMs and were surprised how many built bespoke state management layers and used printlines to debug. We found that everyone wanted the ability to pull up the state of an application at a given point, poke at it to debug/tweak code, and use for later testing/evaluation. People integrating with LLMOps tools fared slightly better, but these tend to focus solely on API calls to test & evaluate prompts, and left the problem of logically modeling/checkpointing unsolved. Having platform tooling backgrounds, we felt that a good abstraction would help improve the experience. These problems all got easier to think about when we modeled applications a state machines composed of “actions” designed for introspection (for more read https://ift.tt/oGbPh76... ). We don’t want to limit what people can write, but we do want to constrain it just enough that the framework provides value and doesn’t get in the way. This led us to design Burr with the following core functionalities: 1. BYOF. Burr allows you to bring your own frameworks/delegate to any python code, like LangChain, LlamaIndex, Hamilton, etc. inside of “actions”. This provides you with the flexibility to mix and match so you’re not limited. 2. Pluggability. Burr comes with APIs to allow you to save/load (i.e. checkpoint) application state, run custom code before/after action execution, and add in your own telemetry provider (e.g. langfuse, datadog, DAGWorks, etc.). 3. UI. Burr comes with its own UI (following the python batteries included ethos) that you can run locally, with the intent to connect with your development/debugging workflow. You can see your application as it progresses and inspect its state at any given point. The above functionalities lend themselves well to building many types of applications quickly and flexibly using the tools you want. E.g. conversational RAG bots, text based games, human in the loop workflows, text to SQL bots, etc. Start with LangChain and then easily transition to your custom code or another framework without having to rewrite much of your application. Side note: we also see Burr as useful outside of interactive GenAI/LLMs applications, e.g. building hyper-parameter optimization routines for chunking and embeddings & orchestrating simulations. We have a swath of improvements planned. We would love feedback, contributions, & help prioritizing. Typescript support, more ergonomic UX + APIs for annotation and test/eval curation, as well as integrations with common telemetry frameworks and capture of finer grained information from frameworks like LangChain, LlamaIndex, Hamilton, etc… Re: the name Burr, you may recognize us as the authors of Hamilton (github.com/dagworks-inc/hamilton), named after Alexander Hamilton (the creator of the federal reserve). While Aaron Burr killed him in a duel, we see Burr being a complement, rather than killer to Hamilton ! That’s all for now. Please don’t hesitate to open github issues/discussions or join our discord https://ift.tt/4EQnavt to chat with us there. We’re still very early and would love to get your feedback!

No comments