Wednesday, April 3, 2024

New top story on Hacker News: Show HN: Plandex – an AI coding engine for complex tasks

Show HN: Plandex – an AI coding engine for complex tasks
12 by danenania | 8 comments on Hacker News.
Hey HN, I'm building Plandex ( https://plandex.ai ), an open source, terminal-based AI coding engine for complex tasks. I built Plandex because I was tired of copying and pasting code back and forth between ChatGPT and my projects. It can complete tasks that span multiple files and require many steps. It uses the OpenAI API with your API key (support for other models, including Claude, Gemini, and open source models, is on the roadmap).

You can watch a 2-minute demo here: https://player.vimeo.com/video/926634577

Here's a prompt I used to build the AWS infrastructure for Plandex Cloud (Plandex can be self-hosted or cloud-hosted): https://ift.tt/hdVkvPD...

Something I think sets Plandex apart is a focus on working around bad outputs and iterating on tasks systematically. It's relatively easy to make a great-looking demo for any tool, but the day-to-day of working with it has a lot more to do with how it handles edge cases and failures. Plandex tries to tighten the feedback loop between developer and LLM:

- Every aspect of a Plandex plan is version-controlled, from the context to the conversation itself to model settings. As soon as things start to go off the rails, you can use the `plandex rewind` command to back up and add more context or iterate on the prompt. Git-style branches allow you to test and compare multiple approaches.

- As a plan proceeds, tentative updates are accumulated in a protected sandbox (also version-controlled), preventing any wayward edits to your project files.

- The `plandex changes` command opens a diff review TUI that lets you review pending changes side-by-side, like the GitHub PR review UI. Just hit the 'r' key to reject any change that doesn't look right. Once you're satisfied, either press ctrl+a from the changes TUI or run `plandex apply` to apply the changes.

- If you work on files you've loaded into context outside of Plandex, your changes are pulled in automatically so that the model always uses the latest state of your project.
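The rewind/branch model above is easy to picture as commits on branches. Here's a minimal Python sketch of that idea (this is an illustration of the concept, not Plandex's actual implementation; the `Plan` class and its methods are hypothetical):

```python
# Hypothetical sketch: a plan as per-branch lists of committed steps.
# `rewind` drops recent steps; `checkout` forks the current branch so
# you can test and compare multiple approaches, git-style.
from dataclasses import dataclass, field

@dataclass
class Plan:
    branches: dict = field(default_factory=lambda: {"main": []})
    current: str = "main"

    def commit(self, step: str) -> None:
        """Record a context load, prompt, or settings change on the current branch."""
        self.branches[self.current].append(step)

    def rewind(self, n: int = 1) -> None:
        """Back up n steps, like `plandex rewind`."""
        del self.branches[self.current][-n:]

    def checkout(self, name: str) -> None:
        """Fork the current branch to try an alternative approach."""
        if name not in self.branches:
            self.branches[name] = list(self.branches[self.current])
        self.current = name

plan = Plan()
plan.commit("load src/lib")
plan.commit("tell: add webhooks")
plan.checkout("alt")   # fork; 'main' is untouched from here on
plan.rewind()          # drop the last step on 'alt'
plan.commit("tell: add webhooks with retries")
```

The point is that every step is recoverable: a bad prompt never forces you to start the plan over.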
Plandex makes it easy to load files and directories in the terminal.

You can load multiple paths:
plandex load components/some-component.ts lib/api.ts ../sibling-dir/another-file.ts

You can load entire directories recursively:
plandex load src/lib -r

You can use glob patterns:
plandex load src/**/*.{ts,tsx}

You can load directory layouts (file names only):
plandex load src --tree

Text content of URLs:
plandex load https://ift.tt/luXoZfW

Or pipe data in:
cargo test | plandex load

For sending prompts, you can pass in a file:
plandex tell -f "prompts/stripe/add-webhooks.txt"

Or you can pop up vim and write your prompt there:
plandex tell

For shorter prompts you can pass them inline:
plandex tell "set the header's background to #222 and text to white"

You can run tasks in the background:
plandex tell "write tests for all functions in lib/math/math.go. put them in lib/math_tests." --bg

You can list all running or recently finished tasks:
plandex ps

And connect to any running task to start streaming it:
plandex connect

For more details, here's a quick overview of commands and functionality: https://ift.tt/Ypgwnxm...

Plandex is written in Go and is statically compiled, so it runs from a single small binary with no dependencies on any package managers or language runtimes. There's a 1-line quick install:
curl -sL https://ift.tt/4cSKeNh | bash

It's early days, but Plandex is working well and is legitimately the tool I reach for first when I want to do something that is too large or complex for ChatGPT or GH Copilot. I would love to get your feedback. Feel free to hop into the Discord ( https://ift.tt/MBJx6yV ) and let me know how it goes. PRs are also welcome!
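Conceptually, the recursive and glob-style loads above amount to resolving patterns to paths and reading the matches into a context map. A rough Python illustration (not Plandex's Go code; the function names are made up for this sketch):

```python
# Hypothetical sketch of glob/recursive context loading: map each matching
# file path to its text content, and separately list names for a --tree view.
from pathlib import Path
import tempfile

def load_context(root: str, pattern: str = "**/*") -> dict:
    """Return {path: file contents} for every file matching the glob pattern."""
    base = Path(root)
    return {
        str(p): p.read_text(errors="replace")
        for p in sorted(base.glob(pattern))
        if p.is_file()
    }

def tree(root: str) -> list:
    """Paths only, no contents -- akin to `plandex load src --tree`."""
    return [str(p) for p in sorted(Path(root).glob("**/*"))]

# Demo on a throwaway directory so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
(root / "lib").mkdir()
(root / "lib" / "api.ts").write_text("export {}")
(root / "main.ts").write_text("import './lib/api'")
ctx = load_context(str(root), "**/*.ts")
```

Loading names only (`--tree`) keeps token usage down when the model just needs to know the project layout.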

New top story on Hacker News: 'Lavender': The AI machine directing Israel's bombing in Gaza

'Lavender': The AI machine directing Israel's bombing in Gaza
363 by contemporary343 | 310 comments on Hacker News.


New top story on Hacker News: Show HN: Burr – A framework for building and debugging GenAI apps faster

Show HN: Burr – A framework for building and debugging GenAI apps faster
20 by elijahbenizzy | 6 comments on Hacker News.
Hey HN, we're developing Burr (github.com/dagworks-inc/burr), an open-source Python framework that makes it easier to build and debug GenAI applications. Burr is a lightweight library that can integrate with your favorite tools and comes with a debugging UI. If you prefer a video introduction, you can watch me build a chatbot here: https://www.youtube.com/watch?v=rEZ4oDN0GdU

Common friction points we've seen with GenAI applications include logically modeling application flow, debugging and recreating error cases, and curating data for testing/evaluation (see https://ift.tt/yzZ9wg1 ). Burr aims to make these easier. You can run Burr locally – see instructions in the repo.

We talked to many companies about the pains they felt in building applications on top of LLMs and were surprised how many built bespoke state management layers and used print statements to debug. We found that everyone wanted the ability to pull up the state of an application at a given point, poke at it to debug/tweak code, and use it for later testing/evaluation. People integrating with LLMOps tools fared slightly better, but those tools tend to focus solely on API calls to test and evaluate prompts, leaving the problem of logically modeling/checkpointing unsolved. Having platform tooling backgrounds, we felt that a good abstraction would help improve the experience.

These problems all got easier to think about when we modeled applications as state machines composed of "actions" designed for introspection (for more, read https://ift.tt/oGbPh76... ). We don't want to limit what people can write, but we do want to constrain it just enough that the framework provides value and doesn't get in the way. This led us to design Burr with the following core functionalities:

1. BYOF. Burr allows you to bring your own frameworks/delegate to any Python code, like LangChain, LlamaIndex, Hamilton, etc., inside of "actions". This gives you the flexibility to mix and match so you're not limited.

2. Pluggability. Burr comes with APIs that let you save/load (i.e. checkpoint) application state, run custom code before/after action execution, and add your own telemetry provider (e.g. langfuse, datadog, DAGWorks, etc.).

3. UI. Burr comes with its own UI (following the Python batteries-included ethos) that you can run locally, with the intent to connect with your development/debugging workflow. You can see your application as it progresses and inspect its state at any given point.

The above functionalities lend themselves well to building many types of applications quickly and flexibly using the tools you want, e.g. conversational RAG bots, text-based games, human-in-the-loop workflows, text-to-SQL bots, etc. Start with LangChain and then easily transition to your custom code or another framework without having to rewrite much of your application. Side note: we also see Burr as useful outside of interactive GenAI/LLM applications, e.g. building hyper-parameter optimization routines for chunking and embeddings, or orchestrating simulations.

We have a swath of improvements planned, and we would love feedback, contributions, and help prioritizing: TypeScript support, more ergonomic UX and APIs for annotation and test/eval curation, and integrations with common telemetry frameworks, along with capture of finer-grained information from frameworks like LangChain, LlamaIndex, Hamilton, etc.

Re: the name Burr, you may recognize us as the authors of Hamilton (github.com/dagworks-inc/hamilton), named after Alexander Hamilton (the first U.S. Secretary of the Treasury). While Aaron Burr killed him in a duel, we see Burr as a complement to Hamilton, rather than a killer! That's all for now. Please don't hesitate to open GitHub issues/discussions or join our Discord https://ift.tt/4EQnavt to chat with us there. We're still very early and would love to get your feedback!
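The "state machine of actions" framing above can be sketched in a few lines of plain Python (this is an illustration of the idea, not Burr's actual API; the action names and `run` helper are invented for the sketch):

```python
# Hypothetical sketch: each action takes state, returns (new state, next action
# name); the runner snapshots state after every step, which is what makes
# checkpointing, inspection, and replaying error cases cheap.
import copy

def get_input(state):
    state["prompt"] = state.get("pending", "hello")
    return state, "respond"

def respond(state):
    # In a real app this is where an LLM call (LangChain, LlamaIndex, ...) would go.
    state["reply"] = "echo: " + state["prompt"]
    return state, "done"

def run(entrypoint, actions, state):
    """Execute actions until 'done', checkpointing state after each transition."""
    history = []
    current = entrypoint
    while current != "done":
        state, current = actions[current](copy.deepcopy(state))
        history.append((current, copy.deepcopy(state)))  # checkpoint per step
    return state, history

final, trace = run("get_input", {"get_input": get_input, "respond": respond}, {})
```

Because `trace` holds a snapshot per transition, you can pull up the state at any point after a failure instead of re-running the whole app, which is the debugging workflow the UI is built around.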

New top story on Hacker News: The speed of sight: Individual variation in critical flicker fusion thresholds

The speed of sight: Individual variation in critical flicker fusion thresholds
6 by bookofjoe | 1 comment on Hacker News.