Don't know where your data is from? Bayesian modeling for unknown coordinates
4 by ckrapu | 0 comments on Hacker News.
Sunday, May 24, 2026
Saturday, May 23, 2026
Friday, May 22, 2026
Thursday, May 21, 2026
Wednesday, May 20, 2026
New top story on Hacker News: Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend
Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend
11 by philipisik | 3 comments on Hacker News.
Hi HN! I'm Philip, one of the founders of Tiptap. Alongside our open-source rich text editor framework, we started developing Hocuspocus about five years ago and open-sourced it too, to solve one of our biggest challenges back then: real-time collaboration in web editors.

We found Yjs by Kevin Jahns, a CRDT library that merges concurrent edits from multiple users in real time without conflicts. Hocuspocus is the WebSocket server built on top of Yjs. It handles real-time sync, presence/awareness, persistence, and Redis-based scaling.

While we use Hocuspocus at Tiptap as the collaboration backend for our cloud services, it also works with any Yjs client (Slate, Quill, Monaco, ProseMirror, or your own setup), and Yjs documents aren't limited to text at all. You can sync any structured data through them, and we now see projects that rely on Hocuspocus without using the Tiptap editor.

We released Hocuspocus v4 under the MIT license a few weeks ago, and the biggest change is that it's no longer tied to Node. The previous versions depended on the ws package, which meant you couldn't run Hocuspocus on Bun, Deno, or Cloudflare Workers. We moved to crossws, a universal WebSocket adapter, so the same server now runs on Node, Bun, Deno, Cloudflare Workers, and Node with uWebSockets. That also lets you run collaboration at the edge.

The other changes are smaller but important if you're using Hocuspocus in production:

1. Every core class and hook payload now takes a generic Context type, so the auth/session shape you build in onAuthenticate flows through every other hook with full type safety (it defaults to any so existing code doesn't break).
2. Document updates are now processed sequentially per connection through an internal queue, which fixes a correctness bug where async hooks could cause CRDT updates to apply out of order under load.
3. Transaction origins are now structured objects with a source field instead of raw values, and there's an isTransactionOrigin() helper for narrowing.
4. Hook payloads use web-standard Request and Headers instead of Node's IncomingMessage.
5. The wire protocol is backward compatible in both directions, so you can roll out servers and providers independently.

If you want to test Hocuspocus: npm install @hocuspocus/server @hocuspocus/provider

Docs at: https://ift.tt/ATQLu6l Source at: https://ift.tt/R0XxyU9

Because running real-time collaboration on Workers or Durable Objects is new in v4, that's the use case we'd most like to hear your questions and feedback on.
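To illustrate why a per-connection queue keeps updates in order, here is a minimal Python sketch. It is not Hocuspocus code (Hocuspocus is a TypeScript project); the class name ConnectionQueue, the enqueue method, and the simulated hook delay are all hypothetical. Each update waits for the previous one on the same connection to finish, so a slow async hook can no longer let a later update overtake an earlier one.

```python
import asyncio


class ConnectionQueue:
    """Hypothetical per-connection queue: applies updates strictly in arrival order."""

    def __init__(self):
        self._tail = None   # task for the most recently enqueued update, if any
        self.applied = []   # updates in the order they were actually applied

    def enqueue(self, update, hook_delay):
        prev = self._tail

        async def run():
            if prev is not None:
                await prev                   # wait until the previous update is done
            await asyncio.sleep(hook_delay)  # stand-in for a slow async hook
            self.applied.append(update)

        self._tail = asyncio.create_task(run())
        return self._tail


async def demo():
    q = ConnectionQueue()
    # The first update has the slowest hook; run concurrently without the
    # queue, it would be applied last instead of first.
    tasks = [q.enqueue("u1", 0.03), q.enqueue("u2", 0.01), q.enqueue("u3", 0.0)]
    await asyncio.gather(*tasks)
    return q.applied


result = asyncio.run(demo())
print(result)
```

The chaining is per queue object, so in this sketch separate connections would still progress independently of each other.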
Tuesday, May 19, 2026
Monday, May 18, 2026
Sunday, May 17, 2026
Saturday, May 16, 2026
Friday, May 15, 2026
Thursday, May 14, 2026
Wednesday, May 13, 2026
Tuesday, May 12, 2026
New top story on Hacker News: Show HN: Statewright – Visual state machines that make AI agents reliable
Show HN: Statewright – Visual state machines that make AI agents reliable
8 by azurewraith | 0 comments on Hacker News.
Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.

I'm Ben Cochran. I spent 20+ years in the trenches of full-stack engineering, DevOps, high-performance computing and ML, with stints at NVIDIA, AMD and various other organizations, most recently as a Distinguished Engineer.

For agents to work reliably you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger?

I took a different approach by using smaller models: I took models in the 13-20B parameter range and set them to solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and what transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. It is enforced via protocol, not via prompts.

The results were more promising than I would have expected. The improvements were consistent across multiple model families irrespective of age (qwen-coder, gpt-oss, gemma4) above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/2foS1dF

Surprisingly, this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals. Fine-tuning did not yield these kinds of functional improvements for me.

The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining LLMs, which are non-idempotent, by using deterministic code is a pattern that nobody is currently talking about.

So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. Its orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly, it tells the model when it's attempting to do something that is out of scope or incorrect, or when it needs to try something else after getting stuck.

You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view... You can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.

Statewright is currently live with a free tier. Try it out in Claude Code by running the following:

/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins

Then say "start the bugfix workflow" or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; paste the API key again and say you really mean it. Claude is just being cautious here.

Feedback is welcome on the workflow editor and the plugin experience, and tell me what workflows you'd want to build first.

Agents are suggestions, states are laws.
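For readers who want a concrete picture of per-state tool gating, here is a rough Python sketch of the idea described above. It is not Statewright's engine (which is written in Rust) and does not use its definition format; the state names, tool names, iteration limits and the Machine class are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    tools: frozenset        # tools the model may call in this state
    max_iterations: int     # tool-call budget for this state
    transitions: frozenset  # states that may follow this one


# Hypothetical bugfix workflow: plan -> implement -> test -> (implement | done)
WORKFLOW = {
    "plan": State(frozenset({"read_file", "grep"}), 3, frozenset({"implement"})),
    "implement": State(frozenset({"read_file", "edit_file"}), 5, frozenset({"test"})),
    "test": State(frozenset({"run_tests"}), 3, frozenset({"implement", "done"})),
    "done": State(frozenset(), 0, frozenset()),
}


class Machine:
    """Deterministic gatekeeper: the model proposes, the machine decides."""

    def __init__(self, workflow, start):
        self.workflow = workflow
        self.current = start
        self.iterations = 0

    def call_tool(self, tool):
        state = self.workflow[self.current]
        if tool not in state.tools:
            return f"rejected: {tool} is not available in state '{self.current}'"
        if self.iterations >= state.max_iterations:
            return f"rejected: iteration budget exhausted in state '{self.current}'"
        self.iterations += 1
        return f"ok: {tool}"

    def transition(self, target):
        if target not in self.workflow[self.current].transitions:
            return f"rejected: no transition '{self.current}' -> '{target}'"
        self.current = target
        self.iterations = 0  # each state gets a fresh budget
        return f"ok: now in '{target}'"


m = Machine(WORKFLOW, "plan")
print(m.call_tool("edit_file"))   # editing is not allowed while planning
print(m.transition("test"))       # skipping the implementation step is not allowed
print(m.transition("implement"))
print(m.call_tool("edit_file"))
```

The test state can transition back to implement, which is the retry loop that a DAG could not express.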
Monday, May 11, 2026
Sunday, May 10, 2026
Saturday, May 9, 2026
Friday, May 8, 2026
Thursday, May 7, 2026
New top story on Hacker News: Show HN: Stage CLI – an easier way of reading your AI generated changes locally
Show HN: Stage CLI – an easier way of reading your AI generated changes locally
8 by cpan22 | 2 comments on Hacker News.
Hey HN! We're Charles and Dean. A few weeks ago we posted about Stage, a code review tool that guides you through reading a PR step by step - https://ift.tt/Kf1pD4I . We got a lot of great feedback but also heard from many people that they wanted to have the chapters experience even before opening a PR… so we built the Stage CLI as the local, open-source version that anyone can try. Here’s a quick demo video: https://ift.tt/qWrjyoQ It works with any coding agent of your choice. The skill instructs the agent to read your current branch’s changes, break them down into separate logical chapters, and open them in a local browser. We’ve found that reading changes this way is a lot easier for us than reading them in an IDE or other similar CLI tools, which present diffs to you in repository tree order. You can see a few examples of what it feels like here: https://ift.tt/pY0zi1k . Try it out and let us know what you think! Would love to hear any feedback :)
Wednesday, May 6, 2026
New top story on Hacker News: Mexico City is sinking so quickly, it can be seen from space
Mexico City is sinking so quickly, it can be seen from space
12 by randycupertino | 1 comment on Hacker News.
Tuesday, May 5, 2026
Monday, May 4, 2026
Sunday, May 3, 2026
Saturday, May 2, 2026
Friday, May 1, 2026
Thursday, April 30, 2026
Wednesday, April 29, 2026
Tuesday, April 28, 2026
Monday, April 27, 2026
New top story on Hacker News: China blocks Meta's acquisition of AI startup Manus
China blocks Meta's acquisition of AI startup Manus
33 by yakkomajuri | 7 comments on Hacker News.
https://ift.tt/iBFlkr2... https://ift.tt/IApTbhR
Sunday, April 26, 2026
Saturday, April 25, 2026
Friday, April 24, 2026
Thursday, April 23, 2026
Wednesday, April 22, 2026
Tuesday, April 21, 2026
Monday, April 20, 2026
Sunday, April 19, 2026
Saturday, April 18, 2026
New top story on Hacker News: Show HN: AI Subroutines – Run automation scripts inside your browser tab
Show HN: AI Subroutines – Run automation scripts inside your browser tab
7 by arjunchint | 0 comments on Hacker News.
We built AI Subroutines in rtrvr.ai. Record a browser task once, save it as a callable tool, and replay it with zero token cost, zero LLM inference delay, and zero mistakes.

The subroutine itself is a deterministic script composed of discovered network calls hitting the site's backend as well as page interactions like click/type/find.

The key architectural decision: the script executes inside the webpage itself, not through a proxy, not in a headless worker, not out of process. The script dispatches requests from the tab's execution context, so auth, CSRF, TLS session, and signed headers get added to all requests and propagate for free. No certificate installation, no TLS fingerprint modification, no separate auth stack to maintain.

During recording, the extension intercepts network requests (MAIN-world fetch/XHR patch + webRequest fallback). We score and trim ~300 requests down to ~5 based on method, timing relative to DOM events, and origin. Volatile GraphQL operation IDs are detected and force a DOM-only fallback before they break silently on the next run.

The generated code combines network calls with DOM actions (click, type, find) in the same function via an rtrvr.* helper namespace.

Point the agent at a spreadsheet of 500 rows and, with just one LLM call, parameters are assigned and 500 Subroutines are kicked off.

Key use cases:

- Record sending an IG DM, then have a reusable, callable routine to send DMs at zero token cost.
- Create a routine that gets the latest products in a site catalog, then call it to get thousands of products via direct GraphQL queries.
- Set up a routine to file an EHR form based on parameters to the tool; the AI infers the parameters from the current page context and calls the tool.
- Reuse a routine daily to sync outbound messages on LinkedIn/Slack/Gmail to a CRM using an MCP server.

We think the fundamental reason browser agents haven't taken off is that, for repetitive tasks, going through the inference loop is unnecessary. It is better to record once and get the LLM to generate a script that leverages all the possible ways to interact with a site and the wider web: directly calling backend APIs, interacting with the DOM, and calling 3P tools/APIs/MCP servers.
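As a rough illustration of the "score and trim" step, the Python sketch below ranks recorded requests by the three signals the post mentions (method, timing relative to DOM events, and origin). The weights, the 500 ms window, the Recorded type and the example URLs are all assumptions made for this sketch; the post does not describe rtrvr.ai's actual scoring.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Recorded:
    method: str
    url: str
    t_ms: float  # time the request was sent, in ms since recording started


def score(req, page_origin, dom_event_times_ms, window_ms=500.0):
    """Higher score = more likely to be the call that performed the user's action."""
    s = 0.0
    if req.method in {"POST", "PUT", "PATCH", "DELETE"}:
        s += 2.0  # state-changing methods (assumed weight)
    parsed = urlparse(req.url)
    if f"{parsed.scheme}://{parsed.netloc}" == page_origin:
        s += 1.0  # same origin as the page (assumed weight)
    # Fired shortly after a click/type event (assumed weight and window)
    if any(0 <= req.t_ms - t <= window_ms for t in dom_event_times_ms):
        s += 2.0
    return s


def trim(requests, page_origin, dom_event_times_ms, keep=5):
    """Keep the `keep` highest-scoring requests, best first."""
    ranked = sorted(
        requests,
        key=lambda r: score(r, page_origin, dom_event_times_ms),
        reverse=True,
    )
    return ranked[:keep]


recorded = [
    Recorded("GET", "https://tracker.example/pixel", 50.0),
    Recorded("POST", "https://shop.example/api/cart", 1100.0),
    Recorded("GET", "https://shop.example/static/app.js", 5000.0),
]
top = trim(recorded, "https://shop.example", dom_event_times_ms=[0.0, 1000.0], keep=1)
print(top[0].url)
```

In this toy recording, the same-origin POST sent 100 ms after a click outranks the tracking pixel and the static asset.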
Friday, April 17, 2026
Thursday, April 16, 2026
Wednesday, April 15, 2026
Tuesday, April 14, 2026
New top story on Hacker News: Show HN: A memory database that forgets, consolidates, and detects contradiction
Show HN: A memory database that forgets, consolidates, and detects contradiction
9 by pranabsarkar | 2 comments on Hacker News.
Vector databases store memories. They don't manage them. After 10k memories, recall quality degrades because there's no consolidation, no forgetting, no conflict resolution. Your AI agent just gets noisier.

YantrikDB is a cognitive memory engine: embed it, run it as a server, or connect via MCP. It thinks about what it stores: consolidation collapses duplicate memories, contradiction detection flags incompatible facts, and temporal decay with a configurable half-life lets unimportant memories fade like human memory does.

Single Rust binary. HTTP + binary wire protocol. 2-voter + 1-witness HA cluster via Docker Compose or Kubernetes. Chaos-tested failover, runtime deadlock detection (parking_lot), per-tenant quotas, Prometheus metrics.

Ran a 42-task hardening sprint last week: 1178 core tests, cargo-fuzz targets, CRDT property tests, 5 ops runbooks. Live on a 3-node Proxmox homelab cluster with multiple tenants.

Alpha: primary user is me, looking for the second one.
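To make "temporal decay with a configurable half-life" concrete, here is a small Python sketch of standard exponential decay. The function names, the importance field and the pruning threshold are assumptions for illustration; they are not YantrikDB's API (YantrikDB is a Rust binary), and the post does not say which decay formula it uses.

```python
import math


def decayed_score(importance, age_days, half_life_days):
    """Exponential decay: the score halves every `half_life_days`."""
    return importance * 0.5 ** (age_days / half_life_days)


def prune(memories, now_day, half_life_days, threshold):
    """Keep only memories whose decayed score is still at or above `threshold`."""
    return [
        m for m in memories
        if decayed_score(m["importance"], now_day - m["created_day"], half_life_days)
        >= threshold
    ]


memories = [
    {"text": "user prefers dark mode", "importance": 1.0, "created_day": 0},
    {"text": "it was raining during the call", "importance": 0.2, "created_day": 0},
]
kept = prune(memories, now_day=30, half_life_days=30, threshold=0.25)
print([m["text"] for m in kept])
```

With a 30-day half-life, both memories lose half their score after 30 days, so only the one that started out important stays above the threshold.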
Monday, April 13, 2026
Sunday, April 12, 2026
Saturday, April 11, 2026
Friday, April 10, 2026
Thursday, April 9, 2026
Wednesday, April 8, 2026
Tuesday, April 7, 2026
Monday, April 6, 2026
New top story on Hacker News: Show HN: GovAuctions lets you browse government auctions at once
Show HN: GovAuctions lets you browse government auctions at once
20 by player_piano | 13 comments on Hacker News.
I've long been into finding deals on government auction sites (seizures, surplus sales etc.) - right now for example San Diego DHS is selling 26 tons of lead shot, with bidding starting at $1,000 ¯\_(ツ)_/¯ It has historically been extremely tedious though: scanning dozens of janky sites which have interminable page loading times; back buttons take you all the way back to the homepage etc. The site I built - GovAuctions - lets you search every government surplus auction at once. You can filter by location, category, and price, save items to a watchlist, and get alerts when new auctions match what you're looking for. Let me know what you think, if you have any suggestions, and if you find any deals in your area!
Sunday, April 5, 2026
Saturday, April 4, 2026
Friday, April 3, 2026
Thursday, April 2, 2026
Wednesday, April 1, 2026
Tuesday, March 31, 2026
Monday, March 30, 2026
Sunday, March 29, 2026
New top story on Hacker News: Pretext: TypeScript library for multiline text measurement and layout
Pretext: TypeScript library for multiline text measurement and layout
41 by emersonmacro | 2 comments on Hacker News.
https://ift.tt/ARZV30X , https://ift.tt/APVevja Demos: https://ift.tt/8GlSmNn , https://somnai-dreams.github.io/pretext-demos/
Saturday, March 28, 2026
Friday, March 27, 2026
New top story on Hacker News: Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)
Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)
15 by supai | 9 comments on Hacker News.
Hi HN. I'm Ken, a 20-year-old Stanford CS student. I built Sup AI.

I started working on this because no single AI model is right all the time, but their errors don’t strongly correlate. In other words, models often make unique mistakes relative to other models. So I run multiple models in parallel and synthesize the outputs by weighting segments based on confidence. Low entropy in the output token probability distributions correlates with accuracy. High entropy is often where hallucinations begin.

My dad Scott (AI Research Scientist at TRI) is my research partner on this. He sends me papers at all hours, we argue about whether they actually apply and what modifications make sense, and then I build and test things. The entropy-weighting approach came out of one of those conversations.

In our eval on Humanity's Last Exam, Sup scored 52.15%. The best individual model in the same evaluation run got 44.74%. The relative gap is statistically significant (p < 0.001).

Methodology, eval code, data, and raw results:

- https://sup.ai/research/hle-white-paper-jan-9-2026
- https://github.com/supaihq/hle

Limitations:

- We evaluated 1,369 of the 2,500 HLE questions (details in the above links)
- Not all APIs expose token logprobs; we use several methods to estimate confidence when they don't

We tried offering free access and it got abused so badly it nearly killed us. Right now the sustainable option is a $5 starter credit with card verification (no auto-charge). If you don't want to sign up, drop a prompt in the comments and I'll run it myself and post the result.

Try it at https://sup.ai . My dad Scott (@scottmu) is in the thread too. Would love blunt feedback, especially where this really works for you and where it falls short.

Here's a short demo video: https://www.youtube.com/watch?v=DRcns0rRhsg
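The link between token entropy and confidence can be sketched in a few lines of Python. This is only an illustration of the general idea, not Sup AI's method: the exp(-mean entropy) weighting, the pick-the-best rule, the function names and the toy distributions are all assumptions (the actual methodology is in the white paper linked above).

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def segment_confidence(token_distributions):
    """Map the mean per-token entropy of a segment to a weight in (0, 1].

    exp(-mean entropy) is an assumed mapping: 1.0 for a fully peaked
    distribution, smaller as the model spreads mass over more tokens.
    """
    mean_h = sum(token_entropy(d) for d in token_distributions) / len(token_distributions)
    return math.exp(-mean_h)


def pick_segment(candidates):
    """candidates: {model_name: (text, token_distributions)} -> most confident entry."""
    name, (text, _) = max(
        candidates.items(), key=lambda kv: segment_confidence(kv[1][1])
    )
    return name, text


# Toy data: model_a is peaked (low entropy), model_b is spread out (high entropy).
candidates = {
    "model_a": ("The answer is 42.", [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]),
    "model_b": ("The answer is 41.", [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]),
}
winner = pick_segment(candidates)
print(winner)
```

A real ensemble would more likely blend segments by weight than pick a single winner, and would need a fallback for APIs that do not expose logprobs, as the limitations above note.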
Thursday, March 26, 2026
Wednesday, March 25, 2026
Tuesday, March 24, 2026
Monday, March 23, 2026
Sunday, March 22, 2026
Saturday, March 21, 2026
New top story on Hacker News: Show HN: Joonote – A note-taking app on your lock screen and notification panel
Show HN: Joonote – A note-taking app on your lock screen and notification panel
7 by kilgarenone | 0 comments on Hacker News.
I finally built this app after many years of being sick of unlocking my phone every goddamn time I need to take or view my notes. It particularly sucks when I'm doing my grocery shopping and going down the list.

I started building in June last year. This is a native app written in Kotlin. And since I'm a 100% web dev guy, I gotta say this wouldn't have been possible without AI to assist me. But this isn't "vibe-coded". I simply used the chat interface on the Gemini website, manually copy-pasting code to build and integrate every single thing in the app! I used Gemini to build it just because I was piggybacking on my last company's enterprise subscription. I personally didn't subscribe to any AI (and still don't cuz the free quota seems enough for me :)

So I certainly have learnt a lot about Android development, architecture patterns, Kotlin syntax, and obeying Google's whims. Can't say I love it all, but for the sake of this app, I will :)

Anyway, I finally have the app I wish existed, and I'm using it every day. It not only does the main thing I needed it to do, but there's also all this stuff:

- Make your notes private if you don't want to show them on lock screen.
- Create check/to-do lists.
- Set one-time or recurring reminders.
- Full-text search your notes in the app.
- Speech-to-text.
- Organize your notes with custom or color labels.
- Pin the app as a widget on your home screen.
- You can auto backup and restore your notes on a new install or Android device.
- Works offline.
- And no funny business happening in the background.

https://ift.tt/0on3IWz

It's a 30-day trial, then a one-time $9.99 to go Pro forever. I would love you all to check it out, FWIW. Ok thanks!
Friday, March 20, 2026
Thursday, March 19, 2026
New top story on Hacker News: Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff
Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff
5 by axotopia | 2 comments on Hacker News.
I run a building design consultancy. I got tired of paying Wix $40/month for a brochure that couldn’t answer simple service questions, and me wasting hours on the same FAQs. So I killed it all and spent 4 months building a 'talker': https://axoworks.com The stack is completely duct-taped: Netlify’s 10s serverless timeout forced me to split the agent into three pieces: Brain (Edge), Hands (Browser), and Voice (Edge). I haven’t coded in 30 years. This was 3 steps forward, 2 steps back, heavily guided by AI. The fight that proved it worked: 2 weeks ago, a licensed architect attacked the bot, trying to prove my business model harms the profession. The AI (DeepSeek-R3) completely dismantled his arguments. It was hilariously caustic. Log: https://ift.tt/82iQkYn... A few battle scars: * Web Speech API works fine, right up until someone speaks Chinese without toggling the language mode. Then it forcefully spits out English phonetic gibberish. Still a headache. * Liability is the killer. Hallucinate a building code clause? We’re dead. Insurance won’t touch us. * We publish the audit logs to keep ourselves honest and make sure the system stays hardened. Audit: https://ift.tt/4sTxpUE The hardest part was getting the intent right: making one LLM pivot seamlessly from a warm principal’s tone with a homeowner, to a defensive bulldog when attacked by a peer. That took 2.5 months of tuning. We burn through tokens with an 'Eager RAG' hack (pre-fetching guesses) just to improve responsiveness. I also ripped out the “essential” persistent DBs—less than 5% of visitors ever return, so why bother? If a client drops mid-query, their session vanishes. No server-side queues. The point: To let me operate with a network of seasoned pros, and trim the fat. Try to break it. I’ll be in the comments. Kee
Wednesday, March 18, 2026
New top story on Hacker News: 2025 Turing award given for quantum information science
2025 Turing award given for quantum information science
21 by srvmshr | 4 comments on Hacker News.
https://ift.tt/OMNQTDs... https://ift.tt/6teoTNQ... https://ift.tt/ylDX3pF... https://ift.tt/gJCjOBQ...
Tuesday, March 17, 2026
Monday, March 16, 2026
Sunday, March 15, 2026
New top story on Hacker News: Ask HN: How is AI-assisted coding going for you professionally?
Ask HN: How is AI-assisted coding going for you professionally?
18 by svara | 6 comments on Hacker News.
Comment sections on AI threads tend to split into "we're all cooked" and "AI is useless." I'd like to cut through the noise and learn what's actually working and what isn't, from concrete experience. If you've recently used AI tools for professional coding work, tell us about it. What tools did you use? What worked well and why? What challenges did you hit, and how (if at all) did you solve them? Please share enough context (stack, project type, team size, experience level) for others to learn from your experience. The goal is to build a grounded picture of where AI-assisted development actually stands in March 2026, without the hot air.
Saturday, March 14, 2026
New top story on Hacker News: Show HN: Learn Arabic with spaced repetition and comprehensible input
Show HN: Learn Arabic with spaced repetition and comprehensible input
11 by adangit | 2 comments on Hacker News.
Sharing a friend's first-ever Rails application, dedicated to Arabic learning, from 0 to 1. It pulls language learning methods from Anki, comprehensible input and more.
Friday, March 13, 2026
Thursday, March 12, 2026
Wednesday, March 11, 2026
Tuesday, March 10, 2026
Monday, March 9, 2026
Sunday, March 8, 2026
New top story on Hacker News: Show HN: Skir – like Protocol Buffer but better
Show HN: Skir – like Protocol Buffer but better
6 by gepheum | 1 comment on Hacker News.
Why I built Skir: https://ift.tt/JtT8yro... Quick start: npx skir init All the config lives in one YML file. Website: https://skir.build GitHub: https://ift.tt/6jspZtY Would love feedback especially from teams running mixed-language stacks.
Saturday, March 7, 2026
Friday, March 6, 2026
New top story on Hacker News: Show HN: Claude-replay – A video-like player for Claude Code sessions
Show HN: Claude-replay – A video-like player for Claude Code sessions
13 by es617 | 7 comments on Hacker News.
I got tired of sharing AI demos with terminal screenshots or screen recordings. Claude Code already stores full session transcripts locally as JSONL files. Those logs contain everything: prompts, tool calls, thinking blocks, and timestamps. I built a small CLI tool that converts those logs into an interactive HTML replay. You can step through the session, jump through the timeline, expand tool calls, and inspect the full conversation. The output is a single self-contained HTML file — no dependencies. You can email it, host it anywhere, embed it in a blog post, and it works on mobile. Repo: https://ift.tt/kVc30Zt Example replay: https://es617.github.io/assets/demos/peripheral-uart-demo.ht...
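As a rough illustration of the JSONL-to-HTML idea (not the tool's actual code), here is a minimal Python sketch. The record fields `role` and `text` are hypothetical stand-ins, since the post does not describe the transcript schema, and `render_replay` is a name made up for this example.

```python
import html
import json


def render_replay(jsonl_text: str, title: str = "Session replay") -> str:
    """Turn JSONL records into one self-contained HTML page.

    Assumes each non-empty line is a JSON object with hypothetical
    "role" and "text" fields; the real transcript schema may differ.
    """
    steps = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Escape everything so transcript content cannot inject markup.
        role = html.escape(str(record.get("role", "unknown")))
        text = html.escape(str(record.get("text", "")))
        # <details> gives expand/collapse behaviour without any JavaScript.
        steps.append(
            f"<details><summary>{len(steps) + 1}. {role}</summary>"
            f"<pre>{text}</pre></details>"
        )
    return (
        '<!doctype html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head><body>\n"
        + "\n".join(steps)
        + "\n</body></html>\n"
    )
```

Writing the returned string to a `.html` file yields a page with no external dependencies, which is the property the post emphasizes.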
Thursday, March 5, 2026
Wednesday, March 4, 2026
Tuesday, March 3, 2026
New top story on Hacker News: Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act
Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act
12 by systima | 0 comments on Hacker News.
EU legislation (which affects UK and US companies in many cases) requires being able to truly reconstruct agentic events. I've worked in a number of regulated industries off and on for years, and recently hit this gap. We already had strong observability, but if someone asked me to prove exactly what happened for a specific AI decision X months ago (and demonstrate that the log trail had not been altered), I could not. The EU AI Act has already entered into force, and its Article 12 kicks in in August this year, requiring automatic event recording and six-month retention for high-risk systems, which many legal commentators have suggested reads more like an append-only ledger requirement than standard application logging. With this in mind, we built a small, free, open-source TypeScript library for Node apps using the Vercel AI SDK that captures inference as an append-only log. It wraps the model in middleware, automatically logs every inference call to structured JSONL in your own S3 bucket, chains entries with SHA-256 hashes for tamper detection, enforces a 180-day retention floor, and provides a CLI to reconstruct a decision and verify integrity. There is also a coverage command that flags likely gaps (in practice omissions are a bigger risk than edits). The library is deliberately simple: TS, targeting Vercel AI SDK middleware, S3 or local fs, linear hash chaining. It also works with Mastra (agentic framework), and I am happy to expand its integrations via PRs. Blog post with link to repo: https://ift.tt/X2v3YSm I'd value feedback, thoughts, and any critique.
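To make "linear hash chaining" concrete, here is a minimal Python sketch of the general technique. It is not the library's code or API: the entry fields `prev`, `payload`, and `hash`, the all-zero genesis value, and the canonical-JSON encoding are all assumptions chosen for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a record that commits to everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited, dropped, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def to_jsonl(log: list) -> str:
    """Serialize the chain as one JSON object per line."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log)
```

Because each hash covers the previous one, changing an old entry invalidates it and every entry after it. A chain on its own cannot detect truncation of the most recent entries, so the latest hash would need to be recorded somewhere independent for that.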
Monday, March 2, 2026
Sunday, March 1, 2026
Saturday, February 28, 2026
New top story on Hacker News: Show HN: Tomoshibi – A writing app where your words fade by firelight
Show HN: Tomoshibi – A writing app where your words fade by firelight
11 by hakumei | 5 comments on Hacker News.
I spent ten years trying to write a novel. Every time I sat down, I'd write a sentence, decide it wasn't good enough, and rewrite it. The problem wasn't discipline — it was that I could always see what I'd written and go back to change it. I tried other approaches. Apps that delete your words when you stop typing — they fight fear with fear. That just made me panic. I wanted the opposite: not punishment, but permission. "Tomoshibi" is Japanese for a small light in the dark — just enough to see what's in front of you. You write on a dark screen. Older lines fade, but not when you hit return. They fade when you start writing again. If you pause, they wait. You can edit the current line and one line back — enough to fix a typo, not enough to spiral. The one-line-back rule also catches my own practical issue: Japanese IME often fires an accidental newline on kanji confirmation. Everything is saved. There's a separate reader view for going back through what you've written. Tomoshibi is for writing over months, not just one session. When you come back, your last sentence appears as an epigraph — as if it always belonged there. No account, no server, no build step. Your writing stays in your browser's local storage — export anytime as .txt. Vanilla HTML/CSS/ES modules. Try it in your browser. A native Mac app (built with Tauri) with file system integration is coming to the store. I've been writing on it for two months. https://ift.tt/Cx54l1w
Friday, February 27, 2026
New top story on Hacker News: Show HN: Unfudged – version control without commits
Show HN: Unfudged – version control without commits
8 by cyrusradfar | 4 comments on Hacker News.
I built unf after I pasted a prompt into the wrong agent terminal and it overwrote hours of hand-edits across a handful of files. Git couldn't help because I hadn't finished or committed my in-progress work. I wanted something that recorded every save automatically so I could rewind to any point in time. I wanted to make it difficult for an agent to permanently screw anything up, even with an errant rm -rf.

unf is a background daemon that watches directories you choose (via CLI) and snapshots every text file on save. It stores file contents in an object store, tracks metadata in SQLite, and gives you a CLI to query and restore any version. The install also includes a UI to explore the history through time. The tool skips binaries and respects `.gitignore` if one exists. The interface borrows from git so it should feel familiar: unf log, unf diff, unf restore.

I say "UN-EF" vs U.N.F, but that's for y'all to decide. I started by calling the project Unfucked and got unfucked.ai, which, if you know me and the messes I get myself into, is a fitting purchase. The CLI command is `unf` and the Tauri desktop app is called "Unfudged", the clean version. Didn’t want to force folks to have it in their apps, window headers, etc. You can rag on me for my dad vibes.

How it works: https://ift.tt/bBxI5vV (summary below)

The daemon uses FSEvents on macOS and inotify on Linux. When a file changes, `unf` hashes the content with BLAKE3 and checks whether that hash already exists in the object store. If it does, it just records a new metadata entry pointing to the existing blob. If not, it writes the blob and records the entry. Each snapshot is a row in SQLite. Restores read the blob back from the object store and overwrite the file, after taking a safety snapshot of the current state first (so restoring is itself reversible).

There are two processes. The core daemon does the real work of managing FSEvents/inotify subscriptions across multiple watched directories and writing snapshots. A sentinel watchdog supervises it, kept alive and aligned by launchd on macOS and systemd on Linux. If the daemon crashes, the sentinel respawns it and reconciles any drift between what you asked to watch and what's actually being watched. It was hard to build the second daemon because it felt like conceding that the core wasn't solid enough, but I didn't want to ship a tool that demanded perfection to deliver on the product promise, so the sentinel is the safety net. Fingers crossed, I haven’t seen it crash in over a week of personal usage on my Mac. But I don't want to trigger "works for me" trauma.

The part I like most: On the UI, I enjoy viewing files through time. You can select a time section and filter your projects on a histogram of activity. That has been invaluable in seeing what the agent was doing. On the CLI, the commands are composable. Everything outputs to stdout so you can pipe it into whatever you want. I use these regularly, and AI agents are better with the tool than I am:

  # What did my config look like before we broke it?
  unf cat nginx.conf --at 1h | nginx -t -c /dev/stdin
  # Grep through a deleted file
  unf cat old-routes.rs --at 2d | grep "pub fn"
  # Count how many lines changed in the last 10 minutes
  unf diff --at 10m | grep '^[+-]' | wc -l
  # Feed the last hour of changes to an AI for review
  unf diff --at 1h | pbcopy
  # Compare two points in time with your own diff tool
  diff <(unf cat app.tsx --at 1h) <(unf cat app.tsx --at 5m)
  # Restore just the .rs files that changed in the last 5 minutes
  unf diff --at 5m --json | jq -r '.changes[].file' | grep '\.rs$' | xargs -I{} unf restore {} --at 5m
  # Watch for changes in real time
  watch -n5 'unf diff --at 30s'

What was new for me: I came to Rust in Nov. 2025, honestly because of HN enthusiasm and some FOMO. No regrets. I enjoy the language enough that I'm now working on custom clippy lints to enforce functional programming practices. This project was also my first Apple-notarized DMG, my first Homebrew tap, and my second Tauri app (first one I've shared).

Install & Usage: > brew install cyrusradfar/unf/unfudged Then unf watch in a directory. unf help covers the details (or ask your agent to coach).
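The snapshot flow described in the post (hash the content, reuse the blob if it already exists, record a metadata row, take a safety snapshot before restoring) could be sketched roughly as follows. This is an illustration in Python, not unf's Rust implementation. Python's standard library has no BLAKE3, so `hashlib.blake2b` stands in for it here. The table layout and the names `open_index`, `snapshot`, and `restore` are hypothetical.

```python
import hashlib
import sqlite3
import time
from pathlib import Path


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata index: one row per snapshot (assumed schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
        "hash TEXT NOT NULL, taken_at REAL NOT NULL)"
    )
    return conn


def snapshot(conn: sqlite3.Connection, store: Path, path: Path) -> str:
    """Record the current content of `path`, writing a blob only if new."""
    data = path.read_bytes()
    # BLAKE2b is a stand-in for BLAKE3, which is not in the stdlib.
    digest = hashlib.blake2b(data, digest_size=32).hexdigest()
    blob = store / digest
    if not blob.exists():  # content-addressed: identical content is stored once
        blob.write_bytes(data)
    conn.execute(
        "INSERT INTO snapshots (path, hash, taken_at) VALUES (?, ?, ?)",
        (str(path), digest, time.time()),
    )
    conn.commit()
    return digest


def restore(conn: sqlite3.Connection, store: Path, path: Path, snapshot_id: int) -> None:
    """Overwrite `path` with an old version, snapshotting the current one first."""
    row = conn.execute(
        "SELECT hash FROM snapshots WHERE id = ? AND path = ?",
        (snapshot_id, str(path)),
    ).fetchone()
    if row is None:
        raise KeyError(f"no snapshot {snapshot_id} for {path}")
    if path.exists():
        snapshot(conn, store, path)  # safety snapshot keeps the restore reversible
    path.write_bytes((store / row[0]).read_bytes())
```

Keying blobs by content hash means that saving a file back to an earlier state adds only a metadata row, not a second copy of the data.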
Thursday, February 26, 2026
Wednesday, February 25, 2026
New top story on Hacker News: Large-Scale Online Deanonymization with LLMs
Large-Scale Online Deanonymization with LLMs
48 by DalasNoin | 71 comments on Hacker News.
Pdf: https://ift.tt/Y2ae3Js (via https://ift.tt/Rl9XOGQ )
Tuesday, February 24, 2026
Monday, February 23, 2026
Sunday, February 22, 2026
Saturday, February 21, 2026
New top story on Hacker News: Claws are now a new layer on top of LLM agents
Claws are now a new layer on top of LLM agents
37 by Cyphase | 387 comments on Hacker News.
https://ift.tt/wGPeM5K
Friday, February 20, 2026
Thursday, February 19, 2026
Wednesday, February 18, 2026
Tuesday, February 17, 2026
New top story on Hacker News: Show HN: I taught LLMs to play Magic: The Gathering against each other
Show HN: I taught LLMs to play Magic: The Gathering against each other
33 by GregorStocks | 19 comments on Hacker News.
I've been teaching LLMs to play Magic: The Gathering recently, via MCP tools hooked up to the open-source XMage codebase. It's still pretty buggy and I think there's significant room for existing models to get better at it via tooling improvements, but it pretty much works today. The ratings for expensive frontier models are artificially low right now because I've been focusing on cheaper models until I work out the bugs, so they don't have a lot of games in the system.
Monday, February 16, 2026
New top story on Hacker News: Show HN: Maths, CS and AI Compendium
Show HN: Maths, CS and AI Compendium
12 by HenryNdubuaku | 1 comment on Hacker News.
Hey HN, I don’t know who else has the same issue, but: Textbooks often bury good ideas in dense notation, skip the intuition, assume you already know half the material, and get outdated in fast-moving fields like AI. Over the past 7 years of my AI/ML experience, I filled notebooks with intuition-first, real-world-context, no-hand-waving explanations of maths, computing and AI concepts. In 2024, a few friends used these notes to prep for interviews at DeepMind, OpenAI, Nvidia etc. They all got in and currently perform well in their roles. So I'm sharing. This is an open and unconventional textbook covering maths, computing, and artificial intelligence from the ground up. It is for curious practitioners seeking deeper understanding, not just to survive an exam or interview. To ambitious students, early-career folks, or experts in adjacent fields looking to become cracked AI research engineers or progress to a PhD: dig in and let me know your thoughts.
Sunday, February 15, 2026
Saturday, February 14, 2026
Friday, February 13, 2026
New top story on Hacker News: Show HN: Moltis – AI assistant with memory, tools, and self-extending skills
Show HN: Moltis – AI assistant with memory, tools, and self-extending skills
12 by fabienpenso | 2 comments on Hacker News.
Hey HN. I'm Fabien, principal engineer, 25 years shipping production systems (Ruby, Swift, now Rust). I built Moltis because I wanted an AI assistant I could run myself, trust end to end, and make extensible in the Rust way using traits and the type system. It shares some ideas with OpenClaw (same memory approach, Pi-inspired self-extension) but is Rust-native from the ground up. The agent can create its own skills at runtime. Moltis is one Rust binary, 150k lines, ~60MB, web UI included. No Node, no Python, no runtime deps. Multi-provider LLM routing (OpenAI, local GGUF/MLX, Hugging Face), sandboxed execution (Docker/Podman/Apple Containers), hybrid vector + full-text memory, MCP tool servers with auto-restart, and multi-channel (web, Telegram, API) with shared context. MIT licensed. No telemetry phoning home, but full observability built in (OpenTelemetry, Prometheus). I've included 1-click deploys on DigitalOcean and Fly.io, but since a Docker image is provided you can easily run it on your own servers as well. I've written before about owning your content ( https://ift.tt/JsFUCj0 ) and owning your email ( https://ift.tt/suJipMO ). Same logic here: if something touches your files, credentials, and daily workflow, you should be able to inspect it, audit it, and fork it if the project changes direction. It's alpha. I use it daily and I'm shipping because it's useful, not because it's done. Longer architecture deep-dive: https://ift.tt/go689vA... Happy to discuss the Rust architecture, security model, or local LLM setup. Would love feedback.
New top story on Hacker News: Dario Amodei – "We are near the end of the exponential"
Dario Amodei – "We are near the end of the exponential"
31 by danielmorozoff | 35 comments on Hacker News.
Thursday, February 12, 2026
New top story on Hacker News: Show HN: Pgclaw – A "Clawdbot" in every row with 400 lines of Postgres SQL
Show HN: Pgclaw – A "Clawdbot" in every row with 400 lines of Postgres SQL
11 by calebhwin | 7 comments on Hacker News.
Hi HN, Been hacking on a simple way to run agents entirely inside of a Postgres database, "an agent per row". Things you could build with this:
* Your own agent orchestrator
* A personal assistant with time travel
* (more things I can't think of yet)
Not quite there yet but thought I'd share it in its current state.
Wednesday, February 11, 2026
Tuesday, February 10, 2026
Monday, February 9, 2026
New top story on Hacker News: Discord will require a face scan or ID for full access next month
Discord will require a face scan or ID for full access next month
164 by x01 | 209 comments on Hacker News.
https://ift.tt/bT5NEJP... https://ift.tt/SXwWAdp...
Sunday, February 8, 2026
Saturday, February 7, 2026
Friday, February 6, 2026
New top story on Hacker News: Show HN: I spent 4 years building a UI design tool with only the features I use
Show HN: I spent 4 years building a UI design tool with only the features I use
2 by vecti | 0 comments on Hacker News.
Hello everyone! I'm a solo developer who's been doing UI/UX work since 2007. Over the years, I watched design tools evolve from lightweight products into bloated, feature-heavy platforms. I kept finding myself using a small fraction of the features while the rest mostly got in the way. So a few years ago I set out to build the design tool I wanted. I built Vecti with only what I actually need: pixel-perfect grid snapping, a performant canvas renderer, shared asset libraries, and export/presentation features. No collaborative whiteboarding. No plugin ecosystem. No enterprise features. Just the design loop. Four years later, I can proudly show it off. Built and hosted in the EU under European privacy regulations. Free tier available (no credit card, one editor forever). On privacy: I use some basic analytics (page views, referrers) but zero tracking inside the app itself. No session recordings, no behavior analytics, no third-party scripts beyond the essentials. If you're a solo designer or small team who wants a tool that stays out of your way, I'd genuinely appreciate your feedback: https://vecti.com Happy to answer questions about the tech stack, architecture decisions, why certain features didn't make the cut, or what's next.
New top story on Hacker News: Show HN: Daily-updated database of malicious browser extensions
Show HN: Daily-updated database of malicious browser extensions
4 by toborrm9 | 2 comments on Hacker News.
Hey HN, I built an automated system that tracks malicious Chrome/Edge extensions daily. The database updates automatically by monitoring chrome-stats for removed extensions and scanning security blogs. Currently tracking 1000+ known malicious extensions with extension IDs, names, and dates. I'm working on detection tools (GUI + CLI) to scan locally installed extensions against this database, but wanted to share the raw data first since maintained threat intelligence lists like this are hard to find. The automation runs 24/7 and pushes updates to GitHub. Free to use for research, integration into security tools, or whatever you need. Happy to answer questions about the scraping approach or data collection methods.
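The local scan mentioned above is still in progress in the post, so the following is only a minimal sketch of the idea, not the author's tool. It assumes the database has already been loaded into a set of extension IDs, and it relies on Chrome-style profiles storing each installed extension in a subdirectory named after its 32-character ID (letters a to p). The function names and the directory argument are hypothetical.

```python
import re
from pathlib import Path

# Chrome/Edge extension IDs are 32 characters drawn from the letters a-p.
EXTENSION_ID = re.compile(r"^[a-p]{32}$")


def installed_extension_ids(extensions_dir):
    """Return the IDs of extensions found under a profile's Extensions directory."""
    extensions_dir = Path(extensions_dir)
    if not extensions_dir.is_dir():
        return set()
    return {
        entry.name
        for entry in extensions_dir.iterdir()
        if entry.is_dir() and EXTENSION_ID.match(entry.name)
    }


def find_flagged(extensions_dir, known_malicious):
    """Return installed extension IDs that also appear in the known-malicious set."""
    return sorted(installed_extension_ids(extensions_dir) & set(known_malicious))
```

Usage would look like `find_flagged(profile_dir / "Extensions", malicious_ids)`, where `malicious_ids` comes from whatever format the published list uses; that loading step is left out because the post does not describe the file layout.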
Thursday, February 5, 2026
Wednesday, February 4, 2026
Tuesday, February 3, 2026
New top story on Hacker News: Show HN: PII-Shield – Log Sanitization Sidecar with JSON Integrity (Go, Entropy)
Show HN: PII-Shield – Log Sanitization Sidecar with JSON Integrity (Go, Entropy)
4 by aragoss | 0 comments on Hacker News.
What PII-Shield does: It's a K8s sidecar (or CLI tool) that pipes application logs, detects secrets using Shannon entropy (catching unknown keys like "sk-live-..." without predefined patterns), and redacts them deterministically using HMAC.
Why deterministic? So that "pass123" always hashes to the same "[HIDDEN:a1b2c]", allowing QA/Devs to correlate errors without seeing the raw data.
Key features:
1. JSON Integrity: It parses JSON, sanitizes values, and rebuilds it. It guarantees valid JSON output for your SIEM (ELK/Datadog).
2. Entropy Detection: Uses context-aware entropy analysis to catch high-randomness strings.
3. Fail-Open: Designed as a transparent pipe wrapper to preserve app uptime.
The project is open-source (Apache 2.0).
Repo: https://ift.tt/gzwqx75
Docs: https://pii-shield.gitbook.io/docs/
I'd love your feedback on the entropy/threshold logic!
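To make the entropy-plus-HMAC idea concrete, here is a rough Python sketch of the general technique. It is not PII-Shield's code (the project is written in Go and describes its detection as context-aware); this version uses plain per-character Shannon entropy with a hypothetical threshold of 4.0 bits and a minimum length of 16, both chosen only for illustration. The `[HIDDEN:...]` tag format follows the example in the post; truncating the digest to five hex characters is likewise an assumption.

```python
import hashlib
import hmac
import json
import math
from collections import Counter


def shannon_entropy(text):
    """Bits per character of the string's empirical character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def redact(value, key):
    """Deterministic tag: the same value and key always give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[HIDDEN:{digest[:5]}]"


def sanitize(obj, key, threshold=4.0, min_len=16):
    """Walk parsed JSON and redact long, high-entropy string values."""
    if isinstance(obj, dict):
        return {k: sanitize(v, key, threshold, min_len) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(v, key, threshold, min_len) for v in obj]
    if isinstance(obj, str) and len(obj) >= min_len and shannon_entropy(obj) >= threshold:
        return redact(obj, key)
    return obj


def sanitize_line(line, key):
    """Parse, sanitize, and re-serialize one log line; pass non-JSON through unchanged."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return line  # fail-open: never block the log stream
    return json.dumps(sanitize(parsed, key))
```

Because the output is re-serialized with `json.dumps`, a JSON line in always yields a valid JSON line out. A short, low-entropy value such as "pass123" would not be caught by this simplified detector at all, which is presumably where the project's context-aware analysis comes in.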
New top story on Hacker News: The next steps for Airbus' big bet on open rotor engines
The next steps for Airbus' big bet on open rotor engines
16 by CGMthrowaway | 13 comments on Hacker News.
Monday, February 2, 2026
New top story on Hacker News: Ask HN: Who wants to be hired? (February 2026)
Ask HN: Who wants to be hired? (February 2026)
35 by whoishiring | 82 comments on Hacker News.
Share your information if you are looking for work. Please use this format:
Location:
Remote:
Willing to relocate:
Technologies:
Résumé/CV:
Email:
Please only post if you are personally looking for work. Agencies, recruiters, job boards, and so on, are off topic here. Readers: please only email these addresses to discuss work opportunities. There's a site for searching these posts at https://ift.tt/tO1Covp .
Sunday, February 1, 2026
New top story on Hacker News: Show HN: Voiden – an offline, Git-native API tool built around Markdown
Show HN: Voiden – an offline, Git-native API tool built around Markdown
5 by dhruv3006 | 1 comment on Hacker News.
Hi HN, We have open-sourced Voiden. Most API tools are built like platforms. They are heavy because they optimize for accounts, sync, and abstraction, not for simple, local API work. Voiden treats API tooling as files. It’s an offline-first, Git-native API tool built on Markdown, where specs, tests, and docs live together as executable Markdown in your repo. Git is the source of truth. No cloud. No syncing. No accounts. No telemetry. Just Markdown, Git, hotkeys, and your damn specs. Voiden is extensible via plugins (including gRPC and WSS). Repo: https://ift.tt/3RH80LI Download Voiden here: https://ift.tt/o0M9HA1 We'd love feedback from folks tired of overcomplicated and bloated API tooling!
Saturday, January 31, 2026
Friday, January 30, 2026
Thursday, January 29, 2026
Wednesday, January 28, 2026
Tuesday, January 27, 2026
New top story on Hacker News: A few random notes from Claude coding quite a bit last few weeks
A few random notes from Claude coding quite a bit last few weeks
21 by bigwheels | 18 comments on Hacker News.
https://ift.tt/lqOLIdx