Thursday, June 11, 2026

New top story on Hacker News: Show HN: I built a Red Flag Warning zone-check tool for the East Bay in 48h

Show HN: I built a Red Flag Warning zone-check tool for the East Bay in 48h
8 by vedant28t | 0 comments on Hacker News.
Hey HN. I'm a high schooler in Fremont, CA. Tuesday morning I got a county-wide AC Alert text telling everyone in Alameda County to prepare a go-bag for an East Bay Hills Red Flag Warning that starts tonight at 11 PM. The text went to ~half a million phones. The actual NWS warning polygon only covers East Bay Hills (NWS zone CAZ515). Most people who got the text don't need a go-bag tonight. Some in the hills don't realize how close they are. So I built this tool - https://ift.tt/sLHjuqi mit licensed public github - https://ift.tt/z5Ifhli It does a few things - tells people if they are in the flagged zone, and also provides a way to check if a buddy is in flagged zone and send them a text. Everything without installing an app. I heard back from Oakland Firesafe Council director about a gap in my understanding (and the tool). To my surprise, and through feedback, I realized that you cannot assume that only the flagged area is at risk. Adjacent areas are at risk too! Fires do not follow zone boundaries! I fixed the tool. I built this in 48 hours to close that specific gap: type your address, get a yes/no on whether the NWS polygon covers it, your Genasys evacuation zone, tonight's wind + humidity at your point, a plain-English action checklist, a per-school decision view for East Bay districts, and a one-tap iMessage buddy-check template for a hill-neighbor at 10:30 PM.

New top story on Hacker News: Claude Fable 5: mid-tier results on coding tasks

Claude Fable 5: mid-tier results on coding tasks
26 by bugvader | 0 comments on Hacker News.


New top story on Hacker News: Show HN: A police department for your Claude Code agents

Show HN: A police department for your Claude Code agents
2 by softie123 | 0 comments on Hacker News.


Thursday, June 4, 2026

New top story on Hacker News: Show HN: Cost.dev (YC W21) – making agents cost-aware and cheaper to call

Show HN: Cost.dev (YC W21) – making agents cost-aware and cheaper to call
8 by akh | 1 comments on Hacker News.
We launched Infracost on HN five years ago ( https://ift.tt/1wrugWz ) where our CLI generated cost estimates for infra-as-code, e.g. "this Terraform PR adds $400/mo". The idea was to shift cloud costs (FinOps) left, so engineers get visibility of costs before deployment and make better decisions. Earlier this year we started seeing agent traffic in our logs and it looked like coding agents were calling our CLI. But that CLI wasn't designed with coding agents in mind. We went down a philosophical rabbit hole to see if a CLI is even needed anymore given that Claude, Copilot et al. already follow best practices. Ultimately we decided to create a new CLI from the ground up with coding agents in mind for two reasons: 1. We optimized the CLI for agent callers and cut Claude's output token usage by up to 79% and API cost by up to 67% versus a bare-Claude baseline. We wrote a blog documenting our lessons on optimizing user token usage when designing a CLI, e.g. using predicate flags so the agent doesn't compose jq | python | wc pipelines, output format that strips JSON's redundant field names. The blog is here: https://ift.tt/a3pRmOb... 2. With cloud costs, precision matters. Telling a coding agent "make this Terraform cost-optimized" can be expensive and lossy. You burn tokens loading code and policy context into every conversation. Your agent could make up a price and you wouldn't know because it's difficult to verify that across the ~10M price points that AWS, Azure and Google have. The CLI runs static analysis on the code, uses the latest prices from cloud vendors, and passes that context to the coding agent. So that's what we're launching today - Cost.dev: https://cost.dev/ . - It runs locally. Your code never leaves your machine, you get a fast feedback loop, and you're not burning API calls per character when you want to fetch prices. - The CLI does the deterministic work. Fetching price points, scanning the code, validating fixes. The coding agent does the natural-language part. You don't have to trust the LLM to remember the rules, and can verify it called the right CLI command. - It provides a consistent rule layer across every tool you use. Get cost estimates in your IDE and your coding agent with a single install. We support Claude Code, GitHub Copilot, Cursor, Windsurf, OpenAI Codex, Gemini CLI, as well as IDEs like VS Code and JetBrains Before we keep building more in that direction, I want to sanity-check with HN: is "agents writing IaC in prod" actually a thing yet, or am I betting on a future that's still a year out? I know software developers are using coding agents heavily, but are platform/infra folks doing that for prod too? Also, if you have any feedback on Cost.dev, I'd love to hear it!

New top story on Hacker News: The desperation of NYTimes

The desperation of NYTimes
107 by rozumem | 80 comments on Hacker News.


Wednesday, June 3, 2026

New top story on Hacker News: Launch HN: Hyper (YC P26) – Company brain to power agentic development

Launch HN: Hyper (YC P26) – Company brain to power agentic development
16 by shalinshah | 6 comments on Hacker News.
Hey HN, we’re Shalin & Kanyes, best friends who've been hacking together for 10+yrs, and now founders of Hyper ( https://heyhyper.ai/ ). Hyper is a shared “company brain” that plugs into information flowing inside a company to make AI agents and automations better and ultimately save people time. Models have gotten good enough that they can (mostly) take on long-horizon, complex tasks. We believe the bottleneck now is that these smart-enough models often lack information about your company, which is scattered in people's heads, Slack threads, stale docs, and in back-and-forth convos with AI. MCP is useful for getting some info in front of an agent, but there are problems: (1) Once the session dies, so does the insight, so instead of copy-pasting a whole doc each time you're telling the agent to dig through Drive each time - not much of a win; (2) Even when MCP works, what it gathers isn't comprehensive, because people decide things on a whiteboard, brainstorm out loud, post a little in Slack, and scribble the rest in a doc, which leaves the agent working from partial information; (3) And even if it had everything, it doesn't do the meta-reasoning required to do a great job. If you paste in a Notion doc and it won't learn your design taste or your writing style unless you tell it to, and it won't know why a decision was made or when. As undergrads 5 years ago, we were into the tools-for-thought wave and became power users of Notion, Obsidian, Roam, Anki, real believers in building a second brain. After GPT-3.5 came out we started to realize how much more powerful that second brain could be if an AI could actually read it, because suddenly it would know our backstory, our taste, our preferences, and unlock genuinely new capabilities. That’s why we’re building Hyper. We know it’s not for everybody! But for people who do want to be on the cutting edge, this is a force multiplier that makes agents faster and better. It increases the number of tasks they can do, and how effectively they do them. Hyper works by ingesting everything you give it access to, Docs, Slack, Email, Calendar, Granola, and synthesizes it into a knowledge graph of facts and their relationships with embeddings for semantic search. The memory system we’ve built is hybrid, with two modalities. Episodes are the raw source items kept as the source of truth. Facts are the meaning pulled out of each episode, stored as subject-predicate-object records with a plain summary and timestamps for when the fact was introduced and when it was invalidated (subject=person, predicate=works_at, object=company). Facts form a graph with typed edges between them: X is in tension with Y, A is derived from B, J supersedes K. Every time a new fact comes in we update the facts in its neighborhood, so the graph stays current, and that's how we handle stale information. When "we'll ship Friday" is later contradicted by "we're shipping Monday," the new fact supersedes the old one instead of both looking equally true, and we never auto-discard the superseded version, so you can still ask how you landed on Monday. Every fact carries provenance back to its source and access-control tags for who is allowed to see it. At retrieval we query-expand, then fuse semantic search over embeddings with Postgres full-text search using reciprocal rank fusion, and we only ever evaluate a query against the facts and episodes that person has access to, which means two people on the same team can ask the same question and get different answers. We keep information fresh with webhooks where they exist and polling where they don't, hashing contents to catch changes for sources that don’t handle native dedupe. Agents read and write through two paths: lifecycle hooks in tools like Claude Code, Cowork, Codex, and Cursor, where we inject relevant context on every prompt and pull interesting facts out of every response, and plain MCP tool calls for everything that doesn't expose hooks. We love it! and so do our early users: one CEO uses Hyper to draft emails in his voice with full company context. What took hours/week now takes minutes and gets sharper each time Hyper learns more how he thinks and how his company is changing. Another YC founder one-shotted a launch video script because Hyper already knew their product, voice, positioning accumulated over months. We have a 3-day free trial, explained more on our pricing page ( https://ift.tt/z23S1fc ) and there are more details in our FAQ ( https://heyhyper.ai/faq ), including things like privacy, compliance, and how we’re different from other “memory” companies.. Give it a spin! break it! and tell us where it falls short: https://heyhyper.ai/ . We'd love to build you a 10-star experience :) Comments welcome!

New top story on Hacker News: Gooey: A GPU-accelerated UI framework for Zig

Gooey: A GPU-accelerated UI framework for Zig
22 by ksec | 0 comments on Hacker News.


New top story on Hacker News: Angular v22

Angular v22
20 by Klaster_1 | 4 comments on Hacker News.


New top story on Hacker News: I was recently diagnosed with anti-NMDA receptor encephalitis

I was recently diagnosed with anti-NMDA receptor encephalitis
103 by Tomte | 20 comments on Hacker News.


Tuesday, June 2, 2026

New top story on Hacker News: Show HN: RePlaya – self-hosted browser session replay with live tailing

Show HN: RePlaya – self-hosted browser session replay with live tailing
4 by shikhar | 0 comments on Hacker News.
Hi HN, I'm one of the founders of s2.dev. RePlaya ( https://ift.tt/6yGE3X5 ) is a self-hosted browser session replay tool using rrweb ( https://ift.tt/2oas0K5 ). It occurred to me that a durable stream per session would be a much neater architectural foundation for much of what you'd want from such a tool. As a unique feature, it also made live tailing straightforward because the player can read from the same stream the recorder is appending to. The alternative architecture is likely an ingest firehose which is then indexed, with associated complexity and latency. You'd have to string together multiple data systems like a message queue, a metadata database, and blob storage and/or an OLAP database. Here the only dependency is S2, which has an open source version you can self-host called s2-lite ( https://ift.tt/LOkswu2 ). How it works: - one S2 stream per browser session - large rrweb events (like a full snapshot) get framed across multiple binary S2 records and reassembled on read - active sessions are tailed with an S2 read session, and bridged to the browser over SSE - session listing relies on stream names encoding reverse timestamps, as S2 returns a lexicographic order listing - relying on fencing tokens so a stopped session can't be written to again by a late recorder - retention and GC are handled via S2 stream config, so no background job needed Curious to hear from folks on the tool or the stream-per-session model!

New top story on Hacker News: Larry Ellison: "Citizens will be on their best behavior because we’re recording" (2024)

Larry Ellison: "Citizens will be on their best behavior because we’re recording" (2024)
172 by CharlesW | 86 comments on Hacker News.


New top story on Hacker News: Trump signs downsized AI order after weeks of reversals

Trump signs downsized AI order after weeks of reversals
33 by _alternator_ | 16 comments on Hacker News.
https://ift.tt/aoGUtEY... https://ift.tt/fR0xvYL...

New top story on Hacker News: Rethinking Search as Code Generation

Rethinking Search as Code Generation
20 by 1zael | 1 comments on Hacker News.


Thursday, May 28, 2026

New top story on Hacker News: Endive: A JVM native WebAssembly runtime

Endive: A JVM native WebAssembly runtime
6 by theanonymousone | 1 comments on Hacker News.


New top story on Hacker News: Anthropic raises $65B in Series H funding at $965B post-money valuation

Anthropic raises $65B in Series H funding at $965B post-money valuation
8 by meetpateltech | 1 comments on Hacker News.


New top story on Hacker News: Show HN: Ktx – Open-source executable context layer for data agents

Show HN: Ktx – Open-source executable context layer for data agents
16 by lucamrtl | 3 comments on Hacker News.
Hi HN, we’re open-sourcing ktx. It’s an executable context layer that makes agents reliable on your data stack. We built it after going through the experience of building production-grade data agents for dozens of companies. If you’ve also tried building them, or simply tried using Claude Code or Codex on your data warehouse, you’ll know that accuracy is the #1 issue. Agents are great at generating valid SQL, but it’s not always correct SQL. To cite a few examples of “agents gone wrong”: - Stale column + hidden business rule: when preparing a board report, a finance analyst asks Claude Code for “ARR by customer segment”, it derives ARR from multiple tables (subscriptions, plans, accounts), then groups by accounts.industry. But CC doesn’t know that this industry column was deprecated a few months prior, or that past board reports excluded paused subscriptions from the ARR calculation - Join fanout: a data analyst at a retailer uses their company’s internal agent to prep a product revenue deck for a QBR. The agent joins orders to order_items, then sums orders.total_amount_cents grouped by order_items.product_id. The SQL runs fine, but each order’s revenue is repeated once per line item, which most people will miss if most orders only have 1 item - Missing attribution logic: a marketing analyst asks Codex “Which campaigns drove the most revenue?” Codex joins marketing_touches to users to orders and groups by utm_campaign. But since each order can have multiple touches before purchase, the same order can be credited to first touch, last touch, every touch, or every campaign the user clicked before buying. If the agent chooses the method that doesn’t match the team’s attribution logic, they’ll make suboptimal decisions To solve this at first we gave the agent more context through skills + a wiki-style knowledge base. That gives it some useful extra context but still relies on it writing the SQL without incorrect assumptions. The next solution we explored was implementing a classic semantic layer. That solves the executable part, but they’re such a pain to build and maintain since they were made for legacy BI tools. Plus as a standalone tool, they lack all the useful context from unstructured data sources like internal docs. So we built ktx and split it into 2 parts: 1. Business context goes in Markdown wiki pages that are auto-ingested and auto-populated 2. Queryable definitions go into YAML files that define tables, row grain, joins, measures, dimensions, filters, and filter groups That way, when an agent needs a metric, it asks ktx for a measure, dimensions, filters, and filter groups instead of writing the whole query itself. ktx’s planner chooses the join path, uses grain and relationship metadata, catches issues like join fanout and chasm joins, and compiles the warehouse SQL, while utilizing the extra unstructured knowledge it has access to. ktx is Apache 2.0. It can ingest from most warehouses (BigQuery, Snowflake, Postgres & others), modeling tools (dbt, MetricFlow, LookML), BI tools (Looker, Metabase), doc tools like Notion, and corrections from user interactions. Install manually: npm install -g @kaelio/ktx ktx setup Or give this prompt to your agent: Run npx skills add Kaelio/ktx --skill ktx and use ktx skill to install and configure ktx We’d especially like feedback from people who’ve tried using Claude Code, Codex, or building custom agents on analytics warehouses. Where did they fail? And what did you try to make the answers more reliable?

Wednesday, May 20, 2026

New top story on Hacker News: LoRA and Weight Decay (2023)

LoRA and Weight Decay (2023)
5 by jxmorris12 | 0 comments on Hacker News.


New top story on Hacker News: Cooling copper plates could slash data center energy use by 90%

Cooling copper plates could slash data center energy use by 90%
4 by geox | 0 comments on Hacker News.


New top story on Hacker News: Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend

Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend
11 by philipisik | 3 comments on Hacker News.
Hi HN! I'm Philip, one of the founders of Tiptap. Next to our open-source rich text editor framework, we started developing Hocuspocus about five years ago and open-sourced it too, to solve one of our biggest challenges back then: real-time collaboration in web editors. We found Yjs by Kevin Jahns, a CRDT library that handles concurrent edits without conflicts. Basically, Yjs merges changes from users without conflicts and in real-time. Hocuspocus is the WebSocket server built on top of Yjs. It handles real-time sync, presence/awareness, persistence, and Redis-based scaling. While we use Hocuspocus at Tiptap as the collaboration backend for our cloud services, it also works with any Yjs client (Slate, Quill, Monaco, ProseMirror, or your own setup), and Yjs documents aren't limited to text at all. You can sync any structured data through them, and in the meantime we see projects that rely on Hocuspocus without using the Tiptap editor. We released Hocuspocus v4 under the MIT license a few weeks ago, and the biggest change is that it's no longer tied to Node. The previous versions depended on the ws package, which meant you couldn't run Hocuspocus on Bun, Deno, or Cloudflare Workers. We moved to crossws, a universal websocket adapter, so the same server now runs on Node, Bun, Deno, Cloudflare Workers, and Node with uWebSockets. That also lets you run collaboration at the edge. The other changes are smaller but are important if you're using Hocuspocus in production: 1. Every core class and hook payload takes a generic Context type now, so the auth/session shape you build in onAuthenticate flows through every other hook with full type safety (defaults to any so existing code doesn't break). 2. Document updates are now processed sequentially per connection through an internal queue, which fixes a correctness bug where async hooks could cause CRDT updates to apply out of order under load. 3. Transaction origins are structured objects now with a source field instead of raw values and there's an isTransactionOrigin() helper for narrowing. 4. Hook payloads use web-standard Request and Headers instead of Node's IncomingMessage. 5. The wire protocol is backward compatible in both directions, so you can roll out servers and providers independently. If you want to test Hocuspocus: npm install @hocuspocus/server @hocuspocus/provider Docs at: https://ift.tt/ATQLu6l Source at: https://ift.tt/R0XxyU9 Because running real-time collaboration on Workers or Durable Objects is new in v4, that's the use case we'd most like to hear your questions and feedback on.

Tuesday, May 12, 2026

New top story on Hacker News: Show HN: Statewright – Visual state machines that make AI agents reliable

Show HN: Statewright – Visual state machines that make AI agents reliable
8 by azurewraith | 0 comments on Hacker News.
Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. I'm Ben Cochran, I spent 20+ years in the trenches with full-stack Engineering, DevOps, high performance computing & ML with stints at NVIDIA, AMD and various other organizations most recently as a Distinguished Engineer. For agents to work reliably you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger? I took a different approach by using smaller models: models in the 13-20B parameter range and set them to task solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and what transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write friendly bash tools. The testing state gets bash but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. It is enforced via protocol, not via prompts. The results were more promising than I would have expected. Across multiple model families irrespective of age (qwen-coder, gpt-oss, gemma4) and the improvements were consistent above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/2foS1dF Surprisingly this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight and Opus solves more reliably with fewer tokens and death spirals. Fine tuning did not yield these kinds of functional improvements for me. The takeaway it seems is that context window utilization matters more than raw context size - a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining LLMs which are non-idempotent by using deterministic code is a pattern that nobody is currently talking about. So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. Its orchestration doesn't use an LLM, just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly it tells the model when it's attempting to do something that isn't in scope, incorrect or when it needs to try something else after getting stuck. You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view... You can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs. Statewright is currently live with a free tier, try it out in Claude Code by running the following: /plugin marketplace add statewright/statewright /plugin install statewright /reload-plugins Then "start the bugfix workflow" or /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain -- paste the API key again and say you really mean it, Claude is just being cautious here. Feedback is welcome on the workflow editor, the plugin experience, and tell me what workflows you'd want to build first. Agents are suggestions, states are laws.

Thursday, May 7, 2026

New top story on Hacker News: Brazil's Pix Payment System Faces Pressure from Visa and Mastercard

Brazil's Pix Payment System Faces Pressure from Visa and Mastercard
9 by wslh | 1 comments on Hacker News.


New top story on Hacker News: Show HN: Stage CLI – an easier way of reading your AI generated changes locally

Show HN: Stage CLI – an easier way of reading your AI generated changes locally
8 by cpan22 | 2 comments on Hacker News.
Hey HN! We're Charles and Dean. A few weeks ago we posted about Stage, a code review tool that guides you through reading a PR step by step - https://ift.tt/Kf1pD4I . We got a lot of great feedback but also heard from many people that they wanted to have the chapters experience even before opening a PR… so we built the Stage CLI as the local, open-source version that anyone can try. Here’s a quick demo video: https://ift.tt/qWrjyoQ It works with any coding agent of your choice. The skill instructs the agent to read your current branch’s changes, break them down into separate logical chapters, and open them in a local browser. We’ve found that reading changes this way is a lot easier for us than reading them in an IDE or other similar CLI tools, which present diffs to you in repository tree order. You can see a few examples of what it feels like here: https://ift.tt/pY0zi1k . Try it out and let us know what you think! Would love to hear any feedback :)

New top story on Hacker News: Printing Blogs

Printing Blogs
11 by fi-le | 1 comments on Hacker News.


Saturday, April 18, 2026

New top story on Hacker News: Show HN: AI Subroutines – Run automation scripts inside your browser tab

Show HN: AI Subroutines – Run automation scripts inside your browser tab
7 by arjunchint | 0 comments on Hacker News.
We built AI Subroutines in rtrvr.ai. Record a browser task once, save it as a callable tool, replay it at: zero token cost, zero LLM inference delay, and zero mistakes. The subroutine itself is a deterministic script composed of discovered network calls hitting the site's backend as well as page interactions like click/type/find. The key architectural decision: the script executes inside the webpage itself, not through a proxy, not in a headless worker, not out of process. The script dispatches requests from the tab's execution context, so auth, CSRF, TLS session, and signed headers get added to all requests and propagate for free. No certificate installation, no TLS fingerprint modification, no separate auth stack to maintain. During recording, the extension intercepts network requests (MAIN-world fetch/XHR patch + webRequest fallback). We score and trim ~300 requests down to ~5 based on method, timing relative to DOM events, and origin. Volatile GraphQL operation IDs are detected and force a DOM-only fallback before they break silently on the next run. The generated code combines network calls with DOM actions (click, type, find) in the same function via an rtrvr.* helper namespace. Point the agent at a spreadsheet of 500 rows and with just one LLM call parameters are assigned and 500 Subroutines kicked off. Key use cases: - record sending IG DM, then have reusable and callable routine to send DMs at zero token cost - create routine getting latest products in site catalog, call it to get thousands of products via direct graphql queries - setup routine to file EHR form based on parameters to the tool, AI infers parameters from current page context and calls tool - reuse routine daily to sync outbound messages on LinkedIn/Slack/Gmail to a CRM using a MCP server We see the fundamental reason that browser agents haven't taken off is that for repetitive tasks going through the inference loop is unnecessary. Better to just record once, and get the LLM to generate a script leveraging all the possible ways to interact with a site and the wider web like directly calling backed API's, interacting with the DOM, and calling 3P tools/APIs/MCP servers.

New top story on Hacker News: Traders placed over $1B in perfectly timed bets on the Iran war

Traders placed over $1B in perfectly timed bets on the Iran war
13 by trocado | 3 comments on Hacker News.


New top story on Hacker News: Graphs That Explain the State of AI in 2026

Graphs That Explain the State of AI in 2026
6 by bryanrasmussen | 1 comments on Hacker News.


Tuesday, April 14, 2026

New top story on Hacker News: Show HN: A memory database that forgets, consolidates, and detects contradiction

Show HN: A memory database that forgets, consolidates, and detects contradiction
9 by pranabsarkar | 2 comments on Hacker News.
Vector databases store memories. They don't manage them. After 10k memories, recall quality degrades because there's no consolidation, no forgetting, no conflict resolution. Your AI agent just gets noisier. YantrikDB is a cognitive memory engine — embed it, run it as a server, or connect via MCP. It thinks about what it stores: consolidation collapses duplicate memories, contradiction detection flags incompatible facts, temporal decay with configurable half-life lets unimportant memories fade like human memory does. Single Rust binary. HTTP + binary wire protocol. 2-voter + 1-witness HA cluster via Docker Compose or Kubernetes. Chaos-tested failover, runtime deadlock detection (parking_lot), per-tenant quotas, Prometheus metrics. Ran a 42-task hardening sprint last week — 1178 core tests, cargo-fuzz targets, CRDT property tests, 5 ops runbooks. Live on a 3-node Proxmox homelab cluster with multiple tenants. Alpha — primary user is me, looking for the second one.

Thursday, April 9, 2026

New top story on Hacker News: Little Snitch comes to Linux, but the core logic is closed source

Little Snitch comes to Linux, but the core logic is closed source
12 by TheIPW | 3 comments on Hacker News.


New top story on Hacker News: ChatGPT Pro now starts at $100/month

ChatGPT Pro now starts at $100/month
44 by strongpigeon | 33 comments on Hacker News.


New top story on Hacker News: Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

Research-Driven Agents: What Happens When Your Agent Reads Before It Codes
14 by hopechong | 2 comments on Hacker News.


New top story on Hacker News: Where does all the milk go?

Where does all the milk go?
12 by DiffTheEnder | 4 comments on Hacker News.


New top story on Hacker News: FreeBSD Laptop Compatibility: Top Laptops to Use with FreeBSD

FreeBSD Laptop Compatibility: Top Laptops to Use with FreeBSD
24 by fork-bomber | 5 comments on Hacker News.


New top story on Hacker News: Introduction to Nintendo DS Programming

Introduction to Nintendo DS Programming
3 by medbar | 0 comments on Hacker News.


Monday, April 6, 2026

New top story on Hacker News: Smart people recognize each other – science proves it

Smart people recognize each other – science proves it
24 by 01-_- | 9 comments on Hacker News.


New top story on Hacker News: Show HN: GovAuctions lets you browse government auctions at once

Show HN: GovAuctions lets you browse government auctions at once
20 by player_piano | 13 comments on Hacker News.
I've long been into finding deals on government auction sites (seizures, surplus sales etc.) - right now for example San Diego DHS is selling 26 tons of lead shot, with bidding starting at $1,000 ¯\_(ツ)_/¯ It has historically been extremely tedious though: scanning dozens of janky sites which have interminable page loading times; back buttons take you all the way back to the homepage etc. The site I built - GovAuctions - lets you search every government surplus auction at once. You can filter by location, category, and price, save items to a watchlist, and get alerts when new auctions match what you're looking for. Let me know what you think, if you have any suggestions, and if you find any deals in your area!

Sunday, March 29, 2026

Friday, March 27, 2026

New top story on Hacker News: Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)

Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)
15 by supai | 9 comments on Hacker News.
Hi HN. I'm Ken, a 20-year-old Stanford CS student. I built Sup AI. I started working on this because no single AI model is right all the time, but their errors don’t strongly correlate. In other words, models often make unique mistakes relative to other models. So I run multiple models in parallel and synthesize the outputs by weighting segments based on confidence. Low entropy in the output token probability distributions correlates with accuracy. High entropy is often where hallucinations begin. My dad Scott (AI Research Scientist at TRI) is my research partner on this. He sends me papers at all hours, we argue about whether they actually apply and what modifications make sense, and then I build and test things. The entropy-weighting approach came out of one of those conversations. In our eval on Humanity's Last Exam, Sup scored 52.15%. The best individual model in the same evaluation run got 44.74%. The relative gap is statistically significant (p < 0.001). Methodology, eval code, data, and raw results: - https://sup.ai/research/hle-white-paper-jan-9-2026 - https://github.com/supaihq/hle Limitations: - We evaluated 1,369 of the 2,500 HLE questions (details in the above links) - Not all APIs expose token logprobs; we use several methods to estimate confidence when they don't We tried offering free access and it got abused so badly it nearly killed us. Right now the sustainable option is a $5 starter credit with card verification (no auto-charge). If you don't want to sign up, drop a prompt in the comments and I'll run it myself and post the result. Try it at https://sup.ai . My dad Scott (@scottmu) is in the thread too. Would love blunt feedback, especially where this really works for you and where it falls short. Here's a short demo video: https://www.youtube.com/watch?v=DRcns0rRhsg

Saturday, March 21, 2026

New top story on Hacker News: Former FBI Director Robert Mueller Has Died

Former FBI Director Robert Mueller Has Died
26 by WarOnPrivacy | 5 comments on Hacker News.


New top story on Hacker News: Show HN: Joonote – A note-taking app on your lock screen and notification panel

Show HN: Joonote – A note-taking app on your lock screen and notification panel
7 by kilgarenone | 0 comments on Hacker News.
I finally built this app after many years of being sick of unlocking my phone every goddamn time I need to take or view my notes. It particularly sucks when I'm doing my grocery and going down the list. I started building last year June. This is a native app written in Kotlin. And since I'm a 100% Web dev guy, I gotta say this wouldn't have been possible without this AI to assist me. So this isn't "vibe-coded". I simply used the chat interface in Gemini website, manually copy paste codes to build and integrate every single thing in the app! I used gemini to build it just because I was piggybacking on my last company's enterprise subscription. I personally didn't subscribe to any AI (and still don't cuz the free quota seems enough for me :) So I certainly have learnt alot about Android development, architecture patterns, Kotlin syntax, and obeying Google's whims. Can't say I love it all, but for the sake of this app, I will :) Anyway, I finally have the app I wish existed, and I'm using it everyday. It not only does the main thing I needed it to do, but there's also all this stuff: - Make your notes private if you don't want to show them on lock screen. - Create check/to-do lists. - Set one time or recurring reminders. - Full-text search your notes in the app. - Speech-to-text. - Organize your notes with custom or color labels. - Pin the app as a widget on your home screen. - You can auto backup and restore your notes on new install or Android device. - Works offline. - And no funny business happening in the background https://ift.tt/0on3IWz It's 30-day trial, then a one-time $9.99 to go Pro forever. I would love you all to check it out, FWIW. Ok thanks!

Thursday, March 19, 2026

New top story on Hacker News: Android: Balancing Openness and Choice with Safety

Android: Balancing Openness and Choice with Safety
14 by 0xedb | 2 comments on Hacker News.


New top story on Hacker News: Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
28 by hopechong | 6 comments on Hacker News.


New top story on Hacker News: Love of corporate bullshit is correlated with bad judgment

Love of corporate bullshit is correlated with bad judgment
40 by hn_acker | 7 comments on Hacker News.


New top story on Hacker News: Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff

Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff
5 by axotopia | 2 comments on Hacker News.
I run a building design consultancy. I got tired of paying Wix $40/month for a brochure that couldn’t answer simple service questions, and me wasting hours on the same FAQs. So I killed it all and spent 4 months building a 'talker': https://axoworks.com The stack is completely duct-taped: Netlify’s 10s serverless timeout forced me to split the agent into three pieces: Brain (Edge), Hands (Browser), and Voice (Edge). I haven’t coded in 30 years. This was 3 steps forward, 2 steps back, heavily guided by AI. The fight that proved it worked: 2 weeks ago, a licensed architect attacked the bot, trying to prove my business model harms the profession. The AI (DeepSeek-R3) completely dismantled his arguments. It was hilariously caustic. Log: https://ift.tt/82iQkYn... A few battle scars: * Web Speech API works fine, right up until someone speaks Chinese without toggling the language mode. Then it forcefully spits out English phonetic gibberish. Still a headache. * Liability is the killer. Hallucinate a building code clause? We’re dead. Insurance won’t touch us. * We publish the audit logs to keep ourselves honest and make sure the system stays hardened. Audit: https://ift.tt/4sTxpUE The hardest part was getting the intent right: making one LLM pivot seamlessly from a warm principal’s tone with a homeowner, to a defensive bulldog when attacked by a peer. That took 2.5 months of tuning. We burn through tokens with an 'Eager RAG' hack (pre-fetching guesses) just to improve responsiveness. I also ripped out the “essential” persistent DBs—less than 5% of visitors ever return, so why bother? If a client drops mid-query, their session vanishes. No server-side queues. The point: To let me operate with a network of seasoned pros, and trim the fat. Try to break it. I’ll be in the comments. Kee

Wednesday, March 18, 2026

Sunday, March 15, 2026

New top story on Hacker News: Grandparents are glued to their phones, families are worried [video]

Grandparents are glued to their phones, families are worried [video]
68 by tartoran | 32 comments on Hacker News.


New top story on Hacker News: SuperTux 0.7.0

SuperTux 0.7.0
20 by pentagrama | 1 comments on Hacker News.


New top story on Hacker News: Ask HN: How is AI-assisted coding going for you professionally?

Ask HN: How is AI-assisted coding going for you professionally?
18 by svara | 6 comments on Hacker News.
Comment sections on AI threads tend to split into "we're all cooked" and "AI is useless." I'd like to cut through the noise and learn what's actually working and what isn't, from concrete experience. If you've recently used AI tools for professional coding work, tell us about it. What tools did you use? What worked well and why? What challenges did you hit, and how (if at all) did you solve them? Please share enough context (stack, project type, team size, experience level) for others to learn from your experience. The goal is to build a grounded picture of where AI-assisted development actually stands in March 2026, without the hot air.

New top story on Hacker News: In Memoriam: John W. Addison, my PhD advisor

In Memoriam: John W. Addison, my PhD advisor
9 by herodotus | 0 comments on Hacker News.


Sunday, March 8, 2026

New top story on Hacker News: WSL Manager

WSL Manager
9 by gballan | 2 comments on Hacker News.


New top story on Hacker News: Show HN: Skir – like Protocol Buffer but better

Show HN: Skir – like Protocol Buffer but better
6 by gepheum | 1 comments on Hacker News.
Why I built Skir: https://ift.tt/JtT8yro... Quick start: npx skir init All the config lives in one YML file. Website: https://skir.build GitHub: https://ift.tt/6jspZtY Would love feedback especially from teams running mixed-language stacks.

Friday, March 6, 2026

New top story on Hacker News: Show HN: Claude-replay – A video-like player for Claude Code sessions

Show HN: Claude-replay – A video-like player for Claude Code sessions
13 by es617 | 7 comments on Hacker News.
I got tired of sharing AI demos with terminal screenshots or screen recordings. Claude Code already stores full session transcripts locally as JSONL files. Those logs contain everything: prompts, tool calls, thinking blocks, and timestamps. I built a small CLI tool that converts those logs into an interactive HTML replay. You can step through the session, jump through the timeline, expand tool calls, and inspect the full conversation. The output is a single self-contained HTML file — no dependencies. You can email it, host it anywhere, embed it in a blog post, and it works on mobile. Repo: https://ift.tt/kVc30Zt Example replay: https://es617.github.io/assets/demos/peripheral-uart-demo.ht...

Tuesday, March 3, 2026

New top story on Hacker News: Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act

Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act
12 by systima | 0 comments on Hacker News.
EU legislation (which affects UK and US companies in many cases) requires being able to truly reconstruct agentic events. I've worked in a number of regulated industries off & on for years, and recently hit this gap. We already had strong observability, but if someone asked me to prove exactly what happened for a specific AI decision X months ago (and demonstrate that the log trail had not been altered), I could not. The EU AI Act has already entered force, and its Article 12 kicks-in in August this year, requiring automatic event recording and six-month retention for high-risk systems, which many legal commentators have suggested reads more like an append-only ledger requirement than standard application logging. With this in mind, we built a small free, open-source TypeScript library for Node apps using the Vercel AI SDK that captures inference as an append-only log. It wraps the model in middleware, automatically logs every inference call to structured JSONL in your own S3 bucket, chains entries with SHA-256 hashes for tamper detection, enforces a 180-day retention floor, and provides a CLI to reconstruct a decision and verify integrity. There is also a coverage command that flags likely gaps (in practice omissions are a bigger risk than edits). The library is deliberately simple: TS, targeting Vercel AI SDK middleware, S3 or local fs, linear hash chaining. It also works with Mastra (agentic framework), and I am happy to expand its integrations via PRs. Blog post with link to repo: https://ift.tt/X2v3YSm I'd value feedback, thoughts, and any critique.

Saturday, February 28, 2026

New top story on Hacker News: Show HN: Tomoshibi – A writing app where your words fade by firelight

Show HN: Tomoshibi – A writing app where your words fade by firelight
11 by hakumei | 5 comments on Hacker News.
I spent ten years trying to write a novel. Every time I sat down, I'd write a sentence, decide it wasn't good enough, and rewrite it. The problem wasn't discipline — it was that I could always see what I'd written and go back to change it. I tried other approaches. Apps that delete your words when you stop typing — they fight fear with fear. That just made me panic. I wanted the opposite: not punishment, but permission. "Tomoshibi" is Japanese for a small light in the dark — just enough to see what's in front of you. You write on a dark screen. Older lines fade, but not when you hit return. They fade when you start writing again. If you pause, they wait. You can edit the current line and one line back — enough to fix a typo, not enough to spiral. The one-line-back rule also catches my own practical issue: Japanese IME often fires an accidental newline on kanji confirmation. Everything is saved. There's a separate reader view for going back through what you've written. Tomoshibi is for writing over months, not just one session. When you come back, your last sentence appears as an epigraph — as if it always belonged there. No account, no server, no build step. Your writing stays in your browser's local storage — export anytime as .txt. Vanilla HTML/CSS/ES modules. Try it in your browser. A native Mac app (built with Tauri) with file system integration is coming to the store. I've been writing on it for two months. https://ift.tt/Cx54l1w

New top story on Hacker News: Werner Herzog Between Fact and Fiction

Werner Herzog Between Fact and Fiction
5 by Hooke | 0 comments on Hacker News.