steve_adams_86 20 hours ago [-]
I was just wishing something like this existed last week. What timing.
I'm piping sensor readings into duckdb with a deno server, and couldn't use duckdb -ui to look over the data without shutting down the server. I had no interest in using the server to allow me to look at the contents of the db, so I was just going to live with it for now. This perfectly solves that, along with several other similar kinds of problems I've encountered with duckdb.
duckdb is my favourite technology of 2025/26. It has worked its way into so many of my workflows. It's integral to how I work with LLMs, how I store all kinds of data, analytics, data pipelines... I love it.
malnourish 10 hours ago [-]
Can you expand more on how you use it in your workflows? I'm very interested but I haven't incorporated it into my problem solving mindset yet so I don't even know what use cases I could map to it.
rglover 1 days ago [-]
This is rad. I've been eyeballing using DuckDB in my firm's internal app framework and this just solved the "but how do I horizontally scale this" problem. Kudos to the DuckDB folks. Love "Quack" for the protocol name, too.
smithclay 17 hours ago [-]
Been working on open-source projects involving storing and querying observability data (metrics, logs, traces) in parquet[0] and have been frustrated with the usability of Apache Iceberg … despite strongly agreeing and wanting to use an open storage format and catalog.
This makes Ducklake much more interesting for my use case; excited to see where this is going.
That said… I think duckdb/ducklake/quack could potentially be a future replacement for Mimir or Clickstack with way less operational complexity.
[0] https://github.com/smithclay/duckdb-otlp
simlevesque 1 days ago [-]
I like DuckDB but I'm not sure what it wants to be. There's always new ways to use it and it's not easy to see what's the right one.
wenc 1 days ago [-]
DuckDB is both a standalone and a component. This effort is actually very coherent and brings it back into a familiar usage model — that of a traditional client server RDBMS.
RDBMS have always been multi-user concurrent systems. DuckDB is a very fast local engine that has a multitude of use cases because it is embeddable in other systems.
It’s like asking what SQLite wants to be. It’s in your phones, your browsers, your desktop apps, IoT devices, and people have extended it in different directions. The only difference here is that this is first party, not third party. But to me it’s a very legible move.
philipallstar 12 hours ago [-]
If SQLite added a protocol and client/server code to talk to other SQLites, it might get similar questions.
simlevesque 19 hours ago [-]
SQLite isn't a moving target like DuckDB is. Its scope is very well defined.
I'm not knocking Quack or DuckDB but I'm starting to get a bit confused.
wenc 9 hours ago [-]
But why though? DuckDB can still be used as a local query engine — I still use it as that. I haven’t touched any of the DuckLake stuff and the duckdb cli and Python library are still my bread and butter. They can add new use cases, but it doesn’t affect the core engine.
Is the concern that the duckdb messaging is now diluted by it having all these extra features? That you can’t sell it to friends as “this thing” like you can a one use tool like curl? I get that, but I also feel that duckdb is so much bigger than a “do one thing and do it well” tool.
It’s an engine that drives the modern data tool stack. DuckDB’s team has been prescient in that it has made many tasteful bets on what users want: the ability to interop with pandas and polars, the addition of geospatial, the plug-in infra. They’re all optional, but when you need these things, they’re so useful. They’ve also clued me into what the broader data world is thinking about (I didn’t know about sketches and Hilbert curves, but those are so useful in probabilistic large-scale queries and in geospatial queries). And they exist in larger database systems like Redshift too.
So far duckdb’s bets have been tasteful, and mostly ignorable if you don’t happen to use them.
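As an aside for anyone wondering about the Hilbert mention: a toy transcription of the classic xy2d routine (not DuckDB's implementation) shows the locality property that makes Hilbert ordering useful for spatial data:

```python
# Toy Hilbert-curve index: maps a point (x, y) on an n x n grid
# (n a power of two) to its position along the curve. Nearby points
# tend to get nearby positions, which is why sorting spatial data by
# Hilbert index speeds up range queries.
def hilbert_index(n, x, y):
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The first four cells of a 4x4 curve are visited in adjacent order.
print([hilbert_index(4, x, y) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])
```

Adjacent grid cells get consecutive indices here, which row-major ordering would not give you.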
fastasucan 16 hours ago [-]
Just find the one that is right for you.
whalesalad 1 days ago [-]
Our data pipeline produces .duckdb files that our app downloads (it watches the asset in S3 and pulls when the ETag changes). Makes it easy to get BQ/ClickHouse-like performance without running or paying for that infrastructure. Not perfect for all cases, but it handles a lot more than you would expect.
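The watch-and-pull loop can be sketched like this (`fetch_head`, `download`, and the paths are hypothetical stand-ins for real S3 HEAD/GET calls, not any actual API):

```python
# Sketch of ETag-based change detection: poll the object's metadata and
# re-download only when the ETag differs from the last one we saw.
def sync_if_changed(fetch_head, download, state):
    etag = fetch_head()["ETag"]
    if etag != state.get("etag"):
        state["db_path"] = download()   # pull the new .duckdb file
        state["etag"] = etag
        return True                     # caller should re-open the database
    return False

# Simulated remote: the ETag changes once between polls.
etags = iter(["v1", "v1", "v2"])
state = {}
changed = [sync_if_changed(lambda: {"ETag": next(etags)},
                           lambda: "/tmp/analytics.duckdb", state)
           for _ in range(3)]
print(changed)  # first poll downloads, second is a no-op, third re-downloads
```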
duzer65657 1 days ago [-]
this is a great use-case for duckdb, but not sure how it maps to the use of this protocol?
esafak 22 hours ago [-]
Roughly how big are the datasets?
whalesalad 21 hours ago [-]
~30GB .duckdb file
slotix 1 days ago [-]
I read it less as "DuckDB wants to become Postgres" and more as DuckDB becoming an execution layer inside bigger workflows.
The engine is often not the painful part anymore. The pain is the stuff around it: live DBs, S3 paths, Parquet files, credentials, repeatable runs, exports, validation, and the moment a one-off script quietly becomes infrastructure.
Quack makes the remote/server part cleaner, but the bigger trend seems to be DuckDB becoming the SQL layer inside tools, not necessarily the final user-facing tool.
Lemaxoxo 1 days ago [-]
+1
I can't think of many use cases for this and Arrow Flight, other than moving data around.
twoodfin 1 days ago [-]
The use case is local user DuckDB talking to MotherDuck for $.
This is not commercially a terrible idea. Why keep paying Snowflake for bog-standard SQL query workload when SF makes it easy to migrate to Iceberg & commodity engines like MotherDuck?
szarnyasg 1 days ago [-]
Hello, DuckDB DevRel here. Quack is independent from MotherDuck. MotherDuck has its own proprietary protocol, which has been around for years and supports things like dual execution – see more here:
https://duckdb.org/quack/faq#what-is-the-relationship-betwee...
Of course, in the future MotherDuck can also support Quack, but this is not the only interesting use case for Quack.
twoodfin 1 days ago [-]
Sure! Not knocking the architecture: Building out peer-to-peer federation in place of client/server makes perfect sense for DuckDB. And I’m a big fan of owning the protocol so you can optimize it to internal structures.
Just making the point that DuckDB is disruptive technology & what it’s most likely to disrupt.
simlevesque 19 hours ago [-]
MotherDuck is very expensive.
dmkii 10 hours ago [-]
Compared to what exactly? Snowflake? Hiring an engineer to deploy DuckDB? A hobby project? FWIW I work at MotherDuck so obviously biased, but curious to hear what makes you say that.
ks2048 21 hours ago [-]
"moving data around" is what millions of people do all day, every day.
jtbaker 1 days ago [-]
uh, doing analytics type queries on large datasets that postgres would choke on, as an RPC? I'm using it (ducklake specifically) to build a lakehouse RPC server that can scale horizontally based on resource utilization in k8s.
Lemaxoxo 1 days ago [-]
Right, I get that use case. You have to crunch numbers that sit somewhere, and store the outputs in the same place. DuckLake is great for that. But where does this DuckDB client-server setup fit in?
jtbaker 1 days ago [-]
Sounds like it means you don't have to wire up the RPC server yourself anymore? Just build a docker container that invokes this quack server command, expose it over the network and connect to it from remote clients using your own access controls?
Ducklake handles the metadata and storage, but a local duckdb instance connected to it still has to do the compute itself. This lets you federate access to the compute.
Fun for me, I just finished a big streaming implementation doing essentially the same thing in Go-gRPC with arrow table record batches. It was fun though.
feverzsj 18 hours ago [-]
They didn't explain what "concurrent writers" means. But it seems it's just serialized writes on the server side.
geysersam 12 hours ago [-]
I don't think that's correct. DuckDB already supports concurrent writes within one process. I don't see why this would suddenly serialize all writes.
NortySpock 1 days ago [-]
Sounds useful for small-ball internal analytics datasets you want to place on a shared team server.
I can definitely see exploring this for some homelab use.
arpinum 1 days ago [-]
With ducklake this scales well to multi-terabyte data sets. The big benefit of this server protocol is sharing a high memory server and taking advantage of a shared cache for recent data.
hermitcrab 1 days ago [-]
I have a C++ application. Everything is in memory during execution, saved to disk between sessions as XML. Works great, except that it is strictly single user, and some of my customers would love me to generalize it for multiple concurrent users reading and writing. Performance requirements are quite low - a few thousand records being updated by 2 or 3 people at a time. Would DuckDB + Quack be a good choice for this? Or are there better choices? I looked at SQLite, but I understand it doesn't operate as client-server.
password4321 20 hours ago [-]
https://firebirdsql.org has been flying under the radar in-between SQLite and full-blown PostgreSQL for decades, but if you're asking which client-server database to use PostgreSQL is the default recommendation.
hermitcrab 14 hours ago [-]
Did some reading. Given my modest performance requirements, Firebird might be a good choice due to simpler install and admin. Thanks.
downsplat 12 hours ago [-]
If Postgres is too heavyweight for you but you still want client-server, I'd consider MySQL. It's an old classic, pretty fast and scalable, and has much better mainstream support and a bigger ecosystem than Firebird.
I'm not really sure what Firebird is for at this point in life really. It was pretty exciting when it was open sourced in the early 2000s, before postgres became the mature beast it is, before mysql acquired something as basic as transactions, and before sqlite became the default embedded db. But then it never really went anywhere.
hermitcrab 11 hours ago [-]
Good to know. Thanks.
appplication 24 hours ago [-]
DuckDB is more for analytics. I don’t think you’re going to find good options for a DB that can handle concurrent users without hosting it in some way server side. It’s certainly possible (think how some games create their own client servers for direct multiplayer) but honestly hosting Postgres or SQLite is ridiculously cheap, easy, and more importantly the standard approach to this issue.
hermitcrab 23 hours ago [-]
IIRC SQLite is in-process and says in its documentation that it is not a client-server database.
setr 8 hours ago [-]
It’s not, but you could do something like https://litestream.io/ and just continuously replicate it to pretend to be multi-user
hermitcrab 6 hours ago [-]
Does SQLite + replication have any advantages over client-server, for someone not already using SQLite?
WebBurnout 11 hours ago [-]
Sounds like a good use case for CRDTs, which would also enable offline editing
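For readers unfamiliar with the idea, a toy last-writer-wins map shows the core CRDT property: replicas that exchange state converge to the same result no matter the merge order. (Real libraries such as Automerge or Yjs are far more sophisticated; this is only a sketch.)

```python
# Toy last-writer-wins (LWW) map CRDT: each replica tags writes with a
# (clock, replica_id) pair; merging keeps the newest write per key.
class LWWMap:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0
        self.data = {}  # key -> (value, (clock, replica_id))

    def set(self, key, value):
        self.clock += 1
        self.data[key] = (value, (self.clock, self.replica_id))

    def merge(self, other):
        # Keep whichever write has the larger (clock, replica_id) tag;
        # ties break deterministically by replica id, so merges commute.
        for key, (value, tag) in other.data.items():
            if key not in self.data or tag > self.data[key][1]:
                self.data[key] = (value, tag)
        self.clock = max(self.clock, other.clock)

    def get(self, key):
        return self.data[key][0]

a, b = LWWMap("a"), LWWMap("b")
a.set("record-1", "alice's edit")
b.set("record-2", "bob's edit")
a.merge(b); b.merge(a)
assert a.data == b.data  # replicas converge regardless of merge order
```

Exchanging state every second or two over any transport would give near-real-time convergence, though conflicting edits to the same field resolve by "last writer wins" rather than by asking the users.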
hermitcrab 11 hours ago [-]
In my use case I have 2 or 3 users editing the same database concurrently, and they all want to see each other's updates in near real time (within a second or two). Would a CRDT support that? It would be great if it did and I could just keep using XML to persist everything with no server. But that sounds unlikely.
apitman 22 hours ago [-]
I think the term you want to search for is local-first.
hermitcrab 14 hours ago [-]
My understanding is that Local First means syncs across multiple devices, which is not the same thing as multi-user concurrent access.
apitman 2 hours ago [-]
It's both. I recommend looking into it a bit deeper:
https://www.inkandswitch.com/essay/local-first/
This is fantastic. I’ve been building an Excel-like but columnar spreadsheet app using DuckDB and had to reinvent the “client” through a classic HTTP layer.
hona_mind 20 hours ago [-]
The "what does DuckDB want to be" question keeps coming up, but I think the answer is already clear: it wants to be the SQLite of analytics. Embedded, zero-config, works everywhere. Quack is just the part that makes "everywhere" include remote.
boruto 8 hours ago [-]
I think a DuckDB cookbook by them would be excellent.
mritchie712 1 days ago [-]
> Can I use DuckDB with Quack as the catalog database for DuckLake?
> Not yet, but we are working on it!
Seems like a niche use case, but it's the one I'm most interested in.
Our lakehouse uses ducklake with postgres as the catalog. Seems like a DuckDB / Quack catalog would be an excellent alternative.
pdet 1 days ago [-]
I think that Quack will become the primary option for a DuckLake catalog in the future, for several reasons. To list a few:
1. No type mismatches for inlining. If you use a non-DuckDB catalog, many types do not have a 1:1 mapping, which introduces additional overhead when operating on those data types.
2. You get the raw performance of DuckDB analytics (and now transactions) over the catalog. DuckDB reading DuckDB is simply faster than any of our Postgres/SQLite scanners.
3. No round-trip for retries. We can easily(tm) run the full retry logic on the DuckDB server side. Right now, these retries trigger multiple round trips for Postgres, making it a performance bottleneck for high-contention workloads.
Disclaimer: I'm a duckdb/ducklake developer.
So you'll be able to test it in a few days.
dangoodmanUT 22 hours ago [-]
This. Type casting is an insidious problem (both correctness and perf).
Does this mean I can finally connect to a ducklake instance hosted remotely? i.e. DuckLake is writing to disk on the remote server and my client is just a client.
Because rn even with Postgres as a catalog my client needs access to the underlying storage to use Ducklake.
szarnyasg 1 days ago [-]
Yes, Quack resolves this problem. In particular, your client (likely a DuckDB instance) will talk to a remote DuckDB that both has access to the underlying storage and can also serve as the catalog itself.
mritchie712 9 hours ago [-]
already works now! just tried it out
ashkankiani 23 hours ago [-]
My first thought: setting up a self replicating duckdb wrapper over ssh so that I can execute queries on any computer. Can’t wait to play with this!
timsuchanek 19 hours ago [-]
This is very exciting. Now we just need this for Postgres as well.
ozgrakkurt 1 days ago [-]
> It would be rather misguided not to build a database protocol on top of HTTP in 2026
This is wrong. HTTP is bad for transferring large amounts of data, and it is also bad for streaming.
It is bad for large amounts of data because you have timeout issues on some clients, you hit request/response size limits, etc.
It is obviously bad for streaming as there is no concept of streaming in it.
It is comical to take the path of least resistance so lazy people can put a reverse proxy on top of it, and then say HTTP is the only relevant way to do it in 2026.
The benchmark doesn't seem to mean much as TCP can max out 50GB/s on a single thread. Pretty sure it can do more than that even. So you could be using anything that isn't terrible and you should get max performance out of this.
Also, the protocol is a separate thing from the format. For example, if you are transferring mp4 over both FTP and HTTP, you can compare that.
If you are transferring different things over different protocols, then the comparison means nothing.
The benchmark graph for bulk transfer should show more granularity so it is possible to understand what percentage of the hardware limit it is reaching, similar to how BLAS GEMM routines are benchmarked against the theoretical max FLOPS of the hardware.
> 60 million rows (76 GB in CSV format!)
This reads a bit disingenuous.
It is disappointing to see this instead of something like the PostgreSQL protocol with support for a columnar format.
arpinum 1 days ago [-]
It uses HTTP/2, which has streaming.
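For what it's worth, streaming over HTTP predates HTTP/2: chunked transfer encoding has been part of HTTP/1.1 since the 1990s. A stdlib-only Python sketch (nothing Quack-specific) of a server streaming batches whose total size isn't known up front:

```python
# Minimal demonstration that plain HTTP can stream: the server emits
# chunked transfer encoding (no Content-Length), and the stdlib client
# decodes the chunks transparently.
import http.client
import http.server
import threading

class StreamHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        for i in range(3):  # emit chunks without knowing the total size
            chunk = f"batch-{i}\n".encode()
            self.wfile.write(f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # terminating chunk

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), StreamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
body = conn.getresponse().read().decode()  # stdlib reassembles the chunks
print(body)
conn.close()
server.shutdown()
```

Timeouts and proxy body-size limits are real operational concerns, but they are client/middlebox configuration rather than a limitation of the protocol itself.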
geysersam 23 hours ago [-]
They mention in the benchmarks section that the network they're on is an "up to" 15 Gbps connection, so maxing out 50 GB/s is not realistic.
I agree they should have also listed the compressed size of the table instead of only mentioning the CSV size. But the compressed dataset is probably not smaller than 1/10 of the CSV size. If that's the case they're transferring ~8GB in 4.6 s on a 2GB/s (15Gbps) connection. Seems pretty close to max.
ozgrakkurt 21 hours ago [-]
That makes sense. I meant to write 50 Gbps. I don’t mean they should reach that; I mean you could use any protocol that is fairly efficient and it would reach that.
The size of the dataset should be under 3GB in parquet from what I understand. [0]
So it did 3 × 8 / 4.94 ≈ 4.86 Gbps, which is underwhelming in terms of network performance.
It is still not possible to make any conclusions since we don’t know how specifically they encode it or how they are running the query.
I just mean this writing is useless from an engineering perspective; also, what it says about HTTP doesn’t make sense.
[0] - https://clickhouse.com/docs/getting-started/example-datasets...
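Spelling out that back-of-envelope arithmetic, taking the ~3 GB Parquet size and 4.94 s transfer time from the comments above as assumptions:

```python
# Back-of-envelope throughput check: ~3 GB of Parquet in ~4.94 s on a
# link advertised as "up to" 15 Gbps (all figures from the thread above).
size_gb = 3.0      # assumed compressed dataset size, GB
seconds = 4.94     # reported transfer time
link_gbps = 15.0   # advertised link capacity

achieved_gbps = size_gb * 8 / seconds   # GB -> gigabits, per second
utilization = achieved_gbps / link_gbps
print(f"{achieved_gbps:.2f} Gbps, {utilization:.0%} of the advertised link")
```

So roughly a third of the advertised link, assuming the 3 GB estimate is right and ignoring decode/encode time on either end.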
Agreed, that does seem a bit underwhelming. Hopefully there are some performance gains to be made before the production release in September.
jpdenford 14 hours ago [-]
They also wanted the protocol to work with duckdb-wasm in the browser. I can’t comment on the performance side, but that consistency piece is pretty key to DuckDB’s value proposition, I think.
duzer65657 1 days ago [-]
really like duckdb and sorry to pile on, but the parent makes some strong points. I wonder if MotherDuck builds on http as well?
jdnier 19 hours ago [-]
The parent reads more like "it works in practice but does it work in theory?" The innovations that have come out of the DuckDB team seem to always focus on "in practice" instead of focusing on how things are supposed to (or are expected to) be done.
matsonj 19 hours ago [-]
no we don't (source: work at motherduck)
znite 1 days ago [-]
Does this work with duckdb-wasm?
neomantra 9 hours ago [-]
Although a maintainer answered you, watch the video from the blog. There's a WASM demo at the end, which is great. It also has a good explainer for those confused about the HTTP decision.
And I appreciate that Hannes still appreciates the magic of WASM. [And I keep hearing "quark", which makes me hungry for tangy, creamy German yogurt.]
PhilippGille 1 days ago [-]
It's in the article:
> HTTP also allows the DuckDB-Wasm distribution to speak Quack natively! So DuckDB running in a browser can e.g., directly connect to a DuckDB instance running in an EC2 server using Quack.
anentropic 8 hours ago [-]
I missed that and it seems like one of the more compelling features...!
znite 20 hours ago [-]
Thanks, thought I searched for it & didn't come up. Great stuff
philipallstar 12 hours ago [-]
That is a pretty amazing feature.
hfmuehleisen 1 days ago [-]
Maintainer here. Yes!