Ivan surister

Why Queryzen.

Rapid development

Explain the current workflow for getting data out of a database and into an application. [surister]
Explain the Queryzen workflow in juxtaposition with the previous one (it is much better). [darkfus]

parameterized inputs, versioning, rollbacks, monitoring

Explain every point with slides/graphs. [surister]

Demo of the Queryzen API. So simple!

Personal demo!!! [darkfus]

Right now CrateDB has ..

What other databases are doing:

postgres: In Postgres 18, uuidv4() (RFC), uuidv7() (RFC), gen_random_uuid() (uuid4 RFC)[^1]
tinybird and clickhouse: generateUUIDv4() (RFC)[^2]
cockroachdb: gen_random_uuid, uuid_v4 (RFC) and unique_rowid (timebased sortable)[^3]
singlestore: UUID4 (RFC)[^4]
firebase: UUID4, custom token (timebased sortable)[^5]
elasticsearch: elasticflake (like us), k-ordered (timebased sortable)
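
To make the difference between a random UUID4 and a "timebased sortable" id concrete, here is a minimal Python sketch. It builds a UUIDv7-style value by hand following the RFC 9562 bit layout (48-bit millisecond timestamp first, random bits after). The function name `uuid7_sketch` is hypothetical, and this is an illustration only, not how any of the databases above implement their generators.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the version nibble (bits 76-79) with 7.
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    # Overwrite the variant bits (bits 62-63) with binary 10.
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)


random_id = uuid.uuid4()          # purely random, no useful ordering
older = uuid7_sketch(1_000)       # timestamp given explicitly for the demo
newer = uuid7_sketch(2_000)
```

Because the timestamp occupies the most significant bits, ids generated in later milliseconds compare greater than earlier ones, which is what makes them sortable by creation time. Two ids created within the same millisecond are ordered only by their random bits.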

How to build a hybrid search service in Python with CrateDB.

In this talk Ivan, a Database Ecosystem Engineer at @CrateDB, will show you how to create a hybrid search (keyword search and vector search) service from scratch in Python with CrateDB.

He will cover:

what hybrid search is and why we want it
a quick look at CrateDB
how to create a Python client library to do hybrid search
how to use FastAPI to build a web service
how to use everything we created to make a documentation search service for our technical documentation at CrateDB.

Query run at 2025-01-27T21:35:03.043Z on CrateDB 6.0.0

SELECT
  table_name,
  SUM(num_docs) as records,
  (SUM(size) / (1024 * 1024)) as total_size_mib,
  (SUM(size) / count(*)) / (1024 * 1024) as avg_size_per_shard_in_mib,
  (SUM(size) / SUM(num_docs) :: DOUBLE) as avg_size_in_bytes_per_record
FROM
  sys.shards
GROUP BY
  table_name
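
The arithmetic in this query can be checked by hand with a small Python sketch. The shard rows below are made-up numbers, not real measurements, and the function name `storage_summary` is hypothetical. The sketch uses float division throughout, so the MiB values may differ slightly from what the SQL returns if the database divides integers.

```python
from collections import defaultdict

MIB = 1024 * 1024

# Hypothetical shard rows: (table_name, num_docs, size_in_bytes)
shards = [
    ("taxi", 1_000_000, 200 * MIB),
    ("taxi", 1_000_000, 220 * MIB),
    ("docs", 50_000, 10 * MIB),
]


def storage_summary(rows):
    """Aggregate per-shard rows into the same per-table metrics as the query."""
    grouped = defaultdict(lambda: {"records": 0, "size": 0, "shards": 0})
    for table, num_docs, size in rows:
        g = grouped[table]
        g["records"] += num_docs
        g["size"] += size
        g["shards"] += 1
    return {
        table: {
            "records": g["records"],
            "total_size_mib": g["size"] / MIB,
            "avg_size_per_shard_in_mib": g["size"] / g["shards"] / MIB,
            # Avoid dividing by zero for empty tables.
            "avg_size_in_bytes_per_record": (
                g["size"] / g["records"] if g["records"] else None
            ),
        }
        for table, g in grouped.items()
    }


summary = storage_summary(shards)
```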

CrateDB - Storage usage on disk

CrateDB stores data in both a row store and a column store, and on top of that it automatically creates an index. On reads the index is leveraged and, depending on the query, CrateDB uses the most efficient store.

This is one of the many features that make CrateDB very fast when reading and aggregating data, but it has an impact on storage.

We are going to use the Yellow Taxi trip dataset for January 2024, which has 2_964_624 rows.

Hybrid Index: The secret to blazingly fast queries on any data structure @ CrateDB

One of the most effective ways to improve query performance is through indexing. At CrateDB we asked: what's faster than one index? Everything indexed! We took the bold approach of indexing every column by default. But we didn't stop there: we leverage multiple data structures for every indexed column. At query time, CrateDB selects the optimal index based on the query type, enabling faster and more efficient results.

But you probably have many questions. Does this actually work? How did you do it? Isn't there a performance penalty on write speed? And updates? How about storage size?

In this talk we will tell you all about Hybrid Indexes, one of the fundamental aspects of CrateDB: an open-source distributed SQL database for real-time analytics and hybrid search.

In https://cratedb.com/blog/hybrid-search-explained we learned about Hybrid Search and how to do it in pure SQL. The resulting query can be hard to understand if you don't have much experience with common table expressions (CTEs). In this piece we will dive deeper into CTEs and the smaller details of the query.

Recap

In the last chapter, we learned that Hybrid Search essentially runs several queries that capture different meanings and combines their results. Keep this in mind, as we will see that CTEs work in a very similar way.
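
One common way to do the "combine" step is reciprocal rank fusion (RRF), the subject of one of the Haystack talks listed later in these notes. The sketch below is a minimal Python illustration, assuming two already ranked lists of hypothetical document ids and the commonly used constant k = 60. It is not necessarily the exact method used in the blog post.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results of a keyword query and a vector query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists ("b" and "c") end up ahead of documents that only one query found, which is the behaviour hybrid search is after.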

## Common Table Expressions

CTEs are subqueries that can be referenced from a main query. They are temporary, meaning that they do not exist outside of this main query.
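
A minimal, self-contained sketch of this behaviour follows. To keep it runnable without a database server it uses Python's built-in `sqlite3` module with an in-memory database instead of CrateDB, and the `docs` table and `popular` CTE are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, title TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(1, "Intro", 120), (2, "Indexing", 300), (3, "CTEs", 80)],
)

# The CTE "popular" is defined and used within a single statement.
query = """
WITH popular AS (
    SELECT id, title, views FROM docs WHERE views >= 100
)
SELECT title FROM popular ORDER BY views DESC
"""
titles = [row[0] for row in conn.execute(query)]

# Outside of that statement the CTE no longer exists.
try:
    conn.execute("SELECT * FROM popular")
    cte_visible_outside = True
except sqlite3.OperationalError:
    cte_visible_outside = False
```

The second statement fails because `popular` only lives for the duration of the query that defines it.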

	services:
	  cratedb01:
	    image: crate/crate:5.10.1
	    ports:
	      - "4200:4200"
	      - "5432:5432"
	    volumes:
	      - cratedb1:/data
	    command: ["crate",
	              "-Chttp.cors.enabled=true",

	<!--
	In order to display math equations in Vue 3, you need to use something like MathJax or KaTeX. I prefer
	KaTeX since it seems to be the most powerful solution.

	Vue 3 KaTeX libraries are mostly unmaintained or don't work properly on my setup. As of 2024-12-07 they bug out
	on my latest Vue 3 + Nuxt projects; this is the simplest way I got it to work.


	You only need to run

	There have already been a number of hybrid search talks at past Haystack conferences:

	EU 2023: [Mastering Hybrid Search: Blending Classic Ranking Functions with Vector Search for Superior Search Relevance](https://haystackconf.com/eu2023/talk-10/)
	EU 2023: [Reciprocal Rank Fusion (RRF) or How to Stop Worrying about Boosting](https://haystackconf.com/eu2023/talk-2/)
	US 2024: [All Vector Search is Hybrid Search](https://haystackconf.com/us2024/talk-1/)
	US 2024: Better Semantic Search with Hybrid (Sparse-Dense) Search

	# Doing hybrid search on your real-time data in pure SQL with CrateDB's index-all strategy.

	Points to highlight: