About Us

Who We Are - KwikQuery

TabbyDB is a specialized fork of Apache Spark designed to optimize complex queries. It significantly reduces compilation time and memory usage for queries with nested joins, complex case statements, and large query trees through intelligent compile-time and runtime enhancements.

Spark SQL Modules Optimized in TabbyDB

TabbyDB transforms complex query execution by drastically reducing compile and runtime durations. Our specialized optimizations target intricate case statements, nested joins, and expansive query trees, delivering measurable improvements that empower data teams to handle demanding workloads with ease and efficiency.

N x Performance Improvements

TPCDS is not a realistic benchmark for the performance of Apache Spark as an analytics engine. The complexity of its SQL queries is limited because the input is a SQL string and the level of nesting is bounded.

In the TPCDS benchmark at 1 TB and 2 TB, run on Spark with non-partitioned Hive external tables, a 13% improvement in execution time was seen in both cases. This does not take into account the impact of compile-time optimizations, as the TPCDS queries are not that complex. The improvement is also seen across the board, so it is not an outlier effect.

Analytic queries, especially those built by looping logic using the DataFrame APIs, can become extremely large. Stock Spark has been seen to take hours (from over 1 hour up to 8 hours) to compile them, and even then may fail with an OutOfMemory error. KwikQuery’s TabbyDB can bring those times down to realistic levels of minutes or seconds.

Performance Enhancements That Redefine Complex Query Execution

Optimize complex queries for faster, more efficient execution.

Time Optimization

Fundamental changes to the algorithms of some critical rules, such as constraint propagation, together with collapsing project nodes early (in the analysis phase), minimizing calls to the Hive metastore, and other modifications, tremendously improve compile-time performance.

Advanced Broadcast

Our fork optimizes broadcast hash joins on non-partitioned columns to perform dynamic file pruning, boosting the runtime performance of nested join queries. In limited TPCDS testing, it showed a 13% improvement in time taken compared to stock Spark.
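
To make the idea of dynamic file pruning concrete, here is a minimal Python sketch. It is a toy model, not TabbyDB's implementation: it assumes each data file carries min/max statistics for the join column (as Parquet files typically do), and the `FileStats` type, the `prune_files` helper, and the file names are all hypothetical. Once the small (broadcast) side of the join is known, its join keys are used to skip files on the large side that cannot contain a match.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min/max of the join column."""
    path: str
    min_key: int
    max_key: int


def prune_files(files: Iterable[FileStats], build_keys: Iterable[int]) -> List[FileStats]:
    """Keep only files whose [min_key, max_key] range holds at least one build-side key."""
    keys = sorted(set(build_keys))
    kept = []
    for f in files:
        # Smallest build-side key that is >= the file's minimum.
        i = bisect_left(keys, f.min_key)
        if i < len(keys) and keys[i] <= f.max_key:
            kept.append(f)
    return kept


# Large (probe) side: four files with disjoint key ranges.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
    FileStats("part-3.parquet", 300, 399),
]

# Join keys collected from the small (broadcast) side.
broadcast_keys = [105, 150, 310]

kept = prune_files(files, broadcast_keys)
print([f.path for f in kept])  # ['part-1.parquet', 'part-3.parquet']
```

In this toy run, half of the files are never read. The saving in a real workload depends on how well the join keys cluster within files.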

Caches Optimization

The cache lookup of in-memory plans is now more intelligent, improving the hit rate of lookups and reducing unnecessary computations. This can significantly enhance overall runtime performance and efficiency for complex queries.
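
The text above does not say how the lookup was improved, so the following Python sketch only illustrates one general way a plan cache can raise its hit rate: keying entries on a canonical form of the plan, so that plans differing only cosmetically map to the same entry. The `PlanCache` class, the `canonical_key` helper, and the tuple-based "plan" are hypothetical and far simpler than a real Spark logical plan.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Toy "plan": a table name plus the ANDed filter predicates as the user wrote them.
Plan = Tuple[str, Tuple[str, ...]]
Key = Tuple[str, FrozenSet[str]]


def canonical_key(plan: Plan) -> Key:
    """Ignore the order and duplication of ANDed predicates."""
    table, predicates = plan
    return (table, frozenset(predicates))


class PlanCache:
    """Cache keyed on the canonical form of a plan rather than its literal shape."""

    def __init__(self) -> None:
        self._entries: Dict[Key, str] = {}
        self.hits = 0
        self.misses = 0

    def put(self, plan: Plan, data: str) -> None:
        self._entries[canonical_key(plan)] = data

    def get(self, plan: Plan) -> Optional[str]:
        key = canonical_key(plan)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None


cache = PlanCache()
cache.put(("sales", ("a > 1", "b = 2")), "cached-rows")

# Same predicates in a different order: still a hit.
reordered = cache.get(("sales", ("b = 2", "a > 1")))
# A genuinely different filter: a miss.
different = cache.get(("sales", ("a > 1",)))

print(reordered, different, cache.hits, cache.misses)
```

A lookup that compared the plans literally would have missed on the reordered query and recomputed the result.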

Scalable Query

The new rules and algorithm changes allow projects to be collapsed in the analysis phase, thereby capping the tree size. This saves compile time and helps prevent out-of-memory errors.
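
The effect of collapsing projects can be sketched with a toy plan tree in Python. This is an illustration under simplifying assumptions, not TabbyDB's rule: the `Relation` and `Project` classes, the tuple-based expressions, and the `with_column` helper (a stand-in for a DataFrame-style loop that adds one column per iteration) are all hypothetical, and the sketch ignores concerns a real optimizer must handle, such as non-deterministic expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Leaf node: a base table with a fixed set of column names."""
    columns: tuple


@dataclass(frozen=True)
class Project:
    """Projection node: (output name, expression) pairs over the child's output."""
    exprs: tuple
    child: object


def output(plan):
    """Column names produced by a plan node."""
    if isinstance(plan, Relation):
        return list(plan.columns)
    return [name for name, _ in plan.exprs]


def with_column(plan, name, expr):
    """Stack a new Project that passes existing columns through and adds one."""
    passthrough = [(c, ("col", c)) for c in output(plan) if c != name]
    return Project(tuple(passthrough + [(name, expr)]), plan)


def substitute(expr, bindings):
    """Replace column references with the expressions that define them."""
    kind = expr[0]
    if kind == "col":
        return bindings.get(expr[1], expr)
    if kind == "lit":
        return expr
    # Binary operator: rewrite both operands.
    return (kind, substitute(expr[1], bindings), substitute(expr[2], bindings))


def collapse(plan):
    """Merge every chain of adjacent Project nodes into a single Project."""
    if isinstance(plan, Relation):
        return plan
    child = collapse(plan.child)
    if isinstance(child, Project):
        bindings = dict(child.exprs)
        merged = tuple((n, substitute(e, bindings)) for n, e in plan.exprs)
        return Project(merged, child.child)
    return Project(plan.exprs, child)


def depth(plan):
    """Number of nodes on the path from the root down to the leaf."""
    d = 1
    while isinstance(plan, Project):
        d += 1
        plan = plan.child
    return d


# Looping logic: each iteration adds one derived column and one more Project node.
plan = Relation(("a",))
for i in range(200):
    plan = with_column(plan, f"c{i}", ("add", ("col", "a"), ("lit", i)))

collapsed = collapse(plan)
print(depth(plan), depth(collapsed))  # 201 2
```

The collapsed plan produces the same 201 columns from a tree that is two nodes deep instead of 201. If the collapse happens as the plan is built, later rules never have to traverse the tall tree at all.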

Seamless Integration

While boosting performance, KwikQuery's TabbyDB retains full compatibility with Apache Spark’s APIs and features, allowing users to leverage familiar tools with enhanced speed.

System Reliability

KwikQuery’s TabbyDB ensures consistent and reliable query execution, even under heavy workloads, while maintaining full Spark compatibility and minimizing unexpected failures.

Ready to optimize your complex queries?

You can even convert an existing Spark 4.0.1 installation to TabbyDB 4.0.1 by replacing the existing jars. To go back to Apache Spark 4.0.1, just remove the TabbyDB jars and restore the Spark jars. Or just share your details and our representative will contact you shortly.

Please choose the Demo below!
