StreamBench is a project to measure the performance of popular streaming engines using the Yahoo Streaming Benchmark.
We compare the performance of SABER, an efficient stream processing engine designed for single servers, with that of two popular distributed stream processing systems, Apache Spark and Apache Flink. We also compare against StreamBox, another recently proposed single-server design that emphasizes out-of-order processing of data. Based on our results, we argue that a single multicore server can provide better throughput than a multi-node cluster for many streaming applications. This opens an opportunity to reduce system complexity and operational costs by replacing cluster-based stream processing systems with (potentially replicated) single-server deployments.
This repository contains code for running the Yahoo Streaming Benchmark in SABER, Spark Streaming, Apache Flink and StreamBox. For Spark and Flink, we follow the approach from previous blog posts by Databricks and DataArtisans. We provide a script for each of these engines to set up and run the benchmark on a single node. The code can also be configured to run on a distributed deployment.
The Yahoo Streaming Benchmark was designed to emulate an advertisement streaming application. It has a streaming query with four operators: filter, project, join (with relational data) and aggregate (a windowed count).
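To make the shape of the query concrete, here is a minimal, engine-independent sketch of the four operators in plain Python. It is not the code used by any of the engines in this repository. The field names (`event_type`, `ad_id`, `event_time`), the filter on `"view"` events, the ad-to-campaign lookup table and the 10-second tumbling window are assumptions chosen for illustration, and `run_query` is a hypothetical helper name.

```python
from collections import Counter

WINDOW_MS = 10_000  # assumed tumbling-window size: 10 seconds


def run_query(events, ad_to_campaign, window_ms=WINDOW_MS):
    """Return {(campaign_id, window_start_ms): count} for 'view' events.

    Hypothetical helper: field names and the window size are assumptions.
    """
    counts = Counter()
    for event in events:
        # 1. Filter: keep only the event type of interest.
        if event["event_type"] != "view":
            continue
        # 2. Project: keep only the fields needed downstream.
        ad_id, event_time = event["ad_id"], event["event_time"]
        # 3. Join: look up the campaign for this ad in the relational table.
        campaign_id = ad_to_campaign.get(ad_id)
        if campaign_id is None:
            continue  # inner-join semantics: drop ads without a campaign
        # 4. Aggregate: windowed count per campaign (tumbling window).
        window_start = event_time - event_time % window_ms
        counts[(campaign_id, window_start)] += 1
    return dict(counts)


# Small made-up input, for illustration only.
events = [
    {"ad_id": "a1", "event_type": "view", "event_time": 1_000},
    {"ad_id": "a1", "event_type": "click", "event_time": 2_000},
    {"ad_id": "a2", "event_type": "view", "event_time": 9_999},
    {"ad_id": "a1", "event_type": "view", "event_time": 10_000},
    {"ad_id": "a3", "event_type": "view", "event_time": 3_000},
]
ad_to_campaign = {"a1": "c1", "a2": "c1"}

result = run_query(events, ad_to_campaign)
print(result)
```

A real engine evaluates the same pipeline incrementally over an unbounded stream and must also decide when a window is complete; the loop above ignores that and simply processes a finite list.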
How to run the code
For each engine, the provided script installs and builds the engine and then runs the streaming query on it.
StreamBench is brought to you by George Theodorakis, Panagiotis Garefalakis, Alexandros Koliousis, Holger Pirk and Peter Pietzuch.