Google Dataproc is a fully managed service that hosts open-source distributed processing platforms such as Apache Spark, Presto, and Apache Hadoop on Google Cloud. Dataproc provides the flexibility to manage and configure clusters of varying size on demand.
However, even with Dataproc, users are responsible for right-sizing the cluster, choosing the hardware for each node, and selecting the best job execution parameters. Finding a configuration that optimizes both cost and performance at the same time is more an art than a science; in most cases it is simply an impossible task, even for experts.
In this short video, we show how Akamas AI-powered optimization can effectively address this challenge by automatically identifying the optimal configuration, one that reduces the cost of the Google Dataproc service and speeds up Spark applications.