Logs can really add up. Let’s learn to make like a tree and reduce them via convenient built-in methods.

Apache Spark alone, by default, generates a lot of information in its logs. Spark Streaming creates a metric ton more (in fairness, there’s a lot going on). So how do we reduce that gargantuan wall of text to something more manageable?

One way is to lower the log level on the SparkContext, which can be retrieved from the StreamingContext. Simply:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName(appName) // run on cluster
val ssc = new StreamingContext(conf, Seconds(5))
val sc = ssc.sparkContext
sc.setLogLevel("WARN") // valid levels: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN

Pretty easy, right?

3 thoughts

  1. Are you deploying this in cluster mode? Because I tried this, and it didn’t seem to make a difference. Desperately need to reduce logging. 😀


    1. Hey Vetle!

      Most of my apps are deployed in YARN mode, though I imagine this would apply to cluster mode as well. What level are you setting your logging to (INFO, ERROR, WARN, etc.)?


      1. I was setting to all the levels, just to try to change *something* 🙂

        I ended up with a solution I found elsewhere, which is setting it per executor JVM in a call to .foreachPartition. That’s the only thing I could get to work.

        I’m running the cluster in Standalone mode, which might make a difference.

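The per-executor workaround described in the thread above can be sketched roughly like this. It assumes a log4j 1.x backend (as bundled with older Spark releases); `stream` and `process` are placeholders for your actual DStream and record-handling logic:

import org.apache.log4j.{Level, Logger}

// Sketch: quiet logging inside each executor JVM.
// The closure passed to foreachPartition runs on the executors,
// so the log4j call takes effect in each executor's JVM rather
// than only on the driver.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    Logger.getRootLogger.setLevel(Level.ERROR) // runs on the executor
    records.foreach(process) // placeholder for real per-record work
  }
}

Calling setLevel on every partition is redundant after the first call per JVM, but it is cheap and guarantees the level is set wherever the tasks land.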
