
Apache Spark alone, by default, generates a lot of information in its logs. Spark Streaming creates a metric ton more (in fairness, there’s a lot going on). So, how do we lower that gargantuan wall of text to something more manageable?
One way is to lower the log level for the Spark Context, which is retrieved from the Streaming Context. Simply:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName(appName) // run on cluster
val ssc = new StreamingContext(conf, Seconds(5))
val sc = ssc.sparkContext
sc.setLogLevel("ERROR")
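If you'd rather not change code at all, you can usually get the same effect through Log4j configuration instead. A minimal sketch of a conf/log4j.properties, modeled on the log4j.properties.template that ships with Spark (this assumes the Log4j 1.x style used by Spark 1.x/2.x; where the file must live depends on how you deploy):

```
# Cut the root logger down to errors only; everything goes to the console appender.
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

The config route has the advantage of applying before the SparkContext even starts, so you also skip the startup chatter that sc.setLogLevel can't reach.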
Pretty easy, right?
Are you deploying this in cluster mode? Because I tried this, and it didn’t seem to make a difference. Desperately need to reduce logging. 😀
Hey Vetle!
Most of my apps are deployed in YARN mode, though I imagine this would apply to cluster mode as well. What level are you setting your logging to (INFO, ERROR, WARN, etc.)?
I was setting to all the levels, just to try to change *something* 🙂
I ended up with a solution I found elsewhere: setting the level per executor JVM inside a call to .foreachPartition. That’s the only thing I could get to work.
Running cluster in Standalone mode, might make a difference.
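For reference, the per-executor approach mentioned in the comment above can be sketched like this. It's only a sketch under a couple of assumptions: executors log through Log4j 1.x (as Spark 1.x/2.x do), and stream is a hypothetical DStream from your own app — neither name comes from the original post:

```scala
import org.apache.log4j.{Level, LogManager}

// Each executor runs in its own JVM with its own Log4j root logger, so a
// driver-side sc.setLogLevel doesn't necessarily reach them. Code inside
// foreachPartition executes on the executor that owns the partition, so
// the level change lands in the right JVM.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    LogManager.getRootLogger.setLevel(Level.ERROR) // executor-side JVM
    partition.foreach { record =>
      // ... process each record as usual ...
    }
  }
}
```

Resetting the level on every partition is redundant after the first batch per executor, but it's harmless and avoids having to track which executors have already been configured.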