Spark Starter Guide 4.2: How to Create a Spark Session

Previous post: Spark Starter Guide 4.1: Introduction to Data Pipelines

Image courtesy of Databricks: How to use Spark Session in Apache Spark 2.0

In this exercise, you’ll learn how to create a Spark Session – the basic building block for working with Spark in a modern context (ha, that’s a joke you’ll get later). In short: it’s the modern entry point into Spark.


The Spark Session was introduced in the 2.0 release of Apache Spark, and was designed to both replace and consolidate the previous methods for accessing Spark: contexts! Sessions also made it easier to:

  • configure runtime properties of a Spark application after instantiation (e.g. spark.sql.shuffle.partitions)
  • create DataFrames and Datasets
  • use Spark SQL and access Hive

So, instead of using the Spark Context, SQL Context, and Hive Context objects of the past, you can simply use a Spark Session. You retain all of the convenience of those legacy objects with one straightforward instantiation.


All exercises in this chapter can be written in a project on your local workstation (using an editor like IntelliJ, Eclipse, PyCharm, and so on), or in a Notebook (Jupyter, Qubole, Databricks, and so on).

Code will execute the same in both environments, so the choice of what to use is up to you!

The steps for creating a Spark Session are provided in two different languages: Scala and Python. Select your API of choice and proceed!

Follow these steps to complete the exercise in SCALA:

  1. Import SparkSession from the Spark SQL library using the following code:
import org.apache.spark.sql.SparkSession
  2. Create a Spark Session in local mode using the following code:
val spark = SparkSession
    .builder()
    .master("local")
    .appName("My Spark App")
    .getOrCreate()

Follow these steps to complete the exercise in PYTHON:

  1. Import SparkSession from the Spark SQL library using the following code:
from pyspark.sql import SparkSession
  2. Create a Spark Session in local mode using the following code:
spark = SparkSession\
    .builder\
    .master("local")\
    .appName("My Spark App")\
    .getOrCreate()

Now, you’re ready to start using Spark. That might not have been very exciting, but in the next exercise we’ll dive headfirst into our first actual Spark application – data deduplication!
