Spark Starter Guide 4.2: How to Create a Spark Session

Previous post: Spark Starter Guide 4.1: Introduction to Data Pipelines

Image courtesy of Databricks: How to use Spark Session in Apache Spark 2.0

In this exercise, you’ll learn how to create a Spark Session – the basic building block for working with Spark in a modern context (ha, that’s a joke you’ll get later). In short: it’s the modern entry point into Spark.

Introduction

The Spark Session was introduced in the 2.0 release of Apache Spark, and was designed to both replace and consolidate the previous methods for accessing Spark: contexts! Sessions also make it easier to:

  • configure runtime properties of a Spark application after instantiation (e.g. spark.sql.shuffle.partitions)
  • create DataFrames and Datasets
  • use Spark SQL and access Hive

So, instead of using the Spark Context, SQL Context, and Hive Context objects of the past, you can simply use a Spark Session. You keep all of the convenience of those legacy objects with one straightforward instantiation.
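To make this concrete, here is a minimal sketch (in Python, using only standard SparkSession APIs) of those conveniences all flowing from a single object. The app name, table name, and sample rows are placeholders for illustration; session creation itself is covered step by step below:

from pyspark.sql import SparkSession

# Build (or reuse) a single session: the one entry point
spark = SparkSession.builder.appName("Preview").getOrCreate()

# 1. Configure runtime properties after instantiation
spark.conf.set("spark.sql.shuffle.partitions", "8")

# 2. Create a DataFrame directly from the session
df = spark.createDataFrame([("a", 1), ("b", 2)], ["letter", "number"])

# 3. Run Spark SQL against it
df.createOrReplaceTempView("letters")
spark.sql("SELECT letter FROM letters WHERE number > 1").show()

(Hive access works the same way, provided you add .enableHiveSupport() to the builder.)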


NOTE

All exercises in this chapter can be written in a project on your local workstation (using an editor like IntelliJ, Eclipse, PyCharm, and so on), or in a Notebook (Jupyter, Qubole, Databricks, and so on).

Code will execute the same in both environments, so the choice of what to use is up to you!


The steps for creating a Spark Session are provided in two different languages: Scala and Python. Select your API of choice and proceed!

Follow these steps to complete the exercise in SCALA:

  1. Import SparkSession from the Spark SQL library using the following code:
import org.apache.spark.sql.SparkSession
  2. Create a Spark Session in local mode using the following code:
val spark = SparkSession
    .builder()
    .master("local[2]")       // run locally, using 2 worker threads
    .appName("My Spark App")  // name shown in the Spark UI
    .getOrCreate()            // reuse an existing session if one is already running

Follow these steps to complete the exercise in PYTHON:

  1. Import SparkSession from the Spark SQL library using the following code:
from pyspark.sql import SparkSession
  2. Create a Spark Session in local mode using the following code:
spark = (SparkSession
    .builder
    .appName("My Spark App")   # name shown in the Spark UI
    .master("local[2]")        # run locally, using 2 worker threads
    .getOrCreate())            # reuse an existing session if one is already running
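In either language, you can quickly confirm the new session works. A short sanity-check sketch (shown in Python; the Scala calls share the same names):

print(spark.version)   # the Spark version the session is running
spark.range(5).show()  # a tiny DataFrame with ids 0 through 4
spark.stop()           # shuts the session down; skip this if you plan to keep going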

Now, you’re ready to start using Spark. That might not have been very exciting, but in the next exercise we’ll dive headfirst into our first actual Spark application – data deduplication!
