SparkSession autocomplete in IntelliJ

I really enjoy working with the spark-shell. It provides a quick way to experiment with code snippets. However, there are times when you might prefer to write those snippets in a proper file and have access to syntax checks and highlighting.

Personally, I use IntelliJ, but you should be able to adapt this example to your preferred IDE.

Once you have created your script file, you can add the following code at the beginning of the file:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .appName("Spark shell")
  .config("spark.sql.warehouse.dir", "file:/tmp/spark-warehouse")
  .master("local")
  .getOrCreate()
import spark.implicits._

The import and the builder call create a SparkSession object. The spark-shell creates one for you automatically, but you don’t have it (yet) while writing in IntelliJ. Defining it in the file keeps IntelliJ from complaining about the syntax and gives you all the auto-completions that a good CTRL+Space can offer. When you later run the file in the spark-shell, getOrCreate() simply returns the session the shell has already created, so your local definition is effectively ignored in favor of the existing instance.
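If you want to convince yourself of this, you can check it directly inside the spark-shell. The small sketch below is just an illustration and relies on getOrCreate() returning the already active session rather than building a new one:

import org.apache.spark.sql.SparkSession
// Inside the spark-shell, `spark` is already defined by the shell itself.
// Calling getOrCreate() again returns that same active session.
val spark2 = SparkSession.builder().getOrCreate()
spark2 eq spark  // true: both names point to the same SparkSession instance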

Figure: spark autocomplete in IntelliJ

The import spark.implicits._ line lets you use in IntelliJ all the implicit conversions that Spark provides on top of Scala's native types (thanks to Scala implicits). For example, you can write:

val myList = "Michele" :: "Maggie" :: "Vincenzo" :: "Luca" :: Nil
val myDF = myList.toDF

despite the fact that the List type in Scala has no toDF method.
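From there you can work with the DataFrame as usual, both in IntelliJ and in the spark-shell; as a quick check (nothing more than a sketch of what I would expect):

myDF.printSchema()  // a single column named "value" of type string
myDF.show()         // prints the four names as rows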

Update 2023-08-12. This post has been imported from my previous neglected blog.

This post is licensed under CC BY 4.0 by the author.