Post

Spark and MySQL

Spark and MySQL

The MySQL Connector

Do you already have all the libraries to connect MySQL? Get the hands on your spark-shell and check with:

1
Class.forName("com.mysql.jdbc.Driver")

If you get an exception, then you are missing the MySQL connector.

Download it. There is a jar inside the tarball, that’s all you need. Just tell spark where the file is:

1
spark-shell --jars mysql-connector-java-5.1.44-bin.jar

You won’t get the exception anymore.

Connecting and reading a table

Let’s assume that you have a MySQL instance running.

You have to provide a few informations:

  • username and password
  • hostname and port, of the MySQL Server
  • Database that you want to use

We use the hostname, port and Database to create a jdbc URL. The username and password will be passed to the reader via a java.util.Properties object.

We are going to use the jdbc method of the DataFrameReader. The jdbc is pretty general, so you will have to tell it which driver it should use to connect. Hint: for MySQL it’s com.mysql.jdbc.Driver.

Now that we have all the information we need, let’s read a table called HEROES from the database MY_DATABASE stored in our local MySQL:

1
2
3
4
5
6
7
8
9
10
11
val host = "localhost"
val port = 3306
val database = "MY_DATABASE"
val url = s"jdbc:mysql://${host}:${port}/${database}"

val props = new java.util.Properties
props.put("user", "root")
props.put("password", "password")

val heroes = spark.read.option("driver", "com.mysql.jdbc.Driver").jdbc(url, "HEROES", props)
heroes.show
1
val heros = spark.read.option("driver", "com.mysql.jdbc.Driver").jdbc(jdbcUrl, "HEROES", connectionProperties)

You can also create a new table in the database:

1
2
3
4
5
6
7
8
9
10
import java.sql.Date

val newData = List(
        ("Adolf Hitler", "Germany", Date.valueOf("1945-04-30")),
        ("Benito Mussolini", "Italy", Date.valueOf("1945-04-28")),
        ("Francisco Franco", "Spain", Date.valueOf("1975-11-20")))
  
val villains = newData.toDF("NAME", "COUNTRY", "DIED")

villains.write.option("driver", "com.mysql.jdbc.Driver").jdbc(url, "VILLAINS", props)

And you can access it from the MySQL client:

use MY_DATABASE
select * from VILLAINS;
1
2
3
4
5
6
7
+------------------+---------+------------+
| NAME             | COUNTRY | DIED       |
+------------------+---------+------------+
| Benito Mussolini | Italy   | 1945-04-28 |
| Francisco Franco | Spain   | 1975-11-20 |
| Adolf Hitler     | Germany | 1945-04-30 |
+------------------+---------+------------+

References

https://docs.databricks.com/spark/latest/data-sources/sql-databases.html

https://dev.mysql.com/downloads/connector/j/

Update 2023-08-12. This post has been imported from my previous neglected blog.

This post is licensed under CC BY 4.0 by the author.