Spark and MySQL
The MySQL Connector
Do you already have all the libraries to connect MySQL? Get the hands on your spark-shell
and check with:
1
Class.forName("com.mysql.jdbc.Driver")
If you get an exception, then you are missing the MySQL connector.
Download it. There is a jar inside the tarball, that’s all you need. Just tell spark
where the file is:
1
spark-shell --jars mysql-connector-java-5.1.44-bin.jar
You won’t get the exception anymore.
Connecting and reading a table
Let’s assume that you have a MySQL instance running.
You have to provide a few informations:
- username and password
- hostname and port, of the MySQL Server
- Database that you want to use
We use the hostname, port and Database to create a jdbc
URL. The username and password will be passed to the reader via a java.util.Properties
object.
We are going to use the jdbc
method of the DataFrameReader
. The jdbc
is pretty general, so you will have to tell it which driver
it should use to connect. Hint: for MySQL it’s com.mysql.jdbc.Driver
.
Now that we have all the information we need, let’s read a table called HEROES
from the database MY_DATABASE
stored in our local MySQL:
1
2
3
4
5
6
7
8
9
10
11
val host = "localhost"
val port = 3306
val database = "MY_DATABASE"
val url = s"jdbc:mysql://${host}:${port}/${database}"
val props = new java.util.Properties
props.put("user", "root")
props.put("password", "password")
val heroes = spark.read.option("driver", "com.mysql.jdbc.Driver").jdbc(url, "HEROES", props)
heroes.show
1
val heros = spark.read.option("driver", "com.mysql.jdbc.Driver").jdbc(jdbcUrl, "HEROES", connectionProperties)
You can also create a new table in the database:
1
2
3
4
5
6
7
8
9
10
import java.sql.Date
val newData = List(
("Adolf Hitler", "Germany", Date.valueOf("1945-04-30")),
("Benito Mussolini", "Italy", Date.valueOf("1945-04-28")),
("Francisco Franco", "Spain", Date.valueOf("1975-11-20")))
val villains = newData.toDF("NAME", "COUNTRY", "DIED")
villains.write.option("driver", "com.mysql.jdbc.Driver").jdbc(url, "VILLAINS", props)
And you can access it from the MySQL client:
use MY_DATABASE
select * from VILLAINS;
1
2
3
4
5
6
7
+------------------+---------+------------+
| NAME | COUNTRY | DIED |
+------------------+---------+------------+
| Benito Mussolini | Italy | 1945-04-28 |
| Francisco Franco | Spain | 1975-11-20 |
| Adolf Hitler | Germany | 1945-04-30 |
+------------------+---------+------------+
References
https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
https://dev.mysql.com/downloads/connector/j/
Update 2023-08-12. This post has been imported from my previous neglected blog.