Connect remotely to an external Spark
Using the HTTP REST API provided by the Apache Livy service, you can connect to an external Spark service from your Data Science Experience (DSX) notebook.
- You must install Apache Livy on the remote Spark site. Livy provides a URL to interact with the remote Spark.
- You must install all the dependencies (any libraries and packages) for your Spark code on the remote Spark site.
See Apache Livy Examples for more details on how a Python, Scala, or R notebook can connect to the remote Spark site.
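Under the hood, a Livy session is created with a POST to the /sessions endpoint. The sketch below builds (but does not send) that request with the Python standard library; the URL is the hypothetical endpoint from the example later in this topic, and "pyspark" is the session kind for Python notebooks:

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint; substitute the URL your Livy installation provides.
LIVY_URL = "https://my.hdp.system.com:8443/gateway/default/livy/v1"

# POST /sessions creates an interactive session; "kind" selects the language
# ("pyspark" for Python, "spark" for Scala, "sparkr" for R).
payload = json.dumps({"kind": "pyspark"}).encode("utf-8")
req = Request(
    LIVY_URL + "/sessions",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# In a live environment, urlopen(req) would submit the request, and you would
# then poll GET /sessions/{id} until the session state becomes "idle".
print(req.get_method(), req.full_url)
```

Sparkmagic wraps these REST calls for you, so you normally do not issue them by hand; the sketch only shows what travels over the wire.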
To connect to the remote Spark site, create a Livy session (in either UI mode or command mode) by using the REST API endpoint. The endpoint must include the Livy URL, port number, and authentication type. For example, with Sparkmagic:
%spark add -s session1 -l python -u https://my.hdp.system.com:8443/gateway/default/livy/v1 -a u -k
In this example, session1 is the session name and python is the notebook language. Enter %%help for the list of available commands.
If you want subsequent cells in your notebook to use the DSX Spark service, specify %%local. Otherwise, cells default to the remote Spark service. If you want data to be returned to the local notebook, use %%spark -o dd.
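Data returned with the -o option arrives in the local notebook as a pandas DataFrame, so ordinary local Python tools apply to it. A minimal sketch, assuming dd has already been downloaded (the DataFrame contents and column names here are hypothetical stand-ins):

```python
import pandas as pd

# Stand-in for the DataFrame that `%%spark -o dd` would download;
# in a real notebook, sparkmagic creates the local variable `dd` for you.
dd = pd.DataFrame({"product": ["a", "b", "a"], "sales": [10, 20, 30]})

# Once the data is local, regular pandas operations work without touching
# the remote Spark service.
totals = dd.groupby("product")["sales"].sum()
print(totals)
```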
Afterward, ensure that you delete the Livy session to release the Spark resource:
%spark delete session1
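The delete magic corresponds to a DELETE call against the session's REST resource. A sketch of the equivalent request, again built without sending it (the endpoint is the hypothetical URL from the earlier example, and the session id would come from session creation):

```python
from urllib.request import Request

# Hypothetical endpoint; a real session id is returned when the session is created.
LIVY_URL = "https://my.hdp.system.com:8443/gateway/default/livy/v1"
session_id = 0

# `%spark delete session1` maps to DELETE /sessions/{id}, which tears down
# the remote session and frees its Spark resources.
req = Request(f"{LIVY_URL}/sessions/{session_id}", method="DELETE")

# urlopen(req) would send the request in a live environment.
print(req.get_method(), req.full_url)
```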
See the sample notebooks for examples of how to connect to a remote Spark from your DSX notebook by using Sparkmagic.