Connect remotely to an external Spark

Using the Apache Livy service's HTTP REST API, you can connect from your Data Science Experience (DSX) notebook to an external Spark service.
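
Sparkmagic issues these REST calls for you, so you normally do not invoke the Livy API directly. As a rough sketch of what the protocol looks like, creating an interactive PySpark session is a POST to Livy's /sessions endpoint. The host name, port, and credentials below are placeholders, and the example assumes basic authentication:

  import json
  import requests

  # Placeholder endpoint: substitute your own Livy host, port, and path.
  livy_url = "https://my.hdp.system.com:8443/gateway/default/livy/v1"

  # POST /sessions creates an interactive session; "kind" selects the language.
  resp = requests.post(
      livy_url + "/sessions",
      data=json.dumps({"kind": "pyspark"}),
      headers={"Content-Type": "application/json"},
      auth=("user", "password"),  # placeholder credentials
      verify=False)               # illustration only; verify certificates in practice
  session = resp.json()

  # GET /sessions/<id> reports the session state; wait until it is "idle"
  # before submitting code to /sessions/<id>/statements.
  print(session["id"], session["state"])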

Requirements:

  • You must install Apache Livy on the remote Spark site. Livy provides a URL to interact with the remote Spark.
  • You must install all dependencies (libraries and packages) that your Spark code requires on the remote Spark site.

See Apache Livy Examples for more details on how a Python, Scala, or R notebook can connect to the remote Spark site.

Want to see how to connect to an external Spark service? Watch this short video:

Figure 1. Connecting to an external Spark service from IBM DSX Local
This video walks you through writing notebooks in IBM DSX Local that connect remotely to an external Spark service through Livy by using Sparkmagic.

To connect to the remote Spark site, create the Livy session (in either UI mode or command mode) by using the REST API endpoint. The endpoint must include the Livy URL, port number, and authentication type. Sparkmagic example:

  %spark add -s session1 -l python -u https://my.hdp.system.com:8443/gateway/default/livy/v1 -a u -k

where session1 is the session name and python is the notebook language.

Enter %spark? or %%help to see the full list of available commands.
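
After the session reaches the idle state, a cell that begins with the %%spark magic runs in the remote session. A minimal sketch (the data here is arbitrary):

  %%spark
  # Runs on the remote Spark site; sc is the remote session's SparkContext.
  numbers = sc.parallelize([1, 2, 3, 4])
  print(numbers.sum())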

After the session is created, cells run on the remote Spark service by default. If you want a cell to run against the local DSX Spark service instead, begin it with %%local. If you want data from the remote session returned to the local notebook, use %%spark -o dd, where dd names the remote DataFrame to copy.
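
For example, the following two cells build a DataFrame in the remote session, copy it into the local notebook as dd, and then work with the copy locally. The table name is a placeholder, and -o assumes that dd is a Spark DataFrame in the remote session; depending on your Livy and Spark versions, the session exposes spark or sqlContext:

  %%spark -o dd
  # Runs remotely; -o copies the remote DataFrame dd into the local
  # notebook as a pandas DataFrame. "mytable" is a placeholder name.
  dd = spark.sql("SELECT * FROM mytable LIMIT 100")

and in a later cell:

  %%local
  # Runs in the local notebook against the copied pandas DataFrame.
  dd.describe()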

When you are finished, delete the Livy session to release the Spark resources:

  %spark delete session1
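
If you created several sessions, Sparkmagic's cleanup command deletes all sessions that were created from the notebook in one step:

  %spark cleanup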

Learn more

See the sample notebooks for examples of how to connect to a remote Spark from your DSX notebook by using Sparkmagic.