sparkmon

Read the documentation at https://sparkmon.readthedocs.io/

Features

Monitoring plot example: docs/images/monitoring-plot-example.png

  • Logs executor metrics

  • Plots the monitoring data, for display in a notebook or export to a file

  • Can monitor a remote Spark application

  • Can run directly in your PySpark application, in a notebook, or via the command-line interface

  • Logs to mlflow

Requirements

  • Python

  • Spark

  • mlflow (optional)

Installation

You can install sparkmon via pip from PyPI, optionally with the mlflow extra:

$ pip install sparkmon
$ pip install "sparkmon[mlflow]"

Usage

import sparkmon

# Create an app connection
# via a Spark session
application = sparkmon.create_application_from_spark(spark)
# or via a remote Spark web UI link
application = sparkmon.create_application_from_link(index=0, web_url='http://localhost:4040')

# Create and start the monitoring process
mon = sparkmon.SparkMon(application, period=5, callbacks=[
    sparkmon.callbacks.plot_to_image,
    sparkmon.callbacks.log_to_mlflow,
])
mon.start()

# Stop monitoring
mon.stop()
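
Because the monitor runs in the background, it is worth making sure mon.stop() is called even when the monitored job raises. The following is a minimal sketch that relies only on the start() and stop() methods shown above. The monitoring helper is hypothetical and not part of sparkmon.

```python
from contextlib import contextmanager


@contextmanager
def monitoring(mon):
    """Start `mon` on entry and always stop it on exit.

    `mon` is any object with start() and stop() methods, such as the
    SparkMon instance created above. This helper is illustrative only.
    """
    mon.start()
    try:
        yield mon
    finally:
        # Runs on normal exit and when the body raises.
        mon.stop()
```

It could be used as `with monitoring(mon): ...` around the Spark job, so monitoring stops whether the job succeeds or fails.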

You can also use it from a notebook; see the Notebook Example.

There is also a command-line interface, see Command-line Reference for details.

How does it work?

SparkMon runs a Python thread in the background that queries the Spark web UI API and logs all executor information over time.

The callbacks list parameter lets you define what to do after each update, such as exporting the executors' historical info to a CSV file, plotting to a file, or plotting to your notebook.
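
The polling idea can be sketched in a few lines. This is not sparkmon's actual implementation: the Poller class, the fetch_executors function, and the callback signature (a single history argument) are all assumptions made for illustration. The only external detail relied on is the executors endpoint of Spark's monitoring REST API, /api/v1/applications/[app-id]/executors.

```python
import json
import threading
import time
from urllib.request import urlopen


def fetch_executors(web_url, app_id):
    """Query the Spark REST API for the executors of one application."""
    url = f"{web_url}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as response:
        return json.load(response)


class Poller:
    """Hypothetical poller: calls fetch() every `period` seconds in a thread."""

    def __init__(self, fetch, period=5.0, callbacks=()):
        self.fetch = fetch              # callable returning executor data
        self.period = period
        self.callbacks = list(callbacks)
        self.history = []               # (timestamp, executors) samples
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop_event.is_set():
            self.history.append((time.time(), self.fetch()))
            # Assumed callback signature: each callback receives the history.
            for callback in self.callbacks:
                callback(self.history)
            # wait() returns early once stop() sets the event.
            self._stop_event.wait(self.period)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
```

A poller built as `Poller(lambda: fetch_executors("http://localhost:4040", app_id), period=5)` would sample a local application, where app_id is a placeholder for the application's ID. Passing fetch as a callable keeps the thread logic independent of the network, which also makes it easy to exercise with a stub.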

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, sparkmon is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.
