
Plotly and Zeppelin notebook

I'm trying to create a simple bar chart in a Zeppelin notebook using PySpark. Unfortunately, the plot does not show up in the notebook.
Any thoughts?

Hey Arash,

Just had to do this too, a week or so ago, so I'm sharing what I did to get Plot.ly working on Zeppelin here: https://github.com/beljun/zeppelin-plotly

Note that this is not official or endorsed by either community as of yet. Just one more user of these two awesome projects scratching his own itch.



Thanks a lot, junjun! I've run into another problem: an interpreter memory issue in the Zeppelin notebook. It accepts only three bar graphs when I use Plotly, but there is no memory problem when I use matplotlib or even D3. Any ideas?
Thanks, Arash

Can you describe your setup in more detail and where you're seeing the error? Best would be to attach the paragraph code and the error log so we can diagnose.

It just says "ERROR" with no further information, and then you can't re-run even previously executed code. When I check the log it says "out of memory", but when I increase the memory in zeppelin-env.sh, I can add two more graphs. Here are the changes I made instead of using the defaults:

export ZEPPELIN_MEM="-Xmx10024m -XX:MaxPermSize=5120m" (default is 512m)
export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx10g --driver-memory 10g --executor-memory 10g" (default is 1g)

After adding two more graphs, it stops again!

Hmm, it might be a case of collecting large amounts of data from your Spark executors to your driver (I assume you use Spark, given you have SPARK_SUBMIT_OPTIONS).

For my setup here, as long as I'm careful to aggregate my Spark RDD/DataFrame before calling toPandas(), everything's fine. I've been using this approach to visualize fairly large datasets, aggregated down to just what's needed for the chart.

Through PySpark I just created a DataFrame and then started using pandas. Maybe I need to create an RDD through Spark and then use pandas, but the problem is I don't know how to create a DataFrame from an RDD in PySpark!