Identifying plotly charts in URLs

Hi, I’m looking for a reasonably standard way to identify interactive charts on web pages built with libraries like Plotly.js. So this is a ‘how do they appear’ question, not a ‘how do I make’ question, if that’s allowed.

Would all Plotly.js charts show up as an SVG element in the HTML?

Would that be true for charts made in other JavaScript libraries? I’d prefer a broad approach that’s not 100% accurate over one that relies on anything that may not be consistently present across sites (like id attributes?).

My goal is to answer the question “is there an interactive chart element on this page?” (yes/no), and my thought is that looking for SVG elements would get me close to that answer. Maybe by limiting the search to the body of a page I’d weed out most logos, and that would leave graphs and maybe animations or some illustrations.

If I’m way off base, I’m open to any hints you could share that could help me find these on specific URLs programmatically (with Python). I am usually in the Python side of this forum but would love to hear from the group mind here!

I don’t think there’s a single rule that will capture every charting library. Plotly.js has SVG elements in it, even if you use a WebGL trace type, but there are plenty of other uses of SVG, like MathJax, that would probably give you a lot of false positives. And some other charting libraries only create Canvas elements (Cytoscape and Bokeh for example).

You might be able to construct a rule that amounts to “is there a large graphical element”, i.e. search for an SVG or a Canvas element with a certain minimum size, maybe 300px width and 200px height, measured via getBoundingClientRect or getComputedStyle. That would give a false positive if the page includes large aesthetic / artistic elements, but maybe in the kind of pages you’re thinking about such elements would be relatively rare? And it would miss small graphs like sparklines, which may or may not be your intent.
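A minimal sketch of that size rule in Python, under a few assumptions: `driver` is any object exposing Selenium’s `execute_script` method, and the names `MEASURE_JS`, `large_graphics` and `has_large_graphic` are made up for illustration. The 300 x 200 px thresholds are just the rough numbers mentioned above, not tuned values.

```python
# JavaScript evaluated inside the rendered page: report the on-screen size
# of every <svg> and <canvas> element found under <body>.
MEASURE_JS = """
return Array.from(document.querySelectorAll('body svg, body canvas')).map(function (el) {
    var rect = el.getBoundingClientRect();
    return {tag: el.tagName.toLowerCase(), width: rect.width, height: rect.height};
});
"""

MIN_WIDTH = 300   # px, rough threshold (an assumption, adjust to taste)
MIN_HEIGHT = 200  # px


def large_graphics(elements, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    """Keep only the measured elements that meet both size thresholds."""
    return [
        el for el in elements
        if el["width"] >= min_width and el["height"] >= min_height
    ]


def has_large_graphic(driver):
    """Return True if the page loaded in `driver` has a large SVG or Canvas.

    `driver` is expected to behave like a Selenium WebDriver, i.e. provide
    execute_script(js) and return the JS array as a list of dicts.
    """
    return bool(large_graphics(driver.execute_script(MEASURE_JS)))
```

Small icons and logos rendered as SVG fall below the thresholds and are ignored, while sparklines would be ignored too, as noted above.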

Otherwise I’d pursue an approach based on positively identifying each charting library. For example Plotly.js gives its top-level Div a js-plotly-plot class; Bokeh creates several canvas.bk elements; Cytoscape gives its top level Div a __________cytoscape_container class (wow that’s a lot of underscores!)
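The per-library approach could be sketched like this. The three CSS selectors come straight from the class names above; the names `LIBRARY_SELECTORS`, `COUNT_JS` and `detect_libraries` are hypothetical, and `driver` is again assumed to be anything with Selenium’s `execute_script` method. Class names can change between library versions, so treat the table as a starting point rather than a complete list.

```python
# CSS selectors for the marker elements each library is described as creating.
LIBRARY_SELECTORS = {
    "plotly.js": ".js-plotly-plot",
    "bokeh": "canvas.bk",
    "cytoscape": ".__________cytoscape_container",
}

# Runs in the page; arguments[0] is the selector passed from Python.
COUNT_JS = "return document.querySelectorAll(arguments[0]).length;"


def detect_libraries(driver, selectors=LIBRARY_SELECTORS):
    """Return {library name: number of matching elements} for libraries found.

    Libraries with zero matching elements are left out of the result.
    """
    found = {}
    for name, css in selectors.items():
        count = driver.execute_script(COUNT_JS, css)
        if count:
            found[name] = count
    return found
```

An empty dict means none of the known markers were present, which is not the same as “no chart”: a library missing from the table would go undetected.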

Note that all of these approaches would require actually executing the page in a browser - it’s not enough to search the page HTML. In Dash for example we don’t even load the Plotly.js library until Dash Core Components tries to render a graph. So it wouldn’t work to request the page in Python and traverse it with BeautifulSoup or some such; you’d need to use Selenium.
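Because the chart may only appear some time after the page loads, a check run immediately after `driver.get` can miss it. Here is a rough sketch of polling for a marker element, assuming Selenium with a local Chrome and chromedriver set-up; `wait_for_selector` and `page_has_plotly` are hypothetical helper names, and the 10 second timeout is arbitrary.

```python
import time

# Runs in the page; arguments[0] is the CSS selector passed from Python.
COUNT_JS = "return document.querySelectorAll(arguments[0]).length;"


def wait_for_selector(driver, css, timeout=10.0, poll=0.5):
    """Poll the rendered page until `css` matches something or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(COUNT_JS, css) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def page_has_plotly(url, timeout=10.0):
    """Load `url` in headless Chrome and look for Plotly.js's top-level div."""
    # Imported here so the polling helper above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return wait_for_selector(driver, ".js-plotly-plot", timeout)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Charts that render only after scrolling or clicking would still be missed by this; that would need extra interaction steps before the check.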


@alexcjohnson this is super helpful info, thank you so much for sharing all this knowledge and direction. Especially the tip to look for Canvas elements, which I hadn’t realized I needed to do.

I was looking at the code for a Chrome extension that identifies some charting libraries, but I’ve found some pages it isn’t reading. It seems like good work in the direction of identifying specific libraries.

I was prepared to use Selenium or actions after finding some scripts combining that with BeautifulSoup. This is shaping up to be a longer term project than I was hoping for, at least to do it well, but it might be worth the time to test for “is there a large graphical element”.

For the group of sites I’m interested in right now at least, actually finding a cool illustration might be an uncommon bonus, since overall I’d like to tie this to time spent on page or engagement (or dispute an idea about that with viz anyway, especially interactive viz).

If anyone hears or sees any projects along these lines, let me know.
Thank you Alex!