Graphistry chart integration for big data tables/graphs (server-generated)

Any pointers on integrating Graphistry into Redash? And whether/how that would carry over to Databricks dashboards? We’ve been getting asked more and more, and some active projects could really use it, so I thought it was time to ask :slight_smile:

For background, Graphistry visualizations use client+server GPU acceleration, so a typical Jupyter or Streamlit flow splits into viz generation vs. viz loading, roughly like the below (a minimal code sketch follows the list):

  1. Chart generation: (SQL engine) --[1GB Arrow dataframe]--> (Python kernel) --[200MB Arrow dataframe]--> (Graphistry server) --> (iframe URL)

  2. Chart viewing: (Python kernel) --[iframe URL]--> (browser) <--[1MB/s JS/Arrow stream]-- (Graphistry server)
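For concreteness, here is a minimal sketch of the generation step with PyGraphistry, assuming an account on `hub.graphistry.com` (or a self-hosted server) and an edge table with `src`/`dst` columns; `plot(render=False)` returns the embeddable iframe URL instead of rendering inline:

```python
import graphistry
import pandas as pd

# Assumption: credentials for a Graphistry server (Hub or self-hosted)
graphistry.register(api=3, protocol="https", server="hub.graphistry.com",
                    username="...", password="...")

# Step 1 (chart generation): the big dataframe stays server-side --
# PyGraphistry uploads it to the Graphistry server, and only a short
# iframe URL comes back to the Python kernel.
edges_df = pd.DataFrame({"src": ["a", "b", "c"], "dst": ["b", "c", "a"]})
g = graphistry.edges(edges_df).bind(source="src", destination="dst")
url = g.plot(render=False)  # returns the iframe URL rather than rendering
print(url)
```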

The main point is that the dashboard server <> Graphistry server link can handle bigger datasets than we’d want the Graphistry server <> browser iframe to carry. So while Graphistry does have a React component, we don’t want to round-trip big data through the browser; we want to keep browser traffic to symbolic things like filter controls.
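So at dashboard render time, the browser only ever receives the URL plus small symbolic params. A sketch of what that looks like in Streamlit today (not Redash-specific; the filter query param is hypothetical and the URL is a placeholder for the one returned by the generation step):

```python
import streamlit as st
import streamlit.components.v1 as components

# Placeholder: the iframe URL returned by the generation step above
graphistry_url = "https://hub.graphistry.com/graph/graph.html?dataset=..."

# Step 2 (chart viewing): only the URL crosses to the browser; the big
# dataset stays between the dashboard server and the Graphistry server.
node_type = st.selectbox("Node type", ["all", "account", "device"])

# Hypothetical filter param -- the point is that browser round-trips stay
# small and symbolic (a short string), never the data itself.
components.iframe(f"{graphistry_url}&filter_node_type={node_type}", height=600)
```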

This would be similar to apps doing GIS, Bokeh/Datashader, and other modern non-tiny charting. And I’m asking about Databricks dashboards too because I suspect we may be able to carry the benefits over to both communities in one go :slight_smile:


This is a fantastic question. I’ll noodle on this over the weekend and get back to you. I do like the idea of writing to Arrow so that we can stream the results out, though.

Awesome, thanks. We were actually the ones who created the Arrow JS libs, and explicitly for these purposes, so happy to (try to) answer questions on those aspects. But for the same reasons, we don’t want to send 100MB–1GB of data to a browser (browser-side JS VMs actually run out of memory!). So the best currently viable experience is to work with the viz server for the < 50ms latency-tier work, and then let the browser’s WebGL handle the < 20ms work on slices.
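On the Arrow point: the server-to-server hop can stay a compact Arrow IPC stream, e.g. with pyarrow. This is an illustrative sketch of "write to Arrow and stream the results out", not the exact wire format Graphistry's uploader uses:

```python
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"src": range(1_000_000), "dst": range(1_000_000)})
table = pa.Table.from_pandas(df)

# Serialize to an Arrow IPC stream for the dashboard server -> Graphistry
# server hop; only the resulting iframe URL would go on to the browser.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()

print(f"{len(buf) / 1e6:.1f} MB stays server-side")

# Round-trip check: a downstream service can read the stream back directly.
table2 = pa.ipc.open_stream(buf).read_all()
assert table2.num_rows == table.num_rows
```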

If it helps, one of the current prompts for this is a project working with a DB extension that already returns a dataframe to Redash. They’re already using Graphistry for viz in Streamlit/Plotly/etc., and it already runs interactively on big datasets via the architecture I described, so we’re trying to figure out how to recreate that here. But my ideal would be to enable this for all Redash users, including Spark (as we have customers wanting exactly that for sec/fraud/misinfo/genomics/etc.), vs. just for that one DB :slight_smile:
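A hedged sketch of what a Redash-side adapter could look like: `to_graphistry_url` and where it would hook in are hypothetical, but the input is the standard Redash query-result shape (`columns` + `rows`) and the output is just an iframe URL for the dashboard to embed.

```python
import pandas as pd
import graphistry

def to_graphistry_url(query_result: dict, source_col: str, dest_col: str) -> str:
    """Hypothetical Redash visualization hook: turn a finished query result
    into a Graphistry iframe URL. The rows go server-to-server; the browser
    only ever sees the returned URL. Assumes graphistry.register() was
    already called with server credentials."""
    # Redash query results arrive as {"columns": [...], "rows": [{...}, ...]}
    df = pd.DataFrame(query_result["rows"])
    g = graphistry.edges(df).bind(source=source_col, destination=dest_col)
    return g.plot(render=False)
```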

Hi @jesse ! Any thoughts or tips?