Any pointers for a Graphistry integration into Redash? And whether/how this carries over to Databricks dashboards? We’ve been getting asked more and more, and some active projects could really use it, so I thought it was time to ask :slight_smile:

For background, Graphistry visualizations use client+server GPU acceleration, so a normal Jupyter or Streamlit flow might look like the split below between viz generation and viz loading:

  1. Chart generation: (SQL engine) --[1GB Arrow dataframe]--> (Python kernel) --[200MB Arrow dataframe]--> (Graphistry server) --> (iframe URL)

  2. Chart viewing: (Python kernel) --[iframe URL]--> (browser) <--[1MB/s JS/Arrow stream]-- (Graphistry server)

The main point is that the dashboard server <-> Graphistry server hop can handle much bigger datasets than we’d want the iframe to pull over the Graphistry server <-> browser hop. So while Graphistry does have a React component, we don’t want to round-trip big data through the browser; browser traffic should stay limited to symbolic things like filter controls.
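For concreteness, here is a minimal sketch of the "chart generation" leg in PyGraphistry terms, assuming a reachable Graphistry server; the hostname, credentials, and column names are placeholders. Only the returned URL ever needs to reach the browser.

```python
# Minimal sketch of step 1: the heavy dataframe stays between the Python kernel
# and the Graphistry server; only a lightweight URL comes back.
# Server, credentials, and columns below are placeholders.
import pandas as pd
import graphistry

graphistry.register(api=3, server="hub.graphistry.com",
                    username="my_user", password="my_password")

edges_df = pd.DataFrame({"src": ["a", "b", "c"], "dst": ["b", "c", "a"]})

# plot(render=False) uploads the edge list and returns a viz URL instead of
# rendering inline, which is what a dashboard widget would embed in an iframe.
url = graphistry.bind(source="src", destination="dst").edges(edges_df).plot(render=False)
print(url)
```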

This would be similar to apps doing things like GIS, Bokeh/Datashader, and other modern non-tiny charting. And I ask about Databricks dashboarding because I suspect we may be able to carry the benefits over to both communities in one go :slight_smile:


This is a fantastic question. I’ll noodle on this over the weekend and get back to you. I do like the idea of writing to Arrow so that we can stream the results out, though.

Awesome, thanks. We were actually the ones to create the Arrow JS libs, explicitly for these purposes, so happy to (try to) answer questions on those aspects. But for the same reasons, we don’t want to send 100MB-1GB of data to a browser (browser-side JS VMs actually run out of memory!). So the best currently viable experience is to let the viz server handle the < 50ms latency-tier work and have the browser’s WebGL handle the < 20m work on slices.
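To make the size argument concrete, here is a rough sketch (assuming pyarrow) of measuring the Arrow IPC payload a dashboard backend would hand to the Graphistry server rather than to the browser; the synthetic dataframe is just for illustration.

```python
# Rough sketch: serialize a result dataframe to an Arrow IPC stream and measure
# it. Payloads in this range are fine server <-> server, but would strain a
# browser-side JS VM if pushed through the iframe.
import io
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"src": range(1_000_000), "dst": range(1_000_000)})
table = pa.Table.from_pandas(df)

sink = io.BytesIO()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)

print(f"Arrow IPC payload: {sink.getbuffer().nbytes / 1e6:.1f} MB")
```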

If it helps, one of the current prompts for this is working with a DB extension that already returns a dataframe to Redash. They already use Graphistry for viz in Streamlit/Plotly/etc., and it already runs interactively on big datasets via the architecture I described, so we’re trying to figure out how to recreate that here. But my ideal would be to enable this for all Redash users, including Spark (as we have customers wanting exactly that for sec/fraud/misinfo/genomics/etc.), vs. just for that one DB :slight_smile:
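As a purely hypothetical sketch of that pattern (not an existing Redash API), the backend could swap the DB extension’s big result dataframe for a one-row result that carries only the Graphistry URL; the function and column names below are made up for illustration.

```python
# Hypothetical helper: upload the heavy edge list server-side and hand the
# dashboard only a URL. Names here are illustrative, not an existing Redash API.
import pandas as pd
import graphistry

def to_graphistry_url(result_df: pd.DataFrame,
                      src_col: str = "src",
                      dst_col: str = "dst") -> pd.DataFrame:
    """Return a one-row dataframe holding the iframe URL for result_df."""
    url = (graphistry
           .bind(source=src_col, destination=dst_col)
           .edges(result_df)
           .plot(render=False))
    # The big dataframe never reaches the browser; the widget embeds this URL.
    return pd.DataFrame({"graphistry_url": [url]})
```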

Hi @jesse ! Any thoughts or tips?

@jesse @lmeyerov Reviving this thread. I would ABSOLUTELY love to have this inside of Redash. It would open up a whole new side of Redash for graph analytics! Hopefully we can keep this thread alive!

Same!

Some progress since then:


Thank you for bumping this! When this issue was first posted we were in the throes of sunsetting hosted Redash. Will be following up on many of these items over the next couple of weeks!


@Herk Would you be interested in helping test this integration? What’s your dream data source for this kind of analysis?


@jesse yes, our team is very much interested in testing. We’ve already built a lot of modifications into Redash on a forked version, including:

  • Connection to TigerGraph (graph database)
  • GSQL support (the query language for TigerGraph)
  • Actively developing GraphQL support in Redash as well, since TigerGraph has a GraphQL connector
  • Dynamic schema listing based on the graph box you’re connected to
  • Also doing a few more…
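For reference, here is a rough sketch of how a TigerGraph connection like the one above might feed Graphistry, assuming pyTigerGraph and a hypothetical installed query; the host, graph name, query name, and result fields are placeholders, and the flattening step is entirely query-specific.

```python
# Rough sketch, assuming pyTigerGraph; host, graph, credentials, query name,
# and the "@@edge_list" accumulator below are placeholders.
import pandas as pd
import pyTigerGraph as tg
import graphistry  # assumes graphistry.register(...) was called as in the earlier sketch

conn = tg.TigerGraphConnection(host="https://my-box.i.tgcloud.io",
                               graphname="MyGraph",
                               username="tigergraph",
                               password="password")

# runInstalledQuery returns parsed JSON; how to flatten it into an edge list
# depends on the query, so this normalization is illustrative only.
results = conn.runInstalledQuery("edges_for_viz")
edges_df = pd.DataFrame(results[0]["@@edge_list"])

url = graphistry.bind(source="from_id", destination="to_id").edges(edges_df).plot(render=False)
```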

If you’re interested, let’s get that merged into master on the main repo! I’ve wanted to implement a Neo4j connector for ages, but I’m not a graph expert. Amazing to see others care about it too.

GraphQL support would possibly enable a Dgraph backend. We’d be happy to test.
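If it helps testing, here is a hedged sketch of that Dgraph idea: query the auto-generated GraphQL endpoint over plain HTTP, flatten the response into an edge list, and hand it to Graphistry. The endpoint, type, and field names are placeholders for whatever the schema actually defines.

```python
# Hedged sketch: Dgraph's GraphQL endpoint queried over HTTP; the endpoint,
# "Person" type, and field names are placeholders.
import pandas as pd
import requests
import graphistry  # assumes graphistry.register(...) was called beforehand

query = """
{
  queryPerson {
    name
    follows { name }
  }
}
"""
resp = requests.post("http://localhost:8080/graphql", json={"query": query})
people = resp.json()["data"]["queryPerson"]

# Flatten the nested GraphQL response into a flat edge list for plotting.
edges_df = pd.DataFrame(
    [{"src": p["name"], "dst": f["name"]}
     for p in people for f in (p.get("follows") or [])]
)

url = graphistry.bind(source="src", destination="dst").edges(edges_df).plot(render=False)
```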