Post Visualizations to Slack on a schedule


#1

One of my coworkers wrote a Python script that regularly takes a screenshot of a visualization (using Selenium + PhantomJS) and posts it in a Slack channel. We’re planning on integrating this into our fork of Redash, and it’d be awesome if we could PR it back to the main repo. So I figured I’d open this topic to discuss some possible approaches.

One difficulty I see is deciding how tightly to integrate it with the query schedule functionality. On the one hand, there’s not much use in posting a screenshot unless the query has been run again. On the other hand, integrating it with query scheduling could make things messy and confusing from both a UI and a code standpoint.


#2

It’s a good idea to start a discussion about this. It’s also good timing, because I’ve been doing similar work and have also been thinking about how it would work alongside query scheduling.

I recently implemented a Slack bot for Redash as an external tool that uses Redash’s API (it can be installed from here: http://redash.io/slack). I also used PhantomJS to generate the screenshots, although I’m using it directly without Selenium. Was there any reason to use Selenium as a middleman?

The most common feedback I got about the bot is that while the search functionality is nice, the real value would be in getting scheduled queries delivered to Slack…

Why have the bot as an external tool? There were a few reasons:

  1. Different technology: the ability to use BotKit, which is based on Node.js. The screenshot web service I wrote also uses PhantomJS and Node.js.
  2. The ability to bring this to all Redash users without requiring a Redash upgrade or installing new infrastructure components.
  3. Business reason: I’m trying to find a way to make the project sustainable. The hosted business is taking time to ramp up, and I’m trying to find a model that is indifferent to how people run Redash. I thought of offering several paid “extensions” (similar to the WordPress Jetpack plugin). The Slack bot was one of them… I’m still “on the fence” about the business reason, but it is why I haven’t open sourced the bot and the screenshot web service yet. I’ll be happy to hear the community’s thoughts on this as well.

But when I think about what an optimal workflow would look like, I see a lot of value in integrating Slack more closely into Redash’s interface.

Now back to your question on how such a feature should behave with regard to query scheduling: I agree that it makes sense to send the screenshot once the query has been refreshed. One thought I had is to extend the “Alerts” feature and change it as follows (we should probably rename it as well, maybe to “Notifications”?):

  1. “Alerts” will have several types: rule (like today; send notification when the rule matches), on every update (send notification on every query refresh - like what you described), on new row (send notification when there is a new row in the result).
  2. All alert notifications will include a snapshot of the visualization.
  3. Alerts will have their own schedule on which the query will be executed, regardless of the query’s own schedule.
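
To make the three proposed types a bit more concrete, here is a minimal Python sketch of the decision each one would make on a query refresh. Everything in it (`AlertType`, `should_notify`, rows as a list of dicts, `rule` as a callable) is a hypothetical illustration, not existing Redash code.

```python
from enum import Enum


class AlertType(Enum):
    RULE = "rule"                  # notify when the rule matches (like today)
    EVERY_UPDATE = "every_update"  # notify on every query refresh
    NEW_ROW = "new_row"            # notify when the result has a new row


def should_notify(alert_type, previous_rows, current_rows, rule=None):
    """Decide whether a refresh should trigger a notification.

    Rows are assumed to be lists of dicts with hashable values.
    `rule` is assumed to be a callable that takes the current rows
    and returns True when the alert condition holds.
    """
    if alert_type is AlertType.EVERY_UPDATE:
        return True
    if alert_type is AlertType.RULE:
        return rule is not None and bool(rule(current_rows))
    if alert_type is AlertType.NEW_ROW:
        # Compare rows by content; order of keys does not matter.
        seen = {tuple(sorted(row.items())) for row in previous_rows}
        return any(
            tuple(sorted(row.items())) not in seen for row in current_rows
        )
    raise ValueError(f"unknown alert type: {alert_type!r}")
```

In this sketch the "new row" type compares the current result against the previous one, so it assumes the previous result is still available when the refresh finishes.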

WDYT?

(I hope I managed to convey my thoughts in a coherent way, as it’s a bit late and I’m tired :slight_smile:)


#3

Bot as an external tool

In general I think the only downside, like you said, is that this limits how closely it can integrate with the Redash UI, which I think is the ideal workflow here. That being said, it’d probably be okay for users to have to grab a link and type something like /redash <url to visualization> <time of day> into a Slack channel.
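
As a rough idea of what handling that command could involve, here is a minimal Python sketch that parses the text following `/redash`. The function name `parse_redash_command` and the `HH:MM` time format are assumptions made for illustration; nothing like this exists in the bot as far as this thread says.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_redash_command(text):
    """Parse '<url> <HH:MM>' into (url, datetime.time).

    Raises ValueError with a user-facing message on bad input.
    """
    parts = text.split()
    if len(parts) != 2:
        raise ValueError("usage: /redash <url to visualization> <HH:MM>")
    url, when = parts

    # Only accept absolute http(s) links.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid URL: {url}")

    # Assumed format: 24-hour time of day, e.g. 09:30.
    try:
        post_at = datetime.strptime(when, "%H:%M").time()
    except ValueError:
        raise ValueError(f"not a valid time of day: {when}") from None
    return url, post_at
```

The sketch ignores time zones entirely; a real command would need to decide whose clock "time of day" refers to.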

Schedule consistency

The other issue is scheduling consistency between queries and Slack posts. The more I think about this, the more I’m leaning toward keeping them separate. The UX seems to get much more complicated when we talk about having them on the same schedule. The downside of separating them might be confusion for users (i.e. “Did it post this chart after the query ran or before? How do I make sure it only happens after?”). I don’t think that would be an issue with how we use scheduled queries, but I can’t speak for all users.


#4

Hello all,

My name is “Coworker Who Writes Scripts” (Eric). Hopefully I can contribute a bit and give some perspective on the original incentives to build out the scripts we did.

Bot as an external tool

As far as process/workflow goes, having this integrated into the re:dash UI would be valuable to our end users from a simplicity standpoint. Much of our work is done within re:dash; we then share links with the original requester so they can view the visualizations/data tables they asked for and, if they like, edit the base query. Given re:dash’s schedule functionality, we can set the refresh to match how often the data itself is updated, which keeps people coming back to re:dash as the source of truth for their data needs. So, going back to simplicity, it makes sense to have this notification functionality baked directly into the re:dash UI as well.

I also appreciated Arik’s suggestion to rename alerts to just “notifications”. I think this makes sense, as some data sets are not going to work with visualizations and the end user may just want content delivered in text format. This could simply be periodic information or an alert, but either way I believe it makes sense to have this all nested in the same location. I’m not 100% sold on the functionality being built into the query page itself, as I see this working much like building a dashboard: you press (+) to add a new element, search through existing queries, and then select whether it’s just the table data or the visualization you would like posted.

I certainly understand the technical difficulties, however, so that’s definitely being considered.

Schedule Consistency

On the topic of whether these posts correspond to the refreshing of the query’s data, I think they absolutely need to be separate. Clearly our business units will want these visualizations and data posted during business hours, but we found out quickly that having a mass of scheduled queries running during business hours murdered performance. That’s not to say all queries will be that intensive, and I am sure business units will want data from the morning, afternoon, etc. But for queries covering days, weeks, or years at a time, it makes sense for the sake of our clusters to run them in off-hours and then post the results when users are working.
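
One way to keep the two schedules separate without confusing readers is to have each post say how old the data is. Below is a minimal Python sketch of that idea; `describe_data_age` and `build_caption` are hypothetical helpers, and it assumes the posting job can read the time the cached result was last refreshed.

```python
from datetime import datetime, timedelta


def describe_data_age(retrieved_at, now):
    """Return a short human-readable age for the cached result."""
    age = now - retrieved_at
    if age < timedelta(0):
        raise ValueError("retrieved_at is in the future")
    hours, remainder = divmod(int(age.total_seconds()), 3600)
    minutes = remainder // 60
    if hours:
        return f"{hours}h {minutes}m ago"
    return f"{minutes}m ago"


def build_caption(query_name, retrieved_at, now):
    """Caption for a post that reuses the cached result as-is."""
    age = describe_data_age(retrieved_at, now)
    return f"{query_name} (data refreshed {age})"
```

With this, a heavy query refreshed at 02:00 and posted at 09:30 would go out labelled as refreshed "7h 30m ago", so nobody has to guess whether the chart was posted before or after the query ran.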

Hopefully that helps. I am sure we aren’t the only ones providing feedback on a feature with this type of functionality though, so I am excited to hear what you think!


#5

Arik, I think you’re spot-on that the real value would be in having scheduled queries delivered to Slack. I found this thread while looking for ways to get that done. (I was also thinking about having our team build something similar.)

Bot as an External Tool

I get the technical advantages (and they seem considerable), as well as the business advantage of making re:dash sustainable. Honestly, the Slackbot you wrote is a pretty fantastic start. If we could just set a schedule for it (and maybe make the unfurl a little cleaner), we’d be all set. And I’d be happy to pay for it.

As for the timing concerns, you could start by letting the bot be ignorant of the query schedule. That’s not really a problem from my perspective. Yes, an integrated scheduler would be ideal, but it wouldn’t have the other benefits you described above.

On the other hand, I don’t know exactly how that Slackbot works – and I do feel a bit worried about sending my customer data through an external service. Please forgive my ignorance if that’s not the case.

Internal to re:dash

From a user’s perspective, definitely the better option. That said, I get that it doesn’t have the other advantages you described above.


#6

Josh, any chance you’d be willing to share your fork (or just the Python script)? We could really use it!


#7

@arikfr I’m wondering how I can contribute the work we did with Chatlytics to this effort (https://www.chatlytics.co and https://github.com/openbridge/chatlytics). Our approach was to take a defined SQL query and render a visualization of it within the chatbot. I can see what we built grabbing a query from Redash and then rendering it. If you want to discuss further, let me know.


#8

The data only travels through our (Redash) server, which saves the visualization to S3 and then sends it to Slack. It’s not as secure as having it all self-hosted and only sent to Slack, but there are very few services involved and none of them store the visualization (except for S3).
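
For anyone curious what the last hop of such a flow can look like, here is a minimal Python sketch that posts an already-hosted image to Slack through an incoming webhook, using a Block Kit image block. It is not the actual implementation of the bot; the function names, the webhook approach, and the example URLs are assumptions for illustration.

```python
import json
import urllib.request


def build_slack_message(title, image_url):
    """Build an incoming-webhook payload that shows a hosted image."""
    return {
        "text": title,  # plain-text fallback shown in notifications
        "blocks": [
            {
                "type": "image",
                "image_url": image_url,  # must be reachable by Slack
                "alt_text": title,
            }
        ],
    }


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because Slack fetches the image itself, the stored file has to be reachable from Slack’s side, for example through a public or time-limited link. That is the trade-off compared with a fully self-hosted setup.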


#9

I can definitely see Chatlytics working w/ Redash. I’ll be happy to talk further – I’ll send you an email.

Thanks.