Sum/Average X axis data


#1

This would ease the need for separated queries from the same data source.


#2

If you are talking about accumulation.
I thought about this before. Then I realized that for many cases,
unless you bring all the data to the front-end first, accumulation of just some of the data is meaningless.

It could be helpful for small sized dataset still, I just worry more about the confusion it might create.


#3

It’s my fault I didn’t develop too much on the proposal.

Take the following example: You have a query that returns 3 Columns: Month, City, Revenue. Each city has an entry per Month, which means that January will have as much entries as the number of cities.

Now, if you plot a Chart with bars with X values with the Month and Y Values with the Revenue, I should expect the option to SUM or AVG the values that appear on each Month. Instead, it looks as Redash is just picking up the first one or in some cases plotting an unpredictable Bar height and a non-corresponding legend.

The only workaround this is to do the calculation at the Query level, but now I need to do a Query for each Chart. If this was possible, I could instead run 1x Query for several visualizations.

I hope I explained it clear enough.


#4

Thanks for the explanation, I strongly agree with you.
I had the same problem, but I decided to work around this with query.

The visualization library (plotly) we’re using actually support this kind of aggregation.
I tried to fix once but I just could not manage to turn it on.


#5

Bump on this.

It’d be nice to have one query drive several visualizations (e.g. revenue by month and revenue by city) in the above example.


#6

+1 on this as well. Even more valuable if you can combine this with the group by functionality.

For example, imagine the above query but you also had revenue by product type. So the query returns month, city, product type, revenue.

You could generate two visualizations, one for revenue by city over time and another for revenue by product type over time.

Would vastly reduce total query count.


#7

This issue comes up from time to time, and I can see how it can be useful and save time. The main reason we never introduced such functionality was to protect the user from generating the wrong data. The example with revenues by city is a good case for this functionality. But allow me to offer a counter example:

SELECT created_at::date, action, count(distinct user_id)
FROM events
GROUP by 1, 2
ORDER by 1 DESC

The above query returns number of unique users per day / action type. Now most of you will know that you can’t sum the number of users per action to get the total number of unique users in each day (as the same user might do multiple actions), but your CMO? He might just remove the group by action from the chart settings to get this value.

Although I have to admit that a similar mistake can happen in the query as well. It just requires more effort to type it out. Maybe our approach should be less “protective” of the user?

Previous discussion on the same topic with @vladis:


#8

Mmm I see what you’re saying. I suppose it depends on a company’s analytics workflow.

Realistically at my org, myself or another technical contributor will be the only person manipulating visualizations. All other stakeholders simply get their data from dashboards that these contributors build. Therefore the risk of the CMO changing a dashboard is low.

That being said, I don’t know what the best approach is here. I know some other tools offer this functionality (mode, for example), but idk what the right answer is. The risk you pose is certainly a real one.