Is this an aggregation bug in chart visualisation?


#1

Found what seems like a bug in chart visualisations.

Being new to Redash though, I’m not sure. :wink:

[Apologies in advance, this will likely be a bit screenshot heavy.]

So, I’ve created a chart visualisation using the brand new “Redash JSON” export added to our DBHub.io service earlier today: :wink:


(data source here, json export url here)

At first glance, it seems ok. Looks like the grouping/stacking has autoarranged things decently.

However, when hovering the mouse over the columns, the numbers displayed for any of the categories seem to be off. This one seems ok at first glance too, as “231” (top value in the hover) is in the ball park of what’s shown in the graph (blue “Plastic” entry).

But this is clearly not correct:

The hover has “70” for the Plastic entry, whereas it should be 300+ going by the graph value.

Zooming in on the values for the group containing 231 shows this:

Yeah… that’s not correct. The brown entry “Sanitary” shows ~100 in the graph, but the hover shows “1”.

Investigating by downloading the data as CSV from the Redash download link…

The values from the date range:

...
2012-10-08,Plastic,0
2012-10-08,Plastic,0
2012-10-08,Plastic,231
2012-10-08,Sanitary,0
2012-10-08,Sanitary,92
2012-10-08,Sanitary,0
2012-10-08,Sanitary,4
2012-10-08,Sanitary,1

So… it seems like the graph values are aggregated, however the hover values seem to be picking a singular value from the data set. Something like sanitary[0] instead of sum(sanitary).

Alternatively, maybe I just need to change some options in the chart definition or SQL query and it’s not a bug? :slight_smile:


#2

It’s not really a bug (nor a feature :wink: ). See previous discussion here:

Only thing I would like to add to the previous discussion is that at this point I’m inclining towards conceding to applying an aggregation in such cases… It seems that many people expect that, hopefully more people will benefit from this than shoot themselves in the foot :slight_smile:


#3

Super cool! I wonder if it makes sense to add a dedciated data source to Redash for DBHub.io and somehow query it directly?


#4

Thanks @arikfr. Will read and grok, though maybe tomorrow instead of today (getting tired atm).

It’d probably be more efficient than serialising JSON, throwing that over the wire, then deserialising it. :slight_smile:

The databases are already in SQLite format and directly available over the wire (for public ones), so it’d probably be something very similar to the existing url query approach. Just saving the SQLite database directly instead of needing to convert it from JSON first.

For private databases, we’d likely need to figure out some api/key/something approach. Haven’t even thought about that yet, so that could come later. :wink:


#5

Yeah, that would be in line with how Redash “automatically figures out” other things too. :slight_smile:

In the case of my query, manually changing it to this looks like it worked (note - only a rough visual confirmation so far, haven’t manually verified with sampling values yet):

SELECT substr(Date, 7, 4) || '-' || substr(Date, 4, 2) || '-' || substr(Date, 1, 2) as Date, Category, sum(distinct Number) as SumVal
FROM query_16
GROUP BY 1, 2;

For automatically applying an aggregation when multiple values for the same day+category occur, it’ll probably need to be a drop down selector somewhere appropriate. Maybe the Y-axis tab, bearing in mind my newness to Redash still. :wink:

The drop down would then have values such as:

  • Auto
  • Off
  • count()
  • sum()
  • min()
  • max()
  • avg()
  • user defined function?

If Redash includes Liam Healy’s Math function extensions for SQLite (source here, mirror here), the aggregate ones it includes may be useful for people too.

Also, not sure what approach to use for “Auto” either. Maybe count(), because if it’s the wrong choice then the values will be sooooo far from expected that the user should realise something still needs adjusting.