Using Redash to visualize the output of aggregation queries in Elasticsearch


#1

We’re super excited about using Redash to visualize data in Elasticsearch.

It looks from this pull request that this is now supported.

When I try it e.g. by running the following query:

{
    "index": "trials",
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "day": {
            "date_histogram": {
                "field": "failure_tstamp",
                "interval": "day",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

The data returned in the Redash UI is a list of underlying records rather than the aggregated data set. It’s actually the same response I get to the same query, without the aggregations key i.e.:

{
    "index": "trials",
    "query": {
        "match_all": {}
    }
}

I think the issue is that Redash is returning the contents of the hits object returned from ES rather than the aggregations object. Is there something I need to do in the query editor to get Redash to do that? Or a different way I need to express the query? I’ve tried specifying “key_as_string” and “doc_count” as arguments in the query:

{
    "index": "trials",
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "day": {
            "date_histogram": {
                "field": "failure_tstamp",
                "interval": "day",
                "format": "yyyy-MM-dd"
            }
        }
    },
    "fields": [
        "key_as_string",
        "doc_count"
    ]
}

But Redash returns:

Error running query: fields

What have I missed / got wrong?

Thank you!


#2

This pull request is part of version 0.12 which the hosted version isn’t running yet. I “back ported” this pull request to the hosted version, so you should be able to use aggregations now.


#3

Legend! Thanks so much @arikfr!


#4

A follow on question about visualizing the output of Elasticsearch aggregations…

With queries like the one above i.e. where you’re aggregating by day, the value for day shown in the Redash interface is the timestamp in ms rather than a nicely formatted date time. It looks like Redash is taking this value from the key field in the ES response rather than the key_as_string value.

So for example the following query in Redash against ES:

{
    "index": "trials",
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "day": {
            "date_histogram": {
                "field": "failure_tstamp",
                "interval": "day",
                "format": "yyyy-MM-dd"
            },
          "aggregations": {
            "bad_row_count": {
              "value_count": {"field": "line"}
            }
          }
        }
    }
}

returns this (downloaded as a CSV:

bad_row_count,day
7089944,1469750400000
6543625,1469836800000
2502341,1469923200000
7598353,1470009600000
12748166,1470096000000
13084995,1470182400000
12732980,1470268800000
12949006,1470355200000
5016354,1470441600000
222953,1470528000000
232749,1470614400000
236244,1470700800000
253458,1470787200000
99294,1470873600000
172617,1470960000000
232193,1471046400000
250381,1471132800000
267011,1471219200000

When the underlying ES response is:

{
	"took": 298,
	"timed_out": false,
	"_shards": {
		"total": 5,
		"successful": 3,
		"failed": 2,
		"failures": [{
			"index": "trials",
			"shard": 1,
			"status": 500,
			"reason": "RemoteTransportException[[Baron Brimstone][inet[/10.212.59.253:9300]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; "
		}, {
			"index": "trials",
			"shard": 2,
			"status": 500,
			"reason": "ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [line] would be larger than limit of [961137868/916.6mb]]; "
		}]
	},
	"hits": {
		"total": 1541495,
		"max_score": 0.0,
		"hits": []
	},
	"aggregations": {
		"day": {
			"buckets": [{
				"key_as_string": "2016-07-29",
				"key": 1469750400000,
				"doc_count": 134906,
				"bad_row_count": {
					"value": 7089944
				}
			}, {
				"key_as_string": "2016-07-30",
				"key": 1469836800000,
				"doc_count": 123364,
				"bad_row_count": {
					"value": 6543625
				}
			}, {
				"key_as_string": "2016-07-31",
				"key": 1469923200000,
				"doc_count": 44943,
				"bad_row_count": {
					"value": 2502341
				}
			}, {
				"key_as_string": "2016-08-01",
				"key": 1470009600000,
				"doc_count": 143754,
				"bad_row_count": {
					"value": 7598353
				}
			}, {
				"key_as_string": "2016-08-02",
				"key": 1470096000000,
				"doc_count": 242010,
				"bad_row_count": {
					"value": 12748166
				}
			}, {
				"key_as_string": "2016-08-03",
				"key": 1470182400000,
				"doc_count": 249542,
				"bad_row_count": {
					"value": 13084995
				}
			}, {
				"key_as_string": "2016-08-04",
				"key": 1470268800000,
				"doc_count": 244229,
				"bad_row_count": {
					"value": 12732980
				}
			}, {
				"key_as_string": "2016-08-05",
				"key": 1470355200000,
				"doc_count": 248030,
				"bad_row_count": {
					"value": 12949006
				}
			}, {
				"key_as_string": "2016-08-06",
				"key": 1470441600000,
				"doc_count": 94290,
				"bad_row_count": {
					"value": 5016354
				}
			}, {
				"key_as_string": "2016-08-07",
				"key": 1470528000000,
				"doc_count": 1660,
				"bad_row_count": {
					"value": 222953
				}
			}, {
				"key_as_string": "2016-08-08",
				"key": 1470614400000,
				"doc_count": 2027,
				"bad_row_count": {
					"value": 232749
				}
			}, {
				"key_as_string": "2016-08-09",
				"key": 1470700800000,
				"doc_count": 2146,
				"bad_row_count": {
					"value": 236244
				}
			}, {
				"key_as_string": "2016-08-10",
				"key": 1470787200000,
				"doc_count": 2281,
				"bad_row_count": {
					"value": 253458
				}
			}, {
				"key_as_string": "2016-08-11",
				"key": 1470873600000,
				"doc_count": 901,
				"bad_row_count": {
					"value": 99294
				}
			}, {
				"key_as_string": "2016-08-12",
				"key": 1470960000000,
				"doc_count": 1516,
				"bad_row_count": {
					"value": 172617
				}
			}, {
				"key_as_string": "2016-08-13",
				"key": 1471046400000,
				"doc_count": 1782,
				"bad_row_count": {
					"value": 232193
				}
			}, {
				"key_as_string": "2016-08-14",
				"key": 1471132800000,
				"doc_count": 1963,
				"bad_row_count": {
					"value": 250381
				}
			}, {
				"key_as_string": "2016-08-15",
				"key": 1471219200000,
				"doc_count": 2151,
				"bad_row_count": {
					"value": 267011
				}
			}]
		}
	}
}

It’s a bit odd that this is happening as the code. Should I raise an issue on Github? I wanted to post this here first in case it relates to the particular backport.

Thanks in advance!

Yali


#5

This is the code we’re using, so you’re right it’s odd :confused:

Can you send me in a private message the query id? I’ll try to run it with some debugging to understand what’s going on.


#6

Sure! Dumb question -but where do I get the query ID?


#7

It’s the numeric part of the URL (you can just send me the whole URL and I’ll figure it out).


#8

Hi @arikfr - know you’re very busy - did you get the chance to look at this?
Thanks!

Yali


#9

Pull request #1289 fixes this. It will be deployed soon to the hosted version (I’ll ping you in private).


#10

Thanks @arikfr - that’s working a treat!


#11

im tryin to implement aggregation. but Im getting only 10 records. and aggregation is not wokrin