Hello all :).

I’ve got a scheduled query that runs every minute and produces a single scalar value. I’ve created an alert based on that query with the following settings (see the sketch after the list for how the stored configuration can be inspected):

  • trigger when the value is < 65000
  • notifications are sent “just once, until back to normal”
  • slack channel set up as alert destination
  • custom description that includes the {{ALERT_STATUS}} and {{QUERY_RESULT_VALUE}} macros (for debugging purposes).
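
In case it’s useful for reproducing this, here’s a minimal sketch of how the stored alert configuration can be pulled straight from the REST API (the URL, API key, and alert id below are placeholders, and the exact response fields may differ by version):

```python
import json

import requests

REDASH_URL = "https://redash.example.com"  # placeholder: your instance URL
API_KEY = "<user api key>"                 # placeholder: a user API key
ALERT_ID = 123                             # placeholder: the alert's numeric id

resp = requests.get(
    f"{REDASH_URL}/api/alerts/{ALERT_ID}",
    headers={"Authorization": f"Key {API_KEY}"},
)
resp.raise_for_status()
alert = resp.json()

# "rearm" is the re-notify interval in seconds (empty for "just once"),
# "options" holds the threshold comparison and the custom template.
# These field names match what I see, but may differ by version.
print(json.dumps(
    {key: alert.get(key) for key in ("name", "state", "rearm", "options")},
    indent=2,
))
```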

I’m seeing two kinds of strange behavior:

  • Often (but, unfortunately for our debugging purposes, not always) the slack channel will receive a message every minute saying “{{ALERT_NAME}} went back to normal”, coupled with an “OK” status and a value that is clearly above the 65000 alert threshold. These messages arrive even when the alert was not recently in a “TRIGGERED” state. Put another way, the slack channel receives multiple “back to normal” messages for a single “triggered” message, if a “triggered” message appears at all.
  • This behavior persists if I tell the alert to send notifications “at most every 5 minutes”. The only difference is that the green notifications are sent every 5 minutes instead of every minute (the rate at which the underlying query refreshes).

Other troubleshooting steps I’ve tried:

  • I’ve debugged the underlying query to the point that I’m pretty confident it’s not returning values below the threshold for brief periods of time. The query computes a rolling average anyway, so that kind of behavior shouldn’t happen given what I know about the underlying timeseries it’s pulling from (see the sketch after this list for one way to check past results directly). Plus, wouldn’t I see a red “triggered” message in slack if the alert entered the “triggered” state?
  • I’ve tried creating a new query and alert with the same query text, refresh rate, and alert settings and the same problem exists.
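
For anyone else hitting the same thing, one way to double-check the “brief dips below the threshold” theory is to scan recent rows for the query in Redash’s own metadata database. This is only a sketch: the connection string, query id, and scalar column name are placeholders, and the query_results layout assumed here is what I see on my instances but may vary between versions:

```python
import json

import psycopg2

PG_DSN = "dbname=redash user=redash host=localhost"  # placeholder: metadata DB
QUERY_ID = 123          # placeholder: the scheduled query's id
VALUE_COLUMN = "value"  # placeholder: the column holding the scalar
THRESHOLD = 65000

conn = psycopg2.connect(PG_DSN)
with conn, conn.cursor() as cur:
    # Past results for the same query text share a query_hash and stick
    # around until the cleanup job removes them.
    cur.execute(
        """
        SELECT qr.retrieved_at, qr.data
        FROM query_results qr
        JOIN queries q ON q.query_hash = qr.query_hash
        WHERE q.id = %s
        ORDER BY qr.retrieved_at DESC
        LIMIT 60
        """,
        (QUERY_ID,),
    )
    for retrieved_at, data in cur.fetchall():
        payload = json.loads(data) if isinstance(data, str) else data
        for row in payload["rows"]:
            if row.get(VALUE_COLUMN, THRESHOLD) < THRESHOLD:
                print(f"below threshold at {retrieved_at}: {row}")
```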

Questions:

  • Any other troubleshooting steps I should try?
  • Any other ways I could configure this alert to stop this behavior?

Relevant platform information:

  • self-hosted redash
  • redash/preview:9.0.0-beta.b49483 image

For clarification, this setting “at most every 5 minutes” only affects the frequency of TRIGGERED messages. You will always receive a notification when the alert is first triggered, and you will always receive one notification when the value goes back to normal. You shouldn’t see multiple back-to-normal notifications, though, so this does look like a bug.
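
Roughly, the intended logic looks like this (a simplified sketch for illustration only, not the actual implementation; the names are invented):

```python
# Sketch of the intended behavior: notify on every state change, and
# re-notify a still-TRIGGERED alert only after the rearm interval elapses.
from datetime import datetime, timedelta


def should_notify(previous_state, new_state, rearm_seconds, last_notified_at, now=None):
    now = now or datetime.utcnow()
    if new_state != previous_state:
        # The first TRIGGERED message and the single "back to normal" message.
        return True
    if new_state == "triggered" and rearm_seconds:
        # "At most every 5 minutes" re-notification while still triggered.
        return now - last_notified_at >= timedelta(seconds=rearm_seconds)
    # Unchanged OK state: no notification at all.
    return False
```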

Does this reproduce using the V10 beta?

Thanks for following up!

For clarification, this setting “at most every 5 minutes” only affects the frequency of TRIGGERED messages.

Good to know. For reference, upon subsequent testing of the “just once, until back to normal” setting, we saw both TRIGGERED and OK messages delivered multiple times, each time the underlying query was evaluated.

Does this reproduce using the V10 beta?

Not sure, but I’m happy to try!


Very interesting. I wonder if the jobs aren’t being updated in redis, so the alert is picked up twice (or more).
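
If you want to poke at that, here’s a rough sketch for listing what the scheduler has sitting in Redis (the Redis URL is a placeholder, and the key names are the RQ / rq-scheduler defaults, so they may not match your deployment exactly):

```python
# Peek at the RQ-related keys in Redis to look for duplicated scheduled jobs.
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")  # placeholder: your Redis URL

# RQ queues and related bookkeeping keys all use the "rq:" prefix by default.
for key in sorted(r.scan_iter("rq:*")):
    print(key.decode())

# rq-scheduler keeps its scheduled jobs in a sorted set; a surprisingly
# large count here could hint at jobs being registered more than once.
print("scheduled jobs:", r.zcard("rq:scheduler:scheduled_jobs"))
```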

Very curious to hear the result of trying this with V10.

We upgraded to V10 and observed the same behavior (“triggered” and “ok” notifications deliver repeatedly every time the alert is evaluated instead of once when the state changes).

One thing I am wondering, though: we are actually running two redash instances. One is on V10 (as of today), and the other is on V8 (specifically redash/redash:8.0.0.b32245). They share the same backing postgres database. Do you think it’s possible that the two services are interfering with each other in some way?

Happy to try any other debugging strategies you think might be helpful.

Yes, absolutely: this would cause that kind of issue.

The postgres database contains all of Redash’s state. I’m actually surprised this works at all with the same backing database, as there are significant database schema differences between those versions :open_mouth:
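
One way to see that concretely (a sketch, assuming direct access to the metadata database; the connection string is a placeholder): check which Alembic migration revision the shared database is currently on. Only one of the two Redash versions can be in sync with it.

```python
# Print the current Alembic migration revision of the shared metadata DB.
import psycopg2

conn = psycopg2.connect("dbname=redash user=redash host=localhost")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("SELECT version_num FROM alembic_version")
    print("current migration revision:", cur.fetchone()[0])
```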

Try disabling the alert on one of the instances and see if the issue resolves.