Why am I suggesting this?

We at Hudl self-host the OSS version of Redash (v3). We consistently see a single bad query that returns a large data set use up the entire memory of the server, which affects other workers and queries. Celery v3.x does not support memory usage limits (the --max-memory-per-child flag), but Celery v4.x does. We would like to use this to isolate Celery workers and limit contention between them.

This issue exists in Redash v4 too, since it uses the same major version of Celery.

What needs to be done?

To limit resource contention between Celery workers, we can limit how much memory each worker uses. One way to do this is to upgrade Celery to v4.x. This is not a trivial change, however. Here’s what changed in Celery between v3 and v4: http://docs.celeryproject.org/en/master/whatsnew-4.0.html
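For reference, a minimal sketch of what the Celery 4 limit could look like in a settings module. The setting `worker_max_memory_per_child` is the configuration counterpart of the --max-memory-per-child flag and is expressed in kilobytes; the 200 MiB figure is only an illustrative assumption, not a recommendation for Redash.

```python
# Sketch of a Celery 4 settings module (for example a celeryconfig.py).
# The 200 MiB value is an illustrative assumption, not a tuned number.

# Maximum resident memory per pool worker process, in kilobytes (KiB).
# Celery replaces a child process after it exceeds this limit.
worker_max_memory_per_child = 200 * 1024  # 200 MiB
```

As far as I understand the Celery docs, a task that pushes a worker over the limit is still allowed to finish, and the worker is replaced afterwards. So this would stop one bad query from holding on to memory indefinitely, but it would not abort that query mid-run.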

There are changes to interfaces, routing, and Redis events, plus dropped support for a few old frameworks, among others. This is probably a large update to the backend and could be phased. The lazy upgrade is to update the Python package and any affected decorators and function signatures. This only needs to be done in a small number of places, but testing will be important.
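To make the function-signature point concrete: Celery 4 verifies task arguments against the task's signature when the task is called, and this check can be switched off per task with `typing=False` on the task decorator. The sketch below is a stdlib-only stand-in for that kind of check, not Celery's own implementation, and the function names are hypothetical. It shows why accepting `**kwargs` lets a task tolerate extra keyword arguments.

```python
import inspect


def check_call(func, *args, **kwargs):
    """Return True if func could be called with these arguments.

    Stdlib stand-in for a signature check done at call time; this is
    NOT Celery's own implementation.
    """
    try:
        inspect.signature(func).bind(*args, **kwargs)
        return True
    except TypeError:
        return False


# Hypothetical task bodies, named only for this illustration.
def strict_task(query_id):
    return query_id


def tolerant_task(query_id, **kwargs):
    return query_id


# An extra keyword argument fails the strict signature...
strict_ok = check_call(strict_task, 1, scheduled=True)
# ...but is absorbed once the function accepts **kwargs.
tolerant_ok = check_call(tolerant_task, 1, scheduled=True)
```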

Beyond that, there will be more changes to make, such as renaming settings and updating function signatures.

Next Steps

The main aim is to use this as a discussion thread. If there are ways to limit the impact of the upgrade, I’d like to hear them. If there are ways that don’t require upgrading Celery, or if the owners and community don’t want to upgrade Celery, I’d like to find other approaches.

I know parts of the codebase but not everything, so I’m looking for help identifying the list of changes to be made.

I’m all for upgrading Celery :+1: , but as you said, we need to do so carefully. I think that most of the changes don’t affect us, except for the renamed settings, which we will need to update.

The folks from Mozilla already use Celery 4 with Redash, so we can try and check what they did. Unfortunately I’m not sure if any of them are on the forum :man_shrugging:

The other concern is how an actual upgrade will work if the user still has jobs queued from Celery 3.x. Redash V4 already uses a version of Celery that should be forward compatible, but not all users will upgrade from V4. At the least, we should have clear instructions on what to do (probably clear Redis).

It would be good to hear about Mozilla’s experience. Do they maintain their own fork, then?

For upgrades, we could have a small script that converts the settings. Two other changes affect Redash. The first is adding typing=False to the task decorator so that Celery does not check calls against function signatures, although we might be able to do without it (I need to spend time reading up on how Celery does this). The second is adding **kwargs to the function parameters of all tasks.
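A minimal sketch of what such a conversion script could look like. The rename table is deliberately partial and covers only a few well-known Celery 3 to Celery 4 renames; the full list is in the "What's new" document, and `SOME_CUSTOM_FLAG` is a hypothetical key used only to show how unrecognised names are reported.

```python
# Partial map of Celery 3 setting names to their Celery 4 names.
# Assumption: only a handful of renames are listed here; the complete
# table should be taken from the Celery 4 "What's new" document.
OLD_TO_NEW = {
    "BROKER_URL": "broker_url",
    "CELERY_RESULT_BACKEND": "result_backend",
    "CELERY_TIMEZONE": "timezone",
    "CELERY_TASK_RESULT_EXPIRES": "result_expires",
    "CELERYBEAT_SCHEDULE": "beat_schedule",
    "CELERYD_MAX_TASKS_PER_CHILD": "worker_max_tasks_per_child",
}


def convert_settings(old_settings):
    """Split settings into (converted, unknown).

    converted holds values under their new names; unknown holds keys
    that are not in the rename table and need manual review.
    """
    converted, unknown = {}, {}
    for name, value in old_settings.items():
        if name in OLD_TO_NEW:
            converted[OLD_TO_NEW[name]] = value
        else:
            unknown[name] = value
    return converted, unknown


example = {
    "BROKER_URL": "redis://localhost:6379/0",
    "CELERY_TIMEZONE": "UTC",
    "SOME_CUSTOM_FLAG": True,  # hypothetical key, not a Celery setting
}
converted, unknown = convert_settings(example)
```

Celery 4 also ships a `celery upgrade settings` command that rewrites a settings file to the new names, so it may be worth checking whether that already covers our case before writing anything custom.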

I’ll start investigating this and document in this thread what changes we need in addition to the above.

Yes, their fork is here: github.com/mozilla/redash.

We can move the discussion to a GitHub issue, and then mention them to bring them to the discussion :slight_smile:

I added the GitHub issue at https://github.com/getredash/redash/issues/2517

From Mozilla’s fork, it looks like they have a commit for the upgrade: https://github.com/mozilla/redash/commit/de4a5a1419902953146a12efbea6e17776b6e5d6

Doesn’t look too bad.