Celery worker fails. Why?


#1

I am attempting to set up Redash so I can debug a new query_runner. Unfortunately, Celery is required, which complicates the process. When I click “execute” on a new query, I see the query being put on a queue, I see it being processed by my new query_runner, and the query runner returns a pair indicating success.

The UI reports “Error running query: failed communicating with server. Please check your Internet connection and try again.” (the celery worker stacktrace is below). It would be nice if the errors revealed some particulars that would point to what went wrong, rather than the code that went wrong.

Which “target machine actively refused” which request?

Thank you very much for this.

[2017-05-04 13:59:25,522: ERROR/MainProcess] Task redash.tasks.execute_query[478ee3c9-84eb-47ec-a2ff-e1955c8e4bb2] raised unexpected: error(error(10061, 'No connection could be made because the target machine actively refused it'),)
Traceback (most recent call last):
  File "C:\Users\kyle\code\redash\celery\app\trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "C:\Users\kyle\code\redash\celery\app\trace.py", line 438, in protected_call
    return self.run(*args, **kwargs)
  File "C:\Users\kyle\code\redash\redash\tasks\queries.py", line 496, in execute_query
    scheduled_query).run()
  File "C:\Users\kyle\code\redash\redash\tasks\queries.py", line 451, in run
    check_alerts_for_query.delay(query_id)
  File "C:\Users\kyle\code\redash\celery\app\task.py", line 453, in delay
    return self.apply_async(args, kwargs)
  File "C:\Users\kyle\code\redash\celery\app\task.py", line 565, in apply_async
    **dict(self._get_exec_options(), **options)
  File "C:\Users\kyle\code\redash\celery\app\base.py", line 354, in send_task
    reply_to=reply_to or self.oid, **options
  File "C:\Users\kyle\code\redash\celery\app\amqp.py", line 310, in publish_task
    **kwargs
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\connection.py", line 470, in ensured
    interval_max)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\connection.py", line 382, in ensure_connection
    interval_start, interval_step, interval_max, callback)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\utils\__init__.py", line 246, in retry_over_time
    return fun(*args, **kwargs)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\connection.py", line 250, in connect
    return self.connection
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\connection.py", line 756, in connection
    self._connection = self._establish_connection()
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\connection.py", line 711, in _establish_connection
    conn = self.transport.establish_connection()
  File "C:\Users\kyle\code\redash.env\lib\site-packages\kombu\transport\pyamqp.py", line 116, in establish_connection
    conn = self.Connection(**opts)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\amqp\connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\amqp\connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\amqp\transport.py", line 299, in create_transport
    return TCPTransport(host, connect_timeout)
  File "C:\Users\kyle\code\redash.env\lib\site-packages\amqp\transport.py", line 95, in __init__
    raise socket.error(last_err)
error: [Errno 10061] No connection could be made because the target machine actively refused it


#2

Based on the stacktrace it seems like Celery tries to use the amqp protocol for some reason. Because it does pick up the task from the queue, I think the results backend wasn’t configured properly for it.

Did you modify anything in the code aside from adding the new query runner? What version of Celery do you use?

Btw, you can easily test the query runner without Celery or even Redash itself – just create an instance of your query runner and pass a configuration dictionary to it.
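A minimal sketch of that kind of standalone test, using a stand-in class so it runs without Redash installed. `EchoQueryRunner`, its `url` setting, and the column layout are hypothetical; the `run_query(query, user)` method returning a `(data, error)` pair is an assumption based on the pair mentioned in the first post.

```python
import json


class EchoQueryRunner(object):
    """Hypothetical stand-in for a query runner (not Redash's real base class)."""

    def __init__(self, configuration):
        # The configuration dictionary would normally come from the data source settings.
        self.configuration = configuration

    def run_query(self, query, user):
        # Assumed contract: return (json_data, error), with error=None on success.
        if not query.strip():
            return None, "Query is empty"
        data = {
            "columns": [{"name": "query", "type": "string"}],
            "rows": [{"query": query}],
        }
        return json.dumps(data), None


# Instantiate the runner directly and call it; no Celery worker is involved.
runner = EchoQueryRunner({"url": "http://localhost:9999"})
json_data, error = runner.run_query("select 1", user=None)
print(error, json.loads(json_data)["rows"])
```

A real runner would replace the body of `run_query`; the point is only that the class can be exercised from a plain Python prompt.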


#3

Thank you! I have managed to determine that the code is attempting to contact

amqp://guest:**@127.0.0.1:5672//

I do not know what that means. What is that URL used for? Do I need an amqp server at that location?

On the subject of testing a query_runner on its own: that will not work. I already learned that the query passed to the query_runner is prefixed with /* some metadata */, which would have broken the datasource parsing. This confirms that full end-to-end testing is a good idea; I would expect the data my query_runner returns to break some assumptions in the return data pipeline (maybe that is the problem I see now?).

Here is Celery startup:

 -------------- celery@ekyle29792 v3.1.23 (Cipater)
---- **** ----- 
--- * ***  * -- Windows-10-10.0.14393
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         redash:0x596a870
- ** ---------- .> transport:   redis://localhost:6379/0
- ** ---------- .> results:     redis://localhost:6379/0
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- 
--- ***** ----- [queues]
 -------------- .> celery           exchange=celery(direct) key=celery
                .> queries          exchange=queries(direct) key=queries
                .> scheduled_queries exchange=scheduled_queries(direct) key=scheduled_queries

#4

Here is the version I am working with

SHA-1: 5ba6af6ad4bbb5e6be44044125d28e38fa155eef

  • Merge pull request #1713 from deecay/plotly-box
    Change: Box plot library from d3.js to Plotly.js

My changes are only scripts added to run on Windows, and I inserted more debug lines where I needed to know more: https://github.com/klahnakoski/redash/commits/windows


#5

Thank you! I have managed to determine that the code is attempting to contact
amqp://guest:**@127.0.0.1:5672//
I do not know what that means. What is that URL used for? Do I need an amqp server at that location?

Celery can use RabbitMQ/amqp instead of Redis, but we use Redis. I would’ve thought Celery somehow doesn’t pick up the correct configuration, but the Celery startup screen shows it uses Redis, so I’m not sure. :\ Maybe it’s some issue with Celery on Windows?
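To see what a broker URL like that points at, and whether anything is listening there, a quick check along these lines may help. It is only a sketch: `describe_broker` and `is_listening` are made-up helper names, and the URL is the one from the question.

```python
import socket

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def describe_broker(url):
    """Split a broker URL into the parts that matter for connecting."""
    parts = urlparse(url)
    return parts.scheme, parts.hostname, parts.port


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise
        # (10061 on Windows when the connection is refused).
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


scheme, host, port = describe_broker("amqp://guest:**@127.0.0.1:5672//")
print(scheme, host, port)
print("broker reachable:", is_listening(host, port))
```

The `amqp` scheme and port 5672 correspond to a RabbitMQ-style broker, so if nothing is running there, a refused connection is what one would expect.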

On the subject of testing a query_runner on its own: that will not work. I already learned that the query passed to the query_runner is prefixed with /* some metadata */, which would have broken the datasource parsing. This confirms that full end-to-end testing is a good idea;

You can disable the annotation like this: https://github.com/getredash/redash/blob/master/redash/query_runner/url.py#L18-L20
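In outline, the idea is that a runner overrides a class method to opt out of the metadata comment. The sketch below uses stand-in classes so it runs on its own; `BaseRunner`, `PlainRunner`, and `prepare_query` are hypothetical names, and the `annotate_query` hook is an assumption about what the linked lines contain, so check the link for the actual code.

```python
class BaseRunner(object):
    """Hypothetical stand-in for the query runner base class."""

    @classmethod
    def annotate_query(cls):
        # Assumed default: a metadata comment is prepended to each query.
        return True


class PlainRunner(BaseRunner):
    @classmethod
    def annotate_query(cls):
        # Opt out so the runner receives the query text unchanged.
        return False


def prepare_query(runner_cls, query, metadata):
    """Hypothetical helper imitating the step that prepends the comment."""
    if runner_cls.annotate_query():
        return "/* {} */ {}".format(metadata, query)
    return query


print(prepare_query(BaseRunner, "select 1", "some metadata"))
print(prepare_query(PlainRunner, "select 1", "some metadata"))
```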


#6

Perhaps more direct questions will help me understand: Are you familiar with the query lifecycle in Redash? Does the stack trace indicate that it is part of that query lifecycle, or is it some unimportant peripheral task? Given the stack trace, does it appear that Celery is attempting to return the query result?

Your response gives me some clue as to what the Celery worker might be doing: Celery loves to spawn processes, and it might be relying on Linux’s propensity to copy the parent process state, and therefore the Celery state, into the child. Maybe Celery has a way to turn off this process-spawning feature?


#7

I wonder how much work it is to mock Celery. I looked at billiard, the multiprocessing library, and I fear it has too many spawning side effects that Celery may depend on.


#8

After the query finishes running, Redash enqueues another Celery task to check alerts status (for alerts that use this query). At this point (when enqueueing the task) you get this exception.

It’s strange that Celery starts with the correct configuration but changes mid-flight.


#9

I think that mocking Celery will be too much work. Two alternatives:

  1. Use Docker or a VM to run Redash in Ubuntu.
  2. Invoke the execute_query task without Celery (you can just run the function directly).

#10

Thank you.

I decided against attempting to remote debug a multiprocess Redash/Ubuntu instance. I decided to mock Celery instead. I did not invoke execute_query directly because I still must do integration testing, which revealed some corner cases with respect to the query string (mentioned above), error handling, and the expected response format. I am now at the point where the frontend requests http://localhost:8080/api/query_results/73 and gets a response that matches what my query_runner provides.
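For anyone taking the same route, the core of such a mock can be small. This is a rough sketch, not the code from my branch: `EagerTask`, `EagerResult`, and `task` are hypothetical names, and they only imitate the `delay` and `apply_async` calls seen in the stack trace by running the function immediately in the current process, with no broker connection.

```python
class EagerResult(object):
    """Holds the return value of a task that has already run."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


class EagerTask(object):
    """Wraps a function so task-style calls execute synchronously."""

    def __init__(self, fun):
        self.fun = fun
        self.__name__ = fun.__name__

    def __call__(self, *args, **kwargs):
        return self.fun(*args, **kwargs)

    def apply_async(self, args=None, kwargs=None, **options):
        # Options such as queue names are accepted and ignored.
        return EagerResult(self.fun(*(args or ()), **(kwargs or {})))

    def delay(self, *args, **kwargs):
        return self.apply_async(args, kwargs)


def task(fun):
    """Hypothetical decorator standing in for a Celery task decorator."""
    return EagerTask(fun)


checked = []


@task
def check_alerts_for_query(query_id):
    # Placeholder body; the real task would inspect alerts for the query.
    checked.append(query_id)
    return len(checked)


result = check_alerts_for_query.delay(73)
print(result.get(), checked)
```

Because everything runs in one process, breakpoints in the query runner and in the follow-up task are hit in the same debugger session.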