I’m trying to connect my self-hosted Redash instance (on AWS EC2) to a MySQL Amazon RDS DB in a private VPC network behind a bastion (my RDS DB and the EC2 instance running Redash are in separate networks).
I’ve read the instructions here:
The part that was confusing is that it asks you to download the Redash public key and put it in the home of your bastion. For self-hosted Redash, should I generate my own private/public key pair, and then ssh into my EC2 server that is hosting Redash and copy that private key into the file under ssh_tunnel_auth here: Run queries through ad-hoc SSH tunnels by rauchy · Pull Request #4797 · getredash/redash · GitHub
What’s the best way to do this?
Then I take the corresponding public key I generated (as opposed to the Redash public key) and put it in the home of my user for the bastion?
The doc you linked is specific to customers of app.redash.io. For a self-hosted instance you need your own public/private key pair. Add the public key to the authorized keys (~/.ssh/authorized_keys) of the user you connect as on your bastion. Add the path to the private key to the Python file you linked. Then configure an ssh_tunnel object on the data source using the REST API.
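As a rough sketch of that last step, the snippet below builds an ssh_tunnel object and merges it into a data source's options. The key names (ssh_host, ssh_port, ssh_username) are my reading of the linked pull request and should be checked against your Redash version. The helper name, hostnames, and example options are all hypothetical, and how the resulting options are sent to the REST API is deliberately left out.

```python
import json


def with_ssh_tunnel(options, ssh_host, ssh_username, ssh_port=22):
    """Return a copy of a data source's options with an ssh_tunnel object added.

    Assumption: these key names match what the linked pull request reads.
    """
    updated = dict(options)  # shallow copy so the caller's dict is untouched
    updated["ssh_tunnel"] = {
        "ssh_host": ssh_host,
        "ssh_port": ssh_port,
        "ssh_username": ssh_username,
    }
    return updated


# Hypothetical MySQL data source options and bastion details.
mysql_options = {"host": "mydb.example.internal", "port": 3306, "db": "app"}
payload = with_ssh_tunnel(mysql_options, "bastion.example.com", "tunneluser")
print(json.dumps(payload, indent=2))
```

The host and port in the options stay pointed at the database as seen from the bastion, since the tunnel is opened from the bastion to that address.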
For a digital ocean droplet using Redash marketplace app, is it possible to modify the file redash/settings/dynamic_settings.py on the droplet itself? Where would that file be located? I’d prefer not to build my own Redash just to get the ssh tunnel feature. Thanks
Hey sorry for my late response here! Yes you can totally edit the file on digital ocean, although it is probably more hassle than it should be (this is an area for improvement). I’ve been using sed since that’s the only built-in utility within the container itself.
I’ll put together a guide of how to do it in the next couple days. Until then, you are welcome to message me directly through the forum.
I haven’t done this before, but I think only the worker containers really need the change. They are the containers that actually connect and run queries. The others (server, scheduler, nginx, redis, postgres) never communicate outside the local network.
Possibly a script that copies the python code into the container. Although it would be better to have that as a mapped volume on the container so it doesn’t get removed if the container is re-created. Even better, could the source be modified with a default key configured to a mapped volume? That’d be best I think.
You can of course modify anything on the image itself. Will need to consider how we can update the defaults going forward (we’re getting ready to build the V10 images so this is topical).
We won’t reopen that issue because we’re not going to make ssh tunnels the default behavior. But we could certainly use some documentation for setting one up. I’d love to review a PR adding those docs (along with many others).
That /opt/redash/overrides/ directory is something you’d need to manually create, then put the modified dynamic_settings.py in. The modified dynamic_settings.py has an updated ssh_tunnel_auth() function:
def ssh_tunnel_auth():
    """
    To enable data source connections via SSH tunnels, provide your SSH authentication
    pkey here. Return a string pointing at your **private** key's path (which will be used
    to extract the public key), or a `paramiko.pkey.PKey` instance holding your **public** key.
    """
    return {
        'ssh_pkey': '/keys/id_rsa'
    }
Note that the /keys/ directory there matches up with the /keys directory given in the volume clause above. So, the /keys/id_rsa file is really just an id_rsa file that needs to exist in your actual keys directory. The file needs to be readable by the ubuntu user inside the container too (uid 1000), which is probably easiest to do by chown-ing it. eg chown 1000: id_rsa.
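If it helps, here is a small sketch for checking the key file from the host before restarting the workers. The function name and the example path are hypothetical; the only thing taken from this thread is that the file must exist and be readable by uid 1000.

```python
import os
import stat


def check_key_file(path, expected_uid=1000):
    """Return a list of problems that would stop `expected_uid` reading the key."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]

    problems = []
    if not stat.S_ISREG(info.st_mode):
        problems.append(f"{path} is not a regular file")
    if info.st_uid != expected_uid:
        problems.append(
            f"{path} is owned by uid {info.st_uid}, expected {expected_uid}"
        )
    if not info.st_mode & stat.S_IRUSR:
        problems.append(f"{path} is not readable by its owner")
    return problems


# Hypothetical host-side path to the key that is mapped to /keys/id_rsa.
for problem in check_key_file("keys/id_rsa"):
    print(problem)
```

An empty list means the ownership and owner-read bit look right; it says nothing about whether the bastion accepts the key.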
As Jesse mentions above, only the scheduled_worker, adhoc_worker, and worker containers need the volume piece added, and they can all use the exact same keys.
There’s also another approach, using persistent SSH tunnels, which doesn’t need modified Python scripts. Instead, the SSH tunnel is set up externally to Redash, eg using a container to manage the tunnel.
Both ways seem to work fine, but have different strengths and weaknesses:
Redash managed SSH tunnel:
  - Slower to run queries, because the tunnel is created each time
  - For long running queries, this extra time isn’t really noticeable
  - Doesn’t really need separate monitoring

Persistent SSH tunnel:
  - Quick to run queries, as the tunnel already exists and is ready to go
  - Better for fast queries, where faster GUI responsiveness is noticeable
  - Needs separate monitoring
  - Each SSH tunnel needs manually setting up/configuring
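For the persistent option, one way to sketch the externally managed tunnel is to build a plain ssh port-forward command and have something (a container, a service manager) keep it running. The function name, hostnames, ports, and key path below are hypothetical; the ssh options used (-N, -i, -L, ExitOnForwardFailure, ServerAliveInterval) are standard OpenSSH ones.

```python
import shlex


def persistent_tunnel_command(bastion_host, bastion_user, key_path,
                              remote_host, remote_port, local_port,
                              bind_address="127.0.0.1"):
    """Build the argv for an ssh process that only forwards one port."""
    forward = f"{bind_address}:{local_port}:{remote_host}:{remote_port}"
    return [
        "ssh",
        "-N",                              # no remote command, tunnel only
        "-i", key_path,                    # private key to authenticate with
        "-o", "ExitOnForwardFailure=yes",  # exit if the forward cannot be set up
        "-o", "ServerAliveInterval=30",    # keepalives so a dead link is noticed
        "-L", forward,                     # local port -> database via bastion
        f"{bastion_user}@{bastion_host}",
    ]


# Hypothetical values; nothing is executed here, the command is only printed.
cmd = persistent_tunnel_command(
    "bastion.example.com", "tunneluser", "/keys/id_rsa",
    "mydb.example.internal", 3306, 13306,
)
print(shlex.join(cmd))
```

With this approach the Redash data source would point at the tunnel's local address and port rather than at the database directly, and the bind address has to be one the worker containers can reach.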
Beauty! Setting up the volume mapping looks great! I am not familiar enough with open source Redash but it appears the overrides folder is a mechanism built in to Redash to allow setting customization, right?
Almost. Docker (Redash uses it for management) allows sharing files and folders from the host server with its containers. So, in this case, it’s a way of both making SSH keys available to the worker containers and persistently replacing specific files inside them.
Without an override like that, people would need to build their own custom Redash docker images (possible, but a bunch of effort). Or they’d need to manually log into their Docker containers and update files inside them, which would lose the changes any time a container is rebuilt (this can be pretty often, depending on what’s happening).
I configured the volume mapping with the key and placed the public key on the ssh tunnel host in the same manner as the hosted Redash configuration. I’m getting an ssh negotiation error, most likely not picking up the key or something along those lines. Any idea how to troubleshoot on the self hosted redash with Docker implementation?
Hmmm, if you manually run SSH (using that key) from the host your Redash is on, does the connection succeed? eg:
$ ssh -i path_to_key someuser@your_bastion_host
Note that a simple ssh like above will try creating a remote login session for your user (eg in order to run commands remotely). That capability can be disabled on the bastion server, and isn’t needed for tunnels. So, it’s very possible you’ll connect successfully when testing, then ssh will just close the connection without further message.
The thing to look for is whether the attempted connection times out, generates an error, or something similar. A timeout or “No route to host” will generally mean there’s a network layer problem that needs fixing (maybe a firewall needs updating?), whereas other things are more obvious. eg if ssh prompts for acceptance of a host key, then it means the connection is getting to the server, and it might be a public key problem after all.
So, try the connection, and let us know what happens with it…
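If you want to separate the network-layer question from the key question, a minimal sketch like the one below only checks whether a TCP connection to the bastion's SSH port can be opened. The function name and the result labels are made up for illustration, and it does not speak SSH at all.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify a plain TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        return "timeout"       # often a firewall or security group dropping packets
    except ConnectionRefusedError:
        return "refused"       # host is up, but nothing is listening on that port
    except socket.gaierror:
        return "dns-failure"   # the hostname could not be resolved
    except OSError as exc:
        return f"error: {exc}"  # eg "No route to host"
```

Calling probe_tcp("your_bastion_host") from the Redash host and getting "reachable" only shows the network path is fine; authentication problems would still need the ssh test above.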