Need help with setting up Impala as a datasource

I have spent some time trying to set up Impala as a datasource, but I can’t figure out why it isn’t passing the connection test.

Host: hadoop01.domain.com
Port: 21000
Protocol: Hive Server 2
Database: default
Use LDAP: Unchecked
Username: kulink (this is a database read only service account)
Password: filled in
Timeout: 3600

Here is the error I am getting:
Connection Test Failed:
Metastore Error [Failed after retrying 3 times]

Can you please tell me how to resolve this issue and make it work?

Thanks!

Has anyone else experienced issues connecting? How did you resolve them? Thanks in advance.

I skipped the connection test and saved the datasource settings, then created a new query using the Impala datasource. Now I am getting this error:

"Error running query: Bad status: 3 (b'Unsupported mechanism type PLAIN')"

Maybe you have a secured cluster that only allows Kerberos (GSSAPI) authentication. If so, do you have a valid Kerberos ticket for the user running Redash?
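One way to check this, assuming MIT Kerberos's `klist` is installed on the Redash host, is to run `klist -s` as the same OS user that runs Redash: with `-s` it prints nothing and exits with status 0 only when the credentials cache can be read and has not expired. The helper below is a hypothetical sketch, not part of Redash.

```python
import shutil
import subprocess


def has_valid_kerberos_ticket():
    """Return True if the current OS user has a readable, unexpired ticket cache.

    Assumes MIT Kerberos `klist`, whose -s flag produces no output and reports
    the result through its exit status (0 = usable credentials).
    """
    if shutil.which("klist") is None:
        # klist is not available, so we cannot confirm a ticket; report False.
        return False
    result = subprocess.run(
        ["klist", "-s"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("valid ticket" if has_valid_kerberos_ticket() else "no valid ticket; run kinit")
```

If this reports no valid ticket, a ticket has to be obtained (for example with `kinit`) for that OS user before GSSAPI authentication can succeed.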

Thank you for responding. The admin team confirmed that we use Kerberos, and that the user I entered in the datasource setup form (the one I use to connect to the database) has valid Kerberos credentials.

In this case, you have to pass auth_mechanism=GSSAPI in the connection parameters, which, as far as I remember, is not possible out of the box. Besides Impyla, the SASL Python packages (installed with pip) are also required for this to work.
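A minimal sketch of such a connection with impyla's `impala.dbapi.connect`, assuming impyla and its SASL dependencies are installed and the OS user running the code already holds a Kerberos ticket. The helper names `gssapi_connect_kwargs` and `run_test_query` are hypothetical. Port 21050 is impyla's default (Impala's HiveServer2 port); adjust it for your cluster.

```python
def gssapi_connect_kwargs(host, port=21050, database="default", timeout=3600,
                          kerberos_service_name="impala"):
    """Build keyword arguments for impala.dbapi.connect using Kerberos."""
    return {
        "host": host,
        "port": port,
        "database": database,
        "timeout": timeout,
        # GSSAPI makes impyla authenticate with the Kerberos ticket via SASL
        # instead of sending a username and password (PLAIN/LDAP).
        "auth_mechanism": "GSSAPI",
        # Service principal name of the Impala daemons; "impala" is impyla's default.
        "kerberos_service_name": kerberos_service_name,
    }


def run_test_query(host):
    """Open a Kerberos connection and run a trivial query."""
    # Imported lazily so the helper above works even without impyla installed.
    from impala.dbapi import connect

    conn = connect(**gssapi_connect_kwargs(host))
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchall()
    finally:
        conn.close()
```

Note that no ldap_user or ldap_password is passed: with GSSAPI the identity comes from the ticket cache of the process, not from the form fields.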

If someone can add this to the Impala query runner, we’d be happy to merge it.

Here’s my patch for V10-beta. Should I send a PR instead?

--- redash/query_runner/impala_ds.py.orig	2019-12-11 13:51:21.000000000 +0100
+++ redash/query_runner/impala_ds.py	2021-08-24 08:55:59.632420809 +0200
@@ -46,16 +46,9 @@
                 "port": {
                     "type": "number"
                 },
-                "protocol": {
-                    "type": "string",
-                    "title": "Please specify beeswax or hiveserver2"
-                },
                 "database": {
                     "type": "string"
                 },
-                "use_ldap": {
-                    "type": "boolean"
-                },
                 "ldap_user": {
                     "type": "string"
                 },
@@ -64,10 +57,22 @@
                 },
                 "timeout": {
                     "type": "number"
-                }
+                },
+                "auth_mechanism": {
+                    "type": "string",
+                    "title": "Please specify one of the following: NOSASL, PLAIN, GSSAPI, LDAP"
+                },
+                "impersonation": {
+                    "type": "boolean"
+                },
+                "impersonation_domains": {
+                    "type": "string",
+                    "title": "Use localpart of the email addresses of these comma separated domains. You have to specify at least one to enable impersonation."
+                },
             },
             "required": ["host"],
-            "secret": ["ldap_password"]
+            "secret": ["ldap_password"],
+            "order": ['host', 'port', 'database', 'auth_mechanism', 'impersonation', 'impersonation_domains', 'ldap_user', 'ldap_password', 'timeout']
         }
 
     @classmethod
@@ -94,9 +99,15 @@
 
         connection = None
         try:
-            connection = connect(**self.configuration.to_dict())
+            configuration = self.configuration.to_dict()
+            connection = connect(**{k: v for k, v in configuration.items() if k not in ('impersonation', 'impersonation_domains')})
+            impersonation_domains = map(str.strip, (configuration.get('impersonation_domains') or '').split(','))
+            if configuration.get('impersonation') and user is not None and any(user.email.endswith('@'+domain) for domain in impersonation_domains):
+                cursor_configuration = {'impala.doas.user': user.email.split('@')[0] }
+            else:
+                cursor_configuration = None
 
-            cursor = connection.cursor()
+            cursor = connection.cursor(configuration=cursor_configuration)
 
             cursor.execute(query)
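The impersonation rule in the patch can be tried on its own. This is a hypothetical standalone helper (`doas_configuration` is not part of Redash or impyla) that reproduces the patch's logic: when impersonation is enabled and the user's email domain is in the comma-separated list, it returns `{'impala.doas.user': <local part>}` for `connection.cursor(configuration=...)`; otherwise it returns `None`. It reads the fields with `.get()` so that datasources saved before these fields existed do not raise `KeyError`.

```python
def doas_configuration(configuration, email):
    """Return the cursor configuration used for impersonation, or None."""
    # .get() keeps datasources saved before these fields existed working.
    if not configuration.get("impersonation") or not email:
        return None
    local_part, sep, domain = email.rpartition("@")
    if not sep:
        return None  # not an email address
    raw_domains = configuration.get("impersonation_domains") or ""
    allowed = {d.strip() for d in raw_domains.split(",") if d.strip()}
    if domain in allowed:
        # Ask Impala to run this session's queries as the email's local part.
        return {"impala.doas.user": local_part}
    return None


cfg = {"impersonation": True, "impersonation_domains": "example.com, corp.example.org"}
print(doas_configuration(cfg, "alice@example.com"))
```

For ordinary addresses with a single `@`, comparing the part after it to each listed domain is equivalent to the patch's `endswith('@' + domain)` check.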

Yes, please open a PR for this.