Teradata data source support


#1

Redash is a very powerful tool and we’re already using it with Big Query for some use cases. Unfortunately we’re tied to a Teradata MPP Datawarehouse for highly sensible and critical data.

Teradata published a Python module (https://developer.teradata.com/tools/reference/teradata-python-module) last year. Since we’re already used to this module we made a first approach (https://gist.github.com/mrbungie/5fcc076e67e047c8a61de706403b1138) but it’s still in rough shape.

There are a few problems about our current implementation:

  • Teradata’s module uses ODBC, so there are a few dependencies (unixodbc and Teradata ODBC driver) that may add complexity to the entire project.
  • It also handles datatypes in such a way that we’re getting NaNs and Infinite values via json.dumps. That means invalid JSON. We made a workaround via a custom JSON Encoder, but is dirty. Another workaround would involve using a custom datatype handler.

Both Redash and Teradata are excellent tools by themselves but we think integrating them would make them even more powerful.

PS: Sorry about my english.


Teradata Connector
#2

Teradata query runner will be very welcomed. And yours looks OK.

As for the problems you mentioned:

  • Support for NaN: maybe we should add it to redash.utils.JSONEncoder as other data sources might return a NaN as well. Maybe instead of the is_finite check we can use math.isnan?
  • Some other data sources require custom packages as well (system or Python). That’s OK. If they are too complex to install (like in the case of Oracle) then we just not include them by default, and only document it. Although I’m thinking of splitting out the query runners into separate PyPi packages, to make it easier to manage their dependencies. We can start with the Teradata query runner as a first pilot, if you’re interested.

Btw, no need to apologize for your English - it’s great and anyway I’m not a native speaker myself.


#3

Thanks for your prompt response.

  • I think that’s an option, but Teradata may also return Decimal(‘Infinity’) or Decimal(’-Infinity’). That’s why we used is_finite, as it covers those three cases.
  • We would be glad to work on it, but we need some guidance over how it should be designed.

#4

OK, let’s keep is_finite then.

We would be glad to work on it, but we need some guidance over how it should be designed.

I’ll be happy to provide guidance. What you did so far looks good to me. You’re welcome to make a pull request (with an update to the documentation with instructions on what dependencies it needs).

If you have the time and patience, we can use the Teradata connector as a pilot for having the query runners in a separate project.


#5

@german.oviedo , Hey , I have tried to use your gist of td.py and have installed unixodbc and Teradata ODBC driver but still no success. Just wondering if you have got this running successfully.

Thanks


#6

Hi @neouber.

It’s a shame we couldn’t develop this any further, being time the main issue here. Anyways, I would say the main thing here is to test Teradata’s python module and try to use the gist once you’re sure pytd is working by itself.

Some insights about configuring Teradata ODBC in Linux.

  1. You don’t need unixodbc for using TD ODBC Driver. Yes I know I said it before, but we learned this some time after.
  2. You should have the following TD packages installed: tdicu, TeraGSS, tdodbc, cliv2, piom, sqlpp.
  3. Make sure you have your environment set correctly. At least you should have ODBCINST, ODBCINI, COPLIB, COPERR set to the ODBC directories. Also check you’re including Teradata’s lib/lib64 in LD_LIBRARY_PATH.
  4. Install teradata’s python module using pip: pip install teradata.
  5. Test the connection using pytd. Check this guide: https://developer.teradata.com/tools/reference/teradata-python-module

Now, remember some time has passed since we last tested this code. I don’t know if there’s been any changes in APIs or whatsoever that would’ve rendered my gist useless.


#7

Thanks @german.oviedo ,
I have got pytd up and running separately using the link http://crashthatch.tumblr.com/post/66957708538/teradata-odbc-connection-using-python-on-ubuntu , so from command line below commands works successfully

import teradata
udaExec = teradata.UdaExec (appName="HelloWorld", version="1.0",logConsole=False)
session = udaExec.connect(method="odbc", system="12.34.56.78" ,username="XXXX" , password="XXXX");
for row in session.execute("SELECT * from xyztable"):
    print(row)

For setup I have followed the below steps

  1. Created a new file as redash/query_runner/td.py and pasted the gist

  2. Added ‘redash.query_runner.td’ in settings.py

  3. Add teradata in requirements_all_ds.txt and I have manually installed teradata python module above as well.

4 . Then I tried to create a new connection from redash
But using the same credentials when I am trying to test connection , its saying Connection Failed . Also can’t see any useful error message even with DEBUG mode on redash side.(REDASH_LOG_LEVEL). This link has all the debug logs - http://pastebin.com/hfHxhfL0

Even though connection test was unsuccessful , i tried querying using the same connection and it gives me an error something like “TypeError: run_query() takes exactly 2 arguments (3 given)” , so it is expecting 2 params and got 3 in run_query function.
To fix this I have added 3rd param “user” as below in your gist code.

def run_query(self, query, user):

Still the connection test was unsuccessful with Connection Failed error but on Querying i have started getting the below error
“Error running query: ‘module’ object has no attribute ‘api’”

More detailed error –

[2017-01-19 10:57:07,653: WARNING/Worker-9] Unexpected error while running query:
Traceback (most recent call last):
  File "/opt/redash/current/redash/tasks/queries.py", line 410, in run
    data, error = query_runner.run_query(annotated_query, self.user)
  File "/opt/redash/current/redash/query_runner/td.py", line 99, in run_query
    except teradata.api.Error, e:
AttributeError: 'module' object has no attribute 'api'

Don’t know where to go from here now. Any help will be really appreciated.

Thanks


#8

I made some changes to my gist. Take a look and check whether it works https://gist.github.com/mrbungie/5fcc076e67e047c8a61de706403b1138


#9

Using your new gist code now . Still unsuccessful.
To debug the issue , I have removed try /except from run_query function and now its throwing me error as

[2017-01-19 16:15:56,343: WARNING/Worker-2] Unexpected error while running query:
Traceback (most recent call last):
  File "/opt/redash/current/redash/tasks/queries.py", line 410, in run
    data, error = query_runner.run_query(annotated_query, self.user)
  File "/opt/redash/current/redash/query_runner/td.py", line 101, in run_query
    udaExec = teradata.UdaExec(appName="redash", version="1.0",logConsole=False)
**AttributeError: 'module' object has no attribute 'UdaExec'**

For me it looks like that its not able to import teradata module and thats why it cannot understand attribute ‘UdaExec’.

Funny thing is I can run the below code from command line successfully but from redash it not able to import teradata module in td.py

import teradata
udaExec = teradata.UdaExec (appName="HelloWorld", version="1.0",logConsole=False)
session = udaExec.connect(method="odbc", system="12.34.56.78" ,username="XXXX" , password="XXXX");
for row in session.execute("SELECT * from xyztable"):
    print(row)

I have even mentioned below variables in env

  • ODBCINI=/usr/local/etc/odbc.ini
  • ODBCINST=/usr/local/etc/odbcinst.ini
  • NLSPATH=/opt/teradata/client/14.10/odbc_64/msg/tdodbc.cat
  • LD_LIBRARY_PATH=/opt/teradata/client/14.10/odbc_64/lib

Any way to debug this ?


#10

It’s redash running under the same user/environment/profile you’re running your “test” code?

I would check that, maybe you’re running Redash under a different python environment. What’s weird is that there are no import errors though.


#11

I am running all this in Docker , I am sshing into the redash docker container and then running the python script … But when using through web means Redash UI from the same container I am getting this error.
In my docker-compose.yml i have added below env variables

- ODBCINI=/usr/local/etc/odbc.ini
- ODBCINST=/usr/local/etc/odbcinst.ini
- NLSPATH=/opt/teradata/client/14.10/odbc_64/msg/tdodbc.cat
- LD_LIBRARY_PATH=/opt/teradata/client/14.10/odbc_64/lib

#12

And the myredash/v1:latest Dockerfile?


#13

Here you go … my docker-compose.yml

  redash:
    #  image: redash/redash:latest
      image: myredash/v1:latest
      ports:
        - "5000:5000"
      links:
        - redis
        - postgres
      environment:
        - REDASH_LOG_LEVEL=DEBUG
        - REDASH_REDIS_URL=redis://redis:6379/0
        - REDASH_DATABASE_URL=postgresql://postgres@postgres/postgres
        - REDASH_COOKIE_SECRET=yoyosecret
        - REDASH_NAME=TestDash
        - REDASH_MAIL_SERVER=relay.testmail.com
        - REDASH_MAIL_DEFAULT_SENDER=noreply@testmail.com
        - REDASH_HOST=http://192.168.99.100
        - ODBCINI=/usr/local/etc/odbc.ini
        - ODBCINST=/usr/local/etc/odbcinst.ini
        - NLSPATH=/opt/teradata/client/14.10/odbc_64/msg/tdodbc.cat
        - LD_LIBRARY_PATH=/opt/teradata/client/14.10/odbc_64/lib
    redis:
      image: redis:2.8
    postgres:
      image: postgres:9.3
    #  environment:
    #    - POSTGRES_PASSWORD=supersecret
      volumes:
        - /home/docker/postgres-data:/var/lib/postgresql/data
    redash-nginx:
      image: redash/nginx:latest
      ports:
        - "80:80"
      links:
        - redash

#14

Ohh sorry , I have pasted the docker-compose.yml instead of Dockerfile.

I haven’t created “myredash/v1” image from Dockerfile, what i did is this below —

  1. Started the official Redash Docker image

  2. SSH into that image

  3. Installed all the Teradata Drivers and odbc related files referring - http://crashthatch.tumblr.com/post/66957708538/teradata-odbc-connection-using-python-on-ubuntu

  4. added below variables in ~/.bashrc
    ODBCINI=/usr/local/etc/odbc.ini
    ODBCINST=/usr/local/etc/odbcinst.ini
    NLSPATH=/opt/teradata/client/14.10/odbc_64/msg/tdodbc.cat
    LD_LIBRARY_PATH=/opt/teradata/client/14.10/odbc_64/lib

  5. Commit the container and named it as “myredash/v1”

How did you installed your Teradata Drivers ? and yes I am also wondering that how come it is not throwing error while “import teradata” command


#15

That might explain the problem, when I used it I built a specific Dockerfile for Redash. Maybe as redash is running as a daemon, it can’t see new modules on runtime. It has happened to me in other cases, so it’s posible.

Also, I don’t remember doing as much as that guide, try the following:

  1. Download/clone this repo: https://github.com/getredash/redash
  2. Put td.py inside redash/query_runners
  3. Add ‘redash.query_runner.td’ in settings.py
  4. Add teradata in requirements_all_ds.txt.
  5. Download and replace the repo Dockerfile with it: https://gist.github.com/mrbungie/585d8ed1389b624efbf691dc134c366c
  6. Copy tdodbc__linux_imdep.14.10.00.00-1.tar inside the root
  7. Build it with: docker build -t redashtd:latest .

Tell me if it works.

P.S.: Check the tar command in the modified Dockerfile, just in case.

Edit: I just reread your comment and you commited with teradata’s files. That’s really weird then, but give the gist Dockerfile a try :wink:


#16

Hooooo … Finally got it up and running using the above steps.
I have done few more changes and below is my updated gist which works

Redash-Teradata Dockerfile

One thing which i am seeing is that when querying date or timestamps , its just returning null.
So below queries will return null

SELECT CURRENT_DATE
SELECT CURRENT_TIMESTAMP

Running them on command line using the below code is returning results but not in Redash. Other datatypes are coming fine.

import teradata
udaExec = teradata.UdaExec (appName="HelloWorld", version="1.0",logConsole=False)
session = udaExec.connect(method="odbc", system="12.34.56.78" ,username="XXXX" , password="XXXX");
for row in session.execute("SELECT CURRENT_TIMESTAMP"):
    print(row)

I suspect the culprit might be Line 116 in gist of td.py , as thats the place where retured data from cursor is getting manipulated

Let me know your thoughts.

Thanks a lot for your help and guidance till now. I really appreciate it


#17

@german.oviedo @arikfr … any thoughts on above issue of Dates/Timestamps coming as NULL.