Hi re:dash team,

Our re:dash POC seems to be getting some traction, and our COO asked if it has Google DFP integration, because in his opinion that would be a killer feature. We checked and the answer is “no”, but we’d like to know whether that’s something likely to make it onto the roadmap, and what sort of development effort would be required to achieve it.

(There’s a Python library that connects to DFP that could be useful for this.)


I’m always happy to see additional data sources added to Redash :thumbsup:

I guess the easiest way to get an MVP for such a data source would be a JSON-based “query” that describes the data you want to fetch, plus code that translates it into calls to DFP’s API.

For reference you can check all the API based data sources, like JIRA (queries example) or Google Analytics (queries example).
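To make the JSON-query idea concrete, here is a minimal sketch of what such a runner could look like. Everything here is an illustrative assumption: the query keys (`dimensions`, `columns`, `date_range`), the stub DFP call, and the result shape are not an actual Redash or DFP schema, just the general pattern the API-based data sources follow (parse a JSON query, call the remote API, return columns and rows).

```python
import json

def parse_dfp_query(query_text):
    """Validate the JSON 'query' and return it as a dict.
    The required keys here are illustrative, not a real schema."""
    query = json.loads(query_text)
    for key in ("dimensions", "columns", "date_range"):
        if key not in query:
            raise ValueError("missing required key: %s" % key)
    return query

def to_result(column_names, rows):
    """Shape rows into a columns/rows dict of the kind a Redash
    query runner returns."""
    return {
        "columns": [{"name": c, "type": "string"} for c in column_names],
        "rows": [dict(zip(column_names, r)) for r in rows],
    }

# A hypothetical query, plus a canned row standing in for a real
# DFP Reporting API call (which would go through e.g. the googleads
# Python library).
example = json.dumps({
    "dimensions": ["DATE", "AD_UNIT_NAME"],
    "columns": ["TOTAL_IMPRESSIONS"],
    "date_range": "LAST_MONTH",
})
q = parse_dfp_query(example)
result = to_result(
    q["dimensions"] + q["columns"],
    [("2016-01-01", "homepage", "12345")],
)
```

The real work would be in the middle step: translating the parsed query into DFP report jobs and polling for results.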

I’ll be happy to brainstorm on how to implement this if you get around to doing it.

We have an integration of DFP into Redshift. It’s not direct, but the data can be pipelined into a supported data source. Some of the mechanics add complexity; for example, auditing to make sure that DC has in fact delivered the expected payloads. Happy to discuss further.


Before we go further, did you mean Redshift or Redash? :slight_smile:

I meant Redshift. :slight_smile:

With the data in Redshift you can easily point Redash at it as a data source. There is post-processing you will typically want to perform after the data comes out of DC, part of which happens in Redshift. For example, since DC provides no assurances of record uniqueness, you need a de-duplication step. Running part of that process in Redshift means a Redash user has the underlying data ready to go.
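The de-duplication step above would in practice be SQL running inside Redshift, but the idea can be sketched in a few lines of Python: DC may deliver the same record more than once, so we keep exactly one row per record id. The field names (`record_id`, `impressions`) are illustrative assumptions, not DC's actual schema.

```python
def dedupe(records, key="record_id"):
    """Keep one row per record id; a later duplicate delivery
    simply overwrites the earlier copy of the same record."""
    seen = {}
    for rec in records:
        seen[rec[key]] = rec
    return list(seen.values())

rows = [
    {"record_id": 1, "impressions": 10},
    {"record_id": 2, "impressions": 7},
    {"record_id": 1, "impressions": 10},  # duplicate delivery from DC
]
deduped = dedupe(rows)
```

In Redshift the equivalent would typically be a window-function or `GROUP BY` pass over the staging table before exposing the data to Redash users.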

I’m also not sure about performance on the Redash side in cases like a DC integration. For example, over the course of a month you could easily accumulate 200M+ rows of data, depending on your account activity. My understanding of the Redash model is that it does not act as a data store per se, but provides connectivity to a location that is well suited for that role. In your case the data from DC may be in the thousands of records, not millions or billions, which might be manageable. However, volume is a significant variable that would need to be handled at some level if Redash were to become part ETL tool and part warehouse for DC.

Lastly, I assume you are talking about the Reporting API for DC and not Data Transfer (https://developers.google.com/doubleclick-advertisers/dtv2/overview). Data Transfer is another animal altogether :slight_smile:

So in summary, you’re saying that we need to get the DFP data (yes, the Reporting API) into $some_database and then point Redash at that, because querying that much data over the API doesn’t suit Redash (and if not, when does or doesn’t a data source suit Redash)? I don’t know how much data it would be - I can ask our AdOps team.

Yes, exactly. Load the DC data into a DB, then point Redash at that DB.
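The shape of that pipeline can be sketched as two stages that live entirely outside Redash. Both functions here are stubs standing in for real clients (the DC Reporting API on one side, a Redshift `COPY`/`INSERT` on the other); the report and table names are hypothetical.

```python
def fetch_dc_report(report_name):
    """Stub: would call the DC Reporting API and return rows.
    The canned row stands in for a real report download."""
    return [("2016-01-01", "homepage", 12345)]

def load_into_warehouse(rows, table):
    """Stub: would load the rows into Redshift (e.g. via COPY
    from S3) and report how many were loaded."""
    return {"table": table, "loaded": len(rows)}

# Redash then connects to the warehouse as a normal SQL data
# source; no DC-specific code lives inside Redash itself.
status = load_into_warehouse(
    fetch_dc_report("impressions_daily"), "dc_impressions"
)
```

The point of the split is that Redash only ever sees a SQL database it already supports.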

In terms of when does a data source suit Redash, I would need to leave that to @arikfr to clarify.

My 2 cents: tools like Redash look for an abstraction between the database (e.g. Redshift, MySQL…) and how that data arrived in the DB. I don’t think Redash is focused on pipelining data from upstream sources and then storing it locally/internally, though I could be wrong. There are some cases, like Google Analytics, where that is not true. To me, however, the sweet spot is connecting Redash to a database where any ETL/ELT has already happened.

In general, I’m happy to merge any data source that someone finds useful. But I think data sources whose output needs further manipulation are less useful in Redash’s context.