Joining two sources

Hello all,

I have a question. I have two data sources:

  • data from google analytics incl a ga_adwordsCampaignID
  • data from a PBX with call information incl the same id (now called Adgroup_id)

What i do now: i make two queries on the two sources (query 9 and 11). I then want to join them using another query:

select * from query_9 as q1830
join query_11 as qga on q1830.Adgroup_id = qga.ga_adwordsCampaignID and date(date) =date(ga_date)

This kind of does what i want, BUT sometimes i have id’s present in table 1 and not in table 2 and visa versa:

table1                 table2
id   col1    col2   id    col3    col 4
1    2         3    1     6       7
2    5         2
                    3     4       2

What i would like is the following result:


id   col1    col2     col3    col 4
1    2         3         6         7
2    5         2         0         0
3    0         0         4         2

Now i basically have data for every id and zeros for non existing ids in the corresponding query.
But the above query gives me:


id   col1    col2     col3    col 4
1    2         3         6         7

Any help would be greatly appreciated.

Welcome to the forum! This question is more to do with SQLite than Redash specifically. The QRDS data source uses SQLite.

The problem is you’re using JOIN syntax, which only retrieves rows where the ids match. This will discard any rows on either table if the id doesn’t appear on both tables.

In most other databases you would use the FULL OUTER JOIN syntax to include all the rows. But SQLite doesn’t support full outer joins. There is a workaround though. You’ll select all the data from both tables with a UNION statement, and then join both tables to this unioned table. There are a few ways to write it. I tried my best to make the query out of what you provided. Give this a try and see how it goes!


WITH all_ids as (
SELECT Adgroup_id id FROM query_9
UNION ALL
SELECT ga_adwordsCampaignID id FROM query_11
)

SELECT * FROM all_ids
LEFT JOIN query_9 as q1830 ON all_ids.id = q1830.Adgroup_id
LEFT JOIN query_11 as qga ON all_ids.id = qga.ga_adwordsCampaignID and date(q1830.date) = date(qga.ga_date)
ORDER BY id

Hello Jesse, impressively enough that query works out of the box, but what happens now is that i have a lot of duplicates like this:

date           numcalls       clicks
14/03/22 11:58  56             19
14/03/22 11:58  56             19
14/03/22 11:58  56             19
14/03/22 11:58  56             19
14/03/22 11:58  56             19
15/03/22 11:07  76             25
15/03/22 11:07  76             25

I wonder if this is coming from the union or from the LEFT JOIN?

UPDATE: Yes, it is definitely coming from the UNION. The problem is the ids are included many times in the GA data (multiple dates, etc). So all_ids contains ids like: 1,1,1,1,2,2,2,2,2,3,4,4,4,5,6,7,7,etc
I tried adding a second WITH distinct_ids as () but that resulted in a syntax error (it was a long shot anyway). So my question really is: how can i make sure ids are only included once?

Changing UNION ALL to just UNION will do it.