Cohort Query Example

Cohorts are a good tool to review retention of users over a defined period of time.

To create your own cohort report, you’ll need to define your cohort time frame, population, and activeness of users.

Here is an example of a cohort query in PostgreSQL, step by step:

We used the WITH clause in this example - WITH clauses allow you to name a sub-query block, this way your query is modular (and sometimes runs faster), it can later be referred in the main query instead of making a large main query with many aliases, JOINs or other complexities. You can have multiple sub-queries, just be sure to have a comma between them.

  1. Select the time frame you want to investigate (usually a between a week and a month)

    WITH    
    time_frame AS (    
    SELECT CURRENT_DATE - 14   
    ),
    
  2. Define your population relative to the cohort date, for each following day

    population AS (
      SELECT created_at::date AS cohort_date, id AS unique_id
      FROM users
      WHERE created_at > (SELECT * FROM time_frame)
    ),
    
  3. Define what’s an active user to you - what event interest you to examine

    activity AS (
      SELECT created_at::date AS activity_date, org_id AS unique_id, cohort_date
      FROM events
      JOIN population ON population.unique_id = org_id
      WHERE created_at > (SELECT * FROM time_frame)
    ),
    
  4. Aggregate your population by cohort date (day 1, day 2…)

    population_agg AS (
      SELECT cohort_date, COUNT(distinct unique_id) AS total
      FROM population
      GROUP BY 1
    )
    
  5. Write your query to show your population % by cohort dates

    SELECT activity.cohort_date AS date,
           date_part('day',age(activity_date, activity.cohort_date)) AS day_number,
           COUNT(distinct unique_id) AS value,
           total
    FROM activity
    JOIN population_agg ON activity.cohort_date = population_agg.cohort_date
    GROUP BY 1, 2, 4
    
  6. Add a cohort visualization to your query and you’re done!

4 Likes

In point 5 SELECT activity.cohort_date AS DATE, you have to use alias DATE in lowercase otherwise it will not work

1 Like

It was probably missed because Postgres is case insensitive, but fixed now. :+1:

1 Like

about cohort visualization, it can only start from day one.how can i change the starting point to day 0

This only works for daily cohort, but not for weekly or monthly.
Why? Say that a user did activities in 2 days in a week.
Using this cohort and weekly is chosen, that user will be counted as 2 whereas that user should be counted as 1.

After all it depends on how you define the calculation of the cohort. And this cohort feature in redash can’t support my definition for weekly or monthly cohort.

Your query needs to return weekly or monthly data for this to work. The visualization won’t recalculate the values when switching to weekly/monthly cohort (for the exact reason you mentioned).

Yes, i just had an idea if the bucket field only accepts date data type. I’ll use date_trunc function for weekly and monthly respectively. Will try soon.
Thanks!

@arikfr I got to step 6, my query render’s correctly as a table till that point.

But I’m not sure what you mean/Implemnent step 6? I use metabase, not a redash user. But how did you implement step 6 and get that to display?

As how do I Add a cohort visualization to your query and you’re done! Can you share the Query for that?

Thanks,

Just wanted to note on this thread that we’ve added some documentation for Cohorts in Redash here.

Hi, I’m having trouble creating a monthly cohort chart. I copied the query and modified the variables based on my case and successfully created a daily cohort chart. When I changed it into a monthly timeframe, it is still displaying the cohort in a daily format.

In order to transform the data into a monthly one, I’d modify:

  1. created_at::DATEDATE_TRUNC('month', created_at)::DATE (in population CTE)
  2. created_at::DATEDATE_TRUNC('month', created_at)::DATE (in activity CTE)
  3. DATE_PART('day', age(activity_date, activity_.cohort_date))DATE_PART('month', age(activity_date, activity_.cohort_date)) (in the last clause)

Can you please give me an example on how you would create a monthly cohort based on the current case?

In Step 3, instead of created_at::date AS activity_date should be your activity date such as event_timestamp::date. Otherwise the age is always zero.

Hi Arik, can I get a link to the data source you used for this. getting the dataset will enable me to understand the code very well. Or at least a snapshot of the first 10rows with the columns

Hi arikfr, this query really help me, i did for monthly cohort. Thank You !! :smiling_face_with_three_hearts:

1 Like