묻고 답해요
141만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번with base as ( select user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) as event_first_week, DATE_TRUNC(event_date,WEEK(monday)) as event_week, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , diff_date_tbl as ( select user_pseudo_id, event_first_week, event_week, DATE_DIFF(event_week, event_first_week, WEEK) as diff_date from base order by diff_date ) , user_counts as ( select event_first_week, diff_date, count(distinct user_pseudo_id) as user_count from diff_date_tbl group by all order by diff_date ) select * , ROUND(SAFE_DIVIDE(user_count, first_user_count), 2) as retention_rate from ( select * , FIRST_VALUE(user_count) OVER (PARTITION BY event_first_week ORDER BY diff_date) as first_user_count from user_counts ) 2번with base as ( select user_pseudo_id, MIN(event_date) OVER (PARTITION BY user_pseudo_id) as event_first_day, event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , period as ( select user_pseudo_id, event_date, LAG(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) as before_event_date from base GROUP BY ALL ORDER BY user_pseudo_id ) -- 평균주기 구하기 SELECT AVG(diff_day) as avg_diff_period FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) -- 접속 주기 분포도 확인 SELECT diff_day, COUNT(distinct user_pseudo_id) as user_count FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) GROUP BY ALL ORDER BY diff_day 1) new 유저 = (최근월 - 1)월 = 첫접속일자(event_first_month) 2) current 유저 = (최근월 -1)월 , (최근월 -2)월 모두 활동 o 3) dormant 유저 = (최근월 - 1)월 , (최근월 -2)월 모두 활동 x 4) resurrected 유저 = (최근월 -2)월 활동 x, (최근월 -1)월 에는 활동 with base as ( select user_pseudo_id, event_date, DATE_TRUNC(event_date, MONTH) as event_month from advanced.app_logs order by 1 ) , min_max_month as ( SELECT *, MIN(event_month) over (partition by user_pseudo_id order by event_month) as first_month, MAX(DATE_TRUNC(event_date, MONTH)) OVER () as lastest_month from base ) , filter_month as ( SELECT user_pseudo_id, first_month, lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH) THEN 1 END) AS previous_1_lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 2 MONTH) THEN 1 END) AS previous_2_lastest_month FROM min_max_month group by ALL ) , user_classification as ( SELECT *, CASE WHEN first_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH ) THEN 'New' WHEN previous_1_lastest_month > 0 and previous_2_lastest_month >= 0 THEN 'Current' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month = 0 THEN 'Dormant' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month > 0 THEN 'Resurrected' ELSE 'Others' END AS user_category FROM filter_month ) select user_category, count(distinct user_pseudo_id) from user_classification group by all 4번WITH base AS ( SELECT user_pseudo_id , first_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as first_visit , last_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as last_visit , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), day) as event_date , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), week(monday)) as event_week , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), month) as event_month , event_name FROM advanced.app_logs ), click_payment AS ( SELECT DISTINCT user_pseudo_id , min(event_week) over(partition by user_pseudo_id) as first_week , event_week as week_date FROM base WHERE event_name = 'click_payment' ), diff_week AS ( SELECT first_week, date_diff(week_date, first_week, week) as week_diff, count(distinct user_pseudo_id) as users_cnt FROM click_payment GROUP BY ALL ) SELECT first_week , week_diff , users_cnt , safe_divide(users_cnt, first_value(users_cnt) over(partition by first_week order by week_diff)) as retention_rate FROM diff_week order by 1, 2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션 링크로 과제 업로드 합니다.과제 풀이 노션 링크 : [인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션 및 추가문제
과제풀이를 노션에 하여서 노션링크 업로드 하여 제출합니다노션링크: https://canyon-king-6a2.notion.site/3-13ce4e3151278004b5abe5e38e5268b8?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션
과제 풀이를 노션에 작성하여, 노션 링크 업로드 합니다.과제 풀이 노션 링크 : https://sapphire-lotus-719.notion.site/3-1-3-13571425e28d8056afe1c68dd7efaa6e?pvs=4
-
미해결실무자를 위한 구글애널리틱스(GA4+GTM) 활용법(24년 Update)
측정항목 비활성화
GA-탐색-유입경로 탐색 분석 시 측정항목이 비활성화되어있습니다.항목들을 활성화하려면 어떻게 해야 하나요?
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
✅ 연습문제 (1) weekly retention--1) weekly retention WITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` ) -- 일자 간 차이 구하기 , cal_week AS ( SELECT DISTINCT user_pseudo_id, -- 최초일(week 단위로) 구하기 DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 원본 날짜 데이터에서 주차 데이터로 구하기 DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ), cal_diff AS ( -- 주차데이터간 차이 구하기 SELECT *, DATE_DIFF(event_week, first_week,WEEK) AS diff_week FROM cal_week ), user_count AS ( -- 주차별 user 수 구하기 SELECT diff_week, COUNT(distinct user_pseudo_id) as user_cnt FROM cal_diff GROUP BY diff_week ) -- 출력 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt),2) AS retention_rate FROM ( SELECT diff_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_week) AS first_week_user_cnt FROM user_count ) 아직 익숙하지 않아서 중간중간 다시 코드 보고 작성했다. 점점 더 익숙해지길! (2) Retain UserWITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ) , cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date,DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 -- 0. 해당없음 -- 1. new : 첫번째 접속 --2. current : 최근 30일내에 접속 -- 3. resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 -- 4. dormant : 최근 60일 간 접속기록 없음 SELECT *, CASE -- 첫 번째 접속 => 'new' WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' -- 최근 30일 내에 접속한 사용자 => 'current' WHEN diff_date <= 30 THEN 'current' -- 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 => 'resurrected' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' -- 60일 이상 접속하지 않은 사용자 => 'dormant' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff GROUP BY all ORDER BY user_pseudo_id,diff_date ) SELECT * FROM user_type나름 user를 나눈다고 나눴는데, 잘 못 나눠진 느낌이 든다! 현업에는 어떻게 나눌지 궁금하다(3) 어떤 user가 많은가?WITH base AS ( -- 날짜 데이터 전처리 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date, DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 SELECT distinct user_pseudo_id, CASE WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' WHEN diff_date <= 30 THEN 'current' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff ), cal_cnt AS ( -- 각 type별 사용자 수 계산 SELECT type, COUNT(user_pseudo_id) AS user_cnt FROM user_type GROUP BY type ) SELECT type, user_cnt, ROUND(user_cnt * 100.0 / (SELECT COUNT(*) FROM user_type), 2) AS ratio FROM cal_cnt ORDER BY ratio DESC new인 사용자가 69.87 %로 전체의 대다수를 차지한다. 그다음이 최근에 7일 내에 접속한 유저가 21.83%, resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자가 2.59%, 최근 60일 간 접속기록 없는 사용자가 1.25%를 차지하였다.2022-11-03일 기준 "2022-08-01" 에서 "2022-11-03" 기간 동안 서비스를 이용하는 대다수의 사람들은 신규 가입자들이 많다는 것을 알 수 있다.또한, current User가 많은 것으로 보아, 최근 7일이내에 서비스를 사용한 사람들의 비율이 두번째로 높다.현재 신규이용자의 수가 대다수를 차지함으로 이 신규 사용자를 current user로 전환하기 위한 전략을 고안해야 할 것으로 보인다.none이 resurrected, dormant보다 많은데, type을 나누는 기준을 수정하면 더 효율적이게 나눌 수 있지 않을까 생각된다.(4) core event를 click_payment로 놓았을 때 Weekly retentionWITH base AS ( SELECT event_name, user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_time FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_name = "click_payment" ), week_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM week_diff GROUP BY diff ) SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt), 2) AS retention_rate FROM ( SELECT diff, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff ASC) AS first_week_user_cnt FROM user_counts )1주차 이후로 사용자의 수가 급감. 2주차 부터는 계속 0.01을 유지. 15주차부터 24주차까지는 0 retention rate를 유지.사용자가 1주차 이후로 대거 이탈함을 볼 수 있음. 15주차부터 0.01에서 0.0으로 감소. 1주차에 이벤트 혹은 서비스 런칭으로 사용자수가 급격히 증가하였으나, 서비스 경험이 사용자에게 매력적으로 다가오지 못하여 바로 그 다음주부터 급격하게 감소하는 것을 볼 수 있음.따라서, core_event에 대한 근본적인 개선이 필요할 것으로 보임. ✅ 배운점 및 느낀점 리텐션 분석에 대한 전반적인 지식을 얻을 수 있었다. 아직, 데이터에서 리텐션 분석을 하는 것이 익숙치 않아 학습을 더 해야 할 것 같다. user type을 나누는 부분에서 더 짜임새 있는? 기준이 뭔지, 현업에서는 어떤 기준으로 사용자를 나누는지 궁금하다!고객을 분석하기 위해서는 어떤 부분을 더 봐야하는 지 아직 크게 감이 없는데, 다른 레퍼런스를 보는 것이 필요할 것 같다는 생각이 들었다
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번 과제#weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2022-11-03' ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts WHERE diff_of_week = 0 ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 2번 과제New+Current+Resurrected+Dormant User로 나누기일단 1주일만에 리텐션이 6.8%대이니까, 신규유저의 기준을 7일로 두겠습니다.리텐션이 반으로 떨어지는 3% 미만이 8주부터, 1% 미만이 12주부터여서 임의로 구간을 세팅하겠습니다.이에 따라 미접속 8주차부터는 휴면 유저로 보겠습니다.따라서 8주 뒤 다시 들어온 유저는 복귀 유저로 보겠습니다. # 신규 유저, 기존 유저, 이탈 유저, 복귀 유저로 나눈 쿼리 # 첫 event 후 7일 안 지난 유저 # event 후 7일 지났고, 마지막 접속으로부터 8주가 안 지난 유저 # event 후 7일이 지났고, 마지막 접속으로부터 8주가 지난 유저 # 첫 event 후 7일이 지났고, 마지막 접속으로부터 8주가 지났다가, 다시 event가 발생한 유저 WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' ) , dates AS ( SELECT MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ) SELECT event_date, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) < 8 THEN user_pseudo_id ELSE NULL END) AS new_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) < 56 THEN user_pseudo_id ELSE NULL END) AS current_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) >= 56 THEN user_pseudo_id ELSE NULL END) AS resurrected_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 END) AS dormant_user FROM dates GROUP BY event_date 2번 과제는 이렇게까진 짜보았으나 실패했습니다..짜보다보니 last_date가 절대로 event_date보다 +가 나올 수가 없더군요...그래서 그냥 일반 날짜 컬럼을 붙일 수 있나...? 어떻게 해야하지? 하고이미 짜둔 쿼리에 매몰돼서 구하질 못했네요 ㅎㅎ혹시 이렇게 짜는 방향에 대해서 조언해주실 점이 있으시면 부탁드리겠습니다.. 3번 과제 아래와 같이 퍼널 나눴다. 획득screen_view, click_login 활성화click_banner, click_food_category, click_restaurant, click_food,view_recommend_extra_food, click_recommend_extra_foodclick_recommend_food, click_restaurant_nearbyclick_search, request_search 수익화 관련click_cartclick_payment 하지만 각 행동한 유저의 수를 구해보니 아래와 같았다.recommend 관련 행동을 한 유저가 오히려 더 적었다. 그래서 각 이벤트별로 유저 수를 구해보았고#event가 click_cart일 때 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_cart') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 4번 과제처럼 core event를 각각 설정해보았다.4번 과제의 결과인 click_payment 했던 유저의 리텐션이 역시 제일 좋았었다. 반면 cart를 클릭한 유저의 리텐션은 비교적 조오금 낮았고 반대로 click_search, click_recommend_food, click_restaurant_nearby를 한 유저의 리텐션이 구매자 리텐션과 유사했다.그래서 기본 행동들에서, 고객이 더 관심 기울여하는 행동인 검색, 추천 서비스 클릭하기 등의 행동을 유도하면 어떨까.. 싶었다! 사실 10% 이상으로 다 비슷하긴 하지만 약간의 차이라도 보이는 것들을 좀 추려보았다. 4번 과제# click_payment event를 경험한 유저들의 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_payment') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1)WeeklyRetention을구하는쿼리를바닥부터스스로작성해보세요WITH event_log_base AS ( SELECT DISTINCT DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), MONTH) AS event_month, user_id, user_pseudo_id, event_name, platform, event_params FROM advanced.app_logs ), user_first_visit AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week FROM event_log_base ), weekly_retention_base AS ( SELECT first_visit_week, DATE_DIFF(event_week, first_visit_week, WEEK) AS week_diff, COUNT(DISTINCT user_pseudo_id) AS active_users FROM user_first_visit GROUP BY first_visit_week, week_diff ) SELECT first_visit_week, week_diff, active_users, SAFE_DIVIDE(active_users, FIRST_VALUE(active_users) OVER (PARTITION BY first_visit_week ORDER BY week_diff ASC)) AS retention_rate FROM weekly_retention_base ORDER BY first_visit_week, week_diff; 2)RetainUser를New+Current+Resurrected+DormantUser로나누는쿼리를 작성해보세요.신규 사용자: 첫 방문 후 30일 이내에 활동한 사용자. 첫 방문 날짜와 마지막 방문 날짜 간의 차이가 30일 이하일 경우 현재 사용자: 첫 방문 후 30일이 지난 사용자 중, 최근 30일 이내에 접속한 사용자. 첫 방문과 마지막 방문 간 차이가 30일을 넘고, 가장 최근 활동으로부터 30일 이내에 접속한 경우 휴면 사용자: 마지막 방문 이후 30일 이상 비활동 상태에 있는 사용자. 마지막 방문이 가장 최근 이벤트로부터 30일 이상 경과한 경우 복귀 사용자: 30일 이상 비활동 상태였다가 다시 활동을 시작한 사용자. 이전 방문 이후 30일 동안 비활동이 이어지다 다시 방문한 경우 WITH event_data AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), -- 사용자별 첫 방문 및 마지막 방문 날짜 계산 user_first_last_visit AS ( SELECT user_pseudo_id, MIN(event_date) AS first_visit_date, MAX(event_date) AS last_visit_date FROM event_data GROUP BY user_pseudo_id ), -- 전체 이벤트 데이터에서 가장 최근의 이벤트 날짜를 계산 latest_event_date AS ( SELECT MAX(event_date) AS current_date FROM event_data ), -- 사용자의 이벤트 로그를 통해 비활동 기간을 기록하여 휴면 여부 판단 user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 30 THEN 1 ELSE 0 END AS dormant_history FROM event_data ), -- 사용자 유형 분류: 신규, 현재, 휴면, 복귀 사용자 user_category AS ( SELECT u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date, MAX(ua.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_visit_date, DAY) > 30 THEN 'inactive_user' WHEN DATE_DIFF(u.last_visit_date, u.first_visit_date, DAY) <= 30 THEN 'first_month_user' WHEN MAX(ua.dormant_history) = 1 THEN 'returning_user' ELSE 'active_user' END AS user_type FROM user_first_last_visit AS u CROSS JOIN latest_event_date AS led LEFT JOIN user_activity AS ua ON u.user_pseudo_id = ua.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date ), -- 사용자 유형별 첫 방문 주차와 현재 주차 간 차이 계산 user_weekly_retention AS ( SELECT uc.user_type, ed.user_pseudo_id, ed.event_date, DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK(MONDAY)), DATE_TRUNC(uc.first_visit_date, WEEK(MONDAY)), WEEK) AS week_difference FROM event_data AS ed JOIN user_category AS uc ON ed.user_pseudo_id = uc.user_pseudo_id ), -- 사용자 유형과 주차별 활성 사용자 수 집계 user_count_by_type_and_week AS ( SELECT user_type, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_weekly_retention GROUP BY user_type, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 현재 주차별 유지율 계산 retention_calculation AS ( SELECT user_type, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY user_type ORDER BY week_difference) AS initial_user_count FROM user_count_by_type_and_week ) SELECT user_type, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_calculation ORDER BY user_type, week_difference; 3)주어진데이터에서어떤사람들이리텐션이그나마높을까요?찾아보세요2번을 참고해서 고객을 4개의 타입으로 분류하였을 때first_month_user: 첫 주 이후 리텐션이 높지만 이후 감소하는 모습을 보인다. active_user: first_month_user와 비슷하게 첫 주 이후 빠르게 감소한다. returning_user: 2주부터 0.08로 떨어지지만 이후 완만하게 리텐션이 감소한다.오랜기간 10주차 이후에도 일부 사용자가 유지되고 있는 모습을 보인다. inactive_user: 리텐션이 다른 유저에 비해 높다고는 할 수 없지만 20주차까지 꾸준하게 소수의사용자가 유지되고 있는 모습을 보인다. --> 따라서 returning_use이 가장 완만하게 리텐션이 감소, 가장 오랫동안 리텐션이 지속했기때문에 가장 높은 리텐션을 유지했다고 판단하였습니다. 4)CoreEvent를“click_payment”라고설정하고WeeklyRetention을구해주세요WITH event_data AS ( SELECT DISTINCT user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, event_name FROM advanced.app_logs WHERE event_name = 'click_payment' -- Core event 필터링 ), -- 사용자별 첫 방문 주차와 현재 이벤트 주차 간의 차이 계산 user_first_week_data AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week AS current_week, DATE_DIFF(event_week, MIN(event_week) OVER (PARTITION BY user_pseudo_id), WEEK) AS week_difference FROM event_data ), -- 주차별 사용자 수 집계 user_count_by_week AS ( SELECT first_visit_week, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_first_week_data GROUP BY first_visit_week, week_difference ORDER BY first_visit_week, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 주차별 유지율 계산 retention_base AS ( SELECT first_visit_week, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY first_visit_week ORDER BY week_difference) AS initial_user_count FROM user_count_by_week ) -- 최종 리텐션 비율 계산 및 결과 출력 SELECT first_visit_week, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_base ORDER BY first_visit_week, week_difference;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
1) 리텐션 연습 문제https://torch-cart-08d.notion.site/3-139701964d8c805d9e49f3c68ab9b0b5 2) 3주차 추가 문제1번 문제(15분)주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요SELECT *, RANK() OVER(PARTITION BY category ORDER BY rate DESC) AS category_rank FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, list_price, ((list_price - actual_price) / list_price) * 100 AS rate FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id ) QUALIFY category_rank = 1 ORDER BY category, transaction_date 2번 문제(10분)2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요SELECT category, SUM(actual_price) AS total_price FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id WHERE EXTRACT(MONTH FROM PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) = 1 ) GROUP BY ALL ORDER BY total_price DESC LIMIT 1 3번 문제(10분)유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요over_200 AS ( SELECT td.user_id, SUM(actual_price) AS user_price FROM transaction_data AS td LEFT JOIN user_info AS ui ON td.user_id = ui.user_id GROUP BY ALL HAVING user_price >= 2000000 ) SELECT category, COUNT(*) AS cnt FROM over_200 AS O2 CROSS JOIN transaction_data AS td LEFT JOIN item_info AS ii ON td.item_id = ii.item_id GROUP BY ALL ORDER BY cnt DESC LIMIT 1약 20분/5분/8분 소요1번 문제에서 날짜 데이터를 변환하는 방법이 떠오르지 않아 시간이 많이 소요됐습니다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 과제
# 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요 WITH base AS ( SELECT DISTINCT user_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), first_diff_of_week AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_diff_of_week GROUP BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, first_cnt), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_cnt FROM user_counts ) ORDER BY diff_of_week# 2) Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요. # 신규유저(New) : 제품을 처음 사용하는 유저 # 기존유저(Current) : 제품을 지속적으로 사용하는 유저 # 복귀유저(Resurrected) : 과거에 사용 -> 비활성 -> 다시 제품을 사용한 유저 # 휴면유저(Dormant) : 일정 기간 제품을 사용하지 않은 비활성화 사용자 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ) SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type# 3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ), user_counts AS ( SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type ) SELECT event_week, DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week, user_type, user_count, cohort_users, ROUND(SAFE_DIVIDE(user_count, cohort_users), 3) AS retention_rate FROM ( SELECT DATE(event_week) AS event_week, DATE(MIN(event_week) OVER()) AS first_week, user_type, user_count, MAX(CASE WHEN user_type = 'New' THEN user_count END) OVER(PARTITION BY DATE(event_week)) AS cohort_users FROM user_counts ORDER BY event_week, CASE user_type WHEN 'New' THEN 1 WHEN 'Current' THEN 2 WHEN 'Resurrected' THEN 3 WHEN 'Dormant' THEN 4 END )2022-08-01 ~ 2023-08-03 기간으로 설정하여, 개인적인 의견으로 Retain User를 쪼개 보았습니다.배달어플은 유저들의 사용 간격이 짧다고 생각하기 때문에, 주의 주기로 리텐션으로 설정했습니다. 저는바로 전 주에 사용했고, 그 다음주에 사용했다면 기존유저(Current)로 설정했습니다.2주 이상 텀을 두고 다시 사용했다면 복귀유저(Resurrected)로 설정했습니다.처음 들어가고 사용하지 않았다면 휴면유저(Dormant)로 설정했습니다.주차마다 '신규/기존/복귀/휴면' 유저를 새로 계산하는 쿼리입니다. 데이터를 살펴보면,1) 신규유저(New) : 제품을 처음 사용하는 유저출시(2022-08-01을 출시기간으로 생각하면)로부터 현재(2023-01-16 => 0803으로 설정했지만 0116까지가 끝입니다.)까지의 신규유저 수의 변화를 살펴보겠습니다.런칭부터 10주까지의 신규유저의 수는 주춤한 경우도 있지만 평균 4~500명 정도 증가했습니다.하지만, 11주에 4000명 대에서 3100명으로 900명이 급감한 이후로 계속 하락세를 타고 24주차 현재 신규 가입자의 수는 500명 남짓하고 있습니다.11주차부터의 마케팅 효과는 빛을 보지 못하고 있는 것 같습니다. 코호트 리텐션 시각화를 보자면,2) 기존유저(Current) : 제품을 지속적으로 사용하는 유저(바로 전 주와 이번 주)출시로부터 현재까지 줄곧 주차마다의 신규유저 대비 기존유저의 비율이 증가하고 있습니다.특히, 24주차에 가입한 신규유저 대비 기존유저의 비율은 76%나 차지하고 있습니다.시간이 흐름에 따라 점차 어플이 안정화되면서 신규유저들이 만족을 느끼는 비율이 늘어가고 있다고 보입니다. 3) 복귀유저(Resurrected) : 과거에 사용 -> 비활성화 -> 다시 제품을 사용한 유저(2주 후 복귀)복귀유저는 12주차를 분기로 13주차부터는 신규가입자보다 복귀유저의 수가 많아지기 시작했습니다.24주차에는 신규유저 대비 복귀유저는 6배가 많습니다.복귀유저가 기존유저로 안정되는 현황을 추출할 수 있는 쿼리는 조금 더 연구해 보겠습니다. 4) 휴면유저(Dormant) : 일정기간 제품을 사용하지 않은 비활성화 사용자신규유저 대비 휴면유저는 꾸준히 늘어가고 있습니다(휴면유저가 누적됨).24주차 신규가입자의 5.7배가 휴면유저 입니다.# 4) Core Event를 'click_payment'라고 설정하고 Weekly Retention을 구해주세요 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' AND event_name = 'click_payment' ), first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, cohort_users), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS cohort_users FROM user_counts ) ORDER BY diff_of_week
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[1-9. 퍼널 SQL 쿼리 작성하기] 집계데이터 pivot쿼리
안녕하세요 카일스쿨님:)강의 마지막에 선택과제(?)로 주신 집계데이터 pivot쿼리 작성해봤습니다.쿼리를 작성하면서, 지금은 피봇할 컬럼이 많지 않아서 하나하나 피봇해주는 것으로 진행했는데 이게 최선일지가 의문이 들어서요,피봇하는 컬럼 조건이 다를 뿐 과정은 동일해서 공통으로 적용할 수 있는 방법이 있을 것 같은데 제가 잘못 이해하고 있는 것인지, 아니면 피봇을 항상 이런식으로 진행해야 하는 것인지도 같이 질문드립니다.감사합니다:) with base as ( select event_date ,event_timestamp ,event_name ,user_id ,user_pseudo_id ,max(if(param.key = 'firebase_screen',param.value.string_value,null)) as `firebase_screen` ,max(if(param.key = 'food_id',param.value.int_value,null)) as `food_id` ,max(if(param.key = 'session_id',param.value.string_value,null)) as `session_id` from advanced.app_logs , unnest(event_params) as param where 1=1 and event_name in ('screen_view','click_payment') and event_date >= '2022-08-01' and event_date <= '2022-08-18' group by all order by 1 ) , before_pivot as ( select event_date ,concat(event_name,'-',firebase_screen) as `event_name_with_screen` ,case when event_name = 'screen_view' and firebase_screen = 'welcome' then 1 when event_name = 'screen_view' and firebase_screen = 'home' then 2 when event_name = 'screen_view' and firebase_screen = 'food_category' then 3 when event_name = 'screen_view' and firebase_screen = 'restaurant' then 4 when event_name = 'screen_view' and firebase_screen = 'cart' then 5 when event_name = 'click_payment' and firebase_screen = 'cart' then 6 end as `step_number` ,count(*) as cnt from base where 1=1 group by 1,2,3 having step_number is not null ) select event_date ,max(if (event_name_with_screen = 'screen_view-welcome',cnt,null)) as `screen_view-welcome` ,max(if (event_name_with_screen = 'screen_view-home',cnt,null)) as `screen_view-home` ,max(if (event_name_with_screen = 'screen_view-food_category',cnt,null)) as `screen_view-food_category` ,max(if (event_name_with_screen = 'screen_view-restaurant',cnt,null)) as `screen_view-restaurant` ,max(if (event_name_with_screen = 'screen_view-cart',cnt,null)) as `screen_view-cart` ,max(if (event_name_with_screen = 'click_payment-cart',cnt,null)) as `click_payment-cart` from before_pivot group by all order by 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 리텐션 과제
노션 링크로 업로드 합니다!https://qkffpsxkdlwm.notion.site/3-13a35e3a8bef8019b7a6d9c6abc72193?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 분석
노션에서 작성했습니다! https://salt-baron-5c5.notion.site/3-13aa734e64b880a1a9a1e84a92fa3593
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 리텐션 과제
Notion 링크로 대체합니다.https://colney4844.notion.site/3-13a59b98d5db80438b7fcef0146771f6?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
1. Weekly RetentionWITH base AS ( SELECT DISTINCT DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), MONTH) AS event_month, user_id, user_pseudo_id, event_name, platform, event_params FROM advanced.app_logs ), user_visit_base AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week AS visit_week FROM base ), user_visit_weekdiff AS ( SELECT first_visit_week, DATE_DIFF(visit_week, first_visit_week, week) AS week_diff, COUNT(DISTINCT user_pseudo_id) AS visit_users FROM user_visit_base GROUP BY ALL ) SELECT first_visit_week, week_diff, visit_users, SAFE_DIVIDE(visit_users, FIRST_VALUE(visit_users) OVER (PARTITION BY first_visit_week ORDER BY week_diff ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) AS retention FROM user_visit_weekdiff 2. Retain User → New User, Current User, Resurrected User, Dormant User-- 한 달(30일)을 기준으로 구분하였다. -- 신규 유저: 최근 한 달 이내 새로 방문 -- 기존 유저: 최근 한 달 이내 재방문, 그 이전 한달에도 방문 -- 복귀 유저: 최근 한 달 이내 재방문, 그 이전 한달에는 방문 X -- 휴먼 유저: 최근 한 달 이내 재방문 X WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 30 THEN 1 ELSE 0 END AS dormant_history FROM base ), user_types AS ( SELECT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, MAX(us.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 30 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 30 THEN 'new_user' WHEN MAX(us.dormant_history) = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_date, u.last_date, led.current_date ), first_week_and_diff AS ( SELECT ut.user_type, fw.user_pseudo_id, fw.event_date, DATE_DIFF(DATE_TRUNC(fw.event_date, WEEK(MONDAY)), DATE_TRUNC(ut.first_date, WEEK(MONDAY)), WEEK) AS diff_of_week FROM base AS fw JOIN user_types AS ut ON fw.user_pseudo_id = ut.user_pseudo_id ), user_cnt_by_type_and_week AS ( SELECT user_type, diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY user_type, diff_of_week ), retention_base AS ( SELECT user_type, diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER (PARTITION BY user_type ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_by_type_and_week ) SELECT user_type, diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention_rate FROM retention_base ORDER BY user_type, diff_of_week 3. retention이 높은 그룹?current user: 최고 0.41 (4~5주차)new user: 최고 0.11 (1,3주차)resurrected user: 최고 0.14 (9~12주차)dormant user: 최고 0.09 (3주차)리텐션이 그나마 높은 그룹: current usercurrent user, resurrected user 리텐션 분포의 경우 상승세 → 하강세 경향성을 보여 피크를 찍을 때 즈음 어떤 이벤트가 있었다고도 추측할 수 있다. 4. click_payment 이벤트를 중점으로 본 Weekly Retention?WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE 1=1 AND event_name = "click_payment" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM ( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY day_of_week ) ) SELECT day_of_week, ROUND(SAFE_DIVIDE(user_cnt, total_user), 2) AS retention FROM retention_base ORDER BY day_of_week
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 과제
Notion으로 작성하여 링크를 첨부합니다.https://polite-vinyl-a61.notion.site/3-13a4994b207d804f969dc98e5bce9794
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 과제 제품 현황 분석 : 리텐션 분석
과제. 직접 retention 구해보기(주별 weekly)WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_week_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, event_date, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY ALL ORDER BY diff_of_week ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention FROM retain_base과제. 월별 리텐션 쿼리 작성해보기WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_month_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, -- DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week DATE_DIFF(event_month, first_month, MONTH) AS diff_of_month FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), MONTH) AS first_month, event_date, DATE_TRUNC(event_date, MONTH) AS event_month FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_month, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_month_and_diff GROUP BY ALL ORDER BY diff_of_month ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_month) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_month, user_cnt SAFE_DIVIDE(user_cnt, first_user_cnt) AS retention FROM retain_base 리텐션 연습문제Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요. # 답 -- Weekly Retention을 구하자! (Retention이란, 시간이 흐르면서 다시 제품을 사용하는지 측정하는 지표) -- 첫 방문 후 매주 재방문 비율(weekly retention)을 계산하자! WITH base AS ( SELECT DISTINCT ###### DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, ###### user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY ALL ) -- ORDER BY day_of_week ) SELECT day_of_week, SAFE_DIVIDE(user_cnt, total_user) AS retention FROM retention_base Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요# 답 -- 신규 유저: 첫 활동일 기준 7일 이내 -- 기존 유저: 첫 활동일 기준 7일 이후, 마지막 활동으로부터 90일 이내 -- 휴면 유저: 마지막 활동일 이후 90일 이상 경과 -- 복귀 유저: 90일 이상 비활성화 이후 다시 돌아온 유저 WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs --WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ),user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, #OVER(PARTITION BY user_pseudo_id)를 넣으면 윈도우함수사용과 GROUPBY졸 사용 비교했을 때처럼 중복행이 생긴다 MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( -- 원래라면 CURRENT_DATE()로 해야하지만, 앱로그 데이터의 마지막 날로 설정. SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( -- 90일 이상 제품을 사용하지 않았다면 1, 아니면 0 SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ) -- user_types AS ( SELECT DISTINCT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, us.dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN dormant_history = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ), user_types AS ( SELECT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, MAX(us.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN MAX(us.dormant_history) = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_date, u.last_date, led.current_date ), first_week_and_diff AS ( SELECT ut.user_type, fw.user_pseudo_id, fw.event_date, DATE_DIFF(DATE_TRUNC(fw.event_date, WEEK(MONDAY)), DATE_TRUNC(ut.first_date, WEEK(MONDAY)), WEEK) AS diff_of_week FROM base AS fw JOIN user_types AS ut ON fw.user_pseudo_id = ut.user_pseudo_id ), user_cnt_by_type_and_week AS ( SELECT user_type, diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY user_type, diff_of_week ), retention_base AS ( SELECT user_type, diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER (PARTITION BY user_type ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_by_type_and_week ) SELECT user_type, diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention_rate FROM retention_base ORDER BY user_type, diff_of_week current user의 1주차 리텐션은 0.11부터 시작해서 5~6주차에는 0.15까지 오릅니다.new user의 경우 1주차까지 밖에 리텐션을 구하지 못하고, 0.06입니다.resurrected user의 경우 1주차 리텐션은 0.04부터 시작해서 7주차에는 0으로 빠르게 감소합니다.dormant user의 경우 휴면고객의 리텐션을 구하는 것이 의미가 있는지 잘 모르겠지만, 1주차 리텐션은 0.05부터 시작해서 11주차까지 0으로 감소합니다.리텐션이 그나마 높은 유저는 current user이고, 리텐션이 주마다 서서히 증가하거나 감소하는데 2주차에 오른 것으로 보아 기능 배포나 이벤트가 있지 않았을까 예상해봅니다.Core Event를 “click_payment”라고 설정하고 Weekly Retention을 구해주세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_name = "click_payment" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM ( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY day_of_week ) ) SELECT day_of_week, ROUND(SAFE_DIVIDE(user_cnt, total_user), 2) AS retention FROM retention_base ORDER BY day_of_weekapp_logs 테이블에서 event_name이 click_payment인 것만 필터 걸어서 리텐션 구하기를 진행했는데 이렇게 푸는 것이 맞을까요..?
-
미해결마케터를 위한 구글 애널리틱스 실무
강의 PDF 자료 요청
안녕하세요, 강의 PDF 자료 요청드립니다! 감사합니다.tina2347@naver.com
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 윈도우 함수
--1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, group by를 사용해서 집곟나ㅡㄴ 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. select *, count(query_date) over(partition by user) as total_query_cnt from advanced.query_logs order by 1, 3 --2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이게 해주세요 with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, rank() over(partition by week_number, team order by query_cnt desc ) as rk from query_cnt_by_team qualify rk = 1 order by 1, 2, 4 desc -- 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, lag(query_cnt, 1) over(partition by user order by week_number) as prev_week_query_cnt from query_cnt_by_team -- 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. select *, sum(query_cnt) over (partition by user order by query_date rows between unbounded preceding and current row) as cumulative_sum from ( select query_date, user, count(user) as query_cnt from advanced.query_logs group by all ) order by 2, 1 -- 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없다면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) select * from raw_data2 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) --6. 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균) select * except(number_of_orders), avg(last_value_orders) over (order by date rows between 2 preceding and current row) as moving_avg from raw_data2 --7. app_logs 테이블에서 Custom session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요. session은 숫자로 (1, 2, 3..) 표시해도 됩니다. -- 2022-08-18의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 with base as( select event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" order by event_timestamp ), diff_data as ( select *, datetime_diff(event_datetime, prev_event_datetime, second) as second_diff from ( select *, lag(event_datetime, 1) over(partition by user_pseudo_id order by event_datetime) as prev_event_datetime from base order by event_datetime ) ) select *, sum(session_start) over(partition by user_pseudo_id order by event_datetime) as session_number from ( select *, case when prev_event_datetime is null then 1 when second_diff >= 20 then 1 else null end as session_start from diff_data )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 과제 윈도우 함수
윈도우 함수 연습문제 1select *, count(*) over(partition by user) as total_query_cnt from `advanced.query_logs`;윈도우 함수 연습문제 2select *, rank() over(partition by team, week_number order by query_cnt desc) as team_rank from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all ) qualify team_rank=1;윈도우 함수 연습문제3select *, lag(query_cnt) over(partition by user order by week_number asc) as prev_week_query_cnt from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all );윈도우 함수 연습문제4select *, sum(query_count) over(partition by user order by query_date asc) as cumulative_query_cnt from ( select user, query_date, team, count(*) as query_count from `advanced.query_logs` group by all );윈도우 함수 연습문제5SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders FROM raw_data;윈도우 함수 연습문제6select *, avg(number_of_orders) over(order by date asc rows between 2 preceding and current row) as moving_average from ( SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders, FROM raw_data ) ;윈도우 함수 연습문제7select *, sum(session_start) over(partition by user_pseudo_id order by event_timestamp asc) as session_id from ( select *, case when time_diff is null then 1 when time_diff >= 20 then 1 else null end as session_start from ( select *, cast((event_timestamp - before_event_timestamp)/1000000 as int) as time_diff from ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, lag(event_timestamp) over(partition by user_pseudo_id order by event_timestamp asc) as before_event_timestamp from `advanced.app_logs` where user_pseudo_id='1997494153.8491999091' and event_date='2022-08-18' ) ) );