묻고 답해요
152만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
순위 정보를
불러오고 있어요
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
바짝 스터디 3주차 과제 제출
연습문제 1번select *, TRUNC(user_count / first_count, 3) AS ratio from ( select *, first_value(user_count) over(order by week_diff asc) as first_count from ( select week_diff, count(*) as user_count from ( select *, date_diff(datetime_week, first_week, week) as week_diff from ( select distinct event_date, user_pseudo_id, platform, datetime_week, -- event_name, first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week from ( select * except(firebase_screen,event_name), DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week, concat(firebase_screen,'_',event_name) as event_name from ( select * except(event_param), max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name, event_param from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date >= '2022-08-01' and event_date <= '2022-08-30' ) group by all ) ) ) ) group by week_diff ) ) order by week_diff ;연습문제2현재 시간에서 마지막 활동이 2주 이상 경과면 이탈마지막 활동이 현재 시간에서 1주 이내 인데 이전 활동과 갭이 2주이상이면 복귀마지막 action이 현재시간에서 1주내이고 이전 액션이 없다면 new나머진 activeselect user_pseudo_id, case when DATETIME_DIFF(now, datetime_week, WEEK) >= 2 then 'dormant' when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) < 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'active' when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) >= 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'resurrected' when DATETIME_DIFF(now, datetime_week, WEEK) < 2 and last_visit_week is null then 'new' else null end as status from ( select *, rank() over (partition by user_pseudo_id order by event_timestamp desc) as rank from ( select distinct event_date, user_pseudo_id, platform, datetime_week, event_timestamp, -- event_name, -- first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week lag(datetime_week) over(partition by user_pseudo_id order by event_timestamp) as last_visit_week, CAST('2022-08-31 00:00:00' AS DATETIME) as now, from ( select * except(firebase_screen,event_name), DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week, concat(firebase_screen,'_',event_name) as event_name from ( select * except(event_param), max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name, event_param from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date >= '2022-08-01' and event_date <= '2022-08-30' ) group by all ) ) ) qualify rank=1 ) ;연습문제3, 연습문제4쿼리는 동일하게 사용하고 event_name을 변경하면서 리텐션이 가장 높은 값을 추적click_restaurant의 리텐션이 가장 높아 이를 핵심 이벤트로 지정해야 하나 싶긴 한데 결과가 조금 이상해서 쿼리 점검 필요 해보임..아래 쿼리에서 event_name만 click_payment로 변경하면 click_payment의 weekly retentionselect *, TRUNC(user_count / first_count, 3) AS ratio from ( select *, first_value(user_count) over(order by week_diff asc) as first_count from ( select week_diff, count(*) as user_count from ( select *, date_diff(datetime_week, first_week, week) as week_diff from ( select distinct user_pseudo_id, datetime_week, -- event_name, first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week from ( select *, DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name from `advanced.app_logs` where event_date >= '2022-08-01' and event_date <= '2022-08-30' and event_name='click_restaurant' ) ) ) ) group by week_diff ) ) order by week_diff ;
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제 및 추가 코딩테스트 문제풀이
1) 3주차 추가 코딩테스트 대비 문제풀이1번문제 (제한시간: 15분 >> 소요시간 23분)문제: 주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT *, RANK() OVER (ORDER BY avg_sales_ratio DESC) as rnk FROM ( SELECT weekly, category, AVG(sales_ratio) AS avg_sales_ratio FROM ( SELECT DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)],'-',SPLIT(a.transaction_date, '/')[OFFSET(0)] , '-',SPLIT(a.transaction_date, '/')[OFFSET(1)])), WEEK(MONDAY)) AS weekly, a.item_id, b.category, b.list_price, a.actual_price, ROUND(1- ROUND(SAFE_DIVIDE(actual_price, list_price),4),4) AS sales_ratio FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id ) GROUP BY ALL ) QUALIFY rnk = 1 ORDER BY 1 ;2번문제 (제한시간: 10분 >> 소요시간 7분)문제: 2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT category FROM ( SELECT transaction_month, category, SUM(actual_price) AS category_sum_price FROM ( SELECT DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)],'-',SPLIT(a.transaction_date, '/')[OFFSET(0)] , '-',SPLIT(a.transaction_date, '/')[OFFSET(1)])), MONTH) AS transaction_month, a.item_id, b.category, a.actual_price FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id ) WHERE transaction_month = "2024-01-01" GROUP BY ALL ) ORDER BY category_sum_price DESC LIMIT 1 ;3번문제 (제한시간: 10분 >> 소요시간 6분 40초)문제: 유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT category FROM ( SELECT category, SUM(actual_price) AS sum_purchase_price FROM ( SELECT a.user_id, a.item_id, b.category, a.actual_price FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id WHERE a.user_id IN ( SELECT user_id FROM ( SELECT user_id, SUM(actual_price) AS sum_purchase_price FROM transaction_data a GROUP BY ALL HAVING SUM(actual_price) >= 2000000 ) ) ) GROUP BY ALL ) ORDER BY sum_purchase_price DESC LIMIT 1 2) 리텐션 연습문제 풀이해당 연습문제는 전부 해결하지 못해 최대한 기한 내에 풀이를 완료한 쿼리를 작성하게 되었습니다. 완료하지 못한 과제는 추후에라도 풀이 완료하여 수정 업로드 하겠습니다.문제 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요.WITH base AS ( # event데이터의 raw데이터를 추출 SELECT DISTINCT user_id, user_pseudo_id, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, event_name FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-12-31" ), first_week_diff_event_week AS ( # 유저별 첫번쨰 week, 이벤트 발생 week, 주차별 차이 데이터 추출 SELECT DISTINCT user_pseudo_id, first_week, event_week, DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week FROM ( SELECT user_pseudo_id, event_name, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ) ), user_counts AS ( # 첫번째 이벤트 발생주차 | 이벤트 Retain 발생차이 | 유저 수 | 코호트 유저 수 SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY weeks_after_first_week) AS cohort_user FROM ( SELECT weeks_after_first_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_diff_event_week GROUP BY ALL ) ) # Retention Rate 추출 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, cohort_user), 4) AS retention_rate FROM user_counts ORDER BY 1,2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 3주차과제] 리텐션 연습문제
노션링크 첨부합니다.https://focapa.notion.site/3-13c094a2b89980719923c79f9298fefd?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션으로 과제 제출합니다. https://www.notion.so/12735b8a6e338052a9cbf91cab8fdce2?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션으로 작성하였습니다.https://apple-baroness-590.notion.site/3-13aacf7d68f680179560f0ac848277f5?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
문제 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요 WITH base AS ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(MIN(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week FROM `advanced.app_logs` ), first_week_diff AS ( SELECT user_pseudo_id, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM base ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_count FROM first_week_diff GROUP BY diff_of_week ) SELECT diff_of_week, user_count, ROUND(SAFE_DIVIDE(user_count, FIRST_VALUE(user_count) OVER (ORDER BY diff_of_week ASC)), 2) AS retention_rate FROM user_counts ORDER BY diff_of_week; 문제 2) Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요. WITH event_data AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_visits AS ( SELECT user_pseudo_id, MIN(event_date) AS first_visit, MAX(event_date) AS last_visit -- 최근 방믄 FROM event_data GROUP BY user_pseudo_id ), -- 최신일자 current_date AS ( SELECT MAX(event_date) AS max_date FROM event_data ), user_types AS ( SELECT u.user_pseudo_id, u.first_visit, u.last_visit, c.max_date as current_date, DATE_DIFF(c.max_date, u.last_visit, DAY) as days_since_last_visit, CASE WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) <= 14 THEN 'current' -- 최근 2주 내 방문 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'dormant' -- 60일 이상 미접속 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 30 AND DATE_DIFF(c.max_date, u.last_visit, DAY) <= 44 AND EXISTS ( SELECT 1 FROM event_data e WHERE e.user_pseudo_id = u.user_pseudo_id AND DATE_DIFF(c.max_date, e.event_date, DAY) <= 14 ) THEN 'resurrected' -- 30일 이상 비활동 후 최근 14일 내 재접속 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'inactive' -- 60일 이상 비활동 ELSE 'none' END AS user_type FROM user_visits u CROSS JOIN current_date c ), weekly_retention AS ( SELECT ut.user_type, DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK), DATE_TRUNC(ut.first_visit, WEEK), WEEK) as week_number, COUNT(DISTINCT ed.user_pseudo_id) as user_count FROM event_data ed JOIN user_types ut ON ed.user_pseudo_id = ut.user_pseudo_id GROUP BY 1, 2 ), retention_rates AS ( SELECT user_type, week_number, user_count, FIRST_VALUE(user_count) OVER ( PARTITION BY user_type ORDER BY week_number ) as initial_users FROM weekly_retention ) SELECT user_type, week_number, user_count as active_users, initial_users as cohort_size, ROUND(user_count * 100.0 / initial_users, 2) as retention_rate FROM retention_rates ORDER BY user_type, week_number; 문제 3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요 - current의 경우 15주까지는 리텐션 유지 및 증가하며 큰 변동 없다가 그 후부터 지속적으로 하락하였다. - current는 그래도 리텐션율이 9.X로 시작했으나 dormant는 7.X로 시작하였고 더 빠르게 유입율이 감소하며 리텐션이 좋지 않았다. 문제 4) Core Event를 “click_payment”라고 설정하고 Weekly Retention을 구해주세요 WITH base AS ( SELECT user_pseudo_id, DATE_TRUNC(DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')), WEEK(MONDAY)) as week FROM advanced.app_logs WHERE event_name = "click_payment" ), cal_week AS ( SELECT user_pseudo_id, week as visit_week, MIN(week) OVER (PARTITION BY user_pseudo_id) as first_week FROM base ), weekly_retention AS ( SELECT first_week DATE_DIFF(visit_week, first_week, WEEK) as week_number, COUNT(DISTINCT user_pseudo_id) as user_count FROM cal_week GROUP BY first_week, DATE_DIFF(visit_week, first_week, WEEK) ), cohort_sizes AS ( SELECT first_week, MAX(CASE WHEN week_number = 0 THEN user_count END) as initial_users FROM weekly_retention GROUP BY cohort_week ) SELECT wr.first_week, cs.initial_users as total_users, wr.week_number, wr.user_count as active_users, ROUND(wr.user_count * 100.0 / cs.initial_users, 2) as retention_rate FROM weekly_retention wr JOIN cohort_sizes cs ON wr.cohort_week = cs.cohort_week WHERE wr.week_number >= 0 ORDER BY wr.cohort_week, wr.week_number;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
이번 주 과제는 아래 링크로 대체합니다. 참고 부탁드립니다!https://hajekim.notion.site/3-13bfd22d19d6806eb216c13ea620d6cb?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빅쿼리 빠짝스터디 3주차 과제]
스터디 과제 노션 링크로 제출합니다!https://www.notion.so/3-13d9f4b04d53806db52bfb5295fd0215?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 3주차 과제] 리텐션 과제
노션으로 작성하여 링크 첨부합니다.https://sapphire-spade-aa6.notion.site/3-13d5677c37548059a160fd6fc201b6c4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 과제 - 리텐션
노션으로 과제 진행하여 링크 복사합니다!! 웹 게시로 수정했습니다 :) https://jypack788.notion.site/3-14-888ecbcf244f42dc9a4e5640fe0fa8dd
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
노션 링크 : https://www.notion.so/3-13a3dc9851a680cda683e39c64a8dc91?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번with base as ( select user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) as event_first_week, DATE_TRUNC(event_date,WEEK(monday)) as event_week, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , diff_date_tbl as ( select user_pseudo_id, event_first_week, event_week, DATE_DIFF(event_week, event_first_week, WEEK) as diff_date from base order by diff_date ) , user_counts as ( select event_first_week, diff_date, count(distinct user_pseudo_id) as user_count from diff_date_tbl group by all order by diff_date ) select * , ROUND(SAFE_DIVIDE(user_count, first_user_count), 2) as retention_rate from ( select * , FIRST_VALUE(user_count) OVER (PARTITION BY event_first_week ORDER BY diff_date) as first_user_count from user_counts ) 2번with base as ( select user_pseudo_id, MIN(event_date) OVER (PARTITION BY user_pseudo_id) as event_first_day, event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , period as ( select user_pseudo_id, event_date, LAG(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) as before_event_date from base GROUP BY ALL ORDER BY user_pseudo_id ) -- 평균주기 구하기 SELECT AVG(diff_day) as avg_diff_period FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) -- 접속 주기 분포도 확인 SELECT diff_day, COUNT(distinct user_pseudo_id) as user_count FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) GROUP BY ALL ORDER BY diff_day 1) new 유저 = (최근월 - 1)월 = 첫접속일자(event_first_month) 2) current 유저 = (최근월 -1)월 , (최근월 -2)월 모두 활동 o 3) dormant 유저 = (최근월 - 1)월 , (최근월 -2)월 모두 활동 x 4) resurrected 유저 = (최근월 -2)월 활동 x, (최근월 -1)월 에는 활동 with base as ( select user_pseudo_id, event_date, DATE_TRUNC(event_date, MONTH) as event_month from advanced.app_logs order by 1 ) , min_max_month as ( SELECT *, MIN(event_month) over (partition by user_pseudo_id order by event_month) as first_month, MAX(DATE_TRUNC(event_date, MONTH)) OVER () as lastest_month from base ) , filter_month as ( SELECT user_pseudo_id, first_month, lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH) THEN 1 END) AS previous_1_lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 2 MONTH) THEN 1 END) AS previous_2_lastest_month FROM min_max_month group by ALL ) , user_classification as ( SELECT *, CASE WHEN first_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH ) THEN 'New' WHEN previous_1_lastest_month > 0 and previous_2_lastest_month >= 0 THEN 'Current' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month = 0 THEN 'Dormant' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month > 0 THEN 'Resurrected' ELSE 'Others' END AS user_category FROM filter_month ) select user_category, count(distinct user_pseudo_id) from user_classification group by all 4번WITH base AS ( SELECT user_pseudo_id , first_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as first_visit , last_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as last_visit , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), day) as event_date , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), week(monday)) as event_week , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), month) as event_month , event_name FROM advanced.app_logs ), click_payment AS ( SELECT DISTINCT user_pseudo_id , min(event_week) over(partition by user_pseudo_id) as first_week , event_week as week_date FROM base WHERE event_name = 'click_payment' ), diff_week AS ( SELECT first_week, date_diff(week_date, first_week, week) as week_diff, count(distinct user_pseudo_id) as users_cnt FROM click_payment GROUP BY ALL ) SELECT first_week , week_diff , users_cnt , safe_divide(users_cnt, first_value(users_cnt) over(partition by first_week order by week_diff)) as retention_rate FROM diff_week order by 1, 2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션 링크로 과제 업로드 합니다.과제 풀이 노션 링크 : [인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션 및 추가문제
과제풀이를 노션에 하여서 노션링크 업로드 하여 제출합니다노션링크: https://canyon-king-6a2.notion.site/3-13ce4e3151278004b5abe5e38e5268b8?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션
과제 풀이를 노션에 작성하여, 노션 링크 업로드 합니다.과제 풀이 노션 링크 : https://sapphire-lotus-719.notion.site/3-1-3-13571425e28d8056afe1c68dd7efaa6e?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
✅ 연습문제 (1) weekly retention--1) weekly retention WITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` ) -- 일자 간 차이 구하기 , cal_week AS ( SELECT DISTINCT user_pseudo_id, -- 최초일(week 단위로) 구하기 DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 원본 날짜 데이터에서 주차 데이터로 구하기 DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ), cal_diff AS ( -- 주차데이터간 차이 구하기 SELECT *, DATE_DIFF(event_week, first_week,WEEK) AS diff_week FROM cal_week ), user_count AS ( -- 주차별 user 수 구하기 SELECT diff_week, COUNT(distinct user_pseudo_id) as user_cnt FROM cal_diff GROUP BY diff_week ) -- 출력 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt),2) AS retention_rate FROM ( SELECT diff_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_week) AS first_week_user_cnt FROM user_count ) 아직 익숙하지 않아서 중간중간 다시 코드 보고 작성했다. 점점 더 익숙해지길! (2) Retain UserWITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ) , cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date,DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 -- 0. 해당없음 -- 1. new : 첫번째 접속 --2. current : 최근 30일내에 접속 -- 3. resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 -- 4. dormant : 최근 60일 간 접속기록 없음 SELECT *, CASE -- 첫 번째 접속 => 'new' WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' -- 최근 30일 내에 접속한 사용자 => 'current' WHEN diff_date <= 30 THEN 'current' -- 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 => 'resurrected' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' -- 60일 이상 접속하지 않은 사용자 => 'dormant' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff GROUP BY all ORDER BY user_pseudo_id,diff_date ) SELECT * FROM user_type나름 user를 나눈다고 나눴는데, 잘 못 나눠진 느낌이 든다! 현업에는 어떻게 나눌지 궁금하다(3) 어떤 user가 많은가?WITH base AS ( -- 날짜 데이터 전처리 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date, DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 SELECT distinct user_pseudo_id, CASE WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' WHEN diff_date <= 30 THEN 'current' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff ), cal_cnt AS ( -- 각 type별 사용자 수 계산 SELECT type, COUNT(user_pseudo_id) AS user_cnt FROM user_type GROUP BY type ) SELECT type, user_cnt, ROUND(user_cnt * 100.0 / (SELECT COUNT(*) FROM user_type), 2) AS ratio FROM cal_cnt ORDER BY ratio DESC new인 사용자가 69.87 %로 전체의 대다수를 차지한다. 그다음이 최근에 7일 내에 접속한 유저가 21.83%, resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자가 2.59%, 최근 60일 간 접속기록 없는 사용자가 1.25%를 차지하였다.2022-11-03일 기준 "2022-08-01" 에서 "2022-11-03" 기간 동안 서비스를 이용하는 대다수의 사람들은 신규 가입자들이 많다는 것을 알 수 있다.또한, current User가 많은 것으로 보아, 최근 7일이내에 서비스를 사용한 사람들의 비율이 두번째로 높다.현재 신규이용자의 수가 대다수를 차지함으로 이 신규 사용자를 current user로 전환하기 위한 전략을 고안해야 할 것으로 보인다.none이 resurrected, dormant보다 많은데, type을 나누는 기준을 수정하면 더 효율적이게 나눌 수 있지 않을까 생각된다.(4) core event를 click_payment로 놓았을 때 Weekly retentionWITH base AS ( SELECT event_name, user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_time FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_name = "click_payment" ), week_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM week_diff GROUP BY diff ) SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt), 2) AS retention_rate FROM ( SELECT diff, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff ASC) AS first_week_user_cnt FROM user_counts )1주차 이후로 사용자의 수가 급감. 2주차 부터는 계속 0.01을 유지. 15주차부터 24주차까지는 0 retention rate를 유지.사용자가 1주차 이후로 대거 이탈함을 볼 수 있음. 15주차부터 0.01에서 0.0으로 감소. 1주차에 이벤트 혹은 서비스 런칭으로 사용자수가 급격히 증가하였으나, 서비스 경험이 사용자에게 매력적으로 다가오지 못하여 바로 그 다음주부터 급격하게 감소하는 것을 볼 수 있음.따라서, core_event에 대한 근본적인 개선이 필요할 것으로 보임. ✅ 배운점 및 느낀점 리텐션 분석에 대한 전반적인 지식을 얻을 수 있었다. 아직, 데이터에서 리텐션 분석을 하는 것이 익숙치 않아 학습을 더 해야 할 것 같다. user type을 나누는 부분에서 더 짜임새 있는? 기준이 뭔지, 현업에서는 어떤 기준으로 사용자를 나누는지 궁금하다!고객을 분석하기 위해서는 어떤 부분을 더 봐야하는 지 아직 크게 감이 없는데, 다른 레퍼런스를 보는 것이 필요할 것 같다는 생각이 들었다
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번 과제#weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2022-11-03' ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts WHERE diff_of_week = 0 ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 2번 과제New+Current+Resurrected+Dormant User로 나누기일단 1주일만에 리텐션이 6.8%대이니까, 신규유저의 기준을 7일로 두겠습니다.리텐션이 반으로 떨어지는 3% 미만이 8주부터, 1% 미만이 12주부터여서 임의로 구간을 세팅하겠습니다.이에 따라 미접속 8주차부터는 휴면 유저로 보겠습니다.따라서 8주 뒤 다시 들어온 유저는 복귀 유저로 보겠습니다. # 신규 유저, 기존 유저, 이탈 유저, 복귀 유저로 나눈 쿼리 # 첫 event 후 7일 안 지난 유저 # event 후 7일 지났고, 마지막 접속으로부터 8주가 안 지난 유저 # event 후 7일이 지났고, 마지막 접속으로부터 8주가 지난 유저 # 첫 event 후 7일이 지났고, 마지막 접속으로부터 8주가 지났다가, 다시 event가 발생한 유저 WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' ) , dates AS ( SELECT MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ) SELECT event_date, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) < 8 THEN user_pseudo_id ELSE NULL END) AS new_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) < 56 THEN user_pseudo_id ELSE NULL END) AS current_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) >= 56 THEN user_pseudo_id ELSE NULL END) AS resurrected_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 END) AS dormant_user FROM dates GROUP BY event_date 2번 과제는 이렇게까진 짜보았으나 실패했습니다..짜보다보니 last_date가 절대로 event_date보다 +가 나올 수가 없더군요...그래서 그냥 일반 날짜 컬럼을 붙일 수 있나...? 어떻게 해야하지? 하고이미 짜둔 쿼리에 매몰돼서 구하질 못했네요 ㅎㅎ혹시 이렇게 짜는 방향에 대해서 조언해주실 점이 있으시면 부탁드리겠습니다.. 3번 과제 아래와 같이 퍼널 나눴다. 획득screen_view, click_login 활성화click_banner, click_food_category, click_restaurant, click_food,view_recommend_extra_food, click_recommend_extra_foodclick_recommend_food, click_restaurant_nearbyclick_search, request_search 수익화 관련click_cartclick_payment 하지만 각 행동한 유저의 수를 구해보니 아래와 같았다.recommend 관련 행동을 한 유저가 오히려 더 적었다. 그래서 각 이벤트별로 유저 수를 구해보았고#event가 click_cart일 때 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_cart') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 4번 과제처럼 core event를 각각 설정해보았다.4번 과제의 결과인 click_payment 했던 유저의 리텐션이 역시 제일 좋았었다. 반면 cart를 클릭한 유저의 리텐션은 비교적 조오금 낮았고 반대로 click_search, click_recommend_food, click_restaurant_nearby를 한 유저의 리텐션이 구매자 리텐션과 유사했다.그래서 기본 행동들에서, 고객이 더 관심 기울여하는 행동인 검색, 추천 서비스 클릭하기 등의 행동을 유도하면 어떨까.. 싶었다! 사실 10% 이상으로 다 비슷하긴 하지만 약간의 차이라도 보이는 것들을 좀 추려보았다. 4번 과제# click_payment event를 경험한 유저들의 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_payment') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1)WeeklyRetention을구하는쿼리를바닥부터스스로작성해보세요WITH event_log_base AS ( SELECT DISTINCT DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), MONTH) AS event_month, user_id, user_pseudo_id, event_name, platform, event_params FROM advanced.app_logs ), user_first_visit AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week FROM event_log_base ), weekly_retention_base AS ( SELECT first_visit_week, DATE_DIFF(event_week, first_visit_week, WEEK) AS week_diff, COUNT(DISTINCT user_pseudo_id) AS active_users FROM user_first_visit GROUP BY first_visit_week, week_diff ) SELECT first_visit_week, week_diff, active_users, SAFE_DIVIDE(active_users, FIRST_VALUE(active_users) OVER (PARTITION BY first_visit_week ORDER BY week_diff ASC)) AS retention_rate FROM weekly_retention_base ORDER BY first_visit_week, week_diff; 2)RetainUser를New+Current+Resurrected+DormantUser로나누는쿼리를 작성해보세요.신규 사용자: 첫 방문 후 30일 이내에 활동한 사용자. 첫 방문 날짜와 마지막 방문 날짜 간의 차이가 30일 이하일 경우 현재 사용자: 첫 방문 후 30일이 지난 사용자 중, 최근 30일 이내에 접속한 사용자. 첫 방문과 마지막 방문 간 차이가 30일을 넘고, 가장 최근 활동으로부터 30일 이내에 접속한 경우 휴면 사용자: 마지막 방문 이후 30일 이상 비활동 상태에 있는 사용자. 마지막 방문이 가장 최근 이벤트로부터 30일 이상 경과한 경우 복귀 사용자: 30일 이상 비활동 상태였다가 다시 활동을 시작한 사용자. 이전 방문 이후 30일 동안 비활동이 이어지다 다시 방문한 경우 WITH event_data AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), -- 사용자별 첫 방문 및 마지막 방문 날짜 계산 user_first_last_visit AS ( SELECT user_pseudo_id, MIN(event_date) AS first_visit_date, MAX(event_date) AS last_visit_date FROM event_data GROUP BY user_pseudo_id ), -- 전체 이벤트 데이터에서 가장 최근의 이벤트 날짜를 계산 latest_event_date AS ( SELECT MAX(event_date) AS current_date FROM event_data ), -- 사용자의 이벤트 로그를 통해 비활동 기간을 기록하여 휴면 여부 판단 user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 30 THEN 1 ELSE 0 END AS dormant_history FROM event_data ), -- 사용자 유형 분류: 신규, 현재, 휴면, 복귀 사용자 user_category AS ( SELECT u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date, MAX(ua.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_visit_date, DAY) > 30 THEN 'inactive_user' WHEN DATE_DIFF(u.last_visit_date, u.first_visit_date, DAY) <= 30 THEN 'first_month_user' WHEN MAX(ua.dormant_history) = 1 THEN 'returning_user' ELSE 'active_user' END AS user_type FROM user_first_last_visit AS u CROSS JOIN latest_event_date AS led LEFT JOIN user_activity AS ua ON u.user_pseudo_id = ua.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date ), -- 사용자 유형별 첫 방문 주차와 현재 주차 간 차이 계산 user_weekly_retention AS ( SELECT uc.user_type, ed.user_pseudo_id, ed.event_date, DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK(MONDAY)), DATE_TRUNC(uc.first_visit_date, WEEK(MONDAY)), WEEK) AS week_difference FROM event_data AS ed JOIN user_category AS uc ON ed.user_pseudo_id = uc.user_pseudo_id ), -- 사용자 유형과 주차별 활성 사용자 수 집계 user_count_by_type_and_week AS ( SELECT user_type, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_weekly_retention GROUP BY user_type, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 현재 주차별 유지율 계산 retention_calculation AS ( SELECT user_type, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY user_type ORDER BY week_difference) AS initial_user_count FROM user_count_by_type_and_week ) SELECT user_type, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_calculation ORDER BY user_type, week_difference; 3)주어진데이터에서어떤사람들이리텐션이그나마높을까요?찾아보세요2번을 참고해서 고객을 4개의 타입으로 분류하였을 때first_month_user: 첫 주 이후 리텐션이 높지만 이후 감소하는 모습을 보인다. active_user: first_month_user와 비슷하게 첫 주 이후 빠르게 감소한다. returning_user: 2주부터 0.08로 떨어지지만 이후 완만하게 리텐션이 감소한다.오랜기간 10주차 이후에도 일부 사용자가 유지되고 있는 모습을 보인다. inactive_user: 리텐션이 다른 유저에 비해 높다고는 할 수 없지만 20주차까지 꾸준하게 소수의사용자가 유지되고 있는 모습을 보인다. --> 따라서 returning_use이 가장 완만하게 리텐션이 감소, 가장 오랫동안 리텐션이 지속했기때문에 가장 높은 리텐션을 유지했다고 판단하였습니다. 4)CoreEvent를“click_payment”라고설정하고WeeklyRetention을구해주세요WITH event_data AS ( SELECT DISTINCT user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, event_name FROM advanced.app_logs WHERE event_name = 'click_payment' -- Core event 필터링 ), -- 사용자별 첫 방문 주차와 현재 이벤트 주차 간의 차이 계산 user_first_week_data AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week AS current_week, DATE_DIFF(event_week, MIN(event_week) OVER (PARTITION BY user_pseudo_id), WEEK) AS week_difference FROM event_data ), -- 주차별 사용자 수 집계 user_count_by_week AS ( SELECT first_visit_week, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_first_week_data GROUP BY first_visit_week, week_difference ORDER BY first_visit_week, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 주차별 유지율 계산 retention_base AS ( SELECT first_visit_week, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY first_visit_week ORDER BY week_difference) AS initial_user_count FROM user_count_by_week ) -- 최종 리텐션 비율 계산 및 결과 출력 SELECT first_visit_week, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_base ORDER BY first_visit_week, week_difference;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
1) 리텐션 연습 문제https://torch-cart-08d.notion.site/3-139701964d8c805d9e49f3c68ab9b0b5 2) 3주차 추가 문제1번 문제(15분)주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요SELECT *, RANK() OVER(PARTITION BY category ORDER BY rate DESC) AS category_rank FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, list_price, ((list_price - actual_price) / list_price) * 100 AS rate FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id ) QUALIFY category_rank = 1 ORDER BY category, transaction_date 2번 문제(10분)2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요SELECT category, SUM(actual_price) AS total_price FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id WHERE EXTRACT(MONTH FROM PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) = 1 ) GROUP BY ALL ORDER BY total_price DESC LIMIT 1 3번 문제(10분)유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요over_200 AS ( SELECT td.user_id, SUM(actual_price) AS user_price FROM transaction_data AS td LEFT JOIN user_info AS ui ON td.user_id = ui.user_id GROUP BY ALL HAVING user_price >= 2000000 ) SELECT category, COUNT(*) AS cnt FROM over_200 AS O2 CROSS JOIN transaction_data AS td LEFT JOIN item_info AS ii ON td.item_id = ii.item_id GROUP BY ALL ORDER BY cnt DESC LIMIT 1약 20분/5분/8분 소요1번 문제에서 날짜 데이터를 변환하는 방법이 떠오르지 않아 시간이 많이 소요됐습니다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 과제
# 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요 WITH base AS ( SELECT DISTINCT user_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), first_diff_of_week AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_diff_of_week GROUP BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, first_cnt), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_cnt FROM user_counts ) ORDER BY diff_of_week# 2) Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요. # 신규유저(New) : 제품을 처음 사용하는 유저 # 기존유저(Current) : 제품을 지속적으로 사용하는 유저 # 복귀유저(Resurrected) : 과거에 사용 -> 비활성 -> 다시 제품을 사용한 유저 # 휴면유저(Dormant) : 일정 기간 제품을 사용하지 않은 비활성화 사용자 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ) SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type# 3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ), user_counts AS ( SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type ) SELECT event_week, DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week, user_type, user_count, cohort_users, ROUND(SAFE_DIVIDE(user_count, cohort_users), 3) AS retention_rate FROM ( SELECT DATE(event_week) AS event_week, DATE(MIN(event_week) OVER()) AS first_week, user_type, user_count, MAX(CASE WHEN user_type = 'New' THEN user_count END) OVER(PARTITION BY DATE(event_week)) AS cohort_users FROM user_counts ORDER BY event_week, CASE user_type WHEN 'New' THEN 1 WHEN 'Current' THEN 2 WHEN 'Resurrected' THEN 3 WHEN 'Dormant' THEN 4 END )2022-08-01 ~ 2023-08-03 기간으로 설정하여, 개인적인 의견으로 Retain User를 쪼개 보았습니다.배달어플은 유저들의 사용 간격이 짧다고 생각하기 때문에, 주의 주기로 리텐션으로 설정했습니다. 저는바로 전 주에 사용했고, 그 다음주에 사용했다면 기존유저(Current)로 설정했습니다.2주 이상 텀을 두고 다시 사용했다면 복귀유저(Resurrected)로 설정했습니다.처음 들어가고 사용하지 않았다면 휴면유저(Dormant)로 설정했습니다.주차마다 '신규/기존/복귀/휴면' 유저를 새로 계산하는 쿼리입니다. 데이터를 살펴보면,1) 신규유저(New) : 제품을 처음 사용하는 유저출시(2022-08-01을 출시기간으로 생각하면)로부터 현재(2023-01-16 => 0803으로 설정했지만 0116까지가 끝입니다.)까지의 신규유저 수의 변화를 살펴보겠습니다.런칭부터 10주까지의 신규유저의 수는 주춤한 경우도 있지만 평균 4~500명 정도 증가했습니다.하지만, 11주에 4000명 대에서 3100명으로 900명이 급감한 이후로 계속 하락세를 타고 24주차 현재 신규 가입자의 수는 500명 남짓하고 있습니다.11주차부터의 마케팅 효과는 빛을 보지 못하고 있는 것 같습니다. 코호트 리텐션 시각화를 보자면,2) 기존유저(Current) : 제품을 지속적으로 사용하는 유저(바로 전 주와 이번 주)출시로부터 현재까지 줄곧 주차마다의 신규유저 대비 기존유저의 비율이 증가하고 있습니다.특히, 24주차에 가입한 신규유저 대비 기존유저의 비율은 76%나 차지하고 있습니다.시간이 흐름에 따라 점차 어플이 안정화되면서 신규유저들이 만족을 느끼는 비율이 늘어가고 있다고 보입니다. 3) 복귀유저(Resurrected) : 과거에 사용 -> 비활성화 -> 다시 제품을 사용한 유저(2주 후 복귀)복귀유저는 12주차를 분기로 13주차부터는 신규가입자보다 복귀유저의 수가 많아지기 시작했습니다.24주차에는 신규유저 대비 복귀유저는 6배가 많습니다.복귀유저가 기존유저로 안정되는 현황을 추출할 수 있는 쿼리는 조금 더 연구해 보겠습니다. 4) 휴면유저(Dormant) : 일정기간 제품을 사용하지 않은 비활성화 사용자신규유저 대비 휴면유저는 꾸준히 늘어가고 있습니다(휴면유저가 누적됨).24주차 신규가입자의 5.7배가 휴면유저 입니다.# 4) Core Event를 'click_payment'라고 설정하고 Weekly Retention을 구해주세요 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' AND event_name = 'click_payment' ), first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, cohort_users), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS cohort_users FROM user_counts ) ORDER BY diff_of_week
주간 인기글
순위 정보를
불러오고 있어요