묻고 답해요
152만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
순위 정보를
불러오고 있어요
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) AS event_date FROM `advanced.app_logs` ), first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ) SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt, ROUND(SAFE_DIVIDE((COUNT(DISTINCT user_pseudo_id)), (FIRST_VALUE(COUNT(DISTINCT user_pseudo_id)) OVER(ORDER BY diff_of_week))), 3) AS retention FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week2) Retain User를 New+Current+Resurrected+DormantUser로 나누는 쿼리를 작성해보세요.유저 구분 기준은 제품이나 서비스에 따라 다르겠지만 applogs 데이터는 주단위로 했을때 총 24주치 데이터가 있었고, 유저별로 평균 구매 주기와 방문 횟수를 구해보았을 때 6주를 기준으로 유저를 구분하면 될 것이라 생각함. New : 첫 방문한 신규 유저 Current : 방문 횟수 2회 이상이면서 6주 이내 재 방문 이력 있는 유저Resurrected : 방문 횟수 2회 이상 & 직전 방문 경과 7주 이상이면서 6주 이내 재방문 이력 있는 유저Dormant : 직전 방문 경과 7주 이상이면서 6주 이내 재방문 없는 유저(사실 이렇게 나누는게 맞는지 모르겠지만, 나름의 논리적인 기준을 찾아보려고 노력하였음. 😫 과제 기한 연장에도 개인적인 사정으로 시간이 부족하여 깊이 탐구하지 못했고, 쿼리 작성을 완성하지 못했습니다 😩 추후에 완성하면 보완하도록 하겠습니다. )3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요 미완성으로 추후에 완성하면 보완하겠습니다 ! 4) Core Event를 “click_payment” 라고 설정하고 Weekly Retention을 구해주세요WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) AS event_date FROM `advanced.app_logs` ), first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ) SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt, ROUND(SAFE_DIVIDE((COUNT(DISTINCT user_pseudo_id)), (FIRST_VALUE(COUNT(DISTINCT user_pseudo_id)) OVER(ORDER BY diff_of_week))), 3) AS retention FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제
1번 문제WITH -- 1단계: 필수 데이터만 추출하기 base_events AS ( SELECT user_pseudo_id, DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_date FROM advanced.app_logs ), -- 2단계: 사용자별 첫 방문 주차와 각 활동 주차 구하기 user_weeks AS ( SELECT DISTINCT -- 중복 제거 user_pseudo_id, -- 사용자별 첫 방문 주차 (월요일 기준) DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 실제 방문한 주차 (월요일 기준) DATE_TRUNC(event_date, WEEK(MONDAY)) AS visit_week FROM base_events ), -- 3단계: 첫 방문 이후 몇 주차인지 계산하기 week_numbers AS ( SELECT user_pseudo_id, -- 첫 방문 이후 경과된 주차 계산 DATE_DIFF(visit_week, first_week, WEEK) AS week_number FROM user_weeks -- 최대 12주까지만 분석 WHERE DATE_DIFF(visit_week, first_week, WEEK) <= 12 ), -- 4단계: 주차별 총 사용자 수 계산하기 weekly_users AS ( SELECT week_number, COUNT(DISTINCT user_pseudo_id) AS user_count FROM week_numbers GROUP BY week_number ) -- 5단계: 최종 리텐션 계산하기 SELECT week_number, user_count as active_users, FIRST_VALUE(user_count) OVER(ORDER BY week_number) as first_week_users, ROUND(100.0 * user_count / FIRST_VALUE(user_count) OVER(ORDER BY week_number), 2) as retention_rate FROM weekly_users ORDER BY week_number; 2번 문제WITH -- 1단계: 사용자별 주차 데이터 준비 user_weeks AS ( SELECT DISTINCT user_pseudo_id, -- 첫 방문 주차 DATE_TRUNC(MIN(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 활동 주차 DATE_TRUNC(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul"), WEEK(MONDAY)) AS event_week FROM advanced.app_logs ), -- 2단계: 사용자 상태 확인 user_status AS ( SELECT user_pseudo_id, event_week, first_week, -- 이전 방문 주차 LAG(event_week) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week FROM user_weeks ), -- 3단계: 상태 분류 weekly_status AS ( SELECT event_week, user_pseudo_id, CASE WHEN event_week = first_week THEN 'New' WHEN DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' END as user_type FROM user_status ), -- 4단계: 각 주차별 전체 유저 수 total_users AS ( SELECT fwd.event_week, COUNT(DISTINCT user_pseudo_id) AS total_user_count FROM user_status CROSS JOIN (SELECT DISTINCT event_week FROM user_weeks) AS fwd WHERE first_week <= fwd.event_week GROUP BY event_week ), -- 5단계: 주차별 활성 유저 수 계산 active_users AS ( SELECT event_week, COUNTIF(user_type = 'New') AS new_users, COUNTIF(user_type = 'Current') AS current_users, COUNTIF(user_type = 'Resurrected') AS resurrected_users, COUNT(DISTINCT user_pseudo_id) AS retain_users FROM weekly_status GROUP BY event_week ) -- 6단계: 최종 결과 SELECT FORMAT_DATE('%Y-%m-%d', a.event_week) as week_start, new_users, current_users, resurrected_users, (t.total_user_count - a.retain_users) as dormant_users, ROUND(100.0 * current_users / NULLIF(LAG(new_users) OVER(ORDER BY a.event_week), 0), 1) as retention_rate, retain_users as active_users, t.total_user_count as total_users FROM active_users a JOIN total_users t ON a.event_week = t.event_week ORDER BY a.event_week; 3번 문제WITH -- 1단계: 기본 데이터 준비 user_weeks AS ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul"), WEEK(MONDAY)) AS event_week FROM advanced.app_logs ), -- 2단계: 사용자별 방문 주차 수 계산 user_visit_frequency AS ( SELECT user_pseudo_id, COUNT(DISTINCT event_week) as total_visit_weeks, DATE_DIFF(MAX(event_week), MIN(event_week), WEEK) + 1 as weeks_since_first, MIN(event_week) as first_visit_week FROM user_weeks GROUP BY user_pseudo_id ), -- 3단계: 사용자별 리텐션 점수 계산 user_retention_score AS ( SELECT user_pseudo_id, total_visit_weeks, weeks_since_first, ROUND(100.0 * total_visit_weeks / weeks_since_first, 2) as visit_rate, FORMAT_DATE('%Y-%m', first_visit_week) as cohort_month FROM user_visit_frequency WHERE weeks_since_first >= 4 -- 최소 4주 이상 경과된 사용자만 ), -- 4단계: 사용자 행동 데이터 분석 user_behavior AS ( SELECT r.user_pseudo_id, r.visit_rate, r.cohort_month, COUNT(DISTINCT DATE(TIMESTAMP_MICROS(l.event_timestamp), "Asia/Seoul")) as active_days, COUNT(DISTINCT l.event_timestamp) as total_events, COUNT(DISTINCT l.event_name) as unique_event_types FROM user_retention_score r JOIN advanced.app_logs l ON r.user_pseudo_id = l.user_pseudo_id GROUP BY r.user_pseudo_id, r.visit_rate, r.cohort_month ), -- 5단계: 리텐션 세그먼트별 행동 패턴 분석 retention_segments AS ( SELECT cohort_month, CASE WHEN visit_rate >= 75 THEN 'Very High (75%+)' WHEN visit_rate >= 50 THEN 'High (50-74%)' WHEN visit_rate >= 25 THEN 'Medium (25-49%)' ELSE 'Low (<25%)' END as retention_segment, COUNT(DISTINCT user_pseudo_id) as user_count, ROUND(AVG(visit_rate), 2) as avg_retention_rate, ROUND(AVG(active_days), 1) as avg_active_days, ROUND(AVG(total_events), 1) as avg_total_events, ROUND(AVG(unique_event_types), 1) as avg_unique_events FROM user_behavior GROUP BY cohort_month, CASE WHEN visit_rate >= 75 THEN 'Very High (75%+)' WHEN visit_rate >= 50 THEN 'High (50-74%)' WHEN visit_rate >= 25 THEN 'Medium (25-49%)' ELSE 'Low (<25%)' END ) -- 최종 결과 SELECT cohort_month, retention_segment, user_count, avg_retention_rate, avg_active_days, avg_total_events, avg_unique_events, ROUND(100.0 * user_count / SUM(user_count) OVER (PARTITION BY cohort_month), 2) as segment_percentage FROM retention_segments ORDER BY cohort_month, CASE retention_segment WHEN 'Very High (75%+)' THEN 1 WHEN 'High (50-74%)' THEN 2 WHEN 'Medium (25-49%)' THEN 3 WHEN 'Low (<25%)' THEN 4 END; 4번WITH -- 1단계: 결제 이벤트 기본 데이터 payment_base AS ( SELECT DISTINCT user_pseudo_id, DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_date FROM advanced.app_logs WHERE event_name = "click_payment" ), -- 2단계: 사용자별 첫 결제일과 주차 계산 user_weeks AS ( SELECT user_pseudo_id, -- 첫 결제 주차 DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 실제 결제 주차 DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM payment_base ), -- 3단계: 주차별 상태 계산 week_status AS ( SELECT user_pseudo_id, event_week, first_week, -- 첫 주차와의 차이 DATE_DIFF(event_week, first_week, WEEK) AS week_number FROM user_weeks WHERE DATE_DIFF(event_week, first_week, WEEK) <= 12 -- 최대 12주까지만 분석 ), -- 4단계: 주차별 활성 사용자 수 계산 weekly_users AS ( SELECT week_number, COUNT(DISTINCT user_pseudo_id) AS active_users FROM week_status GROUP BY week_number ), -- 5단계: 첫 주 사용자 수 (코호트 크기) 구하기 first_week_users AS ( SELECT COUNT(DISTINCT user_pseudo_id) AS cohort_size FROM week_status WHERE week_number = 0 ) -- 최종 결과 SELECT week_number as week, active_users, cohort_size, ROUND(100.0 * active_users / cohort_size, 2) as retention_rate FROM weekly_users CROSS JOIN first_week_users ORDER BY week_number; 서비스 성장세총 사용자: 52,823명까지 증가8월부터 12월까지 꾸준한 성장10월에 가장 높은 신규 유저 유입 (4,048명)리텐션 개선전체 리텐션: 8월 2.6% → 1월 47.9%로 큰 폭 개선12월부터 40% 이상의 안정적인 리텐션 유지재방문 사용자 비중이 지속적으로 증가 개선 필요점결제 리텐션이 매우 낮음 (1% 수준)신규 사용자 유입이 감소 추세휴면 사용자가 지속적으로 증가
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제 및 추가 코딩테스트 문제 풀이
노션 링크로 과제 제출합니다!https://pleasant-moth-c20.notion.site/3-13dc19b9b6fe80dbb81dfa041247cfd7?pvs=4
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
바짝 스터디 3주차 과제 제출
연습문제 1번select *, TRUNC(user_count / first_count, 3) AS ratio from ( select *, first_value(user_count) over(order by week_diff asc) as first_count from ( select week_diff, count(*) as user_count from ( select *, date_diff(datetime_week, first_week, week) as week_diff from ( select distinct event_date, user_pseudo_id, platform, datetime_week, -- event_name, first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week from ( select * except(firebase_screen,event_name), DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week, concat(firebase_screen,'_',event_name) as event_name from ( select * except(event_param), max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name, event_param from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date >= '2022-08-01' and event_date <= '2022-08-30' ) group by all ) ) ) ) group by week_diff ) ) order by week_diff ;연습문제2현재 시간에서 마지막 활동이 2주 이상 경과면 이탈마지막 활동이 현재 시간에서 1주 이내 인데 이전 활동과 갭이 2주이상이면 복귀마지막 action이 현재시간에서 1주내이고 이전 액션이 없다면 new나머진 activeselect user_pseudo_id, case when DATETIME_DIFF(now, datetime_week, WEEK) >= 2 then 'dormant' when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) < 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'active' when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) >= 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'resurrected' when DATETIME_DIFF(now, datetime_week, WEEK) < 2 and last_visit_week is null then 'new' else null end as status from ( select *, rank() over (partition by user_pseudo_id order by event_timestamp desc) as rank from ( select distinct event_date, user_pseudo_id, platform, datetime_week, event_timestamp, -- event_name, -- first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week lag(datetime_week) over(partition by user_pseudo_id order by event_timestamp) as last_visit_week, CAST('2022-08-31 00:00:00' AS DATETIME) as now, from ( select * except(firebase_screen,event_name), DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week, concat(firebase_screen,'_',event_name) as event_name from ( select * except(event_param), max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name, event_param from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date >= '2022-08-01' and event_date <= '2022-08-30' ) group by all ) ) ) qualify rank=1 ) ;연습문제3, 연습문제4쿼리는 동일하게 사용하고 event_name을 변경하면서 리텐션이 가장 높은 값을 추적click_restaurant의 리텐션이 가장 높아 이를 핵심 이벤트로 지정해야 하나 싶긴 한데 결과가 조금 이상해서 쿼리 점검 필요 해보임..아래 쿼리에서 event_name만 click_payment로 변경하면 click_payment의 weekly retentionselect *, TRUNC(user_count / first_count, 3) AS ratio from ( select *, first_value(user_count) over(order by week_diff asc) as first_count from ( select week_diff, count(*) as user_count from ( select *, date_diff(datetime_week, first_week, week) as week_diff from ( select distinct user_pseudo_id, datetime_week, -- event_name, first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week from ( select *, DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week from ( select event_date, event_timestamp, user_pseudo_id, platform, event_name from `advanced.app_logs` where event_date >= '2022-08-01' and event_date <= '2022-08-30' and event_name='click_restaurant' ) ) ) ) group by week_diff ) ) order by week_diff ;
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제 및 추가 코딩테스트 문제풀이
1) 3주차 추가 코딩테스트 대비 문제풀이1번문제 (제한시간: 15분 >> 소요시간 23분)문제: 주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT *, RANK() OVER (ORDER BY avg_sales_ratio DESC) as rnk FROM ( SELECT weekly, category, AVG(sales_ratio) AS avg_sales_ratio FROM ( SELECT DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)],'-',SPLIT(a.transaction_date, '/')[OFFSET(0)] , '-',SPLIT(a.transaction_date, '/')[OFFSET(1)])), WEEK(MONDAY)) AS weekly, a.item_id, b.category, b.list_price, a.actual_price, ROUND(1- ROUND(SAFE_DIVIDE(actual_price, list_price),4),4) AS sales_ratio FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id ) GROUP BY ALL ) QUALIFY rnk = 1 ORDER BY 1 ;2번문제 (제한시간: 10분 >> 소요시간 7분)문제: 2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT category FROM ( SELECT transaction_month, category, SUM(actual_price) AS category_sum_price FROM ( SELECT DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)],'-',SPLIT(a.transaction_date, '/')[OFFSET(0)] , '-',SPLIT(a.transaction_date, '/')[OFFSET(1)])), MONTH) AS transaction_month, a.item_id, b.category, a.actual_price FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id ) WHERE transaction_month = "2024-01-01" GROUP BY ALL ) ORDER BY category_sum_price DESC LIMIT 1 ;3번문제 (제한시간: 10분 >> 소요시간 6분 40초)문제: 유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) SELECT category FROM ( SELECT category, SUM(actual_price) AS sum_purchase_price FROM ( SELECT a.user_id, a.item_id, b.category, a.actual_price FROM transaction_data a LEFT JOIN item_info b ON a.item_id = b.item_id WHERE a.user_id IN ( SELECT user_id FROM ( SELECT user_id, SUM(actual_price) AS sum_purchase_price FROM transaction_data a GROUP BY ALL HAVING SUM(actual_price) >= 2000000 ) ) ) GROUP BY ALL ) ORDER BY sum_purchase_price DESC LIMIT 1 2) 리텐션 연습문제 풀이해당 연습문제는 전부 해결하지 못해 최대한 기한 내에 풀이를 완료한 쿼리를 작성하게 되었습니다. 완료하지 못한 과제는 추후에라도 풀이 완료하여 수정 업로드 하겠습니다.문제 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요.WITH base AS ( # event데이터의 raw데이터를 추출 SELECT DISTINCT user_id, user_pseudo_id, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, event_name FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-12-31" ), first_week_diff_event_week AS ( # 유저별 첫번쨰 week, 이벤트 발생 week, 주차별 차이 데이터 추출 SELECT DISTINCT user_pseudo_id, first_week, event_week, DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week FROM ( SELECT user_pseudo_id, event_name, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ) ), user_counts AS ( # 첫번째 이벤트 발생주차 | 이벤트 Retain 발생차이 | 유저 수 | 코호트 유저 수 SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY weeks_after_first_week) AS cohort_user FROM ( SELECT weeks_after_first_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_diff_event_week GROUP BY ALL ) ) # Retention Rate 추출 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, cohort_user), 4) AS retention_rate FROM user_counts ORDER BY 1,2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 3주차과제] 리텐션 연습문제
노션링크 첨부합니다.https://focapa.notion.site/3-13c094a2b89980719923c79f9298fefd?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션으로 과제 제출합니다. https://www.notion.so/12735b8a6e338052a9cbf91cab8fdce2?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션으로 작성하였습니다.https://apple-baroness-590.notion.site/3-13aacf7d68f680179560f0ac848277f5?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
문제 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요 WITH base AS ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(MIN(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week FROM `advanced.app_logs` ), first_week_diff AS ( SELECT user_pseudo_id, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM base ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_count FROM first_week_diff GROUP BY diff_of_week ) SELECT diff_of_week, user_count, ROUND(SAFE_DIVIDE(user_count, FIRST_VALUE(user_count) OVER (ORDER BY diff_of_week ASC)), 2) AS retention_rate FROM user_counts ORDER BY diff_of_week; 문제 2) Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요. WITH event_data AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_visits AS ( SELECT user_pseudo_id, MIN(event_date) AS first_visit, MAX(event_date) AS last_visit -- 최근 방믄 FROM event_data GROUP BY user_pseudo_id ), -- 최신일자 current_date AS ( SELECT MAX(event_date) AS max_date FROM event_data ), user_types AS ( SELECT u.user_pseudo_id, u.first_visit, u.last_visit, c.max_date as current_date, DATE_DIFF(c.max_date, u.last_visit, DAY) as days_since_last_visit, CASE WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) <= 14 THEN 'current' -- 최근 2주 내 방문 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'dormant' -- 60일 이상 미접속 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 30 AND DATE_DIFF(c.max_date, u.last_visit, DAY) <= 44 AND EXISTS ( SELECT 1 FROM event_data e WHERE e.user_pseudo_id = u.user_pseudo_id AND DATE_DIFF(c.max_date, e.event_date, DAY) <= 14 ) THEN 'resurrected' -- 30일 이상 비활동 후 최근 14일 내 재접속 WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'inactive' -- 60일 이상 비활동 ELSE 'none' END AS user_type FROM user_visits u CROSS JOIN current_date c ), weekly_retention AS ( SELECT ut.user_type, DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK), DATE_TRUNC(ut.first_visit, WEEK), WEEK) as week_number, COUNT(DISTINCT ed.user_pseudo_id) as user_count FROM event_data ed JOIN user_types ut ON ed.user_pseudo_id = ut.user_pseudo_id GROUP BY 1, 2 ), retention_rates AS ( SELECT user_type, week_number, user_count, FIRST_VALUE(user_count) OVER ( PARTITION BY user_type ORDER BY week_number ) as initial_users FROM weekly_retention ) SELECT user_type, week_number, user_count as active_users, initial_users as cohort_size, ROUND(user_count * 100.0 / initial_users, 2) as retention_rate FROM retention_rates ORDER BY user_type, week_number; 문제 3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요 - current의 경우 15주까지는 리텐션 유지 및 증가하며 큰 변동 없다가 그 후부터 지속적으로 하락하였다. - current는 그래도 리텐션율이 9.X로 시작했으나 dormant는 7.X로 시작하였고 더 빠르게 유입율이 감소하며 리텐션이 좋지 않았다. 문제 4) Core Event를 “click_payment”라고 설정하고 Weekly Retention을 구해주세요 WITH base AS ( SELECT user_pseudo_id, DATE_TRUNC(DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')), WEEK(MONDAY)) as week FROM advanced.app_logs WHERE event_name = "click_payment" ), cal_week AS ( SELECT user_pseudo_id, week as visit_week, MIN(week) OVER (PARTITION BY user_pseudo_id) as first_week FROM base ), weekly_retention AS ( SELECT first_week DATE_DIFF(visit_week, first_week, WEEK) as week_number, COUNT(DISTINCT user_pseudo_id) as user_count FROM cal_week GROUP BY first_week, DATE_DIFF(visit_week, first_week, WEEK) ), cohort_sizes AS ( SELECT first_week, MAX(CASE WHEN week_number = 0 THEN user_count END) as initial_users FROM weekly_retention GROUP BY cohort_week ) SELECT wr.first_week, cs.initial_users as total_users, wr.week_number, wr.user_count as active_users, ROUND(wr.user_count * 100.0 / cs.initial_users, 2) as retention_rate FROM weekly_retention wr JOIN cohort_sizes cs ON wr.cohort_week = cs.cohort_week WHERE wr.week_number >= 0 ORDER BY wr.cohort_week, wr.week_number;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
이번 주 과제는 아래 링크로 대체합니다. 참고 부탁드립니다!https://hajekim.notion.site/3-13bfd22d19d6806eb216c13ea620d6cb?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빅쿼리 빠짝스터디 3주차 과제]
스터디 과제 노션 링크로 제출합니다!https://www.notion.so/3-13d9f4b04d53806db52bfb5295fd0215?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 3주차 과제] 리텐션 과제
노션으로 작성하여 링크 첨부합니다.https://sapphire-spade-aa6.notion.site/3-13d5677c37548059a160fd6fc201b6c4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 과제 - 리텐션
노션으로 과제 진행하여 링크 복사합니다!! 웹 게시로 수정했습니다 :) https://jypack788.notion.site/3-14-888ecbcf244f42dc9a4e5640fe0fa8dd
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
노션 링크 : https://www.notion.so/3-13a3dc9851a680cda683e39c64a8dc91?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번with base as ( select user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) as event_first_week, DATE_TRUNC(event_date,WEEK(monday)) as event_week, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , diff_date_tbl as ( select user_pseudo_id, event_first_week, event_week, DATE_DIFF(event_week, event_first_week, WEEK) as diff_date from base order by diff_date ) , user_counts as ( select event_first_week, diff_date, count(distinct user_pseudo_id) as user_count from diff_date_tbl group by all order by diff_date ) select * , ROUND(SAFE_DIVIDE(user_count, first_user_count), 2) as retention_rate from ( select * , FIRST_VALUE(user_count) OVER (PARTITION BY event_first_week ORDER BY diff_date) as first_user_count from user_counts ) 2번with base as ( select user_pseudo_id, MIN(event_date) OVER (PARTITION BY user_pseudo_id) as event_first_day, event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , period as ( select user_pseudo_id, event_date, LAG(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) as before_event_date from base GROUP BY ALL ORDER BY user_pseudo_id ) -- 평균주기 구하기 SELECT AVG(diff_day) as avg_diff_period FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) -- 접속 주기 분포도 확인 SELECT diff_day, COUNT(distinct user_pseudo_id) as user_count FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) GROUP BY ALL ORDER BY diff_day 1) new 유저 = (최근월 - 1)월 = 첫접속일자(event_first_month) 2) current 유저 = (최근월 -1)월 , (최근월 -2)월 모두 활동 o 3) dormant 유저 = (최근월 - 1)월 , (최근월 -2)월 모두 활동 x 4) resurrected 유저 = (최근월 -2)월 활동 x, (최근월 -1)월 에는 활동 with base as ( select user_pseudo_id, event_date, DATE_TRUNC(event_date, MONTH) as event_month from advanced.app_logs order by 1 ) , min_max_month as ( SELECT *, MIN(event_month) over (partition by user_pseudo_id order by event_month) as first_month, MAX(DATE_TRUNC(event_date, MONTH)) OVER () as lastest_month from base ) , filter_month as ( SELECT user_pseudo_id, first_month, lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH) THEN 1 END) AS previous_1_lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 2 MONTH) THEN 1 END) AS previous_2_lastest_month FROM min_max_month group by ALL ) , user_classification as ( SELECT *, CASE WHEN first_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH ) THEN 'New' WHEN previous_1_lastest_month > 0 and previous_2_lastest_month >= 0 THEN 'Current' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month = 0 THEN 'Dormant' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month > 0 THEN 'Resurrected' ELSE 'Others' END AS user_category FROM filter_month ) select user_category, count(distinct user_pseudo_id) from user_classification group by all 4번WITH base AS ( SELECT user_pseudo_id , first_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as first_visit , last_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as last_visit , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), day) as event_date , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), week(monday)) as event_week , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), month) as event_month , event_name FROM advanced.app_logs ), click_payment AS ( SELECT DISTINCT user_pseudo_id , min(event_week) over(partition by user_pseudo_id) as first_week , event_week as week_date FROM base WHERE event_name = 'click_payment' ), diff_week AS ( SELECT first_week, date_diff(week_date, first_week, week) as week_diff, count(distinct user_pseudo_id) as users_cnt FROM click_payment GROUP BY ALL ) SELECT first_week , week_diff , users_cnt , safe_divide(users_cnt, first_value(users_cnt) over(partition by first_week order by week_diff)) as retention_rate FROM diff_week order by 1, 2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션 링크로 과제 업로드 합니다.과제 풀이 노션 링크 : [인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션 및 추가문제
과제풀이를 노션에 하여서 노션링크 업로드 하여 제출합니다노션링크: https://canyon-king-6a2.notion.site/3-13ce4e3151278004b5abe5e38e5268b8?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션
과제 풀이를 노션에 작성하여, 노션 링크 업로드 합니다.과제 풀이 노션 링크 : https://sapphire-lotus-719.notion.site/3-1-3-13571425e28d8056afe1c68dd7efaa6e?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
✅ 연습문제 (1) weekly retention--1) weekly retention WITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` ) -- 일자 간 차이 구하기 , cal_week AS ( SELECT DISTINCT user_pseudo_id, -- 최초일(week 단위로) 구하기 DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 원본 날짜 데이터에서 주차 데이터로 구하기 DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ), cal_diff AS ( -- 주차데이터간 차이 구하기 SELECT *, DATE_DIFF(event_week, first_week,WEEK) AS diff_week FROM cal_week ), user_count AS ( -- 주차별 user 수 구하기 SELECT diff_week, COUNT(distinct user_pseudo_id) as user_cnt FROM cal_diff GROUP BY diff_week ) -- 출력 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt),2) AS retention_rate FROM ( SELECT diff_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_week) AS first_week_user_cnt FROM user_count ) 아직 익숙하지 않아서 중간중간 다시 코드 보고 작성했다. 점점 더 익숙해지길! (2) Retain UserWITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ) , cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date,DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 -- 0. 해당없음 -- 1. new : 첫번째 접속 --2. current : 최근 30일내에 접속 -- 3. resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 -- 4. dormant : 최근 60일 간 접속기록 없음 SELECT *, CASE -- 첫 번째 접속 => 'new' WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' -- 최근 30일 내에 접속한 사용자 => 'current' WHEN diff_date <= 30 THEN 'current' -- 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 => 'resurrected' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' -- 60일 이상 접속하지 않은 사용자 => 'dormant' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff GROUP BY all ORDER BY user_pseudo_id,diff_date ) SELECT * FROM user_type나름 user를 나눈다고 나눴는데, 잘 못 나눠진 느낌이 든다! 현업에는 어떻게 나눌지 궁금하다(3) 어떤 user가 많은가?WITH base AS ( -- 날짜 데이터 전처리 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date, DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 SELECT distinct user_pseudo_id, CASE WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' WHEN diff_date <= 30 THEN 'current' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff ), cal_cnt AS ( -- 각 type별 사용자 수 계산 SELECT type, COUNT(user_pseudo_id) AS user_cnt FROM user_type GROUP BY type ) SELECT type, user_cnt, ROUND(user_cnt * 100.0 / (SELECT COUNT(*) FROM user_type), 2) AS ratio FROM cal_cnt ORDER BY ratio DESC new인 사용자가 69.87 %로 전체의 대다수를 차지한다. 그다음이 최근에 7일 내에 접속한 유저가 21.83%, resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자가 2.59%, 최근 60일 간 접속기록 없는 사용자가 1.25%를 차지하였다.2022-11-03일 기준 "2022-08-01" 에서 "2022-11-03" 기간 동안 서비스를 이용하는 대다수의 사람들은 신규 가입자들이 많다는 것을 알 수 있다.또한, current User가 많은 것으로 보아, 최근 7일이내에 서비스를 사용한 사람들의 비율이 두번째로 높다.현재 신규이용자의 수가 대다수를 차지함으로 이 신규 사용자를 current user로 전환하기 위한 전략을 고안해야 할 것으로 보인다.none이 resurrected, dormant보다 많은데, type을 나누는 기준을 수정하면 더 효율적이게 나눌 수 있지 않을까 생각된다.(4) core event를 click_payment로 놓았을 때 Weekly retentionWITH base AS ( SELECT event_name, user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_time FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_name = "click_payment" ), week_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM week_diff GROUP BY diff ) SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt), 2) AS retention_rate FROM ( SELECT diff, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff ASC) AS first_week_user_cnt FROM user_counts )1주차 이후로 사용자의 수가 급감. 2주차 부터는 계속 0.01을 유지. 15주차부터 24주차까지는 0 retention rate를 유지.사용자가 1주차 이후로 대거 이탈함을 볼 수 있음. 15주차부터 0.01에서 0.0으로 감소. 1주차에 이벤트 혹은 서비스 런칭으로 사용자수가 급격히 증가하였으나, 서비스 경험이 사용자에게 매력적으로 다가오지 못하여 바로 그 다음주부터 급격하게 감소하는 것을 볼 수 있음.따라서, core_event에 대한 근본적인 개선이 필요할 것으로 보임. ✅ 배운점 및 느낀점 리텐션 분석에 대한 전반적인 지식을 얻을 수 있었다. 아직, 데이터에서 리텐션 분석을 하는 것이 익숙치 않아 학습을 더 해야 할 것 같다. user type을 나누는 부분에서 더 짜임새 있는? 기준이 뭔지, 현업에서는 어떤 기준으로 사용자를 나누는지 궁금하다!고객을 분석하기 위해서는 어떤 부분을 더 봐야하는 지 아직 크게 감이 없는데, 다른 레퍼런스를 보는 것이 필요할 것 같다는 생각이 들었다
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번 과제#weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2022-11-03' ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts WHERE diff_of_week = 0 ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 2번 과제New+Current+Resurrected+Dormant User로 나누기일단 1주일만에 리텐션이 6.8%대이니까, 신규유저의 기준을 7일로 두겠습니다.리텐션이 반으로 떨어지는 3% 미만이 8주부터, 1% 미만이 12주부터여서 임의로 구간을 세팅하겠습니다.이에 따라 미접속 8주차부터는 휴면 유저로 보겠습니다.따라서 8주 뒤 다시 들어온 유저는 복귀 유저로 보겠습니다. # 신규 유저, 기존 유저, 이탈 유저, 복귀 유저로 나눈 쿼리 # 첫 event 후 7일 안 지난 유저 # event 후 7일 지났고, 마지막 접속으로부터 8주가 안 지난 유저 # event 후 7일이 지났고, 마지막 접속으로부터 8주가 지난 유저 # 첫 event 후 7일이 지났고, 마지막 접속으로부터 8주가 지났다가, 다시 event가 발생한 유저 WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' ) , dates AS ( SELECT MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ) SELECT event_date, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) < 8 THEN user_pseudo_id ELSE NULL END) AS new_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) < 56 THEN user_pseudo_id ELSE NULL END) AS current_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) >= 56 THEN user_pseudo_id ELSE NULL END) AS resurrected_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 END) AS dormant_user FROM dates GROUP BY event_date 2번 과제는 이렇게까진 짜보았으나 실패했습니다..짜보다보니 last_date가 절대로 event_date보다 +가 나올 수가 없더군요...그래서 그냥 일반 날짜 컬럼을 붙일 수 있나...? 어떻게 해야하지? 하고이미 짜둔 쿼리에 매몰돼서 구하질 못했네요 ㅎㅎ혹시 이렇게 짜는 방향에 대해서 조언해주실 점이 있으시면 부탁드리겠습니다.. 3번 과제 아래와 같이 퍼널 나눴다. 획득screen_view, click_login 활성화click_banner, click_food_category, click_restaurant, click_food,view_recommend_extra_food, click_recommend_extra_foodclick_recommend_food, click_restaurant_nearbyclick_search, request_search 수익화 관련click_cartclick_payment 하지만 각 행동한 유저의 수를 구해보니 아래와 같았다.recommend 관련 행동을 한 유저가 오히려 더 적었다. 그래서 각 이벤트별로 유저 수를 구해보았고#event가 click_cart일 때 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_cart') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 4번 과제처럼 core event를 각각 설정해보았다.4번 과제의 결과인 click_payment 했던 유저의 리텐션이 역시 제일 좋았었다. 반면 cart를 클릭한 유저의 리텐션은 비교적 조오금 낮았고 반대로 click_search, click_recommend_food, click_restaurant_nearby를 한 유저의 리텐션이 구매자 리텐션과 유사했다.그래서 기본 행동들에서, 고객이 더 관심 기울여하는 행동인 검색, 추천 서비스 클릭하기 등의 행동을 유도하면 어떨까.. 싶었다! 사실 10% 이상으로 다 비슷하긴 하지만 약간의 차이라도 보이는 것들을 좀 추려보았다. 4번 과제# click_payment event를 경험한 유저들의 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_payment') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count
주간 인기글
순위 정보를
불러오고 있어요