묻고 답해요
141만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번with base as ( select user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) as event_first_week, DATE_TRUNC(event_date,WEEK(monday)) as event_week, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , diff_date_tbl as ( select user_pseudo_id, event_first_week, event_week, DATE_DIFF(event_week, event_first_week, WEEK) as diff_date from base order by diff_date ) , user_counts as ( select event_first_week, diff_date, count(distinct user_pseudo_id) as user_count from diff_date_tbl group by all order by diff_date ) select * , ROUND(SAFE_DIVIDE(user_count, first_user_count), 2) as retention_rate from ( select * , FIRST_VALUE(user_count) OVER (PARTITION BY event_first_week ORDER BY diff_date) as first_user_count from user_counts ) 2번with base as ( select user_pseudo_id, MIN(event_date) OVER (PARTITION BY user_pseudo_id) as event_first_day, event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from advanced.app_logs ) , period as ( select user_pseudo_id, event_date, LAG(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) as before_event_date from base GROUP BY ALL ORDER BY user_pseudo_id ) -- 평균주기 구하기 SELECT AVG(diff_day) as avg_diff_period FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) -- 접속 주기 분포도 확인 SELECT diff_day, COUNT(distinct user_pseudo_id) as user_count FROM ( select *, DATE_DIFF(event_date, before_event_date, DAY ) as diff_day from period ) GROUP BY ALL ORDER BY diff_day 1) new 유저 = (최근월 - 1)월 = 첫접속일자(event_first_month) 2) current 유저 = (최근월 -1)월 , (최근월 -2)월 모두 활동 o 3) dormant 유저 = (최근월 - 1)월 , (최근월 -2)월 모두 활동 x 4) resurrected 유저 = (최근월 -2)월 활동 x, (최근월 -1)월 에는 활동 with base as ( select user_pseudo_id, event_date, DATE_TRUNC(event_date, MONTH) as event_month from advanced.app_logs order by 1 ) , min_max_month as ( SELECT *, MIN(event_month) over (partition by user_pseudo_id order by event_month) as first_month, MAX(DATE_TRUNC(event_date, MONTH)) OVER () as lastest_month from base ) , filter_month as ( SELECT user_pseudo_id, first_month, lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH) THEN 1 END) AS previous_1_lastest_month, COUNT(CASE WHEN event_month = DATE_SUB(lastest_month, INTERVAL 2 MONTH) THEN 1 END) AS previous_2_lastest_month FROM min_max_month group by ALL ) , user_classification as ( SELECT *, CASE WHEN first_month = DATE_SUB(lastest_month, INTERVAL 1 MONTH ) THEN 'New' WHEN previous_1_lastest_month > 0 and previous_2_lastest_month >= 0 THEN 'Current' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month = 0 THEN 'Dormant' WHEN previous_1_lastest_month = 0 and previous_2_lastest_month > 0 THEN 'Resurrected' ELSE 'Others' END AS user_category FROM filter_month ) select user_category, count(distinct user_pseudo_id) from user_classification group by all 4번WITH base AS ( SELECT user_pseudo_id , first_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as first_visit , last_value(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as last_visit , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), day) as event_date , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), week(monday)) as event_week , date_trunc(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul'), month) as event_month , event_name FROM advanced.app_logs ), click_payment AS ( SELECT DISTINCT user_pseudo_id , min(event_week) over(partition by user_pseudo_id) as first_week , event_week as week_date FROM base WHERE event_name = 'click_payment' ), diff_week AS ( SELECT first_week, date_diff(week_date, first_week, week) as week_diff, count(distinct user_pseudo_id) as users_cnt FROM click_payment GROUP BY ALL ) SELECT first_week , week_diff , users_cnt , safe_divide(users_cnt, first_value(users_cnt) over(partition by first_week order by week_diff)) as retention_rate FROM diff_week order by 1, 2 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
노션 링크로 과제 업로드 합니다.과제 풀이 노션 링크 : [인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션 및 추가문제
과제풀이를 노션에 하여서 노션링크 업로드 하여 제출합니다노션링크: https://canyon-king-6a2.notion.site/3-13ce4e3151278004b5abe5e38e5268b8?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 연습 문제 풀이 제출 - 리텐션
과제 풀이를 노션에 작성하여, 노션 링크 업로드 합니다.과제 풀이 노션 링크 : https://sapphire-lotus-719.notion.site/3-1-3-13571425e28d8056afe1c68dd7efaa6e?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
✅ 연습문제 (1) weekly retention--1) weekly retention WITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` ) -- 일자 간 차이 구하기 , cal_week AS ( SELECT DISTINCT user_pseudo_id, -- 최초일(week 단위로) 구하기 DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, -- 원본 날짜 데이터에서 주차 데이터로 구하기 DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ), cal_diff AS ( -- 주차데이터간 차이 구하기 SELECT *, DATE_DIFF(event_week, first_week,WEEK) AS diff_week FROM cal_week ), user_count AS ( -- 주차별 user 수 구하기 SELECT diff_week, COUNT(distinct user_pseudo_id) as user_cnt FROM cal_diff GROUP BY diff_week ) -- 출력 SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt),2) AS retention_rate FROM ( SELECT diff_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_week) AS first_week_user_cnt FROM user_count ) 아직 익숙하지 않아서 중간중간 다시 코드 보고 작성했다. 점점 더 익숙해지길! (2) Retain UserWITH base AS ( -- 날짜 데이터 전처리 -- DATE도 실제 날짜와 일치하지 않을 수 있으니, 다시 확인하기 -- DATETIME(TIMESTAMP_MICROS(timestamp데이터),'대륙/국가') 기억하고 사용하기 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ) , cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date,DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 -- 0. 해당없음 -- 1. new : 첫번째 접속 --2. current : 최근 30일내에 접속 -- 3. resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 -- 4. dormant : 최근 60일 간 접속기록 없음 SELECT *, CASE -- 첫 번째 접속 => 'new' WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' -- 최근 30일 내에 접속한 사용자 => 'current' WHEN diff_date <= 30 THEN 'current' -- 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자 => 'resurrected' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' -- 60일 이상 접속하지 않은 사용자 => 'dormant' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff GROUP BY all ORDER BY user_pseudo_id,diff_date ) SELECT * FROM user_type나름 user를 나눈다고 나눴는데, 잘 못 나눠진 느낌이 든다! 현업에는 어떻게 나눌지 궁금하다(3) 어떤 user가 많은가?WITH base AS ( -- 날짜 데이터 전처리 SELECT DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul')) AS event_date, event_name, user_pseudo_id FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), cal_diff AS ( -- 날짜데이터 간 차이 구하기 SELECT *, DATE_DIFF("2022-11-03", event_date, DAY) AS diff_date FROM base ), user_type AS ( -- user type 구하기 SELECT distinct user_pseudo_id, CASE WHEN diff_date = FIRST_VALUE(diff_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date) THEN 'new' WHEN diff_date <= 30 THEN 'current' WHEN diff_date > 30 AND diff_date <= 37 THEN 'resurrected' WHEN diff_date > 60 THEN 'dormant' ELSE 'none' END AS type FROM cal_diff ), cal_cnt AS ( -- 각 type별 사용자 수 계산 SELECT type, COUNT(user_pseudo_id) AS user_cnt FROM user_type GROUP BY type ) SELECT type, user_cnt, ROUND(user_cnt * 100.0 / (SELECT COUNT(*) FROM user_type), 2) AS ratio FROM cal_cnt ORDER BY ratio DESC new인 사용자가 69.87 %로 전체의 대다수를 차지한다. 그다음이 최근에 7일 내에 접속한 유저가 21.83%, resurrected : 30일 이상 활동이 없고, 최근 7일 이내 다시 접속한 사용자가 2.59%, 최근 60일 간 접속기록 없는 사용자가 1.25%를 차지하였다.2022-11-03일 기준 "2022-08-01" 에서 "2022-11-03" 기간 동안 서비스를 이용하는 대다수의 사람들은 신규 가입자들이 많다는 것을 알 수 있다.또한, current User가 많은 것으로 보아, 최근 7일이내에 서비스를 사용한 사람들의 비율이 두번째로 높다.현재 신규이용자의 수가 대다수를 차지함으로 이 신규 사용자를 current user로 전환하기 위한 전략을 고안해야 할 것으로 보인다.none이 resurrected, dormant보다 많은데, type을 나누는 기준을 수정하면 더 효율적이게 나눌 수 있지 않을까 생각된다.(4) core event를 click_payment로 놓았을 때 Weekly retentionWITH base AS ( SELECT event_name, user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_time FROM `bigquery-432401.avdanced.app_logs_temp` WHERE event_name = "click_payment" ), week_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM week_diff GROUP BY diff ) SELECT *, ROUND(SAFE_DIVIDE(user_cnt, first_week_user_cnt), 2) AS retention_rate FROM ( SELECT diff, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff ASC) AS first_week_user_cnt FROM user_counts )1주차 이후로 사용자의 수가 급감. 2주차 부터는 계속 0.01을 유지. 15주차부터 24주차까지는 0 retention rate를 유지.사용자가 1주차 이후로 대거 이탈함을 볼 수 있음. 15주차부터 0.01에서 0.0으로 감소. 1주차에 이벤트 혹은 서비스 런칭으로 사용자수가 급격히 증가하였으나, 서비스 경험이 사용자에게 매력적으로 다가오지 못하여 바로 그 다음주부터 급격하게 감소하는 것을 볼 수 있음.따라서, core_event에 대한 근본적인 개선이 필요할 것으로 보임. ✅ 배운점 및 느낀점 리텐션 분석에 대한 전반적인 지식을 얻을 수 있었다. 아직, 데이터에서 리텐션 분석을 하는 것이 익숙치 않아 학습을 더 해야 할 것 같다. user type을 나누는 부분에서 더 짜임새 있는? 기준이 뭔지, 현업에서는 어떤 기준으로 사용자를 나누는지 궁금하다!고객을 분석하기 위해서는 어떤 부분을 더 봐야하는 지 아직 크게 감이 없는데, 다른 레퍼런스를 보는 것이 필요할 것 같다는 생각이 들었다
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1번 과제#weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2022-11-03' ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts WHERE diff_of_week = 0 ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 2번 과제New+Current+Resurrected+Dormant User로 나누기일단 1주일만에 리텐션이 6.8%대이니까, 신규유저의 기준을 7일로 두겠습니다.리텐션이 반으로 떨어지는 3% 미만이 8주부터, 1% 미만이 12주부터여서 임의로 구간을 세팅하겠습니다.이에 따라 미접속 8주차부터는 휴면 유저로 보겠습니다.따라서 8주 뒤 다시 들어온 유저는 복귀 유저로 보겠습니다. # 신규 유저, 기존 유저, 이탈 유저, 복귀 유저로 나눈 쿼리 # 첫 event 후 7일 안 지난 유저 # event 후 7일 지났고, 마지막 접속으로부터 8주가 안 지난 유저 # event 후 7일이 지났고, 마지막 접속으로부터 8주가 지난 유저 # 첫 event 후 7일이 지났고, 마지막 접속으로부터 8주가 지났다가, 다시 event가 발생한 유저 WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' ) , dates AS ( SELECT MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ) SELECT event_date, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) < 8 THEN user_pseudo_id ELSE NULL END) AS new_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) < 56 THEN user_pseudo_id ELSE NULL END) AS current_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 AND DATE_DIFF(event_date, last_date, DAY) >= 56 THEN user_pseudo_id ELSE NULL END) AS resurrected_user, COUNT(DISTINCT CASE WHEN DATE_DIFF(event_date, first_date, DAY) >= 8 END) AS dormant_user FROM dates GROUP BY event_date 2번 과제는 이렇게까진 짜보았으나 실패했습니다..짜보다보니 last_date가 절대로 event_date보다 +가 나올 수가 없더군요...그래서 그냥 일반 날짜 컬럼을 붙일 수 있나...? 어떻게 해야하지? 하고이미 짜둔 쿼리에 매몰돼서 구하질 못했네요 ㅎㅎ혹시 이렇게 짜는 방향에 대해서 조언해주실 점이 있으시면 부탁드리겠습니다.. 3번 과제 아래와 같이 퍼널 나눴다. 획득screen_view, click_login 활성화click_banner, click_food_category, click_restaurant, click_food,view_recommend_extra_food, click_recommend_extra_foodclick_recommend_food, click_restaurant_nearbyclick_search, request_search 수익화 관련click_cartclick_payment 하지만 각 행동한 유저의 수를 구해보니 아래와 같았다.recommend 관련 행동을 한 유저가 오히려 더 적었다. 그래서 각 이벤트별로 유저 수를 구해보았고#event가 click_cart일 때 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_cart') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count 4번 과제처럼 core event를 각각 설정해보았다.4번 과제의 결과인 click_payment 했던 유저의 리텐션이 역시 제일 좋았었다. 반면 cart를 클릭한 유저의 리텐션은 비교적 조오금 낮았고 반대로 click_search, click_recommend_food, click_restaurant_nearby를 한 유저의 리텐션이 구매자 리텐션과 유사했다.그래서 기본 행동들에서, 고객이 더 관심 기울여하는 행동인 검색, 추천 서비스 클릭하기 등의 행동을 유도하면 어떨까.. 싶었다! 사실 10% 이상으로 다 비슷하긴 하지만 약간의 차이라도 보이는 것들을 좀 추려보았다. 4번 과제# click_payment event를 경험한 유저들의 weekly retention WITH base AS( SELECT DISTINCT user_id, event_name, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-31' AND user_pseudo_id IN (SELECT user_pseudo_id FROM advanced.app_logs WHERE event_name = 'click_payment') ) , dates AS( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) , first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, week) AS diff_of_week FROM dates ) , user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) , first_week_user_count AS ( SELECT diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_week_user_cnt FROM user_counts ) SELECT *, SAFE_DIVIDE(user_cnt, first_week_user_cnt) AS retain_rate FROM first_week_user_count
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 연습문제
1)WeeklyRetention을구하는쿼리를바닥부터스스로작성해보세요WITH event_log_base AS ( SELECT DISTINCT DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), MONTH) AS event_month, user_id, user_pseudo_id, event_name, platform, event_params FROM advanced.app_logs ), user_first_visit AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week FROM event_log_base ), weekly_retention_base AS ( SELECT first_visit_week, DATE_DIFF(event_week, first_visit_week, WEEK) AS week_diff, COUNT(DISTINCT user_pseudo_id) AS active_users FROM user_first_visit GROUP BY first_visit_week, week_diff ) SELECT first_visit_week, week_diff, active_users, SAFE_DIVIDE(active_users, FIRST_VALUE(active_users) OVER (PARTITION BY first_visit_week ORDER BY week_diff ASC)) AS retention_rate FROM weekly_retention_base ORDER BY first_visit_week, week_diff; 2)RetainUser를New+Current+Resurrected+DormantUser로나누는쿼리를 작성해보세요.신규 사용자: 첫 방문 후 30일 이내에 활동한 사용자. 첫 방문 날짜와 마지막 방문 날짜 간의 차이가 30일 이하일 경우 현재 사용자: 첫 방문 후 30일이 지난 사용자 중, 최근 30일 이내에 접속한 사용자. 첫 방문과 마지막 방문 간 차이가 30일을 넘고, 가장 최근 활동으로부터 30일 이내에 접속한 경우 휴면 사용자: 마지막 방문 이후 30일 이상 비활동 상태에 있는 사용자. 마지막 방문이 가장 최근 이벤트로부터 30일 이상 경과한 경우 복귀 사용자: 30일 이상 비활동 상태였다가 다시 활동을 시작한 사용자. 이전 방문 이후 30일 동안 비활동이 이어지다 다시 방문한 경우 WITH event_data AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), -- 사용자별 첫 방문 및 마지막 방문 날짜 계산 user_first_last_visit AS ( SELECT user_pseudo_id, MIN(event_date) AS first_visit_date, MAX(event_date) AS last_visit_date FROM event_data GROUP BY user_pseudo_id ), -- 전체 이벤트 데이터에서 가장 최근의 이벤트 날짜를 계산 latest_event_date AS ( SELECT MAX(event_date) AS current_date FROM event_data ), -- 사용자의 이벤트 로그를 통해 비활동 기간을 기록하여 휴면 여부 판단 user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 30 THEN 1 ELSE 0 END AS dormant_history FROM event_data ), -- 사용자 유형 분류: 신규, 현재, 휴면, 복귀 사용자 user_category AS ( SELECT u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date, MAX(ua.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_visit_date, DAY) > 30 THEN 'inactive_user' WHEN DATE_DIFF(u.last_visit_date, u.first_visit_date, DAY) <= 30 THEN 'first_month_user' WHEN MAX(ua.dormant_history) = 1 THEN 'returning_user' ELSE 'active_user' END AS user_type FROM user_first_last_visit AS u CROSS JOIN latest_event_date AS led LEFT JOIN user_activity AS ua ON u.user_pseudo_id = ua.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_visit_date, u.last_visit_date, led.current_date ), -- 사용자 유형별 첫 방문 주차와 현재 주차 간 차이 계산 user_weekly_retention AS ( SELECT uc.user_type, ed.user_pseudo_id, ed.event_date, DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK(MONDAY)), DATE_TRUNC(uc.first_visit_date, WEEK(MONDAY)), WEEK) AS week_difference FROM event_data AS ed JOIN user_category AS uc ON ed.user_pseudo_id = uc.user_pseudo_id ), -- 사용자 유형과 주차별 활성 사용자 수 집계 user_count_by_type_and_week AS ( SELECT user_type, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_weekly_retention GROUP BY user_type, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 현재 주차별 유지율 계산 retention_calculation AS ( SELECT user_type, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY user_type ORDER BY week_difference) AS initial_user_count FROM user_count_by_type_and_week ) SELECT user_type, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_calculation ORDER BY user_type, week_difference; 3)주어진데이터에서어떤사람들이리텐션이그나마높을까요?찾아보세요2번을 참고해서 고객을 4개의 타입으로 분류하였을 때first_month_user: 첫 주 이후 리텐션이 높지만 이후 감소하는 모습을 보인다. active_user: first_month_user와 비슷하게 첫 주 이후 빠르게 감소한다. returning_user: 2주부터 0.08로 떨어지지만 이후 완만하게 리텐션이 감소한다.오랜기간 10주차 이후에도 일부 사용자가 유지되고 있는 모습을 보인다. inactive_user: 리텐션이 다른 유저에 비해 높다고는 할 수 없지만 20주차까지 꾸준하게 소수의사용자가 유지되고 있는 모습을 보인다. --> 따라서 returning_use이 가장 완만하게 리텐션이 감소, 가장 오랫동안 리텐션이 지속했기때문에 가장 높은 리텐션을 유지했다고 판단하였습니다. 4)CoreEvent를“click_payment”라고설정하고WeeklyRetention을구해주세요WITH event_data AS ( SELECT DISTINCT user_pseudo_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, event_name FROM advanced.app_logs WHERE event_name = 'click_payment' -- Core event 필터링 ), -- 사용자별 첫 방문 주차와 현재 이벤트 주차 간의 차이 계산 user_first_week_data AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week AS current_week, DATE_DIFF(event_week, MIN(event_week) OVER (PARTITION BY user_pseudo_id), WEEK) AS week_difference FROM event_data ), -- 주차별 사용자 수 집계 user_count_by_week AS ( SELECT first_visit_week, week_difference, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_first_week_data GROUP BY first_visit_week, week_difference ORDER BY first_visit_week, week_difference ), -- 유지율 계산을 위한 초기 사용자 수와 주차별 유지율 계산 retention_base AS ( SELECT first_visit_week, week_difference, user_count, FIRST_VALUE(user_count) OVER (PARTITION BY first_visit_week ORDER BY week_difference) AS initial_user_count FROM user_count_by_week ) -- 최종 리텐션 비율 계산 및 결과 출력 SELECT first_visit_week, week_difference, ROUND(SAFE_DIVIDE(user_count, initial_user_count), 2) AS retention_rate FROM retention_base ORDER BY first_visit_week, week_difference;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 분석
1) 리텐션 연습 문제https://torch-cart-08d.notion.site/3-139701964d8c805d9e49f3c68ab9b0b5 2) 3주차 추가 문제1번 문제(15분)주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요SELECT *, RANK() OVER(PARTITION BY category ORDER BY rate DESC) AS category_rank FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, list_price, ((list_price - actual_price) / list_price) * 100 AS rate FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id ) QUALIFY category_rank = 1 ORDER BY category, transaction_date 2번 문제(10분)2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요SELECT category, SUM(actual_price) AS total_price FROM ( SELECT category, FORMAT_DATETIME('%Y-%m-%d 23:59:39', PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) AS transaction_date, actual_price, FROM transaction_data AS td LEFT JOIN item_info ii ON td.item_id = ii.item_id WHERE EXTRACT(MONTH FROM PARSE_DATETIME('%m/%d/%Y %H:%M:%S', transaction_date)) = 1 ) GROUP BY ALL ORDER BY total_price DESC LIMIT 1 3번 문제(10분)유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요over_200 AS ( SELECT td.user_id, SUM(actual_price) AS user_price FROM transaction_data AS td LEFT JOIN user_info AS ui ON td.user_id = ui.user_id GROUP BY ALL HAVING user_price >= 2000000 ) SELECT category, COUNT(*) AS cnt FROM over_200 AS O2 CROSS JOIN transaction_data AS td LEFT JOIN item_info AS ii ON td.item_id = ii.item_id GROUP BY ALL ORDER BY cnt DESC LIMIT 1약 20분/5분/8분 소요1번 문제에서 날짜 데이터를 변환하는 방법이 떠오르지 않아 시간이 많이 소요됐습니다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 과제
# 1) Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요 WITH base AS ( SELECT DISTINCT user_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), first_diff_of_week AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_diff_of_week GROUP BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, first_cnt), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS first_cnt FROM user_counts ) ORDER BY diff_of_week# 2) Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요. # 신규유저(New) : 제품을 처음 사용하는 유저 # 기존유저(Current) : 제품을 지속적으로 사용하는 유저 # 복귀유저(Resurrected) : 과거에 사용 -> 비활성 -> 다시 제품을 사용한 유저 # 휴면유저(Dormant) : 일정 기간 제품을 사용하지 않은 비활성화 사용자 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ) SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type# 3) 주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' ), user_activity AS ( SELECT user_pseudo_id, event_week, LAG(event_week, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week, MIN(event_week) OVER(PARTITION BY user_pseudo_id) AS first_week FROM base ), user_classification AS ( SELECT user_pseudo_id, event_week, CASE WHEN event_week = first_week THEN 'New' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current' WHEN prev_week IS NOT NULL AND DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected' ELSE 'Dormant' END AS user_type FROM user_activity ), user_counts AS ( SELECT event_week, user_type, COUNT(DISTINCT user_pseudo_id) AS user_count FROM user_classification GROUP BY ALL ORDER BY event_week, user_type ) SELECT event_week, DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week, user_type, user_count, cohort_users, ROUND(SAFE_DIVIDE(user_count, cohort_users), 3) AS retention_rate FROM ( SELECT DATE(event_week) AS event_week, DATE(MIN(event_week) OVER()) AS first_week, user_type, user_count, MAX(CASE WHEN user_type = 'New' THEN user_count END) OVER(PARTITION BY DATE(event_week)) AS cohort_users FROM user_counts ORDER BY event_week, CASE user_type WHEN 'New' THEN 1 WHEN 'Current' THEN 2 WHEN 'Resurrected' THEN 3 WHEN 'Dormant' THEN 4 END )2022-08-01 ~ 2023-08-03 기간으로 설정하여, 개인적인 의견으로 Retain User를 쪼개 보았습니다.배달어플은 유저들의 사용 간격이 짧다고 생각하기 때문에, 주의 주기로 리텐션으로 설정했습니다. 저는바로 전 주에 사용했고, 그 다음주에 사용했다면 기존유저(Current)로 설정했습니다.2주 이상 텀을 두고 다시 사용했다면 복귀유저(Resurrected)로 설정했습니다.처음 들어가고 사용하지 않았다면 휴면유저(Dormant)로 설정했습니다.주차마다 '신규/기존/복귀/휴면' 유저를 새로 계산하는 쿼리입니다. 데이터를 살펴보면,1) 신규유저(New) : 제품을 처음 사용하는 유저출시(2022-08-01을 출시기간으로 생각하면)로부터 현재(2023-01-16 => 0803으로 설정했지만 0116까지가 끝입니다.)까지의 신규유저 수의 변화를 살펴보겠습니다.런칭부터 10주까지의 신규유저의 수는 주춤한 경우도 있지만 평균 4~500명 정도 증가했습니다.하지만, 11주에 4000명 대에서 3100명으로 900명이 급감한 이후로 계속 하락세를 타고 24주차 현재 신규 가입자의 수는 500명 남짓하고 있습니다.11주차부터의 마케팅 효과는 빛을 보지 못하고 있는 것 같습니다. 코호트 리텐션 시각화를 보자면,2) 기존유저(Current) : 제품을 지속적으로 사용하는 유저(바로 전 주와 이번 주)출시로부터 현재까지 줄곧 주차마다의 신규유저 대비 기존유저의 비율이 증가하고 있습니다.특히, 24주차에 가입한 신규유저 대비 기존유저의 비율은 76%나 차지하고 있습니다.시간이 흐름에 따라 점차 어플이 안정화되면서 신규유저들이 만족을 느끼는 비율이 늘어가고 있다고 보입니다. 3) 복귀유저(Resurrected) : 과거에 사용 -> 비활성화 -> 다시 제품을 사용한 유저(2주 후 복귀)복귀유저는 12주차를 분기로 13주차부터는 신규가입자보다 복귀유저의 수가 많아지기 시작했습니다.24주차에는 신규유저 대비 복귀유저는 6배가 많습니다.복귀유저가 기존유저로 안정되는 현황을 추출할 수 있는 쿼리는 조금 더 연구해 보겠습니다. 4) 휴면유저(Dormant) : 일정기간 제품을 사용하지 않은 비활성화 사용자신규유저 대비 휴면유저는 꾸준히 늘어가고 있습니다(휴면유저가 누적됨).24주차 신규가입자의 5.7배가 휴면유저 입니다.# 4) Core Event를 'click_payment'라고 설정하고 Weekly Retention을 구해주세요 WITH base AS ( SELECT DISTINCT user_id, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN '2022-08-01' AND '2023-08-03' AND event_name = 'click_payment' ), first_week_and_diff AS ( SELECT *, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_counts AS ( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY diff_of_week ORDER BY diff_of_week ) SELECT diff_of_week, user_cnt, ROUND(SAFE_DIVIDE(user_cnt, cohort_users), 3) AS retention_rate FROM ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY diff_of_week) AS cohort_users FROM user_counts ) ORDER BY diff_of_week
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[1-9. 퍼널 SQL 쿼리 작성하기] 집계데이터 pivot쿼리
안녕하세요 카일스쿨님:)강의 마지막에 선택과제(?)로 주신 집계데이터 pivot쿼리 작성해봤습니다.쿼리를 작성하면서, 지금은 피봇할 컬럼이 많지 않아서 하나하나 피봇해주는 것으로 진행했는데 이게 최선일지가 의문이 들어서요,피봇하는 컬럼 조건이 다를 뿐 과정은 동일해서 공통으로 적용할 수 있는 방법이 있을 것 같은데 제가 잘못 이해하고 있는 것인지, 아니면 피봇을 항상 이런식으로 진행해야 하는 것인지도 같이 질문드립니다.감사합니다:) with base as ( select event_date ,event_timestamp ,event_name ,user_id ,user_pseudo_id ,max(if(param.key = 'firebase_screen',param.value.string_value,null)) as `firebase_screen` ,max(if(param.key = 'food_id',param.value.int_value,null)) as `food_id` ,max(if(param.key = 'session_id',param.value.string_value,null)) as `session_id` from advanced.app_logs , unnest(event_params) as param where 1=1 and event_name in ('screen_view','click_payment') and event_date >= '2022-08-01' and event_date <= '2022-08-18' group by all order by 1 ) , before_pivot as ( select event_date ,concat(event_name,'-',firebase_screen) as `event_name_with_screen` ,case when event_name = 'screen_view' and firebase_screen = 'welcome' then 1 when event_name = 'screen_view' and firebase_screen = 'home' then 2 when event_name = 'screen_view' and firebase_screen = 'food_category' then 3 when event_name = 'screen_view' and firebase_screen = 'restaurant' then 4 when event_name = 'screen_view' and firebase_screen = 'cart' then 5 when event_name = 'click_payment' and firebase_screen = 'cart' then 6 end as `step_number` ,count(*) as cnt from base where 1=1 group by 1,2,3 having step_number is not null ) select event_date ,max(if (event_name_with_screen = 'screen_view-welcome',cnt,null)) as `screen_view-welcome` ,max(if (event_name_with_screen = 'screen_view-home',cnt,null)) as `screen_view-home` ,max(if (event_name_with_screen = 'screen_view-food_category',cnt,null)) as `screen_view-food_category` ,max(if (event_name_with_screen = 'screen_view-restaurant',cnt,null)) as `screen_view-restaurant` ,max(if (event_name_with_screen = 'screen_view-cart',cnt,null)) as `screen_view-cart` ,max(if (event_name_with_screen = 'click_payment-cart',cnt,null)) as `click_payment-cart` from before_pivot group by all order by 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 리텐션 과제
노션 링크로 업로드 합니다!https://qkffpsxkdlwm.notion.site/3-13a35e3a8bef8019b7a6d9c6abc72193?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 분석
노션에서 작성했습니다! https://salt-baron-5c5.notion.site/3-13aa734e64b880a1a9a1e84a92fa3593
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 리텐션 과제
Notion 링크로 대체합니다.https://colney4844.notion.site/3-13a59b98d5db80438b7fcef0146771f6?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 연습문제
1. Weekly RetentionWITH base AS ( SELECT DISTINCT DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week, DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), MONTH) AS event_month, user_id, user_pseudo_id, event_name, platform, event_params FROM advanced.app_logs ), user_visit_base AS ( SELECT user_pseudo_id, MIN(event_week) OVER (PARTITION BY user_pseudo_id) AS first_visit_week, event_week AS visit_week FROM base ), user_visit_weekdiff AS ( SELECT first_visit_week, DATE_DIFF(visit_week, first_visit_week, week) AS week_diff, COUNT(DISTINCT user_pseudo_id) AS visit_users FROM user_visit_base GROUP BY ALL ) SELECT first_visit_week, week_diff, visit_users, SAFE_DIVIDE(visit_users, FIRST_VALUE(visit_users) OVER (PARTITION BY first_visit_week ORDER BY week_diff ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) AS retention FROM user_visit_weekdiff 2. Retain User → New User, Current User, Resurrected User, Dormant User-- 한 달(30일)을 기준으로 구분하였다. -- 신규 유저: 최근 한 달 이내 새로 방문 -- 기존 유저: 최근 한 달 이내 재방문, 그 이전 한달에도 방문 -- 복귀 유저: 최근 한 달 이내 재방문, 그 이전 한달에는 방문 X -- 휴먼 유저: 최근 한 달 이내 재방문 X WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 30 THEN 1 ELSE 0 END AS dormant_history FROM base ), user_types AS ( SELECT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, MAX(us.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 30 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 30 THEN 'new_user' WHEN MAX(us.dormant_history) = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_date, u.last_date, led.current_date ), first_week_and_diff AS ( SELECT ut.user_type, fw.user_pseudo_id, fw.event_date, DATE_DIFF(DATE_TRUNC(fw.event_date, WEEK(MONDAY)), DATE_TRUNC(ut.first_date, WEEK(MONDAY)), WEEK) AS diff_of_week FROM base AS fw JOIN user_types AS ut ON fw.user_pseudo_id = ut.user_pseudo_id ), user_cnt_by_type_and_week AS ( SELECT user_type, diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY user_type, diff_of_week ), retention_base AS ( SELECT user_type, diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER (PARTITION BY user_type ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_by_type_and_week ) SELECT user_type, diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention_rate FROM retention_base ORDER BY user_type, diff_of_week 3. retention이 높은 그룹?current user: 최고 0.41 (4~5주차)new user: 최고 0.11 (1,3주차)resurrected user: 최고 0.14 (9~12주차)dormant user: 최고 0.09 (3주차)리텐션이 그나마 높은 그룹: current usercurrent user, resurrected user 리텐션 분포의 경우 상승세 → 하강세 경향성을 보여 피크를 찍을 때 즈음 어떤 이벤트가 있었다고도 추측할 수 있다. 4. click_payment 이벤트를 중점으로 본 Weekly Retention?WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE 1=1 AND event_name = "click_payment" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM ( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY day_of_week ) ) SELECT day_of_week, ROUND(SAFE_DIVIDE(user_cnt, total_user), 2) AS retention FROM retention_base ORDER BY day_of_week
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 3주차 과제] 리텐션 과제
Notion으로 작성하여 링크를 첨부합니다.https://polite-vinyl-a61.notion.site/3-13a4994b207d804f969dc98e5bce9794
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디] 3주차 과제 제품 현황 분석 : 리텐션 분석
과제. 직접 retention 구해보기(주별 weekly)WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_week_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week, event_date, DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY ALL ORDER BY diff_of_week ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention FROM retain_base과제. 월별 리텐션 쿼리 작성해보기WITH base AS ( SELECT DISTINCT user_id, user_pseudo_id, event_name, DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2022-11-03" ), first_month_and_diff AS( SELECT *, -- DATE_DIFF(event_date, first_date, DAY) AS diff_of_day, -- DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week DATE_DIFF(event_month, first_month, MONTH) AS diff_of_month FROM ( SELECT DISTINCT user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), MONTH) AS first_month, event_date, DATE_TRUNC(event_date, MONTH) AS event_month FROM base ) ), user_cnt_and_diff AS( SELECT diff_of_month, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_month_and_diff GROUP BY ALL ORDER BY diff_of_month ), retain_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER (ORDER BY diff_of_month) AS first_user_cnt FROM user_cnt_and_diff ) SELECT diff_of_month, user_cnt SAFE_DIVIDE(user_cnt, first_user_cnt) AS retention FROM retain_base 리텐션 연습문제Weekly Retention을 구하는 쿼리를 바닥부터 스스로 작성해보세요. # 답 -- Weekly Retention을 구하자! (Retention이란, 시간이 흐르면서 다시 제품을 사용하는지 측정하는 지표) -- 첫 방문 후 매주 재방문 비율(weekly retention)을 계산하자! WITH base AS ( SELECT DISTINCT ###### DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, ###### user_pseudo_id FROM advanced.app_logs WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY ALL ) -- ORDER BY day_of_week ) SELECT day_of_week, SAFE_DIVIDE(user_cnt, total_user) AS retention FROM retention_base Retain User를 New + Current + Resurrected + Dormant User로 나누는 쿼리를 작성해보세요# 답 -- 신규 유저: 첫 활동일 기준 7일 이내 -- 기존 유저: 첫 활동일 기준 7일 이후, 마지막 활동으로부터 90일 이내 -- 휴면 유저: 마지막 활동일 이후 90일 이상 경과 -- 복귀 유저: 90일 이상 비활성화 이후 다시 돌아온 유저 WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs --WHERE event_date BETWEEN "2022-08-01" AND "2023-08-31" ),user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, #OVER(PARTITION BY user_pseudo_id)를 넣으면 윈도우함수사용과 GROUPBY졸 사용 비교했을 때처럼 중복행이 생긴다 MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( -- 원래라면 CURRENT_DATE()로 해야하지만, 앱로그 데이터의 마지막 날로 설정. SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( -- 90일 이상 제품을 사용하지 않았다면 1, 아니면 0 SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ) -- user_types AS ( SELECT DISTINCT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, us.dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN dormant_history = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id주어진 데이터에서 어떤 사람들이 리텐션이 그나마 높을까요? 찾아보세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, user_pseudo_id FROM advanced.app_logs ), user_first_last AS ( SELECT user_pseudo_id, MIN(event_date) AS first_date, MAX(event_date) AS last_date FROM base GROUP BY user_pseudo_id ), last_event_date AS ( SELECT MAX(event_date) AS current_date FROM base ), user_activity AS ( SELECT user_pseudo_id, event_date, LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date) AS next_event_date, CASE WHEN DATE_DIFF(LEAD(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), event_date, DAY) > 90 THEN 1 ELSE 0 END AS dormant_history FROM base ), user_types AS ( SELECT u.user_pseudo_id, u.first_date, u.last_date, led.current_date, MAX(us.dormant_history) AS dormant_history, CASE WHEN DATE_DIFF(led.current_date, u.last_date, DAY) > 90 THEN 'dormant_user' WHEN DATE_DIFF(u.last_date, u.first_date, DAY) <= 7 THEN 'new_user' WHEN MAX(us.dormant_history) = 1 THEN 'resurrected_user' ELSE 'current_user' END AS user_type FROM user_first_last AS u CROSS JOIN last_event_date AS led LEFT JOIN user_activity AS us ON u.user_pseudo_id = us.user_pseudo_id GROUP BY u.user_pseudo_id, u.first_date, u.last_date, led.current_date ), first_week_and_diff AS ( SELECT ut.user_type, fw.user_pseudo_id, fw.event_date, DATE_DIFF(DATE_TRUNC(fw.event_date, WEEK(MONDAY)), DATE_TRUNC(ut.first_date, WEEK(MONDAY)), WEEK) AS diff_of_week FROM base AS fw JOIN user_types AS ut ON fw.user_pseudo_id = ut.user_pseudo_id ), user_cnt_by_type_and_week AS ( SELECT user_type, diff_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM first_week_and_diff GROUP BY user_type, diff_of_week ), retention_base AS ( SELECT user_type, diff_of_week, user_cnt, FIRST_VALUE(user_cnt) OVER (PARTITION BY user_type ORDER BY diff_of_week) AS first_user_cnt FROM user_cnt_by_type_and_week ) SELECT user_type, diff_of_week, ROUND(SAFE_DIVIDE(user_cnt, first_user_cnt), 2) AS retention_rate FROM retention_base ORDER BY user_type, diff_of_week current user의 1주차 리텐션은 0.11부터 시작해서 5~6주차에는 0.15까지 오릅니다.new user의 경우 1주차까지 밖에 리텐션을 구하지 못하고, 0.06입니다.resurrected user의 경우 1주차 리텐션은 0.04부터 시작해서 7주차에는 0으로 빠르게 감소합니다.dormant user의 경우 휴면고객의 리텐션을 구하는 것이 의미가 있는지 잘 모르겠지만, 1주차 리텐션은 0.05부터 시작해서 11주차까지 0으로 감소합니다.리텐션이 그나마 높은 유저는 current user이고, 리텐션이 주마다 서서히 증가하거나 감소하는데 2주차에 오른 것으로 보아 기능 배포나 이벤트가 있지 않았을까 예상해봅니다.Core Event를 “click_payment”라고 설정하고 Weekly Retention을 구해주세요WITH base AS ( SELECT DISTINCT DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_name = "click_payment" ), event_week_and_first_week AS ( SELECT DISTINCT DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week, user_pseudo_id, DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week FROM base ), retention_base AS ( SELECT *, FIRST_VALUE(user_cnt) OVER(ORDER BY day_of_week) AS total_user FROM ( SELECT DATE_DIFF(event_week, first_week, WEEK) AS day_of_week, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM event_week_and_first_week GROUP BY day_of_week ) ) SELECT day_of_week, ROUND(SAFE_DIVIDE(user_cnt, total_user), 2) AS retention FROM retention_base ORDER BY day_of_weekapp_logs 테이블에서 event_name이 click_payment인 것만 필터 걸어서 리텐션 구하기를 진행했는데 이렇게 푸는 것이 맞을까요..?
-
해결됨초보자를 위한 BigQuery(SQL) 입문
2-5 집계 23:42 GROUP BY 연습문제 질문 드려요
포켓몬의 수를 타입 별 집계, 수가 10 이상인 타입만 남기는 쿼리 작성 연습문제에 대하여 질문 드려요. 작성 전에 고민했던 부분은포켓몬 타입은 type 1, 2로 나뉘어 있다.type 2는 null 인 경우가 있다. => where 로 제외할 수 있을까?type 1, 2 을 포함해서 세야 할 것 같다. (총 포켓몬 수보다 합산 값이 클 수 있다)였거든요.Group by 에서 type2를 빼면 통합되지 않았다는 오류가 발생해서 어쩔 수 없이 넣긴 했는데, 제가 원한 형태로 구현되지는 않았습니다. 어떻게 해야 null 을 제외한 type2 의 값까지 전부 셀 수 있을까요?WHERE TYPE2 IS NOT NULL 을 넣게 되면 1개 타입만 가진 개체의 경우 아예 집계를 하지 않는 것 같아서 그 부분도 뺄 수 밖에 없었어요.최종적으로는 1열에 타입, 2열에 해당하는 개체 수가 나오게 구현하고 싶습니다.(바위, 불인 포켓몬의 경우 바위에 1개, 불에 1개 계산되도록이요.바위, 불 포켓몬 + 바위 포켓몬 + 불 포켓몬 = 바위 2개체 불 2개체)SELECT type1, type2, COUNT(id) as cnt FROM `basic.pokemon` GROUP BY type1, type2
-
미해결초보자를 위한 BigQuery(SQL) 입문
battel 테이블 오류가 뜨면서 테이블 저장이 안되요
Battel 데이터 문제로 보이는데, 설정을 잘못한걸까요?
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[3]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
추가 문제 1번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 1번 문제(15분) -- - 주차별 각 카테고리별 평균 할인율이 가장 높았던 기간과 할인율을 구하는 쿼리를 작성해주세요 -- - 단, 날짜 데이터를 YYYY-MM-DD 23:59:39 이런 형태로 변경해주세요 , real_base as ( select * from ( select *, avg(diff_price) as avg_price, from( SELECT transaction_date, category, list_price, (list_price - actual_price) as diff_price FROM item_info as item left join transaction_data trans_date on item.item_id = trans_date.item_id ) as base group by all ) as base2 ) ## 주차별 평균 할인율 가장 높은 기간 --> 2024-01-08 ~ 2024-01-18 -- select -- FORMAT_TIMESTAMP('%Y-%m-%d 23:59:39', PARSE_TIMESTAMP('%m/%d/%Y %H:%M:%S', transaction_date)) AS formatted_date, -- avg_price -- from real_base -- group by -- 1,2 -- order by -- 2 desc ## 카테고리별 평균 할인율이 가장 높았던 기간 select *, round((max_price / list_price) * 100, 2) as `할인율` from ( select distinct FORMAT_TIMESTAMP('%Y-%m-%d 23:59:39', PARSE_TIMESTAMP('%m/%d/%Y %H:%M:%S', transaction_date)) AS formatted_date, category, max(avg_price) as max_price, list_price from real_base group by all order by 3 desc ) as avg_price_table order by 5 desc ## 할인율이 가장 높았던 주차 -- 01/25/2024 -- 02/08/2024 -- 02/11/2024 -- 02/14/2024 -- 02/28/2024 -- select -- transaction_date, -- round((diff_price / list_price) * 100, 2) as `할인율` -- -- transaction_date, -- -- safe_divide(diff_price, list_price) -- from real_base -- group by -- 1,2 -- order by -- 2 desc ## 할인율이 높았던 카테고리 -- Fashion -- select -- category, -- round((diff_price / list_price) * 100, 2) as `할인율` -- -- transaction_date, -- -- safe_divide(diff_price, list_price) -- from real_base -- group by all -- order by -- 2 desc 2번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 2번 문제(10분) 1610 -- - 2024년 1월에 가장 많은 매출을 기록한 카테고리를 구하는 쿼리를 작성해주세요 -- 날짜 테이블 -- 아이템 테이블 활용 -- 기간에 필터링 걸기 -- 카테고리 -- 매출 -- 카테고리별 구매가 -- 구매가의 합 = 매출 ## 카테고리별 구매가 , category_price as ( select item_date.transaction_date, item.category, item_date.actual_price from item_info as item left join transaction_data as item_date on item.item_id = item_date.item_id group by all ) ## 매출 | 카테고리별 구매가의 합 , `매출_테이블` as ( select *, sum(actual_price) over(partition by category) as `매출` from category_price ) ## 카테고리별 매출에 1월 필터링 select * from `매출_테이블` where transaction_date between '01/08/2024' and '01/28/2024' 3번WITH transaction_data AS ( SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL SELECT 111, 2001, 90300, '02/28/2024 12:00:00' ), user_info AS ( SELECT 111 AS user_id, 'Seoul' AS city, 28 AS age, 'Female' AS gender UNION ALL SELECT 121, 'Busan', 35, 'Male' UNION ALL SELECT 145, 'Incheon', 42, 'Female' UNION ALL SELECT 156, 'Seoul', 31, 'Male' UNION ALL SELECT 178, 'Daegu', 25, 'Female' UNION ALL SELECT 189, 'Seoul', 39, 'Male' UNION ALL SELECT 190, 'Busan', 29, 'Female' ), item_info AS ( SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price ) # 3번 문제(10분) 1646 -- - 유저별 총 구매 금액이 200만원 이상인 유저들이 가장 많이 구매한 카테고리를 찾는 쿼리를 작성해주세요 -- 유저별 총 구매 금액 -- 유저 | 총 구매금액 -- 유저별 | 구매금액 -- 구매금액 합산 = 총 구매금액 , `유저별_총구매금액` as ( select user_id, `총 구매금액` from ( select user_id, sum(actual_price) over(partition by user_id) as `총 구매금액` from ( SELECT user_id, actual_price, FROM transaction_data group by all ) as base1 ) base2 group by 1,2 ) -- 200만원 이상 필터링 , `200만원 이상 필터링` as ( select * from `유저별_총구매금액` where `총 구매금액` >= 2000000 ) ## 유저들이 가장 많이 구매한 카테고리 -- 유저 | 카테고리 | 구매 -- 구매가가 있는 카테고리 수를 세면 어떨까? -- 카테고리별 카운트? select category, `카테고리별카운트` from ( select *, sum(`카테고리수`) over(partition by category) as `카테고리별카운트` from ( select user_id, category, count(category) over(partition by user_id) as `카테고리수` from ( select td2.user_id, item2.category, td2.actual_price from transaction_data as td2 left join item_info as item2 on td2.item_id = item2.item_id ) b_base ) as b_base2 ) as b_base3 group by 1,2 order by 2 desc
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[2]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
유저 상태 new user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # New User: 해당 주차에 처음 방문한 유저 -- '유저의 첫 방문 주'와 '유저별 이벤트 발생주'가 같은지 확인 -- 같으면, 1 -- 아니면 0 ## '유저의 첫 방문 주' with user_first_visit as ( select user_pseudo_id, date_trunc(min(event_date), week(monday)) as first_visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1 ) ## '유저의 모든 방문 주' , user_event_week as ( select distinct user_pseudo_id, date_trunc(event_date, week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' ) ## 첫 방문 주와 모든 방문 주 중 같은 데이터가 있다면 '방문(=1)' , new_user_table as ( select ufv.user_pseudo_id, ufv.first_visit_date, uew.visit_date, if(ufv.first_visit_date = uew.visit_date, 1, 0) as new_user from user_first_visit as ufv inner join user_event_week as uew on ufv.user_pseudo_id = uew.user_pseudo_id group by 1,2,3 ) select first_visit_date as week, sum(new_user) as new_user_count from new_user_table group by 1 order by 1 current user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ -- Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 -- 유저별 모든 방문 주 구하기 -- 모든 방문주 - 전주 방문 데이터 = 1 --> current user ## 유저별 모든 방문 주 구하기 with user_visits as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 전주 방문 데이터 , prev_user_table as ( select user_pseudo_id, visit_date, lag(visit_date) over(partition by user_pseudo_id order by visit_date) as previous_visit_user from user_visits ) ## 모든 방문주 - 전주 방문 데이터 = 1 --> current user select visit_date as week, count(distinct user_pseudo_id) as current_user_cnt from( select distinct user_pseudo_id, visit_date, date_diff(visit_date, previous_visit_user, week) = 1 as current_user from prev_user_table ) as base where current_user is not null group by 1 order by 1 resurrected user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) ## 유저별 모든 방문 주 구하기 with user_visits as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 전주 방문 데이터 , prev_user_table as ( select user_pseudo_id, visit_date, lag(visit_date) over(partition by user_pseudo_id order by visit_date) as previous_visit_user from user_visits ) ## (첫 방문이 아님) + (모든 방문 주 데이터 - 이전 방문주 > 1) = Resurrected User -- 첫 방문이 아닌 유저 : 'previous_visit_user is not null' -- 첫 방문이었으면 previous_visit_user가 null이었을테니깐. select visit_date as week, count(user_status) as Resurrected_User from ( select user_pseudo_id, visit_date, previous_visit_user, case when previous_visit_user is not null and date_diff(visit_date, previous_visit_user, week) > 1 then 'Resurrected_User' else null end as user_status from prev_user_table ) as Resurrected_User where user_status is not null group by 1 order by 1 dormant user# User Status 정의 -- "우리 서비스를 이용하는 유저들의 주차별 상태가 어떻게 되나?" /* 1. New User: 해당 주차에 처음 방문한 유저 2. Current User: 지난주에도 방문했고 이번 주에도 방문한 유저 3. Resurrected User: 지난주에는 없었는데 이번주에 다시 방문한 유저 (첫 방문은 아님) 4. Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 */ # Dormant User: 지난주까지는 있었는데 이번주에 방문하지 않은 유저 ## 모든 방문 데이터 with base as ( select user_pseudo_id, date_trunc((event_date), week(monday)) as visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1,2 ) ## 유저의 다음 방문주 데이터= , next_visit_date_table as ( select *, lead(visit_date) over(partition by user_pseudo_id order by visit_date) as next_visit_date from base ) ## Dormant User -- Dormant User? -- 다음주 방문 데이터가 null 이거나, -- next_visit_date > 1 week가 아닌 유저 , Dormant_User_table as ( select *, case when next_visit_date is null or date_diff(next_visit_date, visit_date, week) > 1 then 'Dormant_User' else null end as Dormant_User_data from next_visit_date_table ) select visit_date as week, count(Dormant_User_data) as Dormant_User_data_cnt from Dormant_User_table group by 1 order by 1