Questions & Answers
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 4] Final Assignment for Lecture 6-1
Attaching the Notion link for the assignment! https://www.notion.so/4-6-1-1419f4b04d5380308bbdf4a19394e924?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 4 Assignment] Lecture 6-1: Final Assignment
Here is the Notion link for the Week 4 final assignment :) https://salt-baron-5c5.notion.site/4-Foodie-Express-13ea734e64b8801595f8c8c3467d9bc5?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 4 Assignment] Final Assignment for Lecture 6-1
Submitting via Notion link: https://colney4844.notion.site/4-13e59b98d5db80d6937fc8fbd6e207e6?pvs=4
-
Resolved · Introduction to BigQuery (SQL) for Beginners
Request for course materials
Please check whether the course materials were uploaded correctly and send them along. After downloading the materials from the curriculum page, the zip file turns out to be empty, so I cannot view them.
-
Unresolved · Pass the SQLD in Just 2 Weeks: A Short-Track Strategy with the Latest Revised Past Exam Questions
Problem-only PDF
The PDF textbook shows the problems together with the answers and explanations. Is there a version that contains only the problems?
-
Unresolved · The Complete FastAPI Guide
A question about virtual environments
In the lectures you use Anaconda, but I'd like to use pipenv. Is it okay to use a virtual environment other than conda?
-
Resolved · Writing High-Performance SQL Queries for Software Developers
Question about extending course access
Hello, instructor Kim Jeong-seon. I'm enjoying the course. Due to personal circumstances I wasn't able to finish it within the access period, so I'd like to ask whether an extension is possible. Please look into it, and I'd appreciate a reply!! Thank you.
-
Resolved · Practical jOOQ! Type Safe SQL with Java
Are there cases where a POJO has no setters?
Hello, instructor. I'm enjoying the jOOQ course. I'm currently following along with the update lecture, but the Actor POJO has no setter methods, which makes updating through the DAO somewhat awkward. For insert, passing the data through the constructor worked fine, but for update, with no setters available, the approach of taking the inserted row back as an Actor object and calling its setters to update it isn't possible, which left me wondering, "where did the setters go?" The jOOQ version I'm practicing with is 3.19.5, and below is the structure of the Actor POJO file.
-
Resolved · The Complete FastAPI Guide
Question about the ThreadPool approach
The course has been a huge help!! There's a part of the lecture where I'm not sure whether it's concurrency or parallelism, so I'd like to ask..! At 7:35, I understood the ThreadPool ("parallel") approach used for routes written as plain def functions like this: each request picks up an idle thread from the ThreadPool, and when an I/O wait occurs, work proceeds through context switching. If that's right..! then is this not true parallelism, but concurrency that merely looks like things are running at the same time?
-
Unresolved · SQLD in 8 Hours (Subject 2)
Hierarchical query terminology and syntax lecture: I think there is a typo in the slide
Hello, I'm Jeon Jin-ho, a student preparing for the SQLD exam. At 2:47 in the lecture, the slide should read: 'PRIOR PK = FK': PK (parent) -> FK (child), forward expansion; 'PRIOR FK = PK': FK (child) -> PK (parent), backward expansion. Instead, it shows: 'PRIOR PK = FK': PK (parent) -> FK (child), forward expansion; 'PRIOR FK = PK': FK (child) -> FK (parent), backward expansion. It looks like FK -> PK was accidentally written as FK -> FK. Thank you. Jeon Jin-ho.
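For reference, a minimal sketch of the two expansion directions, using a hypothetical emp table (empno as the PK, mgr as the FK pointing to the manager's empno); this is only an illustration, not the lecture's slide:

-- Forward expansion (parent -> child): PRIOR PK = FK
-- the PRIOR (already visited, parent) row's empno matches the current row's mgr
SELECT LEVEL, empno, ename, mgr
  FROM emp
 START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;

-- Backward expansion (child -> parent): PRIOR FK = PK
-- the PRIOR (already visited, child) row's mgr matches the current row's empno
SELECT LEVEL, empno, ename, mgr
  FROM emp
 START WITH empno = 7369  -- hypothetical starting employee
CONNECT BY PRIOR mgr = empno;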
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 3 Assignment] Retention Exercises
1) Write a query from scratch that computes Weekly Retention

WITH base AS (
  SELECT DISTINCT
    user_id,
    user_pseudo_id,
    event_name,
    DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime,
    DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) AS event_date
  FROM `advanced.app_logs`
), first_week_and_diff AS (
  SELECT
    *,
    DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week
  FROM (
    SELECT DISTINCT
      user_pseudo_id,
      DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
      DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week
    FROM base
  )
)
SELECT
  diff_of_week,
  COUNT(DISTINCT user_pseudo_id) AS user_cnt,
  ROUND(SAFE_DIVIDE(COUNT(DISTINCT user_pseudo_id), FIRST_VALUE(COUNT(DISTINCT user_pseudo_id)) OVER(ORDER BY diff_of_week)), 3) AS retention
FROM first_week_and_diff
GROUP BY diff_of_week
ORDER BY diff_of_week

2) Write a query that splits retained users into New + Current + Resurrected + Dormant users.
The classification criteria will differ by product or service, but the app_logs data covers 24 weeks when bucketed by week, and looking at users' average purchase cycle and visit counts, a 6-week threshold seemed like a reasonable cutoff.
New: a user on their first visit
Current: a user with 2+ visits who has revisited within the last 6 weeks
Resurrected: a user with 2+ visits whose previous visit was 7+ weeks earlier, and who has revisited within the last 6 weeks
Dormant: a user whose previous visit was 7+ weeks earlier and who has not revisited within the last 6 weeks
(I'm not sure this is the right way to split them, but I tried to come up with my own logical criteria. 😫 Even with the extended deadline I was short on time for personal reasons, could not dig deeper, and did not finish the query 😩 I will complete and supplement it later; a rough sketch of this classification is appended at the end of this post.)

3) In the given data, which users have relatively higher retention? Investigate.
Unfinished; I will supplement it once completed!

4) Set the Core Event to "click_payment" and compute Weekly Retention

WITH base AS (
  SELECT DISTINCT
    user_id,
    user_pseudo_id,
    event_name,
    DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime,
    DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) AS event_date
  FROM `advanced.app_logs`
  WHERE event_name = 'click_payment'  -- core event filter
), first_week_and_diff AS (
  SELECT
    *,
    DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week
  FROM (
    SELECT DISTINCT
      user_pseudo_id,
      DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
      DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week
    FROM base
  )
)
SELECT
  diff_of_week,
  COUNT(DISTINCT user_pseudo_id) AS user_cnt,
  ROUND(SAFE_DIVIDE(COUNT(DISTINCT user_pseudo_id), FIRST_VALUE(COUNT(DISTINCT user_pseudo_id)) OVER(ORDER BY diff_of_week)), 3) AS retention
FROM first_week_and_diff
GROUP BY diff_of_week
ORDER BY diff_of_week
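A minimal sketch of the part-2 classification under the assumptions stated above (weekly buckets on `advanced.app_logs`, a 6-week activity window, and the most recent week in the data as the reference point); CTE and column names here are illustrative only:

WITH weekly AS (
  -- one row per user per active week
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week
  FROM `advanced.app_logs`
), per_user AS (
  -- visit counts and recency per user
  SELECT
    user_pseudo_id,
    COUNT(*) AS weeks_visited,
    MIN(event_week) AS first_week,
    MAX(event_week) AS last_week,
    -- second most recent active week; NULL if the user was active in only one week
    ARRAY_AGG(event_week ORDER BY event_week DESC LIMIT 2)[SAFE_OFFSET(1)] AS prev_week
  FROM weekly
  GROUP BY user_pseudo_id
), ref AS (
  -- reference point: the last week present in the data
  SELECT MAX(event_week) AS ref_week FROM weekly
)
SELECT
  CASE
    WHEN weeks_visited = 1 AND DATE_DIFF(ref_week, first_week, WEEK) <= 6 THEN 'New'
    WHEN DATE_DIFF(ref_week, last_week, WEEK) > 6 THEN 'Dormant'
    WHEN DATE_DIFF(last_week, prev_week, WEEK) >= 7 THEN 'Resurrected'
    ELSE 'Current'
  END AS user_type,
  COUNT(*) AS user_cnt
FROM per_user
CROSS JOIN ref
GROUP BY user_type
ORDER BY user_cnt DESC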
-
Unresolved · SQLD in 8 Hours (Subject 2)
FIRST_VALUE, LAST_VALUE
WITH Sales AS (
  SELECT 1 AS sale_id, DATE '2023-01-01' AS sale_date, 101 AS employee_id, 500 AS amount FROM dual UNION ALL
  SELECT 2 AS sale_id, DATE '2023-01-02' AS sale_date, 102 AS employee_id, 700 AS amount FROM dual UNION ALL
  SELECT 3 AS sale_id, DATE '2023-01-03' AS sale_date, 101 AS employee_id, 300 AS amount FROM dual UNION ALL
  SELECT 4 AS sale_id, DATE '2023-01-04' AS sale_date, 103 AS employee_id, 900 AS amount FROM dual UNION ALL
  SELECT 5 AS sale_id, DATE '2023-01-05' AS sale_date, 101 AS employee_id, 400 AS amount FROM dual UNION ALL
  SELECT 6 AS sale_id, DATE '2023-01-06' AS sale_date, 104 AS employee_id, 600 AS amount FROM dual UNION ALL
  SELECT 7 AS sale_id, DATE '2023-01-07' AS sale_date, 105 AS employee_id, 800 AS amount FROM dual UNION ALL
  SELECT 8 AS sale_id, DATE '2023-01-08' AS sale_date, 101 AS employee_id, 350 AS amount FROM dual UNION ALL
  SELECT 9 AS sale_id, DATE '2023-01-09' AS sale_date, 102 AS employee_id, 450 AS amount FROM dual UNION ALL
  SELECT 10 AS sale_id, DATE '2023-01-10' AS sale_date, 103 AS employee_id, 750 AS amount FROM dual UNION ALL
  SELECT 11 AS sale_id, DATE '2023-01-11' AS sale_date, 101 AS employee_id, 250 AS amount FROM dual UNION ALL
  SELECT 12 AS sale_id, DATE '2023-01-12' AS sale_date, 104 AS employee_id, 650 AS amount FROM dual UNION ALL
  SELECT 13 AS sale_id, DATE '2023-01-13' AS sale_date, 105 AS employee_id, 850 AS amount FROM dual UNION ALL
  SELECT 14 AS sale_id, DATE '2023-01-14' AS sale_date, 101 AS employee_id, 450 AS amount FROM dual UNION ALL
  SELECT 15 AS sale_id, DATE '2023-01-15' AS sale_date, 102 AS employee_id, 550 AS amount FROM dual UNION ALL
  SELECT 16 AS sale_id, DATE '2023-01-16' AS sale_date, 103 AS employee_id, 950 AS amount FROM dual UNION ALL
  SELECT 17 AS sale_id, DATE '2023-01-17' AS sale_date, 104 AS employee_id, 700 AS amount FROM dual UNION ALL
  SELECT 18 AS sale_id, DATE '2023-01-18' AS sale_date, 105 AS employee_id, 900 AS amount FROM dual UNION ALL
  SELECT 19 AS sale_id, DATE '2023-01-19' AS sale_date, 101 AS employee_id, 500 AS amount FROM dual UNION ALL
  SELECT 20 AS sale_id, DATE '2023-01-20' AS sale_date, 102 AS employee_id, 600 AS amount FROM dual
)
SELECT SALE_ID,
       TO_CHAR(SALE_DATE, 'YYYY-MM-DD') AS sale_date,
       employee_id,
       amount,
       FIRST_VALUE(amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS "FIRST_VALUE"
FROM (
  SELECT SALE_ID, SALE_DATE, employee_id, amount,
         ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY sale_date) AS rn
  FROM Sales
)
WHERE rn = 1;

Hello, I'm Jeon Jin-ho, a student taking this course while preparing for the SQLD exam. Instead of using ROW_NUMBER in a subquery like this, is there a simpler way to pull out only the first value of each PARTITION BY group? Thank you.
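Two commonly used alternatives, sketched against the Sales CTE above (not an official model answer): the first keeps the window function but drops the ROW_NUMBER subquery by deduplicating with DISTINCT; the second uses Oracle's KEEP (DENSE_RANK FIRST ...) aggregate.

-- Option 1: window function + DISTINCT, one row per employee, no ROW_NUMBER subquery
SELECT DISTINCT
       employee_id,
       FIRST_VALUE(amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS first_amount
  FROM Sales;

-- Option 2: KEEP (DENSE_RANK FIRST ...) aggregate, grouped per employee
SELECT employee_id,
       MIN(sale_date) AS first_sale_date,
       MIN(amount) KEEP (DENSE_RANK FIRST ORDER BY sale_date) AS first_amount
  FROM Sales
 GROUP BY employee_id;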
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Inflearn BigQuery Intensive Study Week 3] Retention Assignment
Problem 1

WITH
-- Step 1: extract only the required data
base_events AS (
  SELECT
    user_pseudo_id,
    DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_date
  FROM advanced.app_logs
),
-- Step 2: get each user's first-visit week and each activity week
user_weeks AS (
  SELECT DISTINCT  -- remove duplicates
    user_pseudo_id,
    -- first-visit week per user (Monday-based)
    DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
    -- actual visit week (Monday-based)
    DATE_TRUNC(event_date, WEEK(MONDAY)) AS visit_week
  FROM base_events
),
-- Step 3: compute how many weeks have passed since the first visit
week_numbers AS (
  SELECT
    user_pseudo_id,
    -- weeks elapsed since the first visit
    DATE_DIFF(visit_week, first_week, WEEK) AS week_number
  FROM user_weeks
  -- analyze up to 12 weeks only
  WHERE DATE_DIFF(visit_week, first_week, WEEK) <= 12
),
-- Step 4: count users per week
weekly_users AS (
  SELECT
    week_number,
    COUNT(DISTINCT user_pseudo_id) AS user_count
  FROM week_numbers
  GROUP BY week_number
)
-- Step 5: compute the final retention
SELECT
  week_number,
  user_count AS active_users,
  FIRST_VALUE(user_count) OVER(ORDER BY week_number) AS first_week_users,
  ROUND(100.0 * user_count / FIRST_VALUE(user_count) OVER(ORDER BY week_number), 2) AS retention_rate
FROM weekly_users
ORDER BY week_number;

Problem 2

WITH
-- Step 1: prepare weekly data per user
user_weeks AS (
  SELECT DISTINCT
    user_pseudo_id,
    -- first-visit week
    DATE_TRUNC(MIN(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
    -- activity week
    DATE_TRUNC(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul"), WEEK(MONDAY)) AS event_week
  FROM advanced.app_logs
),
-- Step 2: determine user status
user_status AS (
  SELECT
    user_pseudo_id,
    event_week,
    first_week,
    -- previous visit week
    LAG(event_week) OVER(PARTITION BY user_pseudo_id ORDER BY event_week) AS prev_week
  FROM user_weeks
),
-- Step 3: classify the status
weekly_status AS (
  SELECT
    event_week,
    user_pseudo_id,
    CASE
      WHEN event_week = first_week THEN 'New'
      WHEN DATE_DIFF(event_week, prev_week, WEEK) = 1 THEN 'Current'
      WHEN DATE_DIFF(event_week, prev_week, WEEK) > 1 THEN 'Resurrected'
    END AS user_type
  FROM user_status
),
-- Step 4: total users per week
total_users AS (
  SELECT
    fwd.event_week,
    COUNT(DISTINCT user_pseudo_id) AS total_user_count
  FROM user_status
  CROSS JOIN (SELECT DISTINCT event_week FROM user_weeks) AS fwd
  WHERE first_week <= fwd.event_week
  GROUP BY fwd.event_week
),
-- Step 5: active users per week
active_users AS (
  SELECT
    event_week,
    COUNTIF(user_type = 'New') AS new_users,
    COUNTIF(user_type = 'Current') AS current_users,
    COUNTIF(user_type = 'Resurrected') AS resurrected_users,
    COUNT(DISTINCT user_pseudo_id) AS retain_users
  FROM weekly_status
  GROUP BY event_week
)
-- Step 6: final result
SELECT
  FORMAT_DATE('%Y-%m-%d', a.event_week) AS week_start,
  new_users,
  current_users,
  resurrected_users,
  (t.total_user_count - a.retain_users) AS dormant_users,
  ROUND(100.0 * current_users / NULLIF(LAG(new_users) OVER(ORDER BY a.event_week), 0), 1) AS retention_rate,
  retain_users AS active_users,
  t.total_user_count AS total_users
FROM active_users a
JOIN total_users t ON a.event_week = t.event_week
ORDER BY a.event_week;

Problem 3

WITH
-- Step 1: prepare the base data
user_weeks AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(MIN(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul")) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
    DATE_TRUNC(DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul"), WEEK(MONDAY)) AS event_week
  FROM advanced.app_logs
),
-- Step 2: count visit weeks per user
user_visit_frequency AS (
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT event_week) AS total_visit_weeks,
    DATE_DIFF(MAX(event_week), MIN(event_week), WEEK) + 1 AS weeks_since_first,
    MIN(event_week) AS first_visit_week
  FROM user_weeks
  GROUP BY user_pseudo_id
),
-- Step 3: compute a retention score per user
user_retention_score AS (
  SELECT
    user_pseudo_id,
    total_visit_weeks,
    weeks_since_first,
    ROUND(100.0 * total_visit_weeks / weeks_since_first, 2) AS visit_rate,
    FORMAT_DATE('%Y-%m', first_visit_week) AS cohort_month
  FROM user_visit_frequency
  WHERE weeks_since_first >= 4  -- only users with at least 4 weeks elapsed
),
-- Step 4: analyze user behavior data
user_behavior AS (
  SELECT
    r.user_pseudo_id,
    r.visit_rate,
    r.cohort_month,
    COUNT(DISTINCT DATE(TIMESTAMP_MICROS(l.event_timestamp), "Asia/Seoul")) AS active_days,
    COUNT(DISTINCT l.event_timestamp) AS total_events,
    COUNT(DISTINCT l.event_name) AS unique_event_types
  FROM user_retention_score r
  JOIN advanced.app_logs l ON r.user_pseudo_id = l.user_pseudo_id
  GROUP BY r.user_pseudo_id, r.visit_rate, r.cohort_month
),
-- Step 5: analyze behavior patterns by retention segment
retention_segments AS (
  SELECT
    cohort_month,
    CASE
      WHEN visit_rate >= 75 THEN 'Very High (75%+)'
      WHEN visit_rate >= 50 THEN 'High (50-74%)'
      WHEN visit_rate >= 25 THEN 'Medium (25-49%)'
      ELSE 'Low (<25%)'
    END AS retention_segment,
    COUNT(DISTINCT user_pseudo_id) AS user_count,
    ROUND(AVG(visit_rate), 2) AS avg_retention_rate,
    ROUND(AVG(active_days), 1) AS avg_active_days,
    ROUND(AVG(total_events), 1) AS avg_total_events,
    ROUND(AVG(unique_event_types), 1) AS avg_unique_events
  FROM user_behavior
  GROUP BY
    cohort_month,
    CASE
      WHEN visit_rate >= 75 THEN 'Very High (75%+)'
      WHEN visit_rate >= 50 THEN 'High (50-74%)'
      WHEN visit_rate >= 25 THEN 'Medium (25-49%)'
      ELSE 'Low (<25%)'
    END
)
-- final result
SELECT
  cohort_month,
  retention_segment,
  user_count,
  avg_retention_rate,
  avg_active_days,
  avg_total_events,
  avg_unique_events,
  ROUND(100.0 * user_count / SUM(user_count) OVER (PARTITION BY cohort_month), 2) AS segment_percentage
FROM retention_segments
ORDER BY
  cohort_month,
  CASE retention_segment
    WHEN 'Very High (75%+)' THEN 1
    WHEN 'High (50-74%)' THEN 2
    WHEN 'Medium (25-49%)' THEN 3
    WHEN 'Low (<25%)' THEN 4
  END;

Problem 4

WITH
-- Step 1: base data for payment events
payment_base AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_date
  FROM advanced.app_logs
  WHERE event_name = "click_payment"
),
-- Step 2: compute each user's first payment week and payment weeks
user_weeks AS (
  SELECT
    user_pseudo_id,
    -- first payment week
    DATE_TRUNC(MIN(event_date) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week,
    -- actual payment week
    DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week
  FROM payment_base
),
-- Step 3: compute the week number
week_status AS (
  SELECT
    user_pseudo_id,
    event_week,
    first_week,
    -- difference from the first week
    DATE_DIFF(event_week, first_week, WEEK) AS week_number
  FROM user_weeks
  WHERE DATE_DIFF(event_week, first_week, WEEK) <= 12  -- analyze up to 12 weeks only
),
-- Step 4: active users per week
weekly_users AS (
  SELECT
    week_number,
    COUNT(DISTINCT user_pseudo_id) AS active_users
  FROM week_status
  GROUP BY week_number
),
-- Step 5: first-week user count (cohort size)
first_week_users AS (
  SELECT COUNT(DISTINCT user_pseudo_id) AS cohort_size
  FROM week_status
  WHERE week_number = 0
)
-- final result
SELECT
  week_number AS week,
  active_users,
  cohort_size,
  ROUND(100.0 * active_users / cohort_size, 2) AS retention_rate
FROM weekly_users
CROSS JOIN first_week_users
ORDER BY week_number;

Service growth
- Total users grew to 52,823
- Steady growth from August through December
- Largest new-user inflow in October (4,048 users)
Retention improvements
- Overall retention improved sharply: 2.6% in August to 47.9% in January
- Stable retention above 40% from December onward
- The share of returning users kept increasing
Areas for improvement
- Payment retention is very low (around 1%)
- New-user inflow is trending downward
- Dormant users keep increasing
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 3 Assignment] Retention exercises and extra coding-test problem solutions
Submitting the assignment via Notion link! https://pleasant-moth-c20.notion.site/3-13dc19b9b6fe80dbb81dfa041247cfd7?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
Intensive Study Week 3 assignment submission
Exercise 1

select
  *,
  TRUNC(user_count / first_count, 3) AS ratio
from (
  select
    *,
    first_value(user_count) over(order by week_diff asc) as first_count
  from (
    select
      week_diff,
      count(*) as user_count
    from (
      select
        *,
        date_diff(datetime_week, first_week, week) as week_diff
      from (
        select distinct
          event_date,
          user_pseudo_id,
          platform,
          datetime_week,
          -- event_name,
          first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week
        from (
          select
            * except(firebase_screen, event_name),
            DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week,
            concat(firebase_screen, '_', event_name) as event_name
          from (
            select
              * except(event_param),
              max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen
            from (
              select
                event_date,
                event_timestamp,
                user_pseudo_id,
                platform,
                event_name,
                event_param
              from `advanced.app_logs`
              cross join unnest(event_params) as event_param
              where event_date >= '2022-08-01'
                and event_date <= '2022-08-30'
            )
            group by all
          )
        )
      )
    )
    group by week_diff
  )
)
order by week_diff
;

Exercise 2
If 2+ weeks have passed since the last activity as of the current time: dormant (churned).
If the last activity is within 1 week of the current time but the gap from the previous activity is 2+ weeks: resurrected.
If the last action is within 1 week of the current time and there is no previous action: new.
Otherwise: active.

select
  user_pseudo_id,
  case
    when DATETIME_DIFF(now, datetime_week, WEEK) >= 2 then 'dormant'
    when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) < 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'active'
    when DATETIME_DIFF(datetime_week, last_visit_week, WEEK) >= 2 and DATETIME_DIFF(now, datetime_week, WEEK) < 2 then 'resurrected'
    when DATETIME_DIFF(now, datetime_week, WEEK) < 2 and last_visit_week is null then 'new'
    else null
  end as status
from (
  select
    *,
    rank() over (partition by user_pseudo_id order by event_timestamp desc) as rank
  from (
    select distinct
      event_date,
      user_pseudo_id,
      platform,
      datetime_week,
      event_timestamp,
      -- event_name,
      -- first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week
      lag(datetime_week) over(partition by user_pseudo_id order by event_timestamp) as last_visit_week,
      CAST('2022-08-31 00:00:00' AS DATETIME) as now,
    from (
      select
        * except(firebase_screen, event_name),
        DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week,
        concat(firebase_screen, '_', event_name) as event_name
      from (
        select
          * except(event_param),
          max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen
        from (
          select
            event_date,
            event_timestamp,
            user_pseudo_id,
            platform,
            event_name,
            event_param
          from `advanced.app_logs`
          cross join unnest(event_params) as event_param
          where event_date >= '2022-08-01'
            and event_date <= '2022-08-30'
        )
        group by all
      )
    )
  )
  qualify rank = 1
)
;

Exercises 3 and 4
The same query is reused while changing event_name to track which event has the highest retention. click_restaurant has the highest retention, so it may deserve to be designated as the core event, but the results look a bit off, so the query probably needs another review.. Changing only event_name to click_payment in the query below gives click_payment's weekly retention.

select
  *,
  TRUNC(user_count / first_count, 3) AS ratio
from (
  select
    *,
    first_value(user_count) over(order by week_diff asc) as first_count
  from (
    select
      week_diff,
      count(*) as user_count
    from (
      select
        *,
        date_diff(datetime_week, first_week, week) as week_diff
      from (
        select distinct
          user_pseudo_id,
          datetime_week,
          -- event_name,
          first_value(datetime_week) over (partition by user_pseudo_id order by datetime_week asc) as first_week
        from (
          select
            *,
            DATETIME_TRUNC(CAST(event_date AS DATE), WEEK) AS datetime_week
          from (
            select
              event_date,
              event_timestamp,
              user_pseudo_id,
              platform,
              event_name
            from `advanced.app_logs`
            where event_date >= '2022-08-01'
              and event_date <= '2022-08-30'
              and event_name = 'click_restaurant'
          )
        )
      )
    )
    group by week_diff
  )
)
order by week_diff
;
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 3 Assignment] Retention exercises and extra coding-test problem solutions
1) Week 3 extra coding-test practice
Problem 1 (time limit: 15 min >> time taken: 23 min)
Problem: Write a query that finds, per week and per category, the period with the highest average discount rate and that discount rate. Additionally, convert the date data into a form like YYYY-MM-DD 23:59:39.

WITH transaction_data AS (
  SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL
  SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL
  SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL
  SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL
  SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL
  SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL
  SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL
  SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL
  SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL
  SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL
  SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL
  SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL
  SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL
  SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL
  SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL
  SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL
  SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL
  SELECT 111, 2001, 90300, '02/28/2024 12:00:00'
), item_info AS (
  SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL
  SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL
  SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL
  SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL
  SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL
  SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price
)
SELECT
  *,
  RANK() OVER (ORDER BY avg_sales_ratio DESC) AS rnk
FROM (
  SELECT
    weekly,
    category,
    AVG(sales_ratio) AS avg_sales_ratio
  FROM (
    SELECT
      DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)], '-', SPLIT(a.transaction_date, '/')[OFFSET(0)], '-', SPLIT(a.transaction_date, '/')[OFFSET(1)])), WEEK(MONDAY)) AS weekly,
      a.item_id,
      b.category,
      b.list_price,
      a.actual_price,
      ROUND(1 - ROUND(SAFE_DIVIDE(actual_price, list_price), 4), 4) AS sales_ratio
    FROM transaction_data a
    LEFT JOIN item_info b ON a.item_id = b.item_id
  )
  GROUP BY ALL
)
QUALIFY rnk = 1
ORDER BY 1
;

Problem 2 (time limit: 10 min >> time taken: 7 min)
Problem: Write a query that finds the category with the highest revenue in January 2024.

WITH transaction_data AS (
  SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL
  SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL
  SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL
  SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL
  SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL
  SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL
  SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL
  SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL
  SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL
  SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL
  SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL
  SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL
  SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL
  SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL
  SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL
  SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL
  SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL
  SELECT 111, 2001, 90300, '02/28/2024 12:00:00'
), item_info AS (
  SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL
  SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL
  SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL
  SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL
  SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL
  SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price
)
SELECT category
FROM (
  SELECT
    transaction_month,
    category,
    SUM(actual_price) AS category_sum_price
  FROM (
    SELECT
      DATE_TRUNC(DATETIME(CONCAT(SPLIT(SPLIT(a.transaction_date, '/')[OFFSET(2)], ' ')[OFFSET(0)], '-', SPLIT(a.transaction_date, '/')[OFFSET(0)], '-', SPLIT(a.transaction_date, '/')[OFFSET(1)])), MONTH) AS transaction_month,
      a.item_id,
      b.category,
      a.actual_price
    FROM transaction_data a
    LEFT JOIN item_info b ON a.item_id = b.item_id
  )
  WHERE transaction_month = "2024-01-01"
  GROUP BY ALL
)
ORDER BY category_sum_price DESC
LIMIT 1
;

Problem 3 (time limit: 10 min >> time taken: 6 min 40 s)
Problem: Write a query that finds the category most purchased by users whose total purchase amount is 2,000,000 KRW or more.

WITH transaction_data AS (
  SELECT 111 AS user_id, 1001 AS item_id, 719200 AS actual_price, '01/08/2024 12:00:00' AS transaction_date UNION ALL
  SELECT 111, 2002, 89000, '01/10/2024 12:00:00' UNION ALL
  SELECT 189, 2002, 89000, '01/12/2024 12:00:00' UNION ALL
  SELECT 156, 3002, 459000, '01/15/2024 12:00:00' UNION ALL
  SELECT 121, 1001, 719200, '01/18/2024 12:00:00' UNION ALL
  SELECT 156, 2001, 90300, '01/25/2024 12:00:00' UNION ALL
  SELECT 145, 3001, 399000, '01/26/2024 12:00:00' UNION ALL
  SELECT 189, 1002, 607200, '01/28/2024 12:00:00' UNION ALL
  SELECT 111, 3001, 399000, '02/05/2024 12:00:00' UNION ALL
  SELECT 178, 1002, 759000, '02/07/2024 12:00:00' UNION ALL
  SELECT 121, 2002, 62300, '02/08/2024 12:00:00' UNION ALL
  SELECT 156, 1001, 899000, '02/10/2024 12:00:00' UNION ALL
  SELECT 190, 2001, 90300, '02/11/2024 12:00:00' UNION ALL
  SELECT 189, 2001, 90300, '02/14/2024 12:00:00' UNION ALL
  SELECT 111, 1002, 759000, '02/15/2024 12:00:00' UNION ALL
  SELECT 156, 3001, 299250, '02/20/2024 12:00:00' UNION ALL
  SELECT 189, 3002, 344250, '02/25/2024 12:00:00' UNION ALL
  SELECT 111, 2001, 90300, '02/28/2024 12:00:00'
), item_info AS (
  SELECT 1001 AS item_id, 'Electronics' AS category, 'Smartphone' AS item_name, 899000 AS list_price UNION ALL
  SELECT 1002 AS item_id, 'Electronics' AS category, 'Tablet' AS item_name, 759000 AS list_price UNION ALL
  SELECT 2001 AS item_id, 'Fashion' AS category, 'Sneakers' AS item_name, 129000 AS list_price UNION ALL
  SELECT 2002 AS item_id, 'Fashion' AS category, 'Backpack' AS item_name, 89000 AS list_price UNION ALL
  SELECT 3001 AS item_id, 'Home' AS category, 'Coffee Machine' AS item_name, 399000 AS list_price UNION ALL
  SELECT 3002 AS item_id, 'Home' AS category, 'Air Purifier' AS item_name, 459000 AS list_price
)
SELECT category
FROM (
  SELECT
    category,
    SUM(actual_price) AS sum_purchase_price
  FROM (
    SELECT
      a.user_id,
      a.item_id,
      b.category,
      a.actual_price
    FROM transaction_data a
    LEFT JOIN item_info b ON a.item_id = b.item_id
    WHERE a.user_id IN (
      SELECT user_id
      FROM (
        SELECT
          user_id,
          SUM(actual_price) AS sum_purchase_price
        FROM transaction_data a
        GROUP BY ALL
        HAVING SUM(actual_price) >= 2000000
      )
    )
  )
  GROUP BY ALL
)
ORDER BY sum_purchase_price DESC
LIMIT 1

2) Retention exercises
I could not finish all of the exercises, so I'm submitting the queries I managed to complete within the deadline. I will finish the remaining problems later and update this post.
Problem 1) Write a query from scratch that computes Weekly Retention.

WITH base AS (
  # extract the raw event data
  SELECT DISTINCT
    user_id,
    user_pseudo_id,
    DATE(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date,
    event_name
  FROM advanced.app_logs
  WHERE event_date BETWEEN "2022-08-01" AND "2022-12-31"
), first_week_diff_event_week AS (
  # per user: first week, event week, and the difference in weeks
  SELECT DISTINCT
    user_pseudo_id,
    first_week,
    event_week,
    DATE_DIFF(event_week, first_week, WEEK) AS weeks_after_first_week
  FROM (
    SELECT
      user_pseudo_id,
      event_name,
      DATE_TRUNC(event_date, WEEK(MONDAY)) AS event_week,
      DATE_TRUNC(MIN(event_date) OVER (PARTITION BY user_pseudo_id ORDER BY event_date), WEEK(MONDAY)) AS first_week
    FROM base
  )
), user_counts AS (
  # first event week | week difference of the retained event | user count | cohort user count
  SELECT
    *,
    FIRST_VALUE(user_cnt) OVER (ORDER BY weeks_after_first_week) AS cohort_user
  FROM (
    SELECT
      weeks_after_first_week,
      COUNT(DISTINCT user_pseudo_id) AS user_cnt
    FROM first_week_diff_event_week
    GROUP BY ALL
  )
)
# compute the retention rate
SELECT
  *,
  ROUND(SAFE_DIVIDE(user_cnt, cohort_user), 4) AS retention_rate
FROM user_counts
ORDER BY 1, 2
;
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 3 Assignment] Retention Exercises
Attaching the Notion link: https://focapa.notion.site/3-13c094a2b89980719923c79f9298fefd?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Inflearn BigQuery Intensive Study Week 3] Retention Analysis
Submitting the assignment via Notion. https://www.notion.so/12735b8a6e338052a9cbf91cab8fdce2?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[BigQuery Intensive Study Week 3] Retention Analysis
Written up in Notion: https://apple-baroness-590.notion.site/3-13aacf7d68f680179560f0ac848277f5?pvs=4
-
Unresolved · BigQuery (SQL) Advanced (Funnel Analysis, Retention Analysis)
[Intensive Study Week 3 Assignment] Retention Exercises
Problem 1) Write a query from scratch that computes Weekly Retention

WITH base AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul'), WEEK(MONDAY)) AS event_week,
    DATE_TRUNC(MIN(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) OVER(PARTITION BY user_pseudo_id), WEEK(MONDAY)) AS first_week
  FROM `advanced.app_logs`
), first_week_diff AS (
  SELECT
    user_pseudo_id,
    DATE_DIFF(event_week, first_week, WEEK) AS diff_of_week
  FROM base
), user_counts AS (
  SELECT
    diff_of_week,
    COUNT(DISTINCT user_pseudo_id) AS user_count
  FROM first_week_diff
  GROUP BY diff_of_week
)
SELECT
  diff_of_week,
  user_count,
  ROUND(SAFE_DIVIDE(user_count, FIRST_VALUE(user_count) OVER (ORDER BY diff_of_week ASC)), 2) AS retention_rate
FROM user_counts
ORDER BY diff_of_week;

Problem 2) Write a query that splits retained users into New + Current + Resurrected + Dormant users.

WITH event_data AS (
  SELECT DISTINCT
    DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')) AS event_date,
    user_pseudo_id
  FROM advanced.app_logs
), user_visits AS (
  SELECT
    user_pseudo_id,
    MIN(event_date) AS first_visit,
    MAX(event_date) AS last_visit  -- most recent visit
  FROM event_data
  GROUP BY user_pseudo_id
),
-- latest date in the data
current_date AS (
  SELECT MAX(event_date) AS max_date FROM event_data
), user_types AS (
  SELECT
    u.user_pseudo_id,
    u.first_visit,
    u.last_visit,
    c.max_date AS current_date,
    DATE_DIFF(c.max_date, u.last_visit, DAY) AS days_since_last_visit,
    CASE
      WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) <= 14 THEN 'current'  -- visited within the last 2 weeks
      WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'dormant'  -- no activity for more than 60 days
      WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 30
        AND DATE_DIFF(c.max_date, u.last_visit, DAY) <= 44
        AND EXISTS (
          SELECT 1
          FROM event_data e
          WHERE e.user_pseudo_id = u.user_pseudo_id
            AND DATE_DIFF(c.max_date, e.event_date, DAY) <= 14
        ) THEN 'resurrected'  -- reconnected within the last 14 days after 30+ days of inactivity
      WHEN DATE_DIFF(c.max_date, u.last_visit, DAY) > 60 THEN 'inactive'  -- inactive for more than 60 days
      ELSE 'none'
    END AS user_type
  FROM user_visits u
  CROSS JOIN current_date c
), weekly_retention AS (
  SELECT
    ut.user_type,
    DATE_DIFF(DATE_TRUNC(ed.event_date, WEEK), DATE_TRUNC(ut.first_visit, WEEK), WEEK) AS week_number,
    COUNT(DISTINCT ed.user_pseudo_id) AS user_count
  FROM event_data ed
  JOIN user_types ut ON ed.user_pseudo_id = ut.user_pseudo_id
  GROUP BY 1, 2
), retention_rates AS (
  SELECT
    user_type,
    week_number,
    user_count,
    FIRST_VALUE(user_count) OVER (
      PARTITION BY user_type
      ORDER BY week_number
    ) AS initial_users
  FROM weekly_retention
)
SELECT
  user_type,
  week_number,
  user_count AS active_users,
  initial_users AS cohort_size,
  ROUND(user_count * 100.0 / initial_users, 2) AS retention_rate
FROM retention_rates
ORDER BY user_type, week_number;

Problem 3) In the given data, which users have relatively higher retention? Investigate.
- For 'current' users, retention held steady or increased up to week 15 with little fluctuation, then declined continuously afterward.
- 'current' users at least started with a retention rate around 9.x, whereas 'dormant' users started around 7.x, their inflow dropped faster, and their retention was poor.

Problem 4) Set the Core Event to "click_payment" and compute Weekly Retention

WITH base AS (
  SELECT
    user_pseudo_id,
    DATE_TRUNC(DATE(DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul')), WEEK(MONDAY)) AS week
  FROM advanced.app_logs
  WHERE event_name = "click_payment"
), cal_week AS (
  SELECT
    user_pseudo_id,
    week AS visit_week,
    MIN(week) OVER (PARTITION BY user_pseudo_id) AS first_week
  FROM base
), weekly_retention AS (
  SELECT
    first_week,
    DATE_DIFF(visit_week, first_week, WEEK) AS week_number,
    COUNT(DISTINCT user_pseudo_id) AS user_count
  FROM cal_week
  GROUP BY first_week, DATE_DIFF(visit_week, first_week, WEEK)
), cohort_sizes AS (
  SELECT
    first_week,
    MAX(CASE WHEN week_number = 0 THEN user_count END) AS initial_users
  FROM weekly_retention
  GROUP BY first_week
)
SELECT
  wr.first_week,
  cs.initial_users AS total_users,
  wr.week_number,
  wr.user_count AS active_users,
  ROUND(wr.user_count * 100.0 / cs.initial_users, 2) AS retention_rate
FROM weekly_retention wr
JOIN cohort_sizes cs ON wr.first_week = cs.first_week
WHERE wr.week_number >= 0
ORDER BY wr.first_week, wr.week_number;