묻고 답해요
141만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[1]_[인프런 빅쿼리 빠짝스터디 3주차] 리텐션 과제와 추가 문제 풀이
[종합 느낀점]1. 날짜 데이터 변환 함수, 날짜 데이터로 추출 하는 함수 등 문제 풀 때 사용하는 함수를 까먹어 애를 먹었습니다.2. 테이블 조인시 첫 테이블을 잘못 선택하면, 쿼리가 길어지고 꼬인다는 것을 느꼈습니다.3. 지표가 어떤 지표느냐에 따라 해석 방법이 달라진 점도 알게 됐습니다. [문제 풀이 소요 시간]1번 문제 : 30분2번 문제 : 12분3번 문제 : 12분 정답이면서도 확장성 있는 쿼리를 작성하다가는 영원히 못풀겠더라고요..그래서 최대한 쿼리 작성을 마무리 짓는데 초점을 뒀습니다.. ㅠㅠ [new_users]위와 같은 시각화 결과가 나왔습니다.눈에 띄는 데이터로는 두 가지가 있습니다:1. 10월 new users 상승 상승 추정 이유: 1. 새로운 마케팅 전략 시행(시기가 추석 근처이니깐 추석을 기념해서 배달앱 쿠폰 지급?)2. 12월 new users 하락 하락 추정 이유: 1. 연말 모임으로 인한 배달 수요 감소 (사실 감소세는 10월달에 고점을 찍은 이후 쭉 하락세이지만, 그 이유를 단정짓지는 못하겠습니다..) [Current User]위와 같은 시각화 결과가 나왔습니다.눈에 띄는 데이터로는 세가지가 있습니다:1. 9월 중순 이후 Current User 상승 상승 추정 이유: 1. 새로운 마케팅 전략 시행(시기가 추석 근처이니깐 추석을 기념해서 배달앱 쿠폰 지급?)2. 12월까지 꾸준한 상승 상승 추정 이유: 1. 기존 유저의 긍정적인 경험이 이어져서 서비스에 잔류 2. 마케팅으로 유입된 신규 유저도 긍정적인 경험을 해서 서비스에 잔류3. 12월 Current User 하락 하락 추정 이유: 1. 연말 모임으로 인한 배달 수요 감소 [Resurrected User]'Current User' 해석과 비슷할 거 같습니다. [Dormant User]1. 9월 -> 10월 급격한 상승 상승 추정 이유: 1. 앞선 이유과 같음. 추석 관련 이벤트 실시. 2. 그러나 계속 상승이 이어지지 않은 점을 고민해 봐야 함.2. 10월 -> 11월 말 유지 유지 추정 이유: 1. 마케팅으로 긍정적인 경험을 한 기존·신규 유저 잔류 2. 하지만 입소문이 나서 계속 상승하지는 않음. 3. 일부 유저만 잔류하고, 나머지 신규 유입 유저는 이벤트만 맛보고 이탈.3. 12월 급격한 상승 상승 추정 이유: 1. 연말에 배달 수요가 감소할 것을 예측해 새로운 마케팅 전략 실시 주차별 리텐션# 리텐션 정의 -- "우리 서비스를 이용한 사람들이 현재는 얼마큼 이용하고 있을까?" /* 1. 무슨 데이터가 필요할까? - 유저 - 날짜 왜 이런 데이터가 필요할까? 이에 대한 답변: '사람들'이니깐, '유저'를 인식할 수 있는 데이터가 필요하다. '현재'이니깐, '시간'을 인식할 수 있는 데이터가 필요하다. ## 1. 유저별 첫 방문 주 구하기*/ with first_visit_table as ( select user_pseudo_id, date_trunc(min(event_date), week(monday)) as first_visit_date from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' group by 1 ) /*2. 처음 이벤트를 발생 시킨 주는? - 무슨 데이터가 필요할까? 1. 첫 이벤트 발생 날짜 2. 유저 - 왜 이런 데이터가 필요할까? 이에 대한 답변: '이벤트'는 주체가 있어야 발동 되는 거니깐 '유저' 데이터가 필요하다. 또 이벤트 발생이란 결과에는 '날짜'라는 개념도 포함 되어 있기 때문에, '날짜' 데이터가 필요하다. 그중에서도 '처음 이벤트를 발생한 주'의 데이터가 필요하다. 3. 유저가 이벤트를 실행한 주는? - 무슨 데이터가 필요할까? 1. 유저 2. 이벤트 이름(선택) 3. 이벤트 발생 날짜 + 중복제거 왜? 목적이 모든 유저의 리텐션이기 때문에. 중복 제거를 하지 않으면, 모든 유저가 발생시킨 모든 이벤트가 보이기 때문에 가독성이 안 좋기 때문에. - 왜 이런 데이터가 필요할까? 이에 대한 답변: 처음 이벤트를 발생시킨 날짜와 비교하기 위해서 ## 2. 유저별 이벤트 발생 주 구하기*/ , user_event as ( select distinct user_pseudo_id, date_trunc(event_date, week(monday)) as user_event_week from `advanced.app_logs` where event_date between '2022-08-01' and '2022-12-31' ) /*4. '유저가 이벤트 실행한 주 - 유저가 처음 방문한 주' - 무슨 데이터가 필요할까? 1. 유저 2. 첫 이벤트 발생 날짜 3. 이벤트 발생 날짜 4. (이벤트 발생 날짜 - 첫 이벤트 발생 날짜) - 왜 이런 데이터가 필요할까? 이에 대한 답변: 처음 이벤트를 발생시킨 날짜와 비교하기 위해서 ## 3. 유저별 이벤트 발생주와 첫 방문주 차이 구하기*/ , diff_week_table as ( select fvt.user_pseudo_id, fvt.first_visit_date, ue.user_event_week, date_diff(ue.user_event_week, fvt.first_visit_date, week(monday)) as diff_week from first_visit_table as fvt left join user_event as ue on fvt.user_pseudo_id = ue.user_pseudo_id order by 2, 4 ) /*5. 주차별 유저수 구하기 - 왜 위와 같은 생각을 했는가? 이에 대한 답변: 주차수에 따른 유저수를 알아야만 첫 주차에 비해 얼마큼 유저가 이벤트를 발생시키지 않았는지 알 수 있으니깐. - 무슨 데이터가 필요한가? 이에 대한 답변: 1. 이벤트 발생 차이 주 2. 중복 없는 유저 수 ## 4. 주차별 유저수 구하기*/ , unique_user_cnt_table as ( select diff_week, count(distinct user_pseudo_id) as unique_user_cnt from diff_week_table group by 1 order by 1 ) /*6. 모든 행에 첫 주 유저수 적용하기 - 왜 위와 같은 생각을 했는가? 이에 대한 답변: 첫 주 유저수로 주차마다 달라지는 유저수를 나누면, 리텐션을 구할 수 있기 때문에. - 무슨 데이터가 필요한가? 이에 대한 답변: 1. 이벤트 발생 차이주 2. 주차별 중복 없는 유저수 3. 모든 행에 적용하는 첫 주 유저수 ## 5. 모든 행에 첫 주 유저수 적용하기*/ , first_visit_week_data_table as ( select *, first_value(unique_user_cnt) over(order by unique_user_cnt desc) AS first_visit_week_data from unique_user_cnt_table order by 1 ) ## 6. 리텐션 구하기 | 주차별 유저수 / 첫 주 유저수 select *, round(safe_divide(unique_user_cnt, first_visit_week_data), 2) as retention_rate from first_visit_week_data_table
-
해결됨초보자를 위한 BigQuery(SQL) 입문
1-2 환경설정에서 마지막 실습 구문이 작동이 안돼요.
강의 내용 따라서 테이블 생성에는 문제가 없었어요.새 쿼리 탭에서 select* (쿼리에선 탭 1회)from 'basic.pokemon'order by id 로 작성하였는데 구문 오류라고 뜹니다.데이터셋 이름도 basic이고 테이블도 pokemon으로 해두었는데 왜 이러는 걸까요? 혹시나 싶어 'bigquery101-441101.basic.pokemon' 으로도 시도해 보았는데 마찬가지로 구문 오류가 떴습니다. 데이터셋이나 테이블 이름에 space가 들어갔나 싶어, 테이블셋 세부 정보에서 테이블 ID 복사 기능으로 붙여넣어도 작동이 안되는 상황입니다.
-
해결됨초보자를 위한 BigQuery(SQL) 입문
4-5. 3번 시간데이터 연습문제 질문
3. 각 트레이너별로 그들이 포켓몬을 포획한 첫 날(catch_date)을 찾고, 그 날짜를 'DD/MM/YYYY' 형식으로 출력해주세요.해당 문제에서 catch_date는 UTC 기준의 데이터이므로, 한국 기준으로 하려면 catch_datetime을 사용해야 한다고 하셨는데요!테이블을 보면, TIMESTAMP 타입인 catch_datetime만 UTC 기준의 데이터인 것으로 이해했는데DATE 타입인 catch_date가 UTC 기준의 데이터인 이유가 무엇인가요?
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 윈도우 함수
--1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, group by를 사용해서 집곟나ㅡㄴ 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. select *, count(query_date) over(partition by user) as total_query_cnt from advanced.query_logs order by 1, 3 --2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이게 해주세요 with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, rank() over(partition by week_number, team order by query_cnt desc ) as rk from query_cnt_by_team qualify rk = 1 order by 1, 2, 4 desc -- 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. with query_cnt_by_team as( select extract(WEEK from query_date) as week_number, team, user, count(user) as query_cnt from advanced.query_logs group by all ) select *, lag(query_cnt, 1) over(partition by user order by week_number) as prev_week_query_cnt from query_cnt_by_team -- 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. select *, sum(query_cnt) over (partition by user order by query_date rows between unbounded preceding and current row) as cumulative_sum from ( select query_date, user, count(user) as query_cnt from advanced.query_logs group by all ) order by 2, 1 -- 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없다면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) select * from raw_data2 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), raw_data2 as( select *, last_value(raw_data.number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data ) --6. 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균) select * except(number_of_orders), avg(last_value_orders) over (order by date rows between 2 preceding and current row) as moving_avg from raw_data2 --7. app_logs 테이블에서 Custom session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요. session은 숫자로 (1, 2, 3..) 표시해도 됩니다. -- 2022-08-18의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 with base as( select event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" order by event_timestamp ), diff_data as ( select *, datetime_diff(event_datetime, prev_event_datetime, second) as second_diff from ( select *, lag(event_datetime, 1) over(partition by user_pseudo_id order by event_datetime) as prev_event_datetime from base order by event_datetime ) ) select *, sum(session_start) over(partition by user_pseudo_id order by event_datetime) as session_number from ( select *, case when prev_event_datetime is null then 1 when second_diff >= 20 then 1 else null end as session_start from diff_data )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 2주차 과제 윈도우 함수
윈도우 함수 연습문제 1select *, count(*) over(partition by user) as total_query_cnt from `advanced.query_logs`;윈도우 함수 연습문제 2select *, rank() over(partition by team, week_number order by query_cnt desc) as team_rank from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all ) qualify team_rank=1;윈도우 함수 연습문제3select *, lag(query_cnt) over(partition by user order by week_number asc) as prev_week_query_cnt from ( select user, week_number, team, count(*) as query_cnt from ( select *,extract(week from query_date) as week_number from `advanced.query_logs` ) group by all );윈도우 함수 연습문제4select *, sum(query_count) over(partition by user order by query_date asc) as cumulative_query_cnt from ( select user, query_date, team, count(*) as query_count from `advanced.query_logs` group by all );윈도우 함수 연습문제5SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders FROM raw_data;윈도우 함수 연습문제6select *, avg(number_of_orders) over(order by date asc rows between 2 preceding and current row) as moving_average from ( SELECT date, case when number_of_orders is null then lag(number_of_orders) over(order by date asc) else number_of_orders end as number_of_orders, FROM raw_data ) ;윈도우 함수 연습문제7select *, sum(session_start) over(partition by user_pseudo_id order by event_timestamp asc) as session_id from ( select *, case when time_diff is null then 1 when time_diff >= 20 then 1 else null end as session_start from ( select *, cast((event_timestamp - before_event_timestamp)/1000000 as int) as time_diff from ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, lag(event_timestamp) over(partition by user_pseudo_id order by event_timestamp asc) as before_event_timestamp from `advanced.app_logs` where user_pseudo_id='1997494153.8491999091' and event_date='2022-08-18' ) ) );
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
윈도우 함수 연습 문제
윈도우함수(탐색함수) 연습문제연습문제1-- 문제 1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요 select user_id, visit_month, lead(visit_month) over (partition by user_id order by visit_month asc) as after_visit_month, lead(visit_month, 2) over (partition by user_id order by visit_month asc) as after_visit_month from `advanced.analytics_function_01` order by user_id;연습문제2-- 문제2) user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요. select *, lead(visit_month) over (partition by user_id order by visit_month asc) as after_visit_month, lead(visit_month, 2) over (partition by user_id order by visit_month asc) as after_two_visit_month, lag(visit_month) over(partition by user_id order by visit_month asc) as before_visit_month from `advanced.analytics_function_01` order by user_id, visit_month;연습문제3 -- 3번 유저가 접속했을 때 다음 접속까지의 간격을 구하시오 select *, after_visit_month - visit_month as diff from( select *, lead(visit_month, 1) over (partition by user_id order by visit_month) as after_visit_month from `advanced.analytics_function_01` ) order by user_id, visit_month;윈도우 함수 frame 연습문제SELECT * , SUM(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total , SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM `advanced.orders` ORDER BY order_id 윈도우 함수 연습문제연습문제1-- 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요 단, group by를 사용해서 집계하는것이 아닌 quary_log의 데이터의 우측에 새로운 컬럼을 작성해주세요 select *, count(query_date) over (partition by user) as total_query_cnt from `advanced.query_logs` order by user,query_date;연습문제2-- 2. 주차별로 팀 내에서 쿼리를 많이 실행한수를 구한후 실행 수를 활요해 랭킹을 구해주세요. 단 랭킹이 1등인 사람만 보여주세요. with query_cnt_by_team AS ( select extract(week FROM query_date) as week_number, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) select *, rank() over(partition by week_number, team order by query_cnt desc) AS rk from query_cnt_by_team qualify rk = 1 order by week_number, team, query_cnt desc; 연습문제3-- 3. (2번문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할수 있는 쿼리를 작성해주세요. with query_cnt_by_team AS ( select extract(week FROM query_date) as week_number, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) select *, lag(query_cnt, 1) over(partition by user order by week_number) as prev_week_query_cnt from query_cnt_by_team ; 연습문제4-- 4. 시간에 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요(query_date) select *, sum(query_cnt) over(partition by user order by query_date) as cumulative_sum, sum(query_cnt) over(partition by user order by query_date rows between unbounded preceding and current row) as cumulative_sum2 from( select query_date, team, user, count(user) as query_cnt from `advanced.query_logs` group by all ) order by user, query_date;연습문제5-- 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 null로 기록됩니다.이런 데이터에서 null값이라고 되어 있는 부분은 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select *, last_value(number_of_orders ignore nulls) over(order by date) as last_value_orders from raw_data;연습문제6-- 6. 5번 문제에서 null을 채운후 2일전 ~ 현재의 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) , filled_data as( select * except(number_of_orders), last_value(number_of_orders ignore nulls) over(order by date) as number_of_orders from raw_data ) select *, avg(number_of_orders) over (order by date rows between 2 preceding and current row) as moving_avg from filled_data연습문제7-- 7) app_logs 테이블에서 custom session을 만들어 주세요 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요 session 숫자로 (1,2,3 ...)표시해도 됩니다. -- 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다. with base as( select event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" ), diff_Data as( select *, from( select *, lag(event_datetime, 1) over(partition by user_pseudo_id order by event_datetime) as prev_event_datetime from base ) ) select *, sum(session_start) over (partition by user_pseudo_id order by event_datetime) as session_num from( select *, case when prev_event_datetime is null then 1 when second_diff >= 20 then 1 else 0 end as session_start from diff_data ) order by event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
윈도우 함수의 탐색 함수 : LEAD, LAG, FIRST_VALUE, LAST_VALUE## 문제1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요. -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_next_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 문제2) user들의 다음 접속 월과 다다음 접속 월,이전 접속 월을 구하는 쿼리를 작성해주세요. -- LAG 함수를 사용할 때 NULL이 나온다 => 그 값은 처음이다! -- LEAD 함수를 사용할 때 NULL이 나온다 => 그 값은 마지막이다! -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_next_month -- , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS before_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 3번 : 유저가 접속했을 때, 다음 접속까지의 간격을 구하시오 -- user_id | visit_month | after_visit_month | diff_month -- SELECT -- user_id -- , visit_month -- , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) - visit_month AS diff_month -- FROM `advanced.analytics_function_01` -- ORDER BY user_id ## 윈도우 함수를 이렇게 쓰는게 좋을까? => 중복된 쿼리는 줄이는 것이 좋을 수 있음 -- 쿼리를 수정할 상황이 생김 => 2번 수정 => 굉장히 많아지면 복잡해지고, 실수하기 좋음 -- 쿼리가 길어지는 것을 무서워하지 말고, 쿼리를 덜 수정할 수 있는 구조를 만들면 좋겠다! -- 윈도우 함수 쓰다보면 쿼리 줄이 길어짐. 감안하고 사용하면 좋겠다 -- -- 그래서 서브쿼리로 만들어보면,, -- SELECT -- * -- , (after_month - visit_month) AS diff_month -- FROM ( -- SELECT -- * -- , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_month -- FROM `advanced.analytics_function_01` -- ) -- ORDER BY user_id ## 추가 문제 : 이 데이터셋을 기준으로 user_id의 첫번째 방문 월, 마지막 방문 월을 구하는 쿼리를 작성해주세요 SELECT user_id , visit_month , FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS first_visit , LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS last_visit FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month윈도우 함수 Frame 연습문제-- amount_total : 전체 SUM -- cumulative_sum : row 시점에 누적 SUM -- cumulative_sum_by_user : row 시점에 유저별 누적 SUM -- last_5_orders_avg_amount : order_id 기준으로 정렬하고, 직전 5개의 주문의 평균 amount -- 집계분석함수() OVER(PARTITION BY ~~~ ORDER BY ~~~ ROWS BETWEEN A and B) SELECT * , SUM(amount) OVER() AS amount_total ## OVER 안에 아무것도 들어가지 않는 경우도 있구나! , SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount -- , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) -- , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) FROM advanced.orders ORDER BY order_id 윈도우 함수의 연습문제## 윈도우 함수 연습문제 ## 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. -- 사용자별 쿼리를 실행한 총 횟수 : COUNT() 전체 실행. -- OVER(PARTITION BY user) -- SELECT -- * -- , COUNT(query_date) OVER(PARTITION BY user) AS total_query_cnt -- FROM `advanced.query_logs` -- ORDER BY query_date ## 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요. -- 주차별로 개인당 실행한 쿼리 횟수 -- 위 쿼리 횟수를 기반으로 랭킹 -- 랭킹을 기반으로 필터링(랭킹=1) -- 문제의 의도 : 원본 데이터 => 1 row마다 데이터가 있고, 그걸 집계해서 사용. GROUP BY => 윈도우 함수 -- SELECT -- * -- , RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank -- FROM ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- QUALIFY team_rank = 1 -- ORDER BY week_number, team -- ## ## WITH 문 풀이 -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- *, -- RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk -- FROM query_cnt_by_team -- -- QUALIFY : 윈도우 함수의 조건을 설정할 때 사용 -- QUALIFY rk = 1 -- ORDER BY week_number, team, query_cnt DESC -- -- 16주차 - AI팀의 케이피, 16주차 - 코칭팀의 카일, 16주차 - 데이터 사이언스팀의 샘 ## 3)(2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요 -- 1주 전의 쿼리 실행 수 => LAG -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- user, -- team, -- week_number, -- query_cnt, -- LAG(query_cnt) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt -- FROM query_cnt_by_team -- ORDER BY user -- -- ans T) -- WITH query_cnt_by_team AS ( -- SELECT -- EXTRACT(week from query_date) AS week_number, -- team, -- user, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- SELECT -- *, -- LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt -- FROM query_cnt_by_team ## 4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. -- 누적 쿼리 : 과거의 시간(UNBOUNDED PRECEDING)부터 current row까지 -- 출제 의도 : Default Frame에 대해 알려드리고 싶었음. -- For aggregate analytic functions, if the ORDER BY clause is present but the window frame clause is not, the following window frame clause is used by default: -- SELECT -- *, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum -- FROM ( -- SELECT -- user, -- team, -- query_date, -- COUNT(query_date) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- -- ans T) -- SELECT -- *, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum, -- SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 -- -- Frame의 Default 값 : UNBOUNDED PRECEDING ~ CURRENDT ROW -- FROM ( -- SELECT -- query_date, -- team, -- user, -- COUNT(user) AS query_cnt -- FROM `advanced.query_logs` -- GROUP BY ALL -- ) -- -- QUALIFY cumulative_sum != cumulative_sum2 -- -- -- WHERE, QUALIFY 조건 설정해서 2가지 값이 모두 같은지 비교 => 모두 같으면 != 연산 결과에 반환하는 값이 없을 -- ORDER BY user, query_date ## 5) 다음 데이터는 주문횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. -- LAG로 직전 값 가져오면 되지 않을까? -- number_of_orders가 null 이면, before_number_of_orders를 가져와라! -- 아래 쿼리는 어려운 방법 -- 그 다음 방법 : LAST VALUE를 쓰자! => 값이 없으면 NULL이 뜬다 ! -- FIRST_VALUE, LAST_VALUE => NULL 을 포함해서 연산 -- 출제 의도 : NULL 을 제외해서 연산하고 싶으면 IGNORE NULLS을 쓰면 된다 ! -- WITH raw_data AS ( -- SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL -- SELECT DATE '2024-05-02', 13 UNION ALL -- SELECT DATE '2024-05-03', NULL UNION ALL -- SELECT DATE '2024-05-04', 16 UNION ALL -- SELECT DATE '2024-05-05', NULL UNION ALL -- SELECT DATE '2024-05-06', 18 UNION ALL -- SELECT DATE '2024-05-07', 20 UNION ALL -- SELECT DATE '2024-05-08', NULL UNION ALL -- SELECT DATE '2024-05-09', 13 UNION ALL -- SELECT DATE '2024-05-10', 14 UNION ALL -- SELECT DATE '2024-05-11', NULL UNION ALL -- SELECT DATE '2024-05-12', NULL -- ), -- -- SELECT -- -- date, -- -- IFNULL(number_of_orders, last_value_orders) AS numbers_of_orders -- -- FROM ( -- -- SELECT -- -- *, -- -- -- LAG(number_of_orders) OVER(ORDER BY date) AS prev_orders, ## 마지막 값 NULL !! -- -- LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders -- -- FROM raw_data -- -- ) -- -- -- ans T) -- filled_data AS( -- SELECT -- * EXCEPT(number_of_orders), -- LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders -- FROM raw_data -- ) -- ## 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요 (이동평균) -- -- Frame : 2일 전 => BTWEEN 2 PRECEDING AND CURRENT ROW -- -- 출제 의도 : Frame을 지정할 수 있는가? -- SELECT -- *, -- AVG(number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg -- FROM filled_data ## 7) app_logs 테이블에서 CustomSession을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요. Session은 숫자로(1,2,3…) 표시해도 됩니다. ## 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 WITH base as ( SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' and user_pseudo_id = '1997494153.8491999091' ), base2 as ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) as before_event_datetime FROM base ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) as session_id FROM ( SELECT *, IF(second_diff is NULL or second_diff > 20, 1, NULL) as session_start FROM ( SELECT *, DATETIME_DIFF(event_datetime, before_event_datetime, second) as second_diff FROM base2 ) ) ORDER BY event_timestamp
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
1. 탐색함수 연습문제 문제 1. user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리SELECT user_id, visit_month, lead(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, lead(visit_month,2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2 FROM `advanced.analytics_function_01` ORDER BY user_id문제2. user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리 SELECT user_id, visit_month, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, lead(visit_month,2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2, lag(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lag_visit_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month # LAG 함수를 사용할 때 NULL 이면 그 값은 처음, # LEAD 함수를 사용할 때 NULL 이면 그 값은 마지막 문제3. 유저가 접속했을 때, 다음 접속까지의 간격을 구하기SELECT user_id, visit_month, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month, ((lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) ) - visit_month) AS diff_month # 별칭쓴거는 select 안에서 못함 FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month # 이 경우에는 쿼리 많은경우에 수정할 경우 헷갈리기 시작한다. 따라서 서브쿼리로 묶어서 하면 더 편하고 실수 적어짐 # 쿼리 길어진다고 해도 무서워하지 말고 쿼리 덜 수정할 수 있는 구조를 만들자. # 윈도우 함수 쓰면 줄이 쿼리 길어짐. 감안하고 쓰자.문제 4. 이 데이터셋을 기준으로 user_id의 첫번째 접근 월을 구하는 쿼리를 작성하기# 마지막 추가 문제 SELECT *, FIRST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS First_Value, LAST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS Last_Value FROM ( SELECT *, lead(visit_month,1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month ) ORDER BY user_id, visit_month 2. Frame 연습문제 문제 . 회사의 모든 주문량, 누적 주문량, 최근 직전5개 평균 주문량 구하기 SELECT * , SUM(amount) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total, # OVER 안에 아무것도 없으면 전체 출력이다! SUM(amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum, SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_by_user, AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avg order by order_id 3. 연습문제문제 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성하기. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들기SELECT *, COUNT(user) OVER (PARTITION BY user ) AS cnt FROM advanced.query_logs ORDER BY query_date, user문제 2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구하기. 단, 랭킹이 1등인 사람만 결과가 보이도록 하기 # query_date를 바탕으로 주차별로 구분하여 WITH함수로 묶기 WITH week_number_table AS ( SELECT *, EXTRACT(WEEK FROM query_date) AS week_number, # 하루에 유저별 쿼리 이용수 추출 COUNT(*) AS query_cnt, FROM advanced.query_logs GROUP BY ALL ) # 팀 내에서 유저별로 랭킹 구하기 SELECT week_number, team, query_date, query_cnt, RANK() OVER (PARTITION BY week_number,team ORDER BY query_cnt DESC) AS team_rank FROM week_number_table # 1등인 유저만 출력 QUALIFY team_rank = 1 ORDER BY week_number, team 문제 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성하기# 2번쿼리 WITH week_number_table AS ( SELECT *, EXTRACT(WEEK FROM query_date) AS week_number, COUNT(*) AS query_cnt, FROM advanced.query_logs GROUP BY ALL ) SELECT *, # 1주전의 실행수 이므로 LAG를 이용해서 전 week_number의 실행수 구하기 LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM week_number_table 문제 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성하기# 시간의 흐름에 따라 일자별로 쿼리수 묶기 WITH user_query AS ( SELECT *, COUNT(*) AS query_count FROM advanced.query_logs GROUP BY ALL ) # 유저별로 누적 쿼리수 구하기 SELECT *, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM user_query ORDER BY user # Frame 의 Default 값은 UNBOUNDED PRECEDING ~ CURRENT ROW 이다 # 데이터 정합성 확인 할때 2가지 값이 모두 같은지 비교하면 편리하다 ( WHERE, QUALIFY절에 활용 ) 문제 5. 만약 주문 횟수가 없으면 NULL로 기록됨. 이런 데이터에서 NULL값이라고 되어 있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성하기WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) # CASE문을 이용해서 주문량 NULL 값이면 전 날짜의 주문량으로 대치 SELECT *, (CASE WHEN number_of_orders IS NULL THEN LAG(number_of_orders,1) OVER (ORDER BY date) ELSE number_of_orders END ) AS number_of_orders2 FROM raw_data # BUT 맨 마지막값 NULL이 연속2번이라 NULL이 나옴 # LAST_VALUE의 IGNORE NULLS 쓰기 SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders2 FROM raw_data 문제 6. 5번 문제에서 NULL을 채운 후, 2일전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성하기(이동평균)SELECT date, number_of_orders2, AVG(number_of_orders2) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_agv # NULL값 처리한 테이블 서브쿼리로 묶은 다음에 윈도우 함수써서 2틀전~현재 평균 출력하기 FROM ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders2, FROM raw_data ) 문제 7. app_logs 테이블에서 custom session을 만들기. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들기. session은 숫자로 (1, 2, 3 …) 표시해도 됨.WITH START AS ( SELECT event_date, # timestamp 를 서울 표준 시간으로 바꾸기 DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-16' ), # 기존 시간의 차이 구하기 DIFF_DATE AS ( SELECT *, DATETIME_DIFF(event_datetime, before_event_datetime, SECOND) AS second_diff FROM ( SELECT # 전(LAG) 시간을 불러오기 위해 서브쿼리로 묶음 *, LAG(event_datetime, 1 ) OVER ( PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_event_datetime FROM START) ORDER BY event_datetime) # session_start를 누적합 이용하여 session_number 구하 SELECT *, SUM(session_start) OVER (PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_number FROM ( SELECT *, # CASE문을 써서 처음 시작부분 (NULL) 1로 바꾸기 (CASE WHEN second_diff IS NULL THEN 1 # second_diff 가 20초 이상이면 1 아니면 0 WHEN second_diff >= 20 THEN 1 ELSE 0 END) AS session_start FROM DIFF_DATE)
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME설정, QUALITY
윈도우 탐색 함수 연습문제(1) 연습문제 1-- 문제 1) USER의 다음 접속월, 다다음 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next FROM `avdanced.analytics_function_01` (2) 연습문제 2-- 문제 2) USER의 다음 접속월, 다다음 접속 월, 이전 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next, LAG(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS last_month FROM `avdanced.analytics_function_01` 윈도우 함수 FRAME 연습문제연습문제 (1~4)SELECT -- 1)모든 주문량 SUM(amount) OVER() AS amount_total, -- 2)특정주문시점에서 누적주문량 #SUM(amount) OVER(partition by order_date) AS cumulative_sum, SUM(amount) OVER (ORDER BY order_date) AS cumulative_sum, -- 3)고객별 주문 시점에서 누적 주문량 #SUM(amount) OVER(partition by user_id) AS cumulative_sum_by_user, SUM(amount) OVER(partition by user_id ORDER BY order_id) AS cumulative_sum_by_user, -- 4) 최근 직전 5개 평균 주문량 AVG(amount) OVER(ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_5_unbounded_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS all_orders_avg_amount FROM `avdanced.orders` 윈도우 함수(1) 연습문제 1-- 연습문제1) 사용자별 쿼리 실행 횟수 WITH base AS( SELECT user, team, query_date, COUNT(*) OVER(PARTITION BY user) AS total_query_cnt, FROM `avdanced.query_logs` ) SELECT * FROM base(2) 연습문제 2-- 연습문제2) 주차별 팀내 쿼리 실행한 수 (RANK 1만 보이도록) WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` ORDER BY EXTRACT(WEEK FROM query_date) ) SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY total_query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team(3) 연습문제 3WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` #QUALIFY team_rank = 1 ORDER BY EXTRACT(WEEK FROM query_date) ), base3 AS( SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team ) -- 연습문제3) 쿼리 실행 시점 1주전 쿼리 실행 SELECT DISTINCT *, LAG(query_cnt,1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM base2 GROUP BY ALL ORDER BY user, week_number(4) 연습문제 4--연습문제4) SELECT *, SUM(query_count) OVER(PARTITION BY user ORDER BY query_date) AS culmulative_query_count, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 FROM( SELECT DISTINCT *, COUNT(user) OVER(PARTITION BY query_date, user) AS query_count, FROM `avdanced.query_logs` ) ORDER BY user,query_date (5) 연습문제 5WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) FROM raw_data(6) 연습문제 6WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), null_is_lag AS( --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) AS number_of_orders FROM raw_data ) -- 연습문제 6) 이동평균 SELECT *, AVG(nl.number_of_orders) OVER(ORDER BY nl.date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM null_is_lag AS nl(7) 연습문제 7-- 1. TIMESTAMP → DATETIME -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 -- 4. SESSION_ID생성: SESSION_START가 1일 경우 SESSION_ID +1, NULL일 경우 LAG(DATA,1) WITH add_date AS ( -- 1. TIMESTAMP → DATETIME SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp)) AS event_datetime, event_name, user_id, user_pseudo_id, LAG(DATETIME(TIMESTAMP_MICROS(event_timestamp))) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS before_event_datetime FROM `avdanced.app_logs_temp` --,UNNEST(event_params) AS param -- FROM 절 안에서 UNNEST를 사용 WHERE event_date ="2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ), add_diff AS ( -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - SELECT *, DATE_DIFF(event_datetime, before_event_datetime,SECOND) AS second_diff, FROM add_date ), add_session AS( -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 SELECT *, IF(second_diff IS NULL OR second_diff >=20, 1, NULL) AS session_start FROM add_diff ) -- 4. SESSION_ID생성 *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM add_session ORDER BY event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
1.--1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단 Group By를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. select * , count(query_date) over(partition by user) as total_query_cnt from advanced.query_logs order by user, query_date2.--2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단 랭킹이 1등인 사람만 결과가 보이도록 해주세요. select * , rank() over(partition by week_number, team order by query_cnt desc) as team_rank from ( select EXTRACT(WEEK FROM query_date) as week_number , team , user , count(user) as query_cnt from advanced.query_logs group by all ) as base qualify team_rank = 1 order by 1,2,3 3.--3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. select * , lag(query_cnt) over(partition by user order by week_number) as prev_week_query_cnt from ( select EXTRACT(WEEK FROM query_date) as week_number , team , user , count(user) as query_cnt from advanced.query_logs group by all ) as base order by user4.--4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. select * , sum(query_cnt) over(partition by user order by query_date) as cumulative_query_cnt from ( select user , team , query_date , count(user) as query_cnt from advanced.query_logs group by all ) as base order by user5.--5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select * , ifnull(number_of_orders,last_value(number_of_orders ignore nulls) over(order by date)) as last_value_orders from raw_data6.--6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동 평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select date , number_of_orders , avg(number_of_orders) over(order by date rows between 2 preceding and current row) as moving_avg from (select date , ifnull(number_of_orders,last_value(number_of_orders ignore nulls) over(order by date)) as number_of_orders from raw_data) as base7.--7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. Session은 숫자로 (1,2,3...) 표시해도 됩니다 --2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다. with base as ( select event_date , event_timestamp , datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime , event_name , user_id , user_pseudo_id , lag(datetime(timestamp_micros(event_timestamp), 'Asia/Seoul')) over(partition by user_pseudo_id order by event_timestamp) as before_event_datetime from advanced.app_logs ) , diff_data as ( select * , if(second_diff > 20 or second_diff is null, 1, null) as session_start from ( select * , DATETIME_DIFF(event_datetime, before_event_datetime, second) as second_diff from base ) ) select * , sum(session_start) over(order by event_timestamp) as session_num from diff_data -- where event_date = '2022-08-18' -- and user_pseudo_id = '1997494153.8491999091' order by event_timestamp
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
이번주는 평일 주말에 일이 많아서 따라가기 힘들었네요 ㅎㅎ그치만 마무리해서 올립니다!! #유저들의 다음 접속 월, 다다음 접속 월SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month)FROM advanced.analytics_function_01#유저들의 다음 접속월, 다다음 접속 월, 이전 접속 월SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month),LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month)FROM advanced.analytics_function_01ORDER BY user_id#diff 구하기SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month),LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) - visit_monthFROM advanced.analytics_function_01ORDER BY user_idSELECT after_visit_month - visit_monthFROM(SELECT user_id,visit_month,LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_monthFROM advanced.analytics_function_01ORDER BY user_id)#윈도우함수SELECT *,SUM(amount) OVER () AS amount_total,SUM(amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum,SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_user,AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avgFROM advanced.ordersORDER BY order_id#윈도우함수 연습문제 1SELECT *, COUNT(user) OVER(PARTITION BY user)FROM advanced.query_logs#윈도우함수 연습문제 2WITH nums AS (SELECT EXTRACT(week FROM query_date) AS week_number,team,userFROM advanced.query_logs), numss AS (SELECT week_number, team, user, COUNT(user) AS query_cntFROM numsGROUP BY week_number, team, user), rnks AS (SELECT *, RANK() OVER (PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rkFROM numssORDER BY week_number, team, user)SELECT *FROM rnksWHERE rk = 1#윈도우함수 연습문제 3WITH nums AS (SELECT EXTRACT(week FROM query_date) AS week_number,team,userFROM advanced.query_logs), numss AS (SELECT week_number, team, user, COUNT(user) AS query_cntFROM numsGROUP BY week_number, team, user)SELECT user, team, week_number, query_cnt, LAG(query_cnt) OVER (PARTITION BY user ORDER BY week_number)FROM numssORDER BY user, team#윈도우함수 연습문제 4WITH cntcnt AS (SELECT user, team, query_date, COUNT(user) AS query_countFROM advanced.query_logsGROUP BY user, team, query_date)SELECT user, team, query_date, query_count, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date) AS cntFROM cntcnt#윈도우함수 연습문제 5WITH raw_data AS(SELECT DATE '2024-05-01' AS date,15 AS number_of_orders UNION ALLSELECT DATE '2024-05-02',13 UNION ALLSELECT DATE '2024-05-03',NULL UNION ALLSELECT DATE '2024-05-04',16 UNION ALLSELECT DATE '2024-05-05',NULL UNION ALLSELECT DATE '2024-05-06',18 UNION ALLSELECT DATE '2024-05-07',20 UNION ALLSELECT DATE '2024-05-08',NULL UNION ALLSELECT DATE '2024-05-09',13 UNION ALLSELECT DATE '2024-05-10',14 UNION ALLSELECT DATE '2024-05-11',NULL UNION ALLSELECT DATE '2024-05-12',NULL), raws AS (SELECT date, number_of_orders, LAG(number_of_orders) OVER(ORDER BY date) AS beforesFROM raw_data)SELECT date, COALESCE(number_of_orders, befores)FROM raws#윈도우함수 연습문제 6WITH raw_data AS(SELECT DATE '2024-05-01' AS date,15 AS number_of_orders UNION ALLSELECT DATE '2024-05-02',13 UNION ALLSELECT DATE '2024-05-03',NULL UNION ALLSELECT DATE '2024-05-04',16 UNION ALLSELECT DATE '2024-05-05',NULL UNION ALLSELECT DATE '2024-05-06',18 UNION ALLSELECT DATE '2024-05-07',20 UNION ALLSELECT DATE '2024-05-08',NULL UNION ALLSELECT DATE '2024-05-09',13 UNION ALLSELECT DATE '2024-05-10',14 UNION ALLSELECT DATE '2024-05-11',NULL UNION ALLSELECT DATE '2024-05-12',NULL), raws AS (SELECT date, number_of_orders, LAG(number_of_orders) OVER(ORDER BY date) AS beforesFROM raw_data), dates AS (SELECT date, COALESCE(number_of_orders, befores) AS ordersFROM raws)SELECT date, orders, AVG(orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)FROM dates#윈도우함수 연습문제7마지막 문제는 스스로 풀진 못했고 강의의 힘을 빌려 마무리했습니다! ㅎㅎ
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝 스터디 2주차 과제] 윈도우 함수 연습 문제
1. 윈도우 함수 탐색 함수 연습 문제1) 유저의 다음 접속 월과 다다음 접속 월 구하기SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead2_visit_month FROM advanced.analytics_function_01 2) 이전 접속월 구하기SELECT user_id , visit_month , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS before_visit_month FROM advanced.analytics_function_01 3) 다음 접속 월까지의 간격 구하기SELECT * , after_visit_month - visit_month AS diff_month FROM ( SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month FROM advanced.analytics_function_01 ) 4) 첫번째와 마지막 방문 월 구하기SELECT user_id , visit_month , FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS first_visit_month , LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS last_visit_month FROM advanced.analytics_function_01 2. 윈도우 함수 Frame 연습 문제SELECT * , SUM(amount) OVER () AS amount_total , SUM(amount) OVER (ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id) AS user_cumulative_sum , AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_avg FROM advanced.orders 3. 윈도우 함수 연습문제1) 사용자 별 쿼리를 실행한 총 횟수 구하는 쿼리를 작성해주세요.SELECT *, COUNT(query_date) OVER(PARTITION BY user ORDER BY query_date) AS total_query_cnt # 38 row FROM `advanced.query_logs` 2) 주차별로 팀내에서 쿼리를 많이 실행한 수를 구한후, 실행한수를 활용해 랭킹을 구해주세요. WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) SELECT *, RANK() OVER(PARTITION BY team ORDER BY query_cnt) AS rk FROM base QUALIFY rk = 1 ORDER BY week_number, team, query_cnt DESC 3) 쿼리를 실행한 시점 기준1주전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) SELECT *, LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_cnt FROM base4) 시간의 흐름에 따라,일자별로 유저가 실행한 누적 쿼리수를 작성해주세요SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_sum, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 FROM( SELECT query_date, team, user, COUNT(user) AS query_cnt FROM advanced.q5) NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data6) 5번문제에서 NULL을 채운 후, 2일전~현재데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균)WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) , filled_date AS( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders FROM raw_data) SELECT *, AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_date7) app_logs테이블에서 CustomSession을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요.WITH base AS( SELECT EVENT_DATE, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = "1997494153.8491999091" ) , diff_date AS(SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff FROM ( SELECT *, LAG(EVENT_DATETIME, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime FROM base ORDER BY base.event_datetime)) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM ( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS session_start FROM diff_date)
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
#1SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month2 FROM advanced.analytics_function_01 ORDER BY user_id;#2SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month2, LAG(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS before_visit_month1 FROM advanced.analytics_function_01 ORDER BY user_id, visit_month;#3SELECT *, after_visit_month - visit_month AS diff FROM ( SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, FROM advanced.analytics_function_01 );#4SELECT *, FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first, LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last FROM advanced.analytics_function_01;#5SELECT *, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user, query_date;#6WITH query_cnt_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) AS query_cnt FROM advanced.query_logs GROUP BY week_number,team,user ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk FROM query_cnt_cnt QUALIFY rk = 1 ORDER BY week_number, team, query_cnt DESC;#7WITH query_cnt_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) AS query_cnt FROM advanced.query_logs GROUP BY week_number,team,user ) SELECT *, LAG(query_cnt, 1) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM query_cnt_cnt;#8WITH query_cnt AS ( SELECT *, COUNT(*) AS query_cnt_cnt FROM advanced.query_logs GROUP BY user, team, query_date ) SELECT *, SUM(query_cnt_cnt) OVER (PARTITION BY user ORDER BY query_date ) AS query_cnt_cnt_cnt FROM query_cnt;#9WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS last_value_orders FROM raw_data;#10WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders FROM raw_data ) SELECT *, AVG(number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_data#11WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE 1=1 AND event_date = "2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ) , diff_date AS ( SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff # second_diff 기반으로 새로운 세션의 시작일지, 아닐지를 판단할 수 있음 FROM( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime -- DATETIME_DIFF() => 차이를 구할 수 있음 FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM ( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS session_start FROM diff_data ORDER BY event_datetime )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
1번-- 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. -- 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. SELECT *, COUNT(query_date) over(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user, query_date 2번-- 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. -- 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 WITH query_cnt_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt desc) AS rk FROM query_cnt_team QUALIFY rk = 1 ORDER BY week_number, query_cnt desc; 3번-- 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성헤주세요. WITH query_cnt_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt) OVER(PARTITION BY team, user ORDER BY week_number) AS prev_week_query_cnt FROM query_cnt_team 4번-- 4) 시간의 흐름(query_date)에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. WITH query_cnt_team AS ( SELECT query_date, team, user, COUNT(user) as query_cnt FROM advanced.query_logs GROUP BY ALL ORDER BY user, query_date ) SELECT user, team, query_date, query_cnt, SUM(query_cnt) OVER(PARTITION BY team, user ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM query_cnt_team ORDER BY user, query_date; 5번-- 5) 주문 횟수를 나타낸 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ) SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data; 6번-- 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요.(이동 평균) WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data ) SELECT * FROM filled_data; 7번-- 7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. -- Session은 숫자로 (1, 2, 3 ...) 표시해도 됩니다. WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' ), diff_date AS ( SELECT *, DATETIME_DIFF(event_datetime, pre_event_time, SECOND) AS second_diff FROM ( SELECT *, LAG(event_datetime) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC) AS pre_event_time FROM base ) ), session_start AS ( SELECT *, CASE WHEN pre_event_time IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS start_session FROM diff_date ) SELECT *, SUM(start_session) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id FROM session_start ORDER BY event_datetime;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
# 윈도우 함수(탐색 함수) 연습 문제 1 SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_2 FROM advanced.analytics_function_01 ORDER BY user_id; # 윈도우 함수(탐색 함수) 연습 문제 2 SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_1, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lead_2, LAG(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS lag_1 FROM advanced.analytics_function_01 ORDER BY user_id;# 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. SELECT *, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY query_date# 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank FROM base QUALIFY team_rank = 1 ORDER BY week_number, team # 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요 WITH base AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT user, team, week_number, query_cnt, LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM base# 4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요 WITH base AS ( SELECT user, team, query_date, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM base# 5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요 WITH raw_data AS ( SELECT DATE'2024-05-01'AS date, 15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02', 13 UNION ALL SELECT DATE'2024-05-03', NULL UNION ALL SELECT DATE'2024-05-04', 16 UNION ALL SELECT DATE'2024-05-05', NULL UNION ALL SELECT DATE'2024-05-06', 18 UNION ALL SELECT DATE'2024-05-07', 20 UNION ALL SELECT DATE'2024-05-08', NULL UNION ALL SELECT DATE'2024-05-09', 13 UNION ALL SELECT DATE'2024-05-10', 14 UNION ALL SELECT DATE'2024-05-11', NULL UNION ALL SELECT DATE'2024-05-12', NULL ) SELECT *, IFNULL(number_of_orders, LAG(number_of_orders,1) OVER(ORDER BY date ASC)) AS non_null FROM raw_data; SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date ASC) AS non_null FROM raw_data# 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동 평균) WITH raw_data AS ( SELECT DATE'2024-05-01'AS date, 15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02', 13 UNION ALL SELECT DATE'2024-05-03', NULL UNION ALL SELECT DATE'2024-05-04', 16 UNION ALL SELECT DATE'2024-05-05', NULL UNION ALL SELECT DATE'2024-05-06', 18 UNION ALL SELECT DATE'2024-05-07', 20 UNION ALL SELECT DATE'2024-05-08', NULL UNION ALL SELECT DATE'2024-05-09', 13 UNION ALL SELECT DATE'2024-05-10', 14 UNION ALL SELECT DATE'2024-05-11', NULL UNION ALL SELECT DATE'2024-05-12', NULL ) SELECT *, AVG(number_of_orders) OVER(ORDER BY date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM ( SELECT date, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders FROM raw_data )# 7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. Session은 숫자로(1,2,3...) 표시해도 됩니다 # 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다 WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = '1997494153.8491999091' ), diff_data AS ( SELECT *, DATETIME_DIFF(event_datetime, before_1_event_datetime, SECOND) AS second_diff FROM ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC) AS before_1_event_datetime FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id FROM ( SELECT *, CASE WHEN second_diff IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE NULL END AS session_start FROM diff_data )
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
탐색 함수 연습문제 11. user 들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요SELECT user_id, visit_month, LEAD (visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, LEAD (visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_two_visit_month FROM advanced.analytics_function_01 user 들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요-- 다음접속월 LEAD -- 다다음접속월 LEAD -- 이전접속월 LAG SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS after_visit_month, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS after_two_visit_month, LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month ASC) AS before_visit_month FROM advanced.analytics_function_01유저가 접속했을 때, 다음 접속까지의 간격을 구하시오-- user_id | visit_month | after_visit_month | diff_month -- 일단 LEAD, LAG 구하고 after visit month 에서 visit month 뺴기 SELECT *, (after_visit_month - visit_month) AS diff_month FROM ( SELECT *, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, FROM advanced.analytics_function_01 ) ORDER BY user_id, visit_month ASC별칭을 정의하고 실행 한 뒤 다음 쿼리에서 그걸 이용해야 하기 때문에 서브쿼리를 사용해야함길이가 짧은 쿼리 SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) - visit_month AS diff_month FROM advanced.analytics_function_01길이는 더 짧지만 같은 코드가 중복되기 때문에 코드를 수정해야 할 경우 두 번 수정해야함.user_id 의 첫번째 방문 월, 마지막 방문 월을 구하는 쿼리를 작성해주세요-- 첫번째방문 : FIRST_VALUE -- 마지막방문 : LAST_VALUE SELECT *, FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_visit_month, LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_visit_month FROM advanced.analytics_function_01 ORDER BY user_id, visit_monthFRAME 연습문제 2SUM : amount_total 구하기누적합 : cumulative_sum 구하기유저별 누적합 : cumulative_sum(user)직전 5개 평균값 : last_5_avg ** 직전이므로 current row 포함되면 안 됨SELECT *, -- amount 전체 합 SUM(amount) OVER() AS amount_total, -- 누적 합 cumulative_sum SUM(amount) OVER(ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cumulative_sum, -- 유저별 누적합 cumulative_sum(user) SUM(amount) OVER( PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as cumulative_sum_user, -- 직전 5개 주문의 평균값 last_5_avg AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) as last_5_avg FROM advanced.orders ORDER BY order_id 윈도우 함수 연습문제 3사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요 단 GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요SELECT *, COUNT(query_date) OVER( PARTITION BY user ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as total_query_cnt FROM advanced.query_logs ORDER BY query_date주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요WITH weekly AS ( -- 날짜를 주로 변환 SELECT *, EXTRACT(WEEK FROM query_date) AS week_number FROM advanced.query_logs ), querycnt AS ( -- 쿼리 실행 횟수 구하기 SELECT week_number, team, user, COUNT(query_date) OVER (PARTITION BY week_number, user) AS query_cnt FROM weekly ), team_ranks AS ( -- 팀 내에서 쿼리 많이 작성한 사람 랭크 매겨서, 주차별로 파티션 나누고, 랭크 1만 보이게 SELECT week_number, team, user, query_cnt, ROW_NUMBER() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank -- 랭킹 행이 여러개 출력되어서 ROW_NUMBER 로 하나만 나오게 구함 FROM querycnt ) SELECT week_number, team, user, query_cnt, team_rank FROM team_ranks WHERE team_rank = 1 ORDER BY week_number ASC, team ASC;-- GROUP BY 이용 WITH query_cnt_by_team AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank FROM query_cnt_by_team QUALIFY team_rank =1 ORDER BY week_number, team, query_cnt DESC2번 문제에서 사용한 주차별 쿼리 사용해 쿼리를 실행한 시점 기준 1주전 에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요WITH query_cnt_by_team AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT user, team, week_number, query_cnt, LAG(query_cnt) OVER (PARTITION BY team, user ORDER BY week_number, query_cnt) AS prev_week_query_cnt FROM query_cnt_by_team ORDER BY user, team시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요-- 시간의 흐름에 따라 => query_date 기준 -- 유저별, 일자별 쿼리수 합계 => SUM() OVER(PARTITION BY user ~ -- 누적 쿼리수 계산 새 컬럼 cumulative_query_cnt => SUM() OVER(PARTITION BY ORDER BY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_cnt WITH querycnt AS( SELECT user, team, query_date, COUNT(user) AS query_count FROM advanced.query_logs GROUP BY ALL ORDER BY user ASC, query_count DESC ) SELECT user, team, query_date, query_count, SUM(query_count) OVER(PARTITION BY team, user ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_cnt FROM querycnt ORDER BY user, query_date ASC, query_count DESC다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_orders FROM raw_data5번 문제에서 NULL을 채운 후 2일전 ~ 현재의 데이터의 평균을 구하는 쿼리를 작성해주세요SELECT date, last_orders, AVG(last_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg FROM( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_orders FROM raw_data )app_logs 테이블에서 Custom Session을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요 Session은 숫자로 (1,2,3...) 표시해도 됩니다. 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다.WITH time2 AS ( SELECT event_date, event_timestamp, TIMESTAMP_MICROS(event_timestamp) + INTERVAL 9 HOUR AS event_datetime, -- 구글링 검색하다가 한국 시간 기준으로 변환하는 방법찾아서 적용해봄 event_name, user_id, user_pseudo_id FROM advanced.app_logs ), lagged_data AS ( SELECT *, LAG(event_datetime) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_event_datetime FROM time2 -- 결과 짧게 보려고 날짜 필터링 WHERE event_date = '2022-08-18' ), session_data AS ( SELECT *, TIMESTAMP_DIFF(event_datetime, before_event_datetime, SECOND) AS second_diff FROM lagged_data ) SELECT *, -- session_start: 첫 이벤트에만 1, 나머지는 NULL IF(before_event_datetime IS NULL, 1, NULL) AS session_start, -- session_id: 첫 이벤트에서 1로 시작하고, 이후 20초 이상 차이가 생길 때마다 1씩 증가 SUM(IF(before_event_datetime IS NULL OR second_diff >= 20, 1, 0)) OVER ( PARTITION BY user_pseudo_id ORDER BY event_datetime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS session_id FROM session_data ORDER BY user_pseudo_id, event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
1. 탐색 함수 연습 문제-- 1. user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요. -- partition 유저, 접속 월 정렬, LEAD(1), LEAD(2) SELECT user_id , visit_month , LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_month , LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after2_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month -- 2. user들의 다음 접속 월과 다다음 접속 월,이전 접속 월을 구하는 쿼리를 작성해주세요. -- partition: 유저, order: 접속 월, LEAD(1), LEAD(2), LAG(1) SELECT user_id , visit_month , LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_month , LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after2_month , LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS before_month FROM `advanced.analytics_function_01` ORDER BY user_id, visit_month -- 3. 유저가 접속했을 때, 다음 접속까지의 간격을 구하시오. -- partition: 유저, order: 접속 월, 간격: LEAD(1) - 접속 월 SELECT after_visit_month - visit_month AS diff_month FROM ( user_id , visit_month , LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month FROM `advanced.analytics_function_01` ) ORDER BY user_id, visit_month -- 4. user_id의 첫번째 방문 월, 마지막 방문월을 구하는 쿼리를 작성해주세요. -- partition: 유저, order: 접속 월, FIRST(전체 범위), LAST(전체 범위) SELECT DISTINCT user_id, first_visit_month, last_visit_month FROM ( SELECT user_id -- , visit_month , FIRST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_visit_month , LAST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_visit_month FROM `advanced.analytics_function_01` ) ORDER BY user_id 2. Frame 연습 문제-- 1. 우리 회사의 모든 주문량은?(amount_total) : SUM(amount, 전체 범위) -- 2. 특정 주문 시점에서 누적 주문량은?(cumulative_sum) : SUM(order by 주문id) -- 3. 고객별 주문 시점에서 누적 주문량은?(cumnulative_sum_by_user) : SUM(partition by 유저,order by 주문id) -- 4. 최근 직전 5개의 평균 주문량은?(last_5_orders_avg_amount) : AVG(order by 주문id, 5전 ~ 1전) SELECT * # amount_total : SUM(amount) OVER()와 동일한 결과 출력됨 , SUM(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total , SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM `advanced.orders` ORDER BY order_id 3. 윈도우 함수 총정리 연습 문제-- 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌, query_logs 데이터의 우측에 새로운 컬럼을 만들어주세요. -- 동명이인 없음. COUNT(query_date) OVER(partition by 유저) SELECT * , COUNT(query_date) OVER(PARTITION BY user) AS total_query_cnt FROM `advanced.query_logs` ORDER BY user, query_date -- 2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요. -- 1) week, 주차별 실행 수 (중복제거 포함) WITH query_log_week_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number , team , user , COUNT(query_date) AS query_cnt FROM `advanced.query_logs` GROUP BY ALL ) -- 2) RANK() OVER(PARTITION BY week, team ORDER BY 주차별 실행 수 DESC) / QUALIFY rank = 1 SELECT week_number , team , user , query_cnt , RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank FROM query_log_week_cnt WHERE 1=1 QUALIFY team_rank = 1 ORDER BY week_number, team, user -- 3. 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. -- 1) week, 주차별 실행 수 (중복제거 포함) WITH query_log_week_cnt AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number , team , user , COUNT(query_date) AS query_cnt FROM `advanced.query_logs` GROUP BY ALL ) -- 2) LAG(쿼리cnt) OVER(partition by 유저 order by 주차) SELECT user , team , week_number , LAG(query_cnt) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM query_log_week_cnt ORDER BY user, team, week_number -- 4. 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. -- 1) 일자별 쿼리 수 (집계) WITH query_log_daily_cnt AS ( SELECT user , team , query_date , COUNT(query_date) AS query_cnt FROM `advanced.query_logs` GROUP BY ALL ) -- 2) 일자별 누적 쿼리 수 : SUM(쿼리수) OVER(파티션: 유저, 정렬: 일자) SELECT * , SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_query_cnt FROM query_log_daily_cnt ORDER BY user, query_date-- 5. NULL 값 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) -- if null, lag, date 정렬 = 마지막 연속 null case 처리 불가.. -- : last value, ignore nulls 활용 SELECT * , LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS fill_order_num FROM raw_data ORDER BY date-- 6. NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요. (이동평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) -- last value, ignore nulls 활용 , fill_data AS ( SELECT * , LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS fill_order_num FROM raw_data ) -- AVG(주문 수) OVER(date 정렬 rows between 2일전 ~ 현재) SELECT date , fill_order_num , AVG(fill_order_num) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM fill_data ORDER BY date-- 7. custom session 생성 (이전 이벤트 로그와 20초가 지나면 새로운 session) -- 컬럼 생성: event_datetime, 이전 event_datetime, 기준 - 이전 event_datetime, >= 20초 : session_id 1 WITH app_log_datetime AS ( SELECT event_date , event_timestamp , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime , event_name , user_id , user_pseudo_id FROM `advanced.app_logs` WHERE event_date = '2022-08-18' AND user_pseudo_id = '1997494153.8491999091' ) -- 이전 event_datetime , app_log_datetime_before AS ( SELECT * , LAG(event_datetime) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_event_datetime FROM app_log_datetime ) -- 기준 - 이전 event_datetime , app_log_diff AS ( SELECT * , DATETIME_DIFF(event_datetime, before_event_datetime, SECOND) AS second_diff FROM app_log_datetime_before ) -- session >= 20, 첫번째 값 null 처리 , app_log_session_start AS ( SELECT * , CASE WHEN before_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 -- session을 나누는 기준(서비스 특성 고려하여 선정 필요, 보통 앱로그에서는 30초 60초) ELSE 0 END AS session_start FROM app_log_diff ) -- 누적 합을 통해 session_num 생성 SELECT * , SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM app_log_session_start
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
사용자별 쿼리를 실행한 총 횟수SELECT *, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user,query_date주차별 팀 내 쿼리 실행횟수 1등SELECT *, RANK() OVER(PARTITION BY week_number,team ORDER BY query_cnt DESC) AS team_rank FROM ( SELECT EXTRACT(week FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) AS query_logs_byweek QUALIFY team_rank = 1쿼리 실행시점 기준 1주전 쿼리 실행 횟수SELECT *, LAG(query_cnt,1) OVER(ORDER BY week_number) AS prev_week_query_cnt FROM ( SELECT EXTRACT(week FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) AS query_logs_byweek 시간의 흐름에 따라, 일자별 유저의 누적 쿼리 실행 수SELECT *, SUM(query_count) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM ( SELECT *, COUNT(user) AS query_count FROM advanced.query_logs GROUP BY ALL ) AS query_logs_byday 주문 횟수가 없으면 NULL로 기록된 데이터. NULL값을 이전 날짜의 값으로 채워주기 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT date, IF(number_of_orders IS NULL, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date), number_of_orders) AS number_of_orders FROM raw_data ORDER BY date5번 문제에서 NULL을 채운 후, 2일전~현재데이터의 평균을 구하는 쿼리(이동 평균)SELECT *, AVG(number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM ( SELECT date, IF(number_of_orders IS NULL, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date), number_of_orders) AS number_of_orders FROM raw_data ) AS dataapp_logs테이블에서 custom session 만들기. 이전 이벤트 로그와 20초가 지나면 새로운 session을 표시(숫자로 1,2,3,..)WITH base AS ( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp),'Asia/Seoul') AS event_datetime, event_name, user_pseudo_id FROM advanced.app_logs WHERE user_pseudo_id = '1997494153.8491999091' AND event_date = '2022-08-18' ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM ( SELECT *, CASE WHEN second_diff IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 END AS session_start FROM ( SELECT *, Lag(event_datetime,1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_event_datetime, DATETIME_DIFF(event_datetime, Lag(event_datetime,1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime),SECOND) AS second_diff FROM base ) AS A ) AS B
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
문제 1.SELECT *, COUNT(query_date) OVER (PARTITION BY user ORDER BY user) AS total_query_cnt FROM advanced.query_logs 문제2.WITH table AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number ORDER BY query_cnt) AS team_rank FROM table QUALIFY team_rank = 1 문제3.WITH table AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM table문제4.WITH query_count_table AS ( SELECT *, COUNT(*) AS query_count FROM advanced.query_logs GROUP BY ALL ) SELECT *, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date) AS cululative_query_count FROM query_count_table 문제5.WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ) SELECT date, IF(number_of_orders is null , last_value(number_of_orders IGNORE NULLS) OVER(ORDER BY date asc), number_of_orders) AS number_of_orders_not_null FROM raw_data;문제6.WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02', 13 UNION ALL SELECT DATE'2024-05-03', NULL UNION ALL SELECT DATE'2024-05-04', 16 UNION ALL SELECT DATE'2024-05-05', NULL UNION ALL SELECT DATE'2024-05-06', 18 UNION ALL SELECT DATE'2024-05-07', 20 UNION ALL SELECT DATE'2024-05-08', NULL UNION ALL SELECT DATE'2024-05-09', 13 UNION ALL SELECT DATE'2024-05-10', 14 UNION ALL SELECT DATE'2024-05-11', NULL UNION ALL SELECT DATE'2024-05-12', NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders FROM raw_data ) SELECT *, AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_data문제 7.WITH base AS( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = "1997494153.8491999091" ), diff_data AS( SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff FROM ( SELECT *, # 직전 이벤트 시간을 prev_event_datetime으로 만들기 LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE NULL END AS session_start FROM diff_data ) ORDER BY event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습문제
윈도우 함수(탐색 함수) 연습 문제윈도우 함수 연습 문제 1문제1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요.SELECT user_id, visit_month, LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_two_visit_month FROM advanced.analytics_function_01 윈도우 함수 연습 문제 2문제2) user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요.SELECT user_id, visit_month, LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS after_two_visit_month, LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS before_visit_month FROM advanced.analytics_function_01 윈도우 함수 FRAME 연습 문제-- amount_total : 전체 sum-- cumulative_sum : row 시점에 누적 sum-- cumulative_sum_by_user : 유저별 row 시점에 누적 sum-- last_5_orders_avg_amount : order_id 기준으로 정렬하고, 직전 5개의 주문의 평균 amountSELECT -- SUM(amount), SUM(amount) OVER() AS amount_total, # OVER 안에 아무것도 들어가지 않는 경우도 있구나! SUM(amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum, -- SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_by_user, SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user, AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_order_avg_amount FROM advanced.orders ORDER BY order_id 윈도우 함수 연습문제 (1번)1번사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단,GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요.-- 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단,GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. -- SELECT * -- FROM advanced.query_logs SELECT user, team, query_date, COUNT(user) OVER(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user, query_date 윈도우 함수 연습문제 (2번~6번)주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요.문제의 의도 : 원본 데이터 ⇒ 1 ROW 마다 데이터가 있고, 그걸 집계해서 사용. GROUP BY ⇒ 윈도우 함수WITH query_ct_by_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER (PARTITION BY week_number, team ORDER BY query_cnt DESC ) AS rk FROM query_ct_by_team WHERE 1=1 -- QUALIFY : 윈도우 함수의 조건을 설정할 때 사용 QUALIFY rk = 1 -- and week_number = 16 ORDER BY week_number, team, query_cnt DESC (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요.WITH query_ct_by_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM query_ct_by_team ORDER BY user, week_number 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요.WITHWITH qcnt AS( SELECT *, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) -- 누적 쿼리 : Frame. 과거의 시간(UNBOUNDED PRECEDING)부터 current, low까지 합쳐라 SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_query_cc, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_query_cc2 -- Frame의 Default 값 : UNBOUNDED PRECEDING ~ CURRENT ROW FROM qcnt ORDER BY user, query_date SUBQUERYSELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_query_cc, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_query_cc2 -- Frame의 Default 값 : UNBOUNDED PRECEDING ~ CURRENT ROW FROM (SELECT *, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) ORDER BY user, query_date cumulative_sum 비교SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date) AS cumulative_query_cc, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_query_cc2 -- Frame의 Default 값 : UNBOUNDED PRECEDING ~ CURRENT ROW FROM (SELECT *, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) -- WHERE, QUALIFY 조건 설정해서 2가지 값이 모두 같은지 비교 -- => 모두 같으면 != 연산 결과에 반환하는 값이 없을 것 QUALIFY cumulative_query_cc != cumulative_query_cc2 ORDER BY user, query_date 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL값이라고 되어 있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요.LAGSELECT date, IF(number_of_orders IS NULL, LAG(number_of_orders) OVER(ORDER BY date), number_of_orders) AS number_of_orders, FROM raw_data order by date LAST_VALUE SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data -- FIRST_VALUE, LAST_VALUE => NULL을 포함해서 연산 -- NULL을 제외해서 연산하고 싶으면 IGNORE NULLS를 쓰면 된다! 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요 (이동평균).WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), filled_data AS ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data ) SELECT * EXCEPT(number_of_orders), ROUND(AVG(last_value_orders) OVER(ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS number_of_orders FROM filled_data -- SELECT * EXCEPT(number_of_orders), -- ROUND(AVG(last_value_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS number_of_orders SELECT *, ROUND(AVG(last_value_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS moving_avg FROM filled_data app_logs 테이블에서 Custom Session을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어주세요. Session은 숫자로 (1, 2, 3...) 표시해도 됩니다. 2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다.SELECT * -- event_params가 필요하면 UNNEST => PIVOT해서 사용하면 됨 FROM advanced.app_logs WHERE event_date = "2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ORDER BY event_timestamp WITH base AS ( SELECT --* event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id -- event_params가 필요하면 UNNEST => PIVOT해서 사용하면 됨 FROM advanced.app_logs WHERE event_date = "2022-08-18" AND user_pseudo_id = "1997494153.8491999091" -- WHERE 조건에 이벤트를 필터링해서 계산해도 괜찮음 ), diff_data AS ( SELECT *, DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff # secend_diff 기반으로 새로운 세션의 시작일지, 아닐지를 판단할 수 있음 FROM ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime # event_datetime이랑 prev_event_datetime을 빼서 20초가 넘으면 새로운 세션으로 정의, # 20초가 넘지 않으면 기존 세션 # DATETIME_DIFF() => 차이를 구할 수 있음 FROM base ) ) SELECT * , # 누적합을 사용해서 session_number을 만들었다! SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num # session을 구할 때 쿼리가 길어질 수 있음. 하루에 접속을 여러번 하는 서비스 => session 기반이 좋을 수도 있고, 아니라고 하면 일자별 유저 집계가 나을 수도 있다. FROM ( SELECT * , CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 # session을 나누는 기준 초, 데이터를 탐색하면서 결정. 보통 앱 로그에서는 30초, 60초 ELSE NULL -- ELSE 0 사용 가능 END AS session_start # session이 시작됨을 알리는 session_start FROM diff_data ) ORDER BY event_datetime -- 세션 정리 -- 이전 이벤트 로그와 현재 이벤트 로그의 diff => 초나 분을 구한다 -- 그 기준을 가지고 기준보다 높으면 새로운 세션이라고 생각한다 -- 첫번째 값엔 null이 있을 수 있어서, 이 부분도 챙겨야 한다 -- 새로운 세션, session_start 값을 기반으로 누적합 => session_num이 된다!