묻고 답해요
141만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수(연습문제) - 탐색함수 / Frame / 총정리
탐색함수 연습문제문제 1. user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요.SELECT *, LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_visit_month, LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS nnext_visit_month FROM advanced.analytics_function_01 LEAD가 다음!!!! 은근 헷갈린다 정말. L, E, A, D. D로 끝나니까 다음이라고 생각해야겠다.문제2. user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요.SELECT *, LEAD(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_visit_month, LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS nnext_visit_month, LAG(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS previous_visit_month FROM advanced.analytics_function_01 LAG함수를 쓴 컬럼에서 NULL이 나온다? → 아 이 row가 처음이다.LEAD 함수를 쓴 컬럼에서 NULL이 나온다? → 아 이 row가 마지막이다.문제3. 유저가 접속했을 때, 다음 접속까지의 간격을 구하시오.#답 x, after_visit_month에서 에러남 SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, after_visit_month - visit_month FROM advanced.analytics_function_01 →SELECT 절은 가장 마지막에 실행되기 때문에 LEAD함수를 실행하면서 after_visit_month라 이름 붙인 것이다. 따라서 아직 SELECT 절에서 after_visit_month라는 것을 인식하지 못한다.#물론 쿼리가 짧을 때에는 이것도 가능. #하지만 이러한 쿼리가 굉장히 많아지면 복잡해지고, 실수하기 좋다. #중복된 쿼리는 줄이는 것이 좋다. SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) - visit_month AS diff_visit_month, FROM advanced.analytics_function_01 → 윈도우 함수를 이렇게 쓰는 것이 좋을까? 중복된 쿼리는 줄이는 것이 좋다.→ 서브쿼리 사용하는 것이 더 낫다. 서브쿼리나 WITH문과 같은 CTE에서는 윈도우함수를 여러 개 쓰더라도 같은 정렬기준과 파티션 기준을 갖는다면 한 번의 데이터 스캔을 거친다.→ 쿼리문 수정시에도 좋다. 만약 쿼리를 수정해야할 때 서브쿼리 내의 쿼리문만 수정하면 되기 때문이다.SELECT *, after_visit_month - visit_month AS diff FROM ( SELECT *, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month, FROM advanced.analytics_function_01 ) 문제 4. 이 데이터셋을 기준으로 user_id의 첫번째 접근 월을 구하는 쿼리를 작성해주세요.SELECT *, FIRST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS first_month LAST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) AS last_month FROM advanced.analytics_function_01 Frame 연습문제문제 1. 우리 회사의 모든 주문량은?SELECT *, SUM(amount) OVER (ORDER BY order_date, order_id #OVER 안에 아무것도 안 들어가도 됨 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total FROM advanced.orders 문제 2. 특정 주문 시점에서 누적 주문량은?SELECT *, SUM(amount) OVER (ORDER BY order_date, order_id) AS cumulative_sum FROM advanced.orders 문제 3. 고객별 주문 시점에서 누적 주문량은?SELECT *, SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date, order_id ) AS cumulative_sum_by_user 문제 4. 최근 직전 5개의 평균 주문량은?SELECT *, AVG(amount) OVER (ORDER BY order_date, order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amout 총정리 연습문제문제 1. 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요SELECT *, COUNT(query_date) OVER (PARTITION BY user ORDER BY user) AS total_query_cnt FROM advanced.query_logs 문제 2. 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요.WITH table AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number ORDER BY query_cnt) AS team_rank FROM table QUALIFY team_rank = 1문제 3. (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요.WITH table AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM table LAG(어떤 이전 값이 들어가야 하는지 = 1주 전 쿼리실행수)PARTITION BY로 쓸 기준은 user : 왜냐면 user 단위로 1주 전 쿼리실행수를 가져오기 때문ORDER BY로 쓸 기준은 week_number : 1주 전 쿼리이기 때문에 날짜 관련으로 정렬문제 4. 시간의 흐름에 따라 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요.WITH query_count_table AS ( SELECT *, COUNT(*) AS query_count FROM advanced.query_logs GROUP BY ALL ) SELECT *, SUM(query_count) OVER (PARTITION BY user ORDER BY query_date) AS cululative_query_count FROM query_count_table 처음에는 PARTITION BY에 query_date도 같이 넣었다가 결과가 날짜를 기준으로 구분되고 있는 것이 보여서 아차 하고 query_date를 뺐더니 정답이 되었다…..자꾸 문제에서 말하는 “일자 별로” 와 같은 말 때문에 PARTITION BY에 어떤 컬럼이 와야 하는지 헷갈리는 것 같다.문제 5. 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다 이런 데이터에서 NULL값이라고 되어 있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요.WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07' , 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL UNION ALL ) SELECT *, LAST_VALUE(number_of_order IGNORE NULLS) OVER (ORDER BY date) AS last_value_orders FROM raw_dataLAST_VALUE()는 원래 NULL을 포함해서 연산하기 때문에 위에서 그냥 쓰면 NULL이 마지막 값으로 인정돼서 들어오지만, IGNORE NULLS를 사용하면 NULL값은 제외하고 값이 있는 것만 마지막값으로 생각하고 가져오기 때문에 쓰는 것. 문제 6. 5번 문제에서 NULL을 채운 후, 2일전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동평균)WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07' , 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders), **LAST_VALUE**(number_of_orders **IGNORE NULLS**) OVER (ORDER BY date) AS number_of_orders FROM raw_data ) SELECT *, AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_data 문제 7. app_logs 테이블에서 custom session을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어주세요. session은 숫자로 (1, 2, 3 …) 표시해도 됩니다.2022-08-18일의 user_pseudo_id(1997494153.8491999091)은 session_id가 4까지 나옵니다.WITH base AS( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = "1997494153.8491999091" ), diff_data AS( SELECT *, #이전 이벤트 시간과 현재 이벤트시간의 간격을 SECOND 초단위로 구하기 / second_diff를 기반으로 새로운 세션의 시작일지 아닐지 판단 DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff FROM ( SELECT *, # 직전 이벤트 시간을 prev_event_datetime으로 만들기 LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS prev_event_datetime FROM base ) ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM( SELECT *, CASE WHEN prev_event_datetime IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE NULL END AS session_start FROM diff_data ) ORDER BY event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME 설정, QUALIFY
<윈도우 함수>연습문제 1SELECT user_id, visit_month, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) as next_visit_month, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) as two_next_visit_month, FROM advanced.analytics_function_01 ORDER BY user_id연습문제 2SELECT user_id, visit_month, LAG(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) as prev_visit_month, LEAD(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) as next_visit_month, LEAD(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) as two_next_visit_month, FROM advanced.analytics_function_01 ORDER BY user_id추가문제 - 유저의 첫번째 방문월과 마지막 방문월 구하기SELECT user_id, visit_month, FIRST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) as first_visit_month, LAST_VALUE(visit_month) OVER(PARTITION BY user_id ORDER BY visit_month) as last_visit_month, FROM advanced.analytics_function_01 ORDER BY user_id<FRAME 설정>advanced.order 문제SELECT *, SUM(amount) OVER() as amount_total, SUM(amount) OVER(ORDER BY order_id) as cumulative_sum, SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cumulative_sum_by_user, AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) as last_5_orders_avg_amount, FROM advanced.orders ORDER BY order_id, user_id<연습문제>연습문제 1SELECT *, COUNT(*) OVER(PARTITION BY user) as total_query_cnt FROM advanced.query_logs ORDER BY user연습문제 2WITH base as ( SELECT EXTRACT(WEEK FROM query_date) as week_number, team, user, COUNT(user) as query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) as team_rank FROM base QUALIFY team_rank=1 ORDER BY week_number연습문제 3WITH base as ( SELECT user, team, EXTRACT(WEEK FROM query_date) as week_number, COUNT(user) as query_cnt, FROM advanced.query_logs GROUP BY ALL ) SELECT *, LAG(query_cnt, 1) OVER(PARTITION BY user ORDER BY week_number) as prev_week_query_count FROM base연습문제 4WITH base as ( SELECT user, team, query_date, COUNT(user) as query_count, FROM advanced.query_logs GROUP BY ALL ) SELECT *, SUM(query_count) OVER(PARTITION BY user ORDER BY query_date) as cumulative_query_count FROM base ORDER BY user연습문제 5-- SELECT -- *, -- IF(number_of_orders is NULL, prev_number_of_orders, number_of_orders) as result_number_of_orders, -- FROM ( -- SELECT -- *, -- LAG(number_of_orders, 1) OVER(ORDER BY date) as prev_number_of_orders -- FROM raw_data -- ) -- 이 방법은 전의 값들이 Null이 여러개면 제대로 안나옴...! SELECT *, IF(number_of_orders is NULL, last_number_of_orders, number_of_orders) as result_number_of_orders, FROM ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) as last_number_of_orders FROM raw_data )연습문제 6SELECT *, AVG(result_number_of_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING and CURRENT ROW) as moving_avg FROM ( SELECT *, IF(number_of_orders is NULL, last_number_of_orders, number_of_orders) as result_number_of_orders, FROM ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) as last_number_of_orders FROM raw_data ) )연습문제 7WITH base as ( SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' and user_pseudo_id = '1997494153.8491999091' ), base2 as ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) as before_event_datetime FROM base ) SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) as session_id FROM ( SELECT *, IF(second_diff is NULL or second_diff > 20, 1, NULL) as session_start FROM ( SELECT *, DATETIME_DIFF(event_datetime, before_event_datetime, second) as second_diff FROM base2 ) ) ORDER BY event_timestamp배운점&느낀점EXTRACT(WEEK FROM query_date) as week_number : date 정보에서 week 정보 등을 추출할 수 있는 함수DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_datetime : timestamp 형태의 데이터를 날짜로 바꿀 수 있는 함수LAST_VALUE(number_of_orders IGNORE NULLS)... 그냥 마지막 값은 LAST_VALUE, 이전 데이터 중 null값이 아닌 마지막 값을 가져오고 싶을 때는 IGNORE NULLS를 사용해야 함. (5번 연습문제에서 처음에는 LAG 함수를 사용해서 null이 여러 개 연속인 경우가 있어 제대로 값을 가져오지 못했고, LAST_VALUE 윈도우 함수에서IGNORE NULLS 를 까먹어서 null 처리가 또 안됐었다.)With절과 서브쿼리 절을 어떤 상황에 사용할지 아직 잘 모르겠다.CASE WHEN절 사용법이 익숙치 않았다. (문제를 풀 때 IF절을 사용했던 부분에서 강의에서는 CASE WHEN을 사용)윈도우 함수를 제대로 처음 배워봤는데, 어려웠지만 유용한 부분이 많아 사용할 일이 많을 것 같다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME 설정, QUALIFY
연습 문제 (1)-- 1) SELECT user_id , visit_month , LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead_visit_month , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS lead2_visit_month FROM advanced.analytics_function_01 ORDER BY user_id -- 2) SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after2_visit_month , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS before_visit_month FROM advanced.analytics_function_01 ORDER BY user_id, visit_month -- 3) 유저가 접속했을 때, 다음 접속까지의 간격을 구하시오. SELECT * , after_visit_month - visit_month AS diff_month FROM ( SELECT user_id , visit_month , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month FROM advanced.analytics_function_01 ) ORDER BY user_id , visit_month -- 4) 이 데이터셋을 기준으로 user_id의 첫번째 방문 월, 마지막 방문 월을 구하는 쿼리를 작성해주세요. SELECT * , FIRST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS first_month , LAST_VALUE(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS last_month FROM advanced.analytics_function_01 ORDER BY user_id , visit_month -- 윈도우 함수 - 데이터 범위 지정 SELECT * , SUM(amount) OVER () AS amount_sum , SUM(amount) OVER (ORDER BY order_id) AS cumulative_sum , SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user , AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM advanced.orders ORDER BY order_id 연습 문제 (2)-- 1) SELECT * , COUNT(user) OVER (PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY user -- 2) WITH query_cnt_by_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number , team , user , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT * , RANK() OVER (PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk FROM query_cnt_by_team QUALIFY rk = 1 ORDER BY week_number , team , query_cnt DESC -- 3) WITH query_cnt_by_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number , team , user , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT * , LAG(query_cnt, 1) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_cnt FROM query_cnt_by_team ORDER BY user , week_number -- 4) SELECT query_date , team , user , query_cnt , SUM(query_cnt) OVER (PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum FROM ( SELECT query_date , team , user , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) ORDER BY user , query_date -- 5) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT * , LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS before_number_of_orders FROM raw_data -- 6) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), filled_data AS ( SELECT * EXCEPT(number_of_orders) , LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders FROM raw_data ) SELECT * , AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM filled_data -- 7) WITH step1 AS ( SELECT * EXCEPT(event_params, platform) , DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime FROM advanced.app_logs WHERE user_pseudo_id = "1997494153.8491999091" AND event_date = '2022-08-18' ), step2 AS ( SELECT * , DATETIME_DIFF(event_datetime, prev_event_datetime, SECOND) AS second_diff FROM ( SELECT * , LAG(event_datetime) OVER (ORDER BY event_datetime) AS prev_event_datetime FROM step1 ) ORDER BY event_datetime ) SELECT * , SUM(session_start) OVER (ORDER BY event_datetime) AS session_num FROM ( SELECT * , CASE WHEN second_diff IS NULL THEN 1 WHEN second_diff >= 20 THEN 1 ELSE 0 END AS session_start FROM step2 ) ORDER BY event_datetime
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME 설정, QUALIFY
연습 문제(1) 윈도우 함수-- 1) user들의 다음 접속 월과 다다음 접속 월을 구하는 쿼리를 작성해주세요. SELECT user_id, visit_month, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_1, LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_2, FROM `advanced.analytics_function_01` ORDER BY user_idLEAD 함수와 적절한 수를 이용해 작성LEAD 함수에 들어가는 인자에 따라 다음인지 다다음인지 정할 수 있음따로 정해주지 않을 경우 1로 인식OVER의 뒷 내용이 생각보다 조금 복잡해 한번에 떠오르지는 않았음OVER의 ORDER BY의 기본은 오름차순 -- 2) user들의 다음 접속 월과 다다음 접속 월, 이전 접속 월을 구하는 쿼리를 작성해주세요. SELECT user_id, visit_month, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_1, LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_2, LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS previous_visit_month_1, FROM `advanced.analytics_function_01` ORDER BY user_idLAG 사용해 이전 값 구해봄LAG 안에 특별한 숫자를 주지않으니 1로 인식하는 것을 확인함LEAD 값이 NULL → 해당 값이 마지막 값LAG 값이 NULL → 해당 값이 첫번째 값-- 3) user의 다음 접속까지의 간격을 구하시오 SELECT user_id, visit_month, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_1, (LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) - visit_month) AS diff_month, FROM `advanced.analytics_function_01` ORDER BY user_idSELECT 절에서 만들어진 컬럼은 그대로 사용할 수는 없음하지만 위와 같이 할 경우 너무 길어지고 복잡해짐(중복됨) → 서브쿼리 사용 SELECT *, (after_visit_month_1 - visit_month) AS diff_month FROM ( SELECT user_id, visit_month, LEAD(visit_month, 1) OVER (PARTITION BY user_id ORDER BY visit_month) AS after_visit_month_1, FROM `advanced.analytics_function_01` ORDER BY user_id )쿼리를 최대한 덜 수정하는 방향으로 작성해 볼 것(2) QUALIFY-- amount_total : 전체 SUM -- cumulative_sum : row 시점에 누적 SUM -- cumulative_sum_by_user : row 시점에 유저별 누적 SUM -- last_5_orders_avg_amount : order_id 기준으로 정렬하고, 직전 5개 주문의 평균 amount SELECT *, SUM(amount) OVER() AS amount_total, SUM(amount) OVER(ORDER BY order_id) AS cumulative_sum, SUM(amount) OVER(PARTITION BY user_id ORDER BY order_id) AS cumulative_sum_by_user, AVG(amount) OVER(ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM advanced.orders ORDER BY order_idBETWEEN 앞에 ROWS를 빼먹어서 계속 오류를 냈음-- 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. -- 단, GROUP BY를 사용해서 집계하는 것이 아닌 query_logs의 데이터의 우측에 새로운 컬럼을 만들어주세요. SELECT *, COUNT(query_date) OVER(PARTITION BY user) AS cnt_by_user FROM advanced.query_logs데이터에 NULL값이 없으므로 어떤 열을 세던지 상관 없음-- 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. -- 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 SELECT *, RANK() OVER(PARTITION BY team, week_key ORDER BY cnt_by_user DESC) AS rnk FROM ( SELECT user, team, IF(query_date < '2024-05-01', 1, 2) AS week_key, COUNT(user) AS cnt_by_user FROM advanced.query_logs GROUP BY ALL ) QUALIFY rnk = 1 ORDER BY team, week_key'주차별’에 대한 아이디어가 잘 떠오르지 않았음 → 데이터의 범위가 좁기 때문에 일단은 IF를 통해 주차를 구분해줌 → 날짜 범위가 넓어지면 어떻게 할지 아직은 모르겠음서브 쿼리 사용해봄PARTITION이 2개임(주차별, 팀별)QUALIFY 사용할 것(생각 못하고 LIMIT 쓰려다 막힘)-- 강의 코드 WITH query_cnt_by_team AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk FROM query_cnt_by_team QUALIFY rk = 1 ORDER BY week_number, team, query_cntEXTRACT 함수 통해 ‘주차’ 추출GROUP BY 후 윈도우 함수 사용 → 유연하게 사용할 것-- 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요 SELECT *, LAG(query_cnt) OVER(PARTITION BY user ORDER BY week_number) AS previous_week_cnt FROM query_cnt_by_team ORDER BY userWITH AS로 만든 테이블 그대로 사용-- 4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요 SELECT *, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumul_cnt FROM ( SELECT user, team, query_date, COUNT(*) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) ORDER BY user, query_dateGROUP BY한 서브쿼리 사용FRAME 사용이 그렇게 까다롭진 않았음FRAME의 defalut 값 → UNBOUNDED PRECEDING ~ CURRENT ROW-- 5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. -- 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요 SELECT date, IF(number_of_orders IS NULL, LAG(raw_data.number_of_orders, 1) OVER(ORDER BY date), number_of_orders) AS number_of_orders FROM raw_data조건문 사용해서 IS NULL인 값들만 LAG 사용기존의 number_of_orders가 사라지는 문제가 있음마지막 날짜는 안채워짐(연속으로 NULL이라) -- 강의 코드 SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_dataLAST_VALUE + IGNORE NULLS 사용-- 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요(이동 평균) SELECT *, AVG(last_value_orders) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average FROM ( SELECT *, LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS last_value_orders FROM raw_data서브쿼리 사용했지만 WITH로 정의해도 됨 → 문제에서는 WITH가 연속 두번 나오는데, WITH는 한번만 작성해도 됨(, 로 구분)-- 7) app_logs 테이블에서 Custom Session을 만들어 주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. -- Session은 숫자로 (1, 2, 3 ...) 표시해도 됩니다 WITH base AS( SELECT event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime, event_name, user_id, user_pseudo_id, FROM advanced.app_logs WHERE (event_date = "2022-08-18") AND (user_pseudo_id = "1997494153.8491999091") ORDER BY event_timestamp ) SELECT *, SUM(diff_classification) OVER(ORDER BY event_datetime) + 1 AS session_id FROM ( SELECT *, IF(DATETIME_DIFF(event_datetime, before_datetime, second) > 20, 1, 0) AS diff_classification FROM ( SELECT *, LAG(event_datetime) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS before_datetime, FROM base ) ) ORDER BY event_datetimetimestamp와 1초의 관계 : timestamp 1당 1초 아닌가? → 그렇다면 굳이 datetime으로 바꿀 필요가 있나서브쿼리 2번 중첩해서 사용IF문 사용해 20초 차이남 → 1 차이 안남 → 0 으로 파생해 누적합 + 1로 session_id 도출첫 행 before_datetime에 예외처리 해줄 것그런데 PARTITION BY 를 매 OVER 안에 무조건 써야 하나?EDA(2)요일별 접속자 수 + 기간 내 이벤트날요일별 접속자 평균을 내보자일 > 토 > 수 > 금 > 목 > 화 > 월 → 역시 쉬는날이 더 배달 수요가 많은건가? → 수요일은 왜 일까?2022-08-01(월) ~ 2023-01-20(금) 의 데이터주말(토, 일)이 아닌 공휴일 목록 → 네이버 캘린더 참조2022-08-15 월 : 광복절2022-09-09 금 : 추석연휴2022-09-12 월 : 추석연휴2022-10-03 월 : 개천절2023-01-23 월 : 설 연휴2023-01-24 화 : 설 연휴(대체공휴일)주요 이벤트 → 특정 사건이 있을 경우 배달 수요가 늘거나 줄지 않을까? / 위키 사이트 참조2022-08-02 화 : 코로나19 누적 감염자 2천만 명 돌파2022-08-08 월 : 수도권 기록적인 폭우 및 홍수2022-10-31 월 : 할로윈데이 / 2022-10-29 토에 이태원 압사 사고 발생2022-11-08 화 : 한국시리즈2022-11-17 목 : 2023학년도 대학수학능력시험2022-11-24 목 : 카타르 월드컵 vs 우루과이2022-11-28 월 : 카타르 월드컵 vs 가나2022-12-03 토 : 카타르 월드컵 vs 포르투갈2022-12-19 월 : 카타르 월드컵 결승2022-12-25 일 : 크리스마스등등…“특정 사건”의 중요도를 정성적으로 평가하기에는 어렵다고 느낌 → 오히려 반대로 갑자기 이용자 수의 변화가 급격하게 나타나는 날짜를 위주로 봐야하나? → 그런데 앱 이용자 수 성장 시기에는 항상 상승만해서 보기 애매할 듯 → 요일별 경향을 따져야 할 수도일요일 < 월요일(개천절) < 화요일 → 앱 이용자 수 상승 예시월드컵에 따른 이용자 수 변화는 뚜렷하지 않음수요일에 딱히 뭔가 보이진 않음 → 평일의 절반이 지남에 따른 보상 심리가 원인일수도?시간대에 따른 이용자 수마찬가지로 시간대별 평균 이용자 수 구해봄저녁 시간대(19시 ~ 22시)가 가장 이용자 수 많음점심 시간대(12시 ~ 14시)가 그 다음당연하게도 식사 시간에 배달 앱 수요가 많음시간대 별로 food_id 에 따른 수요가 다른지 확인해볼 것 → 시간대 별 이용자에게 추천해주는 음식 다르게 설정할 수 있음하루의 기준을 0시로 잡아도 되나?에 대한 의문이 생김(야식 수요) → 이거에 대한 기준을 다시 잡고 일별 이용자 수 다시 구해야 할수도
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차 과제] 윈도우 함수 연습문제
🔐 이번주차 중요 키워드 : 윈도우 함수, FRAME, QUALIFY ✅ 윈도우 함수 연습문제 1번1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리 작성 단, group by를 사용해서 집계하는 것이 아닌 query_logs의 데이터 우측에 새로운 컬럼을 만들어주세요. select *, count(query_date) over (partition by user) as query_cnt from advanced.query_logs order by query_cnt desc ✅ 윈도우 함수 연습문제 2번2) 주차별로 팀 내에서 쿼리를 많이 실행한 수 구하기 2-1) 실행한 수를 활용해 랭킹 구하기 -- 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요. week_number | team | user | query_cnt | team_rank select * , rank() over(partition by week_number, team order by query_cnt) as team_rank from ( select EXTRACT(week FROM query_date) as week_number, team, user, count(query_date) as query_cnt from advanced.query_logs group by all ) qualify team_rank = 1 order by week_number, team, query_cnt DESC ❗ 새롭게 알게된 함수 : EXTRACT(week FROM query_date) as week_number → 기존의 알고 있던 함수와 같은 결과값 : DATE_TRUNC(query_date, WEEK) AS week_number❗ qualify team_rank = 1 : QUALIFY 덕분에 서브쿼리 없이 바로 조건에 사용가능함! ✅ 윈도우 함수 연습문제 3번3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준 1주전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리 작성 select * , LAG(query_cnt, 1) over (partition by user order by week_number) as pre_week_query_cnt from ( select EXTRACT(week FROM query_date) as week_number, team, user, count(query_date) as query_cnt from advanced.query_logs group by all ) ✅ 윈도우 함수 연습문제 4번4) 시간의 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리수 작성 ## FRAME의 default값은 UNBOUNDED PRECEDING AND CURRENT ROW with query_cnt_by_team as ( select EXTRACT(week FROM query_date) as week_number, team, user, query_date, count(query_date) as query_cnt from advanced.query_logs group by all ) select *, sum(query_cnt) over (partition by user order by query_date ASC) as cumulative_SUM1, sum(query_cnt) over (partition by user order by query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cumulative_SUM2, from query_cnt_by_team ❗FRAME의 default값은 UNBOUNDED PRECEDING AND CURRENT ROW✅ 윈도우 함수 연습문제 5번) 주무횟수 데이터에서 주문횟수가 없으면 NULL로 기록됨. 이런 데이터에서 NULL값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리 WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select * , ifnull(number_of_orders, LAST_VALUE(number_of_orders IGNORE NULLS) over(order by date)) as filled_orders from raw_data -- 조건절 ifnull 사용할 수 있음. -- LAG()를 사용하면 마지막 NULL값인 경우 채우는 값도 NULL! -- 그렇기 떄문에, LAST_VALUE()인데, NULL은 무시하라는 IGNORE NULLS! ❗️LAST_VALUE에서 IGNORE NULLS 안하면 값은 NULL✅ 윈도우 함수 연습문제 6번6) NULL을 채운후, 2일전 ~ 현재 데이터의 평균 (이동평균) WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) select * , AVG(filled_orders) over (order by date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_orders from ( select * , ifnull(number_of_orders, LAST_VALUE(number_of_orders IGNORE NULLS) over(order by date)) as filled_orders from raw_data ) ✅ 윈도우 함수 연습문제 7번7) app_logs 테이블에서 custom_session을 만들어 주세요:) 이전 이벤트 로그와 20초가 지나면 새로운 session을 만들어 주세요. event_date | event_timestamp | event_datetime | evnet_name | user_id | user_pseudo_id | before_event_datetime | second_diff | session_start | session_id with base as ( select event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time, event_name, user_id, user_pseudo_id, from advanced.app_logs where event_date = "2022-08-18" and user_pseudo_id = "1997494153.8491999091" order by event_timestamp ) select *, sum(session_start) over (partition by user_pseudo_id order by event_time) as session_num from ( select *, CASE WHEN before_event_datetime IS NULL THEN 1 WHEN second_diff >=20 THEN 1 ELSE 0 END AS session_start from ( select *, DATETIME_DIFF(event_time, before_event_datetime, SECOND) AS second_diff from ( select *, LAG(event_time,1) over (partition by user_pseudo_id order by event_time) as before_event_datetime from base order by event_time ) ) ) ❗새롭게 알게 된 함수 DATETIME_DIFF : 처음에 날짜-시간 차이를 단순히 (-)로만 생각했다가 잘못된 결과 도출 → DATETIME_DIFF 함수로 정답도출!
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
강의 문제 1) user들의 다음 접속월과 다다음 접속월을 구하는 쿼리를 작성해주세요.-- 출제의도: 윈도우 함수(LEAD)를 사용하여 파티션을 나눠 데이터를 탐색할 수 있는가? SELECT user_id , visit_month AS visit_month_m0 , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS visit_month_m1 , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS visit_month_m2 FROM advanced.analytics_function_01 ORDER BY user_id;강의 문제 2) user들의 다음 접속월과 다다음 접속월, 이전 접속월을 구하는 쿼리를 작성해주세요-- 출제의도: 윈도우 함수(LEAD)와 (LAG)을 함께 사용하여 파티션을 나눠 데이터를 탐색할 수 있는가? SELECT user_id , visit_month AS visit_month_m0 , LEAD(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS visit_month_m1 , LEAD(visit_month, 2) OVER (PARTITION BY user_id ORDER BY visit_month) AS visit_month_m2 , LAG(visit_month) OVER (PARTITION BY user_id ORDER BY visit_month) AS visit_month_p1 FROM advanced.analytics_function_01 ORDER BY user_id;강의 문제 3) Frame 설정을 활용한 윈도우 함수 사용SELECT order_id , order_date , user_id , amount , SUM(amount) OVER (ORDER BY order_date, order_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS amount_total , SUM(amount) OVER (ORDER BY order_date, order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum , SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date, order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum_by_user , AVG(amount) OVER (ORDER BY order_date, order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount FROM advanced.orders --QUALIFY last_5_orders_avg_amount >= 150 ORDER BY order_id;윈도우 함수 연습문제 1) 사용자별 쿼리를 실행한 총 횟수를 구하는 쿼리를 작성해주세요. 단, GROUP BY를 사용해서 집계하는 것이 아닌, query_logs의 데이터 우측에 새로운 컬럼을 만들어주세요. -- 출제의도: 윈도우 함수의 집계 함수 중 COUNT를 사용할 수 있는가? SELECT * , COUNT(user) OVER (PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY query_date, user;윈도우 함수 연습문제 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요. -- 출제의도: GROUP BY와 윈도우 함수(순위)를 함께 사용할 수 있는가? SELECT * , RANK() OVER (PARTITION BY week_number, team ORDER BY query_cnt DESC) AS team_rank FROM (SELECT EXTRACT(WEEK FROM query_date) AS week_number , team , user , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) QUALIFY team_rank = 1 ORDER BY week_number, team;윈도우 함수 연습문제 3) (2번 문제에서 사용한 주차별 쿼리 사용) 쿼리를 실행한 시점 기준, 1주 전에 쿼리 실행 수를 별도의 컬럼으로 확인할 수 있는 쿼리를 작성해주세요. -- 출제의도: GROUP BY와 윈도우 함수(LAG)를 함께 사용할 수 있는가? SELECT * , LAG(query_cnt) OVER (PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM (SELECT user , team , EXTRACT(WEEK FROM query_date) AS week_number , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) ORDER BY user, week_number;윈도우 함수 연습문제 4) 시간 흐름에 따라, 일자별로 유저가 실행한 누적 쿼리 수를 작성해주세요. -- *FRAME의 DEFAULT 값: UNBOUNDED PRECEDING ~ CURRENT ROW -- 출제의도: GROUP BY와 윈도우 함수(SUM-누계합)를 함께 사용할 수 있는가? SELECT * , SUM(query_cnt) OVER (PARTITION BY user ORDER BY query_date) AS cumulative_query_count FROM (SELECT user , team , query_date , COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL) ORDER BY user, query_date;윈도우 함수 연습문제 5) 다음 데이터는 주문 횟수를 나타낸 데이터입니다. 만약 주문 횟수가 없으면 NULL로 기록됩니다. 이런 데이터에서 NULL 값이라고 되어있는 부분을 바로 이전 날짜의 값으로 채워주는 쿼리를 작성해주세요. -- 출제의도: 윈도우 함수(LAST_VALUE)에서 IGNORE NULLS가 필요한 상황을 이해할 수 있는가? WITH raw_data AS( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) -- SELECT -- date -- , IFNULL(number_of_orders, LAG(number_of_orders) OVER (ORDER BY date)) AS number_of_orders -- FROM raw_data -- ORDER BY date; SELECT * , LAST_VALUE(number_of_orders IGNORE NULLS) OVER (ORDER BY date) AS number_of_orders FROM raw_data ORDER BY date;윈도우 함수 연습문제 6) 5번 문제에서 NULL을 채운 후, 2일 전 ~ 현재 데이터의 평균을 구하는 쿼리를 작성해주세요. -- 출제의도: 예외 값을 처리한 이후, 윈도우 함수로 이동 평균을 계산할 수 있는가? WITH raw_data AS( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) SELECT * , ROUND(AVG(number_of_orders) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) , 1) AS moving_avg FROM( SELECT date , IFNULL(number_of_orders, LAG(number_of_orders) OVER (ORDER BY date)) AS number_of_orders FROM raw_data) ORDER BY date;윈도우 함수 연습문제 7) app_logs 테이블에서 Custom Session을 만들어주세요. 이전 이벤트 로그와 20초가 지나면 새로운 Session을 만들어 주세요. -- *Session은 숫자로(1,2,3…) 표시해도 됩니다. -- **2022-08-18일의 user_pseudo_id(1997494153. 8491999091)은 session_id가 4까지 나옵니다 -- 출제의도: 윈도우 함수를 웹 로그 데이터에 적용하여 활용할 수 있는가? -- Step 1. Session 정보 추출 WITH base AS( SELECT event_date , event_timestamp , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime , event_name , user_id , user_pseudo_id , DATETIME(TIMESTAMP_MICROS(LAG(event_timestamp) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp)), 'Asia/Seoul') AS before_event_datetime FROM advanced.app_logs WHERE event_date = '2022-08-18' AND user_pseudo_id = '1997494153.8491999091' ), -- Step 2. 세션 유지 시간 및 신규 세션 여부 계산 session_info AS( SELECT * , TIMESTAMP_DIFF(event_datetime, before_event_datetime, SECOND) AS second_diff , CASE WHEN TIMESTAMP_DIFF(event_datetime, before_event_datetime, SECOND) >= 20 OR TIMESTAMP_DIFF(event_datetime, before_event_datetime, SECOND) IS NULL THEN 1 ELSE NULL END AS session_start FROM base ) -- Step 3. 신규 세션 id 세팅 SELECT * , SUM(session_start) OVER (PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_id FROM session_info ORDER BY event_date, event_timestamp;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 2주차] 윈도우 함수, FRAME설정, QUALITY
윈도우 탐색 함수 연습문제(1) 연습문제 1-- 문제 1) USER의 다음 접속월, 다다음 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next FROM `avdanced.analytics_function_01` (2) 연습문제 2-- 문제 2) USER의 다음 접속월, 다다음 접속 월, 이전 접속 월 SELECT user_id, visit_month, LEAD(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_month, LEAD(visit_month,2) OVER(PARTITION BY user_id ORDER BY visit_month) AS the_month_after_next, LAG(visit_month,1) OVER(PARTITION BY user_id ORDER BY visit_month) AS last_month FROM `avdanced.analytics_function_01` 윈도우 함수 FRAME 연습문제연습문제 (1~4)SELECT -- 1)모든 주문량 SUM(amount) OVER() AS amount_total, -- 2)특정주문시점에서 누적주문량 #SUM(amount) OVER(partition by order_date) AS cumulative_sum, SUM(amount) OVER (ORDER BY order_date) AS cumulative_sum, -- 3)고객별 주문 시점에서 누적 주문량 #SUM(amount) OVER(partition by user_id) AS cumulative_sum_by_user, SUM(amount) OVER(partition by user_id ORDER BY order_id) AS cumulative_sum_by_user, -- 4) 최근 직전 5개 평균 주문량 AVG(amount) OVER(ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS last_5_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_5_unbounded_orders_avg_amount, AVG(amount) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS all_orders_avg_amount FROM `avdanced.orders` 윈도우 함수(1) 연습문제 1-- 연습문제1) 사용자별 쿼리 실행 횟수 WITH base AS( SELECT user, team, query_date, COUNT(*) OVER(PARTITION BY user) AS total_query_cnt, FROM `avdanced.query_logs` ) SELECT * FROM base(2) 연습문제 2-- 연습문제2) 주차별 팀내 쿼리 실행한 수 (RANK 1만 보이도록) WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` ORDER BY EXTRACT(WEEK FROM query_date) ) SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY total_query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team강의자료의 코드-- 2) 주차별로 팀 내에서 쿼리를 많이 실행한 수를 구한 후, 실행한 수를 활용해 랭킹을 구해주세요. 단, 랭킹이 1등인 사람만 결과가 보이도록 해주세요 -- 주차별로 개인당 실행한 쿼리 횟수 -- 위 쿼리 횟수를 기반으로 랭킹 -- 랭킹을 기반으로 필터링(랭킹=1) -- 문제의 의도 : 원본 데이터 => 1 row마다 데이터가 있고, 그걸 집계해서 사용. GROUP BY => 윈도우 함수 WITH query_cnt_by_team AS ( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(user) AS query_cnt FROM advanced.query_logs GROUP BY ALL ) SELECT *, RANK() OVER(PARTITION BY week_number, team ORDER BY query_cnt DESC) AS rk FROM query_cnt_by_team -- QUALIFY : 윈도우 함수의 조건을 설정할 때 사용 QUALIFY rk = 1 ORDER BY week_number, team, query_cnt DESCCOUNT의 윈도우 함수 대신에 GROUP BY를 사용하는 풀이도 있다는 것을 알게 되었다! 너무 어렵게 생각하지 말기!(3) 연습문제 3WITH base2 AS( SELECT EXTRACT(WEEK FROM query_date) AS week_number, team, user, COUNT(*) OVER(PARTITION BY EXTRACT(WEEK FROM query_date) ,user ORDER BY EXTRACT(WEEK FROM query_date) ) AS query_cnt, FROM `avdanced.query_logs` #QUALIFY team_rank = 1 ORDER BY EXTRACT(WEEK FROM query_date) ), base3 AS( SELECT DISTINCT *, RANK() OVER(PARTITION BY team,week_number ORDER BY query_cnt DESC) AS team_rank FROM base2 QUALIFY team_rank = 1 ORDER BY week_number, team ) -- 연습문제3) 쿼리 실행 시점 1주전 쿼리 실행 SELECT DISTINCT *, LAG(query_cnt,1) OVER(PARTITION BY user ORDER BY week_number) AS prev_week_query_count FROM base2 GROUP BY ALL ORDER BY user, week_number(4) 연습문제 4--연습문제4) SELECT *, SUM(query_count) OVER(PARTITION BY user ORDER BY query_date) AS culmulative_query_count, SUM(query_cnt) OVER(PARTITION BY user ORDER BY query_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum2 FROM( SELECT DISTINCT *, COUNT(user) OVER(PARTITION BY query_date, user) AS query_count, FROM `avdanced.query_logs` ) ORDER BY user,query_date QUALIFY 로 조건설정을 하여 두 값이 같은 지 비교하는 법이 인상깊었던 문제 (QUALIFY cumulative_sum != cumulative_sum2) (5) 연습문제 5나의 풀이WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ) --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) FROM raw_data강의풀이WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), -- LAG로 직전 값 가져오면 되지 않을까? -- number_of_orders가 null이면, before_number_of_orders를 가져와라! -- 아래 쿼리는 어려운 방법 -- 그 다음 방법 : LAST VALUE를 쓰자! => 값이 없으면 NULL이 뜬다! -- FIRST_VALUE, LAST_VALUE => NULL을 포함해서 연산 -- 출제 의도 : NULL을 제외해서 연산하고 싶으면 IGNORE NULLS을 쓰면 된다! -- SELECT -- *, -- IF(number_of_orders IS NULL, before_number_of_orders, number_of_orders) AS filled_orders -- -- Number of arguments does not match for function IF. Supported signature: IF(BOOL, ANY, ANY) at [89:3] -- -- False일 때 인자를 추가하지 않아서 생긴 오류 -- FROM ( -- SELECT -- *, -- LAG(number_of_orders) OVER(ORDER BY date) AS before_number_of_orders -- FROM raw_data -- ) filled_data AS ( SELECT * EXCEPT(number_of_orders), LAST_VALUE(number_of_orders IGNORE NULLS) OVER(ORDER BY date) AS number_of_orders FROM raw_data -- Syntax error: Expected keyword DEPTH but got identifier "filled_data" at [104:6] : WITH문을 두개 작성했는데 WITH 쉼표 쓰고 구분! )LAST_VALUE, FIRST_VALUE를 사용하는 풀이 법에 대하여 알게 됨(6) 연습문제 6WITH raw_data AS ( SELECT DATE '2024-05-01' AS date, 15 AS number_of_orders UNION ALL SELECT DATE '2024-05-02', 13 UNION ALL SELECT DATE '2024-05-03', NULL UNION ALL SELECT DATE '2024-05-04', 16 UNION ALL SELECT DATE '2024-05-05', NULL UNION ALL SELECT DATE '2024-05-06', 18 UNION ALL SELECT DATE '2024-05-07', 20 UNION ALL SELECT DATE '2024-05-08', NULL UNION ALL SELECT DATE '2024-05-09', 13 UNION ALL SELECT DATE '2024-05-10', 14 UNION ALL SELECT DATE '2024-05-11', NULL UNION ALL SELECT DATE '2024-05-12', NULL ), null_is_lag AS( --연습문제 5) null에 이전 값 삽입 SELECT raw_data.date, IF(raw_data.number_of_orders IS NULL, LAG(raw_data.number_of_orders,1) OVER(ORDER BY date), raw_data.number_of_orders) AS number_of_orders FROM raw_data ) -- 연습문제 6) 이동평균 SELECT *, AVG(nl.number_of_orders) OVER(ORDER BY nl.date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM null_is_lag AS nlFRAME절을 사용할 때, AND를 기준으로 앞에는 뒤의 값보다 반드시 이전 행을 가리키는 구문이 와야 함!(7) 연습문제 7-- 1. TIMESTAMP → DATETIME -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 -- 4. SESSION_ID생성: SESSION_START가 1일 경우 SESSION_ID +1, NULL일 경우 LAG(DATA,1) WITH add_date AS ( -- 1. TIMESTAMP → DATETIME SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp)) AS event_datetime, event_name, user_id, user_pseudo_id, LAG(DATETIME(TIMESTAMP_MICROS(event_timestamp))) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS before_event_datetime FROM `avdanced.app_logs_temp` --,UNNEST(event_params) AS param -- FROM 절 안에서 UNNEST를 사용 WHERE event_date ="2022-08-18" AND user_pseudo_id = "1997494153.8491999091" ), add_diff AS ( -- 2. SECOND_DIFF 생성 : uSER로 묶어서 - SELECT *, DATE_DIFF(event_datetime, before_event_datetime,SECOND) AS second_diff, FROM add_date ), add_session AS( -- 3. SESSION_START생성 : USER로 묶어서 LAG(DATA,1)이 NULL이면 1, SECOND_DIFF가 20이상이면 +1 SELECT *, IF(second_diff IS NULL OR second_diff >=20, 1, NULL) AS session_start FROM add_diff ) -- 4. SESSION_ID생성 *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_num FROM add_session ORDER BY event_datetime user_id와 user_pseudo_id는 다르다. (계정이 존재하면 USER_ID, 없어도 USER_PSEUDO_ID를 통해 활동기록이 남는다.)IF문은 행 단위로 작동, SUM은 특정 파티션에 대한 누적합을 계산함으로 아래 코드가 작동하지 않음 SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_numCTE를 사용하는 것과 서브쿼리를 사용하는 방법 중 상황에 맞게 적절히 혼용할 수 있다는 것을 알게 됨!
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 2주차 과제] 윈도우 함수 연습 문제
윈도우 함수 (탐색 함수) 연습 문제1번SELECT user_id, visit_month, lead(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_1month, lead(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_2month, FROM advanced.analytics_function_01 ORDER BY 1,2;2번SELECT user_id, visit_month, lead(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_1month, lead(visit_month, 2) OVER(PARTITION BY user_id ORDER BY visit_month) AS next_2month, lag(visit_month, 1) OVER(PARTITION BY user_id ORDER BY visit_month) AS pre_1month FROM advanced.analytics_function_01 ORDER BY 1,2;윈도우 함수 연습 문제1번SELECT user, team, query_date, count(user) over(PARTITION BY user) AS total_query_cnt FROM advanced.query_logs ORDER BY query_date, user2번WITH base AS ( SELECT EXTRACT(week FROM query_date) AS week_number, team, user, count(user) AS query_cnt FROM advanced.query_logs GROUP BY 1,2,3 ) SELECT *, rank() OVER(PARTITION BY week_number, team ORDER BY query_cnt desc) AS team_rank FROM base QUALIFY team_rank = 1 ORDER BY week_number, query_cnt desc;3번WITH base AS ( SELECT EXTRACT(week FROM query_date) AS week_number, team, user, count(user) AS query_cnt FROM advanced.query_logs GROUP BY 1,2,3 ) SELECT *, lag(query_cnt) OVER(PARTITION BY team, user ORDER BY week_number asc) AS prev_week_query_cnt FROM base ORDER BY team, user, week_number;4번WITH base AS ( SELECT user, team, query_date, COUNT(user) as query_count FROM advanced.query_logs GROUP BY 1,2,3 ) SELECT user, team, query_date, query_count, SUM(query_count) OVER(PARTITION BY team, user ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_query_count FROM base ORDER BY team, user, query_date;5번WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ) SELECT date, IF(number_of_orders is null , last_value(number_of_orders IGNORE NULLS) OVER(ORDER BY date asc), number_of_orders) AS number_of_orders_not_null FROM raw_data;6번WITH raw_data AS( SELECT DATE'2024-05-01'AS date,15 AS number_of_orders UNION ALL SELECT DATE'2024-05-02',13 UNION ALL SELECT DATE'2024-05-03',NULL UNION ALL SELECT DATE'2024-05-04',16 UNION ALL SELECT DATE'2024-05-05',NULL UNION ALL SELECT DATE'2024-05-06',18 UNION ALL SELECT DATE'2024-05-07',20 UNION ALL SELECT DATE'2024-05-08',NULL UNION ALL SELECT DATE'2024-05-09',13 UNION ALL SELECT DATE'2024-05-10',14 UNION ALL SELECT DATE'2024-05-11',NULL UNION ALL SELECT DATE'2024-05-12',NULL ), fill_null AS ( SELECT date, IF(number_of_orders is null , last_value(number_of_orders IGNORE NULLS) OVER(ORDER BY date asc), number_of_orders) AS number_of_orders_not_null FROM raw_data ) SELECT date, number_of_orders_not_null, avg(number_of_orders_not_null) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM fill_null;7번WITH base AS ( SELECT event_date, event_timestamp, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime, event_name, user_id, user_pseudo_id FROM advanced.app_logs WHERE event_date = '2022-08-18' ), diff_date AS ( SELECT *, DATETIME_DIFF(event_datetime, pre_event_time, second) AS date_diff_sec FROM ( SELECT *, LAG(event_datetime, 1) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime asc) AS pre_event_time FROM base ) ), session_start AS ( SELECT *, CASE WHEN pre_event_time IS NULL THEN 1 WHEN date_diff_sec >= 20 THEN 1 END AS start_session FROM diff_date ) SELECT event_date, event_datetime, event_name, user_id, user_pseudo_id, date_diff_sec, SUM(start_session) OVER(PARTITION BY user_pseudo_id ORDER BY event_datetime) AS session_id FROM session_start ORDER BY user_pseudo_id, event_datetime;
-
해결됨BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
리텐션 연습문제
안녕하세요~리텐션 연습문제 피드백 부탁드립니다! 감사합니다 https://www.notion.so/BigQuery-Retention-12eb0851d79c804389e8caaa3412d282?pvs=4
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
[ARRAY, STRUCT]array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.SELECT title, #genres, genre FROM `advanced.array_excercises` ae cross join unnest(genres) as genre;2. array_exercies 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요 배우와 배역은 별도의 컬럼으로 나와야 합니다.select title, actor.actor, actor.character from `advanced.array_excercises` cross join unnest(actors) as actor; array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.select title, actor.actor, actor.character, genre from `advanced.array_excercises` cross join unnest(actors) as actor cross join unnest(genres) as genre;앱 로그 데이터(app_logs) 배열 풀기select user_id, event_date, event_name, user_pseudo_id, params.key, params.value.string_value as string_value, params.value.int_value as int_value from `advanced.app_logs` cross join unnest(event_params) as params where event_date = "2022-08-01 [PIVOT]orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요.날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.select order_date, # amount의 합 sum(if(user_id=1, amount, 0)) as user_1, sum(if(user_id=2, amount, 0)) as user_2, sum(if(user_id=3, amount, 0)) as user_3 FROM `advanced.orders` group by order_date order by order_date; orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다select user_id, sum(if(order_date = "2023-05-01", amount, 0)) as `2023-05-01`, sum(if(order_date = "2023-05-02", amount, 0)) as `2023-05-02`, sum(if(order_date = "2023-05-03", amount, 0)) as `2023-05-03`, sum(if(order_date = "2023-05-04", amount, 0)) as `2023-05-04`, sum(if(order_date = "2023-05-05", amount, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id;orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다.select user_id, max(if(order_date = "2023-05-01", 1, 0)) as `2023-05-01`, max(if(order_date = "2023-05-02", 1, 0)) as `2023-05-02`, max(if(order_date = "2023-05-03", 1, 0)) as `2023-05-03`, max(if(order_date = "2023-05-04", 1, 0)) as `2023-05-04`, max(if(order_date = "2023-05-05", 1, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id;user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?with base as ( select -- * EXCEPT(event_params), # * except(컬럼) :컬럼을 제외하고 모두 다 보여줘 -- param event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = "firebase_screen", param.value.string_value, null)) as firebase_screen, -- max(if(param.key = "food_id", param.value.string_value, null)) as food_id, max(if(param.key = "food_id", param.value.int_value, null)) as food_id, max(if(param.key = "session_id", param.value.string_value, null)) as session_id from `advanced.app_logs` cross join unnest(event_params) as param group by all ) select user_id, event_date, count(user_id) as user_cnt, food_id from base where user_id = 32888 and event_name = "click_cart" group by all [퍼널분석]with base as ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, max(if(event_param.key = "firebase_screen", event_param.value.string_value, null)) as firebase_screen, max(if(event_param.key = "session_id", event_param.value.string_value, null)) as session_id from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date between "2022-08-01" and "2022-08-18" group by all ), filter_event as( select * except(event_name, firebase_screen, event_timestamp), concat(event_name, "-", firebase_screen) as event_name_with_screen, DATETIME(timestamp_micros(event_timestamp),'Asia/Seoul')AS event_datetime from base where event_name IN("screen_view", "click_payment") ), screen_view as( select event_date, event_name_with_screen, case when event_name_with_screen = "screen_view-welcome" then 1 when event_name_with_screen = "screen_view-home" then 2 when event_name_with_screen = "screen_view-food_category" then 3 when event_name_with_screen = "screen_view-restaurant" then 4 when event_name_with_screen = "screen_view-cart" then 5 when event_name_with_screen = "click_payment-cart" then 6 else null end as step_number, count(distinct user_pseudo_id) as cnt from filter_event group by all having step_number is not null order by event_date ) select event_date, max(if(event_name_with_screen = "screen_view-welcome", cnt, null)) as screen_view_welcome, max(if(event_name_with_screen = "screen_view-home", cnt, null)) as screen_view_home, max(if(event_name_with_screen = "screen_view-food_category", cnt, null)) as screen_view_food_category, max(if(event_name_with_screen = "screen_view-restaurant", cnt, null)) as screen_view_restaurant, max(if(event_name_with_screen = "screen_view-cart", cnt, null)) as screen_view_cart, max(if(event_name_with_screen = "click_payment-cart", cnt, null)) as click_payment_cart from screen_view group by all order by event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
--연습문제 CREATE OR REPLACE TABLE advanced.array_exercises AS SELECT movie_id, title, actors, genres FROM ( SELECT 1 AS movie_id, 'Avengers: Endgame' AS title, ARRAY<STRUCT<actor STRING, character STRING>>[ STRUCT('Robert Downey Jr.', 'Tony Stark'), STRUCT('Chris Evans', 'Steve Rogers') ] AS actors, ARRAY<STRING>['Action', 'Adventure', 'Drama'] AS genres UNION ALL SELECT 2, 'Inception', ARRAY<STRUCT<actor STRING, character STRING>>[ STRUCT('Leonardo DiCaprio', 'Cobb'), STRUCT('Joseph Gordon-Levitt', 'Arthur') ], ARRAY<STRING>['Action', 'Adventure', 'Sci-Fi'] UNION ALL SELECT 3, 'The Dark Knight', ARRAY<STRUCT<actor STRING, character STRING>>[ STRUCT('Christian Bale', 'Bruce Wayne'), STRUCT('Heath Ledger', 'Joker') ], ARRAY<STRING>['Action', 'Crime', 'Drama'] ) -- 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 select title , genres_new from advanced,.array_exercise AS a, UNNEST(genres) as genres_new -- 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다 select title , actors_new.actor , actors_new.character from advanced,.array_exercise AS a, UNNEST(actors) as actors_new --3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 --방법 1 with gen as ( select title , genres_new from advanced,.array_exercise AS a, UNNEST(genres) as genres_new ) , actors as ( select title , actors_new.actor , actors_new.character from advanced,.array_exercise AS a, UNNEST(actors) as actors_new ) select from gen g join actors a on g.title=a.title --방법 2 select title , actors_new.actor , actors_new.character , genre_new from advanced,.array_exercise AS a, UNNEST(actors) as actors_new, UNNEST(genres) as genre_new --방법 3 select title , actors_new.actor , actors_new.character , genre_new from advanced.array_exercise cross join UNNEST(actors) as actors_new cross join UNNEST(genres) as genre_new where actors_new.actor 로 조건을 걸어야함 --actor(키값바로) 또는 actors_new로는 안된다 actors_new는 스트럭트 구조이고 actor는 이전 값임 --4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요 --하루 사용자 집계, 어떤 이벤트가 있는가? select user_id , event_date , event_name , user_pseudo_id , event_component.key , event_component.value.string_value , event_component.value.int_value from app_logs as app, UNNEST(event_pharams) as event_component where event_date = '2022-08-11' --피봇 과제 --1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다 with raw as ( select user_id , order_date , sum(amount) as amounts from orders ) SELECT order_date , MAX(IF(user_id=1, amounts, NULL)) AS user_1 , MAX(IF(user_id=2, amounts, NULL)) AS user_2 , MAX(IF(user_id=3, amounts, NULL)) AS user_3 ... FROM raw GROUP BY order_date --2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다 with raw as ( --혹시나 하나의 유저가 하루에 여러 주문을 했을수도 있을것 같아서 이것 사용(MAX 쓸 예정이라서) select user_id , order_date , sum(amount) as amounts from orders ) SELECT user_id , MAX(IF(order_date='2023-05-01', amounts, NULL)) AS '2023-05-01' , MAX(IF(order_date='2023-05-02', amounts, NULL)) AS '2023-05-02' , MAX(IF(order_date='2023-05-03', amounts, NULL)) AS '2023-05-03' ... FROM raw GROUP BY user_id --3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 with raw as ( select user_id , order_date , sum(amount) as amounts , count(distinct order_id) as order_cnt from orders ) SELECT user_id , MAX(IF(order_date='2023-05-01', 1, 0)) AS '2023-05-01' , MAX(IF(order_date='2023-05-02', 1, 0)) AS '2023-05-02' , MAX(IF(order_date='2023-05-03', 1, 0)) AS '2023-05-03' --second case , MAX(IF(order_date='2023-05-01', order_cnt, 0)) AS '2023-05-01' , MAX(IF(order_date='2023-05-02', order_cnt, 0)) AS '2023-05-02' , MAX(IF(order_date='2023-05-03', order_cnt, 0)) AS '2023-05-03' ... FROM raw GROUP BY user_id -- 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요? -- ##데이터 조회할때 유용한 except(column):특정 컬럼 제외하고 모두 다 -- select * except(event_params) -- GROUP BY ALL with raw as ( select user_id , event_date , event_name , user_pseudo_id , event_component.key , event_component.value.int_value as food_id from app_logs as app, UNNEST(event_pharams) as event_component where user_id = 32888 and event_name = 'click_cart' and event_component.key = 'food_id' ) select distinct event_time --나는 일별로 보고 싶어서 추가 , food_id from raw --성윤님 강의 내용 select user_id , event_name , event_date , event_timestamp --같은일에 중복 주문이 있을까봐서 , MAX(IF(event_component.key='food_id', event_component.value.int_value, NULL)) AS food_id from app_logs as app, UNNEST(event_pharams) as event_component where user_id = 32888 and event_name = 'click_cart' and event_component.key = 'food_id' GROUP BY ALL --알아서 컬럼들 픽 --퍼널 별 유저 수 집계 with raw as ( select user_id , event_date , event_timestamp , event_name , user_pseudo_id , platform , event_component.key , event_component.value.string_value , event_component.value.int_value , MAX(IF(event_component.key = "firebase_screen", event_component.value.string_value, NULL)) AS firebase_screen -- , MAX(IF(event_component.key = "food_id", event_component.value.int_value, NULL)) AS food_id , MAX(IF(event_component.key = "session_id", event_component.value.int_value, NULL)) AS session_id from app_logs as app, UNNEST(event_pharams) as event_component where event_date BETWEEN "2022-08-01" AND "2022-08-18" group by all ) , filter_event_and_concat_event_and_screen AS( SELECT * EXCEPT(event_name, firebase_screen,event_timestamp) , CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen , DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date, step_number 강의 노트select [0,1,1,2,3,4] as array_practice array<int64>[0,1,3] as array_practice generate_array(1,5,2) generate_date_array('2024-01-01', '2024-02-01', interval 1 week) WITH programming_languages AS ( SELECT "python" AS programming_language UNION ALL SELECT "go" UNION ALL SELECT "scala" ) select array_agg(programming_languages) as output from programming_languages --배열에 접근하기 offset: #0 ordinal: #1 #out of range를 방지하기 위해서 safe_ 추가하기 --사용 예시 select some_numbers[safe_offset(1)] as second_value 컬럼명[safe_offset(가져오고 싶은 위치)] Array(like list): 비슷한 카테고리에 대해 데이터를 저장할때 예시) 메뉴(컬럼): 돼지국밥, 떡볶이, 치킨 Struct(like dict): 다양한 속성에 대해 데이터를 한 컬럼에 다 넣고 싶을때 예시) 주소록(컬럼): 이름, 전화번호,이메일, 생일 등등 SELECT (1,2,3) AS struct_test SELECT STRUCT<hi INT64, hello INT64, awesome STRING>(1, 2, 'HI') AS struct_test SELECT struct_test.hi, struct_test.hello FROM ( SELECT STRUCT<hi INT64, hello INT64, awesome STRING>(1, 2, 'HI') AS struct_test ) -- UNNEST를 사용해 중첩된 데이터 구조 풀기(평면화, Flatten) WITH example_data AS( SELECT 'kyle' AS name, ['Python', 'SQL', 'R', 'Julia', 'Go'] AS preferred_language, 'Incheon' AS hometown UNION ALL SELECT 'max' AS name, ['Python', 'SQL', 'Scala', 'Java', 'Kotlin'] AS preferred_language, 'Seoul' AS hometown UNION ALL SELECT 'yun' AS name, ['Python', 'SQL'] AS preferred_language, 'Incheon' AS hometown ) SELECT name, pref_lang, hometown FROM example_data CROSS JOIN UNNEST(preferred_language) AS pref_lang FROM exaple_data AS a, UNNEST(preferred_language) AS pref_lang --그럼 unnest안에는 array만? struct는? SELECT student , MAX(IF(subject="수학", score, NULL)) AS 수학 , MAX(IF(subject="영어", score, NULL)) AS 영어 , MAX(IF(subject="과학", score, NULL)) AS 과학 FROM Table GROUP BY student ###팁 #같은 단어를 수정할 때,빨리하고 싶은 - 단어를 커서위에 올리고 커맨드 디 범위설정하고 수정하면 일괄수정 -> 인텔리데이에서는 어떻게 하지? #기대하는 아웃풋의 형태를 적어보는것 좋다 -> 쉐어포인트 컬럼에 만들기 프로젝트 시작전 - 어떤 업무를 함에 있어서 흐름을 아는 것이 중요하다(흐름을 모르면 어떤것을 왜 해야하는지 모를 수 있음) - 맥락 -> 목적 -> 퍼널 -> 가설 -> 분석 서비스의 목표 파악(어떤 문제를 해결하려고 하는지) 문제 정의: 핵심 문제 목표 정의 퍼널 정의 -> 우리도 이 데이터가 있는지 물어보기
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT / PIVOT / 퍼널 연습 문제
1. ARRAY, STRUCT ### 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 # ARRAY : 같은 타입의 여러 데이터를 저장하고 싶을 때 # ARRAY를 Flatten(평면화) => UNNEST # UNNEST를 할 때는 CROSS JOIN + UNNEST(ARRAY_COLUMN) # UNNEST(ARRAY_COLUMN) AS 새로운 이름 # SELECT 절에서 새로운 이름으로 사용한다. 기존의 ARRAY_COLUMN은 사용하지 않는다! -- SELECT -- title -- , genre -- FROM `advanced.array_exercises` -- CROSS JOIN UNNEST(genres) AS genre -- ORDER BY 1, 2 ## 같은 결과를 출력하기 위해 정렬함. -- ; ### 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. # 직접 접근하려면 actors = [ STRUCT(STRING, STRING)] # actors[SAFE_OFFSET(0)].actor # actors[SAFE_OFFSET(0)].character -- SELECT -- title -- , act.actor# AS actor -- , act.character# AS character -- FROM `advanced.array_exercises` -- CROSS JOIN UNNEST(actors) AS act -- ORDER BY 1 -- ; ### 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다. # 데이터의 중복이 어느정도 생기는데, 그것은 어쩔 수 없는 이슈(CROSS JOIN) -- SELECT -- title -- -- actors, # ARRAY<STRUCT(STRING, STRING)> -- , act.actor# AS actor -- , act.character# AS character -- -- genres # ARRAY<STRING> -- , genre -- FROM `advanced.array_exercises` -- CROSS JOIN UNNEST(actors) AS act -- CROSS JOIN UNNEST(genres) AS genre -- -- WHERE 1=1 -- -- ## 강의 촬영 시점 이후에 수정된 듯 두 쿼리 모두 오류없이 실행 되는 것 같아요 ! -- -- AND act.actor = "Chris Evans" -- -- AND actor = "Chris Evans" -- ORDER BY 1 -- ; ### 4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요. -- SELECT -- user_id -- , event_date -- , event_name -- , user_pseudo_id -- , evt_prm.key AS key -- , evt_prm.value.string_value AS string_value -- , evt_prm.value.int_value AS int_value -- FROM `advanced.app_logs` -- CROSS JOIN UNNEST(event_params) AS evt_prm -- WHERE 1=1 -- AND event_date = "2022-08-01" -- ORDER BY 2 -- ; ### WITH 문 변경 WITH base AS ( SELECT user_id , event_date , event_name , user_pseudo_id , evt_prm.key AS key , evt_prm.value.string_value AS string_value , evt_prm.value.int_value AS int_value FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS evt_prm WHERE 1=1 AND event_date = "2022-08-01" ) SELECT event_date , event_name , COUNT(DISTINCT user_id) AS cnt FROM base GROUP BY ALL ORDER BY cnt DESC 2. PIVOT# 1) orders 테이블에서 유저(user_id)별로 주문금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다. -- 기대하는 output의 형태 -- order_date | user_1 | user_2 | user_3 -- PIVOT : MAX(IF(조건, TRUE일 때의 값, FALSE일 때의 값)) AS new_column + GROUP BY -- MAX 대신 집계 함수를 사용할 수도 있음. SUM -- FALSE일 때의 값은 NULL -- SELECT -- order_date -- , SUM(IF(user_id = 1, amount, 0)) AS user_1 -- , SUM(IF(user_id = 2, amount, 0)) AS user_2 -- , SUM(IF(user_id = 3, amount, 0)) AS user_3 -- FROM `advanced.orders` -- GROUP BY 1 -- ORDER BY 1 -- ; # 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요.user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다. -- 기대하는 output의 형태 -- user_id | 2023-05-01 | 2023-05-02 | 2023-05-03 | 2023-05-04 | 2023-05-05 -- SELECT -- user_id -- , SUM(IF(order_date="2023-05-01", amount, 0)) AS `2023-05-01` -- , SUM(IF(order_date="2023-05-02", amount, 0)) AS `2023-05-02` -- , SUM(IF(order_date="2023-05-03", amount, 0)) AS `2023-05-03` -- , SUM(IF(order_date="2023-05-04", amount, 0)) AS `2023-05-04` -- , SUM(IF(order_date="2023-05-05", amount, 0)) AS `2023-05-05` -- 컬럼의 이름을 지정할 때, 영어 제외하고 backtick(`) -- ANY_VALUE : 그훕화 할 대상 중에 임의의 값을 선택한다 (NULL을 제외하고). ANY_VALUE에선 나머지 값들이 NULL이거나 확정적으로 값을 기대할 수 있을 때 사용한다! -- ANY_VALUE(IF(order_date="2023-05-01", amount, NULL)) AS `2023-05-01` -- FROM `advanced.orders` -- GROUP BY 1 -- ORDER BY 1 -- ; # 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. -- SELECT -- user_id -- , MAX(IF(order_date="2023-05-01", 1, 0)) AS `2023-05-01` -- , MAX(IF(order_date="2023-05-02", 1, 0)) AS `2023-05-02` -- , MAX(IF(order_date="2023-05-03", 1, 0)) AS `2023-05-03` -- , MAX(IF(order_date="2023-05-04", 1, 0)) AS `2023-05-04` -- , MAX(IF(order_date="2023-05-05", 1, 0)) AS `2023-05-05` -- FROM `advanced.orders` -- GROUP BY 1 -- ORDER BY 1 -- ; ## 앱 로그 PIVOT WITH base AS( SELECT -- * EXCEPT(event_params) # * EXCEPT(column) : 컬럼을 제외하고 다 보여줘! event_date , event_timestamp , event_name , user_id , user_pseudo_id , MAX(IF(param.key = "fierbase_screen", param.value.string_value, NULL)) AS fierbase_screen -- , MAX(IF(param.key = "food_id", param.value.string_value, NULL)) AS food_id # string_value엔 food_id 값들이 저장되어 있지 않음. , MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id , MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS sessioon_id FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param WHERE 1=1 AND event_date = "2022-08-01" GROUP BY ALL ) SELECT event_date , COUNT(user_id) AS user_cnt FROM base WHERE 1=1 AND event_name = "click_cart" -- AND food_id = 1544 GROUP BY event_date 3. 퍼널 연습 문제# 퍼널 분석 -- 퍼널 데이터 -- 우리가 사용할 이벤트 => 단계 -- - screen_view : welcome, home, food_category, restaurant, cart -- - click_payment -- step_number : 추후에 정렬을 위해 만들 것 -- 사용할 데이터 : 앱 로그 데이터, GA/Firebase => UNNEST => PIVOT -- 기간 : 2022-08-01 ~ 2022-08-18 WITH base as ( SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , platform , MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) as firebase_screen , MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) as food_id , MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) as session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), base2 as ( SELECT * , CONCAT(event_name, "-", firebase_screen) as event_screen FROM base WHERE 1=1 AND event_name IN ("screen_view", "click_payment") ), base3 as ( SELECT event_screen , event_date , CASE WHEN event_screen = "screen_view-welcome" THEN 1 WHEN event_screen = "screen_view-home" THEN 2 WHEN event_screen = "screen_view-food_category" THEN 3 WHEN event_screen = "screen_view-restaurant" THEN 4 WHEN event_screen = "screen_view-cart" THEN 5 WHEN event_screen = "click_payment-cart" THEN 6 ELSE NULL END as step_number , COUNT(DISTINCT user_pseudo_id) as cnt FROM base2 GROUP BY ALL HAVING step_number is not NULL ORDER BY event_date ) SELECT event_date , MAX(IF(base3.event_screen ="screen_view-welcome", cnt, NULL)) AS screen_view_welcome , MAX(IF(base3.event_screen ="screen_view-home", cnt, NULL)) AS screen_vie_home , MAX(IF(base3.event_screen ="screen_view-food_category", cnt, NULL)) AS screen_view_food_category , MAX(IF(base3.event_screen ="screen_view-restaurant", cnt, NULL)) AS screen_view_restaurant , MAX(IF(base3.event_screen ="screen_view-cart", cnt, NULL)) AS screen_view_cart FROM base3 GROUP BY ALL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제]
[ARRAY, STRUCT] 문제 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre쿼리 결과 1)문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요.SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor쿼리 결과 2) 문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title, actor.actor AS actor, actor.character AS character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre CROSS JOIN UNNEST(actors) AS actor쿼리 결과 3)문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date = "2022-08-01" LIMIT 100쿼리 결과 4)[PIVOT]문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요.날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY ALL ORDER BY order_date쿼리 결과 1)문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id 쿼리 결과 2)문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id 쿼리 결과 3)문제 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(event_param.key = 'firebase_screen',event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = 'food_id',event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = 'session_id',event_param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param GROUP BY ALL ) SELECT user_id, event_date, COUNT(user_id) AS user_cnt, food_id FROM base WHERE user_id = 32888 and event_name = 'click_cart' GROUP BY ALL쿼리 결과 4) [퍼널분석]WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL 쿼리 결과
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT / PIVOT / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 1)SELECT title, movie_genres FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS movie_genres LIMIT 1002)SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor3)SELECT title, actors.actor, actors.character, genres FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actors CROSS JOIN UNNEST(genres) AS genres WHERE actor = 'ChrisEvans'4)select user_id , event_date , event_name , user_pseudo_id , param.key as key , param.value.string_value as string_value , param.value.int_value as int_value from advanced.app_logs , unnest(event_params) as param 2. PIVOT 1)select order_date , sum(if(user_id = 1,amount,0)) as user_1 , sum(if(user_id = 2,amount,0)) as user_2 , sum(if(user_id = 3,amount,0)) as user_3 from advanced.orders group by order_date order by order_date2)select user_id , max(if(order_date = '2023-05-01',amount,0)) as `2023-05-01` , max(if(order_date = '2023-05-02',amount,0)) as `2023-05-02` , max(if(order_date = '2023-05-03',amount,0)) as `2023-05-03` , max(if(order_date = '2023-05-04',amount,0)) as `2023-05-04` , max(if(order_date = '2023-05-05',amount,0)) as `2023-05-05` from advanced.orders group by user_id order by user_id3)select user_id , max(if(order_date = '2023-05-01',1,0)) as `2023-05-01` , max(if(order_date = '2023-05-02',1,0)) as `2023-05-02` , max(if(order_date = '2023-05-03',1,0)) as `2023-05-03` , max(if(order_date = '2023-05-04',1,0)) as `2023-05-04` , max(if(order_date = '2023-05-05',1,0)) as `2023-05-05` from advanced.orders group by user_id order by user_id4)WITH base AS ( SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, NULL)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, NULL)) AS food_id, MAX(IF(key = 'session_id', string_value, NULL)) AS session_id FROM base WHERE user_id = 32888 AND event_name = 'click_cart' GROUP BY ALL 3. 퍼널 1)with base as ( select event_date , event_name , event_timestamp , user_id , user_pseudo_id , platform , max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen from advanced.app_logs , unnest(event_params) as param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event as ( select * except(event_name, firebase_screen) , concat(event_name, "-", firebase_screen) as event_name_with_screen from base where event_name in ('screen_view', 'click_payment') ) select event_date , event_name_with_screen , case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number , count(distinct user_pseudo_id) as cnt from filter_event group by all having step_number is not null order by event_date 2)with base as ( select event_date , event_name , event_timestamp , user_id , user_pseudo_id , platform , max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen from advanced.app_logs , unnest(event_params) as param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event as ( select * except(event_name, firebase_screen) , concat(event_name, "-", firebase_screen) as event_name_with_screen from base where event_name in ('screen_view', 'click_payment') ), daily_group as ( select event_date , event_name_with_screen , case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number , count(distinct user_pseudo_id) as cnt from filter_event group by all having step_number is not null order by event_date ) select event_date , max(if(event_name_with_screen = 'screen_view-welcome',cnt,null)) as screen_view_welcome , max(if(event_name_with_screen = 'screen_view-home',cnt,null)) as screen_view_home , max(if(event_name_with_screen = 'screen_view-food_category',cnt,null)) as screen_view_food_category , max(if(event_name_with_screen = 'screen_view-restaurant',cnt,null)) as screen_view_restaurant , max(if(event_name_with_screen = 'screen_view-cart',cnt,null)) as screen_view_cart , max(if(event_name_with_screen = 'click_payment-cart',cnt,null)) as click_payment_cart from daily_group group by all order by event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
ARRAY, STRUCT 연습 문제각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.SELECT title, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(genres) AS genre각 영화(title)별 배우(actor)와 배역(character)을 보여주세요.(별도 칼럼)SELECT title, actor.actor, actor.character FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(actors) AS actor각 영화(title)별로 배우(actor),배역(character),장르(genre)를 출력하세요. SELECT title, actor, character, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre 앱 로그 데이터(app_logs)의 배열을 풀어주세요.SELECT event_date, event_timestamp, event_name, user_pseudo_id, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param PIVOT 연습 문제유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT (날짜를 행, user_id를 열)SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date날짜별로 유저들의 주문금액의 합계를 PIVOT (user_id를 행, order_date를 열)SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id사용자별, 날짜별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요 (user_id를 행, order_date를 열)SELECT user_id, IF(SUM(IF(order_date = '2023-05-01', amount, 0))>0,1,0) AS `2023-05-01`, IF(SUM(IF(order_date = '2023-05-02', amount, 0))>0,1,0) AS `2023-05-03`, IF(SUM(IF(order_date = '2023-05-03', amount, 0))>0,1,0) AS `2023-05-02`, IF(SUM(IF(order_date = '2023-05-04', amount, 0))>0,1,0) AS `2023-05-04`, IF(SUM(IF(order_date = '2023-05-05', amount, 0))>0,1,0) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_iduser_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?SELECT user_id, MAX(IF(event_param.key = 'firebase_screen',event_param.value.string_value,NULL)) AS firebase_screen, MAX(IF(event_param.key = 'food_id',event_param.value.int_value,NULL)) AS food_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE user_id = 32888 AND event_name = 'click_cart' GROUP BY user_id, event_timestamp -- 카트에 담은 음식(food_id): 1559, 1942퍼널 쿼리 연습 문제일자별 이벤트 집계 후 PIVOTWITH funnel_data AS ( SELECT *, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number FROM ( SELECT event_date, event_timestamp, user_pseudo_id, concat(event_name, '-', event_param.value.string_value) AS event_name_with_screen FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_param.key = 'firebase_screen' ) AS unnested_app_logs WHERE event_name_with_screen IN ( 'screen_view-welcome', 'screen_view-home', 'screen_view-food_category', 'screen_view-restaurant', 'screen_view-cart', 'click_payment-cart' ) )SELECT event_date, COUNT(IF(step_number = 1, user_pseudo_id, NULL)) AS `screen_view-welcome`, COUNT(IF(step_number = 2, user_pseudo_id, NULL)) AS `screen_view-home`, COUNT(IF(step_number = 3, user_pseudo_id, NULL)) AS `screen_view-food_category`, COUNT(IF(step_number = 4, user_pseudo_id, NULL)) AS `screen_view-restaurant`, COUNT(IF(step_number = 5, user_pseudo_id, NULL)) AS `screen_view-cart`, COUNT(IF(step_number = 6, user_pseudo_id, NULL)) AS `click_payment-cart` FROM funnel_data GROUP BY event_date ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY,STRUCT,PIVOT,FUNNEL
1. ARRAY, STRUCT 연습문제문제 1) array_exercise테이블에서 각 영화(title)별로 장르(genres)를 UNNEST 해서 보여주세요SELECT title, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(genres) AS genres ; 문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야합니다SELECT title, actors.actor, actors.character FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors ; 문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다SELECT title, actors.actor, actors.character, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors, UNNEST(genres) genres ; 문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM `analystic-project.advanced.app_logs` , UNNEST(event_params) AS pr WHERE event_date = "2022-08-01" LIMIT 1000 ; 2. PIVOT 연습문제 풀이문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ; 문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ; 문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ; 문제 4) user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id ; 3. 퍼널분석문제 1) 각 퍼널의 유저 수를 집계 / 데이터 기준: 2022-08-01 ~ 2022-08-18WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2 ORDER BY 2 ; 문제 2) 일자별 퍼널 유저 수 집계WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ; 문제 3) 일자별 퍼널 유저 수 집계 형태를 PIVOT형태로 전환하기WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ), daily_funnel_user_count as ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ) SELECT event_date, MAX(IF(step_number = 1, cnt, null)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, null)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, null)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, null)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, null)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, null)) AS `click_payment-cart`, FROM daily_funnel_user_count GROUP BY ALL ORDER BY 1 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
<PART 1> ARRAY, STRUCT 연습문제Q1. array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.-- 출제의도: 배열 UNNEST의 기본 형태를 사용할 수 있는가? SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre ORDER BY title;Q2. array_exercise 테이블에서 각 영화(title)별로 배우(actors)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다.-- 출제의도: 다중 배열 구조에서 UNNEST를 사용할 수 있는가? SELECT title , actor.actor , actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor ORDER BY title; Q3. array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.-- 출제의도: 여러 칼럼을 동시에 UNNEST할 수 있는가? SELECT title , actor.actor , actor.character , genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre ORDER BY title;Q4. 앱 로그 데이터(app_logs)의 배열을 풀어주세요.-- 출제의도: 다중 struct 구조의 데이터를 평면화하여 쿼리로 호출할 수 있는가? SELECT user_id , event_date , event_name , user_pseudo_id , event.key , event.value.string_value , event.value.int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event WHERE event_date = '2022-08-01';<PART 2> PIVOT 연습문제Q1. orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT 해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.-- 출제의도: 집계 함수와 조건 함수를 결합하여 PIVOT 테이블을 만들 수 있는가? SELECT order_date , SUM(IF(user_id=1, amount, 0)) AS user_1 , SUM(IF(user_id=2, amount, 0)) AS user_2 , SUM(IF(user_id=3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY ALL ORDER BY order_date; Q2. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id 를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다.-- 출제의도 : PIVOT 테이블 구성 시, 날짜 칼럼을 이용하여 시계열 방식을 구성할 수 있는가? SELECT user_id , SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY ALL ORDER BY user_id; Q3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다.-- 출제의도 : PIVOT 테이블 구성 시, 집계 함수로 MAX를 사용할 수 있는가? SELECT user_id , MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY ALL ORDER BY user_id; Q4. user_id = 32888 이 카트 추가하기(click_cart)를 누를 때 어떤 음식 (food_id)을 담았는지 구해주세요. key 를 Column 으로 두고, string_value 나 int_value를 Column의 값으로 설정해서 풀어주세요.-- 출제의도 : PIVOT 테이블을 앱로그 데이터에 사용하여, 조건문으로 개별 유저 데이터를 특정할 수 있는가? WITH base AS( SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , MAX(IF(params.key = 'firebase_screen', params.value.string_value, NULL)) AS firebase_screen , MAX(IF(params.key = 'food_id', params.value.int_value, NULL)) AS food_id , MAX(IF(params.key = 'session_id', params.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' GROUP BY ALL ) SELECT * FROM base WHERE event_name = 'click_cart' and user_id = 32888 -- 실행결과 : food_id = 1942<PART 3> 퍼널 연습문제-- 출제의도: 앱 로그 데이터에서 원하는 이벤트를 추출해, 퍼널 분석을 위한 전처리를 진행할 수 있는가? -- step 1. UNNEST를 통한 base 데이터 준비 WITH base AS( SELECT event_date , event_timestamp , event_name , event.key AS event_key , event.value.string_value AS event_string_value , event.value.int_value AS event_int_value , user_id , user_pseudo_id , platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event WHERE event_date BETWEEN '2022-08-01' AND '2022-08-22' ), -- step 2. 필요한 퍼널 이벤트에만 step_number를 세팅하여 준비 sorted_events AS( SELECT event_date , CONCAT(event_name, "-", event_string_value) AS event_name_with_screen , CASE WHEN event_name = 'screen_view' AND event_string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND event_string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND event_string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND event_string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND event_string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND event_string_value = 'cart' THEN 6 ELSE NULL END AS step_number , user_pseudo_id FROM base WHERE event_key = 'firebase_screen' ) -- step 3. 최종 조회 쿼리 SELECT event_date , event_name_with_screen , step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM sorted_events WHERE step_number IS NOT NULL GROUP BY ALL ORDER BY event_date, step_number
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT / PIVOT / 퍼널 연습 문제
ARRAY-- 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요. SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre --2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor --3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다. SELECT title, actor.actor AS actor, actor.character AS character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre WHERE actor.actor = 'Chris Evans' AND genre = 'Action'es) AS genre -- 4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요 WITH base AS ( SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date = '2022-08-01')PIVOT--1. orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT 해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어아 합니다. SELECT order_date, MAX(IF(user_id = 1, sum_of_amount, 0)) AS user_1, MAX(IF(user_id = 2, sum_of_amount, 0)) AS user_2, MAX(IF(user_id = 3, sum_of_amount, 0)) AS user_3 FROM ( SELECT order_date, user_id, SUM(amount) AS sum_of_amount FROM advanced.orders GROUP BY order_date, user_id ) GROUP BY order_date ORDER BY order_date --2.orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다. SELECT user_id, MAX(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id --3.orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id 퍼널WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL아직 SQL 익숙지 않아서, 강의 들으면서 코드를 이해하려고 했습니다.얼른 빅쿼리 SQL입문 강의도 다 듣고, 2주차에 더 실력이 올라갔으면 좋겠습니다!
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT (UNNEST), 데이터 PIVOT, 퍼널 분석
1.ARRAY, STRUCT 연습문제/* UNNEST를 사용하는 이유 : 중첩된 데이터를 평평하게 만들어 집계 및 분석을 쉽게 하기 위해 UNNEST된 결과를 사용하여 분석을 실행 : 1.프로그래밍 언어 선호도, 2.지역별 언어 선호도 분석을 통해 Action Itme을 도출 : 프로그래밍 강좌를 제공한다면 선호하는 언어 순으로 영상 제작 등 */ SELECT name, pref_lang, hometown FROM example_data CROSS JOIN UNNEST(preferred_language) AS pref_lang; # UNNEST란 장바구니(배열)에 있는 과일(배열의 값)을 모두 다 꺼내는 것 /* 연습문제 1 UNNEST된 결과를 사용하여 분석을 실행 : 1.영화 장르 선호도 분석을 통해 Action Itme을 도출 : 영화 제작사라면 어떤 장르가 선호되는 것을 보고 영화 제작 */ SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre; /* 연습문제 2 UNNEST된 결과를 사용하여 분석을 실행 : X 분석을 통해 Action Itme을 도출 : X */ SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor /* 연습문제 3 UNNEST된 결과를 사용하여 분석을 실행 : 1.배우의 영화 장르 선호도 분석을 통해 Action Itme을 도출 : 영화 제작시 배우의 장르 선호도 확인 후 */ SELECT title, actor.actor, actor.character, genre FROM advanced.array_exercises ,UNNEST(actors) AS actor, UNNEST(genres) AS genre /* 연습문제 4 */ WITH base AS ( SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key AS key, -- event_param.value AS value, event_param.value.string_value, event_param.value.int_value FROM advanced.app_logs AS al CROSS JOIN UNNEST(event_params) AS event_param WHERE 1=1 AND event_date = '2022-08-01' ) SELECT event_date, event_name, COUNT(DISTINCT user_id) AS cnt FROM base GROUP BY ALL ORDER BY cnt DESC2.PIVOT 연습문제SELECT order_date, MAX(IF(user_id = 1, sum_of_amount, 0)) AS user_1, MAX(IF(user_id = 2, sum_of_amount, 0)) AS user_2, MAX(IF(user_id = 3, sum_of_amount, 0)) AS user_3 FROM ( SELECT order_date, user_id, #Amount의 합 SUM(amount) AS sum_of_amount FROM advanced.orders GROUP BY order_date, user_id ) GROUP BY order_date ORDER BY order_date; SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date; SELECT order_id, order_date, user_id, IF(order_date = '2023-05-01', amount, NULL) AS `2023-05-01`, IF(order_date = '2023-05-02', amount, NULL) AS `2023-05-02`, IF(order_date = '2023-05-03', amount, NULL) AS `2023-05-03`, IF(order_date = '2023-05-04', amount, NULL) AS `2023-05-04`, IF(order_date = '2023-05-05', amount, NULL) AS `2023-05-05` FROM advanced.orders; SELECT user_id, # amount 대신 1이라고 표시. IF 문 안에 TRUE 일 때의 값이 항상 특정 컬럼이 아니라 1이라고 할 수도 있음(유무에 따라서) MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id; WITH base AS ( SELECT # * EXCEPT(event_params), # * EXCEPT(컬럼) : 컬럼을 제외하고 모두 다 보여줘! event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id2, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE event_date = "2022-08-01" GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" GROUP BY event_date 3.퍼널 분석 연습문제WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE 1=1 AND event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) #일자별로 퍼널별 유저 수 쿼리 SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY,STRUCT,PIVOT,FUNNEL
Q1 STRUCT, UNNEST 1. array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요. SELECT title, -- genres, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(genres) AS genre -- genres는 평면화가 된 데이터를 의미 -- genres가 지금 배열 -- ARRAY : 같은 타입의 여러 데이터를 저장하고 싶을 때 -- ARRAY를 풀때 Flattten(평면화) -> UNNEST -- UNNEST릃 할 때는 CROSS JOIN + UNNEST(ARRAY_COLUMN) 컬럼 명시 2) array_exercises 테이블에서 각 영화(title)q별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 함. SELECT title, -- actors -- actor에 직접 접근하면 어떨까 -> 새로운 컬럼으로 가능하나, 매번 SAFE_OFFSET을 지정해야 함 -- actors = [STRUCT(STRING,STRING)] actors[SAFE_OFFSET(0)].actor AS first_actor, actors[SAFE_OFFSET(0)].actor AS first_character, actors[SAFE_OFFSET(1)].actor AS second_actor, actors[SAFE_OFFSET(1)].actor AS second_character -- 배열에 직접 접근이 아닌 UNNEST로 풀어야 편리할 듯 FROM advanced.array_exercises as ae --------------------------------------------------------------- --------------------------------------------------------------- SELECT title, actor.actor, actor.character FROM advanced.array_exercises as ae CROSS JOIN UNNEST(actors) AS actor -- actors가 배열 3) array_exercises 테이블에서 각 영화(title) 별로 배우(actor), 배역(character), 장르(genre)를 출력. 한 row에 배우, 배역, 장르가 모두 표시되어야 함. SELECT title, -- actors, #ARRAY<STRUCT(STRING, STRING)> actor.actor as actor, actor.character as character, -- genres # ARRAY<STRING> genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre Q2 PIVOT 1-1) 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다 SELECT order_date, IF(user_id = 1, amount , NULL) AS user_1, IF(user_id = 2, amount , NULL) AS user_2, IF(user_id = 3, amount , NULL) AS user_3 FROM( SELECT order_date, user_id, amount FROM advanced.orders GROUP BY order_date, user_id, amount ORDER BY order_date ) --------------------------------------------------------------- --------------------------------------------------------------- 1-2) SELECT order_date, MAX(IF(user_id = 1, amount , NULL)) AS user_1, MAX(IF(user_id = 2, amount , NULL)) AS user_2, MAX(IF(user_id = 3, amount , NULL)) AS user_3 FROM( SELECT order_date, user_id, amount FROM advanced.orders GROUP BY order_date, user_id, amount ORDER BY order_date ) GROUP BY order_date ORDER BY order_date --------------------------------------------------------------- --------------------------------------------------------------- 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다. SELECT order_date, SUM(IF(user_id = 1, amount , NULL)) AS user_1, SUM(IF(user_id = 2, amount , NULL)) AS user_2, SUM(IF(user_id = 3, amount , NULL)) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date backtick 활용 any value는 어디에 활용할 수 있을지? -> 데이터는 믿을수 없기에 일부 데이터만 보고 사용 판단하기엔 위험할 것 같음. SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id --------------------------------------------------------------- --------------------------------------------------------------- 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 3-1) 주문 여부 1,0 SELECT user_id, SUM(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id 3-2) 횟수 SELECT user_id, SUM(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id -- 앱 로그 PIVOT WITH base AS( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id -- * EXCEPT(event_params) FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE event_date = "2022-08-01" GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" AND food_id = 1544 GROUP BY event_date Q3 퍼널 데이터-- 이중 WITH 문 WITH BASE AS( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- event_param MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL )) AS firebase_screen, -- MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL )) AS food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL )) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ) --event_name + screen (필요한 이벤트만 조건 걸어서 사용) ,filter_event_and_concat_event_and_screen AS( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM BASE WHERE event_name IN ("screen_view","click_payment") ) --step_number + COUNT --CASE WHEN 사용 SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date -- food_detail, search, search_result도 파악 STRUCT 과 UNNEST 처음 접해보는 내용이라, 복습 필요.PIVOT 내용 중 ANY_VALUE는 데이터 양이 많고, 어떤 데이터들이 어떤 특성을 가지고 담겨있는지 정확하게 모른다면 활용하면 위험하겠다는 생각이 들었음.