묻고 답해요
143만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습문제 / PIVOT 연습문제 / 퍼널별 전환율을 쉽게 구할 수 있도록 PIVOT해보기
연습문제CREATE OR REPLACE TABLE advanced.array_excercised AS #DDL SELECT movie_id, title, actors, genres FROM ( SELECT 1 AS movie_id, 'Avengers: Endgame' AS title, ARRAY<STRUCT<actor STRING, character STRING>>[ STRUCT('Robert Downey Jr.', 'Tony Stark'), STRUCT('Chris Evans', 'Steve Rogers') ] AS actors, ARRAY<STRING>['Action', 'Adventure', 'Drama'] AS genres UNION ALL SELECT 2, 'Inception', ARRAY<STRUCT<actor STRING,character STRING>>[ STRUCT('leonardo DiCaprio', 'Cobb'), STRUCT('Joseph Gordon-Levitt', 'Arthur') ], ARRAY<STRING>['Action', 'Adventure', 'Sci-Fi'] UNION ALL SELECT 3, 'The Dark Knight', ARRAY<STRUCT<actor STRING, character STRING>>[ STRUCT('Christian Bale', 'Bruce Wayne'), STRUCT('Heath Ledger', 'Joker') ], ARRAY<STRING>['Action', 'Crime', 'Drama'] ) → actors라는 STRUCT 구조체를 만들고 그 안에 2개의 필드 actor와 character를 지정하고, STRUCT 구조체 2개를 list처럼 ARRAY에 넣은 것.위의 테이블을 가지고 연습문제 1~4번 진행array_excercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.SELECT title, genre FROM advanced.array_excercised CROSS JOIN UNNEST(genres) AS genre array_excercised 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다.SELECT title, actor.actor, actor.character FROM advanced.array_excercised CROSS JOIN UNNEST(actors) AS actor actors ARRAY안에 2개의 STRUCT 구조체가 있는 구조이므로 ARRAY 데이터에 접근하는 방법으로 데이터에는 접근 가능actors[SAFE_OFFSET(0)].actor AS first_actorarray_excercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title, actor.actor, actor.character, genre FROM advanced.array_excercised CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre *UNNEST를 2번 연속 사용할 수 있다.*쿼리문의 실행순서는 FROM → JOIN → SELECT 이다. UNNEST를 통해 만들어진 actor는 현재 actor.actor가 아니라 actor라는 STRUCT 구조체이므로 구조체에 바로 접근할 수 없다는 에러가 뜰 수 있다.앱 로그 데이터(app_logs)의 배열을 풀어주세요. SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value AS value, user_id, user_pseudo_id, platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_paramevent_params는 RECORD라고 되어있는데 STRUCT이다. 중첩된 구조라는 의미.PIVOT 연습문제orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(column)으로 만들어야 합니다.SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date 2. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다.SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다.SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id 4. user_id = 32888이 카트 추가하기(click_cart)를 누를 때 어떤 음식(food_id)을 담았나요?WITH base AS( SELECT #* EXCEPT(event_params), event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param ) SELECT user_id, event_date, event_name, MAX(IF(key = 'firebase_screen', string_value, NULL)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, NULL)) AS food_id, MAX(IF(key = 'session_id', string_value, NULL)) AS session_id FROM base WHERE user_id = 32888 AND event_name = 'click_cart' GROUP BY user_id, event_date, event_name퍼널별 전환율을 쉽게 구할 수 있도록 PIVOT 해보기 WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, #event_param, MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen, #MAX(IF(event_param.key = 'food_id', event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = 'session_id', event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE #event_date = "2022-08-01" #적은 데이터로 쿼리를 작성하기 위해 만들어둔 조건 event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ), funnel_analysis AS ( SELECT event_date, #일자별로 퍼널별 유저수 쿼리 event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date ) SELECT event_date, MAX(IF (event_name_with_screen = "screen_view-welcome", cnt, NULL)) AS `screen_view-welcome`, MAX(IF (event_name_with_screen = "screen_view-home", cnt, NULL)) AS `screen_view-home`, MAX(IF (event_name_with_screen = "screen_view-food_category", cnt, NULL)) AS `screen_view-food_category`, MAX(IF (event_name_with_screen = "screen_view-restaurant", cnt, NULL)) AS `screen_view-restaurant`, MAX(IF (event_name_with_screen = "screen_view-cart", cnt, NULL)) AS `screen_view-cart`, MAX(IF (event_name_with_screen = "click_payment-cart", cnt, NULL)) AS `click_payment-cart` FROM funnel_analysis GROUP BY ALL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] Array, Struct, Pivot, Funnel
1. ARRAY, STRUCT1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre ;2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. (배우와 배역은 별도의 컬럼으로 나와야 합니다)-- 동일한 단어에 대해 선택할 수 있는 함수 : cmd+d SELECT title , actor.actor , actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor ;3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title , actor.actor , actor.character , genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre ;4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요.SELECT user_id , event_date , event_name , user_pseudo_id , event_param.key , event_param.value.string_value , event_param.value.int_value , platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param ;2. PIVOT1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.SELECT order_date , SUM(IF(user_id = 1, amount, NULL)) AS `user_id_1` , SUM(IF(user_id = 2, amount, NULL)) AS `user_id_2` , SUM(IF(user_id = 3, amount, NULL)) AS `user_id_3` FROM advanced.orders GROUP BY 1 ORDER BY 1 ;2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다.SELECT user_id , SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY 1 ;3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id , MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY 1 ;4) user_id = 32888이 카트 추가하기(click_cart)를 누를 때 어떤 음식(food_id)을 담았나요?WITH app_logs_info AS ( SELECT user_id , event_name , MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen , MAX(IF(event_param.key = 'food_id', event_param.value.int_value, NULL)) AS food_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param GROUP BY 1, 2 ) SELECT food_id FROM app_logs_info WHERE user_id = 32888 AND event_name = 'click_cart' ; 3. Funnel1) 일자별, 이벤트별 집계WITH app_logs_info AS ( SELECT user_id , event_date , event_timestamp , event_name , user_pseudo_id , event_param.key , MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen , MAX(IF(event_param.key = 'food_id', event_param.value.int_value, NULL)) AS food_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') GROUP BY ALL ) , add_step_number AS ( SELECT event_date , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date_time , user_id , user_pseudo_id , CONCAT(event_name, '-', firebase_screen) AS event_name_with_screen , CASE CONCAT(event_name, '-', firebase_screen) WHEN 'screen_view-welcome' THEN 1 WHEN 'screen_view-home' THEN 2 WHEN 'screen_view-food_category' THEN 3 WHEN 'screen_view-restaurant' THEN 4 WHEN 'screen_view-cart' THEN 5 WHEN 'click_payment-cart' THEN 6 ELSE NULL END AS step_number FROM app_logs_info ) SELECT event_date , step_number , event_name_with_screen , COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM add_step_number WHERE step_number IS NOT NULL GROUP BY 1, 2, 3 ORDER BY 1, 2 ; 2) 집계 데이터 PIVOTWITH app_logs_info AS ( SELECT user_id , event_date , event_timestamp , event_name , user_pseudo_id , event_param.key , MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen , MAX(IF(event_param.key = 'food_id', event_param.value.int_value, NULL)) AS food_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ) , add_step_number AS ( SELECT event_date , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_date_time , user_id , user_pseudo_id , CONCAT(event_name, '-', firebase_screen) AS event_name_with_screen , CASE CONCAT(event_name, '-', firebase_screen) WHEN 'screen_view-welcome' THEN 1 WHEN 'screen_view-home' THEN 2 WHEN 'screen_view-food_category' THEN 3 WHEN 'screen_view-restaurant' THEN 4 WHEN 'screen_view-cart' THEN 5 WHEN 'click_payment-cart' THEN 6 ELSE NULL END AS step_number FROM app_logs_info WHERE event_name IN ('screen_view', 'click_payment') ) , agg_user_cnt AS ( SELECT event_date , step_number , event_name_with_screen , COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM add_step_number WHERE step_number IS NOT NULL GROUP BY 1, 2, 3 ORDER BY 1, 2 ) SELECT event_date , MAX(IF(step_number = 1, user_cnt, NULL)) AS `screen_view-welcome` , MAX(IF(step_number = 2, user_cnt, NULL)) AS `screen_view-home` , MAX(IF(step_number = 3, user_cnt, NULL)) AS `screen_view-food_category` , MAX(IF(step_number = 4, user_cnt, NULL)) AS `screen_view-restaurant` , MAX(IF(step_number = 5, user_cnt, NULL)) AS `screen_view-cart` , MAX(IF(step_number = 6, user_cnt, NULL)) AS `click_payment-cart` FROM agg_user_cnt GROUP BY 1 ORDER BY 1 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습문제 / PIVOT 연습문제 / 퍼널 쿼리 연습문제
1. ARRAY, STRUCT 연습문제연습문제 1SELECT title , genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(genres) AS genre 연습문제 2SELECT title , actor.actor , actor.character FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor 연습문제 3SELECT title , actor.actor AS actor , actor.character AS character , genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre[메모]여기서 CROSS JOIN 다음 라인에 WHERE actor = 'Chris Evans' 이렇게 조건을 걸어줄 경우 오류가 발생한다. 오류는 실행 순서와 관련이 있다. 실행 순서: FROM -> JOIN -> SELECT따라서 SELECT 에서 알리아스로 이름 붙인 actor 가 아닌, CROSS JOIN 결과를 사용하여 조건을 만들어야 한다. => WHERE actor.actor = 'Chris Evans' 연습문제 4SELECT user_id , event_date , event_name , user_pseudo_id , parameter.key AS key , parameter.value.string_value AS string_value , parameter.value.int_value AS int_value FROM `inflearn-bigquery-437203.advanced.app_logs` CROSS JOIN UNNEST (event_params) AS parameter WHERE event_date = "2022-08-01" 2. PIVOT 연습문제연습문제 1SELECT order_date , SUM(IF(user_id = 1, amount, 0)) AS user_1 , SUM(IF(user_id = 2, amount, 0)) AS user_2 , SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM `inflearn-bigquery-437203.advanced.orders` GROUP BY order_date ORDER BY order_date 연습문제 2SELECT user_id , SUM(IF(order_date= '2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date= '2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date= '2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date= '2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date= '2023-05-05', amount, 0)) AS `2023-05-05` FROM `inflearn-bigquery-437203.advanced.orders` GROUP BY user_id ORDER BY user_id[메모]알리아스로 컬럼명 지정할 때, 영어 제외하고 다른 문자열가 포함될 경우 → backtick (`) 으로 감싸준다. 연습문제 3SELECT user_id , MAX(IF(order_date= '2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date= '2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date= '2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date= '2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date= '2023-05-05', 1, 0)) AS `2023-05-05` FROM `inflearn-bigquery-437203.advanced.orders` GROUP BY user_id ORDER BY user_id 연습문제 4SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , MAX(IF(params.key = 'firebase_screen', params.value.string_value, NULL)) AS firebase_screen , MAX(IF(params.key = 'food_id', params.value.int_value, NULL)) AS food_id , MAX(IF(params.key = 'session_id', params.value.string_value, NULL)) AS session_id FROM `inflearn-bigquery-437203.advanced.app_logs` CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' GROUP BY ALL 3. 퍼널 쿼리 연습문제연습문제 1: 각 퍼널별 유저 수 집계-- 퍼널 단계: 6 -- screen_view(welcome) -- screen_view(home) -- screen_view(food_category) -- screen_view(restaurant) -- screen_view(cart) -- click_payment(cart) WITH funnel AS ( SELECT CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen , CASE WHEN event_name = 'screen_view' AND param.value.string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND param.value.string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND param.value.string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND param.value.string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND param.value.string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND param.value.string_value = 'cart' THEN 6 END AS step_number , user_pseudo_id FROM `inflearn-bigquery-437203.advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND param.value.string_value IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') ) SELECT event_name_with_screen , MAX(step_number) AS step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel GROUP BY event_name_with_screen 연습문제 2: 일자별 각 퍼널의 유저 수 집계-- 퍼널 단계: 6 -- screen_view(welcome) -- screen_view(home) -- screen_view(food_category) -- screen_view(restaurant) -- screen_view(cart) -- click_payment(cart) WITH funnel AS ( SELECT event_date -- 날짜 컬럼 추가 , CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen , CASE WHEN event_name = 'screen_view' AND param.value.string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND param.value.string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND param.value.string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND param.value.string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND param.value.string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND param.value.string_value = 'cart' THEN 6 END AS step_number , user_pseudo_id FROM `inflearn-bigquery-437203.advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND param.value.string_value IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') ) SELECT event_date -- 날짜 컬럼 추가 , event_name_with_screen , MAX(step_number) AS step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel GROUP BY ALL ORDER BY event_date, step_number 연습문제 3: 일자별 각 퍼널의 유저 수 집계한 결과 → PIVOT 하기-- 퍼널 단계: 6 -- screen_view(welcome) -- screen_view(home) -- screen_view(food_category) -- screen_view(restaurant) -- screen_view(cart) -- click_payment(cart) WITH funnel AS ( SELECT event_date , CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen , CASE WHEN event_name = 'screen_view' AND param.value.string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND param.value.string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND param.value.string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND param.value.string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND param.value.string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND param.value.string_value = 'cart' THEN 6 END AS step_number , user_pseudo_id FROM `inflearn-bigquery-437203.advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND param.value.string_value IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') ), -- 일자별 각 퍼널의 유저수 집계 funnel_daily AS( SELECT event_date , event_name_with_screen , MAX(step_number) AS step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel GROUP BY ALL ORDER BY event_date, step_number ) -- 일자별 각 퍼널의 유저수 집계 -> 피벗하기 SELECT event_date , SUM(IF(event_name_with_screen = 'screen_view-welcome', cnt, 0)) AS `screen_view-welcome` , SUM(IF(event_name_with_screen = 'screen_view-home', cnt, 0)) AS `scree_view-home` , SUM(IF(event_name_with_screen = 'screen_view-food_category', cnt, 0)) AS `screen_view-food_category` , SUM(IF(event_name_with_screen = 'screen_view-restaurant', cnt, 0)) AS `screen_view-restaurant` , SUM(IF(event_name_with_screen = 'screen_view-cart', cnt, 0)) AS `screen_view-cart` , SUM(IF(event_name_with_screen = 'click_payment-cart', cnt, 0)) AS `click_payment-cart` FROM funnel_daily GROUP BY event_date ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 분석 연습 문제
1-4. Array, Struct 연습문제 (1~4번)연습문제1문제array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 쿼리select title, genres2 from advanced.array_exercises cross join unnest(genres) as genres2 ;결과 연습문제2문제array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다쿼리select title, actor.actor, actor.character from advanced.array_exercises cross join unnest(actors) as actor order by movie_id ;결과 연습문제3문제array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다쿼리select title, actor.actor, actor.character genre2 # array<string> from advanced.array_exercises cross join unnest (actors) as actor cross join unnest (genres) as genre order by movie_id ;결과 연습문제4문제앱 로그 데이터(app_logs)의 배열을 풀어주세요쿼리select user_id, event_date, event_name, user_pseudo_id, event_param.key as key , event_param.value.string_value as string_value , event_param.value.int_value as int_value from advanced.app_logs cross join unnest (event_params) as event_param where event_date = "2022-08-01" limit 10 ;결과 1-9. 퍼널 SQL 쿼리 작성하기연습문제1문제orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다쿼리select order_date, sum(if(user_id = 1, amount, 0)) as user_1, sum(if(user_id = 2, amount, 0)) as user_2, sum(if(user_id = 3, amount, 0)) as user_3 from advanced.orders group by order_date order by 1 ;결과연습문제2문제orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다쿼리select user_id, # 컬럼의 이름을 지정할때, 영어를 제외하고 backtick(`)사용 sum(if(order_date = "2023-05-01", amount, 0)) as `2023-05-01`, sum(if(order_date = "2023-05-02", amount, 0)) as `2023-05-02`, sum(if(order_date = "2023-05-03", amount, 0)) as `2023-05-03`, sum(if(order_date = "2023-05-04", amount, 0)) as `2023-05-04`, sum(if(order_date = "2023-05-05", amount, 0)) as `2023-05-05` from advanced.orders group by user_id order by 1 ;결과연습문제3문제orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다쿼리select user_id, max(if(order_date = "2023-05-01", 1, 0)) as `2023-05-01`, max(if(order_date = "2023-05-02", 1, 0)) as `2023-05-02`, max(if(order_date = "2023-05-03", 1, 0)) as `2023-05-03`, max(if(order_date = "2023-05-04", 1, 0)) as `2023-05-04`, max(if(order_date = "2023-05-05", 1, 0)) as `2023-05-05` from advanced.orders group by user_id order by 1 ;결과연습문제4문제앱 로그 데이터 배열 PIVOT하기 - user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?쿼리select event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen, -- max(if(param.key = 'food_id', param.value.string_value, null)) as food_id, max(if(param.key = 'food_id', param.value.int_value, null)) as food_id, max(if(param.key = 'session_id', param.value.string_value, null)) as session_id from advanced.app_logs cross join unnest(event_params) as param where event_date = "2022-08-01" and user_id = 32888 and event_name = "click_cart" group by all limit 100 ; 결과퍼널 분석문제step_number별 count, 일자별 퍼널별 유저 수 쿼리쿼리with base as (select event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen, -- max(if(event_param.key = 'food_id', event_param.value.int_value, null)) as food_id, max(if(event_param.key = 'session_id', event_param.value.string_value, null)) as session_id from advanced.app_logs cross join unnest(event_params) as event_param where 1=1 and event_date between "2022-08-01" and "2022-08-18" group by all ), filter_event_and_concat_event_and_acreen AS ( -- event_name + screen select * except(event_name, firebase_screen, event_timestamp), concat(event_name, "-", firebase_screen) as event_name_with_screen, datetime(timestamp_micros(event_timestamp), "Asia/Seoul") as event_datetime from base where 1=1 and event_name in ("screen_view", "click_payment")) # 일자별로 퍼널별 유저 수 select -- distinct(event_name_with_screen) event_date, event_name_with_screen, case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number, count(distinct user_pseudo_id) as cnt from filter_event_and_concat_event_and_acreen group by all having step_number is not null order by event_date ;결과
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 연습문제 연습 문제 1번# 1)array_exercises테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 SELECT title, genre FROM `advanced.array_exercises` AS exercise CROSS JOIN UNNEST(genres) AS genre SELECT title, # 기존에 array_exercises에 저장되어 있던 컬럼 genre FROM `advanced.array_exercises` AS ae, UNNEST(genres) AS genre # ARRAY : 같은 타입의 여러 데이터를 저장하고 싶을 때 # ARRAY를 Flatten(평면화) => UNNEST # UNNEST를 할 때는 CROSS JOIN + UNNEST(ARRAY_COLUMN) # UNNEST(ARRAY_COLUMN) AS 새로운 이름 # SELECT 절에서 새로운 이름으로 시작한다. 기존의 ARRAY_COLUMN은 사용하지 않는다! 연습 문제 2번# 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다 SELECT title, actor.actor, actor.character -- FROM `advanced.array_exercises` AS ae, UNNEST(actors) AS actor FROM `advanced.array_exercises` AS ae CROSS JOIN UNNEST(actors) AS actor # actors = [STRUCT(STRING, STRING)] 연습 문제 3번# 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시 되어야 합니다 SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` AS ae ,UNNEST(actors) AS actor, UNNEST(genres) as genre -- FROM `advanced.array_exercises` AS ae -- CROSS JOIN UNNEST(actors) AS actor -- CROSS JOIN UNNEST(genres) as genre # 이 문제의 의도 : UNNEST를 2번 연속 사용할 수 있다. # CROSS JOIN => JOIN 연속 2번과 맥락은 동일한데, UNNEST라는 것이 어색할 수 있었다 # 데이터의 중복이 어느정도 생기는데, 그것은 어쩔 수 없는 이슈(CROSS JOIN) -- FROM `advanced.array_exercises` AS ae -- CROSS JOIN UNNEST(actors) AS actor, UNNEST(genres) as genre SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` AS ae ,UNNEST(actors) AS actor, UNNEST(genres) as genre -- FROM `advanced.array_exercises` AS ae -- CROSS JOIN UNNEST(actors) AS actor -- CROSS JOIN UNNEST(genres) as genre # 이 문제의 의도 : UNNEST를 2번 연속 사용할 수 있다. # CROSS JOIN => JOIN 연속 2번과 맥락은 동일한데, UNNEST라는 것이 어색할 수 있었다 # 데이터의 중복이 어느정도 생기는데, 그것은 어쩔 수 없는 이슈(CROSS JOIN) -- FROM `advanced.array_exercises` AS ae -- CROSS JOIN UNNEST(actors) AS actor, UNNEST(genres) as genre WHERE actor.actor = 'Chris Evans' AND genre = 'Action' -- WHERE actor = 'Chris Evans' (X) # 실행 순서 : FROM -> JOIN -> SELECT # actors : ARRAY<STRUCT> => UNNEST => STRUCT # genres : ARRAY<STRING> => STRING 연습 문제 4번# 4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요 SELECT event_date, event_timestamp, event_name, event_param.key as key, event_param.value as value, event_param.value.string_value as string_value, event_param.value.int_value as int_value, -- event_params, user_id, event_param FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date ='2022-08-01' 2. PIVOTPIVOT 연습 문제 1## SubQuery 방식 SELECT order_date, SUM(IF(user_id = 1, sum_of_amount, NULL)) AS user_1, SUM(IF(user_id = 2, sum_of_amount, NULL)) AS user_2, SUM(IF(user_id = 3, sum_of_amount, NULL)) AS user_3 -- MAX를 써도 동일한 결과 값이 나옴 -- 그룹화 할때 값이 하나밖에 없음 FROM ( SELECT order_date, user_id, # Amount의 합 SUM(amount) AS sum_of_amount FROM advanced.orders GROUP BY order_date, user_id -- ORDER BY order_date ) GROUP BY order_date ORDER BY order_date PIVOT 연습 문제 2# ANY_VALUE 활용 -- ANY_VALUE : 그룹화 할 대상 중에 임의의 값을 선택한다 (NULL)을 제외하고 -- ANY_VALUE에선 나머지 값들이 NULL 이거나 확정적으로 이 값이 나올 것이다 기대할 때 사용한다 SELECT user_id, ANY_VALUE(IF(order_date = PARSE_DATE('%Y-%m-%d', '2023-05-01'), amount, NULL)) AS `2023-05-01`, ANY_VALUE(IF(order_date = date('2023-05-02'), amount, NULL)) AS `2023-05-02`, ANY_VALUE(IF(order_date = date('2023-05-03'), amount, NULL)) AS `2023-05-03`, ANY_VALUE(IF(order_date = date('2023-05-04'), amount, NULL)) AS `2023-05-04`, ANY_VALUE(IF(order_date = date('2023-05-05'), amount, NULL)) AS `2023-05-05` FROM `advanced.orders` GROUP BY user_id ORDER BY user_id PIVOT 연습 문제 3# 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요.user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 SELECT user_id, # amount 대신 1 이라고 표시. IF 문 안에 TRUE 일 때의 값이 항상 특정 컬럼이 아니라 1이라고 할 수도 있음(유무에 따라서) MAX(IF(order_date = PARSE_DATE('%Y-%m-%d', '2023-05-01'), 1, 0)) AS `2023-05-01`, MAX(IF(order_date = date('2023-05-02'), 1, 0)) AS `2023-05-02`, MAX(IF(order_date = date('2023-05-03'), 1, 0)) AS `2023-05-03`, MAX(IF(order_date = date('2023-05-04'), 1, 0)) AS `2023-05-04`, MAX(IF(order_date = date('2023-05-05'), 1, 0)) AS `2023-05-05` FROM `advanced.orders` GROUP BY user_id PIVOT 연습 문제 4 앱 로그 데이터 배열 PIVOT 하기WITH base AS ( SELECT event_date, event_name, user_id, user_pseudo_id, event_timestamp, MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param GROUP BY ALL -- WHERE event_name ='click_cart' ORDER BY user_pseudo_id LIMIT 100 ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name ='click_cart' GROUP BY event_date ORDER BY event_date 3. 퍼널 분석 -- event_name + screen (필요한 이벤트만 WHERE 조건에 걸어서 사용) -- step_number + COUNT WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- param MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE event_date = '2022-08-01' GROUP BY ALL -- LIMIT 100 ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen FROM base WHERE event_name IN ("screen_view", "click_payment") ) SELECT * FROM filter_event_and_concat_event_and_screen 최종 RESULT# 일자 상관 없이 퍼널의 유저 수를 집계한 쿼리 => 일자별로 하기 위해 event_date 추가WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- param MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE -- event_date = '2022-08-01' # 적은 데이터로 쿼리를 작성하기 위해 만들어둔 조건 event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) -- step_number + COUNT -- step_number : CASE WHEN 을 사용해 숫자 지정, 조건문을 여러 개 하고싶을 때 사용하는 함수 # 일자 상관 없이 퍼널의 유저 수를 집계한 쿼리 => 일자별로 하기 위해 event_date 추가 SELECT event_date, # 일자별로 퍼널별 유저 수 쿼리 event_name_with_screen, -- event_name_with_screen, -- event_datetime, user_pseudo_id, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS CNT FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[ 인프런 빅쿼리 빠짝스터디 1주차 ] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. (배우와 배역은 별도의 컬럼으로 나와야 합니다)SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다SELECT title, actor.actor, actor.character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date = '2022-08-01'2. PIVOT1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3, FROM advanced.orders GROUP BY order_date ORDER BY order_date 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다-- 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다 SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id4) 앱 로그 데이터 배열 PIVOT하기# 앱 로그 PIVOT # 쿼리를 작성하는 목표, 확인할 지표 : user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요? # 쿼리 계산 방법 : UNNEST -> PIVOT # 데이터의 기간 : X # 사용할 테이블 : app_logs # Join KEY : X # 데이터 특징: -- event_params ARRAY, STRUCT / event_params.value ARRAY, STRUCT WITH base AS ( SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, NULL)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, NULL)) AS food_id, MAX(IF(key = 'session_id', string_value, NULL)) AS session_id FROM base WHERE user_id = 32888 AND event_name = 'click_cart' GROUP BY ALL3. 퍼널WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = 'session_id', event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ('screen_view', 'click_payment') ), daily_event_summary AS ( SELECT event_date, # 일자별로 퍼널별 유저 수 쿼리 event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ) -- 집계한 데이터 PIVOT SELECT event_date, MAX(IF(event_name_with_screen = 'screen_view-welcome', cnt, 0)) AS screen_view_welcome, MAX(IF(event_name_with_screen = 'screen_view-home', cnt, 0)) AS screen_view_home, MAX(IF(event_name_with_screen = 'screen_view-food_category', cnt, 0)) AS screen_view_food_category, MAX(IF(event_name_with_screen = 'screen_view-restaurant', cnt, 0)) AS screen_view_restaurant, MAX(IF(event_name_with_screen = 'screen_view-cart', cnt, 0)) AS screen_view_cart, MAX(IF(event_name_with_screen = 'click_payment-cart', cnt, 0)) AS click_payment_cart FROM daily_event_summary GROUP BY ALL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[ 인프런 빅쿼리 빠짝스터디 1주차 ] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. (배우와 배역은 별도의 컬럼으로 나와야 합니다)SELECT title, , actor.actor AS actor , actor.character AS character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다SELECT title , actor.actor AS actor , actor.character AS character , genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre 4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요SELECT user_id , event_date , event_name , user_pseudo_id , params.key AS key , params.value.string_value AS str_value , params.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' 2. PIVOT1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다-- 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다 WITH step1 AS ( SELECT order_date , user_id , sum(amount) AS sum_of_amount FROM advanced.orders GROUP BY ALL ) SELECT order_date , MAX(IF(user_id = 1, sum_of_amount, 0)) AS user_1 , MAX(IF(user_id = 2, sum_of_amount, 0)) AS user_2 , MAX(IF(user_id = 3, sum_of_amount, 0)) AS user_3 FROM step1 GROUP BY order_date ORDER BY order_date2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id , SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id , MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id 4) 앱 로그 데이터 배열 PIVOT하기SELECT user_id , event_date , event_name , user_pseudo_id , MAX(IF(params.key = 'firebase_screen', params.value.string_value, NULL)) AS firebase_screen , MAX(IF(params.key = 'food_id', params.value.int_value, NULL)) AS food_id , MAX(IF(params.key = 'session_id', params.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' GROUP BY ALL 3. 퍼널 분석 WITH step1 AS ( SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , MAX(IF(params.key = 'firebase_screen' , params.value.string_value , NULL)) AS firebase_screen , MAX(IF(params.key = 'session_id' , params.value.string_value , NULL)) AS session_id , platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ), step2 AS ( SELECT * EXCEPT(event_timestamp) , CONCAT(event_name, '-', firebase_screen) AS event_name_with_screen , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM step1 ), step3 AS ( SELECT * , CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number FROM step2 -- 1) 각 퍼널별 유저 수 집계 ), step3_1 AS ( SELECT event_name_with_screen , step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM step3 GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY step_number ) -- 2) 일자별 각 퍼널별 유저 수 집계 , step3_2 AS ( SELECT event_date , event_name_with_screen , step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM step3 GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date , step_number ) 3) 2) 데이터를 PIVOTSELECT event_date , MAX(IF(event_name_with_screen = 'screen_view-welcome', cnt, NULL)) AS `screen_view-welcome` , MAX(IF(event_name_with_screen = 'screen_view-home', cnt, NULL)) AS `screen_view-home` , MAX(IF(event_name_with_screen = 'screen_view-food_category', cnt, NULL)) AS `screen_view-food_category` , MAX(IF(event_name_with_screen = 'screen_view-restaurant', cnt, NULL)) AS `screen_view-restaurant` , MAX(IF(event_name_with_screen = 'screen_view-cart', cnt, NULL)) AS `screen_view-cart` , MAX(IF(event_name_with_screen = 'click_payment-cart', cnt, NULL)) AS `click_payment-cart` FROM step3_2 GROUP BY event_date ORDER BY event_date
-
미해결8시간 완성 SQLD(2과목)
도커 말고 다른 방법으로 실습환경을 구축할 수 있을까요?
제가 오라클 버추얼박스로 따로 실습하고 있는 게 있는데, 도커를 설치한 이후로 버추얼박스 vm이 구동되지 않는 것을 확인했습니다. 그러다 버추얼박스와 도커가 WSL2로 인해 충돌이 있다는 것을 알게 되었고, Hyper-V 관련 설정을 제거하였더니 버추얼박스 vm이 다시 구동되었습니다. 그런데 아니나 다를까 도커는 실행이 안되더라구요 참고로 버추얼박스 실습하는것도 호환성 문제때문에 구버전을 쓰고있고 업데이트가 불가한 상황인데 혹시 도커말고도 sql 실습환경을 구축할수 있는 방법이 있을지 혹시나싶어 문의드립니다..!!
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
[PART 1] ARRAY, STRUCT 연습문제 Q1. array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.-- CROSS JOIN UNNEST(ARRARY_COKUMN) AS 새로운 이름 -> 이후 SELECT 절에서 새로운 이름만 포함하여 쿼리를 실행하여 평면화 가능 SELECT title, -- genres, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(genres) AS genre; Q2. array_exercise 테이블에서 각 영화(title)별로 배우(actors)와 배역(character)을 보여주세요 배우와 배역은 별도의 컬럼으로 나와야 합니다.SELECT title, aa.actor, aa.character FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(actors) AS aa; Q3. array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title, aa.actor, aa.character, gg AS genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST(actors) AS aa CROSS JOIN UNNEST(genres) AS gg; Q4. 앱 로그 데이터(app_logs)의 배열을 풀어주세요. SELECT event_date, event_timestamp, event_name, ep.key, ep.value.string_value, ep.value.int_value, user_id, user_pseudo_id, platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS ep WHERE event_date = "2022-08-01"; [PART 2] PIVOT 연습문제 Q1. orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date; Q2. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id 를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다.-- 처음 시도에 짠 쿼리 SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS 2023-05-01, SUM(IF(order_date = "2023-05-02", amount, 0)) AS 2023-05-02, SUM(IF(order_date = "2023-05-03", amount, 0)) AS 2023-05-03, SUM(IF(order_date = "2023-05-04", amount, 0)) AS 2023-05-04, SUM(IF(order_date = "2023-05-05", amount, 0)) AS 2023-05-05, FROM advanced.orders GROUP BY user_id;이렇게 쿼리를 짜고 실행해보니 다음과 같은 오류 문구가 나옴Syntax error: Unexpected integer literal "2023" at [3:52]SELECT 문에서 AS 다음에 정의한 새로운 컬럼 명칭에 오류가 있는 것 같은데, “ “ 로 감싸도 오류가 나오고 + 아예 AS 를 빼고 실행했더니 f0/f1/f2 와 같은 임의의 컬럼명이 지정됨-- 강의를 듣고 고친 쿼리 SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id;alias 로 영어가 아닌 컬럼의 이름을 새로 지정할 때에는, 반드시 backtick(`) 으로 감싸줘야 한다는 점!(c.f.) MAC 에서 backtick 은 영문인 상태로 ₩ 단축키를 누르면 나온다강의에서는 SUM 함수가 아닌 MAX 함수 or ANY_VALUE 함수로 감싸서 쿼리를 작성해주신 점 확인강의 진행 시에는 IF 문으로 먼저 데이터를 확인하고 user_id x order_date 별로 1개의 데이터만 있다는 점을 중간확인 했기 때문단, 실무 상으로는 user_id 와 order_date 가 무수히 많거나 or 데이터 양이 너무 많아 중간 조회를 할 수 없는 상황도 있기 때문에 → 이런 경우에는 MAX 로 가져가는 것이 안전하지 않을까? 하는 생각도 들었음데이터를 보고자 하는 목적이 “각 user_id x order_date 별로 주문금액의 합산” 을 보기 위함이었기 때문! Q3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다.-- 처음 시도에 짠 쿼리 SELECT user_id, IF((SUM(IF(order_date = "2023-05-01", amount, NULL))) IS NOT NULL, 1, 0) AS `2023-05-01`, IF((SUM(IF(order_date = "2023-05-02", amount, NULL))) IS NOT NULL, 1, 0) AS `2023-05-02`, IF((SUM(IF(order_date = "2023-05-03", amount, NULL))) IS NOT NULL, 1, 0) AS `2023-05-03`, IF((SUM(IF(order_date = "2023-05-04", amount, NULL))) IS NOT NULL, 1, 0) AS `2023-05-04`, IF((SUM(IF(order_date = "2023-05-05", amount, NULL))) IS NOT NULL, 1, 0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id;SUM 과 IF 문으로 먼저 쿼리를 짜서 user_id 별 합산값을 확인하고 → 합산값이 NULL 이 아닌 경우에만 1을 다시 값으로 지정할 수 있도록 IF 문으로 감싸둠이렇게 쿼리를 짜도 결과값은 동이하게 나오지만, SELECT 문이 길어진 것 같아서 좀 더 효율적으로 쿼리를 짤 수 있는 방법은 없을지 고민되었음 ㅠ-- 강의를 듣고 고친 쿼리 SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id;IF 문 안에 들어있는 TRUE 조건이었던 amount 를 1로 바꿔주면 간단히 해결되었을 문제!IF 문 안에 TRUE 일 때 값이 항상 특정 컬럼이 아니라 1이라고도 할 수 있다는 것을 보여주기 위한 문제다만, PIVOT 2번 문제에서 언급했던 바와 같이 만약 user_id x order_date 별로 1개의 값만 존재하는 것이 아니라 여러 값이 존재했더라면 → 3번 쿼리에서 MAX를 썼을 때와 SUM 을 썼을 때의 결과값은 달라졌을 것이라 생각함만약 user_id x order_date 별로 N개의 값이 존재했더라면 → SUM 함수로 감쌌을 때 1의 값이 x N개 합산되어서 나왔을 것이기 때문따라서, 3번 문제의 경우 1 혹은 0 2개의 값으로만 표현해야 했기 때문에, SUM 이 아닌 MAX 함수로 쿼리를 작성하는 것이 필요하다고 생각함단, 만약 “횟수” 를 알고 싶었다면 기존대로 SUM 함수를 사용하면 됨! Q4. 앱로그 데이터 배열 PIVOT 하기 → user_id = 32888 이 카트 추가하기(click_cart)를 누를 때 어떤 음식 (food_id)을 담았는지 구해주세요. key 를 Column 으로 두고, string_value 나 int_value를 Column의 값으로 설정해서 풀어주세요.SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(ep.key = "firebase_screen", ep.value.string_value, NULL)) AS firebase_screen, MAX(IF(ep.key = "food_id", ep.value.int_value, NULL)) AS food_id, MAX(IF(ep.key = "session_id", ep.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS ep GROUP BY ALL; [PART 3] 퍼널 연습문제WITH base AS (SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- event_param, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE -- event_date = "2022-08-01" event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( -- event_name + screen (필요한 이벤트만 WHERE 조건에 걸어서 사용) SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) -- step_number + COUNT -- step_number : CASE WHEN을 사용해 숫자 지정 -- 일자별로 퍼널별 유저 수 쿼리 SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY,STRUCT,PIVOT,FUNNEL
1. ARRAY, STRUCT중요 문법CROSS JOIN UNNEST ( ) 연습문제 1 ) ARRAY 데이터의 기본 추출 SELECT title, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(genres) AS genre 연습문제 2 ) STRUCT 데이터의 기본 추출 SELECT title, actor FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor --위와 같이 추출할시 actor 컬럼과 character 컬럼이명이 명확하게 나오지 않음으로 SELECT title, actor.actor, actor.character FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor --위와 같이 명확한 명칭을 적어주면 데이터의 컬럼을 확인하기 좋다 연습문제 3 ) CROSS JOIN 2번 사용 SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre -- OR SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises`, UNNEST(actors) AS actor, UNNEST(genres) AS genre 연습문제 4 ) 로그 데이터 풀어보기 SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM `advanced.app_logs` CROSS JOIN UNNEST (event_params) AS event_param2. PIVOT중요 문법IF(조건 = , TRUE , FALSE ) 컬럼명을 숫자로 설정하고 싶을시 ` 을 사용하여 감싸주어야한다 연습 문제 1 ) PIVOT 기본 SELECT order_date, SUM(IF(user_id = 1, total_amount, 0)) AS user_id_1, SUM(IF(user_id = 2, total_amount, 0)) AS user_id_2, SUM(IF(user_id = 3, total_amount, 0)) AS user_id_3 FROM( SELECT order_date, user_id, SUM(amount) AS total_amount FROM `advanced.orders` GROUP BY order_date , user_id ) GROUP BY order_date ORDER BY order_date 연습문제 2 ) PIVOT 기본 2SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM `advanced.orders` GROUP BY user_id ORDER BY user_id 연습문제 3) TRUE 값의 변화SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM `advanced.orders` GROUP BY user_id ORDER BY user_id 연습문제 4 ) 로그 데이터를 이용한 컬럼 정리SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id FROM `advanced.app_logs` CROSS JOIN UNNEST (event_params) AS param GROUP BY ALLGROUP BY ALL에 대한 이해,UNNEST 이후 컬럼을 어떤식으로 정리할것인가 3. 퍼널분석퍼널분석 쿼리 WITH logs AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id FROM `advanced.app_logs` CROSS JOIN UNNEST (event_params) AS param WHERE event_date BETWEEN '2022-12-01' AND '2022-12-31' GROUP BY ALL ), filter_logs AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, '-', firebase_screen) AS event_name_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp)) AS event_time FROM logs WHERE event_name IN ('screen_view', 'click_payment') ) funnel AS ( SELECT event_date, event_name_screen, CASE WHEN event_name_screen = 'screen_view-welcome' THEN 1 WHEN event_name_screen = 'screen_view-home' THEN 2 WHEN event_name_screen = 'screen_view-food_category' THEN 3 WHEN event_name_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_screen = 'screen_view-cart' THEN 5 WHEN event_name_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS flow, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_logs GROUP BY ALL HAVING flow IS NOT NULL ORDER BY 1,3 ) PIVOTSELECT event_date, MAX(IF(flow = 1, cnt, 0)) AS screen_view_welcome, MAX(IF(flow = 2, cnt, 0)) AS screen_view_home, MAX(IF(flow = 3, cnt, 0)) AS screen_food_category, MAX(IF(flow = 4, cnt, 0)) AS screen_restaurant, MAX(IF(flow = 5, cnt, 0)) AS screen_cart, MAX(IF(flow = 6, cnt, 0)) AS click_payment_cart, FROM funnel GROUP BY event_date ORDER BY 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제
1. ARRAY, STRUCT 연습문제 1)array_exercises테이블에서각영화(title)별로장르(genres)를UNNEST해서 보여주세요 SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre; 2)array_exercises테이블에서각영화(title)별로배우(actor)와배역(character)을 보여주세요. 배우와배역은별도의컬럼으로나와야합니다 SELECT title, actors.actor, actors.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS ac; 3)array_exercises테이블에서각영화(title)별로배우(actor),배역(character),장르 (genre)를출력하세요.한Row에배우,배역,장르가모두표시되어야합니다 SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) as actor CROSS JOIN UNNEST(genres) as genre; 4)앱로그데이터(app_logs)의배열을풀어주세요 SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param 2. PIVOT 연습 문제 1)orders테이블에서유저(user_id)별로주문금액(amount)의합계를PIVOT해주세요. 날짜(order_date)를행(Row)으로,user_id를열(Column)으로만들어야합니다 SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date 2)orders테이블에서날짜(order_date)별로유저들의주문금액(amount)의합계를 PIVOT해주세요.user_id를행(Row)으로,order_date를열(Column)으로만들어야 합니다 SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id; 3)orders테이블에서사용자(user_id)별,날짜(order_date)별로주문이있다면1, 없다면0으로PIVOT해주세요.user_id를행(Row)으로,order_date를열(Column)로 만들고주문을많이해도1로처리합니다 SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id; *) user_id=32888이카트추가하기(click_cart)를누를때어떤음식(food_id)을담았나요? WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id; 3. 느낀점 퍼널이 무엇언지에 대해서 자세히 배울 수 있었음(학과 수업에선 퍼널에 대해서 딱히 알려주는 수업이 없어서 이렇게 재대로 배울 기회가 없었다.) 시작하기 앞서 빅쿼리에 대해 아는 것이 적어 잘할 수 있을지 고민이 많이 되었지만 강의 내용이 이해가 잘되어 열심히 노력하면 따라갈 수 있겠다는 생각이 듬 2주차도 잘부탁드립니다~
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
SELECT movie_id, title, genre FROM `advanced.array_exercises`, CROSS JOIN UNNEST(genres) AS genre SELECT title, actors.actor, actors.character FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actors SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_paramSELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY ALL ORDER BY order_date -- 2. orders 테이블에서 날짜(order_date) 별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_Date를 열(Column)으로 만들어야 합니다. SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 4.앱 로그 PIVOT WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" GROUP BY ALL FUNNEL with base as ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- event_param max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen, -- max(if(event_param.key = 'food_id', event_param.value.int_value, null)) as food_id, max(if(event_param.key = 'session_id', event_param.value.string_value, null)) as session_id from advanced.app_logs cross join unnest(event_params) as event_param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event_and_concat_event_and_screen as ( select * except(event_name, firebase_screen, event_timestamp), concat(event_name, '-', firebase_screen) as event_name_with_screen, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime from base where event_name in ('screen_view', 'click_payment' ) ) select event_date, event_name_with_screen, case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number, count(distinct user_pseudo_id) as cnt from filter_event_and_concat_event_and_screen group by all having step_number is not null order by 1, 3 -- 3번 퍼널 문제 with base as ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- event_param max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen, -- max(if(event_param.key = 'food_id', event_param.value.int_value, null)) as food_id, max(if(event_param.key = 'session_id', event_param.value.string_value, null)) as session_id from advanced.app_logs cross join unnest(event_params) as event_param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event_and_concat_event_and_screen as ( select * except(event_name, firebase_screen, event_timestamp), concat(event_name, '-', firebase_screen) as event_name_with_screen, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime from base where event_name in ('screen_view', 'click_payment' ) ) select event_date, event_name_with_screen, case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number, count(distinct user_pseudo_id) as cnt from filter_event_and_concat_event_and_screen group by all having step_number is not null order by 1, 3 SELECT event_date, MAX(IF(event_name_with_screen ="screen_view-welcome", cnt, NULL)) AS `screen_view-welcome`, MAX(IF(event_name_with_screen ="screen_view-home", cnt, NULL)) AS `screen_view-home`, MAX(IF(event_name_with_screen ="screen_view-food_category", cnt, NULL)) AS `screen_view-food_category`, MAX(IF(event_name_with_screen ="screen_view-restaurant", cnt, NULL)) AS `screen_view-restaurant`, MAX(IF(event_name_with_screen ="screen_view-cart", cnt, NULL)) AS `screen_view-cart`, MAX(IF(event_name_with_screen ="click_payment-cart", cnt, NULL)) AS `click_payment-cart` FROM main GROUP BY ALL ORDER BY all
-
미해결
[빠짝스터디 1주차 과제] ARRAY STRUCT, PIVOT, 퍼널 쿼리 연습문제
1주차.ARRAY(배열), STRUCT(구조체) 연습문제ARRAY = 같은 타입의(숫자, 문자 등) 여러 값을 하나의 컬럼에 저장할 수 있는 자료음식 메뉴판, 음악 플레이리스트-즐겨찾기한 음악 등STRUCT = 서로 다른 타입의 여러 값을 하나의 컬럼에 저장할 수 있는 자료주소록, 영화 정보 등✅ ARRAY, STRUCT 연습문제 1# 연습문제 1번. # array_exercises 테이블에서 각 영화(title) 별로 장르(genres)를 UNNEST해서 보여주세요. SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) as genre✅ ARRAY, STRUCT 연습문제 2# 연습문제 2번. # array_exercises 테이블에서 각 영화(title) 별로 배우(actor)와 배역(character)을 보여주세요. # 배우와 배역은 별도의 컬럼으로 나와야합니다. SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) as actor✅ ARRAY, STRUCT 연습문제 3# 연습문제 3번. # array_exercises 테이블에서 각 영화(title) 별로 배우(actor), 배역(character), 장르(genres)를 출력. # 한 Row에 배우, 배역, 장르가 모두 표시되어야합니다. SELECT title, actor.actor as actor, actor.character as character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) as genre CROSS JOIN UNNEST(actors) as actor ORDER BY 1, 2 desc✅ ARRAY, STRUCT 연습문제 4# 연습문제 4번. # app_logs 데이터의 배열을 풀어라. # 데이터 탐색, group by 활용, 하루 사용자 집계, 어떤 이벤트가 있는지 등. WITH base AS ( SELECT event_date, event_name, evnent_parm.key AS key, evnent_parm.value.string_value AS string_value, evnent_parm.value.int_value AS int_value, user_id FROM advanced.app_logs, UNNEST(event_params) AS evnent_parm WHERE event_date = '2022-08-01') SELECT event_date, event_name, count(distinct user_id) as cnt FROM base GROUP BY ALL order by cnt desc click_banner, cart, food, food_category, login, payment, recommend_extra_food, recommend_food, restaurant, restaruant_nearby, search / request_search, screen_view, view_recommend_extra_food → 총 14가지의 event_name.기간 내 전체 사용자(중복 제거) → 총 49,678명, click_payment 를 실행한 이용자 총 11,467명.2022-08-01 하루 사용자 129명. 8월 한달 사용자 6,424명. 8월 한달 click_payment 810명.→ event_name을 screen_view로 설정했지만, 기간 내 전체 사용자, 0801 하루 사용자, 8월 한달 사용자의 수는 같았다(user_id 사용). screen_view는 앱에 들어오면 바로 체크가 되는 기본 단계임을 알 수 있음. 앱에 들어오고 바로 다음이 중요하다고 생각.8월 한달 간의 ‘screen_view’ 이벤트의 분포가 월초 → 월말로 증가하는 형태가 나타남. screen_view 이벤트가 월말이 될 수록 커짐. 다른 달도 확인해봐야함. → 스프레드 시트로 월별로 확인해보면 좋을듯??✅ Pivot 연습문제 1# 데이터 PIVOT 연습문제 1번. # orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT 해주세요. # 날짜(order_rate)를 행으로, user_id를 열으로 만들어야 합니다. # 기대하는 OUTPUT : order_date | user_1 | user_2 | user_3 SELECT order_date, SUM(IF(user_id = 1, amount, 0 )) AS user_1, SUM(IF(user_id = 2, amount, 0 )) AS user_2, SUM(IF(user_id = 3, amount, 0 )) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date✅ Pivot 연습문제 2# 데이터 PIVOT 연습문제 2번. # orders 테이블에서 날짜별로 유저들의 주문 금액의 합계를 PIVOT 해주세요. # user_id 를 행으로, order_date를 열으로 만들어야합니다. # 기대하는 OUTPUT : user_id | order_date | -- `(백틱, 숫자 1 왼쪽)을 쓰면 영어 이외에도 한글도 쓸 수 있다는 점!! SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0 )) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0 )) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0 )) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0 )) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0 )) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id✅ Pivot 연습문제 3# 데이터 PIVOT 연습문제 3번. # orders 테이블에서 사용자별, 날짜별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. # user_id를 행으로, order_date를 열로 만들고 주문을 많이 해도 1로 처리 합니다. -- AMOUNT 대신에 1로 표시!! SELECT user_id, MAX(IF(order_date = '2023-05-01',1,0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02',1,0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03',1,0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04',1,0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05',1,0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id✅ Pivot 연습문제 4# 앱 로그 데이터 PIVOT 하기 # user_id = 32888 카트 추가하기(click_cart)를 누를 때 어떤 음식(food_id)을 담았나요? WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(event_param.key = 'firebase_screen',event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = 'food_id',event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = 'session_id',event_param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param GROUP BY ALL ) SELECT user_id, event_date, COUNT(user_id) AS user_cnt, food_id FROM base WHERE user_id = 32888 and event_name = 'click_cart' GROUP BY ALL✅ 퍼널 분석 (Funnel) 연습문제# screen_view : welcome, home, food_category, cart, click_payment + step_number # 데이터의 기간 : 2022-08-01 ~ 2022-08-18 # 사용할 테이블 : 앱 로그 데이터, GA/firebase 데이터 -> UNNEST -> PIVOT # 기본이 되는 데이터프레임 만듬(base), 피벗으로 ARRAY 데이터 풀어줌. WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, event_param, MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen, # food_id는 null 모두 null 값으로 필요없어서 주석처리함. -- MAX(IF(event_param.key = 'food_id', event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = 'session_id', event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(e vent_params) AS event_param WHERE event_date between '2022-08-01'and '2022-08-18' GROUP BY ALL ), fliter_event_and_concat_event_and_screen AS( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name,'-',firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ('screen_view', 'click_payment') ) # step_number 만들어주는 case when 구문 select event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END as step_number, COUNT(distinct user_pseudo_id) AS cnt FROM fliter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL # having으로 6가지를 제외한 나머지 이벤트는 제외 ORDER BY 1 # 바로 위의 커리를 with문으로 감싸서 'CTE'라는 테이블을 만듬. # 집계한 데이터를 PIVOT. 일자별 각 이벤트 네임의 횟수 확인 가능. SELECT event_date, MAX(IF(event_name_with_screen ='screen_view-welcome',cnt,NULL )) as screen_view_welcome, MAX(IF(event_name_with_screen ='screen_view-home',cnt,NULL )) as screen_view_home, MAX(IF(event_name_with_screen ='screen_view-food_category',cnt,NULL )) as screen_view_food_category, MAX(IF(event_name_with_screen ='screen_view-restaurant',cnt,NULL )) as screen_view_restaurant, MAX(IF(event_name_with_screen ='screen_view-cart',cnt,NULL )) as screen_view_cart, MAX(IF(event_name_with_screen ='click_payment-cart',cnt,NULL )) as click_payment_cart, FROM CTE GROUP BY ALL ORDER BY 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
# ARRAY, STRUCT 연습 문제 1. array_exercises 테이블에서 각 영화(title) 별로 장르(genres)를 UNNEST 해서 보여주세요 SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre# ARRAY, STRUCT 연습 문제 2. array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor# ARRAY, STRUCT 연습 문제 3. array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actor.actor, actor.character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor, UNNEST(genres) AS genre# ARRAY, STRUCT 연습 문제 4. 앱 로그 데이터(app_logs)의 배열을 풀어주세요 SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param# PIVOT 연습 문제 1. orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT 해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어아 합니다 -- SELECT -- order_date, -- MAX(IF(user_id = 1, amount, 0)) AS user_1, -- MAX(IF(user_id = 2, amount, 0)) AS user_2, -- MAX(IF(user_id = 3, amount, 0)) AS user_3 -- FROM advanced.orders -- GROUP BY -- order_date -- ORDER BY -- order_date SELECT order_date, MAX(IF(user_id = 1, sum_of_amount, 0)) AS user_1, MAX(IF(user_id = 2, sum_of_amount, 0)) AS user_2, MAX(IF(user_id = 3, sum_of_amount, 0)) AS user_3 FROM ( SELECT order_date, user_id, MAX(amount) AS sum_of_amount FROM advanced.orders GROUP BY order_date, user_id ) GROUP BY order_date ORDER BY order_date# PIVOT 연습 문제 2. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다 SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id# PIVOT 연습 문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 SELECT user_id, MAX(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM ( SELECT user_id, order_date, CASE WHEN amount IS NULL THEN 0 ELSE 1 END AS amount FROM advanced.orders ) GROUP BY user_id ORDER BY user_id# PIVOT 연습 문제 version 1 -- WITH sub AS ( -- SELECT -- user_id, -- event_date, -- event_name, -- event_timestamp, -- user_pseudo_id, -- event_param.key AS key, -- event_param.value.string_value AS string_value, -- event_param.value.int_value AS int_value -- FROM ( -- SELECT -- * -- FROM advanced.app_logs -- WHERE -- event_date = '2022-08-01') -- CROSS JOIN UNNEST(event_params) AS event_param -- ) -- SELECT -- event_date, -- COUNT(user_id) AS user_cnt -- FROM ( -- SELECT -- user_id, -- event_date, -- event_name, -- event_timestamp, -- user_pseudo_id, -- MAX(IF(sub.key = 'firebase_screen', sub.string_value, NULL)) AS firebase_screen, -- MAX(IF(sub.key = 'food_id', sub.int_value, NULL)) AS food_id, -- MAX(IF(sub.key = 'session_id', sub.string_value, NULL)) AS session_id -- FROM sub -- GROUP BY -- user_id, -- event_date, -- event_name, -- event_timestamp, -- user_pseudo_id -- ) -- WHERE -- event_name = 'click_cart' # AND food_id = 1544 -- GROUP BY -- event_date # PIVOT 연습 문제 version 2 WITH base AS ( SELECT -- * EXCEPT(event_params), # * EXCEPT(컬럼) : 컬럼을 제외하고 모두 다 보여줘 event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE event_date = "2022-08-01" GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = 'click_cart' # AND food_id = 1544 GROUP BY event_date# 퍼널 쿼리 연습 문제 퍼널 별 전환율 WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = 'session_id', event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name,'-',firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ('screen_view', 'click_payment') ), before_pivot AS ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date ) SELECT event_date, MAX(IF(step_number = 1, cnt, NULL)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, NULL)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, NULL)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, NULL)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, NULL)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, NULL)) AS `click_payment-cart`, CAST(MAX(IF(step_number = 2, cnt, NULL)) AS FLOAT64) / CAST(MAX(IF(step_number = 1, cnt, NULL)) AS FLOAT64) AS CVR1, CAST(MAX(IF(step_number = 3, cnt, NULL)) AS FLOAT64) / CAST(MAX(IF(step_number = 2, cnt, NULL)) AS FLOAT64) AS CVR2, CAST(MAX(IF(step_number = 4, cnt, NULL)) AS FLOAT64) / CAST(MAX(IF(step_number = 3, cnt, NULL)) AS FLOAT64) AS CVR3, CAST(MAX(IF(step_number = 5, cnt, NULL)) AS FLOAT64) / CAST(MAX(IF(step_number = 4, cnt, NULL)) AS FLOAT64) AS CVR4, CAST(MAX(IF(step_number = 6, cnt, NULL)) AS FLOAT64) / CAST(MAX(IF(step_number = 5, cnt, NULL)) AS FLOAT64) AS CVR5 FROM before_pivot GROUP BY event_date ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
ARRAY, STRUCT, UNNEST 연습 문제 array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요SELECT title, genres -- 오답: genres 배열을 그대로 선택하려 함. FROM `advanced.array_exercises` AS ae JOIN UNNEST(genres) AS genre; -- 오답: UNNEST와 잘못된 JOIN 방식 사용 SELECT title, -- 각 영화의 제목을 선택합니다. 이를 통해 각 장르가 어떤 영화에 속하는지 확인할 수 있습니다. genre -- UNNEST를 통해 평면화한 개별 장르를 선택합니다. FROM `advanced.array_exercises` AS ae -- advanced.array_exercises 테이블에서 데이터를 가져옵니다. CROSS JOIN UNNEST(genres) AS genre; -- genres 배열을 평면화하여 각 장르를 개별 행으로 만듭니다.genres라는 배열을 그대로 선택하려 했습니다.genres는 ARRAY 자료형이기 때문에 이 상태로는 각 장르를 개별적으로 볼 수 없습니다.UNNEST를 사용할 때 단순 JOIN을 사용했습니다. UNNEST를 통해 배열을 개별적인 행으로 변환할 때는 CROSS JOIN을 사용해야 합니다. 2. array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다.SELECT title, actors.actor, -- 잘못된 접근 방식: 'actors'는 배열 형태이므로 직접 필드에 접근할 수 없음. actors.character -- 잘못된 접근 방식: 'actors'의 필드에 바로 접근하려고 하였음. FROM `advanced.array_exercises` AS ae CROSS JOIN UNNEST(actors) AS actor; -- 배열을 평면화하기 위해 UNNEST 사용, 하지만 이후 필드 접근 방식에서 실수가 있었음.SELECT title, -- 각 영화의 제목을 선택합니다. actor.actor, -- 배우의 이름을 선택합니다. UNNEST를 통해 평면화한 결과입니다. actor.character -- 배우가 맡은 배역을 선택합니다. UNNEST를 통해 평면화한 결과입니다. FROM `advanced.array_exercises` AS ae -- advanced.array_exercises 테이블에서 데이터를 가져옵니다. CROSS JOIN UNNEST(actors) AS actor; -- actors 배열을 평면화하여 각 배우와 배역 정보를 개별 행으로 만듭니다.actors 배열의 각 필드 (actor, character)에 접근하기 위해 단순히 배열을 호출했습니다. 그러나 배열은 UNNEST를 통해 각 요소를 개별 행으로 확장하지 않으면 필드에 접근할 수 없습니다.actors는 구조체(STRUCT)로 배열 안에 존재하기 때문에, 각 필드에 접근하려면 UNNEST 이후 명확한 필드명을 사용해야 했습니다.actor.actor: UNNEST된 배열의 actor 필드에 접근하여 배우의 이름을 가져옵니다.actor.character: UNNEST된 배열의 character 필드에 접근하여 각 배우의 배역 정보를 가져옵니다. 3. array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title, -- 각 영화의 제목을 선택합니다. 영화별 배우와 배역, 그리고 장르 정보를 확인할 수 있습니다. actor.actor AS actor, -- 배우의 이름을 선택합니다. UNNEST를 통해 평면화한 결과로 각 영화마다 배우의 이름을 가져옵니다. actor.character AS character, -- 배우가 맡은 배역을 선택합니다. UNNEST를 통해 평면화한 결과로 각 영화마다 배역 정보를 제공합니다. genre -- 영화의 장르를 선택합니다. UNNEST를 통해 평면화한 결과로 각 영화마다 해당하는 장르를 표시합니다. FROM `advanced.array_exercises` AS ae -- advanced.array_exercises 테이블에서 데이터를 가져옵니다. CROSS JOIN UNNEST(actors) AS actor -- actors 배열을 평면화하여 각 배우와 배역 정보를 개별 행으로 만듭니다. CROSS JOIN UNNEST(genres) AS genre -- genres 배열을 평면화하여 각 장르를 개별 행으로 만듭니다. WHERE actor.actor = "Chris Evans" -- 특정 배우의 이름을 기준으로 필터링합니다. 이 조건을 통해 Chris Evans가 출연한 영화만 선택합니다. AND genre = "Action" -- 특정 장르를 기준으로 필터링합니다. Action 장르의 영화만 선택합니다.UNNEST(actors) 및 UNNEST(genres): 각각의 배열을 평면화하여 필요한 정보에 접근하기 쉽게 만들어 줍니다. 평면화된 배열의 데이터를 각 필드별로 나누어 분석하고 이해하기가 쉬워집니다.AS actor, AS genre: 평면화된 결과에 별칭을 부여하여 이후 쿼리에서 이를 명확하게 사용할 수 있도록 합니다. 별칭을 사용함으로써 코드의 가독성이 향상되고, 각 필드를 참조할 때 혼동을 줄일 수 있습니다.앱 로그 데이터(app_logs)의 배열을 풀어주세요.SELECT user_id, event_date, event_name, user_pseudo_id, ae.key, ae.value.string_value, ae.value.int_value FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS aeWITH base AS ( SELECT event_date, event_timestamp, event_name, event_param.key AS key, -- event_params 배열의 key 값을 가져옵니다. 각 이벤트의 파라미터 키를 명확히 확인할 수 있도록 합니다. event_param.value AS value, -- event_params 배열의 value 구조체를 가져옵니다. value에는 여러 필드가 포함될 수 있으므로 전체 구조체를 가져옵니다. event_param.value.string_value AS string_value, -- value 구조체 내의 string_value 필드에 접근합니다. 문자열 형태의 파라미터 값을 직접 추출합니다. event_param.value.int_value AS int_value, -- value 구조체 내의 int_value 필드에 접근합니다. 정수 형태의 값을 명확히 추출하여 분석합니다. user_id -- 이벤트를 발생시킨 사용자 ID입니다. 사용자별 이벤트 참여를 분석하기 위해 필요합니다. FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param -- event_params 배열을 평면화하여 각 파라미터를 독립된 행으로 만듭니다. WHERE event_date = "2022-08-01" -- 특정 날짜의 데이터를 선택합니다. 이 날짜에 발생한 이벤트만 필터링하여 분석합니다. ) SELECT event_date, -- 이벤트가 발생한 날짜를 선택합니다. 이를 통해 특정 날짜에 발생한 이벤트 정보를 확인할 수 있습니다. event_name, -- 이벤트의 이름을 선택합니다. 이벤트 종류별로 데이터를 구분하여 분석할 수 있도록 합니다. COUNT(DISTINCT user_id) AS cnt -- 중복되지 않는 사용자 수를 집계하여 이벤트별 사용자 수를 계산합니다. 각 이벤트에 참여한 고유 사용자의 수를 계산함으로써 이벤트의 인기도를 파악할 수 있습니다. FROM base GROUP BY event_date, event_name -- 날짜와 이벤트 이름으로 그룹화하여 집계합니다. 이를 통해 각 날짜의 이벤트별 참여도를 명확히 알 수 있습니다. ORDER BY cnt DESC; -- 사용자 수를 기준으로 내림차순 정렬하여 가장 많은 사용자가 참여한 이벤트를 먼저 보여줍니다. 이를 통해 인기 이벤트를 한눈에 파악할 수 있습니다. PIVOT 연습 문제 orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, -- user_id가 1인 유저의 금액 합계를 해당 열로 표현합니다. SUM(IF(user_id = 2, amount, 0)) AS user_2, -- user_id가 2인 유저의 금액 합계를 해당 열로 표현합니다. SUM(IF(user_id = 3, amount, 0)) AS user_3 -- user_id가 3인 유저의 금액 합계를 해당 열로 표현합니다. FROM advanced.orders GROUP BY order_date -- 각 날짜(order_date)별로 그룹화하여 유저별 합계를 구합니다. ORDER BY order_date; -- 결과를 날짜 순으로 정렬합니다. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다.SELECT user_id, SUM(IF(order_date = DATE '2023-05-01', amount, 0)) AS `2023-05-01`, -- 2023-05-01에 해당하는 주문 금액 합계 SUM(IF(order_date = DATE '2023-05-02', amount, 0)) AS `2023-05-02`, -- 2023-05-02에 해당하는 주문 금액 합계 SUM(IF(order_date = DATE '2023-05-03', amount, 0)) AS `2023-05-03`, -- 2023-05-03에 해당하는 주문 금액 합계 SUM(IF(order_date = DATE '2023-05-04', amount, 0)) AS `2023-05-04`, -- 2023-05-04에 해당하는 주문 금액 합계 SUM(IF(order_date = DATE '2023-05-05', amount, 0)) AS `2023-05-05` -- 2023-05-05에 해당하는 주문 금액 합계 FROM advanced.orders GROUP BY user_id -- 각 유저별로 그룹화하여 날짜별 주문 금액을 피벗 형태로 만듭니다. ORDER BY user_id; -- 결과를 user_id 순으로 정렬합니다.서브쿼리를 사용하는 대신, orders 테이블에서 직접 필요한 데이터를 집계하는 방식으로 간소화합니다. 3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다.SELECT user_id, -- 각 사용자 ID를 행으로 표시합니다. MAX(IF(order_date = DATE '2023-05-01', 1, 0)) AS `2023-05-01`, -- 2023-05-01에 주문이 있으면 1, 없으면 0을 반환합니다. MAX(IF(order_date = DATE '2023-05-02', 1, 0)) AS `2023-05-02`, -- 2023-05-02에 주문이 있으면 1, 없으면 0을 반환합니다. MAX(IF(order_date = DATE '2023-05-03', 1, 0)) AS `2023-05-03`, -- 2023-05-03에 주문이 있으면 1, 없으면 0을 반환합니다. MAX(IF(order_date = DATE '2023-05-04', 1, 0)) AS `2023-05-04`, -- 2023-05-04에 주문이 있으면 1, 없으면 0을 반환합니다. MAX(IF(order_date = DATE '2023-05-05', 1, 0)) AS `2023-05-05` -- 2023-05-05에 주문이 있으면 1, 없으면 0을 반환합니다. FROM `advanced.orders` GROUP BY user_id -- 각 사용자별로 그룹화합니다. ORDER BY user_id; -- 사용자 ID 순으로 결과를 정렬하여 표시합니다.amount를 이용하여 MAX()를 사용한 부분에서 금액의 크기를 반환하게 됩니다. 하지만 이 문제에서는 금액이 아닌 주문 여부를 판별해야 합니다. 따라서 amount 대신 주문 여부를 확인하는 1이나 0을 사용해야 합니다.반환 값으로 amount를 사용해 최대값을 찾는 것은 주문이 여러 번 발생했을 때 의미가 없습니다. 우리는 단순히 주문이 있었는지 여부만을 확인하고자 하므로, 1 또는 0을 반환하는 것이 필요합니다. 주문이 한 번이라도 있었다면 1을, 없었다면 0을 반환하는 형태가 되어야 합니다.퍼널쿼리 연습문제각 퍼널의 유저 수를 집계데이터 : 2022-08-01 ~ 2022-08-18 -- Step 1: 초기 데이터 추출 - 기본 테이블(base)을 생성합니다. WITH base AS ( SELECT event_timestamp, event_date, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN "2022-01-01" AND "2022-12-31" GROUP BY event_timestamp, event_date, event_name, user_id, user_pseudo_id, platform -- 필요한 필드만 명시적으로 그룹화합니다. ), -- Step 2: 이벤트 이름과 화면 이름을 결합하여 새로운 형태로 가공합니다. filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), -- 특정 필드를 제외하고 나머지를 선택 CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, -- 이벤트와 화면 이름을 결합하여 새로운 열 생성 DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime -- 타임스탬프를 서울 시간대로 변환 FROM base WHERE event_name IN ("screen_view", "click_payment") -- 필요한 이벤트만 명확하게 필터링합니다. 필터링할 이벤트를 명확하게 정의해야 합니다. ) -- Step 3: 퍼널 단계 정의 및 각 단계별 유저 수를 집계합니다. SELECT event_name_with_screen, event_date, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt -- 중복 유저를 제거하기 위해 DISTINCT를 사용합니다. FROM filter_event_and_concat_event_and_screen GROUP BY event_name_with_screen, event_date, step_number -- 필요한 열만 명시적으로 그룹화합니다. HAVING step_number IS NOT NULL ORDER BY event_date;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 1주차 과제
(사진)
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 1주차 과제
1. ARRAY, STRUCT 연습문제문제 1) array_exercise테이블에서 각 영화(title)별로 장르(genres)를 UNNEST 해서 보여주세요SELECT title, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(genres) AS genres ;문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야합니다SELECT title, actors.actor, actors.character FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors ;문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actors.actor, actors.character, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors, UNNEST(genres) genres ;문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM `analystic-project.advanced.app_logs` , UNNEST(event_params) AS pr WHERE event_date = "2022-08-01" LIMIT 1000 ;2. PIVOT 연습문제 풀이문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ;문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id ;3. 퍼널분석문제 1) 각 퍼널의 유저 수를 집계 / 데이터 기준: 2022-08-01 ~ 2022-08-18WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2 ORDER BY 2 ;문제 2) 일자별 퍼널 유저 수 집계 WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ;문제 3) 일자별 퍼널 유저 수 집계형태를 PIVOT형태로 전환하기WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ), daily_funnel_user_count as ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ) SELECT event_date, MAX(IF(step_number = 1, cnt, null)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, null)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, null)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, null)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, null)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, null)) AS `click_payment-cart`, FROM daily_funnel_user_count GROUP BY ALL ORDER BY 1 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY STRUCT, PIVOT, 퍼널 쿼리 연습문제
<목차>이번주차 중요 키워드 ARRAY, STRUCT 연습문제 1~4 추가 인사이트 PIVOT 연습문제 1~4퍼널분석 연습문제 1주차 느낀점 및 하고싶은 말 (솔직후기) 🔐 이번주차 중요 키워드 : ARRAY(배열), STRUCT(구조체),pivot, 퍼널분석 ✅ ARRAY, STRUCT 연습문제 1<INPUT> advanced.array_exercises<OUTPUT> title | genre## 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)을 UNNEST해서 보여주세요. SELECT title, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST (genres) AS genre 📊 추가 인사이트 정리Q. 어느 genre에서 가장 많은 영화가 있을까?A. action에서 가장 많은 영화가 있군, 나도 action 좋아하는데 2024년 베놈(venom)2 보러가야징 ㅎ ✅ ARRAY, STRUCT 연습문제 2<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises<OUTPUT> title | actor | character## 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. SELECT title, actor.actor, actor.character FROM advanced.array_exercises AS ae CROSS JOIN UNNEST (actors) as actor # actors = [STRUCT(STRING,STRING)], array 형태이므로 SAFE_OFFSET으로 직접 데이터에 접근해도 # 되지만, 새로운 컬럼으로 만들어지기 때문에 long format이 아니라 wide format으로 테이블이 만들어진다. actors[SAFE_OFFSET(0)].actor AS first_actor actors[SAFE_OFFSET(1)].character AS second_character ✅ ARRAY, STRUCT 연습문제 3<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises<OUTPUT> title | actor | character | genre### 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. SELECT title, actor.actor as actor, actor.character as character, genre FROM advanced.array_exercises as ae cross join unnest(actors) as actor cross join unnest(genres) as genre ✅ ARRAY, STRUCT 연습문제 4<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises <OUTPUT> user_id | event_date | event_timestamp | event_name | key | string_value | int_value | user_id_1 | user_id_2 | user_id ### 4) 앱 로그 데이터 풀어주세요. # 데이터 - 미리보기 - event_params의 데이터 유형 RECORD = STRUCT WITH app_log AS ( select user_id, event_date, event_timestamp, event_name, event_param.key as key, -- event_param.value as value, event_param.value.string_value as string_value, event_param.value.int_value as int_value -- event_params, from advanced.app_logs cross join unnest(event_params) as event_param where event_date = '2022-08-01' ) <추가 인사이트 코드 추가> select event_date, event_name, COUNT(DISTINCT user_id) as cnt from app_log GROUP BY ALL ORDER BY cnt DESC SELECT event_date, event_name, cnt, LAG(cnt) OVER (ORDER BY cnt DESC) as before_event_cnt, ROUND(1 - (cnt / LAG(cnt) OVER (ORDER BY cnt DESC)),2) as churn_rate FROM ( select event_date, event_name, COUNT(DISTINCT user_id) as cnt from app_log GROUP BY ALL ) ORDER BY cnt DESC 📊 추가 인사이트 정리 Q. 어디 event에서 가장 많이 이탈하고 있는지? 파악해주세요. A. 퍼널단계가 다음 표와 같은 순서이고, 데이터 정합성이 있다고 가정하고, 1- 리텐션율 = 이탈율이라 했을때 churn rate은 아래 3가지 단계에서 이탈율이 가장 높음 → 추후 더 자세히 살펴볼 필요가 있음 (1) click_restaurant_nearby > click_login, (2) screen_view > click_food_category (3) click_restaurant > click_banner ✅ Pivot 연습문제 1<INPUT> advanced.orders<OUTPUT> order_date, user_id_1, user_id_2, user_id_3 -- 1) orders 테이블에서 유저(user_id)별로 주문금액(amount)의 합계를 PIVOT 해주세요. -- 날짜(order_date)를 행(row)으로, user_id를 열(column)으로 만들어야 합니다. SELECT order_date, SUM(IF (user_id = 1, sum_of_amount,0)) as user_id_1, SUM(IF (user_id = 2, sum_of_amount,0)) as user_id_2, SUM(IF (user_id = 3, sum_of_amount,0)) as user_id_3 FROM ( SELECT order_date, user_id, SUM(amount) as sum_of_amount FROM advanced.orders GROUP BY order_date, user_id ) GROUP BY order_date ORDER BY order_date ✅ Pivot 연습문제 2 <INPUT> advanced.orders<OUTPUT> user_id | 2023-05-01 | 2023-05-02 | 2023-05-03 | 2023-05-04 | 2023-05-05-- 2) orders 테이블에서, 날짜(order_date)별로 유저들의 주문금액(amount)의 합계를 pivot 해주세요. user_id를 행(row)으로, order_date를 열(column)으로 만들어야 합니다. with tbl_2 as ( select user_id, order_date, sum(amount) as amount from advanced.orders group by all order by 1,2 ) # 이름 지정 오류 해결법 : column alias 지정할떄, 영어제외하고 날짜형 백틱(`)을 통해 지정할 수 있음 # MAX 대신 ANY_VALUE() 함수는 그룹화 할 대상 중 임의의 값을 선택하는 함수(null 제외) select user_id, MAX(if(order_date = "2023-05-01", amount,0)) as `2023-05-01`, MAX(if(order_date = "2023-05-02", amount,0)) as `2023-05-02`, MAX(if(order_date = "2023-05-03", amount,0)) as `2023-05-03`, MAX(if(order_date = "2023-05-04", amount,0)) as `2023-05-04`, MAX(if(order_date = "2023-05-05", amount,0)) as `2023-05-05` from tbl_2 group by all ✅ Pivot 연습문제 3 <INPUT> advanced.orders<OUTPUT> user_id | 2023-05-01 | 2023-05-02 | 2023-05-03 | 2023-05-04 | 2023-05-05--3 ) order 테이블에서 사용자(user_id)별, 날짜(order_date)별 주문이 있다면 1, 없다면 0으로 pivot 해주세요. -- user_id를 행(row)으로, order_date를 열(column)로 만들고, 주문을 많이 해도 1로 처리합니다. select user_id, MAX(if(order_date = "2023-05-01", 1,0)) as `2023-05-01`, MAX(if(order_date = "2023-05-02", 1,0)) as `2023-05-02`, MAX(if(order_date = "2023-05-03", 1,0)) as `2023-05-03`, MAX(if(order_date = "2023-05-04", 1,0)) as `2023-05-04`, MAX(if(order_date = "2023-05-05", 1,0)) as `2023-05-05` from tbl_2 group by user_id ✅ Pivot 연습문제 4 <INPUT> advanced.app_logs <OUTPUT> event_date | event_timestamp | event_name | user_id |user_pseudo_id | firebase_screen | food_id | session_id--4 ) 앱 로그 데이터 배열 PIVOT 하기 # 참고로 unique key는 user_id + event_timestamp select -- * EXCEPT(event_params) #except(컬럼) : 컬럼을 제외하고 모두 다 보여주는..! event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = "firebase_screen", param.value.string_value, NULL)) as firebase_screen, max(if(param.key = "food_id", param.value.int_value, null)) as food_id, max(if(param.key = "session_id", param.value.string_value,null)) as session_id from advanced.app_logs cross join unnest(event_params) as param where event_date = '2022-08-01' group by all limit 100 ✅ 퍼널 분석 (Funnel) 연습문제 <INPUT> advanced.app_logs <OUTPUT> event_date | event_name_firebase_screen | step_number | user_cnt -- 퍼널 단계 : welcome, home, food_category, restaurant, cart, click_payment -- event_name, firebase_screen 연결되어 해당 퍼널의 값 | step_number | cnt -- 기간 : 2022-08-01 ~ 2022-08-18 -- 사용할 데이터 : advanced.app_logs with unnest_tbl as ( select -- * EXCEPT(event_params) #except(컬럼) : 컬럼을 제외하고 모두 다 보여주는..! event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = "firebase_screen", param.value.string_value, NULL)) as firebase_screen, max(if(param.key = "food_id", param.value.int_value, null)) as food_id, max(if(param.key = "session_id", param.value.string_value,null)) as session_id from advanced.app_logs cross join unnest(event_params) as param where event_date BETWEEN '2022-08-01' AND "2022-08-18" group by all ) , filter as ( select * EXCEPT(event_name, firebase_screen, event_timestamp), concat(event_name, '-', firebase_screen) as event_name_firebase_screen , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from unnest_tbl where event_name IN ("screen_view", "click_payment") ) SELECT event_date, event_name_firebase_screen, CASE WHEN event_name_firebase_screen = "screen_view-welcome" then 1 WHEN event_name_firebase_screen = "screen_view-home" then 2 WHEN event_name_firebase_screen = "screen_view-food_category" then 3 WHEN event_name_firebase_screen = "screen_view-restaurant" then 4 WHEN event_name_firebase_screen = "screen_view-cart" then 5 WHEN event_name_firebase_screen = "click_payment-cart" then 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) as cnt FROM filter group by all having step_number IS NOT NULL order by event_date 👩💻 1주차 느낀점 및 하고싶은 말 (솔직후기) (1) ARRAY, STRUCT - 빅쿼리 사용법은 처음이라 이 문법은 생소했습니다. 학습완료! (2) PIVOT - 태블로로 시각화할떄 long format으로 시각화하려다 보니 쿼리 로딩시간이 초과되어 alert을 받은 적이 있습니다. ㅎㅎ.. 이때 SQL로는 pivot하는 방법을 잘 몰랐는데 학습완료! (3) 퍼널 분석 - 퍼널분석의 큰 흐름은 대충 알고 있었는데 퍼널분석 종류나 집계기준 종류, 해석하는 법 등 퍼널 분석 결과를 어떻게 바라보아야 하는지 정리해주셔서 명쾌했다.쿼리 기초문법은 이론적으로 알고 있었으나 첫 데이터관련 인턴을 하면서, 실무에서 하는 업무는 그보다 더 복잡하고, 비즈니스적으로 어떻게 바라보아야 하고, 해석해야하는지 중요하다는 것을 알게되었습니다. 그러나, 실무에서 또한 의문점이 발생해도 명쾌한 해답을 얻기 어려웠습니다. 이에 데이터 분석가를 희망은 하고 있지만, 어떻게 공부해야 할지에 대한 지속적으로 의문이 들었습니다. 그때 마침, 카일스쿨님의 강의를 우연히 접하게 되었고, 그동안 조각조각이었던 개념들을 SQL로 작성하고, 시각화까지 하는방법들을 모두 보여주시면서 데이터분석에 대한 자신감이 붙을 수 있었던 것 같습니다. 카일스쿨님이 그동안 걸어오신 길이 모두 보이는 강의였습니다. 강의 만들어주셔서 정말 감사합니다. 🙏
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
바짝스터디 1주차 과제
Array, Struct 연습문제-- genre unnest select title, genre from `advanced.array_exercises` cross join UNNEST(genres) as genre; -- actor unnest select title, actor.actor, actor.character from `advanced.array_exercises` cross join UNNEST(actors) as actor; -- actor, genre unnest select title, actor.actor, actor.character, genre from `advanced.array_exercises` cross join UNNEST(actors) as actor cross join UNNEST(genres) as genre;PIVOT 연습문제-- 일자 별 사용자의 이벤트 발생 횟수 select order_date , sum(if(user_id=1, amount, 0)) as user_1, sum(if(user_id=2, amount, 0)) as user_2, sum(if(user_id=3, amount, 0)) as user_3 from `advanced.orders` group by order_date order by order_date; -- 사용자 별 일일 이벤트 발생 횟수 select user_id, sum(if(order_date='2023-05-01', amount, 0)) as `2023-05-01`, sum(if(order_date='2023-05-02', amount, 0)) as `2023-05-02`, sum(if(order_date='2023-05-03', amount, 0)) as `2023-05-03`, sum(if(order_date='2023-05-04', amount, 0)) as `2023-05-04`, sum(if(order_date='2023-05-05', amount, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id; -- 사용자 별 일일 이벤트 발생 여부 select user_id, max(if(order_date='2023-05-01', 1, 0)) as `2023-05-01`, max(if(order_date='2023-05-02', 1, 0)) as `2023-05-02`, max(if(order_date='2023-05-03', 1, 0)) as `2023-05-03`, max(if(order_date='2023-05-04', 1, 0)) as `2023-05-04`, max(if(order_date='2023-05-05', 1, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id;퍼널 연습문제with data_event_name_with_screen_step as ( with data_event_name_with_screen as ( with base as (select event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time, event_name, user_id, user_pseudo_id, max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date between '2022-08-01' and '2022-08-18' group by all ) select * except(event_name, firebase_screen), concat(event_name, '-', firebase_screen) as event_name_with_screen from base ) select COUNT(DISTINCT user_pseudo_id) AS cnt, event_date, event_name_with_screen, (CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END) as step_number from data_event_name_with_screen group by all having step_number is not null order by event_date ) -- 이벤트로 pivot select event_date, sum(if(event_name_with_screen='screen_view-welcome', cnt, 0)) as `screen_view-welcome`, sum(if(event_name_with_screen='screen_view-home', cnt, 0)) as `screen_view-home`, sum(if(event_name_with_screen='screen_view-food_category', cnt, 0)) as `screen_view-food_category`, sum(if(event_name_with_screen='screen_view-restaurant', cnt, 0)) as `screen_view-restaurant`, sum(if(event_name_with_screen='screen_view-cart', cnt, 0)) as `screen_view-cart`, sum(if(event_name_with_screen='click_payment-cart', cnt, 0)) as `click_payment-cart` from data_event_name_with_screen_step group by all order by event_date ; 이전 부터 쿼리로 퍼널 분석하기 좋은 방법이 없을까 생각을 했었는데 방향성을 알게 되어 좋은 것 같습니다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 연습문제문제 1) array_exercise테이블에서 각 영화(title)별로 장르(genres)를 UNNEST 해서 보여주세요SELECT title, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(genres) AS genres ;문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야합니다SELECT title, actors.actor, actors.character FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors ;문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actors.actor, actors.character, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors, UNNEST(genres) genres ;문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM `analystic-project.advanced.app_logs` , UNNEST(event_params) AS pr WHERE event_date = "2022-08-01" LIMIT 1000 ;2. PIVOT 연습문제 풀이문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ;문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id ;3. 퍼널분석문제 1) 각 퍼널의 유저 수를 집계 / 데이터 기준: 2022-08-01 ~ 2022-08-18WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2 ORDER BY 2 ;문제 2) 일자별 퍼널 유저 수 집계 WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ;문제 3) 일자별 퍼널 유저 수 집계형태를 PIVOT형태로 전환하기WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ), daily_funnel_user_count as ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ) SELECT event_date, MAX(IF(step_number = 1, cnt, null)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, null)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, null)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, null)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, null)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, null)) AS `click_payment-cart`, FROM daily_funnel_user_count GROUP BY ALL ORDER BY 1 ;4. 새롭게 배웠던 점쿼리를 활용하여 PIVOT 형태로 데이터를 변환 할 수 있는 스킬을 습득함."GROUP BY ALL" 이라는 방식으로 전체 집계를 할 수 있다는 방법을 처음 알게 됨.`` (백틱) 기호를 사용하면 특수문자 '-'와 한글까지 컬럼명으로 사용할 수 있다는 것을 처음 알게 됨.실무에서 FROM절에 UNNEST() 함수 앞에 ',' 를 왜(Why) 써야되는지에 대한 궁금했는데, CROSS JOIN의 약자로 사용하고 있다는 것을 처음 알게 됨.ARRAY와 STRUCT 구조의 차이점과 해당 데이터타입 구조의 데이터를 추출하기 위한 방법을 명확하게 알게 됨.5. 느낀점사소하지만 스킬적인 부분으로 새롭게 배웠던 부분은 따로 잘 정리해두어 업무에 바로 사용해볼 예정. 또한 데이터 탐색 방법으로 Funnel 사용하는데 있어, Open 퍼널과 Closed 퍼널 방식의 종류를 배울 수 있어 뿌듯했음.