묻고 답해요
141만명의 커뮤니티!! 함께 토론해봐요.
인프런 TOP Writers
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
빠짝스터디 1주차 과제
1. ARRAY, STRUCT 연습문제문제 1) array_exercise테이블에서 각 영화(title)별로 장르(genres)를 UNNEST 해서 보여주세요SELECT title, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(genres) AS genres ;문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야합니다SELECT title, actors.actor, actors.character FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors ;문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actors.actor, actors.character, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors, UNNEST(genres) genres ;문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM `analystic-project.advanced.app_logs` , UNNEST(event_params) AS pr WHERE event_date = "2022-08-01" LIMIT 1000 ;2. PIVOT 연습문제 풀이문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ;문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id ;3. 퍼널분석문제 1) 각 퍼널의 유저 수를 집계 / 데이터 기준: 2022-08-01 ~ 2022-08-18WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2 ORDER BY 2 ;문제 2) 일자별 퍼널 유저 수 집계 WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ;문제 3) 일자별 퍼널 유저 수 집계형태를 PIVOT형태로 전환하기WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ), daily_funnel_user_count as ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ) SELECT event_date, MAX(IF(step_number = 1, cnt, null)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, null)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, null)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, null)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, null)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, null)) AS `click_payment-cart`, FROM daily_funnel_user_count GROUP BY ALL ORDER BY 1 ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY STRUCT, PIVOT, 퍼널 쿼리 연습문제
<목차>이번주차 중요 키워드 ARRAY, STRUCT 연습문제 1~4 추가 인사이트 PIVOT 연습문제 1~4퍼널분석 연습문제 1주차 느낀점 및 하고싶은 말 (솔직후기) 🔐 이번주차 중요 키워드 : ARRAY(배열), STRUCT(구조체),pivot, 퍼널분석 ✅ ARRAY, STRUCT 연습문제 1<INPUT> advanced.array_exercises<OUTPUT> title | genre## 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)을 UNNEST해서 보여주세요. SELECT title, genre FROM advanced.array_exercises AS ae CROSS JOIN UNNEST (genres) AS genre 📊 추가 인사이트 정리Q. 어느 genre에서 가장 많은 영화가 있을까?A. action에서 가장 많은 영화가 있군, 나도 action 좋아하는데 2024년 베놈(venom)2 보러가야징 ㅎ ✅ ARRAY, STRUCT 연습문제 2<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises<OUTPUT> title | actor | character## 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. SELECT title, actor.actor, actor.character FROM advanced.array_exercises AS ae CROSS JOIN UNNEST (actors) as actor # actors = [STRUCT(STRING,STRING)], array 형태이므로 SAFE_OFFSET으로 직접 데이터에 접근해도 # 되지만, 새로운 컬럼으로 만들어지기 때문에 long format이 아니라 wide format으로 테이블이 만들어진다. actors[SAFE_OFFSET(0)].actor AS first_actor actors[SAFE_OFFSET(1)].character AS second_character ✅ ARRAY, STRUCT 연습문제 3<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises<OUTPUT> title | actor | character | genre### 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. SELECT title, actor.actor as actor, actor.character as character, genre FROM advanced.array_exercises as ae cross join unnest(actors) as actor cross join unnest(genres) as genre ✅ ARRAY, STRUCT 연습문제 4<INPUT> 위와 동일한 데이터 셋 advanced.array_exercises <OUTPUT> user_id | event_date | event_timestamp | event_name | key | string_value | int_value | user_id_1 | user_id_2 | user_id ### 4) 앱 로그 데이터 풀어주세요. # 데이터 - 미리보기 - event_params의 데이터 유형 RECORD = STRUCT WITH app_log AS ( select user_id, event_date, event_timestamp, event_name, event_param.key as key, -- event_param.value as value, event_param.value.string_value as string_value, event_param.value.int_value as int_value -- event_params, from advanced.app_logs cross join unnest(event_params) as event_param where event_date = '2022-08-01' ) <추가 인사이트 코드 추가> select event_date, event_name, COUNT(DISTINCT user_id) as cnt from app_log GROUP BY ALL ORDER BY cnt DESC SELECT event_date, event_name, cnt, LAG(cnt) OVER (ORDER BY cnt DESC) as before_event_cnt, ROUND(1 - (cnt / LAG(cnt) OVER (ORDER BY cnt DESC)),2) as churn_rate FROM ( select event_date, event_name, COUNT(DISTINCT user_id) as cnt from app_log GROUP BY ALL ) ORDER BY cnt DESC 📊 추가 인사이트 정리 Q. 어디 event에서 가장 많이 이탈하고 있는지? 파악해주세요. A. 퍼널단계가 다음 표와 같은 순서이고, 데이터 정합성이 있다고 가정하고, 1- 리텐션율 = 이탈율이라 했을때 churn rate은 아래 3가지 단계에서 이탈율이 가장 높음 → 추후 더 자세히 살펴볼 필요가 있음 (1) click_restaurant_nearby > click_login, (2) screen_view > click_food_category (3) click_restaurant > click_banner ✅ Pivot 연습문제 1<INPUT> advanced.orders<OUTPUT> order_date, user_id_1, user_id_2, user_id_3 -- 1) orders 테이블에서 유저(user_id)별로 주문금액(amount)의 합계를 PIVOT 해주세요. -- 날짜(order_date)를 행(row)으로, user_id를 열(column)으로 만들어야 합니다. SELECT order_date, SUM(IF (user_id = 1, sum_of_amount,0)) as user_id_1, SUM(IF (user_id = 2, sum_of_amount,0)) as user_id_2, SUM(IF (user_id = 3, sum_of_amount,0)) as user_id_3 FROM ( SELECT order_date, user_id, SUM(amount) as sum_of_amount FROM advanced.orders GROUP BY order_date, user_id ) GROUP BY order_date ORDER BY order_date ✅ Pivot 연습문제 2 <INPUT> advanced.orders<OUTPUT> user_id | 2023-05-01 | 2023-05-02 | 2023-05-03 | 2023-05-04 | 2023-05-05-- 2) orders 테이블에서, 날짜(order_date)별로 유저들의 주문금액(amount)의 합계를 pivot 해주세요. user_id를 행(row)으로, order_date를 열(column)으로 만들어야 합니다. with tbl_2 as ( select user_id, order_date, sum(amount) as amount from advanced.orders group by all order by 1,2 ) # 이름 지정 오류 해결법 : column alias 지정할떄, 영어제외하고 날짜형 백틱(`)을 통해 지정할 수 있음 # MAX 대신 ANY_VALUE() 함수는 그룹화 할 대상 중 임의의 값을 선택하는 함수(null 제외) select user_id, MAX(if(order_date = "2023-05-01", amount,0)) as `2023-05-01`, MAX(if(order_date = "2023-05-02", amount,0)) as `2023-05-02`, MAX(if(order_date = "2023-05-03", amount,0)) as `2023-05-03`, MAX(if(order_date = "2023-05-04", amount,0)) as `2023-05-04`, MAX(if(order_date = "2023-05-05", amount,0)) as `2023-05-05` from tbl_2 group by all ✅ Pivot 연습문제 3 <INPUT> advanced.orders<OUTPUT> user_id | 2023-05-01 | 2023-05-02 | 2023-05-03 | 2023-05-04 | 2023-05-05--3 ) order 테이블에서 사용자(user_id)별, 날짜(order_date)별 주문이 있다면 1, 없다면 0으로 pivot 해주세요. -- user_id를 행(row)으로, order_date를 열(column)로 만들고, 주문을 많이 해도 1로 처리합니다. select user_id, MAX(if(order_date = "2023-05-01", 1,0)) as `2023-05-01`, MAX(if(order_date = "2023-05-02", 1,0)) as `2023-05-02`, MAX(if(order_date = "2023-05-03", 1,0)) as `2023-05-03`, MAX(if(order_date = "2023-05-04", 1,0)) as `2023-05-04`, MAX(if(order_date = "2023-05-05", 1,0)) as `2023-05-05` from tbl_2 group by user_id ✅ Pivot 연습문제 4 <INPUT> advanced.app_logs <OUTPUT> event_date | event_timestamp | event_name | user_id |user_pseudo_id | firebase_screen | food_id | session_id--4 ) 앱 로그 데이터 배열 PIVOT 하기 # 참고로 unique key는 user_id + event_timestamp select -- * EXCEPT(event_params) #except(컬럼) : 컬럼을 제외하고 모두 다 보여주는..! event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = "firebase_screen", param.value.string_value, NULL)) as firebase_screen, max(if(param.key = "food_id", param.value.int_value, null)) as food_id, max(if(param.key = "session_id", param.value.string_value,null)) as session_id from advanced.app_logs cross join unnest(event_params) as param where event_date = '2022-08-01' group by all limit 100 ✅ 퍼널 분석 (Funnel) 연습문제 <INPUT> advanced.app_logs <OUTPUT> event_date | event_name_firebase_screen | step_number | user_cnt -- 퍼널 단계 : welcome, home, food_category, restaurant, cart, click_payment -- event_name, firebase_screen 연결되어 해당 퍼널의 값 | step_number | cnt -- 기간 : 2022-08-01 ~ 2022-08-18 -- 사용할 데이터 : advanced.app_logs with unnest_tbl as ( select -- * EXCEPT(event_params) #except(컬럼) : 컬럼을 제외하고 모두 다 보여주는..! event_date, event_timestamp, event_name, user_id, user_pseudo_id, max(if(param.key = "firebase_screen", param.value.string_value, NULL)) as firebase_screen, max(if(param.key = "food_id", param.value.int_value, null)) as food_id, max(if(param.key = "session_id", param.value.string_value,null)) as session_id from advanced.app_logs cross join unnest(event_params) as param where event_date BETWEEN '2022-08-01' AND "2022-08-18" group by all ) , filter as ( select * EXCEPT(event_name, firebase_screen, event_timestamp), concat(event_name, '-', firebase_screen) as event_name_firebase_screen , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time from unnest_tbl where event_name IN ("screen_view", "click_payment") ) SELECT event_date, event_name_firebase_screen, CASE WHEN event_name_firebase_screen = "screen_view-welcome" then 1 WHEN event_name_firebase_screen = "screen_view-home" then 2 WHEN event_name_firebase_screen = "screen_view-food_category" then 3 WHEN event_name_firebase_screen = "screen_view-restaurant" then 4 WHEN event_name_firebase_screen = "screen_view-cart" then 5 WHEN event_name_firebase_screen = "click_payment-cart" then 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) as cnt FROM filter group by all having step_number IS NOT NULL order by event_date 👩💻 1주차 느낀점 및 하고싶은 말 (솔직후기) (1) ARRAY, STRUCT - 빅쿼리 사용법은 처음이라 이 문법은 생소했습니다. 학습완료! (2) PIVOT - 태블로로 시각화할떄 long format으로 시각화하려다 보니 쿼리 로딩시간이 초과되어 alert을 받은 적이 있습니다. ㅎㅎ.. 이때 SQL로는 pivot하는 방법을 잘 몰랐는데 학습완료! (3) 퍼널 분석 - 퍼널분석의 큰 흐름은 대충 알고 있었는데 퍼널분석 종류나 집계기준 종류, 해석하는 법 등 퍼널 분석 결과를 어떻게 바라보아야 하는지 정리해주셔서 명쾌했다.쿼리 기초문법은 이론적으로 알고 있었으나 첫 데이터관련 인턴을 하면서, 실무에서 하는 업무는 그보다 더 복잡하고, 비즈니스적으로 어떻게 바라보아야 하고, 해석해야하는지 중요하다는 것을 알게되었습니다. 그러나, 실무에서 또한 의문점이 발생해도 명쾌한 해답을 얻기 어려웠습니다. 이에 데이터 분석가를 희망은 하고 있지만, 어떻게 공부해야 할지에 대한 지속적으로 의문이 들었습니다. 그때 마침, 카일스쿨님의 강의를 우연히 접하게 되었고, 그동안 조각조각이었던 개념들을 SQL로 작성하고, 시각화까지 하는방법들을 모두 보여주시면서 데이터분석에 대한 자신감이 붙을 수 있었던 것 같습니다. 카일스쿨님이 그동안 걸어오신 길이 모두 보이는 강의였습니다. 강의 만들어주셔서 정말 감사합니다. 🙏
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
바짝스터디 1주차 과제
Array, Struct 연습문제-- genre unnest select title, genre from `advanced.array_exercises` cross join UNNEST(genres) as genre; -- actor unnest select title, actor.actor, actor.character from `advanced.array_exercises` cross join UNNEST(actors) as actor; -- actor, genre unnest select title, actor.actor, actor.character, genre from `advanced.array_exercises` cross join UNNEST(actors) as actor cross join UNNEST(genres) as genre;PIVOT 연습문제-- 일자 별 사용자의 이벤트 발생 횟수 select order_date , sum(if(user_id=1, amount, 0)) as user_1, sum(if(user_id=2, amount, 0)) as user_2, sum(if(user_id=3, amount, 0)) as user_3 from `advanced.orders` group by order_date order by order_date; -- 사용자 별 일일 이벤트 발생 횟수 select user_id, sum(if(order_date='2023-05-01', amount, 0)) as `2023-05-01`, sum(if(order_date='2023-05-02', amount, 0)) as `2023-05-02`, sum(if(order_date='2023-05-03', amount, 0)) as `2023-05-03`, sum(if(order_date='2023-05-04', amount, 0)) as `2023-05-04`, sum(if(order_date='2023-05-05', amount, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id; -- 사용자 별 일일 이벤트 발생 여부 select user_id, max(if(order_date='2023-05-01', 1, 0)) as `2023-05-01`, max(if(order_date='2023-05-02', 1, 0)) as `2023-05-02`, max(if(order_date='2023-05-03', 1, 0)) as `2023-05-03`, max(if(order_date='2023-05-04', 1, 0)) as `2023-05-04`, max(if(order_date='2023-05-05', 1, 0)) as `2023-05-05` from `advanced.orders` group by user_id order by user_id;퍼널 연습문제with data_event_name_with_screen_step as ( with data_event_name_with_screen as ( with base as (select event_date, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') as event_time, event_name, user_id, user_pseudo_id, max(if(event_param.key = 'firebase_screen',event_param.value.string_value , null)) as firebase_screen from `advanced.app_logs` cross join unnest(event_params) as event_param where event_date between '2022-08-01' and '2022-08-18' group by all ) select * except(event_name, firebase_screen), concat(event_name, '-', firebase_screen) as event_name_with_screen from base ) select COUNT(DISTINCT user_pseudo_id) AS cnt, event_date, event_name_with_screen, (CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END) as step_number from data_event_name_with_screen group by all having step_number is not null order by event_date ) -- 이벤트로 pivot select event_date, sum(if(event_name_with_screen='screen_view-welcome', cnt, 0)) as `screen_view-welcome`, sum(if(event_name_with_screen='screen_view-home', cnt, 0)) as `screen_view-home`, sum(if(event_name_with_screen='screen_view-food_category', cnt, 0)) as `screen_view-food_category`, sum(if(event_name_with_screen='screen_view-restaurant', cnt, 0)) as `screen_view-restaurant`, sum(if(event_name_with_screen='screen_view-cart', cnt, 0)) as `screen_view-cart`, sum(if(event_name_with_screen='click_payment-cart', cnt, 0)) as `click_payment-cart` from data_event_name_with_screen_step group by all order by event_date ; 이전 부터 쿼리로 퍼널 분석하기 좋은 방법이 없을까 생각을 했었는데 방향성을 알게 되어 좋은 것 같습니다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 연습문제문제 1) array_exercise테이블에서 각 영화(title)별로 장르(genres)를 UNNEST 해서 보여주세요SELECT title, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(genres) AS genres ;문제 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야합니다SELECT title, actors.actor, actors.character FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors ;문제 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르 (genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actors.actor, actors.character, genres FROM `analystic-project.advanced.array_exercises` , UNNEST(actors) AS actors, UNNEST(genres) genres ;문제 4) 앱 로그 데이터(app_logs) 배열 풀기SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM `analystic-project.advanced.app_logs` , UNNEST(event_params) AS pr WHERE event_date = "2022-08-01" LIMIT 1000 ;2. PIVOT 연습문제 풀이문제 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다SELECT order_date, COALESCE(SUM(IF(user_id = 1, amount, null)),0) AS user_1, COALESCE(SUM(IF(user_id = 2, amount, null)),0) AS user_2, COALESCE(SUM(IF(user_id = 3, amount, null)),0) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ;문제 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다SELECT user_id, COALESCE(SUM(IF(order_date = '2023-05-01', amount, null)),0) AS `2023-05-01`, COALESCE(SUM(IF(order_date = '2023-05-02', amount, null)),0) AS `2023-05-02`, COALESCE(SUM(IF(order_date = '2023-05-03', amount, null)),0) AS `2023-05-03`, COALESCE(SUM(IF(order_date = '2023-05-04', amount, null)),0) AS `2023-05-04`, COALESCE(SUM(IF(order_date = '2023-05-05', amount, null)),0) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다SELECT user_id, MAX(IF(order_date = '2023-05-01' AND order_id is not null, 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02' AND order_id is not null, 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03' AND order_id is not null, 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04' AND order_id is not null, 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05' AND order_id is not null, 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY user_id ORDER BY user_id ;문제 4)user_id = 32888이 카트 추가하기(click_cart)를 누를때 어떤 음식(food_id)을 담았나요?WITH app_order_raw AS ( SELECT user_id, event_date, event_name, user_pseudo_id, pr.key, pr.value.string_value, pr.value.int_value FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date = '2022-08-01' ) SELECT user_id, event_date, event_name, user_pseudo_id, MAX(IF(key = 'firebase_screen', string_value, null)) AS firebase_screen, MAX(IF(key = 'food_id', int_value, null)) AS food_id, MAX(IF(key = 'session_id', string_value, null)) AS session_id, FROM app_order_raw GROUP BY user_id, event_date, event_name, user_pseudo_id ;3. 퍼널분석문제 1) 각 퍼널의 유저 수를 집계 / 데이터 기준: 2022-08-01 ~ 2022-08-18WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2 ORDER BY 2 ;문제 2) 일자별 퍼널 유저 수 집계 WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ;문제 3) 일자별 퍼널 유저 수 집계형태를 PIVOT형태로 전환하기WITH funnel_data_raw AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null)) AS screen_name, CONCAT(event_name, '-', MAX(IF(pr.key = 'firebase_screen', pr.value.string_value, null))) AS event_name_with_screen FROM advanced.app_logs, UNNEST(event_params) AS pr WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY 1,2,3,4,5 ), daily_funnel_user_count as ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM funnel_data_raw WHERE event_name IN ('screen_view', 'click_payment') AND screen_name IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') GROUP BY 1,2,3 ORDER BY 1,3 ) SELECT event_date, MAX(IF(step_number = 1, cnt, null)) AS `screen_view-welcome`, MAX(IF(step_number = 2, cnt, null)) AS `screen_view-home`, MAX(IF(step_number = 3, cnt, null)) AS `screen_view-food_category`, MAX(IF(step_number = 4, cnt, null)) AS `screen_view-restaurant`, MAX(IF(step_number = 5, cnt, null)) AS `screen_view-cart`, MAX(IF(step_number = 6, cnt, null)) AS `click_payment-cart`, FROM daily_funnel_user_count GROUP BY ALL ORDER BY 1 ;4. 새롭게 배웠던 점쿼리를 활용하여 PIVOT 형태로 데이터를 변환 할 수 있는 스킬을 습득함."GROUP BY ALL" 이라는 방식으로 전체 집계를 할 수 있다는 방법을 처음 알게 됨.`` (백틱) 기호를 사용하면 특수문자 '-'와 한글까지 컬럼명으로 사용할 수 있다는 것을 처음 알게 됨.실무에서 FROM절에 UNNEST() 함수 앞에 ',' 를 왜(Why) 써야되는지에 대한 궁금했는데, CROSS JOIN의 약자로 사용하고 있다는 것을 처음 알게 됨.ARRAY와 STRUCT 구조의 차이점과 해당 데이터타입 구조의 데이터를 추출하기 위한 방법을 명확하게 알게 됨.5. 느낀점사소하지만 스킬적인 부분으로 새롭게 배웠던 부분은 따로 잘 정리해두어 업무에 바로 사용해볼 예정. 또한 데이터 탐색 방법으로 Funnel 사용하는데 있어, Open 퍼널과 Closed 퍼널 방식의 종류를 배울 수 있어 뿌듯했음.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
다른 문제들은 미리 다 혼자 풀면 해결 가능했는데, 맨 마지막 문제는 혼자 시도해서 답은 맞게 나왔으나 쿼리 접근 방식이 상이했습니다.WITH 구문으로 2~3개의 데이터를 미리 구축해두고 거기서 퍼널 분석을 하시던데, 해당 쿼리의 경우 좀 더 익숙해지면 정말 실무에서도 바로바로 써먹을 수 있겠다는 생각이 들더군요!피벗은 case when으로 작성했었는데, max와 if로 작성하는 방법을 알아가서 좋았습니다.다음주 강의도 열심히 들어보겠습니다! 감사합니다. SELECT title, genreFROM advanced.array_exercisesCROSS JOIN UNNEST(genres) as genre; SELECT title, act.actor as actor, act.character as characterFROM advanced.array_exercisesCROSS JOIN UNNEST(actors) as act SELECT title, act.actor as actor, act.character as character, genreFROM advanced.array_exercisesCROSS JOIN UNNEST(actors) as actCROSS JOIN UNNEST(genres) as genre; WITH base AS ( SELECT user_id, event_date, event_name, user_pseudo_id, events.key, events.value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS events)SELECT event_date, event_name, COUNT(DISTINCT user_id) AS cntFROM baseGROUP BY ALLORDER BY cnt DESC SELECT order_date, MAX(IF(user_id=1, amount, 0)) as user_1, MAX(IF(user_id=2, amount, 0)) as user_2, MAX(IF(user_id=3, amount, 0)) as user_3FROM advanced.ordersGROUP BY order_dateORDER BY order_date; SELECT user_id, SUM(IF(order_date='2023-05-01', amount, 0)) as 2023-05-01, SUM(IF(order_date='2023-05-02', amount, 0)) as 2023-05-02, SUM(IF(order_date='2023-05-03', amount, 0)) as 2023-05-03, SUM(IF(order_date='2023-05-04', amount, 0)) as 2023-05-04, SUM(IF(order_date='2023-05-05', amount, 0)) as 2023-05-05FROM advanced.ordersGROUP BY user_idORDER BY user_id; SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS 2023-05-01, MAX(IF(order_date = '2023-05-02', 1, 0)) AS 2023-05-02, MAX(IF(order_date = '2023-05-03', 1, 0)) AS 2023-05-03, MAX(IF(order_date = '2023-05-04', 1, 0)) AS 2023-05-04, MAX(IF(order_date = '2023-05-05', 1, 0)) AS 2023-05-05,FROM advanced.ordersGROUP BY user_idORDER BY user_id; SELECT user_id , event_date , event_timestamp , event_name , user_pseudo_id , MAX(IF(events.key = 'firebase_screen', events.value.string_value, NULL)) AS firebase_screen , MAX(IF(events.key = 'food_id', events.value.int_value, NULL)) AS food_id , MAX(IF(events.key = 'session_id', events.value.int_value, NULL)) AS session_idFROM advanced.app_logsCROSS JOIN UNNEST(event_params) AS eventsWHERE user_id = 32888 AND event_name = 'click_cart'GROUP BY ALL #정석 방법WITH base AS ( SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , platform , MAX(IF(event_param.key = 'firebase_screen', event_param.value.string_value, NULL)) AS firebase_screen , MAX(IF(event_param.key = 'session_id', event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp) , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime , CONCAT(event_name, '-', firebase_screen) AS event_name_with_screen FROM base WHERE event_name IN ('screen_view', 'click_payment'))SELECT event_name_with_screen , CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS STEP_NUMBER , COUNT(DISTINCT user_pseudo_id) AS cntFROM filter_event_and_concat_event_and_screenGROUP BY ALLHAVING step_number IS NOT NULL #내가 시도한 방법SELECT CONCAT(event_name, param.value.string_value) , COUNT(DISTINCT user_pseudo_id)FROM advanced.app_logsCROSS JOIN UNNEST(event_params) AS paramWHERE event_name IN ('screen_view', 'click_payment')AND param.value.string_value IN ('welcome', 'home', 'food_category', 'restaurant', 'cart')AND event_date BETWEEN '2022-08-01' AND '2022-08-18'GROUP BY event_name, param.value.string_value
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
ARRAY, STRUCT-- array_exercises 테이블 -- 영화(title)별로 장르(genres) UNNEST SELECT movie_id, title, genre FROM `advanced.array_exercises`, CROSS JOIN UNNEST(genres) AS genre -- array_exercises 테이블 -- 영화(title) 별로 배우(actor), 배역(character) UNNEST SELECT title, actors.actor, actors.character FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actors -- array_exercises 테이블 -- 영화(title) 별로 배우(actor), 배역(character), 장르(genre) UNNEST SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre -- 앱 로그 데이터(app_logs) UNNEST SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param PIVOT-- 1.orders 테이블에서 유저(user_id) 별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다. SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY ALL ORDER BY order_date -- 2. orders 테이블에서 날짜(order_date) 별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_Date를 열(Column)으로 만들어야 합니다. SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 4.앱 로그 PIVOT WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" GROUP BY ALL FUNNELwith base as ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, -- event_param max(if(event_param.key = 'firebase_screen', event_param.value.string_value, null)) as firebase_screen, -- max(if(event_param.key = 'food_id', event_param.value.int_value, null)) as food_id, max(if(event_param.key = 'session_id', event_param.value.string_value, null)) as session_id from advanced.app_logs cross join unnest(event_params) as event_param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event_and_concat_event_and_screen as ( select * except(event_name, firebase_screen, event_timestamp), concat(event_name, '-', firebase_screen) as event_name_with_screen, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_datetime from base where event_name in ('screen_view', 'click_payment' ) ) select event_date, event_name_with_screen, case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number, count(distinct user_pseudo_id) as cnt from filter_event_and_concat_event_and_screen group by all having step_number is not null order by 1, 3 SELECT event_date, MAX(IF(event_name_with_screen ="screen_view-welcome", cnt, NULL)) AS `screen_view-welcome`, MAX(IF(event_name_with_screen ="screen_view-home", cnt, NULL)) AS `screen_view-home`, MAX(IF(event_name_with_screen ="screen_view-food_category", cnt, NULL)) AS `screen_view-food_category`, MAX(IF(event_name_with_screen ="screen_view-restaurant", cnt, NULL)) AS `screen_view-restaurant`, MAX(IF(event_name_with_screen ="screen_view-cart", cnt, NULL)) AS `screen_view-cart`, MAX(IF(event_name_with_screen ="click_payment-cart", cnt, NULL)) AS `click_payment-cart` FROM main GROUP BY ALL ORDER BY all
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
ARRAY / STRUCT 연습문제1# ARRAY, STRUCT 연습 문제1 -- genres 배열을 UNNEST하여 펼치고, 각 장르를 개별 행으로 분리하여 영화 제목별 장르를 나타냈습니다. SELECT title, genre FROM `bqmaster.advanced.array_exercises`, UNNEST(genres) AS genre연습문제2# ARRAY, STRUCT 연습 문제1 -- genres 배열을 UNNEST하여 펼치고, 각 장르를 개별 행으로 분리하여 영화 제목별 장르를 나타냈습니다. SELECT title, genre FROM `bqmaster.advanced.array_exercises`, UNNEST(genres) AS genre연습문제3# ARRAY, STRUCT 연습 문제1 -- genres 배열을 UNNEST하여 펼치고, 각 장르를 개별 행으로 분리하여 영화 제목별 장르를 나타냈습니다. SELECT title, genre FROM `bqmaster.advanced.array_exercises`, UNNEST(genres) AS genre연습문제4# ARRAY, STRUCT 연습 문제1 -- genres 배열을 UNNEST하여 펼치고, 각 장르를 개별 행으로 분리하여 영화 제목별 장르를 나타냈습니다. SELECT title, genre FROM `bqmaster.advanced.array_exercises`, UNNEST(genres) AS genre PIVOT 연습문제1# PIVOT 연습 문제1 -- 날짜별로 사용자별 amount 합산을 나타냈습니다. -- IF 문을 사용해 user_id가 1, 2, 3인 경우에만 amount를 집계하고, 나머지는 0으로 처리했습니다. -- 데이터셋 특성상 SUM과 MAX 모두 사용할 수 있습니다. SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3, FROM `bqmaster.advanced.orders` GROUP BY ALL ORDER BY order_date연습문제2# PIVOT 연습 문제2 -- 사용자별로 날짜별 amount 합산을 나타냈습니다. -- IF 문을 사용해 order_date가 지정된 날짜에 해당하는 경우에만 amount를 집계하고, 나머지는 0으로 처리했습니다. -- 데이터셋 특성상 SUM과 MAX 모두 사용할 수 있습니다. SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM `bqmaster.advanced.orders` GROUP BY ALL ORDER BY user_id연습문제3# PIVOT 연습 문제3 -- 사용자별로 날짜별 주문 여부를 나타냈습니다. -- IF 문을 사용해 해당 날짜에 주문이 있는 경우 1, 없는 경우 0으로 표시하고, MAX를 사용해 날짜별로 최대값(1 또는 0)을 반환합니다. -- 데이터셋 특성상 SUM과 MAX 모두 사용할 수 있습니다. SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05`, FROM `bqmaster.advanced.orders` GROUP BY ALL ORDER BY user_id연습문제4# PIVOT 연습 문제4 -- user_id가 32888인 사용자가 click_cart 이벤트 발생 시의 데이터를 나타냈습니다. -- event_params 배열을 UNNEST하여 param으로 분리한 후, firebase_screen, food_id, session_id를 개별 컬럼으로 표시했습니다. WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = 'firebase_screen', param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = 'food_id', param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = 'session_id', param.value.string_value, NULL)) AS session_id FROM `bqmaster.advanced.app_logs`, UNNEST(event_params) AS param GROUP BY ALL ) SELECT * FROM base WHERE 1=1 AND user_id = 32888 AND event_name = 'click_cart' FUNNEL연습문제1# FUNNEL 연습 문제1 -- WITH 구문을 사용해 base 테이블을 생성하여 이벤트 이름과 화면 이름을 포함한 event_name_with_screen 컬럼을 만들었습니다. -- event_date는 2022-08-01에서 2022-08-18 사이로 설정했습니다. WITH base AS ( SELECT event_date, CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen, user_pseudo_id FROM `bqmaster.advanced.app_logs`, UNNEST(event_params) AS param WHERE 1=1 AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND event_date BETWEEN '2022-08-01' AND '2022-08-18' ) # 일자 상관 없이 퍼널의 유저 수를 집계 SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM base GROUP BY ALL HAVING step IS NOT NULL ORDER BY step # 일자별 퍼널의 유저 수를 집계 SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM base GROUP BY ALL HAVING step IS NOT NULL ORDER BY event_date, step연습문제2# FUNNEL 연습 문제2 -- 윈도우 함수를 사용해 이탈률을 계산해 봤습니다. -- LAG을 사용하여 이전 사용자수 값을 가져와 계산에 활용했습니다. WITH base AS ( SELECT event_date, CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen, user_pseudo_id FROM `bqmaster.advanced.app_logs`, UNNEST(event_params) AS param WHERE 1=1 AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND event_date BETWEEN '2022-08-01' AND '2022-08-18' ), add_step AS ( SELECT event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS funnel_step, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM base GROUP BY ALL HAVING funnel_step IS NOT NULL ), add_total_user_cnt AS ( SELECT *, LAG(user_cnt) OVER(ORDER BY funnel_step) AS previous_step_user_cnt FROM add_step ) SELECT *, ROUND((previous_step_user_cnt - user_cnt) / previous_step_user_cnt, 2) AS bounce_rate FROM add_total_user_cnt ORDER BY funnel_step연습문제3# FUNNEL 연습 문제3 -- PIVOT을 사용해 화면별로 사용자 수를 컬럼으로 나열하여 각 화면별 날짜별 사용자 수를 확인할 수 있도록 구성했습니다. WITH base AS ( SELECT event_date, CONCAT(event_name, '-', param.value.string_value) AS event_name_with_screen, user_pseudo_id FROM `bqmaster.advanced.app_logs`, UNNEST(event_params) AS param WHERE 1=1 AND event_name IN ('screen_view', 'click_payment') AND param.key = 'firebase_screen' AND event_date BETWEEN '2022-08-01' AND '2022-08-18' ), add_user_cnt AS ( SELECT event_date, event_name_with_screen, COUNT(DISTINCT user_pseudo_id) AS user_cnt FROM base GROUP BY ALL ) SELECT event_date, MAX(IF(event_name_with_screen = 'screen_view-welcome', user_cnt, NULL)) AS screen_view_welcome, MAX(IF(event_name_with_screen = 'screen_view-home', user_cnt, NULL)) AS screen_view_home, MAX(IF(event_name_with_screen = 'screen_view-food_category', user_cnt, NULL)) AS screen_view_food_category, MAX(IF(event_name_with_screen = 'screen_view-restaurant', user_cnt, NULL)) AS screen_view_restaurant, MAX(IF(event_name_with_screen = 'screen_view-cart', user_cnt, NULL)) AS screen_view_cart, MAX(IF(event_name_with_screen = 'click_payment-cart', user_cnt, NULL)) AS click_payment_cart FROM add_user_cnt GROUP BY ALL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
ARRAY(배열) / STRUCT(구조체)# 1. array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 SELECT title, genre # UNNEST에서 사용한 새로운 컬럼으로 사용 / 기존 ARRAY_COULMN은 사용 X FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre; # 2. array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다 SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor; # 3. array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title, actor.actor, actor.character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre; # 4. 앱 로그 데이터(app_logs)의 배열을 풀어주세요 SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date = "2022-08-01"; PIVOT(피봇): 축을 중심으로 회전한다# 1. orders테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다 SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3, FROM advanced.orders GROUP BY ALL ORDER BY order_date; # 2. orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다 SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY ALL ORDER BY user_id; # 3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY ALL ORDER BY user_id; # 4. app_log를 pivot하기 WITH base AS( SELECT user_id, event_date, event_name, event_timestamp, user_pseudo_id, MAX(if(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(if(event_param.key ="food_id", event_param.value.int_value, NULL)) AS food_id, MAX(if(event_param.key ="session_id", event_param.value.string_value, NULL)) AS session_id FROM avdanced.app_logs CROSS JOIN UNNEST (event_params) AS event_param WHERE event_date ='2022-08-01' GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS cnt FROM base WHERE event_name = 'click_cart' GROUP BY ALL; 퍼널(Funnel): 깔때기 의미 → 넓은 시작점에서 점점 좁하지는 흐름WITH base AS (SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS fire_screen, -- MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), fiter_event_and_concat_event_and_screen AS ( -- event_name + screen (필요한 이벤트만 WHERE 조건에 걸어서 사용) SELECT * EXCEPT(event_name, fire_screen, event_timestamp), CONCAT(event_name, "-", fire_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM base WHERE event_name IN ("screen_view", "click_payment") ) -- step_number + COUNT -- step_number : CASE WHEN을 사용해 숫자 지정 SELECT event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM fiter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL WHERE user_pseudo_id = "1350836585.3421064109" -- 일자별 SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM fiter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[바짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제 / PIVOT 연습 문제 / 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 연습 문제-- 1.array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 SELECT movie_id, title, genre FROM `advanced.array_exercises`, CROSS JOIN UNNEST(genres) AS genre -- 2.array_exercises 테이블에서 각 영화(title) 별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. SELECT title, actors.actor, actors.character FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actors -- 3.array_exercises 테이블에서 각 영화(title) 별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다. SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre -- 4.앱 로그 데이터(app_logs)의 배열을 풀어주세요 SELECT event_date, event_timestamp, event_name, event_param.key AS key, event_param.value.string_value AS string_value, event_param.value.int_value AS int_value, user_id, user_pseudo_id, platform FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param2. PIVOT 연습 문제-- 1.orders 테이블에서 유저(user_id) 별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다. SELECT order_date, SUM(IF(user_id = 1, amount, 0)) AS user_1, SUM(IF(user_id = 2, amount, 0)) AS user_2, SUM(IF(user_id = 3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY ALL ORDER BY order_date -- 2. orders 테이블에서 날짜(order_date) 별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_Date를 열(Column)으로 만들어야 합니다. SELECT user_id, SUM(IF(order_date = "2023-05-01", amount, 0)) AS `2023-05-01`, SUM(IF(order_date = "2023-05-02", amount, 0)) AS `2023-05-02`, SUM(IF(order_date = "2023-05-03", amount, 0)) AS `2023-05-03`, SUM(IF(order_date = "2023-05-04", amount, 0)) AS `2023-05-04`, SUM(IF(order_date = "2023-05-05", amount, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 3. orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) AS `2023-05-05`, FROM advanced.orders GROUP BY ALL ORDER BY user_id -- 4.앱 로그 PIVOT WITH base AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param GROUP BY ALL ) SELECT event_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" GROUP BY ALL 3. 퍼널 쿼리 연습 문제WITH base AS ( select event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id, FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS param WHERE 1=1 AND event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime FROM base WHERE 1=1 AND event_name IN ("screen_view", "click_payment") ), MAIN AS ( SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ) -- 피벗 쿼리 만들기 SELECT event_date, MAX(IF(event_name_with_screen ="screen_view-welcome", cnt, NULL)) AS `screen_view-welcome`, MAX(IF(event_name_with_screen ="screen_view-home", cnt, NULL)) AS `screen_view-home`, MAX(IF(event_name_with_screen ="screen_view-food_category", cnt, NULL)) AS `screen_view-food_category`, MAX(IF(event_name_with_screen ="screen_view-restaurant", cnt, NULL)) AS `screen_view-restaurant`, MAX(IF(event_name_with_screen ="screen_view-cart", cnt, NULL)) AS `screen_view-cart`, MAX(IF(event_name_with_screen ="click_payment-cart", cnt, NULL)) AS `click_payment-cart` FROM main GROUP BY ALL ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[인프런 빅쿼리 빠짝스터디 1주차] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습문제
1. ARRAY, STRUCT 연습 문제연습문제 1SELECT a.title, genre FROM advanced.array_exercises as a CROSS JOIN UNNEST(a.genres) AS genre연습문제 2SELECT a.title, actor.actor, actor.character FROM advanced.array_exercises as a CROSS JOIN UNNEST(a.actors) AS actor연습문제 3SELECT a.title, actor.actor, actor.character, genre FROM advanced.array_exercises as a CROSS JOIN UNNEST(a.actors) AS actor CROSS JOIN UNNEST(a.genres) AS genre연습문제 4SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value, event_timestamp, platform, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param2. PIVOT 연습문제연습문제 1SELECT order_date, SUM(IF(user_id = 1, sum_amount, 0)) as user_1, SUM(IF(user_id = 2, sum_amount, 0)) as user_2, SUM(IF(user_id = 3, sum_amount, 0)) as user_3, FROM ( SELECT order_date, user_id, sum(amount) as sum_amount, FROM advanced.orders GROUP BY user_id, order_date ) GROUP BY order_date ORDER BY order_date연습문제 2SELECT user_id, MAX(IF(order_date = "2023-05-01", sum_amount, 0)) as `2023-05-01`, MAX(IF(order_date = "2023-05-02", sum_amount, 0)) as `2023-05-02`, MAX(IF(order_date = "2023-05-03", sum_amount, 0)) as `2023-05-03`, MAX(IF(order_date = "2023-05-04", sum_amount, 0)) as `2023-05-04`, MAX(IF(order_date = "2023-05-05", sum_amount, 0)) as `2023-05-05` FROM ( SELECT user_id, order_date, SUM(amount) as sum_amount FROM advanced.orders GROUP BY user_id, order_date ORDER BY user_id, order_date ) GROUP BY user_id ORDER BY user_id연습문제 3SELECT user_id, MAX(IF(order_date = "2023-05-01", 1, 0)) as `2023-05-01`, MAX(IF(order_date = "2023-05-02", 1, 0)) as `2023-05-02`, MAX(IF(order_date = "2023-05-03", 1, 0)) as `2023-05-03`, MAX(IF(order_date = "2023-05-04", 1, 0)) as `2023-05-04`, MAX(IF(order_date = "2023-05-05", 1, 0)) as `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id3. 퍼널 쿼리 연습 문제WITH base as (SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) as firebase_screen, MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) as food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) as session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), base2 as ( SELECT *, CONCAT(event_name, "-", firebase_screen) as event_screen FROM base WHERE event_name IN ("screen_view", "click_payment") ) SELECT event_screen, event_date, CASE WHEN event_screen = "screen_view-welcome" THEN 1 WHEN event_screen = "screen_view-home" THEN 2 WHEN event_screen = "screen_view-food_category" THEN 3 WHEN event_screen = "screen_view-restaurant" THEN 4 WHEN event_screen = "screen_view-cart" THEN 5 WHEN event_screen = "click_payment-cart" THEN 6 ELSE NULL END as step_number, COUNT(DISTINCT user_pseudo_id) as cnt FROM base2 GROUP BY ALL HAVING step_number is not NULL ORDER BY event_datePIVOT 적용WITH base as (SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) as firebase_screen, MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) as food_id, MAX(IF(event_param.key = "session_id", event_param.value.string_value, NULL)) as session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" GROUP BY ALL ), base2 as ( SELECT *, CONCAT(event_name, "-", firebase_screen) as event_screen FROM base WHERE event_name IN ("screen_view", "click_payment") ), base3 as ( SELECT event_screen, event_date, CASE WHEN event_screen = "screen_view-welcome" THEN 1 WHEN event_screen = "screen_view-home" THEN 2 WHEN event_screen = "screen_view-food_category" THEN 3 WHEN event_screen = "screen_view-restaurant" THEN 4 WHEN event_screen = "screen_view-cart" THEN 5 WHEN event_screen = "click_payment-cart" THEN 6 ELSE NULL END as step_number, COUNT(DISTINCT user_pseudo_id) as cnt FROM base2 GROUP BY ALL HAVING step_number is not NULL ORDER BY event_date ) SELECT event_date, MAX(IF(base3.event_screen ="screen_view-welcome", cnt, NULL)) AS screen_view_welcome, MAX(IF(base3.event_screen ="screen_view-home", cnt, NULL)) AS screen_vie_home, MAX(IF(base3.event_screen ="screen_view-food_category", cnt, NULL)) AS screen_view_food_category, MAX(IF(base3.event_screen ="screen_view-restaurant", cnt, NULL)) AS screen_view_restaurant, MAX(IF(base3.event_screen ="screen_view-cart", cnt, NULL)) AS screen_view_cart FROM base3 GROUP BY ALL ORDER BY event_date배운점날짜를 사용할 때 백틱(`)을 사용해야 한다는 점PIVOT 관련 쿼리를 처음 짜봤는데, GROUP BY로 먼저 그룹화한 뒤 사용하는 방법으로는 풀겠는데, 바로 PIVOT하는 방법은 아직 익숙하지 않아서 더 연습해야겠다. 구현 과정이 아직 머릿속에서 굴러가지 않는 느낌이었다.처음 쿼리를 짤 때보다 연습하면서 WITH 구문에 익숙해진 것 같다.풀고 나서 해설을 보니, EXCEPT 문을 사용하는 것을 까먹었다. 몇 개만 빼고 select 하는 경우 유용할 듯구문 오류 시 어떤 오류인지 아직 파악이 안되어 있어 강의에서 말씀하신 것처럼 하나하나 정리해놔야겠다.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 분석 연습 문제
연습문제(1) ARRAY, STRUCT-- 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요 SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre -- 2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다 SELECT title, , actor.actor AS actor , actor.character AS character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor -- 3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다 SELECT title , actor.actor AS actor , actor.character AS character , genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor CROSS JOIN UNNEST(genres) AS genre -- 4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요 SELECT user_id , event_date , event_name , user_pseudo_id , params.key AS key , params.value.string_value AS str_value , params.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' (2) PIVOT-- 1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다 WITH step1 AS ( SELECT order_date , user_id , sum(amount) AS sum_of_amount FROM advanced.orders GROUP BY ALL ) SELECT order_date , MAX(IF(user_id = 1, sum_of_amount, 0)) AS user_1 , MAX(IF(user_id = 2, sum_of_amount, 0)) AS user_2 , MAX(IF(user_id = 3, sum_of_amount, 0)) AS user_3 FROM step1 GROUP BY order_date ORDER BY order_date -- 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다 SELECT user_id , SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id -- 3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다 SELECT user_id , MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id -- 4) 앱 로그 데이터 배열 PIVOT하기 SELECT user_id , event_date , event_name , user_pseudo_id , MAX(IF(params.key = 'firebase_screen', params.value.string_value, NULL)) AS firebase_screen , MAX(IF(params.key = 'food_id', params.value.int_value, NULL)) AS food_id , MAX(IF(params.key = 'session_id', params.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date = '2022-08-01' GROUP BY ALL (3) 퍼널 분석WITH step1 AS ( SELECT event_date , event_timestamp , event_name , user_id , user_pseudo_id , MAX(IF(params.key = 'firebase_screen', params.value.string_value, NULL)) AS firebase_screen , MAX(IF(params.key = 'session_id', params.value.string_value, NULL)) AS session_id , platform FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' GROUP BY ALL ), step2 AS ( SELECT * EXCEPT(event_timestamp) , CONCAT(event_name, '-', firebase_screen) AS event_name_with_screen , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM step1 ), step3 AS ( SELECT * , CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number FROM step2 ), step3_1 AS ( -- 1) 각 퍼널별 유저 수 집계 SELECT event_name_with_screen , step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM step3 GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY step_number ) , step3_2 AS ( -- 2) 일자별 각 퍼널별 유저 수 집계 SELECT event_date , event_name_with_screen , step_number , COUNT(DISTINCT user_pseudo_id) AS cnt FROM step3 GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date , step_number ) -- 3) 2) 데이터를 PIVOT SELECT event_date , MAX(IF(event_name_with_screen = 'screen_view-welcome', cnt, NULL)) AS `screen_view-welcome` , MAX(IF(event_name_with_screen = 'screen_view-home', cnt, NULL)) AS `screen_view-home` , MAX(IF(event_name_with_screen = 'screen_view-food_category', cnt, NULL)) AS `screen_view-food_category` , MAX(IF(event_name_with_screen = 'screen_view-restaurant', cnt, NULL)) AS `screen_view-restaurant` , MAX(IF(event_name_with_screen = 'screen_view-cart', cnt, NULL)) AS `screen_view-cart` , MAX(IF(event_name_with_screen = 'click_payment-cart', cnt, NULL)) AS `click_payment-cart` FROM step3_2 GROUP BY event_date ORDER BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 분석 연습 문제
1. ARRAY, STRUCT 연습문제/*1) array_exercise 테이블에서 각 영화(title)별로 장르를 UNNEST 해서 보여주세요*/ select title, genre from `advanced.array_exercises`, unnest(genres) as genre /*2) array_exercise 테이블에서 각 영화별로 배우와 배역을 보여주세요. 배우와 배역은 별도의 칼럼으로 보여주세요*/ select title, actor.actor, actor.character from `advanced.array_exercises`, unnest(actors) as actor /*3) array_exercise 테이블에서 각 영화별로 배우, 배역, 장르를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다.*/ select title, actor.actor, actor.character, genre from `advanced.array_exercises` ,unnest(actors) as actor ,unnest(genres) as genre -- 문제의의도: UNNEST를 2번 연속 사용 가능하다 -- 데이터 중복은 CROSS JOIN으로 인한 어쩔 수 없는 이슈 -- SQL 실행순서 -> FROM -> JOIN -> SELECT -- actors : ARRAY<STRUCT> -> UNNEST -> STRUCT -- genres : ARRAY<STRING> -> UNNEST -> STRING /*4) 앱로그데이터(app_logs)의 배열을 풀어주세요*/ select event_date, datetime(timestamp_micros(event_timestamp),'Asia/Seoul') as event_timestamp, event_name, event_param.key as key, event_param.value.string_value as string_value, event_param.value.int_value as int_value, user_id from `advanced.app_logs` ,unnest(event_params) as event_param -- event_params : ARRAY<STRUCT> -- event_params.value : STRUCT<STRING, INT64> 2. PIVOT 연습문제/*1) orders 테이블에서 유저별로 주문금액의 합계를 PIVOT해주세요. 날짜를 행으로, user_id를 열로 만들어야합니다. /*case when 사용한 풀이*/ select order_date, sum(case when user_id = 1 then amount else 0 end) as user_1, sum(case when user_id = 2 then amount else 0 end) as user_2, sum(case when user_id = 3 then amount else 0 end) as user_3 from `advanced.orders` group by order_date order by order_date /*if 사용한 풀이*/ select order_date, sum(if(user_id = 1,amount,0)) as user_1, sum(if(user_id = 2,amount,0)) as user_2, sum(if(user_id = 3,amount,0)) as user_3 from `advanced.orders` group by order_date order by order_date /*2) orders 테이블에서 날짜별로 유저들의 주문금액의 합계를 PIVOT 해주세요. user_id를 행으로, order_date를 열로 만들어야 합니다.*/ select user_id, sum(if(order_date = '2023-05-01',amount,0)) as `2023-05-01`, sum(if(order_date = '2023-05-02',amount,0)) as `2023-05-02`, sum(if(order_date = '2023-05-03',amount,0)) as `2023-05-03`, sum(if(order_date = '2023-05-04',amount,0)) as `2023-05-04`, sum(if(order_date = '2023-05-05',amount,0)) as `2023-05-05` FROM `advanced.orders` group by user_id order by user_id -- ANY_VALUE: 그룹화할 대상 중 임의의 값(NULL 제외)를 표시한다. -- 나머지 값이 NULL이거나 확정적으로 값이 나올것이라 예상될 때 사용 /*3)orders 테이블에서 사용자별, 날짜별 주문이 있다면 1 없다면 0으로 PIVOT 해주세요. user_id를 행으로 order_date를 칼럼으로 만들고 주문횟수에 상관없이 1로 처리합니다.*/ select user_id, MAX(if(order_date = '2023-05-01' and amount is not null,1,0)) as `2023-05-01`, MAX(if(order_date = '2023-05-02' and amount is not null,1,0)) as `2023-05-02`, MAX(if(order_date = '2023-05-03' and amount is not null,1,0)) as `2023-05-03`, MAX(if(order_date = '2023-05-04' and amount is not null,1,0)) as `2023-05-04`, MAX(if(order_date = '2023-05-05' and amount is not null,1,0)) as `2023-05-05` FROM `advanced.orders` group by user_id order by user_id; /* 횟수를 구해달라고 하는 경우*/ -- MAX를 SUM으로 바꾸면 된다. /*4) APP_LOG PIVOT 실습하기*/ select user_id, event_date, datetime(timestamp_micros(event_timestamp), 'Asia/Seoul') as event_timestamp, event_name, max(if(params.key = 'firebase_screen',params.value.string_value,NULL)) as firebase_screen, max(if(params.key = 'food_id',params.value.int_value,NULL)) as food_id, max(if(params.key = 'session_id',params.value.string_value,NULL)) as session_id from `advanced.app_logs` , unnest(event_params) as params where event_date = '2022-08-01' group by 1,2,3,4 3. 퍼널 분석 연습문제/* 일자별 퍼널별 유저 수 집계 기간: 2022-08-01 ~ 2022-08-18 사용할 event => event_name과 firebase_screen의 값을 concat하여 사용 1. screen_view-welcome 2. screen_view-home 3. screen_view-food_category 4. screen_view-restaurant 5. screen_view-cart 6. click_payment-cart -- screen_view-welcome에서 user_id는 NULL임 -> user_pseudo_id는 NULL이 아님 -- welcome에서 home으로 넘어가며 로그인을 하여서 그럼 -- WHERE: FROM 절에서 바로 필터링을 하고 싶은 조건을 지정 -- HAVING: GROUP BY 후에 나오는 집계 결과에 대한 조건을 지정 */ with base as ( select event_date, datetime(timestamp_micros(event_timestamp),'Asia/Seoul') as event_timestamp, event_name, user_pseudo_id, max(if(params.key = 'firebase_screen',params.value.string_value,NULL)) as firebase_screen, max(if(params.key = 'food_id',params.value.int_value,NULL)) as food_id, max(if(params.key = 'session_id',params.value.string_value,NULL)) as session_id from `advanced.app_logs` , unnest(event_params) as params where event_date between '2022-08-01' and '2022-08-18' group by 1,2,3,4 ) select event_date, event_name_with_screen, case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 end as step_number, count(distinct user_pseudo_id) as cnt from ( select *, case when event_name in ('screen_view','click_payment') then concat(event_name,'-',firebase_screen) else null end as event_name_with_screen from base ) group by 1,2,3 having step_number is not null order by event_date, step_number
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
연습 문제(1) ARRAY, STRUCT 연습문제1-1.--1-1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요. SELECT title , genre FROM advanced.array_exercises , unnest(genres) as genre1-2.--1-2) array_exercises 테이블에서 각 영화(title)별로 배우(actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다. select title , actor.actor , actor.character from advanced.array_exercises , unnest(actors) as actor 1-3.--1-3) array_exercises 테이블에서 각 영화(title)별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 Row에 배우, 배역, 장르가 모두 표시되어야 합니다. select title , actor.actor as actor , actor.character as character , genre from advanced.array_exercises , unnest(actors) as actor , unnest(genres) as genre 1-4.--1-4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요 select user_id , event_date , event_name , user_pseudo_id , param.key as key , param.value.string_value as string_value , param.value.int_value as int_value from advanced.app_logs , unnest(event_params) as param where event_date = '2022-08-01' (2) PIVOT 연습문제2-1.--2-1) orders 테이블에서 유저(user_id)별로 주문 금액(amount)의 합계를 PIVOT해주세요. 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다. select order_date , sum(if(user_id = 1,amount,0)) as user_1 , sum(if(user_id = 2,amount,0)) as user_2 , sum(if(user_id = 3,amount,0)) as user_3 from advanced.orders group by order_date order by order_date2-2.--2-2) orders 테이블에서 날짜(order_date)별로 유저들의 주문 금액(amount)의 합계를 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)으로 만들어야 합니다. select user_id , max(if(order_date = '2023-05-01',amount,0)) as `2023-05-01` , max(if(order_date = '2023-05-02',amount,0)) as `2023-05-02` , max(if(order_date = '2023-05-03',amount,0)) as `2023-05-03` , max(if(order_date = '2023-05-04',amount,0)) as `2023-05-04` , max(if(order_date = '2023-05-05',amount,0)) as `2023-05-05` from advanced.orders group by user_id order by user_id2-3.--2-3) orders 테이블에서 사용자별, 날짜별로 주문이 있따면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행(Row)으로, order_date를 열(Column)로 만들고 주문을 많이 해도 1로 처리합니다. select user_id , max(if(order_date = '2023-05-01',1,0)) as `2023-05-01` , max(if(order_date = '2023-05-02',1,0)) as `2023-05-02` , max(if(order_date = '2023-05-03',1,0)) as `2023-05-03` , max(if(order_date = '2023-05-04',1,0)) as `2023-05-04` , max(if(order_date = '2023-05-05',1,0)) as `2023-05-05` from advanced.orders group by user_id order by user_id2-4.--2-4) 앱 로그 데이터 배열 PIVOT 하기 select user_id , event_date , event_name , user_pseudo_id , max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen , max(if(param.key = 'food_id', param.value.int_value, null)) as food_id , max(if(param.key = 'session_id', param.value.string_value, null)) as session_id from advanced.app_logs , unnest(event_params) as param where event_date = '2022-08-01' group by all (3) 퍼널 연습문제3-1.--3-1) 퍼널 별 유저 수 집계(일자별) with base as ( select event_date , event_name , event_timestamp , user_id , user_pseudo_id , platform , max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen from advanced.app_logs , unnest(event_params) as param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event as ( select * except(event_name, firebase_screen) , concat(event_name, "-", firebase_screen) as event_name_with_screen from base where event_name in ('screen_view', 'click_payment') ) select event_date , event_name_with_screen , case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number , count(distinct user_pseudo_id) as cnt from filter_event group by all having step_number is not null order by event_date3-2.--3-2) 퍼널 별 유저 수 집계 PIVOT with base as ( select event_date , event_name , event_timestamp , user_id , user_pseudo_id , platform , max(if(param.key = 'firebase_screen', param.value.string_value, null)) as firebase_screen from advanced.app_logs , unnest(event_params) as param where event_date between '2022-08-01' and '2022-08-18' group by all ), filter_event as ( select * except(event_name, firebase_screen) , concat(event_name, "-", firebase_screen) as event_name_with_screen from base where event_name in ('screen_view', 'click_payment') ), daily_group as ( select event_date , event_name_with_screen , case when event_name_with_screen = 'screen_view-welcome' then 1 when event_name_with_screen = 'screen_view-home' then 2 when event_name_with_screen = 'screen_view-food_category' then 3 when event_name_with_screen = 'screen_view-restaurant' then 4 when event_name_with_screen = 'screen_view-cart' then 5 when event_name_with_screen = 'click_payment-cart' then 6 else null end as step_number , count(distinct user_pseudo_id) as cnt from filter_event group by all having step_number is not null order by event_date ) select event_date , max(if(event_name_with_screen = 'screen_view-welcome',cnt,null)) as screen_view_welcome , max(if(event_name_with_screen = 'screen_view-home',cnt,null)) as screen_view_home , max(if(event_name_with_screen = 'screen_view-food_category',cnt,null)) as screen_view_food_category , max(if(event_name_with_screen = 'screen_view-restaurant',cnt,null)) as screen_view_restaurant , max(if(event_name_with_screen = 'screen_view-cart',cnt,null)) as screen_view_cart , max(if(event_name_with_screen = 'click_payment-cart',cnt,null)) as click_payment_cart from daily_group group by all order by event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
1. ARRAY, STRUCT 연습 문제1-1)ARRAY로 된 genres 컬럼을 평면화해 genre로 alias를 붙이고, title과 CROSS JOIN하였습니다.SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre 1-2)배우와 배역은 actor컬럼에 ARRAY<STRUCT<actor STRING, character STRING>> 타입으로 저장되어 있습니다. 우선 ARRAY를 평면화하고 .(dot)을 사용해 컬럼을 구분하였습니다.SELECT title , actor.actor , actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor 1-3)처음에는 UNNEST가 익숙하지 않아 단순히 두 데이터의 title을 키 값으로 CTE_ACTOR 테이블에 CTE_GENRE 테이블을 LEFT JOIN하자는 생각이 들었습니다.WITH CTE_ACTOR AS ( SELECT title , actor.actor AS actor , actor.character AS character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor ), CTE_GENRE AS ( SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre ) SELECT A.title , A.actor , A.character , B.genre FROM CTE_ACTOR A JOIN CTE_GENRE B ON A.title = B.title 강의를 듣고는 CROSS JOIN을 두 번 해보았습니다. 배열이 각각의 행으로 풀리니 이를 두 번 실행해 결과를 얻을 수 있었고 쿼리도 훨씬 간단해졌습니다 😀WITH CTE_ACTOR AS ( SELECT title , actor.actor AS actor , actor.character AS character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor ), CTE_GENRE AS ( SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre ) SELECT A.title , A.actor , A.character , B.genre FROM CTE_ACTOR A JOIN CTE_GENRE B ON A.title = B.title 1-4)SELECT user_id , event_date , event_name , user_pseudo_id , event_param.key , event_param.value.string_value , event_param.value.int_value FROM advanced.app_logs , UNNEST(event_params) event_param -- WHERE event_date='2022-08-02' 2. PIVOT 연습문제2-1)order_date를 기준으로 SUM을 사용해 유저 별 주문금액 합계를 구하고, 요구사항에 따라 빈 값은 0으로 채웠습니다.SELECT order_date , SUM(IF(user_id=1, amount, 0)) AS user_1 , SUM(IF(user_id=2, amount, 0)) AS user_2 , SUM(IF(user_id=3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date 2-2)SELECT user_id , SUM(IF(order_date='2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date='2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date='2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date='2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date='2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id 2-3)SELECT user_id , MAX(IF(order_date='2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date='2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date='2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date='2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date='2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id 2-4)SELECT user_id , event_date , event_name , event_timestamp , user_pseudo_id , MAX(IF(event_param.key='firebase_screen', event_param.value.string_value, null)) AS `firebase_screen` , MAX(IF(event_param.key='food_id', event_param.value.int_value, null)) AS `string_value` , MAX(IF(event_param.key='session_id', event_param.value.string_value, null)) `int_value` FROM advanced.app_logs , UNNEST(event_params) event_param WHERE event_date='2022-08-25' AND event_name='click_cart' GROUP BY ALL 3. 퍼널 쿼리 연습 문제처음엔 CASE 함수를 바로 떠올리지 못해 순서를 저장한 테이블을 만들고 JOIN해 결과를 구했습니다.WITH CTE_STEP AS ( SELECT 'screen_view-welcome' `event_name_with_screen`, 1 `step_number` UNION ALL SELECT 'screen_view-home', 2 UNION ALL SELECT 'screen_view-food_category', 3 UNION ALL SELECT 'screen_view-restaurant', 4 UNION ALL SELECT 'screen_view-cart', 5 UNION ALL SELECT 'click_payment-cart', 6 ), CTE_EVENT AS ( SELECT event_date , event_name_with_screen , COUNT(DISTINCT user_pseudo_id) `cnt` FROM ( SELECT event_date , user_pseudo_id , CONCAT(event_name, '-', event_param.value.string_value) `event_name_with_screen` FROM advanced.app_logs , UNNEST(event_params) `event_param` WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') AND event_param.key = 'firebase_screen' AND event_param.value.string_value IN ('welcome', 'home', 'food_category', 'restaurant', 'cart') ) GROUP BY event_date , event_name_with_screen ) SELECT A.event_date , A.event_name_with_screen , B.step_number , A.cnt FROM CTE_EVENT A JOIN CTE_STEP B ON A.event_name_with_screen = B.event_name_with_screen ORDER BY event_date, step_number 이후 강의를 참고해 피벗 테이블을 만들고 CASE문을 사용한 쿼리입니다.WITH base AS ( SELECT event_date , event_timestamp , event_name , user_pseudo_id , MAX(IF(event_param.key='firebase_screen', event_param.value.string_value, null)) `firebase_screen` , MAX(IF(event_param.key='session_id', event_param.value.string_value, null)) `session_id` FROM advanced.app_logs , UNNEST(event_params) `event_param` WHERE event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_name IN ('screen_view', 'click_payment') GROUP BY ALL ), filter_event_and_concat_evnet_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp) , CONCAT(event_name, '-', firebase_screen) `event_name_with_screen` , DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') `event_datetime` FROM base WHERE event_name IN ('screen_view', 'click_payment') ) SELECT event_date , event_name_with_screen , CASE WHEN event_name_with_screen = "screen_view-welcome" THEN 1 WHEN event_name_with_screen = "screen_view-home" THEN 2 WHEN event_name_with_screen = "screen_view-food_category" THEN 3 WHEN event_name_with_screen = "screen_view-restaurant" THEN 4 WHEN event_name_with_screen = "screen_view-cart" THEN 5 WHEN event_name_with_screen = "click_payment-cart" THEN 6 ELSE NULL END AS step_number , COUNT(DISTINCT user_pseudo_id) FROM filter_event_and_concat_evnet_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date 추가로 결과 테이블을 피벗 테이블로 만들기 위한 쿼리입니다.SELECT event_date , MAX(IF(event_name_with_screen='screen_view-welcome', cnt, 0)) `screen_view-welcome` , MAX(IF(event_name_with_screen='screen_view-home', cnt, 0)) `screen_view-home` , MAX(IF(event_name_with_screen='screen_view-food_category', cnt, 0)) `screen_view-food_category` , MAX(IF(event_name_with_screen='screen_view-restaurant', cnt, 0)) `screen_view-restaurant` , MAX(IF(event_name_with_screen='screen_view-cart', cnt, 0)) `screen_view-cart` , MAX(IF(event_name_with_screen='click_payment-cart', cnt, 0)) `click_payment-cart` FROM result GROUP BY event_date ORDER BY event_date ✔ 느낀 점쿼리를 작성하기 전 자료형을 파악하자.요구사항이 헷갈릴 땐 한번에 결과를 내려고 하지말고 단계별로 해결하자. ✔ 새롭게 알게된 점cmd + D를 여러 번 해서 같은 단어를 동시에 바꿀 수 있다.group by ALL을 사용하면 SELECT의 컬럼을 반복해서 적지 않아도 된다.구글 스프레드 시트에서도 예약 시간을 사용하면 갱신되는 데이터를 제공할 수 있다(예약 시간 주의).구글 스프레드 시트로 간단한 시각화를 해볼 수 있다. 퍼널 분석에 대한 부분을 재밌게 수강했습니다. 퍼널 분석은 개념 정도만 알고 있었는데 퍼널을 정의하는 것에서 부터 피벗 테이블을 만들고, 시각화하기까지 실무에서의 큰 흐름을 배운 것 같아요. 팀에서 로그 설계를 진행 중인데 완료되면 간단한 퍼널 분석을 시도해보겠습니다💪🏻
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
ARRAY, STRUCT-- 1) array_exercises 테이블에서 각 영화(title)별로 장르(genres)를 UNNEST해서 보여주세요.select title,genrefrom advanced.array_exercises as aecross join unnest(genres) as genre ; #pivot 연습문제—1) 첫번째 풀이order_date / user_1 / user_2 / user_3--pivot : max(if(조건, true일 때의 값, false일 때의 값)) as new_colum + group by-- max 대신 집계 함수를 사용할 수도 있음 . sum-- false 일 때의 값은 nullselect order_date, sum(if(user_id = 1, sum_of_amount, null)) as user_1, sum(if(user_id = 2, sum_of_amount, null)) as user_2, sum(if(user_id = 3, sum_of_amount, null)) as user_3from(select order_date, user_id, sum(amount) as sum_of_amountfrom advanced.ordersgroup by order_date, user_id)group by order_dateorder by 1 ;—2) 두번째 풀이-- 2번 문제 orders 테이블에서 유저(user_id)별로 주문 금액(AMOUNT)의 합계를 pivot 해주세요.-- 날짜(order_date)를 행(row)으로 user_id를 열 (column)으로 만들어야 합니다-- 컬럼의 이름을 지정할 때 영어 제외하고 backtick('') option + ~select user_id, max(if(order_date = "2023-05-01",amount, 0)) as 2023-05-01, max(if(order_date = "2023-05-02",amount, 0)) as 2023-05-02, max(if(order_date = "2023-05-03",amount, 0)) as 2023-05-03, max(if(order_date = "2023-05-04",amount, 0)) as 2023-05-04, max(if(order_date = "2023-05-05",amount, 0)) as 2023-05-05from advanced.ordersgroup by 1order by 1-- 3번 문제 orders 테이블에서 사용자(user_id)별, 날짜(order_date)별 주문이 있다면 1,-- 없다면 0 으로 pivot 해주세요. user_id를 행(row)으로 order_date를 열 (column)으로 만들어야 합니다-- 컬럼의 이름을 지정할 때 영어 제외하고 backtick('') option + ~select user_id, max(if(order_date = "2023-05-01",1, 0)) as 2023-05-01, max(if(order_date = "2023-05-02",1, 0)) as 2023-05-02, max(if(order_date = "2023-05-03",1, 0)) as 2023-05-03, max(if(order_date = "2023-05-04",1, 0)) as 2023-05-04, max(if(order_date = "2023-05-05",1, 0)) as 2023-05-05from advanced.ordersgroup by 1order by 1
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] + UNNEST, PIVOT
UNNEST1) array_exercises 테이블에서 각 영화(title)별로 장르를(genres) unnest 해서 보여주세요SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre2) array_exercieses 테이블에서 각 영화(title) 별로 배우 (actor)와 배역(character)을 보여주세요. 배우와 배역은 별도의 컬럼으로 나와야 합니다.SELECT title, actor_info.actor, actor_info.character, FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor_info;actors는 하나의 배열이고 그 배열의 각 요소가 STRUCT(구조체)배열 자체에서 바로 .actor나 .character로 접근할 수 없음먼저 배열을 UNNEST로 펼친 후에 펼쳐진 각 STRUCT에서 필드값을 가져와야함actors는 '서류 묶음'(배열)각 서류(STRUCT)에는 '배우 이름'과 '캐릭터 이름'이라는 항목이 있음서류 묶음을 먼저 풀어서(UNNEST) 개별 서류로 만든 다음각 서류에서 원하는 정보를 읽어야 함3) array_exercises 테이블에서 각 영화(title) 별로 배우(actor), 배역(character), 장르(genre)를 출력하세요. 한 row 에 배우, 배역, 장르가 모두 표시되어야 합니다.SELECT title, actor_info.actor, actor_info.character, genre, FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS actor_info CROSS JOIN UNNEST(genres) AS genre;actors 는 배열 인데 구조에 그 밑에 스트링, 스트링 2개 항목이 저장된거고genres 는 배열인데 스트링 한 계층?의 구조가 있는거라 서로 구조가 달라서 둘이 같이 unnest 불가함그래서 cross join, unnest 두 번 써줘야함actors : ARRAY<STRUCT> => UNNEST => STRUCTgenres : ARRAY<STRING> => UNNEST => STRING4) 앱 로그 데이터(app_logs)의 배열을 풀어주세요.SELECT user_id, event_date, event_name, user_pseudo_id, params.key AS key, params.value.string_value AS string_value, params.value.int_value AS int_value, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS params ORDER BY event_date;2022.08.01 부터 나와있길래 그때 날짜부터 정렬인줄 알았더니 where 절에서 2022.08.01 로 필터링 한거였음;;에러 메시지의 의미:"Expected end of input but got keyword UNNEST" → FROM 절 다음에 UNNEST가 바로 나오면 안 되고, JOIN이나 CROSS JOIN이 먼저 나와야 한다는 뜻JOIN이나 CROSS JOIN이 필요한 이유:원본 테이블(app_logs)의 행과 UNNEST로 펼친 배열(event_params)의 요소들을 어떻게 연결할지 명시해야 하기 때문 PIVOT1) orders 테이블에서 유저(user_id)별로 주문금액(amount)의 합계를 PIVOT 해주세요 날짜(order_date)를 행(Row)으로, user_id를 열(Column)으로 만들어야 합니다.SELECT order_date, SUM(IF(user_id=1, amount, 0)) AS user_1, SUM(IF(user_id=2, amount, 0)) AS user_2, SUM(IF(user_id=3, amount, 0)) AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ASC첫번째 풀이는 PIVOT을 하면서 바로 SUM을 한 것다른 풀이는 집계 함수를 사용해서(SUM) 집계한 후에 PIVOT컬럼 별칭(AS)에 작은따옴표를 사용함 (AS '2023-05-01') → 백틱 `` 을 사용해야 함.GROUP BY 절에 직접 ASC/DESC를 사용함 → ORDER BY를 따로 사용해야 함 2) orders 테이블에서 날짜(order_date)별로 유저들의 주문금액(amount)의 합계를 PIVOT 해주세요. user_id를 행으로, order_date를 열으로 만들어야 합니다. SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id;3) orders 테이블에서 사용자(user_id)별, 날짜(order_date)별로 주문이 있다면 1, 없다면 0으로 PIVOT 해주세요. user_id를 행으로, order_date를 열로 만들고 주문을 많이 해도 1로 처리합니다. SELECT user_id, MAX(IF(order_date = "2023-05-01",1,0)) AS `2023-05-01`, MAX(IF(order_date = "2023-05-02",1,0)) AS `2023-05-02`, MAX(IF(order_date = "2023-05-03",1,0)) AS `2023-05-03`, MAX(IF(order_date = "2023-05-04",1,0)) AS `2023-05-04`, MAX(IF(order_date = "2023-05-05",1,0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id주문 횟수별로 구해달라고 했을 땐? → MAX 대신 SUM값이 있으면 1로 넣어달라고 했기 때문에 이걸 SUM 하면 주문 횟수 더한 값이 됨.4) user_id = 32888 이 카트 추가하기 (click_cart)를 누를 때 어떤 음식(food_id)을 담았나요?WITH base AS ( SELECT event_date, event_name, user_pseudo_id, event_timestamp, user_id, MAX(IF(param.key = "firebase_screen", param.value.string_value, NULL)) AS firebase_screen, MAX(IF(param.key = "food_id", param.value.int_value, NULL)) AS food_id, MAX(IF(param.key = "session_id", param.value.string_value, NULL)) AS session_id FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS param WHERE event_date = "2022-08-01" GROUP BY ALL ) SELECT select_date, COUNT(user_id) AS user_cnt FROM base WHERE event_name = "click_cart" GROUP BY event_date
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
ARRAY, STRUCT 연습문제-- 1. array_exercises 테이블에서 title 별로 genres를 UNNEST하기 SELECT title , genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) as genre; -- 2. array_exercises 테이블에서 title 별로 actor, character 추출 -- actor, character는 별도의 컬럼으로 빼기 (struct의 key로써 존재하면 안 됨.) SELECT title , ACTORS.actor AS actor , ACTORS.character AS character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS ACTORS; -- 3. array_exercises 테이블에서 title 별로 actor, character, genre 추출 -- 여러 ARRAY 컬럼을 UNNEST할 경우, 각 컬럼별로 UNNEST한 것을 CROSS JOIN 진행하면 됨. SELECT title , ACTORS.actor AS actor , ACTORS.character AS character , genre FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS ACTORS CROSS JOIN UNNEST(genres) as genre; -- 4. app_logs 테이블(약 73만 건의 로그 데이터)의 ARRAY를 UNNEST 하기 -- event_params 형태? -- ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64>>>[ -- STRUCT('firebase_screen', STRUCT('food_detail', NULL)) -- , ... -- ] SELECT user_id , event_date , event_name , user_pseudo_id , EVENT_PARAMS.key AS key , EVENT_PARAMS.value.string_value AS string_value , EVENT_PARAMS.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS EVENT_PARAMS PIVOT 연습문제-- 1. orders 테이블에서 order_date별 user_id간 amount 합계를 PIVOT하기 SELECT order_date , SUM(IF(user_id=1, amount, 0)) AS user_1 , SUM(IF(user_id=2, amount, 0)) AS user_2 , SUM(IF(user_id=3, amount, 0))AS user_3 FROM advanced.orders GROUP BY order_date ORDER BY order_date ASC; -- 2. orders 테이블에서 user_id별 order_date 간 amount 합계를 PIVOT하기 SELECT user_id , SUM(IF(order_date='2023-05-01', amount, 0)) AS `2023-05-01` , SUM(IF(order_date='2023-05-02', amount, 0)) AS `2023-05-02` , SUM(IF(order_date='2023-05-03', amount, 0)) AS `2023-05-03` , SUM(IF(order_date='2023-05-04', amount, 0)) AS `2023-05-04` , SUM(IF(order_date='2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id ASC; -- 3. orders 테이블에서 user_id별 order_date간 주문존재 여부 PIVOT하기 -- 주문 존재하면 1, 없으면 0; 주문횟수가 아님에 유의 SELECT user_id , MAX(IF(order_date='2023-05-01', 1, 0)) AS `2023-05-01` , MAX(IF(order_date='2023-05-02', 1, 0)) AS `2023-05-02` , MAX(IF(order_date='2023-05-03', 1, 0)) AS `2023-05-03` , MAX(IF(order_date='2023-05-04', 1, 0)) AS `2023-05-04` , MAX(IF(order_date='2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY user_id ORDER BY user_id ASC; -- 4. app_logs 테이블 PIVOT 하기 -- user_id=32888이 카트 추가하기(click_cart)를 누를 때 어떤 음식(food_id)를 담았나요? WITH temp AS( SELECT user_id , event_date , event_name , event_timestamp , user_pseudo_id , EVENT_PARAMS.key AS key , EVENT_PARAMS.value.string_value AS string_value , EVENT_PARAMS.value.int_value AS int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS EVENT_PARAMS ) SELECT user_id , event_date , event_name , event_timestamp , user_pseudo_id , MAX(IF(key='firebase_screen', string_value, null)) AS `firebase_screen` , MAX(IF(key='food_id', int_value, null)) AS `food_id` , MAX(IF(key='session_id', string_value, null)) AS `session_id` FROM temp -- GROUP BY user_id, event_date, event_timestamp, event_name, user_pseudo_id GROUP BY ALL 퍼널 쿼리 연습문제WITH main AS ( SELECT event_date, CONCAT(event_name,'-', event_param.value.string_value) AS event_name_with_screen, CASE WHEN event_name = 'screen_view' AND event_param.value.string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND event_param.value.string_value = 'cart' THEN 6 END AS step_num, COUNT(DISTINCT user_pseudo_id) AS cnt FROM advanced.app_logs CROSS JOIN UNNEST(event_params) AS event_param WHERE 1=1 AND event_date BETWEEN '2022-08-01' AND '2022-08-18' AND event_param.key = 'firebase_screen' AND event_name IN ("screen_view",'click_payment') GROUP BY 1,2,3 HAVING step_num IS NOT NULL ) SELECT event_date, SUM(IF(step_num = 1, cnt, 0)) AS `screen_view-welcome`, SUM(IF(step_num = 2, cnt, 0)) AS `screen_view-home`, SUM(IF(step_num = 3, cnt, 0)) AS `screen_view-food_category`, SUM(IF(step_num = 4, cnt, 0)) AS `screen_view-restaurant`, SUM(IF(step_num = 5, cnt, 0)) AS `screen_view-cart`, SUM(IF(step_num = 6, cnt, 0)) AS `click_payment-cart` FROM main GROUP BY 1 ORDER BY event_date ASC
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
SELECT [1, 2, 3, 4, 5] AS some_numbers ; SELECT ARRAY<INT64>[1, 2, 3, 4, 5] AS some_numbers ; SELECT GENERATE_ARRAY(1, 5, 1) AS some_numbers ; SELECT [SAFE_OFFSET()] ; SELECT (1, 2, 3) AS struct_test ; SELECT STRUCT<hi INT64, hello INT64>(1, 2) AS struct_test ;SELECT a.title, b AS genre FROM workspace.array_exercises AS a JOIN UNNEST(genres) AS b ; SELECT a.title, b.actor, b.character FROM workspace.array_exercises AS a JOIN UNNEST(actors) AS b ; SELECT a.title, b.actor, b.character, c AS genre FROM workspace.array_exercises AS a JOIN UNNEST(actors) AS b JOIN UNNEST(genres) AS c ; SELECT a.user_id, a.event_date, a.event_name, a.user_pseudo_id, b.key, b.value.string_value, b.value.int_value FROM workspace.app_logs AS a JOIN UNNEST(event_params) AS b ; SELECT key, string_value, count(distinct user_pseudo_id) FROM ( SELECT a.user_id, a.event_date, a.event_name, a.user_pseudo_id, b.key, b.value.string_value, b.value.int_value FROM workspace.app_logs AS a JOIN UNNEST(event_params) AS b ) WHERE event_name = 'screen_view' GROUP BY ALL ; SELECT order_date, sum(if(user_id = 1, amount, 0)) as user_1, sum(if(user_id = 2, amount, 0)) as user_2, sum(if(user_id = 3, amount, 0)) as user_3 FROM workspace.orders GROUP BY ALL ORDER BY order_date ; SELECT user_id, sum(if(order_date = '2023-05-01', amount, 0)) `2023-05-01`, sum(if(order_date = '2023-05-02', amount, 0)) `2023-05-02`, sum(if(order_date = '2023-05-03', amount, 0)) `2023-05-03`, sum(if(order_date = '2023-05-04', amount, 0)) `2023-05-04`, sum(if(order_date = '2023-05-05', amount, 0)) `2023-05-05` FROM workspace.orders GROUP BY ALL ORDER BY user_id ; SELECT user_id, if(sum(if(order_date = '2023-05-01', amount, 0)) > 0, 1, 0) `2023-05-01`, if(sum(if(order_date = '2023-05-02', amount, 0)) > 0, 1, 0) `2023-05-02`, if(sum(if(order_date = '2023-05-03', amount, 0)) > 0, 1, 0) `2023-05-03`, if(sum(if(order_date = '2023-05-04', amount, 0)) > 0, 1, 0) `2023-05-04`, if(sum(if(order_date = '2023-05-05', amount, 0)) > 0, 1, 0) `2023-05-05` FROM workspace.orders GROUP BY ALL ORDER BY user_id ; WITH events AS ( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(b.key = "firebase_screen", b.value.string_value, NULL)) AS firebase_screen, MAX(IF(b.key = "session_id", b.value.string_value, NULL)) AS session_id FROM workspace.app_logs AS a JOIN UNNEST(event_params) AS b WHERE event_date >= '2022-08-01' AND event_date < '2022-08-19' GROUP BY ALL ), filter_event_and_concat_event_and_screen AS ( SELECT * EXCEPT(event_name, firebase_screen, event_timestamp), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Asia/Seoul') AS event_datetime FROM events WHERE event_name IN ('screen_view', 'click_payment') ) SELECT event_date, event_name_with_screen, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ;
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
ARRAY, STRUCT 연습문제-- 1)영화별 장르를 unnest해서 보여줘라. SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) AS genre; --2) 영화별 배우와 배역을 보여줘라. 배우와 배역은 별도의 컬럼으로 나와야 한다. SELECT title, ac.actor, ac.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) AS ac; --3. 영화별로 배우, 배역, 장르를 출력해라. SELECT title, actor.actor, actor.character, genre FROM `advanced.array_exercises` CROSS JOIN UNNEST(actors) as actor CROSS JOIN UNNEST(genres) as genre; --4. 앱 로그 데이터의 배열을 풀어라 SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param PIVOT 연습문제-- 1. 유저별 주문금액의 합계를 poviot해라. -- SELECT -- order_date, -- SUM(IF(user_id = 1, amount, 0)) AS user_1, -- SUM(IF(user_id = 2, amount, 0)) AS user_2, -- SUM(IF(user_id = 3, amount, 0)) AS user_3 -- FROM `advanced.orders` -- GROUP BY order_date -- ORDER BY order_date; -- 2. 날짜별 유저들의 주문금액의 합계를 pivot해라. -- SELECT -- user_id, -- SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, -- SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, -- SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, -- SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, -- SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` -- FROM advanced.orders -- GROUP BY user_id -- ORDER BY user_id; -- 3. 사용자별, 날짜별 주문이 있다면 1, 없다면 0으로 pivot해라. -- SELECT -- user_id, -- MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, -- MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, -- MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, -- MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, -- MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` -- FROM advanced.orders -- GROUP BY user_id -- ORDER BY user_id; --4. key값을 column으로 pivot해라 SELECT user_id, event_date, event_name, user_pseudo_id, IF(key = 'food_id', IF(string_value IS NOT NULL, string_value, CAST(int_value AS STRING)), NULL) AS food_id FROM advanced.app_logs_unnest # app_logs 테이블을 unnest한 결과는 자주 쓰일 듯 싶어 app_logs_unnest 테이블을 따로 생성했다. WHERE event_name = 'click_cart' AND event_date = '2022-08-01'; 기억할 것IF 조건문을 사용할 때, TRUE일때의 값과 FALSE일때의 값의 데이터 타입이 같아야 한다. PIVOT 연습문제 4번을 풀 때 food_id를 PIVOT하는 과정에서 IF(key = 'food_id', IF(string_value IS NOT NULL, string_value, int_value AS STRING), NULL)와 같이 표현했었다. KEY값에 따라 string_value와 int_value중 타입에 맞는 하나의 컬럼에만 값이 있었고, 처음에는 이를 명시적으로 적기보다 범용적으로 사용될 수 있도록 하는게 더 좋지 않을까 싶었다. 값이 있는 컬럼의 값을 사용하자는 의도로 IF(string_value IS NOT NULL, string_value, int_value AS STRING)라고 표현했고 다음과 같은 에러를 만났다.No matching signature for function IF for argument types: BOOL, STRING, INT64. Supported signature: IF(BOOL, ANY, ANY) at [55:23]에러를 피하고자 CAST를 이용했는데 데이터 타입을 억지로 바꾸기보다는(추후에 숫자형 데이터로 대소 비교를 한다던지 나열을 한나던지의 상황이 있을 수 있으니..) 명시적으로 표현하는 것을 마냥 피할것은 아니겠다는 생각을 했다.특별한 문자(예약어, 숫자)를 ALIAS로 설정하려면 back tick(`)으로 감싸줘야한다.퍼널 쿼리 연습문제-- 퍼널 별 유저 수 집계(2022-08-01 ~ 2022-08-18) -- welcome -> home -> good category -> restaurant -> cart -> 주문하기 클릭 /* event_data | event_name_with_screen | stemp_number | cnt 의 컬럼 형태로 만들것 */ -- 처음 작성했던 쿼리 -- SELECT -- event_date, -- CASE -- WHEN event_name = 'screen_view' AND string_value = 'welcome' THEN CONCAT(event_name, '-', string_value) -- WHEN event_name = 'screen_view' AND string_value = 'home' THEN CONCAT(event_name, '-', string_value) -- WHEN event_name = 'screen_view' AND string_value = 'food_category' THEN CONCAT(event_name, '-', string_value) -- WHEN event_name = 'screen_view' AND string_value = 'restaurant' THEN CONCAT(event_name, '-', string_value) -- WHEN event_name = 'screen_view' AND string_value = 'cart' THEN CONCAT(event_name, '-', string_value) -- WHEN event_name = 'click_payment' AND string_value = 'cart' THEN CONCAT(event_name, '-', string_value) -- END AS event_name_with_screen, -- CASE -- WHEN event_name = 'screen_view' AND string_value = 'welcome' THEN 1 -- WHEN event_name = 'screen_view' AND string_value = 'home' THEN 2 -- WHEN event_name = 'screen_view' AND string_value = 'food_category' THEN 3 -- WHEN event_name = 'screen_view' AND string_value = 'restaurant' THEN 4 -- WHEN event_name = 'screen_view' AND string_value = 'cart' THEN 5 -- WHEN event_name = 'click_payment' AND string_value = 'cart' THEN 6 -- END AS step_number, -- COUNT(event_date) AS cnt -- FROM `advanced.app_logs_unnest` -- WHERE -- event_date BETWEEN '2022-08-01' AND '2022-08-18' -- GROUP BY ALL -- ORDER BY event_date, step_number -- ============================================= -- 정석 쿼리 WITH base AS( SELECT event_date, event_timestamp, event_name, user_id, user_pseudo_id, platform, MAX(IF(event_param.key = "firebase_screen", event_param.value.string_value, NULL)) AS firebase_screen, MAX(IF(event_param.key = "food_id", event_param.value.int_value, NULL)) AS food_id, MAX(IF(event_param.key = "session_id", event_param.value.int_value, NULL)) AS session_id FROM `advanced.app_logs` CROSS JOIN UNNEST(event_params) AS event_param -- WHERE -- event_date = "2022-08-01" -- event_date BETWEEN "2022-08-01" AND "2022-08-18" -- AND -- event_param.key IN ("screen_view", "click_payment") -- AND -- event_param.value.string_value IN ("welcome", "home", "food_category", "restaurant", "cart") GROUP BY ALL ), filter_event_and_concat_event_and_screen AS( SELECT * EXCEPT(event_name, firebase_screen), CONCAT(event_name, "-", firebase_screen) AS event_name_with_screen, DATETIME(TIMESTAMP_MICROS(event_timestamp), "Asia/Seoul") AS event_datetime FROM base WHERE event_date BETWEEN "2022-08-01" AND "2022-08-18" AND event_name IN ("screen_view", "click_payment") AND firebase_screen IN ("welcome", "home", "food_category", "restaurant", "cart") ) SELECT event_date, event_name_with_screen, -- event_datetime, -- user_pseudo_id, CASE WHEN event_name_with_screen = 'screen_view-welcome' THEN 1 WHEN event_name_with_screen = 'screen_view-home' THEN 2 WHEN event_name_with_screen = 'screen_view-food_category' THEN 3 WHEN event_name_with_screen = 'screen_view-restaurant' THEN 4 WHEN event_name_with_screen = 'screen_view-cart' THEN 5 WHEN event_name_with_screen = 'click_payment-cart' THEN 6 ELSE NULL END AS step_number, COUNT(DISTINCT user_pseudo_id) AS cnt FROM filter_event_and_concat_event_and_screen GROUP BY ALL HAVING step_number IS NOT NULL ORDER BY event_date, step_number기억할 것각 컬럼에 어떤 값이 있는지 잘! 확인하자.event_name_with_screen 부분을 CASE WHEN으로 처리하며 '이게 진정 맞을까......' 싶긴 했었었다. 왜인지 초반에 컬럼값 확인할 때 WHERE 절에 IN 구문으로 확인했을때 원하던 결과로 나오지 않아 CASE WHEN으로 직접 처리했었는데.. 실수였다. (아마 string_value만 IN연산으로 확인하고 섣부른 판단을 했던게 아닐지 싶다.)CASE WHEN 구문에서 전체를 포함하도록 조건을 구성하지 않으면 WHEN에 해당하지 않는 부분은 NULL값으로 생성된다.처음 쿼리를 짜고 CNT 컬럼의 값을 확인했을 때 event_name_with_screen과 step_number에 왜 NULL값이 있는지 당황스러웠다. 이미 있는 데이터에 새 컬럼을 만든 것이니 조건에 해당되지 않는 부분은 값이 없어 NULL로 남아있다는 사실...!DATETIME 함수를 이용하여 "Asia/Seoul" 처리하기새까맣게 잊고있었다. 까먹지 말자~!항상 데이터를 뽑아 어떤 내용을 확인하고 싶은 건지를 생각하자.문제에 맞는 쿼리를 짜고 작성된 쿼리의 결과가 잘 나오는지에만 급급했던 거 같다. 이 강의를 듣는 목적은 데이터 분석에 대한 감 잡기!지 쿼리 잘 짜기!는 아니었으니까..!(물론 SQL 실력 향상은 필요한 부분..!!) 스스로 고민해보는 시간을 꼭! 갖도록 해보자.
-
미해결BigQuery(SQL) 활용편(퍼널 분석, 리텐션 분석)
[빠짝스터디 1주차 과제] ARRAY, STRUCT 연습 문제/ PIVOT 연습문제/ 퍼널 쿼리 연습 문제
1) ARRAY, STRUCT 연습문제1-1)SELECT title, genre FROM advanced.array_exercises CROSS JOIN UNNEST (genres) as genre ORDER BY title;1-2)SELECT title, actor.actor, actor.character FROM advanced.array_exercises CROSS JOIN UNNEST(actors) as actor ORDER BY title;1-3)SELECT title, actor.actor, actor.character, genre FROM advanced.array_exercises CROSS JOIN UNNEST(genres) as genre CROSS JOIN UNNEST(actors) as actor ORDER BY 1,2,3;1-4)SELECT user_id, event_date, event_name, user_pseudo_id, event_param.key, event_param.value.string_value, event_param.value.int_value FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date = '2022-08-01';2) PIVOT 연습문제2-1) 1) SELECT order_date, SUM(IF(user_id = 1, amount, 0 )) AS user_1, SUM(IF(user_id = 2, amount, 0 )) AS user_2, SUM(IF(user_id = 3, amount, 0 )) AS user_3 FROM advanced.orders GROUP BY 1 ORDER BY 1;2-2)SELECT user_id, SUM(IF(order_date = '2023-05-01', amount, 0)) AS `2023-05-01`, SUM(IF(order_date = '2023-05-02', amount, 0)) AS `2023-05-02`, SUM(IF(order_date = '2023-05-03', amount, 0)) AS `2023-05-03`, SUM(IF(order_date = '2023-05-04', amount, 0)) AS `2023-05-04`, SUM(IF(order_date = '2023-05-05', amount, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY 1 ORDER BY 1;2-3)SELECT user_id, MAX(IF(order_date = '2023-05-01', 1, 0)) AS `2023-05-01`, MAX(IF(order_date = '2023-05-02', 1, 0)) AS `2023-05-02`, MAX(IF(order_date = '2023-05-03', 1, 0)) AS `2023-05-03`, MAX(IF(order_date = '2023-05-04', 1, 0)) AS `2023-05-04`, MAX(IF(order_date = '2023-05-05', 1, 0)) AS `2023-05-05` FROM advanced.orders GROUP BY 1 ORDER BY 1;2-4)SELECT event_date, event_timestamp, event_name, event_timestamp, user_pseudo_id, MAX(CASE WHEN event_param.key = 'firebase_screen' THEN event_param.value.string_value END) as firebase_screen, MAX(CASE WHEN event_param.key = 'food_id' THEN event_param.value.int_value END) as food_id, MAX(CASE WHEN event_param.key = 'session_id' THEN event_param.value.string_value END) as session_id, FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date = '2022-08-01' GROUP BY ALL 3) 퍼널 쿼리 연습 문제WITH main as ( SELECT event_date, concat(event_name,'-', event_param.value.string_value) AS event_name_with_screen, CASE WHEN event_name = 'screen_view' AND event_param.value.string_value = 'welcome' THEN 1 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'home' THEN 2 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'food_category' THEN 3 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'restaurant' THEN 4 WHEN event_name = 'screen_view' AND event_param.value.string_value = 'cart' THEN 5 WHEN event_name = 'click_payment' AND event_param.value.string_value = 'cart' THEN 6 END AS step_num, count(distinct user_pseudo_id) AS cnt FROM advanced.app_logs CROSS JOIN UNNEST(event_params) as event_param WHERE event_date between '2022-08-01' AND '2022-08-18' AND event_param.key = 'firebase_screen' AND event_name IN ("screen_view",'click_payment') GROUP BY 1,2,3 HAVING step_num IS NOT NULL ) SELECT event_date, SUM(IF(step_num = 1, cnt, 0)) AS `screen_view-welcome`, SUM(IF(step_num = 2, cnt, 0)) AS `screen_view-home`, SUM(IF(step_num = 3, cnt, 0)) AS `screen_view-food_category`, SUM(IF(step_num = 4, cnt, 0)) AS `screen_view-restaurant`, SUM(IF(step_num = 5, cnt, 0)) AS `screen_view-cart`, SUM(IF(step_num = 6, cnt, 0)) AS `click_payment-cart` FROM main GROUP BY 1 ORDER BY event_date