Big Data Processing (Spark) as Told by a Silicon Valley Leader

How is processing big data different from processing data with Pandas? Let's learn about Spark, an essential framework for big data processing.

(5.0) 1 review

30 students

Apache Spark
pyspark
Pandas
Big Data
SQL

This course is designed for beginners.

What you will learn!

  • Spark

  • Big data processing

  • Databricks

  • Spark SQL

  • Data Engineering

Silicon Valley Engineer Explains
Data Pipeline Design Practices

Spark, the standard for large-scale data processing

Pandas is useful for small-scale data processing, but it shows its limitations as the data grows. Spark, on the other hand, is a proven big data processing framework chosen by numerous global companies.

It offers fast processing and flexible scalability, and it supports a wide range of tasks, from real-time data analysis to building machine learning pipelines. This makes it a key tool not only in data engineering but also in the AI field.

Spark is no longer just an option; it is a core technology that must be learned in large-scale data environments.
This course is designed to help those who are new to Spark build, step by step, the performance optimization and analysis design skills needed in practical work.

Why you should take this course

A lecture that provides a proper starting point for learning Spark

Once you understand the context of a technology, everything you learn afterward comes faster and more clearly. This course traces the evolution of data processing systems and the natural transition from Pandas to Spark, so you can build up big data analysis skills step by step.

A course covering Spark performance optimization and practical logic implementation

This course walks you through implementing a variety of scenarios, including user behavior analysis, channel flow analysis, and sales aggregation.
Learn the core strategies for designing and using Spark quickly and flexibly. You will also learn partitioning, shuffling, and join methods, along with advanced features such as Parquet, UDFs, and UDAFs, step by step, so that you develop the performance optimization and complex logic implementation skills required in practical work.
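To give a feel for what partitioning and shuffling mean in a sales aggregation, here is a minimal pure-Python sketch. It is an illustration only and does not use the Spark API: the sample rows, the function names, and the toy hash (a sum of character codes) are all assumptions made for this example. The idea it shows is that rows start out spread across partitions in arbitrary order, a shuffle moves rows with the same key into the same partition, and each partition can then be aggregated independently.

```python
from collections import defaultdict

# Hypothetical sample rows: (channel, sale_amount).
sales = [("web", 120), ("app", 80), ("web", 30), ("store", 200), ("app", 20)]


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into partitions, ignoring their keys."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def shuffle_by_key(partitions, num_partitions):
    """Send every row to the partition chosen by its key, so equal keys meet."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # Toy deterministic hash (an assumption for this sketch).
            target = sum(map(ord, key)) % num_partitions
            shuffled[target].append((key, value))
    return shuffled


def aggregate(partition):
    """Sum amounts per key inside a single partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


input_partitions = split_into_partitions(sales, 2)
shuffled = shuffle_by_key(input_partitions, 2)

# After the shuffle, each key lives in exactly one partition,
# so per-partition results can simply be merged.
sales_by_channel = {}
for partition in shuffled:
    sales_by_channel.update(aggregate(partition))

print(sales_by_channel)
```

The shuffle step is the one that moves data between partitions, which is why it tends to be the expensive part of a distributed aggregation or join.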

Experience Spark in a production environment, beyond your local machine

This course will teach you how to set up and run a Spark cluster using AWS EMR through hands-on practice.
From the latest features of Spark 4.0 to operational workflows in a cloud environment, the course is designed to help you use Spark at a production level, beyond local practice.

Rated 4.9! Highly praised by San Jose State University data science students

This course is part of the Master's program in Data Science at San Jose State University in the United States, where it received a high rating of 4.9 in an actual classroom setting. It is a practical, internationally recognized course that prepares you both to get started with Spark and to move into real-world work.

What this course covers

1⃣ Understand the characteristics and challenges of big data processing through the history of its development
2⃣ Learn more about data processing by comparing Pandas and Spark
3⃣ Understand the structure of Spark programs and learn Spark SQL programming
4⃣ Learn practical Spark programming through various examples on Databricks
5⃣ Use Spark on AWS EMR and get an introduction to Spark 4.0

Recommended for

I'm working on analyzing and processing big data.
Data engineers, analysts, and scientists who must process large amounts of data as part of their work

I want to go beyond Pandas scale.
Pandas users who are curious about the challenges of big data processing

I need to understand big data processing.
Developers who need to understand big data processing, even though it is not their main job

After taking this course

  • Understand how big data processing has evolved and what challenges it faces

  • Learn the architecture of Spark, the standard for big data processing, and the structure of Spark programs written in Python

  • Through examples, understand how Spark differs from Pandas and why SQL is efficient for processing structured data, and learn how to use Spark SQL
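As a small illustration of why SQL suits structured data, the sketch below computes the same per-channel totals twice: once with a hand-written loop and once with a declarative query. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not Spark SQL, and the table name, column names, and sample rows are assumptions made for this example. The point is that the query only states what result is wanted and leaves the how to the engine.

```python
import sqlite3

# Hypothetical sample rows: (channel, sale_amount).
rows = [("web", 120), ("app", 80), ("web", 30), ("store", 200), ("app", 20)]

# Imperative version: we spell out how to build the totals.
loop_totals = {}
for channel, amount in rows:
    loop_totals[channel] = loop_totals.get(channel, 0) + amount

# Declarative version: we describe the result and let the engine plan it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
query = "SELECT channel, SUM(amount) FROM sales GROUP BY channel"
sql_totals = dict(conn.execute(query))
conn.close()

print(loop_totals == sql_totals)
```

A GROUP BY query of this shape is also valid Spark SQL, where the engine is additionally free to decide how to distribute the work across a cluster.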

Insights from a proven Silicon Valley big data expert

Hello. I am Ki-Yong Han, a data expert in Silicon Valley with 30 years of experience. After starting my career at Samsung Electronics, I moved to Silicon Valley at the age of 31 and spent my first 11 years there developing web search at Yahoo, where I first encountered big data processing. Since then, I have built data teams at organizations such as Udemy (listed on NASDAQ in 2021) and Polyvore (acquired by Yahoo in 2015), and have provided data consulting to various Silicon Valley and Korean companies. Drawing on this background and on my experience teaching master's students at San Jose State University, which boasts the highest employment rate in Silicon Valley, I will share the essential skills for data engineers.

Things to note before taking the class

Practice environment

  • We use Spark 3.5 with Python (PySpark). At the end, we briefly introduce Spark 4.x, which is currently in preview.

  • The practice environment is mainly Databricks Community Edition, but we also use Google Colab, the Databricks Free Trial, and AWS EMR.


Learning Materials

Prerequisite Knowledge and Notes

  • Basic Python syntax (especially Pandas)


  • Basic SQL Knowledge (Beginner)


Recommended for these people!

Who is this course right for?

  • Anyone with a general interest in big data processing

  • Someone with Pandas experience who wants to expand into big data processing technologies

  • Someone who needs big data processing for their work

What do you need to know before starting?

  • Experience using Pandas

  • Basic Python

  • Basic SQL

Hello, this is keeyonghan9539

Students: 764

Reviews: 38

Answers: 22

Rating: 4.8

Courses: 5

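After earning a master's degree in computer engineering, I started my career at Samsung Electronics. Thanks to a friend, that career continued in Silicon Valley, where over the past 29 years I have worked at 13 companies at various stages (founding a company, large corporations, and many startups).

  • Yahoo: developed the search engine as an engineering director.

  • Udemy: built the data team from scratch and grew it to 30 people. Listed on NASDAQ in October 2021.

  • Samsung Electronics

  • ...

Along the way I took 11 months off, and, without quite planning to, I built my own brand through roles as an angel investor (Chartmetric, Goodtime.io, Select Star, EO, Business Canvas, ...), an advisor (Moloco, Blind, 월급쟁이부자들, ...), and a consultant (SK Telecom, Hyundai Card, Emart, and others). I believe in the power of positivity, which treats failure as a lesson rather than a failure, and in the compounding power of consistency.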

https://www.linkedin.com/in/keeyonghan/

YouTube channel

월급쟁이부자들 course

Curriculum

45 lectures ∙ (11hr 25min)

Course Materials:

Lecture resources

Reviews

1 review ∙ 5.0

  • everythx

    Reviews 9

    Average Rating 5.0

    5

    32% enrolled

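    Perhaps because the instructor combines high-level industry experience with university teaching, the material is easy to understand.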


    $84.70