The Complete Guide to Spark Machine Learning - Part 1

This course is for anyone who wants to be recognized as a machine learning expert on large-scale data. It covers the core framework of Spark machine learning, SQL-based data processing through difficult practical problems, data analysis grounded in the business domain, and the implementation of optimized machine learning models.

(4.9) 23 reviews

881 students

Apache Spark
Machine Learning(ML)
Big Data
Data Engineering

This course is designed for intermediate learners.

What you will learn!

  • Implementing Machine Learning Models in Spark

  • Detailed understanding of DataFrame, the foundation of Spark's data processing

  • Understand the various technical elements that make up the Spark Machine Learning Framework

  • Mastering Spark's Machine Learning Pipeline

  • Ability to use SQL for data analysis

  • SQL-based Feature Engineering Techniques

  • Implementing models with XGBoost and LightGBM in Spark

  • Model hyperparameter tuning method based on Bayesian optimization

  • Improve your data analysis and ML model implementation skills simultaneously through challenging real-world problems.

  • Data analysis methods grounded in the business domain

  • Various data visualization techniques

Data analysis + feature engineering + ML implementation:
build three competencies at once.

Apache Spark
meets machine learning.

Apache Spark, the leading open source solution for large-scale distributed processing, has met machine learning.

Many large companies and financial institutions in Korea use Apache Spark to analyze large amounts of data and build machine learning models. Since Spark is a distributed data processing framework, it can process large amounts of data and build ML models by scaling out across a few to dozens of servers. It can therefore overcome the limitation of scikit-learn, which can only train machine learning models on a single server.


I will help you grow into
a machine learning expert
who is also skilled at data processing and analysis.

The 'Complete Guide to Spark Machine Learning - Part 1' course goes beyond teaching you how to implement machine learning models in Spark: it will help you grow into a machine learning expert who is also skilled in data processing and analysis.

To grow into a true machine learning expert, you need not only ML implementation skills but also the ability to process and combine business data into a form that ML models can use. To this end, you will learn through hands-on practice how to process data with SQL, the most commonly used tool for large-scale data processing in practice, and how to analyze data based on an understanding of the business domain.

The course is designed to build your data processing, analysis, and ML implementation capabilities through detailed theoretical explanations and hands-on practice.


We will help you solve
the problems you will face.

Implementing machine learning models on Spark is not easy. You will run into many problems that data scientists and machine learning practitioners have not experienced before, such as the distinctive machine learning APIs and framework that follow from the Spark architecture, and SQL-based data processing.

In this course, The Complete Guide to Spark Machine Learning, we will build your ability to solve the problems you encounter.

The first half of 'The Complete Guide to Spark Machine Learning - Part 1'

The first half of the course consists of detailed theoretical explanations and plenty of hands-on exercises on the elements that make up the Spark machine learning framework, such as DataFrame, SQL, Estimator, Transformer, Pipeline, and Evaluator. With this foundation, you will be able to implement ML models in Spark quickly and easily.

We will also cover in detail how to use XGBoost and LightGBM in Spark, and how to tune hyperparameters with HyperOpt, which is based on Bayesian optimization.

The second half of 'The Complete Guide to Spark Machine Learning - Part 1'

The second half of the course improves your real-world data processing and analysis skills together with your model implementation skills through hands-on practice on Kaggle's Instacart Market Basket Analysis competition. It is a difficult competition, and its data set consists of e-commerce order processing tables (products, orders, and order products).

With this data set, you will learn in detail how to process and analyze business data with SQL, how to perform feature engineering, how to derive the analysis domain from the business, and how to build models on the derived features.
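To give a feel for SQL-based feature engineering on order tables like these, here is a small sketch. It is not the course's code. It uses Python's built-in `sqlite3` as a stand-in for Spark SQL so that it runs anywhere. The table layouts, column names (`order_id`, `user_id`, `product_id`, `reordered`), and rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database standing in for Spark SQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
CREATE TABLE order_products (order_id INTEGER, product_id INTEGER, reordered INTEGER);

-- Hypothetical sample rows: user 10 placed two orders, user 20 placed one.
INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO order_products VALUES
  (1, 100, 0), (1, 101, 0),
  (2, 100, 1), (2, 102, 0),
  (3, 100, 0);
""")

# Per-user features: number of orders, number of items bought,
# and the share of items that were reorders.
feature_sql = """
SELECT o.user_id,
       COUNT(DISTINCT o.order_id) AS order_cnt,
       COUNT(*)                   AS item_cnt,
       AVG(op.reordered)          AS reorder_ratio
FROM orders o
JOIN order_products op ON o.order_id = op.order_id
GROUP BY o.user_id
ORDER BY o.user_id
"""
user_features = conn.execute(feature_sql).fetchall()

for user_id, order_cnt, item_cnt, reorder_ratio in user_features:
    print(user_id, order_cnt, item_cnt, reorder_ratio)
```

In Spark, a query of this shape could be run with `spark.sql(feature_sql)` once DataFrames have been registered under the same names with `createOrReplaceTempView`. The resulting per-user columns can then serve as model features.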

This release is Part 1 of the 'Complete Guide to Spark Machine Learning' course. Part 2 will be released later and will cover text analysis, recommendations, and time series analysis.

💻 Please check before taking the class!

  • All hands-on code in this course is written in Python. Scala is not covered, so please keep this in mind before enrolling.

Please check
the practice environment.

The hands-on practice uses Databricks. Databricks provides a cloud notebook environment in which you can write Spark-based applications without installing Spark yourself.

Officially, the Databricks Community version is free to use for 14 days.
The Section 0 video lecture 'Managing Spark Clusters on Databricks and Using Databricks Even After 2 Weeks of Signing Up' explains how to keep using it for free after the 14 days, so please watch that video carefully. (For details on the Databricks Community version, please refer to the link.)

You can download the practice code and lecture materials from 'Download the practice code and explanation materials.'


This course requires
prerequisite knowledge.

This course assumes knowledge up to Chapter 5 (Regression) of The Complete Guide to Python Machine Learning, or equivalent, and a very basic understanding of SQL. Please keep this in mind when choosing the course.

Knowing the basics of Spark helps, but you will have no trouble following the lectures without it.

Please check out the prerequisite course!

The Complete Guide to Python Machine Learning

No more theory-only machine learning.
From core machine learning concepts to practical skills, easily and accurately.


Recommended for
these people!

Who is this course right for?

  • Anyone who wants to implement machine learning using Spark

  • Those who want to implement machine learning based on large-scale data

  • Anyone who wants to improve their data processing techniques for machine learning using SQL

  • Anyone who wants to learn the entire process of processing data into the desired format and creating an ML model based on it in practice

  • Anyone who wants to improve their data analysis, feature engineering, and ML implementation skills

What do you need to know before starting?

  • Understanding up to Chapter 5 (Regression) of the Complete Guide to Python Machine Learning or equivalent prior knowledge

  • Understanding SQL Basics

Hello
This is 권 철민

Students: 23,091
Reviews: 1,060
Rating: 4.9
Courses: 12

(Former) En-Core (엔코아) Consulting

(Former) Oracle Korea

Freelance AI consultant

Author of The Complete Guide to Python Machine Learning

Curriculum

117 lectures ∙ (24hr 27min)

Lecture resources are provided.
