If you want to be recognized as a machine learning expert who works with large-scale data, this course is for you. It takes you from the core framework of Spark machine learning, through SQL-based data processing on challenging practical problems and data analysis grounded in the business domain, to implementing optimized machine learning models.
What you will learn!
Implementing Machine Learning Models in Spark
Detailed understanding of DataFrame, the foundation of Spark's data processing
Understand the various technical elements that make up the Spark Machine Learning Framework
Mastering Spark's Machine Learning Pipeline
Ability to use SQL for data analysis
SQL-based Feature Engineering Techniques
Implementing models with XGBoost and LightGBM in Spark
Model hyperparameter tuning method based on Bayesian optimization
Improve your data analysis and ML model implementation skills simultaneously through challenging real-world problems.
Data analysis method based on analysis domain
Various data visualization techniques
Data analysis + feature engineering + ML implementation:
build all three competencies at once.
Apache Spark, the leading open-source solution for large-scale distributed processing, meets machine learning.
Many large domestic companies and financial institutions use Apache Spark to analyze large amounts of data and build machine learning models. Because Spark is built on a distributed data processing framework, it can process large data sets and train ML models by scaling out across a few to dozens of servers. It therefore overcomes the limitation of scikit-learn, which can only train machine learning models on a single server.
The 'Spark Machine Learning Complete Guide - Part 1' course goes beyond teaching you how to implement machine learning models in Spark and helps you grow into a machine learning expert who is also skilled in data processing and analysis.
In order to grow into a true machine learning expert, it is very important not only to have ML implementation skills, but also to have the ability to process and combine business data to create ML models. To this end, you will learn how to process data using SQL, which is most commonly used in large-scale data processing in practice, and data analysis techniques based on business domain analysis through practice.
The course is designed to help you develop data processing, analysis, and ML implementation skills through detailed theoretical explanations and hands-on practice.
Implementing machine learning models on Spark is not easy. You run into many problems that data scientists and machine learning experts have usually not encountered before, such as machine learning APIs and frameworks specific to the Spark architecture, and data processing based on SQL.
The 'Spark Machine Learning Complete Guide' course develops your ability to solve these problems.
The first half of the lecture consists of detailed theoretical explanations and abundant practical exercises on the elements that make up the Spark Machine Learning Framework, such as DataFrame, SQL, Estimator, Transformer, Pipeline, and Evaluator. Through this, you will be able to implement ML models in Spark easily and quickly.
We will also go into detail about how to use XGBoost and LightGBM in Spark, and how to tune hyperparameters using HyperOpt based on Bayesian optimization.
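To make the Estimator / Transformer / Pipeline relationship concrete, here is a minimal plain-Python sketch. It is not the `pyspark.ml` API: every class name (`ToyPipeline`, `MeanImputer`, `AddRatio`) and every column name is made up for illustration. It only shows the contract the lecture refers to: an Estimator learns something in `fit()` and returns a fitted Transformer, a Transformer maps rows to rows in `transform()`, and a Pipeline chains both kinds of stages.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


class AddRatio:
    """Toy Transformer: stateless, adds a derived column to every row."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "reorder_ratio": r["reorders"] / r["orders"]} for r in rows]


class MeanImputerModel:
    """Fitted Transformer produced by MeanImputer.fit()."""

    def __init__(self, col: str, mean: float) -> None:
        self.col = col
        self.mean = mean

    def transform(self, rows: List[Row]) -> List[Row]:
        # Replace missing values with the mean learned at fit time.
        return [
            {**r, self.col: self.mean if r[self.col] is None else r[self.col]}
            for r in rows
        ]


class MeanImputer:
    """Toy Estimator: fit() learns a column mean and returns a Transformer."""

    def __init__(self, col: str) -> None:
        self.col = col

    def fit(self, rows: List[Row]) -> MeanImputerModel:
        seen = [r[self.col] for r in rows if r[self.col] is not None]
        return MeanImputerModel(self.col, sum(seen) / len(seen))


class ToyPipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Chains stages; Estimator stages are fitted, Transformer stages pass through."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> ToyPipelineModel:
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)  # later stages see transformed data
            fitted.append(model)
        return ToyPipelineModel(fitted)


# Hypothetical training rows; days_gap is missing for one user.
train = [
    {"orders": 4.0, "reorders": 2.0, "days_gap": 7.0},
    {"orders": 10.0, "reorders": 5.0, "days_gap": None},
    {"orders": 2.0, "reorders": 1.0, "days_gap": 13.0},
]
model = ToyPipeline([MeanImputer("days_gap"), AddRatio()]).fit(train)

# The fitted pipeline reuses the mean learned on the training rows.
scored = model.transform([{"orders": 8.0, "reorders": 2.0, "days_gap": None}])
print(scored)
```

The point of the design is that anything learned from data lives in the fitted model, so the same preprocessing is applied consistently to training data and to new data.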
The second half of the lecture improves your real-world data processing and analysis skills together with your machine learning implementation skills through hands-on practice on Kaggle's Instacart Market Basket Analysis competition. The Instacart competition is a difficult one, and its data set consists of e-commerce order processing tables (products, orders, and order products).
Using this data set, you will learn in detail how to process and analyze business data with SQL, how to perform feature engineering, how to derive analysis domains from the business, and how to build models on the derived features.
The course being released this time is Part 1 of the 'Spark Machine Learning Complete Guide'. Part 2 will be released later and will cover text analysis, recommendations, and time series analysis.
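As a rough illustration of SQL-based feature engineering on order tables like these, the sketch below builds per-user features with a join and a GROUP BY. It uses Python's built-in `sqlite3` so it runs without a cluster; in the course the same kind of query would be run through Spark SQL on Databricks. The table layout, column names, and sample rows are simplified assumptions, not the actual competition schema or data.

```python
import sqlite3

# In-memory database with two simplified, hypothetical order tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_number INTEGER);
    CREATE TABLE order_products (order_id INTEGER, product_id INTEGER, reordered INTEGER);

    INSERT INTO orders VALUES (1, 100, 1), (2, 100, 2), (3, 200, 1);
    INSERT INTO order_products VALUES
        (1, 10, 0), (1, 11, 0),
        (2, 10, 1), (2, 12, 0),
        (3, 10, 0);
    """
)

# One row per user: how many orders, how many items, and what share were reorders.
features = conn.execute(
    """
    SELECT o.user_id,
           COUNT(DISTINCT o.order_id) AS order_cnt,
           COUNT(*)                   AS item_cnt,
           AVG(op.reordered)          AS reorder_rate
    FROM orders o
    JOIN order_products op ON op.order_id = o.order_id
    GROUP BY o.user_id
    ORDER BY o.user_id
    """
).fetchall()

for row in features:
    print(row)
```

Each output row is a feature vector for one user; aggregates like these are the kind of derived columns that would later be fed to a model.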
💻 Please check before taking the class!
The hands-on training uses Databricks. Databricks provides a notebook environment that allows you to create Spark-based applications on the cloud without installing Spark.
The Databricks Community version is officially free to use for 14 days.
The Section 0 video lecture 'Managing Spark Clusters on Databricks and Using Databricks Even After 2 Weeks of Signing Up' explains how you can continue to use it for free after the 14 days, so please watch that video carefully (for an explanation of the Databricks Community version, please refer to the link).
You can download the lecture practice code and lecture explanation materials from ‘Download the practice code and explanation materials.’
This course assumes that students have knowledge equivalent to Chapter 5 (Regression) of the Complete Guide to Python Machine Learning and a very basic understanding of SQL. Please keep this in mind when choosing the course.
It would be helpful to know the basics of Spark, but even if you don't, you will have no problem following the lecture.
Who is this course right for?
Anyone who wants to implement machine learning using Spark
Those who want to implement machine learning based on large-scale data
Anyone who wants to improve their data processing techniques for machine learning using SQL
Anyone who wants to learn the entire process of processing data into the desired format and creating an ML model based on it in practice
Anyone who wants to improve data analysis, feature engineering capabilities, and ML implementation
Need to know before starting?
Understanding up to Chapter 5 (Regression) of the Complete Guide to Python Machine Learning or equivalent prior knowledge
Understanding SQL Basics
Students: 23,090
Reviews: 1,060
Rating: 4.9
Courses: 12
(Former) En-Core Consulting
(Former) Oracle Korea
Freelance AI consultant
Author of the Complete Guide to Python Machine Learning
A total of 117 lectures (24hr 27min) are provided.