Machine Learning With Apache Spark

Available since October 31, 2019
Category

IT Business Management Training

Duration

2 days

Course description

Course Agenda:
Applied Data Science and Business Analytics
Machine Learning Algorithms, Techniques and Common Analytical Methods
Apache Spark Introduction
Spark’s MLlib Machine Learning Library

Target audience

Data Scientists, Business Analysts, Software Developers, IT Architects

Course requirements

Participants should have a general knowledge of statistics and programming.

Course Plan

Section 01

Chapter 1

  • Machine Learning Algorithms
  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • The Error Rate
  • Decision Trees (SL)
  • Random Forests
  • Unsupervised Learning Type: Clustering
  • K-Means Clustering (UL)
  • K-Means Clustering in a Nutshell
  • Regression Analysis
  • Logistic Regression
  • Summary
Section 02

Chapter 2

  • Introduction to Functional Programming
  • What is Functional Programming (FP)?
  • Terminology: Higher-Order Functions
  • Terminology: Lambda vs Closure
  • A Short List of Languages that Support FP
  • FP with Java
  • FP with JavaScript
  • Imperative Programming in JavaScript
  • The JavaScript map (FP) Example
  • The JavaScript reduce (FP) Example
  • Using reduce to Flatten an Array of Arrays (FP) Example
  • The JavaScript filter (FP) Example
  • Common Higher-Order Functions in Python
  • Common Higher-Order Functions in Scala
  • Elements of FP in R
  • Summary
Section 03

Chapter 3

  • Introduction to Apache Spark
  • What is Apache Spark
  • A Short History of Spark
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Applications
  • Spark Shell
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • The Executor and Worker Processes
  • The Spark Application Architecture
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • Spark as an Alternative to Apache Tez
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • GraphX
  • Spark vs R
  • Summary
Section 04

Chapter 4

  • The Spark Shell
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • The Spark Context (sc) and SQL Context (sqlContext)
  • The Shell Spark Context
  • Loading Files
  • Saving Files
  • Basic Spark ETL Operations
  • Summary
Section 05

Chapter 5

  • Spark Machine Learning Library
  • What is MLlib?
  • Supported Languages
  • MLlib Packages
  • Dense and Sparse Vectors
  • Labeled Point
  • Python Example of Using the Labeled Point Class
  • LIBSVM format
  • An Example of a LIBSVM File
  • Loading LIBSVM Files
  • Local Matrices
  • Example of Creating Matrices in MLlib
  • Distributed Matrices
  • Example of Using a Distributed Matrix
  • Classification and Regression Algorithms
  • Clustering
  • Summary
Section 06

Chapter 6

  • Text Mining
  • What is Text Mining?
  • The Common Text Mining Tasks
  • What is Natural Language Processing (NLP)?
  • Some of the NLP Use Cases
  • Machine Learning in Text Mining and NLP
  • Machine Learning in NLP
  • TF-IDF
  • The Feature Hashing Trick
  • Stemming
  • Example of Stemming
  • Stop Words
  • Popular Text Mining and NLP Libraries and Packages
  • Summary
  • Lab Exercises
  • Lab 1. Learning the Lab Environment
  • Lab 2. The Spark Shell
  • Lab 3. Using Random Forests for Classification with Spark MLlib
  • Lab 4. Using k-means Algorithm from MLlib
  • Lab 5. Text Classification with Spark ML Pipeline
