Real-Time Data Pipelines Made Easy with Structured Streaming in Apache Spark
Deep dive into Spark's streaming module with Structured Streaming. Learn about Spark's micro-batch strategy and aggregations.
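To make the micro-batch idea concrete, here is a minimal pure-Python sketch (not Spark code; all names are illustrative): Structured Streaming treats a stream as an unbounded table, processes it in small micro-batches, and updates a running aggregation after each batch rather than recomputing from scratch.

```python
# Conceptual sketch of micro-batch aggregation, in plain Python.
# Not the Spark API: process_micro_batches and `stream` are hypothetical names.
from collections import Counter

def process_micro_batches(batches):
    """Fold each micro-batch of word events into a running count and
    yield the full aggregation state after every batch, similar in
    spirit to Structured Streaming's 'complete' output mode."""
    running = Counter()
    for batch in batches:
        running.update(batch)   # incremental update, not a full recompute
        yield dict(running)     # emit the current result table

stream = [["spark", "kafka"], ["spark"], ["flink", "spark"]]
results = list(process_micro_batches(stream))
# After the final batch, "spark" has been counted three times.
```

The key point the sketch shows is that each micro-batch only touches new data; the engine carries the aggregation state forward between batches.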
Learn about the similarities and differences between Spark and Hadoop, and why Spark is faster than Hadoop. Explore the challenges Spark tries to address, which will give you a good idea of the need for Spark, its performance, and its efficiency. Covers RDDs, and walks step by step through how the program you write gets translated into actual execution behind the scenes on a Spark cluster.
Understand Spark basics: Spark Core and RDDs
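The defining RDD behavior covered here is lazy evaluation: transformations only record a lineage, and an action triggers the actual computation. A toy pure-Python sketch of that idea (the class and its method names merely mirror Spark's RDD API; this is not Spark code):

```python
# Toy model of RDD laziness, plain Python. ToyRDD is a hypothetical name.
class ToyRDD:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []   # recorded lineage; nothing executed yet

    def map(self, f):
        # Transformation: return a new "RDD" with an extended lineage.
        return ToyRDD(self.data, self.ops + [("map", f)])

    def filter(self, pred):
        return ToyRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):
        # Action: replay the recorded lineage over the data.
        result = self.data
        for kind, fn in self.ops:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result

rdd = ToyRDD(range(1, 6)).map(lambda x: x * x).filter(lambda x: x % 2 == 1)
# Nothing has run yet; collect() materializes [1, 9, 25].
squares = rdd.collect()
```

Recording the lineage instead of eagerly computing is also what lets real Spark recompute lost partitions for fault tolerance.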
You’ll learn how to use Spark to work with big data and build machine learning models at scale, including how to wrangle and model massive datasets with PySpark. Learn about big data and how Spark fits into the big data ecosystem. Practice processing and cleaning datasets to get comfortable with Spark’s SQL and dataframe APIs. Debug and optimize your Spark code when running on a cluster. Use Spark’s Machine Learning Library to train machine learning models at scale.
Spark introduction: what is it, modules, data types, operations, aggregations, joins, developing applications
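Two of the operations this session covers, aggregations and joins, can be sketched in plain Python (the function names and sample data are illustrative; Spark's DataFrame API expresses the same logic as `groupBy(...).count()` and `df.join(...)`):

```python
# Plain-Python sketch of a group-by count and a hash-based inner join.
from collections import defaultdict

def group_count(rows, key):
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)

def inner_join(left, right, key):
    # Hash join: build an index on the right side, probe with the left.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index[l[key]]]

orders = [{"user": "ann", "item": "book"}, {"user": "bob", "item": "pen"},
          {"user": "ann", "item": "mug"}]
users = [{"user": "ann", "city": "Oslo"}, {"user": "bob", "city": "Lima"}]

counts = group_count(orders, "user")        # ann appears twice, bob once
joined = inner_join(orders, users, "user")  # each order row gains a city
```

The hash-join shape here is also roughly what Spark does in a broadcast join: the smaller table is turned into a lookup structure and the larger one streams past it.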
Apache Hive: history, what is it, data flow, modeling, types, modes, and main features. Differences from RDBMSs.
Learn about HBase: what is it, use cases and applications, storage and architecture. See a quick demo.
Learn Apache Flume basics, use cases, advantages, architecture and see an example of Twitter Data Streaming
Learn about file formats in Hadoop, their differences and when to choose each one
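The core distinction behind the Hadoop file formats compared here is storage layout: row-oriented formats (e.g. Avro) store whole records together, while columnar formats (e.g. Parquet, ORC) store each column contiguously. A hedged plain-Python sketch of why that matters for analytical reads:

```python
# Illustrative only: modeling row vs. columnar layout with Python lists.
rows = [{"id": 1, "name": "a", "score": 10},
        {"id": 2, "name": "b", "score": 20},
        {"id": 3, "name": "c", "score": 30}]

# Columnar layout: one contiguous list per field.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# A query like SELECT sum(score) reads just one column here...
total_columnar = sum(columns["score"])

# ...but must touch every full record in the row layout.
total_rows = sum(row["score"] for row in rows)
```

This is why columnar formats dominate scan-heavy analytics, while row formats suit write-heavy or whole-record access patterns.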
MapReduce (MR) design patterns in detail, including stages and good practices.
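The three MapReduce stages can be sketched with the canonical summarization pattern, word count, in plain Python (framework details such as input splits, combiners, and partitioners are omitted; the function names are illustrative):

```python
# Minimal word-count sketch of the map / shuffle-and-sort / reduce stages.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (key, 1) pair per word.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle-and-sort: order by key so groupby gathers all values
    # for a key together, mimicking the framework's shuffle stage.
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    # Reduce: sum the values for each key.
    return {key: sum(v for _, v in values) for key, values in grouped}

lines = ["big data big ideas", "data pipelines"]
word_counts = reduce_phase(shuffle(map_phase(lines)))
# {'big': 2, 'data': 2, 'ideas': 1, 'pipelines': 1}
```

Most MR design patterns (filtering, joins, top-k) vary only what the map emits and how the reduce folds each key's group.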