This module provides a brief introduction to the world of data, covering data analysis, the Hadoop ecosystem, batch vs. streaming pipelines, and data lake fundamentals.
Fundamentals (Python + Linux)
A DA needs to be ready to code, and Python is currently the most widely used programming language for data work. Combined with Linux, this section provides the minimum programming skills required.
Choosing the most appropriate data model for your project is not trivial. As a DA, you are expected to weigh the pros and cons of each model for your use case and make the final choice. This module will guide you toward the right decisions.
The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models; it is designed to scale from single servers to thousands of machines. Apache Spark is an open-source, general-purpose distributed cluster-computing framework. In this module you will learn how to analyze problems and drive toward solutions using data, and will cover Hadoop's main concepts and architecture, MapReduce essentials, and Spark's main concepts, architecture, and basics.
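To make the MapReduce model mentioned above concrete, here is a minimal sketch in plain Python of its three phases (map, shuffle, reduce) applied to a word count. The input lines are invented for illustration; a real Hadoop or Spark job would distribute these steps across a cluster.

```python
from itertools import groupby
from operator import itemgetter

# Toy input standing in for a distributed dataset of text lines.
lines = ["big data big ideas", "data pipelines move data"]

# Map phase: emit an intermediate (word, 1) pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: bring all pairs with the same key (word) together.
mapped.sort(key=itemgetter(0))
grouped = {key: [count for _, count in pairs]
           for key, pairs in groupby(mapped, key=itemgetter(0))}

# Reduce phase: aggregate the values for each key.
counts = {word: sum(values) for word, values in grouped.items()}

print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'move': 1, 'pipelines': 1}
```

In a real cluster, the map and reduce functions run in parallel on different machines, and the shuffle phase moves intermediate pairs over the network so that each reducer sees all values for its keys.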
Data Cloud Ecosystem
AWS - Data Ecosystem
Learn to boost your knowledge and experience in methodologies such as agile-lean practices, risk management, secure coding, and DevOps, in order to improve delivery speed and quality.
At Globant, we believe that talent is not a special gift; it is something that can be developed!
With that in mind, you can take these opportunities to develop the nine soft skills that describe the abilities needed to act in alignment with Globant's values. This manifesto will nurture your career!