This module provides a brief introduction to the world of data, covering data analysis, the Hadoop ecosystem, batch vs. streaming pipelines, and data lake fundamentals.
Fundamentals (Python + Linux)
A DA needs to be ready to code, and Python is currently the most widely used programming language for data work. Combined with Linux, this section provides the minimum programming skills required.
Choosing the most appropriate data model for your project is not trivial. As a DA, you are expected to weigh the pros and cons of each model for your use case and make the final choice. This module will guide you toward the right decisions.
The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models; it is designed to scale from single servers to thousands of machines. Apache Spark is an open-source, general-purpose distributed cluster-computing framework. In this module you will learn how to analyze problems and drive toward solutions using data, and will cover Hadoop's main concepts and architecture, MapReduce essentials, and Spark's main concepts, architecture, and basics.
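To make the MapReduce model mentioned above concrete, here is a minimal sketch in plain Python of its three phases (map, shuffle, reduce) applied to a word count. The input lines are invented for illustration; a real Hadoop or Spark job would distribute these steps across a cluster.

```python
from itertools import groupby
from operator import itemgetter

# Toy input standing in for a distributed dataset of text lines.
lines = ["big data big ideas", "data pipelines move data"]

# Map phase: emit an intermediate (word, 1) pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: bring all pairs with the same key (word) together.
mapped.sort(key=itemgetter(0))
grouped = {key: [count for _, count in pairs]
           for key, pairs in groupby(mapped, key=itemgetter(0))}

# Reduce phase: aggregate the values for each key.
counts = {word: sum(values) for word, values in grouped.items()}

print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'move': 1, 'pipelines': 1}
```

In a real cluster, the map and reduce functions run in parallel on different machines, and the shuffle phase moves intermediate pairs over the network so that each reducer sees all values for its keys.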
Data Cloud Ecosystem
AWS - Data Ecosystem
Learn to boost your knowledge and experience in methodologies such as agile-lean practices, risk management, secure coding, and DevOps, in order to improve delivery speed and quality.
At Globant, we believe that talent is not a special gift; it is something that can be developed!
With that in mind, you can take these opportunities to develop the nine soft skills that describe the abilities needed to act in alignment with Globant's values. This manifesto will nurture your career!