Gustavo Saidler

Here you can find a list of materials I used for my own learning and highly recommend.

Books


Python Crash Course: This is my absolute favorite beginner’s Python book. It is easy to follow and genuinely a pleasant read. If you want to start learning Python, this is, in my opinion, a must-read! The book is split into two parts:

  1. The first half covers the basics (programming concepts such as lists, dictionaries, classes, and loops)
  2. The second half covers three projects: the first is a game, the second is about data visualization, and the third is a web application
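
To give a flavor of the basics mentioned above, here is a minimal sketch that combines a class, a list, a dictionary, and a loop. It is not taken from the book; the names `Book`, `count_topics`, and `shelf` are made up for illustration.

```python
class Book:
    """A tiny class holding a title and the list of topics it covers."""

    def __init__(self, title, topics):
        self.title = title
        self.topics = topics  # a list of strings


def count_topics(books):
    """Loop over the books and tally how many of them cover each topic."""
    counts = {}  # dictionary mapping topic -> number of books
    for book in books:
        for topic in book.topics:
            counts[topic] = counts.get(topic, 0) + 1
    return counts


# Hypothetical sample data, only here to exercise the code.
shelf = [
    Book("Book A", ["lists", "loops"]),
    Book("Book B", ["loops", "classes"]),
]

print(count_topics(shelf))
```

If a snippet like this reads comfortably, you have roughly the vocabulary the first half of the book aims to teach.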

Automate the Boring Stuff with Python: This book shows you how to use Python to perform all sorts of repetitive tasks, but it is also a beginner’s book. The first part covers the Python basics: the Python shell, variables, lists, dictionaries, and so on. The real purpose of the book is, as the title suggests, to show you the many ways you can automate things with Python. If you have an office job, I am pretty sure you perform a bunch of manual tasks on your computer every day. Such tasks are begging to be automated, which gives you back some precious time.
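
As an example of the kind of chore that is worth automating, here is a minimal sketch that adds a prefix to every matching file in a folder using only the standard library. It is my own illustration rather than code from the book; the function name `add_prefix` and its parameters are assumptions made for this example.

```python
from pathlib import Path


def add_prefix(folder: Path, prefix: str, pattern: str = "*.txt") -> list[Path]:
    """Rename each file in `folder` matching `pattern` so its name starts with `prefix`.

    Returns the new paths. Files that already carry the prefix are left alone,
    so running the function twice is harmless.
    """
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the iteration.
    for path in sorted(folder.glob(pattern)):
        if not path.is_file() or path.name.startswith(prefix):
            continue  # skip directories and files that already have the prefix
        target = path.with_name(prefix + path.name)
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling something like `add_prefix(Path("reports"), "2024_")` (the folder name is hypothetical) would turn `a.txt` into `2024_a.txt`. Try it on a copy of your files first, since renames are not undone automatically.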

Designing Data-Intensive Applications: This book would be my recommendation if you could only choose one resource from this entire list. Instead of focusing on specific technologies, the book discusses every aspect of distributed data systems from first principles, painting a coherent picture of the entire Big Data landscape.

Blog Posts

A Beginner’s Guide to Data Engineering, Robert Chang. A three-post series in which the author goes from describing ETL best practices to showing how to build ETL frameworks. While these posts focus heavily on Airflow, I believe they are a nice read regardless of which ETL tools you use.

The Rise of the Data Engineer, Maxime Beauchemin. A must-read post, full of clarity and insights from Apache Airflow’s creator, and a great description of the field.

Interactive coding

DataCamp: Offers interactive R, Python, Sheets, SQL, and shell courses, all on topics in data science, statistics, and machine learning. It is a great place to start, with a “learn by doing” philosophy. Their methodology includes giving a bit of theory in short videos, then jumping straight to interactive exercises.

Codecademy: Similar to DataCamp, it is an online interactive platform that offers free coding classes in 12 different programming languages, including Python, Java, JavaScript, Ruby, SQL, and Sass, as well as the markup languages HTML and CSS. It is perhaps a bit more challenging (and more fun) than DataCamp. The Python course in particular is very thorough.

Online learning

Safari Books Online: O’Reilly’s online learning platform, with unlimited access to videos, live online training, learning paths, books, tutorials, and more.