Photo by explorenation # on Unsplash

The Capstone Project: The first think

Maximilian Pollak
Feb 7, 2022

--

The initial thoughts about how to realize my Data Engineering capstone project involving stock analysis.

With this series of articles, I want to take you along on the road with me, through the trials, thoughts, experiments, troubles and successes that a Data Engineer (DE) in training might encounter while building a first fully-fledged DE project.

Project overview:

The project's goal is to collect daily data on stocks and ETFs and use it to build your own portfolio in a frontend. It should also offer a nice visualization and easy-to-understand analytics of said portfolio.

I want this “product” to make my own life easier by giving me a nice and simple way to track how my portfolio is performing. Especially for dividend investing, there aren’t any nice or easy ways to do this yet.
So I decided to make this my capstone project for the Pipeline Academy Data Engineering Bootcamp.

I hope you enjoy these articles and come along for the ride.

First figure out what you want, then figure out what you need.

Photo by Brett Jordan on Unsplash

What is needed?

The first thing I have to do is figure out what is really needed to make it work, and what can be added later to improve upon the skeleton product.
So let’s list what we need:

  1. We need a way to get tickers from the stock exchanges that we want to track.
  2. We need a way to automatically get all the financial data that we want for these tickers.
  3. We need a way to transform the data (since we have to grab it from multiple sources) and combine it to make a complete picture.
  4. We need a way to store the data so it’s retrievable (a relational database, RDB).
  5. We need a way to interact and communicate with all parts (data acquisition, transformation, data storage), most likely an API.
  6. We need a frontend that can display the stock information.
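To make the list a bit more concrete, here is a minimal sketch of steps 1 to 4 for two stocks. Everything in it is an assumption for illustration: the tickers `AAA` and `BBB` are placeholders, the two `fetch_*` functions return hard-coded values instead of calling a real data source, and an in-memory SQLite database stands in for whatever relational database ends up being used.

```python
import sqlite3

# Step 1: placeholder symbols (hypothetical, not real tickers).
TICKERS = ["AAA", "BBB"]


def fetch_prices(ticker):
    """Step 2a: stand-in for a price source; returns hard-coded sample data."""
    sample = {"AAA": 10.0, "BBB": 25.5}
    return {"ticker": ticker, "date": "2022-02-07", "close": sample[ticker]}


def fetch_dividends(ticker):
    """Step 2b: stand-in for a second source that provides dividend data."""
    sample = {"AAA": 0.40, "BBB": 0.0}
    return {"ticker": ticker, "annual_dividend": sample[ticker]}


def transform(price, dividend):
    """Step 3: combine both sources into one row and derive the dividend yield."""
    yield_pct = round(100 * dividend["annual_dividend"] / price["close"], 2)
    return (
        price["ticker"],
        price["date"],
        price["close"],
        dividend["annual_dividend"],
        yield_pct,
    )


def store(conn, rows):
    """Step 4: persist the rows; re-running a day overwrites instead of duplicating."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily ("
        "ticker TEXT, date TEXT, close REAL, annual_dividend REAL, "
        "yield_pct REAL, PRIMARY KEY (ticker, date))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, tickers):
    """Run acquisition, transformation and storage for the given tickers."""
    rows = [transform(fetch_prices(t), fetch_dividends(t)) for t in tickers]
    store(conn, rows)
    return rows


conn = sqlite3.connect(":memory:")
rows = run_pipeline(conn, TICKERS)
```

Because `(ticker, date)` is the primary key, running the pipeline twice for the same day leaves one row per stock, which seems like a sensible default for a job that is meant to run daily.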

Now, this is the minimum we need. We do not need it at scale right away. We just need it to work for one or two stocks, and if it does, we can slowly expand each part of the process.
The reason I want to keep things separate like this is so that each part stands on its own and can be upgraded or changed without affecting the other parts.
So in total we have five separate parts (steps 1 and 2 from the list both belong to data acquisition).

Data Acquisition | Data Transformation | Data Storage | Middleware (API) | Frontend

I’m not yet sure whether these will be the final five big parts, or whether some of them will merge (like Data Acquisition and Transformation).
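One way to keep the parts swappable is to let them talk to each other only through small interfaces. The sketch below is an assumption about how that could look, not a decided design: `Storage`, `InMemoryStorage` and `PortfolioApi` are hypothetical names, and the in-memory storage is just a stand-in that could later be replaced by a real database without touching the middleware.

```python
from __future__ import annotations

from typing import Protocol


class Storage(Protocol):
    """What the middleware expects from any storage part (hypothetical interface)."""

    def save(self, record: dict) -> None: ...

    def load(self, ticker: str) -> list[dict]: ...


class InMemoryStorage:
    """Simplest possible storage; a database-backed class could replace it."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def save(self, record: dict) -> None:
        self._records.append(record)

    def load(self, ticker: str) -> list[dict]:
        return [r for r in self._records if r["ticker"] == ticker]


class PortfolioApi:
    """Middleware: the only part the frontend would talk to."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def latest_close(self, ticker: str) -> float | None:
        # Returns the most recently saved closing price, or None if unknown.
        records = self._storage.load(ticker)
        return records[-1]["close"] if records else None


storage = InMemoryStorage()
api = PortfolioApi(storage)
storage.save({"ticker": "AAA", "date": "2022-02-07", "close": 10.0})
```

`PortfolioApi` depends only on the `save` and `load` methods, so changing how the data is stored should not require changes in the middleware or the frontend.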

Photo by Kelly Sikkema on Unsplash

What to do now:

Since the first step is to build a “skeleton” version of the project, that is where we should start: figuring out how to do each step in the list above in a simple, minimalistic way, and figuring out where to get the data from and how to transform and display it.


Maximilian Pollak

A 27-year-old aspiring Data Engineer with an interest in programming, investing, reading and science. I hope to learn lots here.