TrackMan Data Engineering System Design Challenge
The TrackMan Data Engineering System Design Challenge aims to gauge your familiarity with designing cloud-based applications and solutions. We also want you to get a feel for some of the tasks you'll encounter on the TrackMan Data Engineering Team.
This challenge is as much about your ability to communicate and justify your decisions as it is about the specifics of the design itself.
Your Submitted Work
Please supply the following artefacts:
- A diagram of your solution showing the relationships between the different components and services. A good tool for generating this diagram is app.diagrams.net.
- A README file.
The README should include the following items:
- A description of the solution approach taken.
- A description of each component. There is no need to provide code for any component, but for each one that would contain code, give an overview of what the code would do and any technical details about the implementation that you think are relevant.
- Any other notes that you think are valuable for us to know.
What We Care About
When reviewing your submission, we'll look at the following aspects of the solution:
- Is the solution fit for purpose?
- Scalability
- Maintainability
- Reliability
- Cost
TrackMan Data Engineering Challenge: Data Lake Infrastructure
For this task, you will design a pipeline in AWS for ingesting data into a data lake in S3.
Task Description
You can approach this as a greenfield project with minimal existing infrastructure in place. You can assume the existence of the following resources:
- A relational database called trackman-backend.
- An S3 bucket called trackman-lake.
Your solution should do the following:
- Read data from multiple tables in trackman-backend.
- Ingest the data into a different relational database (dataengineering-db), which will be used as the backend for an internal application. You do not need to consider what this application does.
- Load the data into trackman-lake in a suitable format.
Technical Requirements and Considerations
Your solution should give consideration to the following constraints:
- trackman-backend is a production database which is used as the backend for a customer-facing application.
- Data should arrive in trackman-lake no later than 1 hour after it is first inserted into trackman-backend.
- Data should arrive in dataengineering-db no later than 30 minutes after it is first inserted into trackman-backend.
- At some point in the future, we expect to ingest data from other data sources in the company into this data lake.
- In the future, we will want to make this data available to analysts through a data warehouse.
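The two freshness targets can be made concrete with a small check. The sketch below is illustrative only: it assumes latency is measured per record, from the time the record is inserted into trackman-backend to the time it lands in the target store, and the function and constant names are hypothetical. It also shows why the schedule of a polling batch job matters: if each run picks up new rows from a snapshot taken at its start, a row inserted just after one snapshot waits roughly one full interval plus the next run's duration.

```python
from datetime import datetime, timedelta, timezone

# Freshness targets from the requirements (constant names are hypothetical).
LAKE_MAX_LATENCY = timedelta(hours=1)       # trackman-lake
DE_DB_MAX_LATENCY = timedelta(minutes=30)   # dataengineering-db


def meets_freshness_target(inserted_at: datetime, arrived_at: datetime,
                           max_latency: timedelta) -> bool:
    """Return True if a record arrived within max_latency of its insert
    into trackman-backend (assumes both timestamps share a clock/timezone)."""
    if arrived_at < inserted_at:
        raise ValueError("arrived_at must not be earlier than inserted_at")
    return arrived_at - inserted_at <= max_latency


def worst_case_batch_latency(interval: timedelta, runtime: timedelta) -> timedelta:
    """Approximate worst-case latency of a non-overlapping polling batch job
    that snapshots new rows at the start of each run: a row inserted just
    after one snapshot is picked up one interval later and lands after the
    run completes."""
    return interval + runtime


if __name__ == "__main__":
    inserted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    arrived = inserted + timedelta(minutes=45)
    print(meets_freshness_target(inserted, arrived, DE_DB_MAX_LATENCY))
    print(meets_freshness_target(inserted, arrived, LAKE_MAX_LATENCY))
    print(worst_case_batch_latency(timedelta(minutes=15), timedelta(minutes=10)))
```

For example, a hypothetical job that runs every 15 minutes and takes up to 10 minutes to complete has a worst case of about 25 minutes, which fits inside the 30-minute target for dataengineering-db but leaves little headroom for retries.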
Good luck!