The Blueprint for a Data-Driven Life
Greetings,
I hope you are doing well and had a great weekend celebrating the start of the year’s final month.
To commemorate this, I am beginning a new series. Similar to my previous work, this series combines prior topics but pivots from theoretical discussion to practical, data-driven application. This series is titled “Data-Driven: Architecting Personal Systems” and will be divided into four parts.
The core methodology is to cover data science concepts and then build them into project components with real-world motivation.
The four parts are: Ingest, Engineer, Model, and Deploy.
In this introductory article, I will explain why this structure is appropriate, briefly cover the concepts, and outline the project design. The following three articles will cover the four parts of the data science lifecycle iteratively, providing interactive building blocks for your own projects.
Let’s start with the motivation and the defense of this structure.
My 4 Layers of Data Science
As mentioned, the four parts are Ingest, Engineer, Model, and Deploy. I chose this structure to cover the full spectrum of concepts utilized in the field of data science — from having zero information to building a finished artifact that showcases your insights.
The layers can be defined as follows:
Ingest:
Data Collection, Structure, Validation
Engineer:
Feature Engineering, Pattern Definition
Model:
Analysis, Insights, & Action
Deploy:
Dashboarding, Operational & Auditing Systems, Visualization & Measurement
In each layer, multiple concepts are involved. This breakdown is not based on a corporate job profile; rather, it is based on the data-driven cycle for personal application and system building.
The Concepts Breakdown
In the upcoming articles, each layer will feature technical concepts as they are commonly defined today. Each concept will come with an example to aid understanding, followed by an application to our project (with instructions).
The concept breakdown is as follows:
Ingest:
Data Sourcing (APIs/Scraping/Manual), Storage Formats, and Schema Validation.
Engineer:
Cleaning (Imputation), Feature Extraction, Transformation (Scaling), and Encoding.
Model:
Exploratory Analysis (EDA), Algorithm Selection (Regression/Clustering), and Evaluation Metrics.
Deploy:
Serving (APIs), Dashboarding (Streamlit/Viz), Automation (Cron/Actions), and Monitoring.
The layers run in sequence, each building on the output of the one before, which supports a data-driven mindset.
Ingest:
Data is collected and converted into a preferred format (Schema) tailored to the use case.
Engineer:
We build upon the previous layer by cleaning and validating the data, extracting important elements, and defining attributes.
Model:
We perform Exploratory Data Analysis (EDA) to find patterns, utilize algorithms to generate results, and evaluate them to ensure scalability and minimize errors.
Deploy:
Finally, we showcase the project visually. This provides a medium to understand the entire process, with automation to sustain growth and monitoring to assess gaps for improvement.
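To make the sequence concrete, here is a minimal sketch of the four layers as plain Python functions chained together. Everything in it is an assumption for illustration: the field names (date, focus_hours, mood), the sample rows, the mean imputation, the 0.5 cut-off, and the text report standing in for a dashboard. It is not the project design the later articles will build, only one way the hand-off between layers could look.

```python
from statistics import mean

# Hypothetical schema (field name -> expected type); not taken from the series.
SCHEMA = {"date": str, "focus_hours": float, "mood": int}
NULLABLE = {"focus_hours"}  # assumption: only this field may be missing

# Made-up sample rows, including one that should fail validation.
RAW = [
    {"date": "day-1", "focus_hours": 2.0, "mood": 3},
    {"date": "day-2", "focus_hours": None, "mood": 4},
    {"date": "day-3", "focus_hours": 6.0, "mood": 5},
    {"date": "day-4", "focus_hours": "n/a", "mood": 2},  # wrong type, rejected
]


def ingest(raw_rows):
    """Ingest: keep only rows whose fields and types match the schema."""
    valid = []
    for row in raw_rows:
        if set(row) != set(SCHEMA):
            continue
        ok = all(
            (row[name] is None and name in NULLABLE) or isinstance(row[name], kind)
            for name, kind in SCHEMA.items()
        )
        if ok:
            valid.append(dict(row))  # copy so later layers do not mutate RAW
    return valid


def engineer(rows):
    """Engineer: mean-impute missing focus_hours, then min-max scale to [0, 1]."""
    if not rows:
        return rows
    known = [r["focus_hours"] for r in rows if r["focus_hours"] is not None]
    fill = mean(known) if known else 0.0
    for r in rows:
        if r["focus_hours"] is None:
            r["focus_hours"] = fill
    lo = min(r["focus_hours"] for r in rows)
    hi = max(r["focus_hours"] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    for r in rows:
        r["focus_scaled"] = (r["focus_hours"] - lo) / span
    return rows


def model(rows):
    """Model: compare average mood on higher-focus and lower-focus days."""
    high = [r["mood"] for r in rows if r["focus_scaled"] >= 0.5]
    low = [r["mood"] for r in rows if r["focus_scaled"] < 0.5]
    return {
        "high_focus_mood": mean(high) if high else None,
        "low_focus_mood": mean(low) if low else None,
    }


def deploy(summary):
    """Deploy: render the result as a text report (stand-in for a dashboard)."""
    return "\n".join(f"{key}: {value}" for key, value in summary.items())


rows = engineer(ingest(RAW))
summary = model(rows)
print(deploy(summary))
```

Each function accepts only what the previous one returns, which is the point of keeping the layers sequential: a problem in the report can be traced back one layer at a time.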
Architecting the Project
As promised, we will utilize these layers to build a project that converts information into action. To ground this data-driven project in personal thinking, nothing beats building a Personal System.
I will be adopting the structure explained in my previous series: the “Solo-Social System.” In that system, we have four parts: Execution, Reflection, Refinement, and Evolution. (Read here)
To incorporate this into our data science project, the structure changes slightly. It is not a 1-to-1 mapping. Instead, all layers of the Personal System feed into the Data-Driven Pipeline.
To elaborate:
The Ingest layer will intake data from your Execution, Reflection, Refinement, and Evolution logs.
The Engineer and Model layers will process that data to find relationships between those four personal aspects.
The idea is simple: each layer provides distinct, important information. By interconnecting them, new patterns emerge.
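As a rough illustration of how interconnecting the four logs might surface patterns, the sketch below aligns them on the days they share and computes a Pearson correlation for every pair. The log layout (one numeric score per day), the scores themselves, and the choice of correlation as the measure are all assumptions made for this example; the series has not yet specified how the relationships will be modeled.

```python
from itertools import combinations
from math import sqrt

# Hypothetical daily scores for each Personal System layer; made-up values.
LOGS = {
    "execution": {"day-1": 1, "day-2": 2, "day-3": 3, "day-4": 4},
    "reflection": {"day-1": 2, "day-2": 4, "day-3": 6, "day-4": 8},
    "refinement": {"day-1": 4, "day-2": 3, "day-3": 2, "day-4": 1},
    "evolution": {"day-1": 1, "day-2": 3, "day-3": 2},  # day-4 was not logged
}


def align(logs):
    """Keep only the days present in every log, in sorted order."""
    shared = sorted(set.intersection(*(set(values) for values in logs.values())))
    series = {name: [values[day] for day in shared] for name, values in logs.items()}
    return shared, series


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


shared_days, series = align(LOGS)
correlations = {
    (a, b): pearson(series[a], series[b]) for a, b in combinations(series, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: {r:.2f}")
```

With only a handful of days, these numbers say very little; the sketch is meant to show the shape of the step (align, then compare), not to suggest that a few data points reveal a real pattern.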
The project will not be restricted to one specific tech stack. In the beginning, we will explore multiple options, tapering down in the deployment phase to keep it feasible for most readers.
Conclusion
Thank you for reading. I hope you are excited to learn about data science concepts from the perspective of life experiences rather than just textbook definitions.
Moreover, I hope you are ready to build your own system — for yourself, by yourself. It is going to be a fun, interesting journey where you will learn more about yourself through the application of data science.
In the next article, I will cover the Ingest concepts, explain them through the system, and detail the setup and design for the project.
Until next time.