Data Science Fundamentals
Data Science Fundamentals is a collection of selected resources aimed at providing a solid background for aspiring and junior Data Scientists. The objective is to create a list as small and powerful as possible. There are thousands of sources out there and it is sometimes difficult to focus on what’s important.
I am afraid to say that there is no shortcut to becoming a professional Data Scientist. I don't believe in one-month bootcamps, and I find that most MS programs miss some key topics. They are usually too focused on technology and ML/DL algorithms, and they often neglect other important things, such as communicating results or giving a broad picture of the role of a data scientist in a project.
My main motivation is to create a comprehensive list of resources that will allow future Data Scientists to gain a deep knowledge of a few core competencies on which they can build their careers. Even if you already have a strong background in some of these competencies, you may still find other useful material on the list. The list is a living document: I want to keep it short, so I may replace a course if I find something better. The competencies themselves should remain mostly the same.
The core competencies covered are:
- Maths foundation: It is impossible to succeed in a Data Science project without a solid understanding of Probability, Statistics and Linear Algebra. Machine learning might be useful for solving certain problems, but statistics and algebra are what you will always be using as a fundamental and general framework. This is the most time-consuming competency to work on, but it will pay off.
- Communication skills: You won't be working in isolation, so you will need to show results, explain processes to non-technical audiences or write project proposals. Or you might just need a colleague to help you with something, and you will have to explain it as clearly as possible via email or instant message. Learn to talk. Learn to write.
- Data Science Workflow: Understand the iterative nature of the data science process and the importance of understanding business problems. Help businesses ask the right questions to solve a problem. Identify the role of data scientists in a project.
- Tools of the trade: You will need git, shell scripting and SQL. And a programming language. Remember that these are just tools to solve problems. Nothing else. I do not recommend learning lots of languages and tools. Stick to a shell, SQL and R/Python, and once you feel comfortable with those you can move on and learn something else.
- Business understanding: You will be working in a particular sector and you need to understand its context, as well as the different roles in a business. Also try to get an idea of why the project is important for the different departments of the company, and how it will impact customers and the Profit & Loss account.
- Ethics: With great power comes great responsibility. And although your power may be limited in scope at the beginning of your career, everyone working with data should be aware of issues such as data privacy, data ownership, anonymity, fairness and bias.
There are two levels:
- Content flagged as Foundation is essential for Data Scientists to master.
- Recommended items are highly useful resources but may not be essential for very junior Data Scientists.
|Competence|Content|Level|
|---|---|---|
|Maths|Statistics 101 Probability|Foundation|
|Maths|Time Series Analysis|Recommended|
|Maths|Machine Learning 101|Recommended|
|Communication|Communicate with impact|Foundation|
|Data Science Workflow|Good Data Analysis|Foundation|
|Data Science Workflow|The Data Science Process|Foundation|
|Data Science Workflow|A/B Testing|Recommended|
|Data Science Workflow|Causal Inference|Recommended|
|Data Science Workflow|The Ultimate Guide to Deploying ML Models|Recommended|
|Data Science Workflow|Rules of ML|Recommended|
|Tools of the Trade|SQL|Foundation|
|Tools of the Trade|Programming Language|Foundation|
|Tools of the Trade|Shell Script and others|Foundation|
|Tools of the Trade|Git|Foundation|
|Tools of the Trade|Introduction to Computer Science|Recommended|
|Business Understanding|Oh Oh|Foundation|
|Ethics|Data Science Ethics|Foundation|
📫 If you have any suggestions, do not hesitate to contact me on Twitter at @pelayoarbues.
I prefer learning by reading books, but I have tried to include video materials when there is a good alternative.
You might notice that some popular techniques, such as Deep Learning and NLP, are missing from this list. In my experience, these tools are not essential in a common project in a common company in a common industry. Besides, they are niche methods, and it is quite unlikely that a newcomer will be handed one of these cool projects while more senior Data Scientists are doing unglamorous work and are desperate to land a project in which to use any of them.