Data Science Fundamentals is a collection of selected resources aimed at providing a solid background for aspiring and junior Data Scientists. The objective is to create a list as small and powerful as possible. There are thousands of sources out there and it is sometimes difficult to focus on what’s important.

Warning

I am afraid to say that there is no shortcut to becoming a professional Data Scientist.

I don’t believe in one-month bootcamps and I find that most MS programs miss some key topics. They are usually too focused on technology and ML/DL algorithms, and often forget about other important things such as communicating results or providing a broad picture of the role of a data scientist in a project.

My main motivation is to create a comprehensive list of resources that will allow future Data Scientists to gain a deep knowledge of a few core competencies from which they can build their careers. If you already have a strong background in any of these competencies, you may still find other useful material on the list. The list is alive: I want to keep it short, so I may replace a course if I find something better. The competencies themselves should remain mostly the same.

The core competencies covered are:

  • Maths foundation: It is impossible to succeed in a Data Science project without a solid understanding of Probability, Statistics and Linear Algebra. Machine learning might be useful to solve certain problems, but what you will always be using is statistics and algebra as a fundamental and general framework. It is the most time-consuming competency to work on, but it will pay off.
  • Communication skills: You won’t be working in isolation so you will need to show results, explain processes to non-technical audiences or write project proposals. Or you might just need a colleague to help you with something and you will need to explain it as clearly as possible via email or instant message. Learn to talk. Learn to write.
  • Data Science Workflow: Understand the iterative process in data science. The importance of understanding business problems. Help businesses ask the right questions to solve a problem. Identify the role of data scientists in a project.
  • Tools of the trade: You will need git, shell scripting and SQL. And a programming language. Remember that these are just tools to solve problems. Nothing else. I do not recommend learning lots of languages and tools. Stick to a shell, SQL and R/Python, and once you feel comfortable with those you can move forward and learn something else.
  • Business understanding: You will be working in a particular sector, and you need to understand that context as well as the different roles in the business. Try to get an idea of why the project is important for the different departments of the company, and how it will affect customers and the profit and loss account.
  • Ethics: With great power comes great responsibility. Although your power may be limited in scope at the beginning of your career, everyone working with data should be aware of issues such as data privacy, data ownership, anonymity, fairness and bias.

Resources

There are two levels:

  • Foundation: items flagged this way are essential for Data Scientists to master.
  • Recommended: highly useful resources that may not be essential for very junior Data Scientists.

Competence             | Name                                     | Level
Maths                  | Linear Algebra                           | Foundation
Maths                  | Statistical Learning                     | Foundation
Maths                  | Statistics 101 Probability               | Foundation
Maths                  | Numerical Optimization                   | Recommended
Maths                  | Time Series Analysis                     | Recommended
Maths                  | Machine Learning 101                     | Recommended
Communication          | Communicate with impact                  | Foundation
Communication          | Technical Writing                        | Foundation
Communication          | Data Visualization                       | Foundation
Data Science Workflow  | Good Data Analysis                       | Foundation
Data Science Workflow  | The Data Science Process                 | Foundation
Data Science Workflow  | A/B Testing                              | Recommended
Data Science Workflow  | Causal Inference                         | Recommended
Data Science Workflow  | The Ultimate Guide to Deploying ML Models | Recommended
Data Science Workflow  | Rules of ML                              | Recommended
Tools of the Trade     | SQL                                      | Foundation
Tools of the Trade     | Programming Language                     | Foundation
Tools of the Trade     | Shell Script and others                  | Foundation
Tools of the Trade     | Git                                      | Foundation
Tools of the Trade     | Introduction to Computer Science         | Recommended
Business Understanding | Oh Oh                                    | Foundation
Ethics                 | Data Science Ethics                      | Foundation

📫 If you have any suggestions, do not hesitate to contact me via any of the social media accounts linked at the bottom.

DISCLAIMERS

I prefer to learn by reading books, but I have tried to include video materials where good alternatives exist.

You might notice that some popular techniques, such as Deep Learning and NLP, are missing from this list. In my experience, these tools are not essential in a common project in a common company in a common industry. They are niche methods, and it is quite unlikely that a newcomer will be handed one of these cool projects while more senior Data Scientists are usually doing unglamorous work and are eager to land a project in which to use any of them.