As a candidate, I’ve interviewed at a dozen big companies and startups. I’ve got offers for machine learning roles at companies including Google, NVIDIA, Snap, Netflix, Primer AI, and Snorkel AI. I’ve also been rejected at many other companies.
As an interviewer, I’ve been involved in designing and executing the hiring process at NVIDIA and Snorkel AI, doing everything from cold emailing candidates whose work I love, screening resumes, and conducting exploratory and technical interviews, to debating whether or not to hire a candidate and trying to convince candidates to choose us over competing offers.
As a friend and teacher, I’ve helped many friends and students prepare for their machine learning interviews at big companies and startups. I give them mock interviews and take notes of the process they went through as well as the questions they were asked.
I’ve also consulted several startups on their machine learning hiring pipelines. Hiring for machine learning roles turned out to be pretty difficult when you don’t already have a strong in-house machine learning team and process to help you evaluate candidates. As the use of machine learning in the industry is still pretty new, a lot of companies are still making it up as they go along, which doesn’t make it easier for candidates.
This book is the result of the collective wisdom of many people who have sat on both sides of the table and who have spent a lot of time thinking about the hiring process. It was written with candidates in mind, but hiring managers who saw the early drafts told me that they found it helpful to learn how other companies are hiring, and to rethink their own process.
The blog consists of two parts. The first part provides an overview of the machine learning interview process, what types of machine learning roles are available, what skills each role requires, what kinds of questions are often asked, and how to prepare for them. This part also explains the interviewers’ mindset and what kind of signals they look for.
The second part consists of over 200 knowledge questions, each noted with its level of difficulty — interviews for more senior roles should expect harder questions — that cover important concepts and common misconceptions in machine learning.
After you’ve finished this blog, you might want to check out the 30 open-ended questions that test your ability to put together what you know to solve practical challenges. These questions test your problem-solving skills as well as the extent of your experience in implementing and deploying machine learning models. Some companies call them machine learning systems design questions. Almost all companies I’ve talked to ask at least one question of this type in their interview process, and these are the questions that candidates often find the hardest.
“Machine learning systems design” is an intricate topic that merits its own book. To learn more about it, check out my course CS 329S: Machine learning systems design at Stanford.
This blog is not a replacement for machine learning textbooks, nor is it a shortcut to game the interviews. It’s a tool to consolidate your existing theoretical and practical knowledge in machine learning. The questions in this blog can also help you identify your blind/weak spots. Each topic is accompanied by resources that should help you strengthen your understanding of that topic.
Target audience
If you’ve picked up this blog because you’re interested in working with one of the key emerging technologies of the 2020s but aren’t sure where to start, you’re in the right place. Whether you want to become an ML engineer, a platform engineer, a research scientist, or you want to do ML but don’t yet know the differences among those titles, I hope that this blog will give you some useful pointers.
This blog focuses more on roles involving machine learning production than research, not because I believe production is more important. As fewer and fewer companies can afford to pursue pure research whereas more and more companies want to adopt machine learning, there will be, and already are, vastly more roles involving production than research.
This blog was written with two main groups of candidates in mind:
- Recent graduates looking for their first full-time jobs.
- Software engineers and data scientists who want to transition into machine learning.
I imagine the majority of readers of this blog come from a computer science background. The second part of the blog, where the questions are, is fairly technical. However, as machine learning finds its use in more industries — healthcare, farming, trucking, fashion, you name it — the field needs more people with diverse interests. If you’re interested in machine learning but hesitant to pursue it because you don’t have an engineering degree, I strongly encourage you to explore it. This blog, especially the first part, might address some of your needs. After all, I only took an interest in matrix manipulation after working as a writer for almost a decade.
About the questions
The questions in this blog were selected out of thousands of questions, most of which have been asked in actual interviews for machine learning roles. You will find several questions that are technically incorrect or ambiguous. This is on purpose. Sometimes, interviewers ask these questions to see whether candidates will correct them, point out the edge cases, or ask for clarification. For these questions, the accompanying hints should help clarify the ambiguity or technical incorrectness.
Machine learning is a tool, and to use any tool effectively, we should know how, why, and when to use it, on top of knowing what it is. Because the “what” questions can easily be found online, and anything that can be easily looked up isn’t worth testing for, this blog focuses on the “how”, “why”, and “when” questions. For example, instead of asking for the exact algorithm for K-means clustering, a question asks in what scenarios K-means doesn’t work. You don’t need to understand K-means to cite its definition, but you do need to understand it to know when not to use it.
Still, this blog contains a small number of “what” questions. While they aren’t good interview questions, they are good for interview preparation.
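As a concrete illustration of the kind of “when” question mentioned above, here is a minimal sketch, my own example rather than one of the blog’s questions, of a scenario where K-means struggles: clusters that aren’t roughly spherical. It uses scikit-learn’s toy make_moons dataset.

```python
# K-means assumes roughly spherical, similarly sized clusters, so it splits
# the two interleaving half-moons incorrectly; a density-based method does not.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

y_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
y_dbscan = DBSCAN(eps=0.2).fit_predict(X)

print("K-means ARI:", adjusted_rand_score(y_true, y_kmeans))  # typically low
print("DBSCAN ARI: ", adjusted_rand_score(y_true, y_dbscan))  # typically close to 1.0
```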
About the answers
I started the blog with the naive optimism that I’d write the answers for every question in it. It turned out that writing detailed answers to 300+ technical questions, while juggling a full-time job, a teaching gig, and a raging pandemic with a couple of family members in the hospital, is a lot of work.
Given the slow progress I’ve been making, I’ve decided that publishing the draft and then continuing to write and crowdsource the answers might be more productive. The first draft of the blog has answers for about 10% of the questions.
Hiring is a process, and questions aren’t evaluated in isolation. Your answer to each question is evaluated as part of your performance during the entire process. A candidate who claims to work with computer vision and fails to answer a question about techniques typically used for computer vision tasks is going to be evaluated differently from a candidate who doesn’t work with computer vision at all. Interviewers often care more about your approach than the actual objective correctness of your answers.
Gaming the interview process
People often ask me: “Don’t you worry that candidates will just memorize the answers in this blog and game the system?”
First, I don’t encourage interviewers to ask the exact questions in this blog, but I hope this blog provides a framework for interviewers to distinguish good questions from bad ones.
Second, there’s nothing wrong with memorizing something as long as that memorization is useful. The problem begins when memorization is impractical — candidates memorize something to pass the interviews and never use that knowledge again, or don’t know how to use it in real situations.
For this book, I aimed to include only concepts that I and many of my helpful colleagues deemed practical. For every concept, I ask: “Where in the real world is it used?” If I can’t find a good answer after extensive research, the concept is discarded. For example, while I chose to include questions about inner product and outer product, I left out cross product. You can see the list of discarded questions in the list of “Bad questions” on the book’s GitHub repository. This is far from a foolproof process. As the field expands, concepts that aren’t applicable now might be all that AI researchers ever talk about in 2030.
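For readers who want a quick refresher on the inner/outer product distinction mentioned above, here is a short NumPy sketch; it illustrates the two concepts and is not one of the blog’s questions.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

inner = np.inner(a, b)  # scalar: 1*4 + 2*5 + 3*6 = 32.0
outer = np.outer(a, b)  # 3x3 matrix with outer[i, j] = a[i] * b[j]

print(inner)        # 32.0
print(outer.shape)  # (3, 3)
```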
Interviews are stressful, even more so when they are for your dream job. As someone who has been in your shoes, and might again be in your shoes in the future, I just want to tell you that it doesn’t have to be so bad. Each interview is a learning experience. An offer is great, but a rejection isn’t necessarily a bad thing and is never the end of the world.
There are many random variables that influence the outcome of an interview: the questions asked, other candidates the interviewer has seen before you, after you, the interviewer’s expectation, even the interviewer’s mood. It is, in no way, a reflection of your ability or your self-worth.
I was pretty much rejected for every job I applied to when I began this process. Now I just get rejected at a less frequent rate. Keep on learning and improving. You’ve got this!
Part I. Overview
Chapter 1. Machine learning jobs
Before embarking on a journey to find a job, it might be helpful to know what types of jobs there are. These jobs vary wildly from company to company based on their focus, customer profile, and stage, so in the second part of this chapter, we will go over different types of companies.
1.1 Different machine learning roles
Some of the roles we’ll look into in this chapter:
- Machine learning engineer
- Data scientist
- ML/AI platform engineer
- ML/AI infrastructure engineer
- Framework engineer
- Solution architect
- Developer advocate
- Solutions engineer
- Applications engineer
- Applied research scientist
- Research engineer
- Research scientist
1.1.1 Working in research vs. working in production
I use research vs. production instead of academia vs. industry because even though academia is mostly concerned with research, research isn’t mostly done in academia. In fact, ML research nowadays is spearheaded by big corporations. See 1.1.2 Research for more details.
The first question you might want to figure out is whether you want to work in research or in production. They have very different job descriptions, requirements, hiring processes, and compensations.
The goal of research is to find the answers to fundamental questions and expand the body of theoretical knowledge. A research project usually involves using scientific methods to validate whether a hypothesis or a theory is true, without worrying about the practicality of the results.
The goal of production is to create or enhance a product. A product can be a good (e.g. a car), a service (e.g. a ride-sharing service), a process (e.g. detecting whether a transaction is fraudulent), or a business insight (e.g. “to maximize profit, we should increase our price by 10%”).
A research project doesn’t need users, but a product does. For a product to be useful, it has many requirements beyond just performance, such as inference latency, interpretability (both to users and to developers), fairness (to all subgroups of users), and adaptability to changing environments. The majority of a production team’s job might be ensuring that these other requirements are met.
The definitions above are, of course, handwavy at best. What counts as research and what counts as production in machine learning remains a heated topic of debate as of 2021[3]. One reason for the ambiguity is that novel ideas with obvious usefulness tend to attract more researchers, and solving practical problems often requires coming up with novel ideas.
For more differences between machine learning in research and in production, see Stanford’s CS 329S, lecture 1: Understanding machine learning production.
As a candidate, if you’re unfamiliar with both and not sure whether you want to find roles in research or in production, the latter might be the smoother path. There are many more roles involving production than roles involving research.
3: One example is the argument whether GPT-3 is research. Many researchers were upset when Language Models are Few-Shot Learners (OpenAI, 2020) was awarded the best paper at NeurIPS because they didn’t consider it research.
1.1.2 Research
As the research community takes the “bigger, better” approach, new models often require a massive amount of data and tens of millions of dollars in compute. The estimated market cost to train DeepMind’s AlphaStar and OpenAI’s GPT-3 is in the tens of millions of dollars each[4][5]. Most companies and academic institutions can’t afford to pursue pure research.
Outside academic institutions, there are only a handful of machine learning research labs in the world. Most of these labs are funded by corporations with deep pockets such as Alphabet (Google Brain, DeepMind), Microsoft, Facebook, and Tencent[6]. You can find these labs by browsing the affiliations listed on papers published at major academic conferences, including NeurIPS, ICLR, ICML, CVPR, and ACL. In 2019 and 2020, Alphabet accounted for over 10% of the papers published at these conferences.
Tip Not all these industry labs publish papers — companies like Apple and Tesla are notoriously secretive. Even if an industry lab publishes, it might only publish a portion of its research. Before joining an industry lab, you might want to consider its publishing policy. Joining a secretive lab might mean that you won’t be able to explain to other people what you’ve been working on or what you’re capable of doing.
4: State of AI Report 2019 by Nathan Benaich and Ian Hogarth.
5: OpenAI’s massive GPT-3 model is impressive, but size isn’t everything by VentureBeat.
6: In an earlier draft of this book, I included Uber AI and Element AI. However, Uber AI research lab was laid off in April 2020, and Element AI was sold for cheap in November 2020.
1.1.2.1 Research vs. applied research
At some companies, you might encounter roles involving applied research. Applied research is somewhere between research and production, but much closer to research than production. Applied research involves finding solutions to practical problems, but doesn’t involve implementing those solutions in actual production environments.
Applied researchers are researchers. They come up with novel hypotheses and validate them. However, since their hypotheses deal with practical problems, they need to understand those problems as well. In industry lingo, they need to have subject matter expertise.
In machine learning, an example of a research project would be to develop an unsupervised transfer learning method for computer vision and experiment with it on a standard academic dataset. An example of an applied research project would be to develop techniques to make that new method work on a real-world problem in a specific industry, e.g. healthcare. People working on this applied research project will, therefore, need expertise in both machine learning and healthcare.
1.1.2.2 Research scientist vs. research engineer
There’s much confusion about the role of a research engineer. This is a rare role, often seen at major research labs in the industry. Loosely speaking, if the role of a research scientist is to come up with original ideas, the role of a research engineer is to use their engineering skills to set up and run experiments for these ideas. The research scientist role typically requires a Ph.D. and/or first author papers at top-tier conferences. The research engineer role doesn’t, though publishing papers always helps.
For some teams, there’s no difference between a research scientist and a research engineer. Research scientists should, first and foremost, be engineers. Both research scientists and research engineers come up with ideas and implement those ideas. A researcher might also act as an advisor guiding research engineers in their own research. It’s not uncommon to see research scientists and research engineers be equal contributors to papers[7]. The different job titles are mainly a product of bureaucracy: research scientists are supposed to have bigger academic clout and are often better paid than research engineers.
Startups, to attract talent, might be more generous with job titles. A candidate told me he chose a startup over a FAAAM company because the startup gave him the title of research scientist, while the big company offered him the title of research engineer.
Akihiro Matsukawa gave an interesting perspective on the difference between the research scientist and the research engineer with his post: Research Engineering FAQs.
7: Notable examples include “Attention Is All You Need” from Google and “Language Models are Unsupervised Multitask Learners” from OpenAI.
1.1.3 Production
As machine learning finds increasing use in virtually every industry, there’s a growing need for people who can bring machine learning models into production. In this section, we will first cover the production cycle for machine learning and the skills needed for each step, then the distinctions among several roles that often confuse the candidates I’ve talked to.
1.1.3.1 Production cycle
To understand different roles involving machine learning in production, let’s first explore different steps in a production cycle. There are six major steps in a production cycle.
⚠ On the main skills listed at each step ⚠ The main skills listed at each step below will upset many people, as any attempt to simplify a complex, nuanced topic into a few sentences would. This portion should only be used as a reference to get a sense of the skill sets needed for different ML-related jobs.
- Project scoping. A project starts with scoping the project: laying out goals and objectives, constraints, and evaluation criteria. Stakeholders should be identified and involved. Resources should be estimated and allocated. Main skills needed: product management, subject matter expertise to understand the problem, and some ML knowledge to know what ML can and can’t solve.
- Data management. Data used and generated by ML systems can be large and diverse, which requires scalable infrastructure to process and access it quickly and reliably. Data management covers data sources, data formats, data processing, data control, data storage, etc. Main skills needed: databases/query engines to store, retrieve, and process data, and systems engineering to implement distributed systems that process large amounts of data. Minimal ML knowledge, enough to organize data for ML access patterns, would be helpful but isn’t required.
- ML model development. From raw data, you need to create training datasets and possibly label them, then generate features, train models, optimize them, and evaluate them. This is the stage most often covered in ML courses (a minimal sketch of this step appears after this list). Main skills needed: this is the part of the process that requires the most ML knowledge, along with statistics and probability to understand the data and evaluate models. Since feature engineering and model development require writing code, this part also needs coding skills, especially in algorithms and data structures.
- Deployment. After a model is developed, it needs to be made accessible to users. Main skills needed: bringing an ML model to users is largely an infrastructure problem: how to set up your infrastructure, or help your customers set up theirs, to run your ML application. These applications are often data-, memory-, and compute-intensive. Deployment might also require ML knowledge to compress models and optimize inference latency, unless you can push these tasks to the previous step of the process.
- Monitoring and maintenance. Once in production, models need to be monitored for performance decay and maintained/updated to adapt to changing environments and changing requirements. Main skills needed: monitoring and maintenance is also an infrastructure problem that requires computer systems knowledge. Monitoring often requires generating and tracking a large amount of system-generated data (e.g. logs), and managing this data requires an understanding of the data pipeline.
- Business analysis. Model performance needs to be evaluated against business goals and analyzed to generate business insights. These insights can then be used to eliminate unproductive projects or to scope out new ones. Main skills needed: this part of the process requires ML knowledge to interpret a model’s outputs and behavior, in-depth statistics and probability knowledge to extract insights from data, as well as subject matter expertise to map these insights back to the practical problems the ML models are supposed to solve.
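The following is a minimal sketch of the ML model development step described above, assuming a synthetic tabular dataset and scikit-learn purely for illustration; a real project would start from the data prepared in the data management step.

```python
# Create a training set, train a model, and evaluate it: the core loop of the
# "ML model development" step. The dataset here is synthetic for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
model.fit(X_train, y_train)

# The evaluation result feeds back into error analysis and, later, business analysis.
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```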

Skill annotation
- Systems: systems engineering, e.g. building distributed systems, container deployment.
- Databases: data management, storage, processing, databases, query engines. This is closely related to Systems, since you might need to build distributed systems to process large amounts of data.
- ML: linear algebra, ML algorithms, etc.
- Algo: algorithmic coding.
- Stats: probability, statistics.
- SME: subject matter expertise.
- Prod: product management.
The most successful approach to ML production I’ve seen in the industry is iterative and incremental development. It means that you can’t really be done with a step, move to the next, and never come back to it again. There’s a lot of back and forth among various steps.
Here is one common workflow that you might encounter when building an ML model to predict whether an ad should be shown when users enter a search query[8].
8: Praying and crying not featured but present through the entire process.
- Choose a metric to optimize. For example, you might want to optimize for impressions, the number of times an ad is shown.
- Collect data and obtain labels.
- Engineer features.
- Train models.
- During error analysis, you realize that errors are caused by wrong labels, so you relabel the data.
- Train the model again.
- During error analysis, you realize that your model always predicts that an ad shouldn’t be shown, and the reason is that 99.99% of your labels say “don’t show,” so you have to address this class imbalance (a small sketch of one common fix appears after this list).
- Train the model again.
- The model performs well on your existing test data, which is by now two months old, but poorly on the test data from yesterday. Your model has degraded, so you need to collect more recent data.
- Train the model again.
- Deploy the model.
- The model seems to be performing well, but then the business people come knocking on your door asking why revenue is decreasing. It turns out the ads are being shown but few people click on them. So you change your model to optimize for clickthrough rate instead.
- Start over.
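Here is a small sketch of one common fix for the class imbalance encountered above, reweighting the rare class, using scikit-learn on synthetic data; the numbers and the approach are illustrative, not a prescription.

```python
# With heavily imbalanced labels, a naive classifier tends to ignore the rare
# class; class weighting is one simple way to recover recall on it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=100_000, n_features=20, weights=[0.999, 0.001], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

naive = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1_000, class_weight="balanced").fit(X_tr, y_tr)

print("naive recall on rare class:   ", recall_score(y_te, naive.predict(X_te)))
print("weighted recall on rare class:", recall_score(y_te, weighted.predict(X_te)))
```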
There are many people who will work on an ML project in production — ML engineers, data scientists, DevOps engineers, subject matter experts (SMEs). They might come from very different backgrounds, with very different languages and tools, and they should all be able to work on the system productively. Cross-functional communication and collaboration are crucial.
Tip As a candidate, understanding this production cycle is important. First, it gives you an idea of what work needs to be done to bring a model to the real world and the possible roles available. Second, it helps you avoid ML projects that are bound to fail when the organizations behind them don’t set them up in a way that allows iterative development and cross-functional communication.
1.1.3.2 Machine learning engineer vs. software engineer
ML engineering is considered a subfield of software engineering. In most organizations, the hiring process for MLEs is spun out of their existing SWE hiring process. Some organizations might swap out a few SWE questions for ML-specific questions. Others just add an interview specifically focused on ML on top of their existing SWE interview process, making their MLE process a bit longer than their SWE process.
Overall, MLE candidates are expected to know how to code and be familiar with software engineering tools. Many traditional SWE tools can be used to develop and deploy ML applications.
In the early days of ML adoption, when companies had little understanding of what ML production entailed, many expected MLE candidates to be both stellar software engineers and stellar ML researchers. However, finding candidates fitting that profile turned out to be difficult, and many companies have since relaxed their ML criteria. In fact, several hiring managers have told me that they’d rather hire people who are great engineers but don’t know much ML, because it’s easier for great engineers to pick up ML than for ML experts to pick up good engineering practices.
Tip If you’re a candidate trying to decide between software engineering and ML, choose engineering.
1.1.3.3 Machine learning engineer vs. data scientist
ML engineers might spend most of their time wrangling and understanding data. This leads to the question: how is a data scientist different from an ML engineer?
There are three reasons for the significant overlap between the role of a data scientist and the role of an ML engineer.
First, according to Wikipedia, “data science is a multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.” Since machine learning models learn from data, machine learning is part of data science.
Second, traditionally, companies have data science teams to generate business insights from data. When interests in ML were revived in the early 2010s, companies started looking into using ML. Before making a significant investment in starting full-fledged ML teams, companies might want to start with small ML projects to see if ML can add value. A natural candidate for this exploration is the team that is already working with the data: the data science team.
Third, many tasks of the data science teams, including demand forecasting, can be done using ML models. This is also how most data scientists transition into ML roles.
However, there are many differences between ML engineering and data science. The goal of data science is to generate business insights, whereas the goal of ML engineering is to turn data into products. This means that data scientists tend to be better statisticians, and ML engineers tend to be better engineers. ML engineers definitely need to know ML algorithms, whereas many data scientists can do their jobs without ever touching ML.
As a company’s adoption of ML matures, it might want to have a specialized ML engineering team. However, with an increasing number of prebuilt and pretrained models that can work off-the-shelf, it’s possible that developing ML models will require less ML knowledge, and ML engineering and data science will be even more unified.

1.1.3.4 Other technical roles in ML production
There are many other technical roles in the ML ecosystem. Many of them don’t require ML knowledge at all, e.g. you can build tools and infrastructures for ML without having to know what a neural network is (though knowing might help with your job). Examples include framework engineers at NVIDIA who work on CUDA optimization, people who work on TensorFlow at Google, or AWS platform engineers at Amazon.
ML infrastructure engineer, ML platform engineer
Because ML is resource-intensive, it relies on infrastructures that scale. Companies with mature ML pipelines often have infrastructure teams to help them build out the infrastructure for ML. Valuable skills for ML infrastructure/platform engineers include familiarity with parallelism, distributed computing, and low-level optimization.
These skills are hard to learn and take time to master, so companies prefer hiring engineers who are already skilled in them and training them in ML. If you are a rare breed who knows both systems and ML, you’ll be in demand.
ML accelerator/hardware engineer
Hardware is a major bottleneck for ML. Many ML algorithms are constrained by processors that can’t do computation fast enough, don’t have enough memory to store and load data and models, aren’t cheap enough to run experiments at scale, or don’t have enough power to run applications on device.
There has been an explosion of companies that focus on building hardware for both training and serving ML models, and for both cloud computing and edge computing. These hardware companies need people with ML expertise to guide their processor development process: to decide what ML models to focus on, then implement, optimize, and benchmark these models on their hardware. More and more hardware companies are also looking into using ML algorithms to improve their chip design process[11][12].
ML solutions architect
This role is often seen at companies that provide services and/or products to other companies that use ML. Because each company has its own use cases and unique requirements, this role involves working with existing or potential customers to figure out whether your service and/or product can help with their use case and if yes, how.
Developer advocate, developer programs engineer
You might have seen developer relations (devrel) engineer roles such as developer advocate and developer programs engineer for ML. These roles bridge communication between the people who build ML products and the developers who use them. The exact responsibilities vary from company to company and role to role, but you can expect them to be some combination of being the first users of ML products, writing tutorials, giving talks, and collecting and addressing feedback from the community. Products like TensorFlow and AWS owe part of their popularity to the tireless work of their excellent devrel engineers.
Previously, these roles were only seen at major companies. However, as many machine learning startups now follow the open-core business model (open-sourcing the core or a feature-limited version of their products while offering commercial versions as proprietary software), these startups need to build and maintain good relationships with the developer community, and devrel roles are crucial to their success. These roles are usually very hard to fill, as it’s rare to find great engineers who also have great communication skills. If you’re an engineer interested in more interaction with the community, you might want to consider this option.
11: Chip Design with Deep Reinforcement Learning (Google AI blog, 2020)
12: Accelerating Chip Design with Machine Learning (NVIDIA research blog, 2020)
1.1.3.5 Understanding roles and titles
While role definitions are useful for career orientation, a role definition poorly reflects what you actually do on the job. Two people with the same title on the same team might do very different things. Two people doing similar things at different companies might have different titles. In 2018, Lyft decided to rename its “Data Analyst” role to “Data Scientist”, and “Data Scientist” to “Research Scientist”[13], a move likely motivated by the job market’s demands, which shows how interchangeable these titles can be.
Tip When unsure what the job entails, ask. Here are some questions that might help you understand the scope of a role you’re applying for.
- How much of the job involves developing ML models?
- How much of the job involves data exploration and data wrangling? What are the characteristics of the data you’d have to work with, e.g. size, format?
- How much of the job involves DevOps?
- Does the job involve working with clients/customers? If yes, what kind of clients/customers? How many would you need to talk to? How often?
- Does the job involve reading and/or writing research papers?
- What are some of the tools that the team can’t work without?
In this book, I use the term “machine learning engineer” as an umbrella term to include research engineer, devrel engineer, framework engineer, data scientist, and the generic ML engineer.
Resources
- What machine learning role is right for you? by Josh Tobin, Full Stack Deep Learning Bootcamp 2019.
- Data science is different now by Vicki Boykis, 2019.
- The two sides of Getting a Job as a Data Scientist by Favio Vázquez, 2018.
- Goals and different roles in the Data Science platform at Netflix by Julie Pitt, live doc.
- Unpopular Opinion – Data Scientists Should Be More End-to-End by Eugene Yan, 2020.
13: What’s in a name? The semantics of Science at Lyft by Nicholas Chamandy (Lyft Engineering blog, 2018)
1.2 Types of companies
Different types of companies offer different roles, require different skills, and need to be evaluated using different criteria.
1.2.1 Applications companies vs. tooling companies
Personal story
After NVIDIA, I wanted to join an early-stage startup. I was considering two AI startups that were similar on the surface. Both had just raised seed rounds. Both had about 10 employees, most of them engineers, and both were ready for hypergrowth.
Startup A already had three customers, had just hired its first salesperson, and was ready to hire more salespeople to aggressively sell. Startup B had two customers, who they called design partners, and had no plan yet of hiring salespeople. I liked the work at startup B more but thought that startup A had a better sales prospect than startup B, and for early-stage startups, sales are essential for survival.
When I told this dilemma to a friend, who had invested in companies similar to both A and B, he pointed out to me that I forgot to consider the key difference between these two startups: A was an applications company while B was a tooling company.
Applications companies provide applications to solve a specific business problem such as business analytics or fraud detection. Tooling companies create tools to help other companies build their own applications. Examples of tools are TensorFlow, Spark, Airflow, and Kubernetes.
In industry lingo, an application is said to target a vertical, whereas a tool targets a horizontal.

Applications are typically made to be used by subject matter experts who don’t want to be encumbered by engineering details (e.g. bankers who want applications for fraud detection, or customer representatives who want applications to classify customer tickets). Tools are typically made to be used by developers. Some are explicitly known as devtools.
In the beginning, it’s easier to sell an application, because it’s easier to see an application’s immediate impact and there’s less overhead in adopting one. For example, you can tell a company that your application can detect 10% more fraudulent transactions than their existing solution, which translates directly into money saved.
For a company to adopt a tool, there’s a lot of engineering overhead. They might have to swap out their existing tools, integrate the new tool with the rest of their infrastructure, and retrain or replace their staff. Many companies want to wait until a tool has proven its usefulness and stability to a large number of other companies before adopting it.
However, for tooling companies, selling becomes a lot easier later on. Once your tool has reached critical mass, with a sufficient number of engineers who are proficient in it and prefer it, other companies might become users without you having to sell to them. However, it’s really, really hard for a new tool to reach that critical mass, and therefore, in general, tooling companies carry higher risk than applications companies.
After talking to my friend, I realized that it’s normal for a company like A to have more customers than a company like B early on. But it doesn’t mean that A has a better sales prospect. In fact, having two large companies as design partners is a really good sign for B.
A year later, both companies had acquired a similar number of customers and grown to around 30 employees, but more than half of company A was in sales, whereas 80% of company B were engineers.
This new understanding helped me narrow down my choices. Because I preferred building tools for developers and wanted to work for an engineering-first instead of a sales-first organization, the decision became much easier.
⚠ Ambiguity ⚠ Whether a company is an application company or a tooling company might just be a go-to-market strategy. For example, you have a new tool that can address use cases that companies aren’t aware of yet, and you know it’d be hard to convince companies to make significant changes to their existing infrastructures for uncertain use cases. So you come up with a compelling use case that can’t be done without your tool, build an application around that use case, and sell that application instead. Once customers are aware of the usefulness of your tool, you switch to selling the tool directly.
Tip If you’re unsure whether the role involves working with an application or a tool, here are a few questions you may ask.
- Who are the main users of your product?
- What are the use cases you’re targeting?
- How many people does your company have? How many are engineers? How many are in sales?
1.2.2 Enterprise vs. consumer products
Another important distinction is between companies that build enterprise products (B2B, business to business) and companies that build consumer products (B2C, business to consumer).
B2B companies build products for organizations. Examples of enterprise products are customer relationship management (CRM) software, project management tools, database management systems, cloud hosting services, etc.
B2C companies build products for individuals. Examples of consumer products are social networks, search engines, ride-sharing services, health trackers, etc.
Many companies do both — their products can be used by individuals but they also offer plans for enterprise users. For example, Google Drive can be used by anyone but they also have Google Drive for Enterprise.
Even if a B2C company doesn’t create products for enterprises directly, it might still need to sell to enterprises. For example, Facebook’s main product is used by individuals, but the company sells ads to enterprises. Some might argue that this makes Facebook users the product, as famously quipped: “If you’re not paying for it, you’re not the customer; you’re the product being sold.”[14]
These two types of companies have different sales strategies and engineering requirements. Consumer products tend to rely on viral marketing (e.g. invite your friends and get your next order for free) to reach a large number of users. Selling enterprise products tends to require selling to each user separately.
Enterprise companies usually have the role of solutions architect and its variants (solutions engineer, enterprise architect) to work with enterprise customers to figure out how to use the tool for their use cases.
Tip Since these two types of companies have different business models, they need to be evaluated differently when you consider joining them. For enterprise products, you might want to ask:
- How many customers do they have? What’s the customer growth rate (e.g. do they sign on a customer every month)?
- How long is their sales cycle (e.g. how long it usually takes them from talking to a potential customer to closing the contract)?
- How does their pricing structure work?
- How hard is it to integrate their product with their customers’ systems?
For consumer products, you might want to ask:
- How many active users do they have? What’s their user growth rate?
- How much does it cost to acquire a user? This is extremely important, since the cost of user acquisition has been called a startup killer[15].
- Do users pay to use the product? If not, how are they going to make money?
- What privacy measures do they take when handling users’ data? E.g. you don’t want to work for the next Cambridge Analytica.
1.2.3 Startups or big companies
This is a question that I often hear from people early in their careers, and a question that can prompt heated discussions. I’ve worked at both big companies and startups, and my impressions were pretty much aligned with what is often said of the trade-off between big company stability and startup high impact (and high risk).
Statistically speaking, software engineers are more likely to work for a big company than a small startup. Even though there are more small companies than large corporations, large corporations employ more people. According to the Stack Overflow Developer Survey 2019, more than half of the 71K respondents worked for a company of at least 100 employees.

I couldn’t find a survey for ML specific roles, so I asked on Twitter and found similar results. This means that an average MLE most likely works for a company of at least 100 employees.
Tip A piece of advice I often hear and read in career books is that one should join a big company after graduation. The reasons given are:
- Big companies give you brand names you can put on your resume and benefit from for the rest of your life.
- Big companies often have standardized tech stack and good engineering practices.
- Major tech companies offer good compensation packages. Working for them even briefly will allow you to save money to pursue riskier ventures in the future.
- It’s good to learn how a big company operates since the small company that you later join or start might eventually grow big.
- You’ll know what it’s like to work for a big company so you’ll never again have to wonder.
- Most startups are bad startups. Working at a big company for a while will better equip you with technical skills and experience to differentiate a good startup from a bad one.
- For immigrants, big companies might be the only option, since small companies can’t afford to sponsor visas.
If you want to maximize your future career options, spending a year, or even just an internship, at a big company is not a bad strategy. Whether you choose to join a startup or a big company, I hope that you get a chance to experience both environments and learn the very different sets of skills that they teach. You don’t know what you like until you’ve tried it. You might join a big company and realize you never want to join another big company again, or you might join a startup and realize you can’t live without the stability big companies give you. And if you believe that you’re offered a once-in-a-lifetime opportunity, take it, whether it’s at a big company or a startup.
Personal story After graduation, I joined NVIDIA, not because it was a big company, but because I was excited about the opportunity to be part of a brand new team working on challenging projects. Looking back, I realized the brand name of NVIDIA helped my work to be taken seriously. Being an unknown employee at an unknown company would have thrown me even further into obscurity. I stayed at NVIDIA for a year and a half then joined a startup. I wanted a fast-moving environment with a steep learning curve, and I wasn’t disappointed.

Tip for engineers early in your careers: know what you’re optimizing for. With each career decision, be mindful of what you’re optimizing for so that you can get closer to your eventual goal. Some of the goals you can optimize for are:
- Money now: some people need or want immediate money, e.g. to pay off debts or to prepare for an economic downturn that they believe will happen in the near future. They might interview with multiple companies and go for the highest bidder. There’s nothing wrong with that.
- Money in the future: some are more concerned with being able to make a lot of money in the future. They might choose to pursue a Ph.D. that pays next to nothing but will help them get a highly paid job later.
- Impact: some focus on making an impact. You might work for a startup that allows you to make decisions that affect millions of users or work for a non-profit organization that changes people’s lives.
- Experience diversity: the most interesting people I’ve met optimize for new experiences. They choose jobs that allow them to do things that they’ve never done before.
- Brand name recognition: it’s not a bad strategy to choose to work for the most well-known company or person in your field. This brand can open many doors for you down the line.
- Personal growth: those who optimize for this choose the job that allows them to learn the most, which in turn maximizes their career options. They might choose a job because it offers mentorship or allows them to work on new, challenging tasks.
You can optimize for different things at different stages in your life, but you can only optimize for one thing at a time. You might optimize for new experiences when you’re younger, money and recognition when you start having more responsibilities, then impact when you’ve had enough money to not have to worry about it. If you don’t know what you’re optimizing for, optimize for personal growth. Get a skill set that maximizes your options in the future[14].
Resources
Twitter thread: Advice for people who want to leave a big company to join a startup by Jensen Harris.
How to get rich in tech, guaranteed (Startups and Shit, 2016).
Twitter thread: Joining a startup is not a get-rich-quick scheme by me (shameless plug)
Chapter 2. Machine learning interview process
2.1 Understanding the interviewers’ mindset
To understand the interview process, it’s important to see it from the employers’ perspective. It’s not only candidates who hate the hiring process; employers hate it too. It’s expensive for companies, stressful for hiring managers, and boring for interviewers.
While a handful of well-known organizations are swamped with resumes, lesser-known companies struggle to attract talent. I keep hearing from small companies that it’s nearly impossible to compete with offers made by tech giants. After weeks of pulling out all the stops to court a candidate, the company makes an offer only to find out that a FAAAM company has outbid them. Companies often contract talent agencies that might charge 20-30% of the hire’s first-year salary.
The competition for talent is especially brutal in Silicon Valley, where the high number of companies per capita tilts the odds in candidates’ favor. Recruiters, even those from companies that receive millions of resumes every year like Google[17], aggressively court potential candidates even when those candidates aren’t looking. The majority of people who took a new job in 2018 weren’t searching for one[18].
Some candidates express a mild annoyance at recruiters’ unsolicited contact. This attitude is often misguided because recruiters are your biggest ally: they work to get you hired. As a candidate, you want to have enough visibility so that recruiters reach out to you.
Every company says that they want to hire the best people. That’s not true. Companies want to hire the best people who can do a reasonable job within their time and monetary constraints. If it takes a month and $10K to find candidate A, who can do 93% of what an ideal hire could, but far more time and money to find someone better, many companies will go with candidate A.
You’d think that when companies hire, they know exactly what they want their new hires to do. Unless it’s an established team with routine tasks, hiring managers can seldom predict with perfect clarity what tasks need to get done or what skills are needed. Sometimes, companies can’t even be sure that they’ll need that person. Sam Altman, chairman of the startup accelerator Y Combinator and co-chairman of OpenAI, advises companies that, in the beginning, “you should only hire when you desperately need to.”
However, because hiring is so competitive and time-consuming, companies can’t afford to wait until they’re desperate. A desperate hire is likely to be a bad one. Sarah Catanzaro, a partner focusing on AI at Amplify Partners, advises her portfolio companies to start hiring when they’re only 50% sure they’ll need the role filled.
Imagine a startup that has just raised several million dollars and decided that they want to turn their logs into useful features. They think ML can help them, but don’t know how it’d be done. When the recruiter asks them for a job description, they whip up a list of generic ML-related skills and experiences they think might be necessary. Requirements such as “5 years of experience” or “degrees in related fields” are arbitrary and might prevent them from hiring the right candidate.
Tip Job descriptions are for reference. Apply for jobs you like even if you don’t have all the skills and experiences in the job descriptions. Chances are you don’t need them for those jobs.
Engineers start interviewing candidates after one to six months at a new company. New hires begin by shadowing more senior interviewers for a few interviews before doing it on their own, and that’s often all the training they get. Interviewers might have been in your shoes just months ago, and like you, they don’t know everything. Even after years of conducting interviews, I still worry that I’ll make a fool of myself in front of candidates and give them a bad impression of my company.
This lack of training means that even within the same company, interviewers may have different interviewing techniques and different ideas of what a good interview looks like. Rubrics to grade candidates, if they exist at all, are qualitative rather than quantitative (e.g. “candidate shows good debugging skills”).
Hiring managers also aggregate feedback from interviewers differently. Some hiring managers rely on the average feedback from all interviewers. Some rely on the best feedback — they’d prefer a candidate that at least one interviewer is really excited to work with to someone whose general feedback is good but no one is crazy about. Google is an example of a company that values enthusiastic endorsements over uniformly lukewarm reviews.
Some companies, in their aggressive expansion, might hire anyone as long as there’s no reason not to hire them. Other companies might only hire someone if there’s a great reason to hire them.
If you think one interview goes poorly, don’t despair. There are many random variables other than your actual performance that influence the outcome of an interview: the questions asked, other candidates the interviewer has seen before you, after you, the interviewer’s expectation, even the interviewer’s mood. It is, in no way, a reflection of your ability or your self-worth. Companies know that too, and it’s a common practice for companies to invite rejected candidates to interview again after a year or so.
17: Google Automatically Rejects Most Resumes for Common Mistakes You’ve Probably Made Too (Inc., 2018).
18: Your Approach to Hiring Is All Wrong (Peter Cappelli, Harvard Business Review, 2019)
2.1.1 What companies want from candidates
The goal of the interviewing process is for a company to assess:
- whether you have the necessary skills and knowledge to do the job, and
- whether they can provide you with a suitable environment to carry out that task.
Companies will be looking at both your technical and non-technical skills.
2.1.1.1 Technical skills
- Software engineering. As ML models often require extensive engineering to train and deploy, it’s important to have a good understanding of engineering principles. Aspects of computer science that are most relevant to ML include algorithms, data structures, time/space complexity, and scalability. You should be comfortable with the usual suspects: Python, Jupyter Notebook or Google Colab, NumPy, scikit-learn[19], and a deep learning framework. Knowing at least one performance-oriented language such as C++ or Go can come in handy. BestPracticer has an interesting list of the engineering skills expected at different levels.
- Data cleaning, analytics, and visualization. Data handling is important yet often overlooked in ML education. It’s a huge bonus when a candidate knows how to collect, explore, and clean data, as well as how to create training datasets (a small sketch of routine data wrangling follows this list). You should be comfortable with dataframe manipulation (pandas, dask) and data visualization (seaborn, altair, matplotlib, etc.). SQL is popular for relational databases and R for data analysis. Familiarity with distributed toolkits like Spark and Hadoop is also very useful.
- Machine learning knowledge. You should understand ML beyond citing buzzwords. Ideally, you should be able to explain every architectural choice you make. You might not need this understanding if all you do is clone an existing open-source implementation and it runs flawlessly on your data. But models seldom run flawlessly, so you need this understanding to evaluate potential solutions and debug your models.
- Domain-specific knowledge. You should have knowledge relevant to the products of the company you’re interviewing with. If it’s in the autonomous vehicle space, you’re probably expected to know computer vision techniques as well as computer vision tasks such as object detection, image segmentation, and motion analysis. If the company builds speech recognition systems, you should know about mel-filterbank features, CTC loss, and common benchmark datasets for speech recognition.
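As a rough illustration of the data-handling comfort described above, here is a small sketch of routine pandas wrangling; the toy data and column names are invented for the example.

```python
import pandas as pd

# Invented toy data; the point is the routine cleaning operations, not the dataset.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, None],
    "country": ["US", "VN", "VN", "US", "US"],
    "spend": [10.0, None, 5.0, 7.5, 3.0],
})

clean = (
    df.dropna(subset=["user_id"])  # drop rows missing the key
      .drop_duplicates()           # remove exact duplicate rows
      .assign(spend=lambda d: d["spend"].fillna(0.0))  # impute missing spend
)

# Quick exploratory aggregate: average spend per country.
print(clean.groupby("country")["spend"].mean())
```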
2.1.1.2 Non-technical skills
- Analytical thinking, or the ability to solve problems effectively. This involves a step-by-step approach to break complex problems down into manageable components. You might not immediately know how to solve a problem, especially if it’s something you’ve never encountered before, but you should know how to approach it systematically. When hiring for junior roles, employers might value this skill more than anything else. You can teach someone Python in a few weeks, but it takes years to teach someone how to think.
- Communication skills. Real-world ML projects involve many stakeholders from different backgrounds: ML engineers, DevOps engineers, subject matter experts (e.g. doctors, bankers, lawyers), product managers, and business leaders. It’s important to be able to communicate the technical aspects of your ML models to people who are involved in the development process but don’t necessarily have technical backgrounds. It’s hard to work with someone who can’t explain what they are doing. If you have a brilliant idea but nobody understands it, it’s not brilliant. Keep in mind that there’s a huge difference between fundamentally complex ideas and ideas made complicated by the author’s inability to articulate them.
- Experience. Whether you have completed similar tasks in the past and whether you can generalize from those experiences to future tasks. The tech industry is notorious for downplaying experience in favor of ready-to-burn-out twentysomethings. However, in ML, where improvements are often made from empirical observations, experience makes all the difference between having a model that performs well on a benchmark dataset and making it work in real time on real-world data. Experience is different from seniority: there are complacent engineers who’ve worked for decades with less experience than an inquisitive college student.
- Leadership. In this context, leadership means the ability to take initiative and complete tasks. If you’re assigned a task, will you be able to see it through from start to finish without someone holding your hand? You don’t need to know how to do every component on your own, but you should know what help you need and be proactive in seeking it. This quality can be evaluated based on your past projects. In school or in your previous jobs, did you only do what you were told, or did you seize opportunities and take initiative?
The skillset required varies from role to role. See section 1.1 Different machine learning roles for differences among roles.
2.1.1.3 What exactly is culture fit?
There’s one thing companies look for that isn’t a skill and is sometimes a source of contention: culture fit. Some even argue that it’s a proxy for creating an exclusive culture, where managers hire only people who look like them, talk like them, and come from the same background as them.
Some companies have swapped the term “culture fit” for terms like “value alignment”. A fit should be about shared values, not lifestyle, e.g. whether you value constructive criticism, not whether you go out to drink every Sunday afternoon.
For most big companies, because their culture is already established, one new employee is unlikely to change the office dynamic and culture fit boils down to whether you’re someone people would like to work with (e.g. you’re not an asshole, you’re not defensive, you’re a team player). For small organizations, culture fit is more important as companies want people who share their mission, drive, ethics, and work habits.
Value alignment is also about you evaluating whether this is a company you want to work for. Do you believe in their mission and vision? Are their current employees the type you want to be around? Will the company provide a good environment for you to grow? One candidate told me he turned down an offer after being invited to the company’s push-up competition. He didn’t feel like a culture that places so much importance on testosterone-filled activities would be a good fit for him.
2.1.1.4 Junior vs senior roles
Companies might put junior and senior roles on different hiring processes. Junior candidates, who haven’t proved themselves through previous working experience, might have to go through more screenings. Amazon, for example, requires coding challenges only for junior candidates. Senior candidates, however, might be asked more difficult questions. For example, a few companies have told me that they only give the machine learning systems design questions21 to more senior candidates since they don’t think junior candidates would have the context to do those questions well.
A hiring manager at NVIDIA told me: “When you hire senior engineers, you hire for skills. When you hire junior engineers, you hire for attitude.” I’ve heard this sentiment echo in multiple places.
A senior director at a public tech company told me that when he interviews junior candidates, including interns and recent college graduates, he cares more about how they think, how they respond, and how they adapt. A junior candidate with weaker technical skills but a willingness to learn might be preferred over one with stronger technical skills but a horrible attitude.
2.1.1.5 Do I need a Ph.D. to work in machine learning?
No, you don’t need a Ph.D. to work in machine learning.
Those who think that a Ph.D. is needed often cite job posts for research scientists that list “Ph.D.” as a requirement. First, research scientist roles make up a very small portion of the ML ecosystem. No other roles, including the popular ML engineer, require a Ph.D.
Even for research scientists, there are plenty of false negatives. For example, OpenAI, one of the world’s top AI research labs, lists only two requirements for their research scientist position:
- track record of coming up with new ideas in machine learning
- past experience in creating high-performance implementations of deep learning algorithms (optional)22

The long list of people who have done amazing work in machine learning but don’t have a Ph.D. includes the current OpenAI CTO, IBM Watson master inventor, PyTorch creator, Keras creator, etc.
Companies know that you don’t need a Ph.D. to do ML research, but still require a Ph.D. because it’s a signal that you’re serious about research. At many companies, the people who screen your resumes aren’t technical and therefore rely on weak signals like Ph.D. to decide whether to pass your resume to the hiring managers.
Engineering roles that require PhDs are the exceptions, not the norm. Some candidates complain that they get rejected by big companies because they don’t have Ph.D.’s. Unless the rejections explicitly say so, don’t confuse correlation with causation. People with PhDs get rejected too.
In November 2017, Kaggle surveyed 16,000 of their users and found that 15.6
If you’re serious about research, a Ph.D. is encouraged. However, you shouldn’t let not having a Ph.D. stop you from applying for a job. If you’re interested in a company, build up your portfolio, and apply.
2.1.2 How companies source candidates
To get hired, it might be helpful to put yourself where employers are looking. Out of all possible channels for sourcing candidates, referrals are, by far, the best channel. Recruiters have, for a long time, unanimously agreed on the effectiveness of referrals. Here are some numbers:
- Across all jobs, referrals account for 7
Sam Altman, CEO of OpenAI and the former president of Y Combinator, wrote that: “By at least a 10x margin, the best candidate sources I’ve ever seen are friends and friends of friends.”
Lukas Biewald, founder of two machine learning startups Figure Eight and Weights & Biases, analyzed the performance of 129 hires and concluded that:
An analysis of 15,897 Glassdoor interview reviews for software engineering related roles at 27 major tech companies showed that: “For junior roles, about 10 – 20
The State of Data Science & Machine Learning survey in 2017 by Kaggle shows that while most people seeking to enter the field look for jobs through company websites and tech job boards, most people already employed in the field got their jobs through recruiters’ outreach or referrals. For junior roles, the biggest source for onsite candidates is campus recruiting. Microsoft and Oracle have more than half of their interviewees recruited through campus events such as career fairs and tech talks. Internet giants like Google, Facebook, and Airbnb rely less on campus recruiting, but it still accounts for between 20 and 30%.
From the employers’ perspective, targeting their most promising sources can reduce the hiring cost as well as the risk of disastrous hires. It is, therefore, not surprising that the default message to most candidates who submit their resumes through less promising sources like online applications is “Thank you, next.” This process is far from ideal as it creates an exclusive, anti-meritocratic environment. Many qualified people are rejected simply because they don’t go to the right school or don’t have the right network. If you’re one of these statistically unlucky candidates, one thing you can hope for is that you have a set of skills and/or portfolio that attract recruiters. Around 15 to 25
If all else fails, submit your applications and hope for the best. Companies that are the friendliest to online applicants are Twitter, Amazon, and Airbnb with roughly half of their onsite candidates being online applicants. Companies among the most likely to pass on hopeful online applicants are Facebook, Microsoft, and Oracle. Accurately evaluating candidates is very challenging. First, you can only evaluate something as well as your evaluators allow. Companies can only evaluate a candidate to the extent of the interviewers’ knowledge. If your interviewer has a shallow understanding of X, they won’t be able to evaluate your in-depth understanding of X. Many companies, including those who claim to be ML companies, don’t already have a strong in-house ML team to act as good evaluators35. Second, even strong in-house teams don’t always mean strong evaluators. Therefore, companies have to rely on signals to help them predict whether a candidate would be a good fit. As you might have already suspected, pedigrees make for strong signals. It’s not a coincidence that companies like to advertise how many ex-Googlers or ex-Facebookers they have on the payroll. If you’ve worked as a full-time ML engineer at Google, you must have passed its ML interviews and learned good engineering practices from Google. On resumes, college names matter but not much. Their importance is inversely proportional to seniority. If someone, with all the privileges of an elite education, still has no interesting past projects to put on their resume, the fancy college name might even hurt. However, going to a popular engineering school has several benefits. First, given two equally mediocre resumes, one from MIT, the other from a college nobody has ever heard of, the recruiters might be more inclined to give the one from MIT a call. Second, popular engineering colleges give you access to recruiters who hire from campus events. Third, you’ll likely have classmates at big companies who can refer you. If you’re a recent graduate, your college name might matter less than your GPA, which shows your dedication during your studies. Still, your GPA doesn’t matter as much if you have other things to show. I’ve had only one employer asking for my GPA, and it was after I’d got the offer so that they could put it in their database. The strongest signal is past experience, especially experience similar to the job you’re applying for. The experience can be work done at your previous jobs, projects you do independently, or competitions you enter. If you’ve placed highly in Kaggle competitions, made significant contributions to open-source projects, presented papers at top-tier conferences, written in-depth technical blog posts, self-published books, or done any interesting side projects, you should put them online and highlight them in your resume. There are so many things you can do to signal to people that you’re proactive, capable, and willing to work hard. When I asked on Twitter which signal is most important when screening for ML engineering roles, more than 50
The interview pipeline for ML roles evolved out of the tried-and-true software engineering interview pipeline and includes the same components that one can see in a traditional technical interviewing process. There are many people involved in the interviewing process. For hiring managers, it’s crucial to assign each interviewer a set of skills to evaluate, so that different interviewers ask different questions and that collectively, they get a holistic picture of where you’re at with all the skills they care about. One interviewer might ask you about theories, a couple about coding, one about ML systems design. Ideally, the interviewer tells you the skills they want to focus on so that you can tailor your answers to highlight those skills. If they don’t, ask. Recruiters are often encouraged to share with candidates the names of their interviewers. If they don’t, ask. You can look up your interviewers beforehand to learn what they do. It’ll give you a sense of not only what you’ll be doing if you join the company, but also your interviewers’ areas of interest. You should also ask about the team you’re being considered for. At most companies, you interview for a specific team. However, if you apply to companies such as Google and Facebook, you’re matched with a team after you’ve passed. This means that it’s possible to pass their interview process without getting an offer if no team takes you, though it’s rare40. The following list of interview components is long and intimidating, but companies usually use only a subset of them. The number of interviews in each component also varies. Companies might skip any step if they’re confident about your ability. Strong candidates might even be invited directly to onsites without all the previous steps. On the flip side, candidates that need to travel for onsites might be vetted more rigorously beforehand. The entire process can be long and tiring, but it’s long and tiring for every candidate. You don’t have to answer all questions flawlessly. You only need to do better than other candidates. No company expects you to know everything. The field is moving so fast it’s unrealistic to expect any candidate to know all the latest papers and techniques. Given how expensive, time-consuming, and inaccurate the traditional interview process is, companies are experimenting with new interviewing formats. Hiring at a small company is very different from hiring at a big company. One reason is that big companies make a lot of hires, so they need to standardize their process. Small companies refine the process as they go. Another reason is that big companies can afford to occasionally make bad hires, whereas a few bad hires can run a small company into the ground. As a result, processes at smaller companies can adapt to each role and each candidate. Processes at big companies can be rigid and bureaucratic and might involve questions irrelevant to the role or the candidate. The standardization at big companies also means that the process is more hackable — thorough preparation can substantially increase your odds. Another important difference is that big companies can afford to hire specialists, who are great in only a small area. Startups are unpredictable and ever-changing, so they might care less about what you do best and more about whether you can address their most urgent needs. It’s typically much easier to interview for an internship and then get a return offer as a full-time employee than to interview for a full-time position directly. 
The interview process is a proxy to predict how well you will perform on the job. If you’ve already interned with a team, they know your ability and fit, and therefore might prefer you to an unknown candidate. At major tech companies, intern programs exist to provide a steady, reliable source of full-time talent. An average intern at Facebook makes $8,000/month, almost twice as much as an average American full-time worker43, and Facebook isn’t even the highest bidder44. It’s hard to justify this salary unless it can offset the recruiting cost later.

The interview process for interns is less complicated because an internship is less of a commitment. If a company doesn’t like an intern, that intern will be gone in three months. But if they don’t like a full-time employee, firing that person is expensive. The number of full-time positions is subject to a strict headcount, but the number of interns often isn’t. Even when a company freezes the hiring process, such as to cut costs, they might still hire interns.

If your internship doesn’t go terribly wrong and the company is hiring, you’ll likely get an intern-to-full-time offer. At NVIDIA, the majority of full-time offers for new graduates go to their interns. Rachelle Gupta, an ex-recruiter for Google and GitHub, wrote in one of her answers on Quora that: “Ranges [of the intern conversion rates] are between the high 60’s
If you’re a student, it’s never too early to start looking for internships. It’s not uncommon to intern as high school students. However, if you’ve already passed that phase, don’t fret. This book is to help you with the process. In this section, we’ll go over the main types of questions in an interview process for ML roles. Each interviewer is likely to start with a short introduction and one or two questions about your background. These questions are to get a sense of who you are and maybe to get you more comfortable. For companies that have a dedicated behavioral round, this part should be short during the technical interviews. The behavioral round may or may not be during lunch, which is usually hosted by the hiring manager or a designated behavioral interviewer. Behavioral questions are to assess whether your values align with those of the company and whether the company can provide an environment for you to thrive. They can be grouped into four main topics: your background/resume, your interests, your communication style, and your personality. One common question that a lot of interviewers start with is “Tell me about yourself”. You should come up with an under-a-minute answer and practice it beforehand. Your answer should be both personal (you’re you and nobody else) and professional (why you’re a good fit for the role). Your answer might guide the rest of the interview. Interviewers might also test your general level of expertise with the question: “What level of involvement have you had with [ML|computer vision|natural language processing|etc.]?” Don’t bluff. If you pretend that you’re more experienced than you actually are, the interviewer will find out. Companies will want to know about your career history. They will probably ask about your educational background, your previous jobs, and your past projects. You should highlight two things. One is the decisions you made at each position that had an impact, both positive and negative. Another is why you decided to leave one company to join the next. The interviewer will try to see if you’ve done what you claim on your resume. If you have, how much ownership you took. It’s pretty easy to spot the talkers from the doers just by asking for details. They might pick up one project from your resume and make you explain every choice you made in it. If you claim to be familiar with TensorFlow, be prepared to talk about eager execution, tf.estimator, and distributed training in TensorFlow. Employers will of course want to know about your current job search: why are you looking and what are you looking for. A question that every company asks is whether you have any impending offer/deadline that they should be aware of. Some candidates feel awkward to admit that they don’t have any offer yet for fear that the company will think less of them. Some resort to lying, telling recruiters that they have impending onsites or offers when they don’t. You might be able to get away with it — some even argue that since companies regularly take advantage of their users and employees, we have no ethical obligation to be honest with them. However, it’s a slippery slope. I wouldn’t want to be the person who lies to get ahead, but it’s a personal decision each of us has to make for ourselves. It’s in companies’ financial interest to have their employees passionate about their jobs. They want to make sure that your interests align. 
Some example questions are: Joshua Levy, a founding engineer at the AI-power SaaS company BloomReach, said his go-to interest question is: “Think of a period when you were thrilled and excited to go to work. What made you excited?” Was it the problem you were trying to solve, the impact you could make, the technologies you had access to, the people you worked with, your learning curve, or something else? One pet question for many interviewers is “explain a paper that you really like.” This answer shows not only the kind of problems you’re interested in, but also how far you’re willing to go to understand a problem, and how well you communicate technical concepts. It doesn’t matter what paper you pick, as long as you understand it enough to talk about it. Before your interviews, choose any paper, read it inside out and implement it yourself to understand all the subtleties. My personal favorite is: “What have you done that you’re most proud of?” The answer to this question doesn’t even have to be technical. I once told an interviewer that I’m most proud of the non-profit organization I started in high-school to encourage young people to explore outside their comfort zone. I got an offer. Interviewers want to get a sense of who you are: what motivates you to do your best work, how you handle obstacles. Employers want you to overcome obstacles and do your best work at their companies. Companies want passionate people, so talk about things that can bring out the fire in your eyes. Communication is arguably the most important soft skill for a job. Teamwork is impossible if team members don’t or can’t communicate with each other. Each company will want to evaluate not only how well you communicate but also whether you can adapt to the communication style of the company. For example, if a company is heavy on daily status updates and you prefer to disappear for days to focus on a project and only emerge when you’re done, it might not be a good fit. Your communication style and skills are evaluated through the entire process, from your cover letter, how you respond to their emails, to the way you answer each interview question. Companies might also ask explicit questions about how you communicate. Here are some of the questions to think about: Companies want to know about you, your personality, your grit, your strengths, and your weaknesses. Directly asking a candidate for their strengths and weaknesses often yields poor results — most people are reluctant to admit their flaws and too shy to boast about their strengths. Interviewers, therefore, might frame the questions differently. Here are some of the questions an interviewer might ask. Senior interviewers tend to have a go-to question that they believe reveals the most about a person. In his book Zero to One, Peter Thiel revealed that he asked everyone he interviewed: “What important truth do very few people agree with you on?” This question is hard to answer as it first requires you to know what most people agree on, be able to think independently and be confident enough to express it. Be yourself, but don’t be a jerk. Silicon Valley has been known for tolerating brilliant jerks — high performers who are rude and unpleasant to work with — which contributes to the alleged toxic culture of the tech industry. However, in recent years, many tech companies including Netflix have made the commitment to not hire jerks, no matter how brilliant they are. 
You usually have 5-10 minutes at the end of each interview, and a lot more during the behavioral round, to ask questions. Sometimes, you can learn more about someone from their questions than their answers. The interview process is a two-way street — not only a company evaluates whether you’re a fit but you also evaluate whether you want to work for that company. You should use every opportunity you have to ask questions to learn about the company — their mission, vision, values, competitors, future plans, challenges they’re facing, possible career path, policies that you should know about, internal hierarchy, and existing corporate politics. You should learn about the team you’re interviewing for: team composition and dynamics, team events, managerial style, the kind of people they want to bring onto the team. You should also try to get a sense of the projects you’re expected to work on and how your performance will be evaluated. If you care about the visibility of your work — which is especially important for those early in their career — you should ask about the company’s publishing policy: do they publish their papers and open-source their code? If a company doesn’t publish at all, you join and disappear — the outside world has no idea what you work on. Recently, there have been employee walkouts at major tech companies to protest their employers’ involvement with certain branches of the government. If that’s what you care about, you should definitely ask your potential employer for their stand on working with the government. Your interviewer’s career perspective is a good indication of what yours will be like if you join the company, so you should try to understand what they do and why/how they do it. Why did they choose this company? How is it compared to their previous employers? What do they find challenging about their job? How much freedom do they have in choosing what to work on? If you need more questions to ask, Julia Evans has a pretty great list. Most interviewers you meet will be bad interviewers. Few companies have proper interview training programs. Junior interviewers lack the experience to know what signals to look for and lack the technical depth to evaluate your expertise. Senior interviewers might be set in their way with their list of pet questions and might defend to the death the merit of their techniques even in light of contradicting evidence. Bad interviewers ask bad questions. Even good interviewers sometimes ask bad questions. Here are some examples of bad interview questions. When asked a question that you think is bad, should you tell your interviewer that it’s bad? There’s a non-zero chance that your interviewer might appreciate your candid feedback, but statistically speaking, they might get offended and write you off as someone they wouldn’t want to work with. Instead, you should ask for clarification. If stuck, explain why you’re stuck and what information you’d need to overcome it. If unsure about the interviewer’s intention, ask. “To better answer your question, is it to evaluate my understanding of X?” should be sufficient. You might want to avoid companies that have displayed the following red flags. Hiring speed varies wildly from company to company. On the long end of the spectrum, Google and DeepMind processes typically take between six weeks and three months. The long process is partly because of their recurring influx of candidates, and partly because they make hiring decisions at the company-level. 
Every two weeks, the hiring committee at Google meets to look at all the candidates that have passed the interview process and decide on who to hire. If you’ve made it to the hiring committee, your odds look good.

On the short end of the spectrum, we have startups and big companies with flat organizational structures. The entire process at a startup can take days if they really want you. Big companies like NVIDIA and Netflix are fast, as they make hiring decisions at the team level. A team manager can make hiring decisions on the spot. The whole process for my internship at NVIDIA took less than a week. I talked to my manager about converting to full-time in the afternoon and got my offer the next morning.

The rest of the companies fall somewhere in the middle. The timeline depends a great deal on your availability for interviews, companies’ availability for interviewing spots, how much they want you, and how much they need you. You should ask your recruiter about the expected timeframe if they haven’t told you already, and inform them of any time constraints you have. If it’s been at least a week after your interviews and you haven’t heard from your recruiter, it’s okay to send a short and respectful check-in. Let them know that you’re excited about the company and would love to hear any updates as you have to consider other opportunities.

When applying for a role, you might wonder what your odds for that role are. If hiring decisions followed a uniformly random distribution, the odds at major tech companies would be abysmal. Each year, Google receives several million resumes and hires several thousand, which makes the odds around 0.2%.
However, the odds are not uniformly distributed for people applying for the same role at the same company. It depends on your profile, whether you’re referred and who referred you, how much the company needs that role, who screens your resume, who they already have in their pipeline, how serious other applicants are. Companies have very different screening philosophies — some give every not-obviously-disqualified candidate a phone screen whereas some only respond to the top applicants. All of these factors, coupled with the fact that few companies publicize the number of resumes they receive or the number of hires each year, make it impossible to estimate the odds from submitting an application to getting an offer. However, it’s possible to estimate the onsite-to-offer ratio, the percentage of onsites that lead to offers, using the 15,897 interview reviews for software engineering related roles at 27 major tech companies on Glassdoor as of August 2019. This ratio correlates to the yield rate — the percentage of candidates who accept their offers at a company. Even though the estimation is for software engineering roles, it serves as an indication for ML roles. There are many biases in this data, but hopefully, a large number of reviews smoothes out some noise49. If all reviews suffer from the same biases, they are still useful for comparison across companies. The data shows that the onsite-to-offer ratio ranges from a low of 15
Due to the biases of online reviews, the actual numbers should be lower. After talking to recruiters and doing extensive research, I found that the onsite-to-offer ratios here are a few percentage points higher than the actual numbers. For example, other sources claim that the onsite-to-offer ratio for Google is 10-20%.
The offer yield rate of near 90
The 5 companies with the lowest onsite-to-offer ratios are all Internet giants — Yelp, Google, Snap, Airbnb, and Facebook — who are known to be highly selective. Companies with high onsite-to-offer ratios aren’t necessarily unselective. They might be more selective during the screening process and only interview candidates that they really like. Onsites are costly, so the higher the onsite-to-offer ratio, the more financially sound the process.

There’s a strong correlation (0.81) between the onsite-to-offer ratio and the yield rate — the higher the onsite-to-offer ratio, the higher the yield rate. A candidate that gets an offer from Google is more likely to turn it down than a candidate that gets an offer from a less selective company. There are several reasons. First, if a candidate passes the interviews at selective companies like Google or Facebook, they probably have other attractive offers to choose from. Second, selective companies tend to make competitive offers, which incentivizes candidates to get offers from them to negotiate with companies that they really want to work for. Third, the process at those companies usually takes a long time. By the time a candidate receives the offer, they might have already settled at another company. Last but not least, since candidates at Google and Facebook only get matched with a team after they’ve received their offers, they might reject the offers if they don’t like the team.

Life doesn’t end after a rejection. It doesn’t end after getting an offer either. You might want to negotiate your offer, consider other opportunities, or contemplate your career prospects at the company or after you leave. It seems silly to think about leaving a company before joining, but at 13.2
In this chapter, we’ll discuss compensation details and career progression that might be useful when you consider your options.

There are two types of compensation: direct compensation and indirect compensation. For most tech companies, direct benefits include three main components:

Indirect benefits are more diverse and can be very generous, though often only at major tech companies. Below are some examples.

In early 2020, levels.fyi shared their data with me, which includes self-reported direct compensation details of 18.8k tech workers. This data is US-focused with 80
The reported compensation packages follow a skew-normal distribution with a small percentage making outlier amounts of money — tech workers in India or Russia can make under $20k, while top engineers at Google, Facebook, OpenAI can make millions a year. The median yearly compensation is $195,000, while the mean is $225,000. These numbers are higher than the results of the StackOverflow survey in 2019, which states that in the US, the median salary for an Engineering Manager — the highest paying engineering position in the survey — is $152,000. This might be because levels.fyi data is FAAAM-focused and includes the whole package — base salary, equity, and bonus — while StackOverflow surveyed only salary.

Equity grants can be understood as promises — a company promises you a certain amount of equity if you stay with that company for a certain period of time. When that time is up and you receive your equity, that equity is said to have vested. Equity grants vest on a schedule that is the same for everyone in the company. The two most common schedules are:

The longer you stay at a company, the more equity you’ll be granted. Your new equity grants will also follow a vesting schedule. For example, you’re granted 100 shares when you join. After the first year, you have access to 25 shares. Because you’re performing well, you’re also awarded 80 extra shares over the next 4 years. So, in your second year, you vest 25 old shares + 20 new shares = 45 shares. The longer you stay at a company, the more grants you have left to vest. The schedule is to incentivize employees to stay longer at a company.

There are many types of bonuses, but the two most common are a sign-on bonus that you get when you join and an annual bonus. Sign-on bonuses for junior roles can go anywhere from $0 to over $100K. For more senior roles, they can be unbounded. I’ve seen sign-on bonuses in the neighborhood of half a million. Annual bonuses are often between 10
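To make the vesting arithmetic above concrete, here is a minimal Python sketch of my own; the grant sizes and the even four-year schedule are illustrative assumptions, not any particular company’s plan.

```python
# Hypothetical example: shares vested under simple 4-year schedules.
# Each grant vests evenly over 4 years, starting the year it is awarded.

def vested_by_year(grants, year):
    """grants: list of (year_awarded, total_shares); returns shares vested by end of `year`."""
    total = 0
    for year_awarded, shares in grants:
        years_elapsed = max(0, min(4, year - year_awarded))  # cap at the 4-year schedule
        total += shares * years_elapsed / 4
    return total

grants = [(0, 100), (1, 80)]  # 100 shares at joining, 80 more awarded after year 1
for y in range(1, 6):
    print(f"End of year {y}: {vested_by_year(grants, y):.0f} shares vested")
```

Running this shows the compounding effect described above: 25 shares vested after year 1, then 45 more (25 old + 20 new) during year 2, and so on until both grants are fully vested.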
As level increases, base salary and bonus increase slowly, but equity increases the most. For junior levels, base salary makes up the largest chunk of your direct compensations. As you level up, the proportion of equity goes up — higher equity reflects more responsibilities you have towards the company’s success. Companies expect you to negotiate. The initial offers are designed to make room for that. They offer the smallest amount you’d accept, not what is fair. I’ve seen candidates who had their offers doubled through negotiation. I’ve also seen people getting paid $100K/year less than their peers because they didn’t. Negotiation is stressful. “How much should I ask for?” “Am I being lowballed?” “What if they rescind the offer because they think I’m being greedy?” In case you’re worried about the last question, relax. If you’re respectful during your negotiation, companies won’t rescind their offers. If they do, it’s not a company you want to work for. When negotiating, it’s important to know how much other people are getting paid and how much you should get paid. The employer has this information since they’ve negotiated with many candidates before, but most of us only negotiate with a handful of employers during our entire life. This imbalance of information gives employers more leverage. It especially hurts candidates of underrepresented groups who don’t have access to people who can guide them through the process. There are several ways you can gather more information: When negotiating with big companies, it’s easier to negotiate equity grants and bonuses than base salary. Look beyond direct compensations. You can negotiate for more paid days off, more flexibility at work, a better title, more conferences they’ll sponsor you to attend. Candidates dread the question: “What are your compensation expectations?” One way to answer is to get the recruiter to give you a number first. Here’s an example of how you can answer that question. “I’m excited about this opportunity and I believe that if it’s a good match, we’ll work out an agreement on compensation details. I’d like to learn more about what this position entails first to have a realistic expectation. It’d also be helpful to me to know the range of compensation one can expect at this position if you can share it with me.” This answer works because it shows that you’re ready to negotiate but you also want to make it work. One problem with asking the recruiter to give a number is that the negotiation will anchor around it. If they give you a number that’s too low, you might feel awkward raising it significantly. Another way to answer is to give them a range without committing to it. Come up with a number you want, add ~20
“My range is flexible. However, I’d like to be compensated fairly for my experience, my unique set of skills, and what the job entails. My understanding is that for this position in this area, you can expect between $220k and $240k annually.”

Once you’ve given a number, it can only go down, not up, so make sure to give a number you won’t regret later. For more negotiation tips, check out 15 Rules for Negotiating a Job Offer (Harvard Business Review, 2014) and Ten Rules for Negotiating a Job Offer (Haseeb Qureshi, 2016).

For each offer, you should also look at the level. Most companies have well-defined levels for their software engineering roles that encompass ML roles. Higher levels suggest higher compensation, more decision-making power, and more responsibilities. Major tech companies follow very different ladders. For example, engineering roles at Google have levels from L3 to L10, while Facebook goes from E3 to E9, and Microsoft from 59 to 80. These levels also map to a more standardized ladder which includes:

Usually, recent college graduates start at the lowest level, master’s graduates at the level above, and Ph.D. graduates at the next level. There’s not much variance in the base salary for the same level at the same company, as the base salary is usually capped for each level. But there’s a lot more variance in equity grants, which can change significantly through negotiation. It’s not uncommon to see strong candidates being offered levels higher than their peers. Sometimes, a company might up your level to give you a higher base salary to match another offer you might have. However, companies might be reluctant to level up a new hire, as the higher level means higher expectations, which, many argue, can negatively affect their ability to succeed at the company. Few companies put levels in their offer letters. If you ask, recruiters should tell you, since you’ll find out if you join anyway.

It’s fashionable for people in tech to say things like “titles don’t matter”, but they do. In the absence of a perfect method to evaluate someone’s actual professional ability, society responds well to titles. Having a higher level means not only higher compensation but also more freedom in deciding what to work on and more negotiating power if you want to change jobs. In theory, companies should put a lot of thought into designing their levels, and employees should be able to find out what’s expected at each level and what they can do to go up the ladder. If you don’t know, talk to your manager. You can put it as simply as: “I want to build my career here and take on more responsibilities at the company. What do you need to see from me to consider me for the next level?” For more information on engineering levels, check out Things to know about engineering levels (Charity Majors, 2020).

In a better world, we shouldn’t have to prepare for interviews. In an ideal world, there shouldn’t be interviews at all. Unfortunately, in our world, for our ability to be favorably assessed, we need to prepare. This preparation is a lifelong process and should begin way before you start applying for any job. It includes building up your portfolio, experience, skill sets, and network. You can only write a good resume if you have good things to put on that resume. If you’re only thinking about it now that you want a job, it’s not too late. Regardless of what job you do now, keep on preparing so that you can put your best foot forward for your next job search. This section covers the resources to strengthen your application.
They include free online courses, books, articles, tools to familiarize yourself with, etc. The level of preparation you can commit to depends on your timeline. The job search is a long, stressful, and occasionally demoralizing process. The timeline varies based on where you are and where you want to work. In the US, your job search should start three to six months before you want to start at your new job. For students, it may start at the beginning of your last year in school. If you’re required to give notice before leaving — for example, in Germany, leaving employees must give a three-month notice — this process should start much earlier. If you need a visa, you need to take into account the time it takes to obtain a visa. If you plan on getting a job in a year, you might be able to, in your free time, go over all those resources and build up your online presence. When going over an online course, make sure to do assignments on top of watching lectures. You can try to reimplement the papers that you find interesting, improve on them, and put them on GitHub. Try to enter at least a couple of Kaggle competitions. Three months before your interviews, you might have time to do two to three courses and read three books. For courses, I’d recommend one hands-on course like fast.ai’s Practical Deep Learning for Coders and one theoretical course like Machine Learning by Coursera. You might also want to check out my course at Stanford: Machine Learning Systems Design since the course covers practical challenges and solutions for ML in production. For books, I’d recommend Deep Learning by Goodfellow et al., Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. A week before your interviews, review the notes of CS231N: Convolutional Neural Networks for Visual Recognition, especially the parts about gradient descent, activations, and optimizations as well as rewatch Full Stack Deep Learning lectures, especially the ones on Setting up Machine Learning Projects and Infrastructure and Tooling. You might want to skim the questions in part 2 and part 3 of this book again. You should also review your previous projects in case your interviewers want to know all about them. A day before your interviews, make sure that you get enough sleep. Don’t repeat my mistake of staying up late cramming and showing up to my interviews half-asleep. Arrive 10 minutes before so you have time to settle in. If you get the job but want a better job in the future, prepare early this time. If you don’t get the job, rinse and repeat. I find it helpful to follow people whose careers I admire and learn how they got there. There’s no one path to any job — not all ML researchers did their PhDs and not all ML engineers studied computer science in college or went to college at all. Often, candidates with more unconventional backgrounds are more desirable as they can bring fresh perspectives to the team. Many people have written about their career paths. Here are some of the stories that I found inspiring. Given the plethora of available resources online, it can be disorienting trying to figure out which resources to focus on. The Kaggle’s state of data science and machine learning survey 2017 asked respondents about the learning methods that they found the most helpful. Here is a visualization of the responses, created by my previous colleague Jack Cook. The most effective learning methods are doing projects, taking courses, and just spending a ton of time on StackOverFlow (SO). 
Kaggle competitions rank high on the list but since the respondents are Kaggle users, their answers are biased. A college education is perceived as slightly more useful than watching YouTube tutorials and reading blogs. The least useful methods in this survey are podcasts, newsletters, and conferences. Attending conferences might not be useful for building your skill sets, but very useful for building up your network. Getting published at conferences is a great way to put your name out there and signal that your knowledge of the field is in-depth enough to come up with original research ideas.

The courses below are listed in the order they should be taken. The list was compiled in August 2019, so some of the links might have become outdated, but the curriculum can still be useful for getting a sense of what areas of knowledge you should acquire and finding other ways to acquire them, e.g. other courses or books51.

1. Probability and Statistics by Stanford Online. See course materials (free online course). This self-paced course covers basic concepts in probability and statistics spanning over four fundamental aspects of machine learning: exploratory data analysis, producing data, probability, and inference. Alternatively, you might want to check out this excellent course in statistical learning: An Introduction to Statistical Learning with Applications in R.

2. 18.06: Linear Algebra by MIT. Textbook: Introduction to Linear Algebra (5th ed.) by Gilbert Strang. See course materials (videos available). The best linear algebra course I’ve seen, taught by the legendary professor Gilbert Strang. I’ve heard students describe this as “life-changing”.

3. CS231N: Convolutional Neural Networks for Visual Recognition by Stanford. CS231N is hands down the best deep learning course I’ve come across. It balances theory with practice. The lecture notes are well written, with visualizations and examples that explain difficult concepts such as backpropagation, gradient descent, losses, regularizations, dropouts, batchnorm, etc.

4. Practical Deep Learning for Coders by fast.ai. See course materials (free online course). With the ex-president of Kaggle as one of its co-founders, this hands-on course focuses on getting things up and running. It has a forum with helpful discussions about the current best practices in ML.

5. CS224N: Natural Language Processing with Deep Learning by Stanford52. Taught by one of the most influential (and most down-to-earth) researchers, Christopher Manning, this is a must-take course for anyone interested in NLP. The course is well organized, well taught, and up-to-date with the latest NLP research. The assignments, while useful, can sometimes be frustrating as training NLP models takes time.

6. Machine Learning by Coursera. See course materials (free online course). Originally taught at Stanford, Andrew Ng’s course is probably the most popular ML course. As of writing, its Coursera version has been enrolled in by more than 2.5M people. This course is theoretical, so students would benefit more from it after more practical courses such as CS231N, CS224N, and Practical Deep Learning for Coders.
7. Probabilistic Graphical Models Specialization by Coursera. Textbook: Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman. See course materials (free online courses). Unlike most AI courses that introduce small concepts one by one or add one layer on top of another, this specialization tackles AI top down as it asks you to think about the relationships between different variables, how you represent those relationships, what independence assumptions you’re making, and what exactly you’re trying to learn when you say machine learning. This specialization isn’t easy, but it’ll change the way you approach ML. You can also consult detailed notes written by Stanford CS228’s TAs here.

8. Introduction to Reinforcement Learning by DeepMind. Reinforcement learning is hard. This course provides a great introduction to RL with intuitive explanations and fun examples, taught by one of the world’s leading RL experts, David Silver.

9. Full Stack Deep Learning Bootcamp53. Most courses only teach you how to train and tune your models. This is the first one I’ve seen that shows you how to design, train, and deploy models from A to Z. It is also a great resource for those struggling with the machine learning systems design questions in interviews.

10. How to Win a Data Science Competition: Learn from Top Kagglers by Coursera. See course materials (free online course). With all the knowledge we’ve learned, it’s time to head over to Kaggle to build some machine learning models to gain experience and win some money. Warning: Kaggle grandmasters might not necessarily be good instructors.

For even more online sources, kmario23 compiled a list of available online courses. David Venturi also aggregated reviews for popular courses. Emil Wallner posted his 12-month curriculum on How to learn Deep Learning.

51: The list was originally shared on Twitter. It has since been retweeted more than 2,000 times, including by MIT CSAIL (Computer Science and Artificial Intelligence Laboratory) and Stanford NLP (Natural Language Processing) groups.
52: Disclaimer: I gave a guest lecture in a version of this course in 2018, unpaid.
53: Disclaimer: I gave a guest lecture in a version of this course in 2019, unpaid.

In summary, below are some dos and don’ts that you should keep in mind during your job search.

This part contains over 200 questions that have more or less deterministic answers. This type of question is to test your understanding of machine learning concepts. In an hour-long interview, you can cover 10 – 15 of those questions. I rank the questions by three levels of difficulty:

Some of the knowledge questions are considered bad interview questions, especially those about definitions that can be easily looked up. For example, asking someone to explain PCA is good for evaluating their memorization of PCA, not their understanding of PCA. However, some bad interview questions can still make good questions when practicing for interviews, so I include some definition questions to remind readers that certain concepts are important. Techniques go in and out of fashion, but fundamental challenges stay the same. Instead of asking candidates to write out complex equations for certain techniques, this book focuses on the challenges that gave rise to those techniques in the first place. Most of the questions in this section are about why something matters and how it works. Even if the extent of your ML work will only ever consist of running existing implementations, some mathematical background will be helpful for the sections that follow.
This section covers the following branches of math that are important in ML: algebra, probability and statistics, dimensionality reduction, and a little calculus and convex optimization. This list is far from exhaustive. For example, graph theory, logic, topology, and other mathematical branches occur frequently in ML but aren’t included here.

Convex optimization is important because it’s the only type of optimization that we more or less understand. Some might argue that since many of the common objective functions in deep learning aren’t convex, we don’t need to know about convex optimization. However, even when the functions aren’t convex, analyzing them as if they were convex often gives us meaningful bounds. If an algorithm doesn’t work assuming that a loss function is convex, it definitely doesn’t work when the loss function is non-convex.

Convexity is the exception, not the rule. If you’re asked whether a function is convex and it isn’t already in the list of commonly known convex functions, there’s a good chance that it isn’t convex. If you want to learn about convex optimization, check out Stephen Boyd’s textbook.

The Hessian matrix, or Hessian, is a square matrix of second-order partial derivatives of a scalar-valued function. Given a function $f: \mathbb{R}^n \to \mathbb{R}$, if all second partial derivatives of $f$ exist and are continuous over the domain of the function, then the Hessian matrix $H$ of $f$ is a square $n \times n$ matrix such that $H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}$. The Hessian is used for large-scale optimization problems within Newton-type methods and quasi-Newton methods. It is also commonly used for expressing image processing operators in computer vision for tasks such as blob detection and multi-scale signal representation.
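To connect the Hessian to convexity in a hands-on way, here is a small numerical sketch of my own (it assumes NumPy and uses finite differences; it is an illustration, not a rigorous convexity test): a twice-differentiable function is convex on a region exactly when its Hessian is positive semi-definite there, which you can spot-check at sample points by looking at the Hessian’s eigenvalues.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Approximate the Hessian of f: R^n -> R at point x with central finite differences."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            x_pp = x.copy(); x_pp[i] += eps; x_pp[j] += eps
            x_pm = x.copy(); x_pm[i] += eps; x_pm[j] -= eps
            x_mp = x.copy(); x_mp[i] -= eps; x_mp[j] += eps
            x_mm = x.copy(); x_mm[i] -= eps; x_mm[j] -= eps
            H[i, j] = (f(x_pp) - f(x_pm) - f(x_mp) + f(x_mm)) / (4 * eps ** 2)
    return H

f_convex = lambda x: x[0] ** 2 + 2 * x[1] ** 2   # convex quadratic bowl
f_saddle = lambda x: x[0] ** 2 - x[1] ** 2       # saddle, not convex

for name, f in [("bowl", f_convex), ("saddle", f_saddle)]:
    H = numerical_hessian(f, np.array([0.3, -0.7]))
    eigvals = np.linalg.eigvalsh(H)
    print(name, "eigenvalues:", np.round(eigvals, 3),
          "PSD at this point:", bool((eigvals >= -1e-6).all()))
```

The bowl gives eigenvalues of roughly 2 and 4 (positive semi-definite), while the saddle gives one negative eigenvalue, which is enough to rule out convexity.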
A Reddit user once said: “Data science is just doing statistics on a Mac.” Knowledge of probability and statistics is extremely important in ML and data science. If you don’t understand cross-entropy, KL divergence, or just general probability distributions, most of the objective functions in ML will make little statistical sense. A likely question would be to explain any of the common distributions and write out its equation, drawing its probability mass function (PMF) if it’s discrete or its probability density function (PDF) if it’s continuous. It’d be useful to review all the common distributions.

Formally, a random variable is a measurable function $X: \Omega \to E$ from a set of possible outcomes $\Omega$ to a measurable space $E$. The probability that $X$ takes on a value in a measurable set $S \subseteq E$ is written as $P(X \in S) = P(\{\omega \in \Omega : X(\omega) \in S\})$, where $P$ is the probability measure on $\Omega$. The randomness comes from the randomness of the outcomes in $\Omega$. Informally, a random variable is a variable that probabilistically takes on different values. You can think of a random variable as being like a variable in a programming language. They take on values, have types, and have domains over which they are applicable.

Random variable is a general concept. Almost everything in life can be described using a random variable. The time it takes you to commute to work is a normal random variable. The number of people you date before finding your life partner is a geometric random variable. A probability distribution is a function that describes the possible outcomes of a random variable along with their corresponding probabilities.

Also known as the Gaussian random variable, the normal random variable is the single most important random variable. It’s parameterized by a mean $\mu$ and variance $\sigma^2$. The term we want to go over in its PDF is $\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. Intuitively, it punishes values far away from the mean, but the punishment is less when the variance is high. The term $\frac{1}{\sigma\sqrt{2\pi}}$ is the normalization constant so that the PDF integrates to 1. [Figure: PDF of the normal distribution with different parameters.]

Also known as the multinoulli distribution, the categorical distribution is a generalization of the Bernoulli distribution. It describes the possible results of a random variable that can take on one of $k$ possible categories, with the probability of each category specified separately.

A binomial random variable represents the number of successes in $n$ successive independent trials, each succeeding with probability $p$ and failing with probability $1 - p$. One example is the number of heads in $n$ coin flips, each with a 0.5 probability of landing heads. The binomial distribution is the basis for the binomial test of statistical significance. When there’s only 1 trial, it’s known as the Bernoulli distribution. [Figure: PMF of the binomial distribution with different parameters.]

The multinomial random variable is a generalization of the binomial distribution. Instead of having only two outcomes like a coin flip, it can have multiple outcomes like a k-sided die. When the number of trials is 1, it’s the categorical distribution.

The Poisson distribution is, in my opinion, among the more interesting distributions. It expresses the probability of a given number of events occurring in a fixed interval if these events occur with a known constant rate, denoted as $\lambda$. Note that the Poisson distribution is memoryless, which means the probability that an event occurs is independent of the time since the last event. One pretty neat perspective is to see the Poisson distribution as an approximation of the binomial where $n$ is large, $p$ is small, and $\lambda = np$. For example, a binomial random variable of 10000 trials with a success rate of 0.01 can be approximated by a Poisson random variable with rate $\lambda = 10000 \times 0.01 = 100$. [Figure: PMF of the Poisson distribution with different values of $\lambda$, made by Skbkekas.]

If each trial has a probability of success $p$, then the geometric random variable represents the number of independent trials until the first success. One example is the number of candidates you have to interview until you hire someone. [Figure: PMF of the geometric distribution with different values of $p$, made by Skbkekas.]
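If you want a quick, hands-on way to review these distributions, a sketch like the one below evaluates a few of them with scipy.stats and checks the Poisson approximation of the binomial numerically. This is my own illustration; the numbers are arbitrary.

```python
from scipy import stats

# Normal: the PDF at the mean equals 1 / (sigma * sqrt(2 * pi))
print(stats.norm(loc=0, scale=2).pdf(0))            # ~0.199

# Binomial vs its Poisson approximation (n large, p small, lambda = n * p)
n, p = 10000, 0.01
binom = stats.binom(n, p)
pois = stats.poisson(mu=n * p)                      # lambda = 100
for k in (90, 100, 110):
    print(k, round(binom.pmf(k), 5), round(pois.pmf(k), 5))

# Geometric: number of trials until the first success, with success probability p
print(stats.geom(p=0.3).pmf(4))                     # (1 - 0.3)**3 * 0.3 ~= 0.1029
```

The binomial and Poisson columns printed in the loop agree to several decimal places, which is the approximation described above.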
Beta is my favorite distribution (what do you mean you don’t have a favorite distribution?). It’s a random variable that estimates another random variable. Say, we have a coin with an unknown probability of turning up heads. Let $\theta$ represent this probability. After $N$ flips, we get $h$ heads and $t$ tails. We might want to estimate that $\theta = h / N$. However, this estimate is unreliable, especially if $N$ is small. We’d like to say something like this: $\theta$ can also be more than, less than, or equal to $h / N$, with values further away from $h / N$ having a smaller probability, and the larger $N$ is, the more the probability concentrates around $h / N$. The beta distribution allows you to do that.

The beta random variable is represented using two variables: $\alpha$ to represent the number of successes and $\beta$ to represent the number of failures. The beta distribution can represent more than coin flips. In fact, $\alpha$ and $\beta$ can take on continuous values (though they must be positive). $\Gamma$ is the Gamma function, with $\Gamma(n) = (n - 1)!$ for positive integers $n$. The term $\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}$ is the normalization constant so that the density integrates to 1.

We can also incorporate a priori beliefs into the beta distribution. For example, if before even flipping the coin, we believe that the coin has a moderate chance of being biased towards heads, we can set the prior to $\text{Beta}(\alpha_0, \beta_0)$ with $\alpha_0 > \beta_0$. Then after 10 coin flips of which 7 are heads and 3 are tails, we can update the distribution to $\text{Beta}(\alpha_0 + 7, \beta_0 + 3)$. In fact, in Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial, and geometric distributions. [Figure: PDF of the beta distribution with different parameters.] The multivariate generalization of the beta random variable is called the Dirichlet.

A class of distributions is in the exponential family if it can be written in the form $p(x \mid \theta) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\theta)\right)$, where $h(x)$ is the base measure, $\eta(\theta)$ the natural parameter, $T(x)$ the sufficient statistic, and $A(\theta)$ the log-partition function. The exponential family of distributions is important because it provides a general framework for working with many of the most common distributions, including the Bernoulli, binomial, Poisson, normal, and more. You can write the PMF and PDF of those distributions to match the form defined above. For example, the Poisson random variable with rate $\lambda$, whose PMF is $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$, belongs to the exponential family because its PMF can be written as $\frac{1}{x!} \exp\left(x \log \lambda - \lambda\right)$, with $h(x) = \frac{1}{x!}$, $\eta(\lambda) = \log \lambda$, $T(x) = x$, and $A(\lambda) = \lambda$. More examples that show that other distributions belong to the exponential family can be found here and here.

A joint probability distribution gives the probability of two or more events happening at the same time. For example, given two discrete random variables $X$ and $Y$, the joint probability distribution of $X$ and $Y$ gives the probability $P(X = x, Y = y)$ for any combination of value $x$ and value $y$. A marginal distribution gives the probabilities of various values of a subset of variables without reference to the values of the other variables. For example, given the joint distribution of $X$ and $Y$, the marginal probability distribution of $X$ without reference to $Y$ is $P(X = x) = \sum_y P(X = x, Y = y)$. A conditional probability distribution gives the probability of a subset of events occurring assuming that other events also occur; one example is $P(Y = y \mid X = x)$.

[Comic from xkcd.] To distract yourself from interviewing stress, here are more statistics jokes.
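To make the coin-flip update above concrete, here is a minimal Beta-Bernoulli sketch using scipy.stats; the prior Beta(4, 2), which mildly favors heads, is an arbitrary illustrative choice.

```python
from scipy import stats

# Prior belief about theta = P(heads); Beta(4, 2) mildly favors heads (arbitrary choice).
alpha0, beta0 = 4, 2

# Observe 10 flips: 7 heads, 3 tails. By conjugacy, the posterior is Beta(alpha0 + heads, beta0 + tails).
heads, tails = 7, 3
posterior = stats.beta(alpha0 + heads, beta0 + tails)

print("posterior mean:", posterior.mean())           # (4 + 7) / (4 + 7 + 2 + 3) = 0.6875
print("95% credible interval:", posterior.interval(0.95))
```

The posterior mean sits between the raw estimate 7/10 and the prior mean 4/6, and the credible interval narrows as more flips are observed, which is exactly the behavior described above.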
For coding questions, the best way to prepare is to write code every day; programming mastery requires consistent practice. However, the type of coding you do every day is different from the type of coding asked about in interviews, so it's a good idea to get some targeted practice. Before your interviews, do a few typical software engineering questions to get yourself into the problem-solving, whiteboard-coding mode.

If you haven't seen them in a while, pick up a good book on algorithms and data structures and skim it. We recommend Data Structures and Algorithms in Python by Michael T. Goodrich et al. and the classic Introduction to Algorithms by Thomas Cormen et al. We also recommend practice websites such as LeetCode, CodeSignal, and HackerRank. Those sites rank problems by difficulty; you should aim to solve medium and hard problems. Most of them have solutions available in case you want to compare your solutions to more optimal ones.

This chapter addresses the three major aspects of computer science that are covered in machine learning interviews: algorithms; complexity and numerical analysis; and data, which includes data structures.

Examples of classic algorithms you should know include sorting algorithms (quicksort, radix sort), shortest-path algorithms (Dijkstra's, A*), tree algorithms (pre-, in-, and post-order traversal), and solutions to popular problems such as the stable marriage problem and the traveling salesman problem. You will probably never have to implement them yourself since many efficient implementations already exist, but it's important to understand their underlying design principles and implementation tradeoffs in case you have to make similar decisions in your job. There are also programming techniques that you should be comfortable with, such as dynamic programming, recursion, string manipulation, matrix multiplication, regular expressions, and memory allocation. Below are some questions that you might want to go over to refresh your memory.

Given that most of the recent breakthroughs in machine learning come from bigger models that require massive memory and computational power, it's important to know not only how to implement a model but also how to scale it. To scale a model, you need to be able to estimate its memory requirements and computational cost, as well as mitigate numerical instability when training and serving it. Here are some questions that can be asked to evaluate your understanding of numerical stability and scalability.

In an academic or research setting, you likely work only with clean, readily available datasets and can therefore afford to spend most of your time on modeling. In production, it's likely that you will spend most of your time on the data pipeline. The ability to manage, process, and monitor data will make you attractive to potential employers. In your interviews, you might be asked questions that evaluate how comfortable you are working with data. At a high level, you should be familiar with reading, writing, and serializing different types of data, and you should have a go-to library for dataframe manipulation. If you want to work with big data, it doesn't hurt to familiarize yourself with distributed data management systems such as Spark and Hadoop. Beyond Python, SQL is still ubiquitous for applications that require persistent databases, and while R isn't the sexiest language, it's handy for quick data analytics.
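As a small illustration of the data-handling fluency mentioned above, here is a sketch using pandas (my own example; the Parquet step additionally assumes pyarrow or fastparquet is installed):

```python
# Sketch: reading, writing, and converting between common data formats with pandas.
# Assumes pandas is installed, plus pyarrow (or fastparquet) for Parquet support.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["VN", "US", "DE"],
    "spend": [12.5, 30.0, 7.25],
})

df.to_csv("users.csv", index=False)         # row-based, human-readable
df.to_json("users.json", orient="records")  # web- and serialization-friendly
df.to_parquet("users.parquet")              # column-based, efficient for analytics

# Column-based formats let readers load a single feature without scanning whole rows.
spend = pd.read_parquet("users.parquet", columns=["spend"])
print(spend.describe())
```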
If our world is run by data, then data structures are what keep us from descending into chaos. From the dawn of the digital age, the best minds in computer science have kept themselves up at night thinking of efficient ways to store and manipulate data. Data structures are even more important in machine learning as the field is fueled by big data. While there are classical data structures that have stood the test of time, developing new data structures and improving existing ones are never-ending battles, as new formats are introduced and new data is generated at a scale never seen before. Your familiarity with existing data structures, your understanding of how they are implemented, and your intuition about which data structure to use and when will be highly valuable.

There is a core set of data structures whose runtime complexities you should know and that you should be able to implement in at least one language. We don't have questions about those data structures here, but you should try to implement them yourself, either on a coding exercise website or locally, and compare your version with a known implementation.

You should be comfortable manipulating popular data formats such as the ubiquitous CSV format and the web- and serialization-friendly JSON format. Both CSV and JSON are examples of traditional row-based file formats: data is stored and often indexed row by row. In recent years, column-based formats have become more and more common, as they allow big data applications to quickly extract one feature from all the data points by reading only the column corresponding to that feature. Popular column-based formats used in machine learning include Parquet and ORC, discussed in the data structures section below. Row-based formats are more efficient for writing, while column-based formats are more efficient for reading. If your data is write-once-read-many, use a column-based format; if it requires regular rewriting, opt for a row-based format. For more detail on data engineering for machine learning, check out the lecture note on Data Engineering for the course Machine Learning Systems Design.

Even though deep learning seems to be all that people in the research community are talking about, most real-world problems are still being solved by classical machine learning algorithms, including k-nearest neighbors and XGBoost. In this chapter, we will cover the fundamentals essential for understanding machine learning algorithms, as well as non-deep-learning algorithms that you might find useful both in your day-to-day job and in interviews.

k-NN is a non-parametric method used for classification and regression. Given an object, the algorithm's output is computed from its k closest training examples in the feature space. Applications: anomaly detection, search, recommender systems.

k-means clustering aims to partition observations into k clusters in which each observation belongs to the cluster with the nearest mean. k-means minimizes within-cluster variances (squared Euclidean distances), not regular Euclidean distances. The algorithm doesn't guarantee convergence to the global optimum, and the result may depend on the initial clusters. As the algorithm is usually fast, it is common to run it multiple times with different starting conditions. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. The algorithm has a loose relationship to the k-nearest neighbor classifier: after obtaining clusters using k-means, we can classify new data into those clusters by applying the 1-nearest neighbor classifier to the cluster centers. Applications: vector quantization for signal processing (where k-means clustering was originally developed), cluster analysis, feature learning, topic modeling.
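Here is a minimal sketch (my own illustration, assuming scikit-learn and NumPy are installed; the data is synthetic) of the practice described above of running k-means with several random restarts and then assigning new points to the learned centers:

```python
# Sketch: k-means with multiple random restarts, then assigning new points
# to the nearest learned cluster center (a 1-NN lookup over the centers).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init controls how many times k-means is re-run with different starting centroids;
# the run with the lowest within-cluster sum of squares (inertia) is kept.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("within-cluster sum of squares:", kmeans.inertia_)

new_points = np.array([[0.0, 0.0], [5.0, 5.0]])
print("assigned clusters:", kmeans.predict(new_points))
```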
The EM algorithm is an iterative method to find maximum likelihood (MLE) or maximum a posteriori (MAP) estimates of parameters. It's useful when the model depends on unobserved latent variables and the equations can't be solved directly. Each iteration alternates between an expectation (E) step and a maximization (M) step. The EM algorithm is guaranteed to return a local optimum of the sample likelihood function. Example: Gaussian mixture models (GMM). Applications: data clustering, collaborative filtering.

A decision tree is a tree-based method that goes from observations about an object (represented in the branches) to conclusions about its target value (represented in the leaves). At its core, a decision tree is a set of nested if-else conditions. In classification trees, the target value is discrete and each leaf represents a class. In regression trees, the target value is continuous and each leaf represents the mean of the target values of all objects that end up at that leaf. Decision trees are easy to interpret and can be used to visualize decisions. However, they overfit the data they are trained on: small changes to the training set can result in significantly different tree structures, which lead to significantly different outputs.

Bagging and boosting are two popular ensembling methods, commonly used with tree-based algorithms, that can also be used with other algorithms.

Bagging, short for bootstrap aggregating, is designed to improve the stability and accuracy of ML algorithms. It reduces variance and helps to avoid overfitting. Given a dataset, instead of training one classifier on the entire dataset, you sample with replacement to create different datasets, called bootstraps, and train a classification or regression model on each of these bootstraps. Sampling with replacement ensures that each bootstrap is independent of its peers. If the problem is classification, the final prediction is decided by the majority vote of all models; for example, if 10 classifiers vote SPAM and 6 vote NOT SPAM, the final prediction is SPAM. If the problem is regression, the final prediction is the average of all models' predictions. Bagging generally improves unstable methods, such as neural networks, classification and regression trees, and subset selection in linear regression. However, it can mildly degrade the performance of stable methods such as k-nearest neighbors. (Illustration by Sirakorn.)

A random forest is an example of bagging. A random forest is a collection of decision trees constructed with both bagging and feature randomness: each tree can pick from only a random subset of features. Due to its ensembling nature, a random forest corrects for decision trees' tendency to overfit their training set. Applications: random forests are among the most widely used machine learning algorithms in the real world. They are used in banking for fraud detection, in medicine for disease prediction, in stock market analysis, etc. For more information on random forests, see Understanding Random Forest by Tony Yiu.

Boosting is a family of iterative ensemble algorithms that convert weak learners into strong ones. Each learner in the ensemble is trained on the same set of samples, but the samples are weighted differently across iterations, so future weak learners focus more on the examples that previous weak learners misclassified. (Illustration by Sirakorn.)

An example of a boosting algorithm is the Gradient Boosting Machine (GBM), which produces a prediction model, typically from weak decision trees. It builds the model in a stage-wise fashion like other boosting methods, and it generalizes them by allowing the optimization of an arbitrary differentiable loss function. XGBoost, a variant of GBM, used to be the algorithm of choice for many winning teams in machine learning competitions. It has been used in a wide range of tasks, from classification and ranking to the discovery of the Higgs boson. However, many teams have since opted for LightGBM, a distributed gradient boosting framework that supports parallel learning and generally trains faster on large datasets.
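Below is a small sketch contrasting a single decision tree with the bagging and boosting ensembles described above (my own example, assuming scikit-learn is installed; scikit-learn's GradientBoostingClassifier stands in for GBM-style boosting):

```python
# Sketch: single decision tree vs. a bagging ensemble (random forest)
# vs. a boosting ensemble (gradient boosting) on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```

On most random seeds the ensembles beat the single tree, which is exactly the variance-reduction story above.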
In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in its raw representation has to be explicitly transformed into a feature vector representation via a user-specified feature map. In contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in their raw representation. Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data points in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. Algorithms capable of operating with kernels include the kernel perceptron, support vector machines (SVM), Gaussian processes, principal component analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters, and many others. Any linear model can be turned into a non-linear model by applying the kernel trick to it: replacing its features (predictors) with a kernel function.
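To see the kernel trick in action, here is a minimal sketch (my own example, assuming scikit-learn is installed) comparing a linear SVM with an RBF-kernel SVM on data that is not linearly separable:

```python
# Sketch: a linear SVM struggles on concentric circles, while an RBF-kernel SVM
# implicitly maps the data to a space where the classes become separable.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=1000, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f"{kernel} kernel: test accuracy = {clf.score(X_test, y_test):.3f}")
```

Only the kernel argument changes; the high-dimensional feature space is never constructed explicitly.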
There are three main subfields in machine learning: speech and natural language processing (NLP), computer vision, and reinforcement learning. NLP has been successfully applied in business intelligence, voice assistants, machine translation, autocompletion and autocorrection, automated customer service, etc. Computer vision is the core technology in self-driving cars, security (surveillance cameras, facial recognition), photo and video generation (which has become terrifyingly good), and other entertainment services such as photo editing, photo filters, and face swaps. Reinforcement learning is harder to deploy because the real-world environment is so much more complex than simulation, but we've seen RL used in ads bidding optimization, unmanned aerial vehicles (such as drones), and various robotic applications such as warehouse and production robots.

A company or a team might focus on one subfield. For example, the Siri team at Apple might focus on speech and natural language understanding, while the Autopilot team at Tesla might be more interested in computer vision. However, techniques from one subfield can be used in another, and there are tasks that combine components from different subfields. There's undoubtedly value in being a world-class expert in your niche subfield, but to get there, you might need knowledge of other subfields.

2.1.3 What signals companies look for in candidates
⚠ The free project bias ⚠ In 2013, Chris Anderson, the author of The Long Tail, tweeted about the advice he received about hiring software developers: “reject anyone who doesn’t have a GitHub profile (the more active the better).”
Even though GitHub/Kaggle in particular, and past projects in general, seem meritocratic, we have to be mindful of candidates' circumstances when looking at them. Not everyone can afford to contribute to open-source projects or enter Kaggle competitions. If we place too much importance on voluntary activities, we accidentally punish candidates from less privileged backgrounds: those who work long hours, have too many responsibilities at home, or face online harassment for who they are.
One group that suffers if hiring decisions are made based on open-source contributions is women. According to 2016 research by the National Center for Women & Information Technology, the percentages of women in various software engineering occupations are 21
Some hiring managers are aware of this privilege bias. Jeremy Howard, an ex-president of Kaggle and co-founder of fast.ai, responded to my survey on Twitter that he evaluates candidates' achievements with respect to their backgrounds: "I look for people that have achieved an unusually high level of capability despite limited opportunities or significant constraints. It's been the best hiring signal over many years and companies for me."
Tip: Sell yourself. Highlight your qualities.
The hiring process of most tech companies, including and especially the biggest ones, is far from perfect. It's riddled with biases and loopholes. Yet it's still used because of legacy and bureaucracy. Until a better process comes along, the best candidates can do is understand the signals employers look for and maximize their visibility. On average, recruiters spend only 7.4 seconds on a resume. If you're a great ML engineer but can't signal to recruiters that you're amazing in those 7.4 seconds, you're out.
2.2 Interview pipeline
2.2.1 Common interview formats
2.2.2 Alternative interview formats
2.2.3 Interviews at big companies vs. at small companies
2.2.4 Interviews for internships vs. for full-time positions
2.3 Types of questions
2.3.1 Behavioral questions
2.3.1.1 Background and resume
2.3.1.2 Interests
2.3.1.3 Communication
2.3.1.4 Personality
2.3.2 Questions to ask your interviewers
⚠ Never ask your interviewers about compensation ⚠ The exception is when the interviewers explicitly bring up the topic. Some hiring managers consider asking about compensation a red flag, as it signals that the candidate only cares about money and will jump ship as soon as a better offer comes along.
2.3.3 Bad interview questions
2.4 Red flags
2.5 Timeline
2.6 Understanding your odds
Chapter 3. After an offer
3.1 Compensation package
3.1.1 Base salary
3.1.2 Equity grants
3.1.3 Bonuses
3.1.4 Compensation packages at different levels
3.2 Negotiation
3.2.1 Compensation expectations
3.3 Career progression
⚠ This section only applies to big companies. At startups, hierarchies are flat and levels are not well-defined. ⚠
Chapter 4. Where to start
4.1 How long do I need for my job search?
⚠ How to become a machine learning expert in 3 months ⚠ You can't. Becoming an expert in anything takes years, if not decades. Steer clear of anyone who claims that they can give you a shortcut to becoming a machine learning expert. At best, they'll teach you bad machine learning. At worst, it's a scam.
Peter Norvig, director of search at Google, wrote a wonderful blog post on how long it takes to learn programming: Teach Yourself Programming in Ten Years. His advice is applicable to ML.
4.2 How other people did it
4.3 Resources
4.3.1 Courses
4.3.2 Books & articles
4.3.3 Other resources
4.4 Do’s and don’ts for ML interviews
4.4.1 Do’s
4.4.2 Don’ts
Part II: Questions
Tip: Strategy for "definition" questions
When asked to give the definition of or explain a technique, always start with the motivation for that technique. For example, if asked to explain LSTMs for recurrent neural networks, you should first bring up the problems that arise in normal RNNs and how LSTMs address those problems.
Chapter 5. Math
If your day-to-day work mostly consists of calling keras.fit or cloning existing implementations, you probably don't need math. There are many courses and books that promise you machine learning mastery with little or no math at all. If that's what you're looking for, feel free to skip this chapter.
Notation
5.1 Algebra and (little) calculus
5.1.1 Vectors
5.1.2 Matrices
5.1.3 Dimensionality reduction
In case you need a refresher on PCA, here's an explanation without any math.
Assume that your grandma likes wine and would like to find characteristics that best describe wine bottles sitting in her cellar. There are many characteristics we can use to describe a bottle of wine including age, price, color, alcoholic content, sweetness, acidity, etc. Many of these characteristics are related and therefore redundant. Is there a way we can choose fewer characteristics to describe our wine and answer questions such as: which two bottles of wine differ the most?
PCA is a technique to construct new characteristics out of the existing characteristics. For example, a new characteristic might be computed as age - acidity + price or something like that, which we call a linear combination.
To differentiate our wines, we’d like to find characteristics that strongly differ across wines. If we find a new characteristic that is the same for most of the wines, then it wouldn’t be very useful. PCA looks for characteristics that show as much variation across wines as possible, out of all linear combinations of existing characteristics. These constructed characteristics are principal components of our wines.
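Continuing the wine example, here is a minimal sketch (my own illustration, assuming scikit-learn and NumPy are installed; the feature values are made up) of constructing principal components from correlated characteristics:

```python
# Sketch: PCA on a tiny, made-up wine dataset. Each principal component is a
# linear combination of the original characteristics, chosen to capture as much
# variance across wines as possible.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# columns: age (years), price ($), acidity (pH), alcohol (%)
wines = np.array([
    [2,   10, 3.3, 11.5],
    [10,  80, 3.0, 13.5],
    [5,   25, 3.2, 12.0],
    [15, 120, 2.9, 14.0],
    [3,   15, 3.4, 11.0],
])

X = StandardScaler().fit_transform(wines)   # put characteristics on the same scale
pca = PCA(n_components=2).fit(X)

print("variance explained:", pca.explained_variance_ratio_)
print("component weights (linear combinations):\n", pca.components_)
```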
If you want to see a more detailed, intuitive explanation of PCA with visualization, check out amoeba's answer on StackOverflow. This is possibly the best PCA explanation I've ever read.
5.1.4 Calculus and convex optimization
On convex optimization
On Hessian matrix
5.2 Probability and statistics
5.2.1 Probability
5.2.1.1 Basic concepts to review
Random variable
Probability distribution
Normal random variable
Categorical distribution
Binomial random variable
Multinomial random variable
Poisson random variable
Poisson vs binomial according to Data Science Central:
If your question has an average probability of an event happening per unit (i.e. per unit of time, cycle, event) and you want to find the probability of a certain number of events happening in a period of time (or a number of events), then use the Poisson Distribution.
If you are given an exact probability and you want to find the probability of the event happening a certain number of times out of x (i.e. 10 times out of 100, or 99 times out of 1000), use the Binomial Distribution formula.
Geometric random variable
Beta random variable
Exponential family of distributions
Marginal distribution, joint distribution, conditional distribution
5.2.1.2 Questions
5.2.2 Stats
Hint: Check out the curse of big data.
Chapter 6. Computer Science
What programming language to use during interviews
If you’re comfortable with only one language, feel no qualms using it during interviews. If you’re comfortable with multiple languages, listen to the question first before choosing a language. It shows that you understand that different languages are built for different purposes: a language suitable for one task might not be optimal for another.
Put all the languages you know on your resume, but don't feel the need to show them off during interviews. Employers would rather hire someone really good at one language (it means you can learn to be good at other languages) than someone mediocre at multiple languages.
Based on the language you choose, interviewers might infer what you’re interested in. For example, a hiring manager told me that if a candidate chooses to implement data structures in Python, he knows that this candidate doesn’t focus on performance.
Python has become the de facto lingua franca of machine learning — most frameworks have Python APIs and most open-source projects are written in Python. It’s a useful language to know and most interviewers probably expect it, but don’t feel like you have to use it during interviews. If you’re more comfortable with another language, use it. Writing a complex model in another language, say Swift, is a lot more impressive and helps you stand out.
It also helps if you know at least one performance-oriented language such as C++ or Go. C++ is more popular and has more support, but Go is easier to learn and manage. Since more and more machine learning models are being served as web applications, many startups look for machine learning engineers with front-end skills. Fluency in TypeScript or React is a huge plus.
6.1 Algorithms
In C, you use malloc() to allocate memory and free() to free a memory block.
Given a string representing an arithmetic expression, such as 10 * 4 + (4 + 3) / (2 - 1), calculate its value. It should support the four operators +, -, *, /, and the brackets ().
Justify alignment is an option that spaces your text so that it aligns with both the left and right margins. Write a function to print out a given text line by line (except the last line) in justify-alignment format. The length of a line should be configurable.
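Here is one possible greedy sketch for the justify-alignment question above (my own illustration; the function name, default width, and example text are arbitrary):

```python
# Sketch: greedy text justification. Words are packed into lines of at most
# `width` characters; in every line except the last, spaces are distributed as
# evenly as possible, with leftover spaces going to the leftmost gaps.
def justify(text: str, width: int = 16) -> str:
    words = text.split()
    lines, current, current_len = [], [], 0

    for word in words:
        # len(current) also counts the minimum one space between existing words
        if current and current_len + len(current) + len(word) > width:
            lines.append(current)
            current, current_len = [], 0
        current.append(word)
        current_len += len(word)
    if current:
        lines.append(current)

    out = []
    for i, line_words in enumerate(lines):
        if i == len(lines) - 1 or len(line_words) == 1:
            out.append(" ".join(line_words))  # last line (or single word): left-aligned
            continue
        gaps = len(line_words) - 1
        total_spaces = width - sum(len(w) for w in line_words)
        base, extra = divmod(total_spaces, gaps)
        pieces = [w + " " * (base + (1 if j < extra else 0))
                  for j, w in enumerate(line_words[:-1])]
        pieces.append(line_words[-1])
        out.append("".join(pieces))
    return "\n".join(out)


print(justify("The quick brown fox jumps over the lazy dog", width=16))
```

This greedy packing is a common baseline; handling words longer than the line width and other edge cases would need extra care.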
6.2 Complexity and numerical analysis
6.3 Data
pandas is popular for general data applications, and dask is a good option if you want something GPU-compatible. You should be comfortable with at least one visualization library, such as seaborn, matplotlib, Tableau, or ggplot.
6.3.1 Data structures
pandas and dask are optimized for column-based operations. The two common column-based file formats are Parquet, championed by Apache Hadoop, and ORC, championed by Apache Hive.
Chapter 7. Machine learning workflows
7.1 Basics
7.2 Sampling and creating training data
Hint: You might want to clarify what oversampling means here. Oversampling can be as simple as duplicating samples from the rare class.
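As a minimal sketch of that duplication approach (my own illustration, assuming NumPy and scikit-learn are installed):

```python
# Sketch: naive random oversampling of the rare class by duplicating its samples
# (sampling with replacement) until the two classes are balanced.
import numpy as np
from sklearn.utils import resample

X = np.random.randn(1000, 5)
y = np.array([0] * 950 + [1] * 50)   # class 1 is rare

X_rare, y_rare = X[y == 1], y[y == 1]
X_up, y_up = resample(X_rare, y_rare, replace=True, n_samples=950, random_state=0)

X_balanced = np.vstack([X[y == 0], X_up])
y_balanced = np.concatenate([y[y == 0], y_up])
print("class counts after oversampling:", np.bincount(y_balanced))
```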
7.3 Objective functions, metrics, and evaluation
Chapter 8. Machine learning algorithms
Tip To refresh your knowledge of different ML algorithms, it’s a good idea to look at winning solutions for recent Kaggle competitions. For a list of machine learning algorithms and how they are used in winning solutions on Kaggle, check out Data Science Glossary on Kaggle.
8.1 Classical machine learning
8.1.1 Overview: Basic algorithms
8.1.1.1 k-nearest neighbor (k-NN)
8.1.1.2 k-means clustering
8.1.1.3 EM (expectation-maximization) algorithm
8.1.1.4 Tree-based methods
8.1.1.5 Bagging and boosting
8.1.1.5.1 Bagging
8.1.1.5.2 Boosting
8.1.1.6 Kernel methods
8.1.2 Questions
Image from Mohamad Ghassany’s course on Machine Learning
8.2 Deep learning architectures and applications
8.2.1 Natural language processing
D1: The duck loves to eat the worm
D2: The worm doesn't like the early bird
D3: The bird loves to get up early to get the worm
D4: The bird gets the worm from the early duck
D5: The duck and the birds are so different from each other but one thing they have in common is that they both get the worm
8.2.2 Computer vision
8.2.3 Reinforcement learning
Tip To refresh your knowledge of deep RL, check out Spinning Up in Deep RL (OpenAI)
8.2.4 Other
8.3 Training neural networks
Tip For more tips on training neural networks, check out: