\ No newline at end of file
diff --git a/Cheat-Sheets/Django/index.html b/Cheat-Sheets/Django/index.html
index 3eda98e..e4ba176 100644
--- a/Cheat-Sheets/Django/index.html
+++ b/Cheat-Sheets/Django/index.html
@@ -1 +1 @@
- Django - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/Flask/index.html b/Cheat-Sheets/Flask/index.html
index 63677e2..d484c85 100644
--- a/Cheat-Sheets/Flask/index.html
+++ b/Cheat-Sheets/Flask/index.html
@@ -1 +1 @@
- Flask - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/Hypothesis-Tests/index.html b/Cheat-Sheets/Hypothesis-Tests/index.html
new file mode 100644
index 0000000..f592682
--- /dev/null
+++ b/Cheat-Sheets/Hypothesis-Tests/index.html
@@ -0,0 +1,171 @@
+ Hypothesis Tests in Python (Cheat Sheet) - Data Science Interview preparation
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.
A few notes:
When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated.
Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis.
In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to name two examples.
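The degrees-of-freedom correction mentioned in the last note can be sketched with SciPy's `ttest_ind`, which applies Welch's correction when `equal_var=False`. This is a minimal illustration rather than a recommended workflow; the two samples are reused from the Levene's test example on this page, and the variable names are only for illustration.

```python
from scipy.stats import ttest_ind

# Samples reused from the Levene's test example on this page.
a = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
b = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]

# equal_var=True is the classic Student's t-test (pooled variance).
stat_s, p_s = ttest_ind(a, b, equal_var=True)
# equal_var=False applies Welch's correction to the degrees of freedom.
stat_w, p_w = ttest_ind(a, b, equal_var=False)

print('Student: stat=%.3f, p=%.3f' % (stat_s, p_s))
print('Welch:   stat=%.3f, p=%.3f' % (stat_w, p_w))
```

With equal sample sizes, as here, the two statistics coincide and only the degrees of freedom (and therefore the p-value) change.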
Normality Tests
This section lists statistical tests that you can use to check if your data has a Gaussian distribution.
Shapiro-Wilk Test
Tests whether a data sample has a Gaussian (normal) distribution.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Interpretation
H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.
Python Code
# Example of the Shapiro-Wilk Normality Test
+from scipy.stats import shapiro
+data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+stat, p = shapiro(data)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably Gaussian')
+else:
+    print('Probably not Gaussian')
+
D'Agostino's K^2 Test
Tests whether a data sample has a Gaussian (normal) distribution.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Interpretation
H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.
Python Code
# Example of the D'Agostino's K^2 Normality Test
+from scipy.stats import normaltest
+data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+stat, p = normaltest(data)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably Gaussian')
+else:
+    print('Probably not Gaussian')
+
Anderson-Darling Test
Tests whether a data sample has a Gaussian (normal) distribution.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Interpretation
H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.
Python Code
# Example of the Anderson-Darling Normality Test
+from scipy.stats import anderson
+data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+result = anderson(data)
+print('stat=%.3f' % (result.statistic))
+for i in range(len(result.critical_values)):
+    sl, cv = result.significance_level[i], result.critical_values[i]
+    if result.statistic < cv:
+        print('Probably Gaussian at the %.1f%% level' % (sl))
+    else:
+        print('Probably not Gaussian at the %.1f%% level' % (sl))
+
Chi-Squared Test
Tests whether two categorical variables are related or independent.
Assumptions
Observations used in the calculation of the contingency table are independent.
An expected count of 5 or more in each cell of the contingency table (a common rule of thumb).
Interpretation
H0: the two samples are independent.
H1: there is a dependency between the samples.
Python Code
# Example of the Chi-Squared Test
+from scipy.stats import chi2_contingency
+table = [[10, 20, 30], [6, 9, 17]]
+stat, p, dof, expected = chi2_contingency(table)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably independent')
+else:
+    print('Probably dependent')
+
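The cell-count assumption can be checked against the expected frequencies that `chi2_contingency` already returns. The sketch below reuses the table from the example above; `MIN_EXPECTED = 5` is an assumed rule-of-thumb threshold chosen for illustration, not a value defined by SciPy.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = [[10, 20, 30], [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)

# MIN_EXPECTED is an assumed rule-of-thumb threshold, not part of SciPy.
MIN_EXPECTED = 5
smallest = np.min(expected)
print('dof=%d, smallest expected count=%.2f' % (dof, smallest))
if smallest >= MIN_EXPECTED:
    print('Expected counts look large enough for the chi-squared approximation')
else:
    print('Some expected counts are small; interpret the p-value with caution')
```

Checking the expected counts rather than the observed ones matches how the chi-squared approximation is usually justified.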
This section lists statistical tests that you can use to check if a time series is stationary or not.
Augmented Dickey-Fuller Unit Root Test
Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive.
Assumptions
Observations are temporally ordered.
Interpretation
H0: a unit root is present (series is non-stationary).
H1: a unit root is not present (series is stationary).
Python Code
# Example of the Augmented Dickey-Fuller unit root test
+from statsmodels.tsa.stattools import adfuller
+data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+stat, p, lags, obs, crit, t = adfuller(data)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably not Stationary')
+else:
+    print('Probably Stationary')
+
Kwiatkowski-Phillips-Schmidt-Shin Test
Tests whether a time series is trend stationary or not.
Assumptions
Observations are temporally ordered.
Interpretation
H0: the time series is trend-stationary.
H1: the time series is not trend-stationary.
Python Code
# Example of the Kwiatkowski-Phillips-Schmidt-Shin test
+from statsmodels.tsa.stattools import kpss
+data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+stat, p, lags, crit = kpss(data)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably Stationary')
+else:
+    print('Probably not Stationary')
+
This section lists statistical tests that you can use to compare data samples.
Student’s t-test
Tests whether the means of two independent samples are significantly different.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation
H0: the means of the samples are equal.
H1: the means of the samples are unequal.
Python Code
# Example of the Student's t-test
+from scipy.stats import ttest_ind
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+stat, p = ttest_ind(data1, data2)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Paired Student's t-test
Tests whether the means of two paired samples are significantly different.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Observations across each sample are paired.
Interpretation
H0: the means of the samples are equal.
H1: the means of the samples are unequal.
Python Code
# Example of the Paired Student's t-test
+from scipy.stats import ttest_rel
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+stat, p = ttest_rel(data1, data2)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Analysis of Variance Test (ANOVA)
Tests whether the means of two or more independent samples are significantly different.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation
H0: the means of the samples are equal.
H1: one or more of the means of the samples are unequal.
Python Code
# Example of the Analysis of Variance Test
+from scipy.stats import f_oneway
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
+stat, p = f_oneway(data1, data2, data3)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Non-parametric tests make no assumptions about the parameters of the population being studied. They do not rely on a fixed set of population parameters, nor do they require the data to follow a particular distribution (normal distribution, etc.).
Mann-Whitney U Test
Tests whether the distributions of two independent samples are equal or not.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Interpretation
H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.
Python Code
# Example of the Mann-Whitney U Test
+from scipy.stats import mannwhitneyu
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+stat, p = mannwhitneyu(data1, data2)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Wilcoxon Signed-Rank Test
Tests whether the distributions of two paired samples are equal or not.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
Interpretation
H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.
Python Code
# Example of the Wilcoxon Signed-Rank Test
+from scipy.stats import wilcoxon
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+stat, p = wilcoxon(data1, data2)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Kruskal-Wallis H Test
Tests whether the distributions of two or more independent samples are equal or not.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Interpretation
H0: the distributions of all samples are equal.
H1: the distributions of one or more samples are not equal.
Python Code
# Example of the Kruskal-Wallis H Test
+from scipy.stats import kruskal
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+stat, p = kruskal(data1, data2)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
Friedman Test
Tests whether the distributions of two or more paired samples are equal or not.
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
Interpretation
H0: the distributions of all samples are equal.
H1: the distributions of one or more samples are not equal.
Python Code
# Example of the Friedman Test
+from scipy.stats import friedmanchisquare
+data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
+data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
+data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
+stat, p = friedmanchisquare(data1, data2, data3)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same distribution')
+else:
+    print('Probably different distributions')
+
This section lists tests that you can use to assess the equality of variance between two or more samples.
Levene's test
Levene’s test is used to assess the equality of variance between two or more different samples.
Assumptions
The samples from the populations under consideration are independent.
The populations under consideration are approximately normally distributed.
Interpretation
H0: all the sample variances are equal.
H1: at least one variance is different from the rest.
Python Code
# Example of the Levene's test
+from scipy.stats import levene
+a = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
+b = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
+c = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]
+stat, p = levene(a, b, c)
+print('stat=%.3f, p=%.3f' % (stat, p))
+if p > 0.05:
+    print('Probably the same variances')
+else:
+    print('Probably at least one variance is different from the rest')
+
\ No newline at end of file
diff --git a/Cheat-Sheets/Keras/index.html b/Cheat-Sheets/Keras/index.html
index 1df7534..8739019 100644
--- a/Cheat-Sheets/Keras/index.html
+++ b/Cheat-Sheets/Keras/index.html
@@ -1 +1 @@
- Keras - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/NumPy/index.html b/Cheat-Sheets/NumPy/index.html
index 1a76492..95fd17d 100644
--- a/Cheat-Sheets/NumPy/index.html
+++ b/Cheat-Sheets/NumPy/index.html
@@ -1 +1 @@
- NumPy - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/Pandas/index.html b/Cheat-Sheets/Pandas/index.html
index 2df351e..2f5df21 100644
--- a/Cheat-Sheets/Pandas/index.html
+++ b/Cheat-Sheets/Pandas/index.html
@@ -1 +1 @@
- Pandas - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/PySpark/index.html b/Cheat-Sheets/PySpark/index.html
index 3e797f5..0ae4d88 100644
--- a/Cheat-Sheets/PySpark/index.html
+++ b/Cheat-Sheets/PySpark/index.html
@@ -1 +1 @@
- PySpark - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/PyTorch/index.html b/Cheat-Sheets/PyTorch/index.html
index 3d1dcc1..925ce73 100644
--- a/Cheat-Sheets/PyTorch/index.html
+++ b/Cheat-Sheets/PyTorch/index.html
@@ -1 +1 @@
- PyTorch - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/Python/index.html b/Cheat-Sheets/Python/index.html
index c18dcb4..a326e4a 100644
--- a/Cheat-Sheets/Python/index.html
+++ b/Cheat-Sheets/Python/index.html
@@ -1 +1 @@
- Python - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/SQL/index.html b/Cheat-Sheets/SQL/index.html
index 1d5770d..a3d0df8 100644
--- a/Cheat-Sheets/SQL/index.html
+++ b/Cheat-Sheets/SQL/index.html
@@ -1 +1 @@
- SQL - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/Sk-learn/index.html b/Cheat-Sheets/Sk-learn/index.html
index 8c53641..33df924 100644
--- a/Cheat-Sheets/Sk-learn/index.html
+++ b/Cheat-Sheets/Sk-learn/index.html
@@ -1 +1 @@
- Scikit Learn - Data Science Interview preparation
\ No newline at end of file
diff --git a/Cheat-Sheets/tensorflow/index.html b/Cheat-Sheets/tensorflow/index.html
index b66ced2..4c8f81b 100644
--- a/Cheat-Sheets/tensorflow/index.html
+++ b/Cheat-Sheets/tensorflow/index.html
@@ -1 +1 @@
- TensorFlow - Data Science Interview preparation
\ No newline at end of file
diff --git a/Deploying-ML-models/deploying-ml-models/index.html b/Deploying-ML-models/deploying-ml-models/index.html
index 9cc7326..b137648 100644
--- a/Deploying-ML-models/deploying-ml-models/index.html
+++ b/Deploying-ML-models/deploying-ml-models/index.html
@@ -1,19 +1,19 @@
- Home - Data Science Interview preparation
This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking for and preparing for data science opportunities.
Not only this, the platform will also serve as a one-stop destination for all your needs like tutorials, online materials, etc.
This platform is maintained by you! 🤗 You can help us by answering or improving existing questions as well as by sharing any new questions that you faced during your interviews.
Contribute to the platform
Contribution in any form will be deeply appreciated. 🙏
Add questions
❓ Add your questions here. Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction.
🤝 Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you.
Add answers/topics
📝 These are the answers/topics that need your help at the moment
🛠 You can also solve existing issues on GitHub and create a pull request.
Say Thanks
😊 If this platform helped you in any way, it would be great if you could share it with others.
Check out this 👇 platform 👇 for data science content:
👉 https://singhsidhukuldeep.github.io/data-science-interview-prep/ 👈
#data-science #machine-learning #interview-preparation
-
You can also star the repository on GitHub and watch it for any updates.
Features
🎨 Beautiful: The design is built on top of popular libraries like MkDocs and Material, which allow the platform to be responsive and to work on all sorts of devices – from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space.
🧐 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search – server-less – is fast and accurate in responses to any of the queries.
🙌 Accessible:
Easy to use: 👌 The website is hosted on github-pages and is free and open to use for over 40 million GitHub users in 100+ countries.
Easy to contribute: 🤝 The website embodies the concept of collaboration to the letter, allowing anyone to add to or improve the content. To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML.
Setup
No setup is required for usage of the platform
Important: It is strongly advised to use a virtual environment and not to change anything in gh-pages
Linux Systems
python3 -m venv ./venv
+
\ No newline at end of file
+
To install the latest
pip3 install mkdocs
+pip3 install mkdocs-material
+
Useful Commands
mkdocs serve - Start the live-reloading docs server.
mkdocs build - Build the documentation site.
mkdocs -h - Print help message and exit.
mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally.
mkdocs new [dir-name] - Create a new project. No need to create a new project
As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. 🤓
This doesn't mean that such feature won't be added in the future. "Never say Never"
But as of now there is neither plan nor data to do so. 😢
Why is this platform free? 🤗
Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor. If you want to help you can contribute here.
If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 😇
😎 The full list of all the contributors is available here
Current Status
Last update: July 1, 2020
\ No newline at end of file
diff --git a/Interview-Questions/Natural-Language-Processing/index.html b/Interview-Questions/Natural-Language-Processing/index.html
index 6c1d544..2e4ad9e 100644
--- a/Interview-Questions/Natural-Language-Processing/index.html
+++ b/Interview-Questions/Natural-Language-Processing/index.html
@@ -1 +1 @@
- NLP Questions - Data Science Interview preparation
\ No newline at end of file
diff --git a/Interview-Questions/Probability/index.html b/Interview-Questions/Probability/index.html
index 914a687..4462d37 100644
--- a/Interview-Questions/Probability/index.html
+++ b/Interview-Questions/Probability/index.html
@@ -1,4 +1,4 @@
- Probability Questions - Data Science Interview preparation
+ Probability Questions - Data Science Interview preparation
Consider a fair 6-sided die. Your aim is to get the highest score you can in at most 3 rolls.
A score is defined as the number that appears on the face of the die facing up after the roll. You can roll at most 3 times, but after every roll it is up to you to decide whether you want to roll again.
The last score will be counted as your final score.
Find the average score if you roll the die only once.
Find the average score that you can get with at most 3 rolls.
If the die is fair, why is the average score for at most 3 rolls and 1 roll not the same?
The average score if you roll the die only once is 3.5.
For at most 3 rolls, let's try backtracking. Let's say you just did your second roll and you have to decide whether to do your 3rd roll!
We just found out that if we roll the die once, on average we can expect a score of 3.5. So we will only roll the 3rd time if the score on the 2nd roll is less than 3.5, i.e. (1, 2 or 3).
Possibilities
| 2nd roll score | Probability | 3rd roll score | Probability |
| 1 | ⅙ | 3.5 | ⅙ |
| 2 | ⅙ | 3.5 | ⅙ |
| 3 | ⅙ | 3.5 | ⅙ |
| 4 | ⅙ | NA | - |
| 5 | ⅙ | NA | - |
| 6 | ⅙ | NA | - |
NA: we won't roll a 3rd time if we get a score >3 on the 2nd roll.
So if we had 2 rolls, the average score would be:
[We roll again if current score is less than 3.5]
(3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6)
+
(4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again]
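The backtracking argument above can be written as a short backward induction. This is a minimal sketch; the function name `expected_score` and its parameters are hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def expected_score(rolls_left, sides=6):
    """Expected final score when we may stop after any roll (hypothetical helper)."""
    faces = range(1, sides + 1)
    # With a single roll left we must accept whatever comes up.
    value = Fraction(sum(faces), sides)
    for _ in range(rolls_left - 1):
        # Keep a face only if it beats the expected value of rolling on.
        value = sum(max(Fraction(face), value) for face in faces) / sides
    return value


for n in (1, 2, 3):
    print(n, expected_score(n), float(expected_score(n)))
```

For two rolls this reproduces the sum above, 1.75 + 2.5 = 4.25. With three rolls, the first roll is kept only if it beats 4.25 (a 5 or a 6), which gives 14/3, about 4.67.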
diff --git a/Interview-Questions/System-design/index.html b/Interview-Questions/System-design/index.html
index c6d8d2a..f44cad1 100644
--- a/Interview-Questions/System-design/index.html
+++ b/Interview-Questions/System-design/index.html
@@ -1 +1 @@
- System Design - Data Science Interview preparation
\ No newline at end of file
diff --git a/Interview-Questions/data-structures-algorithms/index.html b/Interview-Questions/data-structures-algorithms/index.html
index c741448..8b2eef1 100644
--- a/Interview-Questions/data-structures-algorithms/index.html
+++ b/Interview-Questions/data-structures-algorithms/index.html
@@ -1,4 +1,4 @@
- Data Structure and Algorithms - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/DBSCAN/index.html b/Machine-Learning/DBSCAN/index.html
index 69f75e0..455b11f 100644
--- a/Machine-Learning/DBSCAN/index.html
+++ b/Machine-Learning/DBSCAN/index.html
@@ -1 +1 @@
- DBSCAN - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/Decision Trees/index.html b/Machine-Learning/Decision Trees/index.html
index 0cfe143..f35c080 100644
--- a/Machine-Learning/Decision Trees/index.html
+++ b/Machine-Learning/Decision Trees/index.html
@@ -1 +1 @@
- Decision Trees - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/K-means clustering/index.html b/Machine-Learning/K-means clustering/index.html
index 8c3617b..e22efeb 100644
--- a/Machine-Learning/K-means clustering/index.html
+++ b/Machine-Learning/K-means clustering/index.html
@@ -1 +1 @@
- K means clustering - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/Linear Regression/index.html b/Machine-Learning/Linear Regression/index.html
index 359b32d..b6abaf8 100644
--- a/Machine-Learning/Linear Regression/index.html
+++ b/Machine-Learning/Linear Regression/index.html
@@ -1 +1 @@
- Linear Regression - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/Loss Function MAE, RMSE/index.html b/Machine-Learning/Loss Function MAE, RMSE/index.html
index ad3aed5..9beca73 100644
--- a/Machine-Learning/Loss Function MAE, RMSE/index.html
+++ b/Machine-Learning/Loss Function MAE, RMSE/index.html
@@ -1 +1 @@
- Loss Function MAE, RMSE - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/Normal Distribution/index.html b/Machine-Learning/Normal Distribution/index.html
index 01215ad..469242a 100644
--- a/Machine-Learning/Normal Distribution/index.html
+++ b/Machine-Learning/Normal Distribution/index.html
@@ -1 +1 @@
- Normal Distribution - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/PCA/index.html b/Machine-Learning/PCA/index.html
index b520c65..2be8ca9 100644
--- a/Machine-Learning/PCA/index.html
+++ b/Machine-Learning/PCA/index.html
@@ -1 +1 @@
- PCA - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/Random Forest/index.html b/Machine-Learning/Random Forest/index.html
index 22f88e0..563f388 100644
--- a/Machine-Learning/Random Forest/index.html
+++ b/Machine-Learning/Random Forest/index.html
@@ -1 +1 @@
- Random Forest - Data Science Interview preparation
\ No newline at end of file
diff --git a/Machine-Learning/kNN/index.html b/Machine-Learning/kNN/index.html
index 743edfb..9202d55 100644
--- a/Machine-Learning/kNN/index.html
+++ b/Machine-Learning/kNN/index.html
@@ -1 +1 @@
- kNN - Data Science Interview preparation
\ No newline at end of file
diff --git a/Online-Material/Online-Material-for-Learning/index.html b/Online-Material/Online-Material-for-Learning/index.html
index f3a84b8..7957d62 100644
--- a/Online-Material/Online-Material-for-Learning/index.html
+++ b/Online-Material/Online-Material-for-Learning/index.html
@@ -1 +1 @@
- Online Study Material - Data Science Interview preparation
\ No newline at end of file
diff --git a/Online-Material/popular-resouces/index.html b/Online-Material/popular-resouces/index.html
index aff2064..90d007b 100644
--- a/Online-Material/popular-resouces/index.html
+++ b/Online-Material/popular-resouces/index.html
@@ -1 +1 @@
- Popular Blogs - Data Science Interview preparation
\ No newline at end of file
diff --git a/as-fast-as-possible/Deep-CV/index.html b/as-fast-as-possible/Deep-CV/index.html
index 2765e28..d9b42a3 100644
--- a/as-fast-as-possible/Deep-CV/index.html
+++ b/as-fast-as-possible/Deep-CV/index.html
@@ -1 +1 @@
- Deep Computer Vision - Data Science Interview preparation
\ No newline at end of file
diff --git a/as-fast-as-possible/Deep-NLP/index.html b/as-fast-as-possible/Deep-NLP/index.html
index d7769d2..8d4869c 100644
--- a/as-fast-as-possible/Deep-NLP/index.html
+++ b/as-fast-as-possible/Deep-NLP/index.html
@@ -1 +1 @@
- Deep Natural Language Processing - Data Science Interview preparation
\ No newline at end of file
diff --git a/as-fast-as-possible/Neural-Networks/index.html b/as-fast-as-possible/Neural-Networks/index.html
index 388e404..52553bf 100644
--- a/as-fast-as-possible/Neural-Networks/index.html
+++ b/as-fast-as-possible/Neural-Networks/index.html
@@ -1 +1 @@
- Neural Networks - Data Science Interview preparation
\ No newline at end of file
diff --git a/as-fast-as-possible/TF2-Keras/index.html b/as-fast-as-possible/TF2-Keras/index.html
index 6e3a86e..7bc5dbf 100644
--- a/as-fast-as-possible/TF2-Keras/index.html
+++ b/as-fast-as-possible/TF2-Keras/index.html
@@ -1 +1 @@
- Tensorflow 2 with Keras - Data Science Interview preparation
\ No newline at end of file
diff --git a/as-fast-as-possible/index.html b/as-fast-as-possible/index.html
index 7b58214..331d6c2 100644
--- a/as-fast-as-possible/index.html
+++ b/as-fast-as-possible/index.html
@@ -1 +1 @@
- Introduction - Data Science Interview preparation
\ No newline at end of file
diff --git a/index.html b/index.html
index fe20ebe..9024b91 100644
--- a/index.html
+++ b/index.html
@@ -1,4 +1,4 @@
- Data Science - Data Science Interview preparation
This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking for and preparing for data science opportunities.
Not only this, the platform will also serve as a one-stop destination for all your needs, such as tutorials and online materials.
This platform is maintained by you! 🤗 You can help us by answering or improving existing questions, as well as by sharing any new questions that you faced during your interviews.
Contribute to the platform
Contribution in any form will be deeply appreciated. 🙏
Add questions
❓ Add your questions here. Please provide a detailed description so that your fellow contributors can understand your questions and answer them to your satisfaction.
🤝 Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you.
Add answers/topics
📝 These are the answers/topics that need your help at the moment
This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking for and preparing for data science opportunities.
Not only this, the platform will also serve as a one-stop destination for all your needs, such as tutorials and online materials.
This platform is maintained by you! 🤗 You can help us by answering or improving existing questions, as well as by sharing any new questions that you faced during your interviews.
Contribute to the platform
Contribution in any form will be deeply appreciated. 🙏
Add questions
❓ Add your questions here. Please provide a detailed description so that your fellow contributors can understand your questions and answer them to your satisfaction.
🤝 Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you.
Add answers/topics
📝 These are the answers/topics that need your help at the moment
🛠 You can also solve existing issues on GitHub and create a pull request.
Say Thanks
😊 If this platform helped you in any way, it would be great if you could share it with others.
Check out this 👇 platform 👇 for data science content:
👉 https://singhsidhukuldeep.github.io/data-science-interview-prep/ 👈
-
You can also star the repository on GitHub and watch it for updates
Features
🎨 Beautiful: The design is built on popular libraries such as MkDocs and Material for MkDocs, which make the platform responsive on all sorts of devices – from mobile phones to wide-screens. The underlying fluid layout adapts to the available screen space.
🧐 Searchable: Almost magically, all the content on the website is searchable without any further ado. The built-in search – server-less – responds to queries quickly and accurately.
🙌 Accessible:
Easy to use: 👌 The website is hosted on GitHub Pages and is free and open to the more than 40 million GitHub users in 100+ countries.
Easy to contribute: 🤝 The website embodies the concept of collaboration to the letter, allowing anyone to add or improve content. To make contributing easy, everything is written in Markdown and then compiled to HTML.
Setup
No setup is required for usage of the platform
Important: It is strongly advised to use a virtual environment and not to change anything in gh-pages
Linux Systems
python3 -m venv ./venv
+
You can also star the repository on GitHub and watch it for updates
Features
🎨 Beautiful: The design is built on popular libraries such as MkDocs and Material for MkDocs, which make the platform responsive on all sorts of devices – from mobile phones to wide-screens. The underlying fluid layout adapts to the available screen space.
🧐 Searchable: Almost magically, all the content on the website is searchable without any further ado. The built-in search – server-less – responds to queries quickly and accurately.
🙌 Accessible:
Easy to use: 👌 The website is hosted on GitHub Pages and is free and open to the more than 40 million GitHub users in 100+ countries.
Easy to contribute: 🤝 The website embodies the concept of collaboration to the letter, allowing anyone to add or improve content. To make contributing easy, everything is written in Markdown and then compiled to HTML.
Setup
No setup is required for usage of the platform
Important: It is strongly advised to use a virtual environment and not to change anything in gh-pages
mkdocs serve - Start the live-reloading docs server.
mkdocs build - Build the documentation site.
mkdocs -h - Print help message and exit.
mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally.
mkdocs new [dir-name] - Create a new project. This is not needed here, since the project already exists.
As much as this platform aims to help you with your interview preparation, it is not a shortcut to cracking one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you. 🤓
This doesn't mean that such a feature won't be added in the future. "Never say never."
But as of now there is neither a plan nor the data to do so. 😢
Why is this platform free? 🤗
Currently there is no major cost involved in maintaining this platform other than the time and effort put in by every contributor. If you want to help, you can contribute here.
If you still want to pay for something that is free, we request that you donate to a charity of your choice instead. 😇
mkdocs serve - Start the live-reloading docs server.
mkdocs build - Build the documentation site.
mkdocs -h - Print help message and exit.
mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally.
mkdocs new [dir-name] - Create a new project. This is not needed here, since the project already exists.
As much as this platform aims to help you with your interview preparation, it is not a shortcut to cracking one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you. 🤓
This doesn't mean that such a feature won't be added in the future. "Never say never."
But as of now there is neither a plan nor the data to do so. 😢
Why is this platform free? 🤗
Currently there is no major cost involved in maintaining this platform other than the time and effort put in by every contributor. If you want to help, you can contribute here.
If you still want to pay for something that is free, we request that you donate to a charity of your choice instead. 😇
😎 The full list of all the contributors is available here
Current Status
Last update: August 3, 2022
\ No newline at end of file
diff --git a/projects/index.html b/projects/index.html
index 141b4fd..61cef7b 100644
--- a/projects/index.html
+++ b/projects/index.html
@@ -1 +1 @@
- Projects - Data Science Interview preparation
\ No newline at end of file
diff --git a/search/search_index.json b/search/search_index.json
index cce7754..cc180b5 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Home Introduction This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you . Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. 
\ud83d\udee0 You can also solve existing issues on GitHub and create a pull request. Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch-out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html. 
Setup No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv \\S cripts \\a ctivate pip3 install -r requirements.txt venv \\S cripts \\d eactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project Useful Documents \ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/ FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? 
\ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"\ud83c\udfe1 Home"},{"location":"#home","text":"","title":"Home"},{"location":"#introduction","text":"This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"#add-questions","text":"\u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. 
This will help us to maintain the quality of the content for you .","title":"Add questions"},{"location":"#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"#reportsolve-issues","text":"\ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch-out for any updates","title":"Say Thanks"},{"location":"#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. 
\ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html.","title":"Features"},{"location":"#setup","text":"No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"#windows-systems","text":"python3 -m venv ./venv venv \\S cripts \\a ctivate pip3 install -r requirements.txt venv \\S cripts \\d eactivate","title":"Windows Systems"},{"location":"#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin","title":"To install the latest"},{"location":"#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. 
No need to create a new project","title":"Useful Commands"},{"location":"#useful-documents","text":"\ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/","title":"Useful Documents"},{"location":"#faq","text":"Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 
\ud83d\ude07","title":"FAQ"},{"location":"#credits","text":"","title":"Credits"},{"location":"#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"#current-status","text":"","title":"Current Status"},{"location":"Suggested-Learning-Paths/","text":"","title":"\ud83d\udcc5 Suggested Learning Paths"},{"location":"projects/","text":"Projects Introduction These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f Popular Sources List of projects Natural Language processing (NLP) Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu Recommendation Engine Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu Image Processing Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. 
Kuldeep Singh Sidhu Reinforcement Learning Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu Others Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. Kuldeep Singh Sidhu","title":"\ud83d\udcf3 Projects"},{"location":"projects/#projects","text":"","title":"Projects"},{"location":"projects/#introduction","text":"These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f","title":"Introduction"},{"location":"projects/#popular-sources","text":"","title":"Popular Sources"},{"location":"projects/#list-of-projects","text":"","title":"List of projects"},{"location":"projects/#natural-language-processing-nlp","text":"Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. 
Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu","title":"Natural Language processing (NLP)"},{"location":"projects/#recommendation-engine","text":"Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu","title":"Recommendation Engine"},{"location":"projects/#image-processing","text":"Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. Kuldeep Singh Sidhu","title":"Image Processing"},{"location":"projects/#reinforcement-learning","text":"Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu","title":"Reinforcement Learning"},{"location":"projects/#others","text":"Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. 
Kuldeep Singh Sidhu","title":"Others"},{"location":"Cheat-Sheets/Django/","text":"","title":"Django"},{"location":"Cheat-Sheets/Flask/","text":"","title":"Flask"},{"location":"Cheat-Sheets/Keras/","text":"","title":"Keras"},{"location":"Cheat-Sheets/NumPy/","text":"","title":"NumPy"},{"location":"Cheat-Sheets/Pandas/","text":"","title":"Pandas"},{"location":"Cheat-Sheets/PySpark/","text":"","title":"PySpark"},{"location":"Cheat-Sheets/PyTorch/","text":"","title":"PyTorch"},{"location":"Cheat-Sheets/Python/","text":"","title":"Python"},{"location":"Cheat-Sheets/RegEx/","text":"","title":"Regular Expressions (RegEx)"},{"location":"Cheat-Sheets/SQL/","text":"","title":"SQL"},{"location":"Cheat-Sheets/Sk-learn/","text":"","title":"Scikit Learn"},{"location":"Cheat-Sheets/tensorflow/","text":"","title":"TensorFlow"},{"location":"Deploying-ML-models/deploying-ml-models/","text":"Home Introduction This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you . 
Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request. Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch-out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. 
\ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html. Setup No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv \\S cripts \\a ctivate pip3 install -r requirements.txt venv \\S cripts \\d eactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project Useful Documents \ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. 
\ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"Production Deployment"},{"location":"Deploying-ML-models/deploying-ml-models/#home","text":"","title":"Home"},{"location":"Deploying-ML-models/deploying-ml-models/#introduction","text":"This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"Deploying-ML-models/deploying-ml-models/#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"Deploying-ML-models/deploying-ml-models/#add-questions","text":"\u2753 Add your questions here . 
Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you .","title":"Add questions"},{"location":"Deploying-ML-models/deploying-ml-models/#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"Deploying-ML-models/deploying-ml-models/#reportsolve-issues","text":"\ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"Deploying-ML-models/deploying-ml-models/#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. 
Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch out for any updates","title":"Say Thanks"},{"location":"Deploying-ML-models/deploying-ml-models/#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content.
To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML.","title":"Features"},{"location":"Deploying-ML-models/deploying-ml-models/#setup","text":"No setup is required for usage of the platform Important: It is strongly advised to use a virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"Deploying-ML-models/deploying-ml-models/#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#windows-systems","text":"python3 -m venv ./venv venv\\Scripts\\activate pip3 install -r requirements.txt venv\\Scripts\\deactivate","title":"Windows Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material","title":"To install the latest"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project","title":"Useful Commands"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-documents","text":"\ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material","title":"Useful Documents"},{"location":"Deploying-ML-models/deploying-ml-models/#faq","text":"Can I filter questions based on companies?
\ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07","title":"FAQ"},{"location":"Deploying-ML-models/deploying-ml-models/#credits","text":"","title":"Credits"},{"location":"Deploying-ML-models/deploying-ml-models/#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"Deploying-ML-models/deploying-ml-models/#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"Deploying-ML-models/deploying-ml-models/#current-status","text":"","title":"Current Status"},{"location":"Interview-Questions/Natural-Language-Processing/","text":"NLP Interview Questions","title":"Natural Language Processing (NLP)"},{"location":"Interview-Questions/Natural-Language-Processing/#nlp-interview-questions","text":"","title":"NLP Interview Questions"},{"location":"Interview-Questions/Probability/","text":"Probability Interview Questions Average score on a dice role of at 
most 3 times Question Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but every time you roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once? Find the average score that you can get with at most 3 rolls? If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same? Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3 rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3 rd time if the score on the 2 nd roll is less than 3.5 i.e (1,2 or 3) Possibilities 2 nd roll score Probability 3 rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA 5 \u2159 NA 6 \u2159 NA (We won't roll the 3 rd time if we get score >3 on 2 nd) So if we had 2 rolls, average score would be: [We roll again if current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score with at most 2 rolls is 4.25 So now if we look from the perspective of first roll.
We will only roll again if our score is less than 4.25 i.e 1,2,3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA 6 \u2159 NA (We won't roll again if we get score >4.25 on 1 st) So if we had 3 rolls, average score would be: [We roll again if current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 28/6 \u2248 4.67 The average score with at most 3 rolls is approximately 4.67 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the decision to roll again depends on the previous score, so the rolls are no longer independent . The scores would have been the same if we rolled the dice a 2 nd and 3 rd time without considering what we got in the last roll, i.e. if the event of rolling the dice was independent.","title":"Probability"},{"location":"Interview-Questions/Probability/#probability-interview-questions","text":"","title":"Probability Interview Questions"},{"location":"Interview-Questions/Probability/#average-score-on-a-dice-role-of-at-most-3-times","text":"Question Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but every time you roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once? Find the average score that you can get with at most 3 rolls? If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same?
Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3 rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3 rd time if the score on the 2 nd roll is less than 3.5 i.e (1,2 or 3) Possibilities 2 nd roll score Probability 3 rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA 5 \u2159 NA 6 \u2159 NA (We won't roll the 3 rd time if we get score >3 on 2 nd) So if we had 2 rolls, average score would be: [We roll again if current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score with at most 2 rolls is 4.25 So now if we look from the perspective of first roll. We will only roll again if our score is less than 4.25 i.e 1,2,3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA 6 \u2159 NA (We won't roll again if we get score >4.25 on 1 st) So if we had 3 rolls, average score would be: [We roll again if current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 28/6 \u2248 4.67 The average score with at most 3 rolls is approximately 4.67 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the decision to roll again depends on the previous score, so the rolls are no longer independent .
The scores would have been the same if we rolled the dice 2 nd and 3 rd time without considering what we got in the last roll i.e. if the event of rolling the dice was independent.","title":"Average score on a dice role of at most 3 times"},{"location":"Interview-Questions/System-design/","text":"System Design","title":"System Design"},{"location":"Interview-Questions/System-design/#system-design","text":"","title":"System Design"},{"location":"Interview-Questions/data-structures-algorithms/","text":"Data Structure and Algorithms (DSA) To-do Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions \ud83d\ude01 Easy Two Number Sum Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass Validate Subsequence Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. 
A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4], and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass Nth Fibonacci The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2, i.e. fib[n] = fib[n-1] + fib[n-2]. The 1 st and 2 nd numbers are fixed at 0 and 1. Find the Nth number in the Fibonacci sequence. # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass Product Sum Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth.
For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass \ud83d\ude42 Medium Top K Frequent Words Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrences being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O(n log k) time and O(n) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time.
# Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to heap that stores the best k candidates. # Here, \"best\" is defined with our custom ordering relation, # which puts the worst candidates at the top of the heap. # At the end, we pop off the heap up to k times and reverse the result # so that the best candidates are first. # In Python, we instead use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq .
heappop ( heap )[ 1 ] for _ in range ( k )] \ud83e\udd28 Hard \ud83d\ude32 Very Hard","title":"DSA (Data Structures & Algorithms)"},{"location":"Interview-Questions/data-structures-algorithms/#data-structure-and-algorithms-dsa","text":"","title":"Data Structure and Algorithms (DSA)"},{"location":"Interview-Questions/data-structures-algorithms/#to-do","text":"Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions","title":"To-do"},{"location":"Interview-Questions/data-structures-algorithms/#easy","text":"","title":"\ud83d\ude01 Easy"},{"location":"Interview-Questions/data-structures-algorithms/#two-number-sum","text":"Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . 
sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass","title":"Two Number Sum"},{"location":"Interview-Questions/data-structures-algorithms/#validate-subsequence","text":"Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4], and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass","title":"Validate Subsequence"},{"location":"Interview-Questions/data-structures-algorithms/#nth-fibonacci","text":"The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2, i.e. fib[n] = fib[n-1] + fib[n-2]. The 1 st and 2 nd numbers are fixed at 0 and 1. Find the Nth number in the Fibonacci sequence. # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp .
append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass","title":"Nth Fibonacci"},{"location":"Interview-Questions/data-structures-algorithms/#product-sum","text":"Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass","title":"Product Sum"},{"location":"Interview-Questions/data-structures-algorithms/#medium","text":"","title":"\ud83d\ude42 Medium"},{"location":"Interview-Questions/data-structures-algorithms/#top-k-frequent-words","text":"Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. 
Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrences being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O(n log k) time and O(n) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time. # Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to heap that stores the best k candidates.
# Here, \"best\" is defined with our custom ordering relation, # which puts the worst candidates at the top of the heap. # At the end, we pop off the heap up to k times and reverse the result # so that the best candidates are first. # In Python, we instead use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )]","title":"Top K Frequent Words"},{"location":"Interview-Questions/data-structures-algorithms/#hard","text":"","title":"\ud83e\udd28 Hard"},{"location":"Interview-Questions/data-structures-algorithms/#very-hard","text":"","title":"\ud83d\ude32 Very Hard"},{"location":"Machine-Learning/ARIMA/","text":"","title":"ARIMA"},{"location":"Machine-Learning/Activation%20functions/","text":"","title":"Activation functions"},{"location":"Machine-Learning/Collaborative%20Filtering/","text":"","title":"Collaborative Filtering"},{"location":"Machine-Learning/Confusion%20Matrix/","text":"","title":"Confusion Matrix"},{"location":"Machine-Learning/DBSCAN/","text":"","title":"DBSCAN"},{"location":"Machine-Learning/Decision%20Trees/","text":"","title":"Decision Trees"},{"location":"Machine-Learning/Gradient%20Boosting/","text":"","title":"Gradient Boosting"},{"location":"Machine-Learning/K-means%20clustering/","text":"","title":"K means clustering"},{"location":"Machine-Learning/Linear%20Regression/","text":"","title":"Linear Regression"},{"location":"Machine-Learning/Logistic%20Regression/","text":"","title":"Logistic Regression"},{"location":"Machine-Learning/Loss%20Function%20MAE%2C%20RMSE/","text":"","title":"Loss Function MAE, RMSE"},{"location":"Machine-Learning/Neural%20Networks/","text":"","title":"Neural
Networks"},{"location":"Machine-Learning/Normal%20Distribution/","text":"","title":"Normal Distribution"},{"location":"Machine-Learning/Normalization%20Regularisation/","text":"","title":"Normalization Regularisation"},{"location":"Machine-Learning/Overfitting%2C%20Underfitting/","text":"","title":"Overfitting, Underfitting"},{"location":"Machine-Learning/PCA/","text":"","title":"PCA"},{"location":"Machine-Learning/Random%20Forest/","text":"","title":"Random Forest"},{"location":"Machine-Learning/Support%20Vector%20Machines/","text":"","title":"Support Vector Machines"},{"location":"Machine-Learning/Unbalanced%2C%20Skewed%20data/","text":"","title":"Unbalanced, Skewed data"},{"location":"Machine-Learning/kNN/","text":"","title":"kNN"},{"location":"Online-Material/Online-Material-for-Learning/","text":"","title":"Online Study Material"},{"location":"Online-Material/popular-resouces/","text":"","title":"Popular Blogs"},{"location":"as-fast-as-possible/","text":"","title":"Introduction"},{"location":"as-fast-as-possible/Deep-CV/","text":"","title":"Deep Computer Vision"},{"location":"as-fast-as-possible/Deep-NLP/","text":"","title":"Deep Natural Language Processing"},{"location":"as-fast-as-possible/Neural-Networks/","text":"","title":"Neural Networks"},{"location":"as-fast-as-possible/TF2-Keras/","text":"","title":"Tensorflow 2 with Keras"}]}
\ No newline at end of file
+{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Home Introduction This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you . Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. 
\ud83d\udee0 You can also solve existing issues on GitHub and create a pull request. Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML.
Setup No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv\\Scripts\\activate pip3 install -r requirements.txt venv\\Scripts\\deactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project Useful Documents \ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/ FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? 
\ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"\ud83c\udfe1 Home"},{"location":"#home","text":"","title":"Home"},{"location":"#introduction","text":"This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"#add-questions","text":"\u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. 
This will help us to maintain the quality of the content for you .","title":"Add questions"},{"location":"#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"#reportsolve-issues","text":"\ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch-out for any updates","title":"Say Thanks"},{"location":"#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. 
\ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html.","title":"Features"},{"location":"#setup","text":"No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"#windows-systems","text":"python3 -m venv ./venv venv\\Scripts\\activate pip3 install -r requirements.txt venv\\Scripts\\deactivate","title":"Windows Systems"},{"location":"#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin","title":"To install the latest"},{"location":"#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. 
No need to create a new project","title":"Useful Commands"},{"location":"#useful-documents","text":"\ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/","title":"Useful Documents"},{"location":"#faq","text":"Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 
\ud83d\ude07","title":"FAQ"},{"location":"#credits","text":"","title":"Credits"},{"location":"#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"#current-status","text":"","title":"Current Status"},{"location":"Suggested-Learning-Paths/","text":"","title":"\ud83d\udcc5 Suggested Learning Paths"},{"location":"projects/","text":"Projects Introduction These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f Popular Sources List of projects Natural Language processing (NLP) Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu Recommendation Engine Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu Image Processing Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. 
Kuldeep Singh Sidhu Reinforcement Learning Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu Others Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. Kuldeep Singh Sidhu","title":"\ud83d\udcf3 Projects"},{"location":"projects/#projects","text":"","title":"Projects"},{"location":"projects/#introduction","text":"These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f","title":"Introduction"},{"location":"projects/#popular-sources","text":"","title":"Popular Sources"},{"location":"projects/#list-of-projects","text":"","title":"List of projects"},{"location":"projects/#natural-language-processing-nlp","text":"Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. 
Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu","title":"Natural Language processing (NLP)"},{"location":"projects/#recommendation-engine","text":"Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu","title":"Recommendation Engine"},{"location":"projects/#image-processing","text":"Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. Kuldeep Singh Sidhu","title":"Image Processing"},{"location":"projects/#reinforcement-learning","text":"Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu","title":"Reinforcement Learning"},{"location":"projects/#others","text":"Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. Kuldeep Singh Sidhu","title":"Others"},{"location":"Cheat-Sheets/Django/","text":"","title":"Django"},{"location":"Cheat-Sheets/Flask/","text":"","title":"Flask"},{"location":"Cheat-Sheets/Hypothesis-Tests/","text":"Hypothesis Tests in Python A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. 
Few Notes: When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated. Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis. In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to name two examples. Normality Tests This section lists statistical tests that you can use to check if your data has a Gaussian distribution. Gaussian distribution (also known as normal distribution) is a bell-shaped curve. Shapiro-Wilk Test Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Shapiro-Wilk Normality Test from scipy.stats import shapiro data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = shapiro ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.shapiro Shapiro-Wilk test on Wikipedia D\u2019Agostino\u2019s K^2 Test Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. 
Python Code # Example of the D'Agostino's K^2 Normality Test from scipy.stats import normaltest data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = normaltest ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.normaltest D'Agostino's K-squared test on Wikipedia Anderson-Darling Test Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Anderson-Darling Normality Test from scipy.stats import anderson data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] result = anderson ( data ) print ( 'stat= %.3f ' % ( result . statistic )) for i in range ( len ( result . critical_values )): sl , cv = result . significance_level [ i ], result . critical_values [ i ] if result . statistic < cv : print ( 'Probably Gaussian at the %.1f%% level' % ( sl )) else : print ( 'Probably not Gaussian at the %.1f%% level' % ( sl )) Sources scipy.stats.anderson Anderson-Darling test on Wikipedia Correlation Tests This section lists statistical tests that you can use to check if two samples are related. Pearson\u2019s Correlation Coefficient Tests whether two samples have a linear relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. 
Python Code # Example of the Pearson's Correlation test from scipy.stats import pearsonr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = pearsonr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.pearsonr Pearson's correlation coefficient on Wikipedia Spearman\u2019s Rank Correlation Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Spearman's Rank Correlation Test from scipy.stats import spearmanr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = spearmanr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.spearmanr Spearman's rank correlation coefficient on Wikipedia Kendall\u2019s Rank Correlation Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. 
Python Code # Example of the Kendall's Rank Correlation Test from scipy.stats import kendalltau data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = kendalltau ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.kendalltau Kendall rank correlation coefficient on Wikipedia Chi-Squared Test Tests whether two categorical variables are related or independent. Assumptions Observations used in the calculation of the contingency table are independent. 25 or more examples in each cell of the contingency table. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Chi-Squared Test from scipy.stats import chi2_contingency table = [[ 10 , 20 , 30 ],[ 6 , 9 , 17 ]] stat , p , dof , expected = chi2_contingency ( table ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.chi2_contingency Chi-Squared test on Wikipedia Stationary Tests This section lists statistical tests that you can use to check if a time series is stationary or not. Augmented Dickey-Fuller Unit Root Test Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive. Assumptions Observations are temporally ordered. Interpretation H0: a unit root is present (series is non-stationary). H1: a unit root is not present (series is stationary). 
Python Code # Example of the Augmented Dickey-Fuller unit root test from statsmodels.tsa.stattools import adfuller data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , obs , crit , t = adfuller ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably not Stationary' ) else : print ( 'Probably Stationary' ) Sources statsmodels.tsa.stattools.adfuller API . Augmented Dickey--Fuller test, Wikipedia . Kwiatkowski-Phillips-Schmidt-Shin Tests whether a time series is trend stationary or not. Assumptions Observations are temporally ordered. Interpretation H0: the time series is trend-stationary. H1: the time series is not trend-stationary. Python Code # Example of the Kwiatkowski-Phillips-Schmidt-Shin test from statsmodels.tsa.stattools import kpss data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , crit = kpss ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Stationary' ) else : print ( 'Probably not Stationary' ) Sources statsmodels.tsa.stattools.kpss API . KPSS test, Wikipedia . Parametric Statistical Hypothesis Tests This section lists statistical tests that you can use to compare data samples. Student\u2019s t-test Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. 
Python Code # Example of the Student's t-test from scipy.stats import ttest_ind data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_ind ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_ind Student's t-test on Wikipedia Paired Student\u2019s t-test Tests whether the means of two paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Paired Student's t-test from scipy.stats import ttest_rel data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_rel ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_rel Student's t-test on Wikipedia Analysis of Variance Test (ANOVA) Tests whether the means of two or more independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. 
Python Code # Example of the Analysis of Variance Test from scipy.stats import f_oneway data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = f_oneway ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.f_oneway Analysis of variance on Wikipedia Repeated Measures ANOVA Test Tests whether the means of two or more paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: one or more of the means of the samples are unequal. Python Code # Currently not supported in Python. :( Sources Analysis of variance on Wikipedia Nonparametric Statistical Hypothesis Tests In non-parametric tests, we don't make any assumptions about the parameters of the population we are studying. In fact, these tests don't depend on the population. Hence, no fixed set of parameters is available, and no particular distribution (normal distribution, etc.) is assumed. Mann-Whitney U Test Tests whether the distributions of two independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. 
Python Code # Example of the Mann-Whitney U Test from scipy.stats import mannwhitneyu data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = mannwhitneyu ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.mannwhitneyu Mann-Whitney U test on Wikipedia Wilcoxon Signed-Rank Test Tests whether the distributions of two paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Wilcoxon Signed-Rank Test from scipy.stats import wilcoxon data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = wilcoxon ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.wilcoxon Wilcoxon signed-rank test on Wikipedia Kruskal-Wallis H Test Tests whether the distributions of two or more independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. 
Python Code # Example of the Kruskal-Wallis H Test from scipy.stats import kruskal data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = kruskal ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.kruskal Kruskal-Wallis one-way analysis of variance on Wikipedia Friedman Test Tests whether the distributions of two or more paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Friedman Test from scipy.stats import friedmanchisquare data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = friedmanchisquare ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.friedmanchisquare Friedman test on Wikipedia Equality of variance test Test is used to assess the equality of variance between two different samples. Levene's test Levene\u2019s test is used to assess the equality of variance between two or more different samples. Assumptions The samples from the populations under consideration are independent. 
The populations under consideration are approximately normally distributed. Interpretation H0: All the sample variances are equal H1: At least one variance is different from the rest Python Code # Example of the Levene's test from scipy.stats import levene a = [ 8.88 , 9.12 , 9.04 , 8.98 , 9.00 , 9.08 , 9.01 , 8.85 , 9.06 , 8.99 ] b = [ 8.88 , 8.95 , 9.29 , 9.44 , 9.15 , 9.58 , 8.36 , 9.18 , 8.67 , 9.05 ] c = [ 8.95 , 9.12 , 8.95 , 8.85 , 9.03 , 8.84 , 9.07 , 8.98 , 8.86 , 8.98 ] stat , p = levene ( a , b , c ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same variances' ) else : print ( 'Probably at least one variance is different from the rest' ) Sources scipy.stats.levene Levene's test on Wikipedia Source: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/","title":"Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#hypothesis-tests-in-python","text":"A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. Few Notes: When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated. Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis. 
In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to name two examples.","title":"Hypothesis Tests in Python"},{"location":"Cheat-Sheets/Hypothesis-Tests/#normality-tests","text":"This section lists statistical tests that you can use to check if your data has a Gaussian distribution. Gaussian distribution (also known as normal distribution) is a bell-shaped curve.","title":"Normality Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#shapiro-wilk-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Shapiro-Wilk Normality Test from scipy.stats import shapiro data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = shapiro ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.shapiro Shapiro-Wilk test on Wikipedia","title":"Shapiro-Wilk Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#dagostinos-k2-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. 
Python Code # Example of the D'Agostino's K^2 Normality Test from scipy.stats import normaltest data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = normaltest ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.normaltest D'Agostino's K-squared test on Wikipedia","title":"D\u2019Agostino\u2019s K^2 Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#anderson-darling-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Anderson-Darling Normality Test from scipy.stats import anderson data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] result = anderson ( data ) print ( 'stat= %.3f ' % ( result . statistic )) for i in range ( len ( result . critical_values )): sl , cv = result . significance_level [ i ], result . critical_values [ i ] if result . statistic < cv : print ( 'Probably Gaussian at the %.1f%% level' % ( sl )) else : print ( 'Probably not Gaussian at the %.1f%% level' % ( sl )) Sources scipy.stats.anderson Anderson-Darling test on Wikipedia","title":"Anderson-Darling Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#correlation-tests","text":"This section lists statistical tests that you can use to check if two samples are related.","title":"Correlation Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#pearsons-correlation-coefficient","text":"Tests whether two samples have a linear relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. 
Observations in each sample have the same variance. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Pearson's Correlation test from scipy.stats import pearsonr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = pearsonr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.pearsonr Pearson's correlation coefficient on Wikipedia","title":"Pearson\u2019s Correlation Coefficient"},{"location":"Cheat-Sheets/Hypothesis-Tests/#spearmans-rank-correlation","text":"Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Spearman's Rank Correlation Test from scipy.stats import spearmanr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = spearmanr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.spearmanr Spearman's rank correlation coefficient on Wikipedia","title":"Spearman\u2019s Rank Correlation"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kendalls-rank-correlation","text":"Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. 
Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Kendall's Rank Correlation Test from scipy.stats import kendalltau data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = kendalltau ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.kendalltau Kendall rank correlation coefficient on Wikipedia","title":"Kendall\u2019s Rank Correlation"},{"location":"Cheat-Sheets/Hypothesis-Tests/#chi-squared-test","text":"Tests whether two categorical variables are related or independent. Assumptions Observations used in the calculation of the contingency table are independent. 25 or more examples in each cell of the contingency table. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Chi-Squared Test from scipy.stats import chi2_contingency table = [[ 10 , 20 , 30 ],[ 6 , 9 , 17 ]] stat , p , dof , expected = chi2_contingency ( table ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.chi2_contingency Chi-Squared test on Wikipedia","title":"Chi-Squared Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#stationary-tests","text":"This section lists statistical tests that you can use to check if a time series is stationary or not.","title":"Stationary Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#augmented-dickey-fuller-unit-root-test","text":"Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive. Assumptions Observations are temporally ordered. 
Interpretation H0: a unit root is present (series is non-stationary). H1: a unit root is not present (series is stationary). Python Code # Example of the Augmented Dickey-Fuller unit root test from statsmodels.tsa.stattools import adfuller data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , obs , crit , t = adfuller ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably not Stationary' ) else : print ( 'Probably Stationary' ) Sources statsmodels.tsa.stattools.adfuller API . Augmented Dickey-Fuller test, Wikipedia .","title":"Augmented Dickey-Fuller Unit Root Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kwiatkowski-phillips-schmidt-shin","text":"Tests whether a time series is trend stationary or not. Assumptions Observations are temporally ordered. Interpretation H0: the time series is trend-stationary. H1: the time series is not trend-stationary. Python Code # Example of the Kwiatkowski-Phillips-Schmidt-Shin test from statsmodels.tsa.stattools import kpss data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , crit = kpss ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Stationary' ) else : print ( 'Probably not Stationary' ) Sources statsmodels.tsa.stattools.kpss API . KPSS test, Wikipedia .","title":"Kwiatkowski-Phillips-Schmidt-Shin"},{"location":"Cheat-Sheets/Hypothesis-Tests/#parametric-statistical-hypothesis-tests","text":"This section lists statistical tests that you can use to compare data samples.","title":"Parametric Statistical Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#students-t-test","text":"Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. 
H1: the means of the samples are unequal. Python Code # Example of the Student's t-test from scipy.stats import ttest_ind data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_ind ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_ind Student's t-test on Wikipedia","title":"Student\u2019s t-test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#paired-students-t-test","text":"Tests whether the means of two paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Paired Student's t-test from scipy.stats import ttest_rel data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_rel ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_rel Student's t-test on Wikipedia","title":"Paired Student\u2019s t-test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#analysis-of-variance-test-anova","text":"Tests whether the means of two or more independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). 
Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Analysis of Variance Test from scipy.stats import f_oneway data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = f_oneway ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.f_oneway Analysis of variance on Wikipedia","title":"Analysis of Variance Test (ANOVA)"},{"location":"Cheat-Sheets/Hypothesis-Tests/#repeated-measures-anova-test","text":"Tests whether the means of two or more paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: one or more of the means of the samples are unequal. Python Code # Currently not supported in Python. :( Sources Analysis of variance on Wikipedia","title":"Repeated Measures ANOVA Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#nonparametric-statistical-hypothesis-tests","text":"In Non-Parametric tests, we don't make any assumption about the parameters for the given population or the population we are studying. In fact, these tests don't depend on the population. 
Hence, no fixed set of parameters is available, and no particular distribution (normal distribution, etc.) is assumed.","title":"Nonparametric Statistical Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#mann-whitney-u-test","text":"Tests whether the distributions of two independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Mann-Whitney U Test from scipy.stats import mannwhitneyu data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = mannwhitneyu ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.mannwhitneyu Mann-Whitney U test on Wikipedia","title":"Mann-Whitney U Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#wilcoxon-signed-rank-test","text":"Tests whether the distributions of two paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. 
Python Code # Example of the Wilcoxon Signed-Rank Test from scipy.stats import wilcoxon data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = wilcoxon ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.wilcoxon Wilcoxon signed-rank test on Wikipedia","title":"Wilcoxon Signed-Rank Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kruskal-wallis-h-test","text":"Tests whether the distributions of two or more independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Kruskal-Wallis H Test from scipy.stats import kruskal data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = kruskal ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.kruskal Kruskal-Wallis one-way analysis of variance on Wikipedia","title":"Kruskal-Wallis H Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#friedman-test","text":"Tests whether the distributions of two or more paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. 
Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Friedman Test from scipy.stats import friedmanchisquare data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = friedmanchisquare ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.friedmanchisquare Friedman test on Wikipedia","title":"Friedman Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#equality-of-variance-test","text":"This test is used to assess the equality of variance between two or more samples.","title":"Equality of variance test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#levenes-test","text":"Levene\u2019s test is used to assess the equality of variance between two or more different samples. Assumptions The samples from the populations under consideration are independent. The populations under consideration are approximately normally distributed. 
Interpretation H0: All the sample variances are equal. H1: At least one variance is different from the rest. Python Code # Example of the Levene's test from scipy.stats import levene a = [ 8.88 , 9.12 , 9.04 , 8.98 , 9.00 , 9.08 , 9.01 , 8.85 , 9.06 , 8.99 ] b = [ 8.88 , 8.95 , 9.29 , 9.44 , 9.15 , 9.58 , 8.36 , 9.18 , 8.67 , 9.05 ] c = [ 8.95 , 9.12 , 8.95 , 8.85 , 9.03 , 8.84 , 9.07 , 8.98 , 8.86 , 8.98 ] stat , p = levene ( a , b , c ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same variances' ) else : print ( 'Probably at least one variance is different from the rest' ) Sources scipy.stats.levene Levene's test on Wikipedia Source: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/","title":"Levene's test"},{"location":"Cheat-Sheets/Keras/","text":"","title":"Keras"},{"location":"Cheat-Sheets/NumPy/","text":"","title":"NumPy"},{"location":"Cheat-Sheets/Pandas/","text":"","title":"Pandas"},{"location":"Cheat-Sheets/PySpark/","text":"","title":"PySpark"},{"location":"Cheat-Sheets/PyTorch/","text":"","title":"PyTorch"},{"location":"Cheat-Sheets/Python/","text":"","title":"Python"},{"location":"Cheat-Sheets/RegEx/","text":"","title":"Regular Expressions (RegEx)"},{"location":"Cheat-Sheets/SQL/","text":"","title":"SQL"},{"location":"Cheat-Sheets/Sk-learn/","text":"","title":"Scikit Learn"},{"location":"Cheat-Sheets/tensorflow/","text":"","title":"TensorFlow"},{"location":"Deploying-ML-models/deploying-ml-models/","text":"Home Introduction This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as a one-stop destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! 
\ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you . Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request. Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. 
Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML. Setup No setup is required for usage of the platform Important: It is strongly advised to use a virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv\\Scripts\\activate pip3 install -r requirements.txt venv\\Scripts\\deactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. 
Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project Useful Documents \ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 
\ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"Production Deployment"},{"location":"Deploying-ML-models/deploying-ml-models/#home","text":"","title":"Home"},{"location":"Deploying-ML-models/deploying-ml-models/#introduction","text":"This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"Deploying-ML-models/deploying-ml-models/#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"Deploying-ML-models/deploying-ml-models/#add-questions","text":"\u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. 
This will help us to maintain the quality of the content for you .","title":"Add questions"},{"location":"Deploying-ML-models/deploying-ml-models/#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"Deploying-ML-models/deploying-ml-models/#reportsolve-issues","text":"\ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"Deploying-ML-models/deploying-ml-models/#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. 
Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch out for any updates","title":"Say Thanks"},{"location":"Deploying-ML-models/deploying-ml-models/#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. 
To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML.","title":"Features"},{"location":"Deploying-ML-models/deploying-ml-models/#setup","text":"No setup is required for usage of the platform Important: It is strongly advised to use a virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"Deploying-ML-models/deploying-ml-models/#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#windows-systems","text":"python3 -m venv ./venv venv\\Scripts\\activate pip3 install -r requirements.txt venv\\Scripts\\deactivate","title":"Windows Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material","title":"To install the latest"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project","title":"Useful Commands"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-documents","text":"\ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material","title":"Useful Documents"},{"location":"Deploying-ML-models/deploying-ml-models/#faq","text":"Can I filter questions based on companies? 
\ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07","title":"FAQ"},{"location":"Deploying-ML-models/deploying-ml-models/#credits","text":"","title":"Credits"},{"location":"Deploying-ML-models/deploying-ml-models/#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"Deploying-ML-models/deploying-ml-models/#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"Deploying-ML-models/deploying-ml-models/#current-status","text":"","title":"Current Status"},{"location":"Interview-Questions/Natural-Language-Processing/","text":"NLP Interview Questions","title":"Natural Language Processing (NLP)"},{"location":"Interview-Questions/Natural-Language-Processing/#nlp-interview-questions","text":"","title":"NLP Interview Questions"},{"location":"Interview-Questions/Probability/","text":"Probability Interview Questions Average score on a dice role of at 
most 3 times Question Consider a fair 6-sided dice. Your aim is to get the highest score you can in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but after every roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once. Find the average score that you can get with at most 3 rolls. If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same? Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try backtracking. Let's say you just did your second roll and you have to decide whether to do your 3rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3rd time if the score on the 2nd roll is less than 3.5, i.e. 1, 2 or 3 Possibilities 2nd roll score Probability 3rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA 5 \u2159 NA 6 \u2159 NA (We won't roll a 3rd time if we get a score >3 on the 2nd roll) So if we had 2 rolls, the average score would be: [We roll again if current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score if you rolled the dice at most twice is 4.25 So now let's look from the perspective of the first roll. 
We will only roll again if our score is less than 4.25 i.e. 1, 2, 3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA We won't roll again if we 6 \u2159 NA get score >4.25 on 1 st So if we had 3 rolls, average score would be: [We roll again if current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 28/6 \u2248 4.67 The average score with at most 3 rolls is approximately 4.67 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the final score is no longer independent of the earlier rolls: we decide whether to roll again based on the score we just got. The scores would have been the same if we rolled the dice the 2 nd and 3 rd time without considering what we got in the last roll i.e. if the event of rolling the dice was independent.","title":"Probability"},{"location":"Interview-Questions/Probability/#probability-interview-questions","text":"","title":"Probability Interview Questions"},{"location":"Interview-Questions/Probability/#average-score-on-a-dice-role-of-at-most-3-times","text":"Question Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but every time you roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once? Find the average score that you can get with at most 3 rolls? If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same?
Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3 rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3 rd time if the score on the 2 nd roll is less than 3.5 i.e. (1, 2 or 3) Possibilities 2 nd roll score Probability 3 rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA We won't roll 5 \u2159 NA 3 rd time if we 6 \u2159 NA get score >3 on 2 nd So if we had 2 rolls, average score would be: [We roll again if current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score with at most 2 rolls is 4.25 So now if we look from the perspective of the first roll. We will only roll again if our score is less than 4.25 i.e. 1, 2, 3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA We won't roll again if we 6 \u2159 NA get score >4.25 on 1 st So if we had 3 rolls, average score would be: [We roll again if current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 28/6 \u2248 4.67 The average score with at most 3 rolls is approximately 4.67 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the final score is no longer independent of the earlier rolls: we decide whether to roll again based on the score we just got.
The scores would have been the same if we rolled the dice 2 nd and 3 rd time without considering what we got in the last roll i.e. if the event of rolling the dice was independent.","title":"Average score on a dice role of at most 3 times"},{"location":"Interview-Questions/System-design/","text":"System Design","title":"System Design"},{"location":"Interview-Questions/System-design/#system-design","text":"","title":"System Design"},{"location":"Interview-Questions/data-structures-algorithms/","text":"Data Structure and Algorithms (DSA) To-do Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions \ud83d\ude01 Easy Two Number Sum Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass Validate Subsequence Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. 
A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4] , and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass Nth Fibonacci The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2. for fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd are fixed at 0,1 Find the nth Nth Fibonacci sequence # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass Product Sum Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. 
For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass \ud83d\ude42 Medium Top K Frequent Words Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrences being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time.
# Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to a heap that stores the best k candidates. # Here, \"best\" is defined with our custom ordering relation, # which puts the worst candidates at the top of the heap. # At the end, we pop off the heap up to k times and reverse the result # so that the best candidates are first. # In Python, we instead use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq .
heappop ( heap )[ 1 ] for _ in range ( k )] \ud83e\udd28 Hard \ud83d\ude32 Very Hard","title":"DSA (Data Structures & Algorithms)"},{"location":"Interview-Questions/data-structures-algorithms/#data-structure-and-algorithms-dsa","text":"","title":"Data Structure and Algorithms (DSA)"},{"location":"Interview-Questions/data-structures-algorithms/#to-do","text":"Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions","title":"To-do"},{"location":"Interview-Questions/data-structures-algorithms/#easy","text":"","title":"\ud83d\ude01 Easy"},{"location":"Interview-Questions/data-structures-algorithms/#two-number-sum","text":"Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . 
sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass","title":"Two Number Sum"},{"location":"Interview-Questions/data-structures-algorithms/#validate-subsequence","text":"Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4] , and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass","title":"Validate Subsequence"},{"location":"Interview-Questions/data-structures-algorithms/#nth-fibonacci","text":"The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2. for fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd are fixed at 0,1 Find the nth Nth Fibonacci sequence # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . 
append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass","title":"Nth Fibonacci"},{"location":"Interview-Questions/data-structures-algorithms/#product-sum","text":"Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass","title":"Product Sum"},{"location":"Interview-Questions/data-structures-algorithms/#medium","text":"","title":"\ud83d\ude42 Medium"},{"location":"Interview-Questions/data-structures-algorithms/#top-k-frequent-words","text":"Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. 
Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrences being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time. # Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to a heap that stores the best k candidates.
# Here, \"best\" is defined with our custom ordering relation, # which puts the worst candidates at the top of the heap. # At the end, we pop off the heap up to k times and reverse the result # so that the best candidates are first. # In Python, we instead use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )]","title":"Top K Frequent Words"},{"location":"Interview-Questions/data-structures-algorithms/#hard","text":"","title":"\ud83e\udd28 Hard"},{"location":"Interview-Questions/data-structures-algorithms/#very-hard","text":"","title":"\ud83d\ude32 Very Hard"},{"location":"Machine-Learning/ARIMA/","text":"","title":"ARIMA"},{"location":"Machine-Learning/Activation%20functions/","text":"","title":"Activation functions"},{"location":"Machine-Learning/Collaborative%20Filtering/","text":"","title":"Collaborative Filtering"},{"location":"Machine-Learning/Confusion%20Matrix/","text":"","title":"Confusion Matrix"},{"location":"Machine-Learning/DBSCAN/","text":"","title":"DBSCAN"},{"location":"Machine-Learning/Decision%20Trees/","text":"","title":"Decision Trees"},{"location":"Machine-Learning/Gradient%20Boosting/","text":"","title":"Gradient Boosting"},{"location":"Machine-Learning/K-means%20clustering/","text":"","title":"K means clustering"},{"location":"Machine-Learning/Linear%20Regression/","text":"","title":"Linear Regression"},{"location":"Machine-Learning/Logistic%20Regression/","text":"","title":"Logistic Regression"},{"location":"Machine-Learning/Loss%20Function%20MAE%2C%20RMSE/","text":"","title":"Loss Function MAE, RMSE"},{"location":"Machine-Learning/Neural%20Networks/","text":"","title":"Neural
Networks"},{"location":"Machine-Learning/Normal%20Distribution/","text":"","title":"Normal Distribution"},{"location":"Machine-Learning/Normalization%20Regularisation/","text":"","title":"Normalization Regularisation"},{"location":"Machine-Learning/Overfitting%2C%20Underfitting/","text":"","title":"Overfitting, Underfitting"},{"location":"Machine-Learning/PCA/","text":"","title":"PCA"},{"location":"Machine-Learning/Random%20Forest/","text":"","title":"Random Forest"},{"location":"Machine-Learning/Support%20Vector%20Machines/","text":"","title":"Support Vector Machines"},{"location":"Machine-Learning/Unbalanced%2C%20Skewed%20data/","text":"","title":"Unbalanced, Skewed data"},{"location":"Machine-Learning/kNN/","text":"","title":"kNN"},{"location":"Online-Material/Online-Material-for-Learning/","text":"","title":"Online Study Material"},{"location":"Online-Material/popular-resouces/","text":"","title":"Popular Blogs"},{"location":"as-fast-as-possible/","text":"","title":"Introduction"},{"location":"as-fast-as-possible/Deep-CV/","text":"","title":"Deep Computer Vision"},{"location":"as-fast-as-possible/Deep-NLP/","text":"","title":"Deep Natural Language Processing"},{"location":"as-fast-as-possible/Neural-Networks/","text":"","title":"Neural Networks"},{"location":"as-fast-as-possible/TF2-Keras/","text":"","title":"Tensorflow 2 with Keras"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index c94a470..1816fcf 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,237 +2,242 @@
None
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12dailyNone
- 2022-08-03
+ 2023-04-12
+ daily
+
+
+ None
+ 2023-04-12daily
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 2f670fe..4ea5314 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ