diff --git a/404.html b/404.html index 2d85b6e..bfcde75 100644 --- a/404.html +++ b/404.html @@ -1 +1 @@ - Data Science Interview preparation
\ No newline at end of file + Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Django/index.html b/Cheat-Sheets/Django/index.html index 3eda98e..e4ba176 100644 --- a/Cheat-Sheets/Django/index.html +++ b/Cheat-Sheets/Django/index.html @@ -1 +1 @@ - Django - Data Science Interview preparation
\ No newline at end of file + Django - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Flask/index.html b/Cheat-Sheets/Flask/index.html index 63677e2..d484c85 100644 --- a/Cheat-Sheets/Flask/index.html +++ b/Cheat-Sheets/Flask/index.html @@ -1 +1 @@ - Flask - Data Science Interview preparation
\ No newline at end of file + Flask - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Hypothesis-Tests/index.html b/Cheat-Sheets/Hypothesis-Tests/index.html new file mode 100644 index 0000000..f592682 --- /dev/null +++ b/Cheat-Sheets/Hypothesis-Tests/index.html @@ -0,0 +1,171 @@ + Hypothesis Tests in Python (Cheat Sheet) - Data Science Interview preparation

Hypothesis Tests in Python

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.

A few notes:

  • When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated.
  • Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis.
  • In some cases, the data can be corrected to meet the assumptions, for example by removing outliers from a nearly normal distribution, or by applying a correction to the degrees of freedom when samples have differing variance (see the Welch sketch below), to name two examples.
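
As an aside on that last correction: SciPy's independent t-test accepts an equal_var flag, and setting it to False applies Welch's degrees-of-freedom correction for samples with unequal variances. A minimal sketch, reusing the cheat sheet's sample-data style (the data is made up for illustration):

    # Example of Welch's t-test: Student's t-test with a correction
    # to the degrees of freedom for samples with unequal variances
    from scipy.stats import ttest_ind
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = ttest_ind(data1, data2, equal_var=False)  # Welch's correction
    print('stat=%.3f, p=%.3f' % (stat, p))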

Normality Tests

This section lists statistical tests that you can use to check if your data has a Gaussian distribution.

The Gaussian distribution (also known as the normal distribution) has a bell-shaped density curve. To try the tests below on data with a known answer, you can generate samples yourself, as sketched next.
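
A quick sketch for generating such samples (my addition; it assumes NumPy is available, which the original cheat sheet does not mention):

    # Generate samples with known distributions to sanity-check the normality tests
    import numpy as np
    rng = np.random.default_rng(42)
    gaussian_sample = rng.normal(loc=0.0, scale=1.0, size=100)  # should usually pass
    uniform_sample = rng.uniform(low=0.0, high=1.0, size=100)   # should usually fail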

Shapiro-Wilk Test

Tests whether a data sample has a Gaussian (normal) distribution.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
  • Interpretation

    • H0: the sample has a Gaussian distribution.
    • H1: the sample does not have a Gaussian distribution.
  • Python Code

    # Example of the Shapiro-Wilk Normality Test
    from scipy.stats import shapiro
    data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    stat, p = shapiro(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Gaussian')
    else:
        print('Probably not Gaussian')
  • Sources

D’Agostino’s K^2 Test

Tests whether a data sample has a Gaussian (normal) distribution.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
  • Interpretation

    • H0: the sample has a Gaussian distribution.
    • H1: the sample does not have a Gaussian distribution.
  • Python Code

    # Example of the D'Agostino's K^2 Normality Test
    from scipy.stats import normaltest
    data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    stat, p = normaltest(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Gaussian')
    else:
        print('Probably not Gaussian')
  • Sources

Anderson-Darling Test

Tests whether a data sample has a Gaussian (normal) distribution.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
  • Interpretation

    • H0: the sample has a Gaussian distribution.
    • H1: the sample does not have a Gaussian distribution.
  • Python Code

    # Example of the Anderson-Darling Normality Test
    from scipy.stats import anderson
    data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    result = anderson(data)
    print('stat=%.3f' % (result.statistic))
    for i in range(len(result.critical_values)):
        sl, cv = result.significance_level[i], result.critical_values[i]
        if result.statistic < cv:
            print('Probably Gaussian at the %.1f%% level' % (sl))
        else:
            print('Probably not Gaussian at the %.1f%% level' % (sl))
  • Sources

Correlation Tests

This section lists statistical tests that you can use to check if two samples are related.

Pearson’s Correlation Coefficient

Tests whether two samples have a linear relationship.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample are normally distributed.
    • Observations in each sample have the same variance.
  • Interpretation

    • H0: the two samples are independent.
    • H1: there is a dependency between the samples.
  • Python Code

    # Example of the Pearson's Correlation test
    from scipy.stats import pearsonr
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
    stat, p = pearsonr(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')
  • Sources

Spearman’s Rank Correlation

Tests whether two samples have a monotonic relationship.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
  • Interpretation

    • H0: the two samples are independent.
    • H1: there is a dependency between the samples.
  • Python Code

    # Example of the Spearman's Rank Correlation Test
    from scipy.stats import spearmanr
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
    stat, p = spearmanr(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')
  • Sources

Kendall’s Rank Correlation

Tests whether two samples have a monotonic relationship.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
  • Interpretation

    • H0: the two samples are independent.
    • H1: there is a dependency between the samples.
  • Python Code

    # Example of the Kendall's Rank Correlation Test
    from scipy.stats import kendalltau
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
    stat, p = kendalltau(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')
  • Sources

Chi-Squared Test

Tests whether two categorical variables are related or independent.

  • Assumptions

    • Observations used in the calculation of the contingency table are independent.
    • 25 or more examples in each cell of the contingency table.
  • Interpretation

    • H0: the two samples are independent.
    • H1: there is a dependency between the samples.
  • Python Code

    # Example of the Chi-Squared Test
    from scipy.stats import chi2_contingency
    table = [[10, 20, 30], [6, 9, 17]]
    stat, p, dof, expected = chi2_contingency(table)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')
  • Sources

Stationary Tests

This section lists statistical tests that you can use to check if a time series is stationary or not.
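
As with the normality tests, it helps to try these on series with a known answer. A quick sketch (my addition, assuming NumPy):

    # White noise is stationary; its cumulative sum (a random walk) has a unit root
    import numpy as np
    rng = np.random.default_rng(0)
    stationary_series = rng.normal(size=200)              # no trend, constant variance
    nonstationary_series = np.cumsum(stationary_series)   # random walk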

Augmented Dickey-Fuller Unit Root Test

Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive.

  • Assumptions

    • Observations are temporally ordered.
  • Interpretation

    • H0: a unit root is present (series is non-stationary).
    • H1: a unit root is not present (series is stationary).
  • Python Code

    # Example of the Augmented Dickey-Fuller unit root test
    from statsmodels.tsa.stattools import adfuller
    data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    stat, p, lags, obs, crit, t = adfuller(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably not Stationary')
    else:
        print('Probably Stationary')
  • Sources

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

Tests whether a time series is trend stationary or not.

  • Assumptions

    • Observations are temporally ordered.
  • Interpretation

    • H0: the time series is trend-stationary.
    • H1: the time series is not trend-stationary.
  • Python Code

    # Example of the Kwiatkowski-Phillips-Schmidt-Shin test
    from statsmodels.tsa.stattools import kpss
    data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    stat, p, lags, crit = kpss(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Stationary')
    else:
        print('Probably not Stationary')
  • Sources

Parametric Statistical Hypothesis Tests

This section lists statistical tests that you can use to compare data samples.

Student’s t-test

Tests whether the means of two independent samples are significantly different.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample are normally distributed.
    • Observations in each sample have the same variance.
  • Interpretation

    • H0: the means of the samples are equal.
    • H1: the means of the samples are unequal.
  • Python Code

    # Example of the Student's t-test
    from scipy.stats import ttest_ind
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = ttest_ind(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Paired Student’s t-test

Tests whether the means of two paired samples are significantly different.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample are normally distributed.
    • Observations in each sample have the same variance.
    • Observations across each sample are paired.
  • Interpretation

    • H0: the means of the samples are equal.
    • H1: the means of the samples are unequal.
  • Python Code

    # Example of the Paired Student's t-test
    from scipy.stats import ttest_rel
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = ttest_rel(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Analysis of Variance Test (ANOVA)

Tests whether the means of two or more independent samples are significantly different.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample are normally distributed.
    • Observations in each sample have the same variance.
  • Interpretation

    • H0: the means of the samples are equal.
    • H1: the means of the samples are unequal.
  • Python Code

    # Example of the Analysis of Variance Test
    from scipy.stats import f_oneway
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
    stat, p = f_oneway(data1, data2, data3)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Repeated Measures ANOVA Test

Tests whether the means of two or more paired samples are significantly different.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample are normally distributed.
    • Observations in each sample have the same variance.
    • Observations across each sample are paired.
  • Interpretation

    • H0: the means of the samples are equal.
    • H1: one or more of the means of the samples are unequal.
  • Python Code

    # Not supported in SciPy; statsmodels provides AnovaRM (see the sketch below)
  • Sources
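
Contrary to the note above (which reflects the original source), statsmodels now offers AnovaRM for repeated measures ANOVA. A minimal sketch, assuming a long-format table with one observation per subject per condition (the data below is invented for illustration):

    # Example of the Repeated Measures ANOVA Test via statsmodels
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM
    df = pd.DataFrame({
        'subject':   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        'condition': ['A', 'B', 'C'] * 4,
        'score':     [5.1, 6.2, 7.0, 4.8, 6.0, 6.9, 5.5, 5.9, 7.2, 4.9, 6.1, 6.8],
    })
    result = AnovaRM(df, depvar='score', subject='subject', within=['condition']).fit()
    print(result)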

Nonparametric Statistical Hypothesis Tests

In non-parametric tests, we don't make any assumptions about the parameters of the population being studied. These tests do not depend on the population having a fixed set of parameters or a particular underlying distribution (such as the normal distribution).

Mann-Whitney U Test

Tests whether the distributions of two independent samples are equal or not.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
  • Interpretation

    • H0: the distributions of both samples are equal.
    • H1: the distributions of both samples are not equal.
  • Python Code

    # Example of the Mann-Whitney U Test
    from scipy.stats import mannwhitneyu
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = mannwhitneyu(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Wilcoxon Signed-Rank Test

Tests whether the distributions of two paired samples are equal or not.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
    • Observations across each sample are paired.
  • Interpretation

    • H0: the distributions of both samples are equal.
    • H1: the distributions of both samples are not equal.
  • Python Code

    # Example of the Wilcoxon Signed-Rank Test
    from scipy.stats import wilcoxon
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = wilcoxon(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Kruskal-Wallis H Test

Tests whether the distributions of two or more independent samples are equal or not.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
  • Interpretation

    • H0: the distributions of all samples are equal.
    • H1: the distributions of one or more samples are not equal.
  • Python Code

    # Example of the Kruskal-Wallis H Test
    from scipy.stats import kruskal
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    stat, p = kruskal(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Friedman Test

Tests whether the distributions of two or more paired samples are equal or not.

  • Assumptions

    • Observations in each sample are independent and identically distributed (iid).
    • Observations in each sample can be ranked.
    • Observations across each sample are paired.
  • Interpretation

    • H0: the distributions of all samples are equal.
    • H1: the distributions of one or more samples are not equal.
  • Python Code

    # Example of the Friedman Test
    from scipy.stats import friedmanchisquare
    data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
    data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
    data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
    stat, p = friedmanchisquare(data1, data2, data3)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
  • Sources

Equality of variance test

These tests are used to assess the equality of variance between two or more different samples.

Levene's test

Levene’s test is used to assess the equality of variance between two or more different samples.

  • Assumptions

    • The samples from the populations under consideration are independent.
    • The populations under consideration are approximately normally distributed.
  • Interpretation

    • H0: all the sample variances are equal
    • H1: at least one variance is different from the rest
  • Python Code

    # Example of the Levene's test
    from scipy.stats import levene
    a = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
    b = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
    c = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]
    stat, p = levene(a, b, c)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same variances')
    else:
        print('Probably at least one variance is different from the rest')
  • Sources
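
A closely related option (my addition, not in the original cheat sheet): SciPy also ships Bartlett's test, which tests the same hypotheses but is more sensitive to departures from normality than Levene's test, so prefer Levene's when normality is in doubt.

    # Example of Bartlett's test for equal variances
    from scipy.stats import bartlett
    a = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
    b = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
    c = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]
    stat, p = bartlett(a, b, c)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same variances')
    else:
        print('Probably at least one variance is different from the rest')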


Source: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/


Last update: April 12, 2023
\ No newline at end of file diff --git a/Cheat-Sheets/Keras/index.html b/Cheat-Sheets/Keras/index.html index 1df7534..8739019 100644 --- a/Cheat-Sheets/Keras/index.html +++ b/Cheat-Sheets/Keras/index.html @@ -1 +1 @@ - Keras - Data Science Interview preparation
\ No newline at end of file + Keras - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/NumPy/index.html b/Cheat-Sheets/NumPy/index.html index 1a76492..95fd17d 100644 --- a/Cheat-Sheets/NumPy/index.html +++ b/Cheat-Sheets/NumPy/index.html @@ -1 +1 @@ - NumPy - Data Science Interview preparation
\ No newline at end of file + NumPy - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Pandas/index.html b/Cheat-Sheets/Pandas/index.html index 2df351e..2f5df21 100644 --- a/Cheat-Sheets/Pandas/index.html +++ b/Cheat-Sheets/Pandas/index.html @@ -1 +1 @@ - Pandas - Data Science Interview preparation
\ No newline at end of file + Pandas - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/PySpark/index.html b/Cheat-Sheets/PySpark/index.html index 3e797f5..0ae4d88 100644 --- a/Cheat-Sheets/PySpark/index.html +++ b/Cheat-Sheets/PySpark/index.html @@ -1 +1 @@ - PySpark - Data Science Interview preparation
\ No newline at end of file + PySpark - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/PyTorch/index.html b/Cheat-Sheets/PyTorch/index.html index 3d1dcc1..925ce73 100644 --- a/Cheat-Sheets/PyTorch/index.html +++ b/Cheat-Sheets/PyTorch/index.html @@ -1 +1 @@ - PyTorch - Data Science Interview preparation
\ No newline at end of file + PyTorch - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Python/index.html b/Cheat-Sheets/Python/index.html index c18dcb4..a326e4a 100644 --- a/Cheat-Sheets/Python/index.html +++ b/Cheat-Sheets/Python/index.html @@ -1 +1 @@ - Python - Data Science Interview preparation
\ No newline at end of file + Python - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/RegEx/index.html b/Cheat-Sheets/RegEx/index.html index 083161a..e6d2d7f 100644 --- a/Cheat-Sheets/RegEx/index.html +++ b/Cheat-Sheets/RegEx/index.html @@ -1 +1 @@ - Regular Expressions (RegEx) - Data Science Interview preparation
\ No newline at end of file + Regular Expressions (RegEx) - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/SQL/index.html b/Cheat-Sheets/SQL/index.html index 1d5770d..a3d0df8 100644 --- a/Cheat-Sheets/SQL/index.html +++ b/Cheat-Sheets/SQL/index.html @@ -1 +1 @@ - SQL - Data Science Interview preparation
\ No newline at end of file + SQL - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/Sk-learn/index.html b/Cheat-Sheets/Sk-learn/index.html index 8c53641..33df924 100644 --- a/Cheat-Sheets/Sk-learn/index.html +++ b/Cheat-Sheets/Sk-learn/index.html @@ -1 +1 @@ - Scikit Learn - Data Science Interview preparation
\ No newline at end of file + Scikit Learn - Data Science Interview preparation
\ No newline at end of file diff --git a/Cheat-Sheets/tensorflow/index.html b/Cheat-Sheets/tensorflow/index.html index b66ced2..4c8f81b 100644 --- a/Cheat-Sheets/tensorflow/index.html +++ b/Cheat-Sheets/tensorflow/index.html @@ -1 +1 @@ - TensorFlow - Data Science Interview preparation
\ No newline at end of file + TensorFlow - Data Science Interview preparation
\ No newline at end of file diff --git a/Deploying-ML-models/deploying-ml-models/index.html b/Deploying-ML-models/deploying-ml-models/index.html index 9cc7326..b137648 100644 --- a/Deploying-ML-models/deploying-ml-models/index.html +++ b/Deploying-ML-models/deploying-ml-models/index.html @@ -1,19 +1,19 @@ - Home - Data Science Interview preparation
+ Home - Data Science Interview preparation      

Home

Go to website

Introduction

This is a completely open-source platform for maintaining a curated list of interview questions and answers for people preparing for data science opportunities.

Not only this, the platform will also serve as a one-stop destination for all your needs, like tutorials, online materials, etc.

This platform is maintained by you! 🤗 You can help us by answering/improving existing questions as well as by sharing any new questions that you faced during your interviews.

Contribute to the platform

Contribution in any form will be deeply appreciated. 🙏

Add questions

❓ Add your questions here. Please be sure to provide a detailed description so that your fellow contributors can understand your questions and answer them to your satisfaction.

Add New question

🤝 Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you.

Add answers/topics

📝 These are the answers/topics that need your help at the moment

  • Add documentation for the project
  • Online Material for Learning
  • Suggested Learning Paths
  • Cheat Sheets
    • Django
    • Flask
    • Numpy
    • Pandas
    • PySpark
    • Python
    • RegEx
    • SQL
  • NLP Interview Questions
  • Add python common DSA interview questions
  • Add Major ML topics
    • Linear Regression
    • Logistic Regression
    • SVM
    • Random Forest
    • Gradient boosting
    • PCA
    • Collaborative Filtering
    • K-means clustering
    • kNN
    • ARIMA
    • Neural Networks
    • Decision Trees
    • Overfitting, Underfitting
    • Unbalanced, Skewed data
    • Activation functions relu/ leaky relu
    • Normalization
    • DBSCAN
    • Normal Distribution
    • Precision, Recall
    • Loss Function MAE, RMSE
  • Add Pandas questions
  • Add NumPy questions
  • Add TensorFlow questions
  • Add PyTorch questions
  • Add list of learning resources

Report/Solve Issues

Issues

🔧 To report any issues find me on LinkedIn or raise an issue on GitHub.

🛠 You can also solve existing issues on GitHub and create a pull request.

Say Thanks

😊 If this platform helped you in any way, it would be great if you could share it with others.

Check out this 👇 platform 👇 for data science content:
 👉 https://singhsidhukuldeep.github.io/data-science-interview-prep/ 👈
 
 #data-science #machine-learning #interview-preparation 
You can also star the repository on GitHub Stars and watch-out for any updates Watchers

Features

  • 🎨 Beautiful: The design is built on top of the most popular libraries like MkDocs and material, which allows the platform to be responsive and to work on all sorts of devices – from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space.

  • 🧐 Searchable: Almost magically, all the content on the website is searchable without any further ado. The built-in search – server-less – is fast and accurate in responding to any query.

  • 🙌 Accessible:

    • Easy to use: 👌 The website is hosted on github-pages and is free and open to use for over 40 million users of GitHub in 100+ countries.
    • Easy to contribute: 🤝 The website embodies the concept of collaboration to the letter, allowing anyone to add or improve the content. To make contributing easy, everything is written in Markdown and then compiled to beautiful HTML.

Setup

No setup is required for usage of the platform

Important: It is strongly advised to use a virtual environment and not to change anything in gh-pages

Linux Systems Linux

python3 -m venv ./venv
source venv/bin/activate
pip3 install -r requirements.txt
deactivate

Windows Systems Windows

python3 -m venv ./venv
 
venv\Scripts\activate
pip3 install -r requirements.txt
venv\Scripts\deactivate

To install the latest

pip3 install mkdocs
pip3 install mkdocs-material

Useful Commands

  • mkdocs serve - Start the live-reloading docs server.
  • mkdocs build - Build the documentation site.
  • mkdocs -h - Print help message and exit.
  • mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally.
  • mkdocs new [dir-name] - Create a new project. (Not needed here, since the project already exists.)

Useful Documents

FAQ

  • Can I filter questions based on companies? 🤪

    As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you. 🤓

    This doesn't mean that such a feature won't be added in the future. "Never say Never"

    But as of now there is neither plan nor data to do so. 😢

  • Why is this platform free? 🤗

    Currently there is no major cost involved in maintaining this platform other than the time and effort that is put in by every contributor. If you want to help, you can contribute here.

    If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 😇

Credits

Maintained by

👨‍🎓 Kuldeep Singh Sidhu

Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep

Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com

LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/

Contributors

😎 The full list of all the contributors is available here

Current Status

Maintenance Website shields.io GitHub pages status GitHub up-time BOT Commits DependaBot

Issues Total Commits Contributors Forks Stars Watchers Branches

License: AGPL v3 made-with-python made-with-Markdown repo- size Followers


Last update: July 1, 2020
\ No newline at end of file diff --git a/Interview-Questions/Natural-Language-Processing/index.html b/Interview-Questions/Natural-Language-Processing/index.html index 6c1d544..2e4ad9e 100644 --- a/Interview-Questions/Natural-Language-Processing/index.html +++ b/Interview-Questions/Natural-Language-Processing/index.html @@ -1 +1 @@ - NLP Questions - Data Science Interview preparation
Skip to content
\ No newline at end of file + NLP Questions - Data Science Interview preparation
Skip to content
\ No newline at end of file diff --git a/Interview-Questions/Probability/index.html b/Interview-Questions/Probability/index.html index 914a687..4462d37 100644 --- a/Interview-Questions/Probability/index.html +++ b/Interview-Questions/Probability/index.html @@ -1,4 +1,4 @@ - Probability Questions - Data Science Interview preparation
+ Probability Questions - Data Science Interview preparation      

Probability Interview Questions

Total Questions Unanswered Questions Answered Questions


Average score on a dice roll with at most 3 rolls

Question

Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls.

A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but after every roll it is up to you to decide whether you want to roll again.

The last score will be counted as your final score.

  • Find the average score if you rolled the dice only once?
  • Find the average score that you can get with at most 3 rolls?
  • If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same?

Answer

If you roll a fair dice once you can get:

Score  Probability
1      1/6
2      1/6
3      1/6
4      1/6
5      1/6
6      1/6

So your average score with one roll is:

sum of (score * score's probability) = (1+2+3+4+5+6) * (1/6) = 21/6 = 3.5

The average score if you rolled the dice only once is 3.5

For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3rd roll!

We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3rd time if the score on the 2nd roll is less than 3.5, i.e. (1, 2 or 3)

Possibilities

2nd roll score  Probability  Expected score from rolling a 3rd time
1               1/6          3.5
2               1/6          3.5
3               1/6          3.5
4               1/6          NA (we won't roll a
5               1/6          NA  3rd time if we get a
6               1/6          NA  score > 3 on the 2nd)

So if we had 2 rolls, the average score would be:

[We roll again if the current score is less than 3.5]
 (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6)
 +
 (4)*(1/6) + (5)*(1/6) + (6)*(1/6)   [decide not to roll again]
 = 10.5/6 + 15/6 = 25.5/6 = 4.25
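
To sanity-check these numbers, here is a small Monte Carlo sketch (my addition, not part of the original answer). With the optimal stopping rule, the at-most-3-rolls average should come out near (4.25*4 + 5 + 6)/6 ≈ 4.67:

    # Simulate the dice game: roll again only if the current score is below
    # the expected value of continuing (3.5 with one roll left, 4.25 with two)
    import random

    def play(max_rolls=3):
        expected_if_continue = {3: 4.25, 2: 3.5}
        for rolls_left in range(max_rolls, 1, -1):
            score = random.randint(1, 6)
            if score > expected_if_continue[rolls_left]:
                return score         # keep this score, stop rolling
        return random.randint(1, 6)  # final roll, no choice left

    trials = 100_000
    print(sum(play() for _ in range(trials)) / trials)  # ≈ 4.67
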
diff --git a/Interview-Questions/System-design/index.html b/Interview-Questions/System-design/index.html
index c6d8d2a..f44cad1 100644
--- a/Interview-Questions/System-design/index.html
+++ b/Interview-Questions/System-design/index.html
@@ -1 +1 @@
- System Design - Data Science Interview preparation      
\ No newline at end of file + System Design - Data Science Interview preparation
\ No newline at end of file diff --git a/Interview-Questions/data-structures-algorithms/index.html b/Interview-Questions/data-structures-algorithms/index.html index c741448..8b2eef1 100644 --- a/Interview-Questions/data-structures-algorithms/index.html +++ b/Interview-Questions/data-structures-algorithms/index.html @@ -1,4 +1,4 @@ - Data Structure and Algorithms - Data Science Interview preparation

Data Structure and Algorithms (DSA)

Total Questions Unanswered Questions Answered Questions

To-do

Data Science Interview preparation
ARIMA
\ No newline at end of file + ARIMA - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Activation functions/index.html b/Machine-Learning/Activation functions/index.html index 8968e3e..13a0610 100644 --- a/Machine-Learning/Activation functions/index.html +++ b/Machine-Learning/Activation functions/index.html @@ -1 +1 @@ - Activation functions - Data Science Interview preparation
\ No newline at end of file + Activation functions - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Collaborative Filtering/index.html b/Machine-Learning/Collaborative Filtering/index.html index 548580d..67ab351 100644 --- a/Machine-Learning/Collaborative Filtering/index.html +++ b/Machine-Learning/Collaborative Filtering/index.html @@ -1 +1 @@ - Collaborative Filtering - Data Science Interview preparation
\ No newline at end of file + Collaborative Filtering - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Confusion Matrix/index.html b/Machine-Learning/Confusion Matrix/index.html index 3a62a29..d8ffbca 100644 --- a/Machine-Learning/Confusion Matrix/index.html +++ b/Machine-Learning/Confusion Matrix/index.html @@ -1 +1 @@ - Confusion Matrix - Data Science Interview preparation
\ No newline at end of file + Confusion Matrix - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/DBSCAN/index.html b/Machine-Learning/DBSCAN/index.html index 69f75e0..455b11f 100644 --- a/Machine-Learning/DBSCAN/index.html +++ b/Machine-Learning/DBSCAN/index.html @@ -1 +1 @@ - DBSCAN - Data Science Interview preparation
\ No newline at end of file + DBSCAN - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Decision Trees/index.html b/Machine-Learning/Decision Trees/index.html index 0cfe143..f35c080 100644 --- a/Machine-Learning/Decision Trees/index.html +++ b/Machine-Learning/Decision Trees/index.html @@ -1 +1 @@ - Decision Trees - Data Science Interview preparation
\ No newline at end of file + Decision Trees - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Gradient Boosting/index.html b/Machine-Learning/Gradient Boosting/index.html index 3d2dedc..8f44be6 100644 --- a/Machine-Learning/Gradient Boosting/index.html +++ b/Machine-Learning/Gradient Boosting/index.html @@ -1 +1 @@ - Gradient Boosting - Data Science Interview preparation
\ No newline at end of file + Gradient Boosting - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/K-means clustering/index.html b/Machine-Learning/K-means clustering/index.html index 8c3617b..e22efeb 100644 --- a/Machine-Learning/K-means clustering/index.html +++ b/Machine-Learning/K-means clustering/index.html @@ -1 +1 @@ - K means clustering - Data Science Interview preparation
\ No newline at end of file + K means clustering - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Linear Regression/index.html b/Machine-Learning/Linear Regression/index.html index 359b32d..b6abaf8 100644 --- a/Machine-Learning/Linear Regression/index.html +++ b/Machine-Learning/Linear Regression/index.html @@ -1 +1 @@ - Linear Regression - Data Science Interview preparation
\ No newline at end of file + Linear Regression - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Logistic Regression/index.html b/Machine-Learning/Logistic Regression/index.html index c42231f..ac14d1b 100644 --- a/Machine-Learning/Logistic Regression/index.html +++ b/Machine-Learning/Logistic Regression/index.html @@ -1 +1 @@ - Logistic Regression - Data Science Interview preparation
\ No newline at end of file + Logistic Regression - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Loss Function MAE, RMSE/index.html b/Machine-Learning/Loss Function MAE, RMSE/index.html index ad3aed5..9beca73 100644 --- a/Machine-Learning/Loss Function MAE, RMSE/index.html +++ b/Machine-Learning/Loss Function MAE, RMSE/index.html @@ -1 +1 @@ - Loss Function MAE, RMSE - Data Science Interview preparation
\ No newline at end of file + Loss Function MAE, RMSE - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Neural Networks/index.html b/Machine-Learning/Neural Networks/index.html index 73ef438..d2f0866 100644 --- a/Machine-Learning/Neural Networks/index.html +++ b/Machine-Learning/Neural Networks/index.html @@ -1 +1 @@ - Neural Networks - Data Science Interview preparation
\ No newline at end of file + Neural Networks - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Normal Distribution/index.html b/Machine-Learning/Normal Distribution/index.html index 01215ad..469242a 100644 --- a/Machine-Learning/Normal Distribution/index.html +++ b/Machine-Learning/Normal Distribution/index.html @@ -1 +1 @@ - Normal Distribution - Data Science Interview preparation
\ No newline at end of file + Normal Distribution - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Normalization Regularisation/index.html b/Machine-Learning/Normalization Regularisation/index.html index f4f42b7..fe1d625 100644 --- a/Machine-Learning/Normalization Regularisation/index.html +++ b/Machine-Learning/Normalization Regularisation/index.html @@ -1 +1 @@ - Normalization Regularisation - Data Science Interview preparation
\ No newline at end of file + Normalization Regularisation - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Overfitting, Underfitting/index.html b/Machine-Learning/Overfitting, Underfitting/index.html index 4538b3f..fe354f2 100644 --- a/Machine-Learning/Overfitting, Underfitting/index.html +++ b/Machine-Learning/Overfitting, Underfitting/index.html @@ -1 +1 @@ - Overfitting, Underfitting - Data Science Interview preparation
\ No newline at end of file + Overfitting, Underfitting - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/PCA/index.html b/Machine-Learning/PCA/index.html index b520c65..2be8ca9 100644 --- a/Machine-Learning/PCA/index.html +++ b/Machine-Learning/PCA/index.html @@ -1 +1 @@ - PCA - Data Science Interview preparation
\ No newline at end of file + PCA - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Random Forest/index.html b/Machine-Learning/Random Forest/index.html index 22f88e0..563f388 100644 --- a/Machine-Learning/Random Forest/index.html +++ b/Machine-Learning/Random Forest/index.html @@ -1 +1 @@ - Random Forest - Data Science Interview preparation
\ No newline at end of file + Random Forest - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Support Vector Machines/index.html b/Machine-Learning/Support Vector Machines/index.html index f6678e4..2138917 100644 --- a/Machine-Learning/Support Vector Machines/index.html +++ b/Machine-Learning/Support Vector Machines/index.html @@ -1 +1 @@ - Support Vector Machines - Data Science Interview preparation
\ No newline at end of file + Support Vector Machines - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/Unbalanced, Skewed data/index.html b/Machine-Learning/Unbalanced, Skewed data/index.html index 40e5d36..6915868 100644 --- a/Machine-Learning/Unbalanced, Skewed data/index.html +++ b/Machine-Learning/Unbalanced, Skewed data/index.html @@ -1 +1 @@ - Unbalanced, Skewed data - Data Science Interview preparation
\ No newline at end of file + Unbalanced, Skewed data - Data Science Interview preparation
\ No newline at end of file diff --git a/Machine-Learning/kNN/index.html b/Machine-Learning/kNN/index.html index 743edfb..9202d55 100644 --- a/Machine-Learning/kNN/index.html +++ b/Machine-Learning/kNN/index.html @@ -1 +1 @@ - kNN - Data Science Interview preparation
\ No newline at end of file + kNN - Data Science Interview preparation
\ No newline at end of file diff --git a/Online-Material/Online-Material-for-Learning/index.html b/Online-Material/Online-Material-for-Learning/index.html index f3a84b8..7957d62 100644 --- a/Online-Material/Online-Material-for-Learning/index.html +++ b/Online-Material/Online-Material-for-Learning/index.html @@ -1 +1 @@ - Online Study Material - Data Science Interview preparation
\ No newline at end of file + Online Study Material - Data Science Interview preparation
\ No newline at end of file diff --git a/Online-Material/popular-resouces/index.html b/Online-Material/popular-resouces/index.html index aff2064..90d007b 100644 --- a/Online-Material/popular-resouces/index.html +++ b/Online-Material/popular-resouces/index.html @@ -1 +1 @@ - Popular Blogs - Data Science Interview preparation
\ No newline at end of file + Popular Blogs - Data Science Interview preparation
\ No newline at end of file diff --git a/Suggested-Learning-Paths/index.html b/Suggested-Learning-Paths/index.html index 10e6a34..60f14d0 100644 --- a/Suggested-Learning-Paths/index.html +++ b/Suggested-Learning-Paths/index.html @@ -1 +1 @@ - 📅 Suggested Learning Paths - Data Science Interview preparation
\ No newline at end of file + 📅 Suggested Learning Paths - Data Science Interview preparation
\ No newline at end of file diff --git a/as-fast-as-possible/Deep-CV/index.html b/as-fast-as-possible/Deep-CV/index.html index 2765e28..d9b42a3 100644 --- a/as-fast-as-possible/Deep-CV/index.html +++ b/as-fast-as-possible/Deep-CV/index.html @@ -1 +1 @@ - Deep Computer Vision - Data Science Interview preparation
\ No newline at end of file + Deep Computer Vision - Data Science Interview preparation
\ No newline at end of file diff --git a/as-fast-as-possible/Deep-NLP/index.html b/as-fast-as-possible/Deep-NLP/index.html index d7769d2..8d4869c 100644 --- a/as-fast-as-possible/Deep-NLP/index.html +++ b/as-fast-as-possible/Deep-NLP/index.html @@ -1 +1 @@ - Deep Natural Language Processing - Data Science Interview preparation
\ No newline at end of file + Deep Natural Language Processing - Data Science Interview preparation
\ No newline at end of file diff --git a/as-fast-as-possible/Neural-Networks/index.html b/as-fast-as-possible/Neural-Networks/index.html index 388e404..52553bf 100644 --- a/as-fast-as-possible/Neural-Networks/index.html +++ b/as-fast-as-possible/Neural-Networks/index.html @@ -1 +1 @@ - Neural Networks - Data Science Interview preparation
\ No newline at end of file + Neural Networks - Data Science Interview preparation
\ No newline at end of file diff --git a/as-fast-as-possible/TF2-Keras/index.html b/as-fast-as-possible/TF2-Keras/index.html index 6e3a86e..7bc5dbf 100644 --- a/as-fast-as-possible/TF2-Keras/index.html +++ b/as-fast-as-possible/TF2-Keras/index.html @@ -1 +1 @@ - Tensorflow 2 with Keras - Data Science Interview preparation
\ No newline at end of file + Tensorflow 2 with Keras - Data Science Interview preparation
\ No newline at end of file diff --git a/as-fast-as-possible/index.html b/as-fast-as-possible/index.html index 7b58214..331d6c2 100644 --- a/as-fast-as-possible/index.html +++ b/as-fast-as-possible/index.html @@ -1 +1 @@ - Introduction - Data Science Interview preparation
\ No newline at end of file + Introduction - Data Science Interview preparation
\ No newline at end of file diff --git a/index.html b/index.html index fe20ebe..9024b91 100644 --- a/index.html +++ b/index.html @@ -1,4 +1,4 @@ - Data Science - Data Science Interview preparation

Home

Go to website

Introduction

This is a completely open-source platform for maintaining a curated list of interview questions and answers for people preparing for data science opportunities.

Not only this, the platform will also serve as a one-stop destination for all your needs, like tutorials, online materials, etc.

This platform is maintained by you! 🤗 You can help us by answering/improving existing questions as well as by sharing any new questions that you faced during your interviews.

Contribute to the platform

Contribution in any form will be deeply appreciated. 🙏

Add questions

❓ Add your questions here. Please be sure to provide a detailed description so that your fellow contributors can understand your questions and answer them to your satisfaction.

Add New question

🤝 Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you.

Add answers/topics

📝 These are the answers/topics that need your help at the moment

  • Add documentation for the project
  • Online Material for Learning
  • Suggested Learning Paths
  • Cheat Sheets
    • Django
    • Flask
    • Numpy
    • Pandas
    • PySpark
    • Python
    • RegEx
    • SQL
  • NLP Interview Questions
  • Add python common DSA interview questions
  • Add Major ML topics
    • Linear Regression
    • Logistic Regression
    • SVM
    • Random Forest
    • Gradient boosting
    • PCA
    • Collaborative Filtering
    • K-means clustering
    • kNN
    • ARIMA
    • Neural Networks
    • Decision Trees
    • Overfitting, Underfitting
    • Unbalanced, Skewed data
    • Activation functions relu/ leaky relu
    • Normalization
    • DBSCAN
    • Normal Distribution
    • Precision, Recall
    • Loss Function MAE, RMSE
  • Add Pandas questions
  • Add NumPy questions
  • Add TensorFlow questions
  • Add PyTorch questions
  • Add list of learning resources

Report/Solve Issues

Issues

🔧 To report any issues find me on LinkedIn or raise an issue on GitHub.

🛠 You can also solve existing issues on GitHub and create a pull request.

Say Thanks

😊 If this platform helped you in any way, it would be great if you could share it with others.

Check out this 👇 platform 👇 for data science content:
 👉 https://singhsidhukuldeep.github.io/data-science-interview-prep/ 👈
-

You can also star the repository on GitHub Stars and watch-out for any updates Watchers

Features

  • 🎨 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices – from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space.

  • 🧐 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search – server-less – is fast and accurate in responses to any of the queries.

  • 🙌 Accessible:

    • Easy to use: 👌 The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries.
    • Easy to contribute: 🤝 The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html.

Setup

No setup is required to use the platform.

Important: It is strongly advised to use a virtual environment and not to change anything in the gh-pages branch.

Linux Systems

python3 -m venv ./venv
 
source venv/bin/activate
 
pip3 install -r requirements.txt
 
deactivate

Windows Systems

python3 -m venv ./venv
 
 venv\Scripts\activate
 
pip3 install -r requirements.txt
 
venv\Scripts\deactivate
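
The same environment can also be created programmatically with Python's built-in venv module. A minimal sketch, equivalent to the python3 -m venv ./venv command above (illustrative, not part of the repository):

import venv

# Create ./venv with pip installed, mirroring `python3 -m venv ./venv`.
venv.EnvBuilder(with_pip=True).create("./venv")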

To install the latest

pip3 install mkdocs
pip3 install mkdocs-material
pip3 install mkdocs-minify-plugin
pip3 install mkdocs-git-revision-date-localized-plugin

Useful Commands

  • mkdocs serve - Start the live-reloading docs server.
  • mkdocs build - Build the documentation site.
  • mkdocs -h - Print help message and exit.
  • mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally.
  • mkdocs new [dir-name] - Create a new project. (Not needed here; this project already exists.) A scripted version of the review-before-deploy workflow is sketched below.
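
If you want to script that workflow, a small illustrative sketch (assuming mkdocs is installed and on your PATH; this helper is not part of the repository):

import subprocess

# Build with --strict so warnings such as broken links fail the build.
subprocess.run(["mkdocs", "build", "--strict"], check=True)

# Review the built site locally (default: http://127.0.0.1:8000) before
# pushing it to GitHub Pages with `mkdocs gh-deploy`.
subprocess.run(["mkdocs", "serve"], check=True)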

Useful Documents

  • 📑 MkDocs: GitHub: https://github.com/mkdocs/mkdocs, Documentation: https://www.mkdocs.org/
  • 🎨 Theme: GitHub: https://github.com/squidfunk/mkdocs-material, Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/

FAQ

  • Can I filter questions based on companies? 🤪

    As much as this platform aims to help you with your interview preparation, it is not a shortcut to crack one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you. 🤓

    This doesn't mean that such a feature won't be added in the future. "Never say never!"

    But as of now there is neither plan nor data to do so. 😢

  • Why is this platform free? 🤗

    Currently there is no major cost involved in maintaining this platform other than the time and effort put in by every contributor. If you want to help, you can contribute here.

    If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 😇

Credits

Maintained by

👨‍🎓 Kuldeep Singh Sidhu

GitHub: github/singhsidhukuldeep https://github.com/singhsidhukuldeep

Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com

LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/

Contributors

😎 The full list of all the contributors is available here

Current Status

License: AGPL v3


Last update: August 3, 2022

Projects

Introduction

These are projects that you can take inspiration from and try to improve on. ✍️


List of projects

Natural Language Processing (NLP)

| Title | Description | Source | Author |
| --- | --- | --- | --- |
| Text Classification with Facebook fastText | Building the User Review Model with fastText (text classification) with a response time of less than one second | GitHub | Kuldeep Singh Sidhu |
| Chat-bot using ChatterBot | ChatterBot is a Python library that makes it easy to generate automated responses to a user's input. | GitHub | Kuldeep Singh Sidhu |
| Text Summarizer | Comparing state-of-the-art models for text summary generation | GitHub, Google Colab | Kuldeep Singh Sidhu |
| NLP with spaCy | Building an NLP pipeline using spaCy | GitHub | Kuldeep Singh Sidhu |

Recommendation Engine

| Title | Description | Source | Author |
| --- | --- | --- | --- |
| Recommendation Engine with Surprise | Comparing different recommendation-system algorithms like SVD, SVDpp (matrix factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore, Baseline, Co-Clustering | GitHub, Google Colab | Kuldeep Singh Sidhu |

Image Processing

| Title | Description | Source | Author |
| --- | --- | --- | --- |
| Facial Landmarks | Using Dlib, a library capable of giving you 68 points (landmarks) of the face. | GitHub | Kuldeep Singh Sidhu |

Reinforcement Learning

| Title | Description | Source | Author |
| --- | --- | --- | --- |
| Google Dopamine | Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. | GitHub, Google Colab | Kuldeep Singh Sidhu |
| Tic Tac Toe | Training a computer to play Tic Tac Toe using reinforcement learning algorithms. | GitHub, Google Colab | Kuldeep Singh Sidhu |

Others

| Title | Description | Source | Author |
| --- | --- | --- | --- |
| TensorFlow Eager Execution | Eager Execution (EE) enables you to run operations immediately. | GitHub, Google Colab | Kuldeep Singh Sidhu |

Last update: July 3, 2020

Probability Interview Questions

Average score on a dice roll of at most 3 times

Question

Consider a fair 6-sided dice. Your aim is to get the highest score you can in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, and after every roll it is up to you to decide whether you want to roll again. The last score counts as your final score.

  • Find the average score if you rolled the dice only once.
  • Find the average score that you can get with at most 3 rolls.
  • If the dice is fair, why is the average score for at most 3 rolls not the same as for 1 roll?

Answer

If you roll a fair dice once, each of the scores 1 through 6 comes up with probability ⅙. So your average score with one roll is:

sum of (score × score's probability) = (1 + 2 + 3 + 4 + 5 + 6) × ⅙ = 21/6 = 3.5

The average score if you roll the dice only once is 3.5.

For at most 3 rolls, let's try back-tracking. Say you have just made your second roll and have to decide whether to make a 3rd. We just found that a single roll is worth 3.5 on average, so you should roll the 3rd time only if the score on the 2nd roll is less than 3.5, i.e. 1, 2 or 3; on a 4, 5 or 6 you stop. So with at most 2 rolls the average score is:

(3.5)×(⅙) + (3.5)×(⅙) + (3.5)×(⅙) + (4)×(⅙) + (5)×(⅙) + (6)×(⅙) = 1.75 + 2.5 = 4.25

The average score with at most two rolls is 4.25.

Now look at it from the perspective of the first roll: you should roll again only if your score is less than 4.25, i.e. 1, 2, 3 or 4. So with at most 3 rolls the average score is:

(4.25)×(⅙) + (4.25)×(⅙) + (4.25)×(⅙) + (4.25)×(⅙) + (5)×(⅙) + (6)×(⅙) = 17/6 + 11/6 = 28/6 ≈ 4.67

The average score with at most 3 rolls is about 4.67. In general, with k rolls left the expected score satisfies E_k = ⅙ × Σ max(x, E_(k−1)) over x = 1…6, with E_1 = 3.5.

The average score for at most 3 rolls and for 1 roll is not the same because, although the dice is fair, the decision to roll again depends on the previous outcome, so the rolls are no longer independent. The scores would have been the same if we rolled the 2nd and 3rd time without considering what we got in the previous roll, i.e. if the rolls were independent.
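
The numbers above can be sanity-checked in a few lines of Python: compute the backward induction directly, then simulate the stopping rule. This sketch is illustrative and not part of the original question:

import random

# Exact expected score with `rolls` rolls allowed, via backward induction:
# keep a roll x if it beats the expected value of continuing, else re-roll.
def expected_score(rolls):
    e = 3.5  # one roll: (1 + 2 + 3 + 4 + 5 + 6) / 6
    for _ in range(rolls - 1):
        e = sum(max(x, e) for x in range(1, 7)) / 6
    return e

print(expected_score(1))  # 3.5
print(expected_score(2))  # 4.25
print(expected_score(3))  # 4.666...

# Monte Carlo check of the same policy for at most 3 rolls.
def simulate(trials=200_000):
    t2, t1 = expected_score(2), expected_score(1)  # 4.25 and 3.5
    total = 0
    for _ in range(trials):
        score = random.randint(1, 6)
        if score < t2:            # below 4.25: roll again
            score = random.randint(1, 6)
            if score < t1:        # below 3.5: roll again
                score = random.randint(1, 6)
        total += score
    return total / trials

print(simulate())  # ~4.67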
We will only role again if our score is less than 4.25 i.e 1,2,3 or 4 Possibilities 1 st role score Probability 2 nd and 3 rd role score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA We won't role again if we 6 \u2159 NA get score >4.25 on 1 st So if we had 3 roles, average score would be: [We role again if current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [[Decide not to role again] = 17/6 + 11/6 = 4.66 The average score if you rolled the dice only once is 4.66 The average score for at most 3 roles and 1 role is not the same because although the dice is fair the event of rolling the dice is no longer independent . The scores would have been the same if we rolled the dice 2 nd and 3 rd time without considering what we got in the last roll i.e. if the event of rolling the dice was independent.","title":"Average score on a dice role of at most 3 times"},{"location":"Interview-Questions/System-design/","text":"System Design","title":"System Design"},{"location":"Interview-Questions/System-design/#system-design","text":"","title":"System Design"},{"location":"Interview-Questions/data-structures-algorithms/","text":"Data Structure and Algorithms (DSA) To-do Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions \ud83d\ude01 Easy Two Number Sum Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass Validate Subsequence Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4] , and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass Nth Fibonacci The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2. 
for fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd are fixed at 0,1 Find the nth Nth Fibonacci sequence # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass Product Sum Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass \ud83d\ude42 Medium Top K Frequent Words Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N \\log{N})O(NlogN), where NN is the length of words. # We count the frequency of each word in O(N)O(N) time, # then we sort the given words in O(N \\log{N})O(NlogN) time. # Space Complexity: O(N)O(N), the space used to store our uniqueWords. def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N \\log{k})O(Nlogk), where NN is the length of words. # We count the frequency of each word in O(N)O(N) time, then we add NN words to the heap, # each in O(\\log {k})O(logk) time. Finally, we pop from the heap up to kk times. # As k \\leq Nk\u2264N, this is O(N \\log{k})O(Nlogk) in total. 
# In Python, we improve this to O(N + k \\log {N})O(N+klogN): our heapq.heapify operation and # counting operations are O(N)O(N), and # each of kk heapq.heappop operations are O(\\log {N})O(logN). # Space Complexity: O(N)O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to heap that stores the best k candidates. # Here, \"best\" is defined with our custom ordering relation, # which puts the worst candidates at the top of the heap. # At the end, we pop off the heap up to k times and reverse the result # so that the best candidates are first. # In Python, we instead use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. def topKFrequentWords ( words , k ) -> List [ str ]: from heapq import heapify , heappop #, heappush from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )] \ud83e\udd28 Hard \ud83d\ude32 Very Hard","title":"DSA (Data Structures & Algorithms)"},{"location":"Interview-Questions/data-structures-algorithms/#data-structure-and-algorithms-dsa","text":"","title":"Data Structure and Algorithms (DSA)"},{"location":"Interview-Questions/data-structures-algorithms/#to-do","text":"Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions","title":"To-do"},{"location":"Interview-Questions/data-structures-algorithms/#easy","text":"","title":"\ud83d\ude01 Easy"},{"location":"Interview-Questions/data-structures-algorithms/#two-number-sum","text":"Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] pass # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] pass # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] pass","title":"Two Number Sum"},{"location":"Interview-Questions/data-structures-algorithms/#validate-subsequence","text":"Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4] , and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. 
# O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence ) pass","title":"Validate Subsequence"},{"location":"Interview-Questions/data-structures-algorithms/#nth-fibonacci","text":"The Fibonacci sequence is defined as follows: Any number in the sequence is the sum of the previous 2. for fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd are fixed at 0,1 Find the nth Nth Fibonacci sequence # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] pass # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] pass","title":"Nth Fibonacci"},{"location":"Interview-Questions/data-structures-algorithms/#product-sum","text":"Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth pass","title":"Product Sum"},{"location":"Interview-Questions/data-structures-algorithms/#medium","text":"","title":"\ud83d\ude42 Medium"},{"location":"Interview-Questions/data-structures-algorithms/#top-k-frequent-words","text":"Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N \\log{N})O(NlogN), where NN is the length of words. 
{"location":"Interview-Questions/data-structures-algorithms/#medium","text":"","title":"\ud83d\ude42 Medium"},{"location":"Interview-Questions/data-structures-algorithms/#top-k-frequent-words","text":"Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with occurrence counts of 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the unique words in O(N log N) time. # Space Complexity: O(N), the space used to store uniqueWords. def topKFrequentWords ( words , k ): from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words: # we count the frequency of each word in O(N) time, then we add N words to a size-k heap, # each in O(log k) time, and finally pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): the heapq.heapify call and # the counting step are each O(N), and # each of the k heapq.heappop calls is O(log N). # Space Complexity: O(N), the space used to store wordsFreq. # Count the frequency of each word, then put every (frequency, word) pair on a heap # keyed on (-frequency, word), so the most frequent words (with ties broken alphabetically) # sit at the top of the heap; popping k times yields the answer already in order. # heapq.heapify can turn a list into a heap in linear time, simplifying our work. def topKFrequentWords ( words , k ): import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )]","title":"Top K Frequent Words"},{"location":"Interview-Questions/data-structures-algorithms/#hard","text":"","title":"\ud83e\udd28 Hard"},{"location":"Interview-Questions/data-structures-algorithms/#very-hard","text":"","title":"\ud83d\ude32 Very Hard"},{"location":"Machine-Learning/ARIMA/","text":"","title":"ARIMA"},{"location":"Machine-Learning/Activation%20functions/","text":"","title":"Activation functions"},{"location":"Machine-Learning/Collaborative%20Filtering/","text":"","title":"Collaborative Filtering"},{"location":"Machine-Learning/Confusion%20Matrix/","text":"","title":"Confusion Matrix"},{"location":"Machine-Learning/DBSCAN/","text":"","title":"DBSCAN"},{"location":"Machine-Learning/Decision%20Trees/","text":"","title":"Decision Trees"},{"location":"Machine-Learning/Gradient%20Boosting/","text":"","title":"Gradient Boosting"},{"location":"Machine-Learning/K-means%20clustering/","text":"","title":"K means clustering"},{"location":"Machine-Learning/Linear%20Regression/","text":"","title":"Linear Regression"},{"location":"Machine-Learning/Logistic%20Regression/","text":"","title":"Logistic Regression"},{"location":"Machine-Learning/Loss%20Function%20MAE%2C%20RMSE/","text":"","title":"Loss Function MAE, RMSE"},{"location":"Machine-Learning/Neural%20Networks/","text":"","title":"Neural Networks"},{"location":"Machine-Learning/Normal%20Distribution/","text":"","title":"Normal Distribution"},{"location":"Machine-Learning/Normalization%20Regularisation/","text":"","title":"Normalization Regularisation"},{"location":"Machine-Learning/Overfitting%2C%20Underfitting/","text":"","title":"Overfitting, Underfitting"},{"location":"Machine-Learning/PCA/","text":"","title":"PCA"},{"location":"Machine-Learning/Random%20Forest/","text":"","title":"Random 
Forest"},{"location":"Machine-Learning/Support%20Vector%20Machines/","text":"","title":"Support Vector Machines"},{"location":"Machine-Learning/Unbalanced%2C%20Skewed%20data/","text":"","title":"Unbalanced, Skewed data"},{"location":"Machine-Learning/kNN/","text":"","title":"kNN"},{"location":"Online-Material/Online-Material-for-Learning/","text":"","title":"Online Study Material"},{"location":"Online-Material/popular-resouces/","text":"","title":"Popular Blogs"},{"location":"as-fast-as-possible/","text":"","title":"Introduction"},{"location":"as-fast-as-possible/Deep-CV/","text":"","title":"Deep Computer Vision"},{"location":"as-fast-as-possible/Deep-NLP/","text":"","title":"Deep Natural Language Processing"},{"location":"as-fast-as-possible/Neural-Networks/","text":"","title":"Neural Networks"},{"location":"as-fast-as-possible/TF2-Keras/","text":"","title":"Tensorflow 2 with Keras"}]} \ No newline at end of file +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Home Introduction This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you . Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request. Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. 
Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch-out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html. Setup No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv \\S cripts \\a ctivate pip3 install -r requirements.txt venv \\S cripts \\d eactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project Useful Documents \ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/ FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 
\ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"\ud83c\udfe1 Home"},{"location":"#home","text":"","title":"Home"},{"location":"#introduction","text":"This is a completely open-source platform for maintaining curated list of interview questions and answers for people looking and preparing for data science opportunities. Not only this, the platform will also serve as one point destination for all your needs like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/ improving existing questions as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"#add-questions","text":"\u2753 Add your questions here . Please ensure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that as of now, you cannot directly add a question via a pull request. This will help us to maintain the quality of the content for you .","title":"Add questions"},{"location":"#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"#reportsolve-issues","text":"\ud83d\udd27 To report any issues find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 You can also star the repository on GitHub and watch-out for any updates","title":"Say Thanks"},{"location":"#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of most popular libraries like MkDocs and material which allows the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide-screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. 
The built-in search \u2013 server-less \u2013 is fast and accurate in responses to any of the queries. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to use to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the latter. Allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful html.","title":"Features"},{"location":"#setup","text":"No setup is required for usage of the platform Important: It is strongly advised to use virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"#windows-systems","text":"python3 -m venv ./venv venv \\S cripts \\a ctivate pip3 install -r requirements.txt venv \\S cripts \\d eactivate","title":"Windows Systems"},{"location":"#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material pip3 install mkdocs-minify-plugin pip3 install mkdocs-git-revision-date-localized-plugin","title":"To install the latest"},{"location":"#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. No need to create a new project","title":"Useful Commands"},{"location":"#useful-documents","text":"\ud83d\udcd1 MkDocs: GitHub: https://github.com/mkdocs/mkdocs Documentation: https://www.mkdocs.org/ \ud83c\udfa8 Theme: GitHub: https://github.com/squidfunk/mkdocs-material Documentation: https://squidfunk.github.io/mkdocs-material/getting-started/","title":"Useful Documents"},{"location":"#faq","text":"Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a short-cut to crack one. Think of this platform as a practicing field to help you sharpen your skills for your interview processes. However, for your convenience we have sorted all the questions by topics for you. \ud83e\udd13 This doesn't mean that such feature won't be added in the future. \"Never say Never\" But as of now there is neither plan nor data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than time and effort that is put in by every contributor . If you want to help you can contribute here . If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. 
\ud83d\ude07","title":"FAQ"},{"location":"#credits","text":"","title":"Credits"},{"location":"#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu Github: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"#current-status","text":"","title":"Current Status"},{"location":"Suggested-Learning-Paths/","text":"","title":"\ud83d\udcc5 Suggested Learning Paths"},{"location":"projects/","text":"Projects Introduction These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f Popular Sources List of projects Natural Language processing (NLP) Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu Recommendation Engine Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu Image Processing Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. Kuldeep Singh Sidhu Reinforcement Learning Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu Others Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. Kuldeep Singh Sidhu","title":"\ud83d\udcf3 Projects"},{"location":"projects/#projects","text":"","title":"Projects"},{"location":"projects/#introduction","text":"These are the projects that you can take inspiration from and try to improve on them. \u270d\ufe0f","title":"Introduction"},{"location":"projects/#popular-sources","text":"","title":"Popular Sources"},{"location":"projects/#list-of-projects","text":"","title":"List of projects"},{"location":"projects/#natural-language-processing-nlp","text":"Title Description Source Author Text Classification with Facebook fasttext Building the User Review Model with fastText (Text Classification) with response time of less than one second Kuldeep Singh Sidhu Chat-bot using ChatterBot ChatterBot is a Python library that makes it easy to generate automated responses to a user\u2019s input. 
Kuldeep Singh Sidhu Text Summarizer Comparing state of the art models for text summary generation Kuldeep Singh Sidhu NLP with Spacy Building NLP pipeline using Spacy Kuldeep Singh Sidhu","title":"Natural Language processing (NLP)"},{"location":"projects/#recommendation-engine","text":"Title Description Source Author Recommendation Engine with Surprise Comparing different recommendation systems algorithms like SVD, SVDpp (Matrix Factorization), KNN Baseline, KNN Basic, KNN Means, KNN ZScore), Baseline, Co Clustering Kuldeep Singh Sidhu","title":"Recommendation Engine"},{"location":"projects/#image-processing","text":"Title Description Source Author Facial Landmarks Using Dlib, a library capable of giving you 68 points (land marks) of the face. Kuldeep Singh Sidhu","title":"Image Processing"},{"location":"projects/#reinforcement-learning","text":"Title Description Source Author Google Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Kuldeep Singh Sidhu Tic Tac Toe Training a computer to play Tic Tac Toe using reinforcement learning algorithms. Kuldeep Singh Sidhu","title":"Reinforcement Learning"},{"location":"projects/#others","text":"Title Description Source Author TensorFlow Eager Execution Eager Execution (EE) enables you to run operations immediately. Kuldeep Singh Sidhu","title":"Others"},{"location":"Cheat-Sheets/Django/","text":"","title":"Django"},{"location":"Cheat-Sheets/Flask/","text":"","title":"Flask"},{"location":"Cheat-Sheets/Hypothesis-Tests/","text":"Hypothesis Tests in Python A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. Few Notes: When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated. Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis. In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to name two examples. Normality Tests This section lists statistical tests that you can use to check if your data has a Gaussian distribution. Gaussian distribution (also known as normal distribution) is a bell-shaped curve. Shapiro-Wilk Test Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Shapiro-Wilk Normality Test from scipy.stats import shapiro data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = shapiro ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.shapiro Shapiro-Wilk test on Wikipedia D\u2019Agostino\u2019s K^2 Test Tests whether a data sample has a Gaussian distribution/Normal distribution. 
Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the D'Agostino's K^2 Normality Test from scipy.stats import normaltest data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = normaltest ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.normaltest D'Agostino's K-squared test on Wikipedia Anderson-Darling Test Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Anderson-Darling Normality Test from scipy.stats import anderson data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] result = anderson ( data ) print ( 'stat= %.3f ' % ( result . statistic )) for i in range ( len ( result . critical_values )): sl , cv = result . significance_level [ i ], result . critical_values [ i ] if result . statistic < cv : print ( 'Probably Gaussian at the %.1f%% level' % ( sl )) else : print ( 'Probably not Gaussian at the %.1f%% level' % ( sl )) Sources scipy.stats.anderson Anderson-Darling test on Wikipedia Correlation Tests This section lists statistical tests that you can use to check if two samples are related. Pearson\u2019s Correlation Coefficient Tests whether two samples have a linear relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Pearson's Correlation test from scipy.stats import pearsonr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = pearsonr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.pearsonr Pearson's correlation coefficient on Wikipedia Spearman\u2019s Rank Correlation Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. 
Python Code # Example of the Spearman's Rank Correlation Test from scipy.stats import spearmanr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = spearmanr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.spearmanr Spearman's rank correlation coefficient on Wikipedia Kendall’s Rank Correlation Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Kendall's Rank Correlation Test from scipy.stats import kendalltau data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = kendalltau ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.kendalltau Kendall rank correlation coefficient on Wikipedia Chi-Squared Test Tests whether two categorical variables are related or independent. Assumptions Observations used in the calculation of the contingency table are independent. 25 or more examples in each cell of the contingency table. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Chi-Squared Test from scipy.stats import chi2_contingency table = [[ 10 , 20 , 30 ],[ 6 , 9 , 17 ]] stat , p , dof , expected = chi2_contingency ( table ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.chi2_contingency Chi-Squared test on Wikipedia Stationary Tests This section lists statistical tests that you can use to check if a time series is stationary or not. Augmented Dickey-Fuller Unit Root Test Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive. Assumptions Observations are temporally ordered. Interpretation H0: a unit root is present (series is non-stationary). H1: a unit root is not present (series is stationary). Python Code # Example of the Augmented Dickey-Fuller unit root test from statsmodels.tsa.stattools import adfuller data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , obs , crit , t = adfuller ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably not Stationary' ) else : print ( 'Probably Stationary' ) Sources statsmodels.tsa.stattools.adfuller API . Augmented Dickey-Fuller test, Wikipedia . Kwiatkowski-Phillips-Schmidt-Shin Tests whether a time series is trend stationary or not. Assumptions Observations are temporally ordered. Interpretation H0: the time series is trend-stationary. H1: the time series is not trend-stationary.
Python Code # Example of the Kwiatkowski-Phillips-Schmidt-Shin test from statsmodels.tsa.stattools import kpss data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , crit = kpss ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Stationary' ) else : print ( 'Probably not Stationary' ) Sources statsmodels.tsa.stattools.kpss API . KPSS test, Wikipedia . Parametric Statistical Hypothesis Tests This section lists statistical tests that you can use to compare data samples. Student\u2019s t-test Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Student's t-test from scipy.stats import ttest_ind data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_ind ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_ind Student's t-test on Wikipedia Paired Student\u2019s t-test Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Paired Student's t-test from scipy.stats import ttest_rel data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_rel ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_rel Student's t-test on Wikipedia Analysis of Variance Test (ANOVA) Tests whether the means of two or more independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. 
Python Code # Example of the Analysis of Variance Test from scipy.stats import f_oneway data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = f_oneway ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.f_oneway Analysis of variance on Wikipedia Repeated Measures ANOVA Test Tests whether the means of two or more paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: one or more of the means of the samples are unequal. Python Code # Not supported in scipy.stats; recent statsmodels releases provide # statsmodels.stats.anova.AnovaRM (see the sketch further below). Sources Analysis of variance on Wikipedia Nonparametric Statistical Hypothesis Tests In non-parametric tests, we don't make any assumptions about the parameters of the population we are studying. In fact, these tests don't depend on the population: there is no fixed set of parameters, and no assumed distribution (normal distribution, etc.) Mann-Whitney U Test Tests whether the distributions of two independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Mann-Whitney U Test from scipy.stats import mannwhitneyu data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = mannwhitneyu ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.mannwhitneyu Mann-Whitney U test on Wikipedia Wilcoxon Signed-Rank Test Tests whether the distributions of two paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Wilcoxon Signed-Rank Test from scipy.stats import wilcoxon data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = wilcoxon ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.wilcoxon Wilcoxon signed-rank test on Wikipedia Kruskal-Wallis H Test Tests whether the distributions of two or more independent samples are equal or not.
Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Kruskal-Wallis H Test from scipy.stats import kruskal data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = kruskal ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.kruskal Kruskal-Wallis one-way analysis of variance on Wikipedia Friedman Test Tests whether the distributions of two or more paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Friedman Test from scipy.stats import friedmanchisquare data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = friedmanchisquare ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.friedmanchisquare Friedman test on Wikipedia Equality of variance test Test is used to assess the equality of variance between two different samples. Levene's test Levene\u2019s test is used to assess the equality of variance between two or more different samples. Assumptions The samples from the populations under consideration are independent. The populations under consideration are approximately normally distributed. Interpretation H0: All the samples variances are equal H1: At least one variance is different from the rest Python Code # Example of the Levene's test from scipy.stats import levene a = [ 8.88 , 9.12 , 9.04 , 8.98 , 9.00 , 9.08 , 9.01 , 8.85 , 9.06 , 8.99 ] b = [ 8.88 , 8.95 , 9.29 , 9.44 , 9.15 , 9.58 , 8.36 , 9.18 , 8.67 , 9.05 ] c = [ 8.95 , 9.12 , 8.95 , 8.85 , 9.03 , 8.84 , 9.07 , 8.98 , 8.86 , 8.98 ] stat , p = levene ( a , b , c ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same variances' ) else : print ( 'Probably at least one variance is different from the rest' ) Sources scipy.stats.levene Levene's test on Wikipedia Source: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/","title":"Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#hypothesis-tests-in-python","text":"A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. 
Few Notes: When it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated. Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis. In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to name two examples.","title":"Hypothesis Tests in Python"},{"location":"Cheat-Sheets/Hypothesis-Tests/#normality-tests","text":"This section lists statistical tests that you can use to check if your data has a Gaussian distribution. Gaussian distribution (also known as normal distribution) is a bell-shaped curve.","title":"Normality Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#shapiro-wilk-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Shapiro-Wilk Normality Test from scipy.stats import shapiro data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = shapiro ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.shapiro Shapiro-Wilk test on Wikipedia","title":"Shapiro-Wilk Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#dagostinos-k2-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the D'Agostino's K^2 Normality Test from scipy.stats import normaltest data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] stat , p = normaltest ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Gaussian' ) else : print ( 'Probably not Gaussian' ) Sources scipy.stats.normaltest D'Agostino's K-squared test on Wikipedia","title":"D\u2019Agostino\u2019s K^2 Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#anderson-darling-test","text":"Tests whether a data sample has a Gaussian distribution/Normal distribution. Assumptions Observations in each sample are independent and identically distributed (iid). Interpretation H0: the sample has a Gaussian distribution. H1: the sample does not have a Gaussian distribution. Python Code # Example of the Anderson-Darling Normality Test from scipy.stats import anderson data = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] result = anderson ( data ) print ( 'stat= %.3f ' % ( result . statistic )) for i in range ( len ( result . critical_values )): sl , cv = result . significance_level [ i ], result . critical_values [ i ] if result . 
statistic < cv : print ( 'Probably Gaussian at the %.1f%% level' % ( sl )) else : print ( 'Probably not Gaussian at the %.1f%% level' % ( sl )) Sources scipy.stats.anderson Anderson-Darling test on Wikipedia","title":"Anderson-Darling Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#correlation-tests","text":"This section lists statistical tests that you can use to check if two samples are related.","title":"Correlation Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#pearsons-correlation-coefficient","text":"Tests whether two samples have a linear relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Pearson's Correlation test from scipy.stats import pearsonr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = pearsonr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.pearsonr Pearson's correlation coefficient on Wikipedia","title":"Pearson\u2019s Correlation Coefficient"},{"location":"Cheat-Sheets/Hypothesis-Tests/#spearmans-rank-correlation","text":"Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Spearman's Rank Correlation Test from scipy.stats import spearmanr data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = spearmanr ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.spearmanr Spearman's rank correlation coefficient on Wikipedia","title":"Spearman\u2019s Rank Correlation"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kendalls-rank-correlation","text":"Tests whether two samples have a monotonic relationship. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Kendall's Rank Correlation Test from scipy.stats import kendalltau data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 0.353 , 3.517 , 0.125 , - 7.545 , - 0.555 , - 1.536 , 3.350 , - 1.578 , - 3.537 , - 1.579 ] stat , p = kendalltau ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.kendalltau Kendall rank correlation coefficient on Wikipedia","title":"Kendall\u2019s Rank Correlation"},{"location":"Cheat-Sheets/Hypothesis-Tests/#chi-squared-test","text":"Tests whether two categorical variables are related or independent. 
Assumptions Observations used in the calculation of the contingency table are independent. 25 or more examples in each cell of the contingency table. Interpretation H0: the two samples are independent. H1: there is a dependency between the samples. Python Code # Example of the Chi-Squared Test from scipy.stats import chi2_contingency table = [[ 10 , 20 , 30 ],[ 6 , 9 , 17 ]] stat , p , dof , expected = chi2_contingency ( table ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably independent' ) else : print ( 'Probably dependent' ) Sources scipy.stats.chi2_contingency Chi-Squared test on Wikipedia","title":"Chi-Squared Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#stationary-tests","text":"This section lists statistical tests that you can use to check if a time series is stationary or not.","title":"Stationary Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#augmented-dickey-fuller-unit-root-test","text":"Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive. Assumptions Observations are temporally ordered. Interpretation H0: a unit root is present (series is non-stationary). H1: a unit root is not present (series is stationary). Python Code # Example of the Augmented Dickey-Fuller unit root test from statsmodels.tsa.stattools import adfuller data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , obs , crit , t = adfuller ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably not Stationary' ) else : print ( 'Probably Stationary' ) Sources statsmodels.tsa.stattools.adfuller API . Augmented Dickey-Fuller test, Wikipedia .","title":"Augmented Dickey-Fuller Unit Root Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kwiatkowski-phillips-schmidt-shin","text":"Tests whether a time series is trend stationary or not. Assumptions Observations are temporally ordered. Interpretation H0: the time series is trend-stationary. H1: the time series is not trend-stationary. Python Code # Example of the Kwiatkowski-Phillips-Schmidt-Shin test from statsmodels.tsa.stattools import kpss data = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ] stat , p , lags , crit = kpss ( data ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably Stationary' ) else : print ( 'Probably not Stationary' ) Sources statsmodels.tsa.stattools.kpss API . KPSS test, Wikipedia .","title":"Kwiatkowski-Phillips-Schmidt-Shin"},{"location":"Cheat-Sheets/Hypothesis-Tests/#parametric-statistical-hypothesis-tests","text":"This section lists statistical tests that you can use to compare data samples.","title":"Parametric Statistical Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#students-t-test","text":"Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal.
Python Code # Example of the Student's t-test from scipy.stats import ttest_ind data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_ind ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_ind Student's t-test on Wikipedia","title":"Student\u2019s t-test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#paired-students-t-test","text":"Tests whether the means of two independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Paired Student's t-test from scipy.stats import ttest_rel data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = ttest_rel ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.ttest_rel Student's t-test on Wikipedia","title":"Paired Student\u2019s t-test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#analysis-of-variance-test-anova","text":"Tests whether the means of two or more independent samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Interpretation H0: the means of the samples are equal. H1: the means of the samples are unequal. Python Code # Example of the Analysis of Variance Test from scipy.stats import f_oneway data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = f_oneway ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.f_oneway Analysis of variance on Wikipedia","title":"Analysis of Variance Test (ANOVA)"},{"location":"Cheat-Sheets/Hypothesis-Tests/#repeated-measures-anova-test","text":"Tests whether the means of two or more paired samples are significantly different. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance. Observations across each sample are paired. Interpretation H0: the means of the samples are equal. H1: one or more of the means of the samples are unequal. Python Code # Currently not supported in Python. 
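Editorial note, not from the original cheat sheet: scipy.stats has no repeated-measures ANOVA, but recent statsmodels releases provide one as statsmodels.stats.anova.AnovaRM. A minimal sketch, assuming a reasonably recent statsmodels; the column names below are illustrative, not from the source:

```python
# Repeated-measures ANOVA via statsmodels' AnovaRM (requires a balanced,
# long-format design: one measurement per (subject, condition) pair).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    'subject':   [1, 1, 1, 2, 2, 2, 3, 3, 3],  # illustrative data
    'condition': ['a', 'b', 'c'] * 3,
    'score':     [5.1, 5.9, 6.2, 4.8, 5.5, 6.0, 5.3, 6.1, 6.4],
})
res = AnovaRM(df, depvar='score', subject='subject', within=['condition']).fit()
print(res)  # F statistic, degrees of freedom, and p-value for 'condition'
```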
Sources Analysis of variance on Wikipedia","title":"Repeated Measures ANOVA Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#nonparametric-statistical-hypothesis-tests","text":"In non-parametric tests, we don't make any assumptions about the parameters of the population we are studying. In fact, these tests don't depend on the population: there is no fixed set of parameters, and no assumed distribution (normal distribution, etc.)","title":"Nonparametric Statistical Hypothesis Tests"},{"location":"Cheat-Sheets/Hypothesis-Tests/#mann-whitney-u-test","text":"Tests whether the distributions of two independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Mann-Whitney U Test from scipy.stats import mannwhitneyu data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = mannwhitneyu ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.mannwhitneyu Mann-Whitney U test on Wikipedia","title":"Mann-Whitney U Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#wilcoxon-signed-rank-test","text":"Tests whether the distributions of two paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of both samples are equal. H1: the distributions of both samples are not equal. Python Code # Example of the Wilcoxon Signed-Rank Test from scipy.stats import wilcoxon data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = wilcoxon ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.wilcoxon Wilcoxon signed-rank test on Wikipedia","title":"Wilcoxon Signed-Rank Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#kruskal-wallis-h-test","text":"Tests whether the distributions of two or more independent samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal.
Python Code # Example of the Kruskal-Wallis H Test from scipy.stats import kruskal data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] stat , p = kruskal ( data1 , data2 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.kruskal Kruskal-Wallis one-way analysis of variance on Wikipedia","title":"Kruskal-Wallis H Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#friedman-test","text":"Tests whether the distributions of two or more paired samples are equal or not. Assumptions Observations in each sample are independent and identically distributed (iid). Observations in each sample can be ranked. Observations across each sample are paired. Interpretation H0: the distributions of all samples are equal. H1: the distributions of one or more samples are not equal. Python Code # Example of the Friedman Test from scipy.stats import friedmanchisquare data1 = [ 0.873 , 2.817 , 0.121 , - 0.945 , - 0.055 , - 1.436 , 0.360 , - 1.478 , - 1.637 , - 1.869 ] data2 = [ 1.142 , - 0.432 , - 0.938 , - 0.729 , - 0.846 , - 0.157 , 0.500 , 1.183 , - 1.075 , - 0.169 ] data3 = [ - 0.208 , 0.696 , 0.928 , - 1.148 , - 0.213 , 0.229 , 0.137 , 0.269 , - 0.870 , - 1.204 ] stat , p = friedmanchisquare ( data1 , data2 , data3 ) print ( 'stat= %.3f , p= %.3f ' % ( stat , p )) if p > 0.05 : print ( 'Probably the same distribution' ) else : print ( 'Probably different distributions' ) Sources scipy.stats.friedmanchisquare Friedman test on Wikipedia","title":"Friedman Test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#equality-of-variance-test","text":"Test is used to assess the equality of variance between two different samples.","title":"Equality of variance test"},{"location":"Cheat-Sheets/Hypothesis-Tests/#levenes-test","text":"Levene\u2019s test is used to assess the equality of variance between two or more different samples. Assumptions The samples from the populations under consideration are independent. The populations under consideration are approximately normally distributed. 
{"location":"Cheat-Sheets/Keras/","text":"","title":"Keras"},{"location":"Cheat-Sheets/NumPy/","text":"","title":"NumPy"},{"location":"Cheat-Sheets/Pandas/","text":"","title":"Pandas"},{"location":"Cheat-Sheets/PySpark/","text":"","title":"PySpark"},{"location":"Cheat-Sheets/PyTorch/","text":"","title":"PyTorch"},{"location":"Cheat-Sheets/Python/","text":"","title":"Python"},{"location":"Cheat-Sheets/RegEx/","text":"","title":"Regular Expressions (RegEx)"},{"location":"Cheat-Sheets/SQL/","text":"","title":"SQL"},{"location":"Cheat-Sheets/Sk-learn/","text":"","title":"Scikit Learn"},{"location":"Cheat-Sheets/tensorflow/","text":"","title":"TensorFlow"},{"location":"Deploying-ML-models/deploying-ml-models/","text":"Home Introduction This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking for and preparing for data science opportunities. Not only that, the platform will also serve as a one-stop destination for all your needs, like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/improving existing questions, as well as by sharing any new questions that you faced during your interviews. Contribute to the platform Contribution in any form will be deeply appreciated. \ud83d\ude4f Add questions \u2753 Add your questions here. Please make sure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that, as of now, you cannot directly add a question via a pull request. This helps us maintain the quality of the content for you. Add answers/topics \ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources Report/Solve Issues \ud83d\udd27 To report any issues, find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.
Say Thanks \ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others. Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch out for any updates Features \ud83c\udfa8 Beautiful: The design is built on top of the most popular libraries, like MkDocs and Material, which allow the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responding to any query. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful HTML. Setup No setup is required to use the platform Important: It is strongly advised to use a virtual environment and not change anything in gh-pages Linux Systems python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate Windows Systems python3 -m venv ./venv venv\Scripts\activate pip3 install -r requirements.txt venv\Scripts\deactivate To install the latest pip3 install mkdocs pip3 install mkdocs-material Useful Commands mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. (No need to create a new project for this repository.) Useful Documents \ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material FAQ Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a shortcut to cracking one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you. \ud83e\udd13 This doesn't mean that such a feature won't be added in the future. \"Never say Never\" But as of now there is neither a plan nor the data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than the time and effort put in by every contributor. If you want to help, you can contribute here. If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead.
\ud83d\ude07 Credits Maintained by \ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu GitHub: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/ Contributors \ud83d\ude0e The full list of all the contributors is available here Current Status","title":"Production Deployment"},{"location":"Deploying-ML-models/deploying-ml-models/#home","text":"","title":"Home"},{"location":"Deploying-ML-models/deploying-ml-models/#introduction","text":"This is a completely open-source platform for maintaining a curated list of interview questions and answers for people looking for and preparing for data science opportunities. Not only that, the platform will also serve as a one-stop destination for all your needs, like tutorials, online materials, etc. This platform is maintained by you! \ud83e\udd17 You can help us by answering/improving existing questions, as well as by sharing any new questions that you faced during your interviews.","title":"Introduction"},{"location":"Deploying-ML-models/deploying-ml-models/#contribute-to-the-platform","text":"Contribution in any form will be deeply appreciated. \ud83d\ude4f","title":"Contribute to the platform"},{"location":"Deploying-ML-models/deploying-ml-models/#add-questions","text":"\u2753 Add your questions here. Please make sure to provide a detailed description to allow your fellow contributors to understand your questions and answer them to your satisfaction. \ud83e\udd1d Please note that, as of now, you cannot directly add a question via a pull request. This helps us maintain the quality of the content for you.","title":"Add questions"},{"location":"Deploying-ML-models/deploying-ml-models/#add-answerstopics","text":"\ud83d\udcdd These are the answers/topics that need your help at the moment Add documentation for the project Online Material for Learning Suggested Learning Paths Cheat Sheets Django Flask Numpy Pandas PySpark Python RegEx SQL NLP Interview Questions Add python common DSA interview questions Add Major ML topics Linear Regression Logistic Regression SVM Random Forest Gradient boosting PCA Collaborative Filtering K-means clustering kNN ARIMA Neural Networks Decision Trees Overfitting, Underfitting Unbalanced, Skewed data Activation functions relu/ leaky relu Normalization DBSCAN Normal Distribution Precision, Recall Loss Function MAE, RMSE Add Pandas questions Add NumPy questions Add TensorFlow questions Add PyTorch questions Add list of learning resources","title":"Add answers/topics"},{"location":"Deploying-ML-models/deploying-ml-models/#reportsolve-issues","text":"\ud83d\udd27 To report any issues, find me on LinkedIn or raise an issue on GitHub. \ud83d\udee0 You can also solve existing issues on GitHub and create a pull request.","title":"Report/Solve Issues"},{"location":"Deploying-ML-models/deploying-ml-models/#say-thanks","text":"\ud83d\ude0a If this platform helped you in any way, it would be great if you could share it with others.
Check out this \ud83d\udc47 platform \ud83d\udc47 for data science content: \ud83d\udc49 https://singhsidhukuldeep.github.io/data-science-interview-prep/ \ud83d\udc48 #data-science #machine-learning #interview-preparation You can also star the repository on GitHub and watch out for any updates","title":"Say Thanks"},{"location":"Deploying-ML-models/deploying-ml-models/#features","text":"\ud83c\udfa8 Beautiful: The design is built on top of the most popular libraries, like MkDocs and Material, which allow the platform to be responsive and to work on all sorts of devices \u2013 from mobile phones to wide screens. The underlying fluid layout will always adapt perfectly to the available screen space. \ud83e\uddd0 Searchable: almost magically, all the content on the website is searchable without any further ado. The built-in search \u2013 server-less \u2013 is fast and accurate in responding to any query. \ud83d\ude4c Accessible: Easy to use: \ud83d\udc4c The website is hosted on github-pages and is free and open to over 40 million users of GitHub in 100+ countries. Easy to contribute: \ud83e\udd1d The website embodies the concept of collaboration to the letter, allowing anyone to add/improve the content. To make contributing easy, everything is written in MarkDown and then compiled to beautiful HTML.","title":"Features"},{"location":"Deploying-ML-models/deploying-ml-models/#setup","text":"No setup is required to use the platform Important: It is strongly advised to use a virtual environment and not change anything in gh-pages","title":"Setup"},{"location":"Deploying-ML-models/deploying-ml-models/#linux-systems","text":"python3 -m venv ./venv source venv/bin/activate pip3 install -r requirements.txt deactivate","title":"Linux Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#windows-systems","text":"python3 -m venv ./venv venv\Scripts\activate pip3 install -r requirements.txt venv\Scripts\deactivate","title":"Windows Systems"},{"location":"Deploying-ML-models/deploying-ml-models/#to-install-the-latest","text":"pip3 install mkdocs pip3 install mkdocs-material","title":"To install the latest"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-commands","text":"mkdocs serve - Start the live-reloading docs server. mkdocs build - Build the documentation site. mkdocs -h - Print help message and exit. mkdocs gh-deploy - Use mkdocs gh-deploy --help to get a full list of options available for the gh-deploy command. Be aware that you will not be able to review the built site before it is pushed to GitHub. Therefore, you may want to verify any changes you make to the docs beforehand by using the build or serve commands and reviewing the built files locally. mkdocs new [dir-name] - Create a new project. (No need to create a new project for this repository.)","title":"Useful Commands"},{"location":"Deploying-ML-models/deploying-ml-models/#useful-documents","text":"\ud83d\udcd1 MkDocs: https://github.com/mkdocs/mkdocs \ud83c\udfa8 Theme: https://github.com/squidfunk/mkdocs-material","title":"Useful Documents"},{"location":"Deploying-ML-models/deploying-ml-models/#faq","text":"Can I filter questions based on companies? \ud83e\udd2a As much as this platform aims to help you with your interview preparation, it is not a shortcut to cracking one. Think of this platform as a practice field to help you sharpen your skills for your interview processes. However, for your convenience, we have sorted all the questions by topic for you.
\ud83e\udd13 This doesn't mean that such a feature won't be added in the future. \"Never say Never\" But as of now there is neither a plan nor the data to do so. \ud83d\ude22 Why is this platform free? \ud83e\udd17 Currently there is no major cost involved in maintaining this platform other than the time and effort put in by every contributor. If you want to help, you can contribute here. If you still want to pay for something that is free, we would request you to donate it to a charity of your choice instead. \ud83d\ude07","title":"FAQ"},{"location":"Deploying-ML-models/deploying-ml-models/#credits","text":"","title":"Credits"},{"location":"Deploying-ML-models/deploying-ml-models/#maintained-by","text":"\ud83d\udc68\u200d\ud83c\udf93 Kuldeep Singh Sidhu GitHub: github/singhsidhukuldeep https://github.com/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) http://kuldeepsinghsidhu.com LinkedIn: Kuldeep Singh Sidhu (LinkedIn) https://www.linkedin.com/in/singhsidhukuldeep/","title":"Maintained by"},{"location":"Deploying-ML-models/deploying-ml-models/#contributors","text":"\ud83d\ude0e The full list of all the contributors is available here","title":"Contributors"},{"location":"Deploying-ML-models/deploying-ml-models/#current-status","text":"","title":"Current Status"},{"location":"Interview-Questions/Natural-Language-Processing/","text":"NLP Interview Questions","title":"Natural Language Processing (NLP)"},{"location":"Interview-Questions/Natural-Language-Processing/#nlp-interview-questions","text":"","title":"NLP Interview Questions"},{"location":"Interview-Questions/Probability/","text":"Probability Interview Questions Average score on a dice roll of at most 3 times Question Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but every time you roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once. Find the average score that you can get with at most 3 rolls. If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same? Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3 rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3 rd time if the score on the 2 nd roll is less than 3.5, i.e. (1, 2 or 3) Possibilities 2 nd roll score Probability 3 rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA We won't roll a 5 \u2159 NA 3 rd time if we 6 \u2159 NA get a score >3 on the 2 nd So if we had 2 rolls, the average score would be: [We roll again if the current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score with at most 2 rolls is 4.25 So now let's look from the perspective of the first roll.
We will only roll again if our score is less than 4.25, i.e. 1, 2, 3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA We won't roll again if we 6 \u2159 NA get a score >4.25 on the 1 st So if we had 3 rolls, the average score would be: [We roll again if the current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 4.66 The average score with at most 3 rolls is 4.66 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the decision to roll again is no longer independent of the previous outcome. The scores would have been the same if we rolled the dice the 2 nd and 3 rd time without considering what we got on the previous roll, i.e. if each roll were independent of our stopping decision.","title":"Probability"},{"location":"Interview-Questions/Probability/#probability-interview-questions","text":"","title":"Probability Interview Questions"},{"location":"Interview-Questions/Probability/#average-score-on-a-dice-role-of-at-most-3-times","text":"Question Consider a fair 6-sided dice. Your aim is to get the highest score you can, in at most 3 rolls. A score is defined as the number that appears on the face of the dice facing up after the roll. You can roll at most 3 times, but every time you roll it is up to you to decide whether you want to roll again. The last score will be counted as your final score. Find the average score if you rolled the dice only once. Find the average score that you can get with at most 3 rolls. If the dice is fair, why is the average score for at most 3 rolls and 1 roll not the same? Answer If you roll a fair dice once you can get: Score Probability 1 \u2159 2 \u2159 3 \u2159 4 \u2159 5 \u2159 6 \u2159 So your average score with one roll is: sum of (score * score's probability) = (1+2+3+4+5+6)*(\u2159) = (21/6) = 3.5 The average score if you rolled the dice only once is 3.5 For at most 3 rolls, let's try back-tracking. Let's say you just did your second roll and you have to decide whether to do your 3 rd roll! We just found out that if we roll the dice once, on average we can expect a score of 3.5. So we will only roll the 3 rd time if the score on the 2 nd roll is less than 3.5, i.e. (1, 2 or 3) Possibilities 2 nd roll score Probability 3 rd roll score Probability 1 \u2159 3.5 \u2159 2 \u2159 3.5 \u2159 3 \u2159 3.5 \u2159 4 \u2159 NA We won't roll a 5 \u2159 NA 3 rd time if we 6 \u2159 NA get a score >3 on the 2 nd So if we had 2 rolls, the average score would be: [We roll again if the current score is less than 3.5] (3.5)*(1/6) + (3.5)*(1/6) + (3.5)*(1/6) + (4)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 1.75 + 2.5 = 4.25 The average score with at most 2 rolls is 4.25 So now let's look from the perspective of the first roll. We will only roll again if our score is less than 4.25, i.e. 1, 2, 3 or 4 Possibilities 1 st roll score Probability 2 nd and 3 rd roll score Probability 1 \u2159 4.25 \u2159 2 \u2159 4.25 \u2159 3 \u2159 4.25 \u2159 4 \u2159 4.25 \u2159 5 \u2159 NA We won't roll again if we 6 \u2159 NA get a score >4.25 on the 1 st So if we had 3 rolls, the average score would be: [We roll again if the current score is less than 4.25] (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (4.25)*(1/6) + (5)*(1/6) + (6)*(1/6) [Decide not to roll again] = 17/6 + 11/6 = 4.66 The average score with at most 3 rolls is 4.66 The average score for at most 3 rolls and 1 roll is not the same because, although the dice is fair, the decision to roll again is no longer independent of the previous outcome. The scores would have been the same if we rolled the dice the 2 nd and 3 rd time without considering what we got on the previous roll, i.e. if each roll were independent of our stopping decision.","title":"Average score on a dice roll of at most 3 times"},
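The values 3.5, 4.25 and 4.66 derived above are easy to sanity-check in code. Below is a minimal sketch (the helper names are illustrative, not from the original answer): backward induction reproduces the exact expectations, and a Monte Carlo run of the same stopping rule should land near 4.67 for 3 rolls, up to sampling noise:

```python
# Verify the expected scores by backward induction and by simulation.
import random

def expected_score(rolls_left):
    # With 1 roll left you must keep whatever you get: E = 3.5.
    # Otherwise keep the current face only if it beats the expected
    # value of continuing with one fewer roll.
    if rolls_left == 1:
        return 3.5
    cont = expected_score(rolls_left - 1)
    return sum(max(face, cont) for face in range(1, 7)) / 6

print(expected_score(1), expected_score(2), expected_score(3))
# 3.5 4.25 4.666...

def simulate(rolls_allowed, trials=100_000):
    total = 0
    for _ in range(trials):
        for rolls_left in range(rolls_allowed, 0, -1):
            face = random.randint(1, 6)
            if rolls_left == 1 or face > expected_score(rolls_left - 1):
                break  # keep this face and stop rolling
        total += face
    return total / trials

print(round(simulate(3), 2))  # ~4.67
```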
{"location":"Interview-Questions/System-design/","text":"System Design","title":"System Design"},{"location":"Interview-Questions/System-design/#system-design","text":"","title":"System Design"},{"location":"Interview-Questions/data-structures-algorithms/","text":"Data Structures and Algorithms (DSA) To-do Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions \ud83d\ude01 Easy Two Number Sum Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return [] Validate Subsequence Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4], and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence )
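A few hypothetical spot-checks one might run against the twoNumberSum and isValidSubsequence solutions above (the returned pair is sorted before comparison because the problem allows any order):

```python
# Hypothetical spot-checks; assumes the definitions above are in scope.
assert sorted(twoNumberSum([3, 5, -4, 8, 11, 1, -1, 6], 10)) == [-1, 11]
assert twoNumberSum([1, 2, 3], 10) == []

assert isValidSubsequence([1, 2, 3, 4], [1, 3, 4]) is True
assert isValidSubsequence([1, 2, 3, 4], [2, 4]) is True
assert isValidSubsequence([1, 2, 3, 4], [4, 2]) is False
print('all checks passed')
```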
Nth Fibonacci The Fibonacci sequence is defined as follows: any number in the sequence is the sum of the previous 2, i.e. fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd numbers are fixed at 0, 1. Find the Nth Fibonacci number # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ] Product Sum Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth
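A couple of hypothetical spot-checks for the two functions above; the Fibonacci expectations follow the stated convention (1 st = 0, 2 nd = 1, so the first six values are 0, 1, 1, 2, 3, 5), and the product-sum expectation is the worked example from the problem statement:

```python
# Sequence convention above: getNthFib(1) == 0, getNthFib(2) == 1.
assert [getNthFib(n) for n in range(1, 7)] == [0, 1, 1, 2, 3, 5]

# Worked example from the problem statement (expected output: 12).
assert productSum([5, 2, [7, -1], 3, [6, [-13, 8], 4]]) == 12
print('all checks passed')
```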
\ud83d\ude42 Medium Top K Frequent Words Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words. # We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time. # Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to a heap that stores the best k candidates. # Here, \"best\" is defined with our custom ordering relation, # which puts the best candidates at the top of the heap. # At the end, we pop off the heap up to k times # so that the best candidates come out first. # In Python, we use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )] \ud83e\udd28 Hard \ud83d\ude32 Very Hard","title":"DSA (Data Structures & Algorithms)"},
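As a quick check, the two worked examples from the problem statement can be run against either topKFrequentWords implementation above (assuming the definitions are in scope):

```python
words1 = ["i", "love", "leetcode", "i", "love", "coding"]
words2 = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]

# Expected outputs come straight from the problem statement.
assert topKFrequentWords(words1, 2) == ["i", "love"]
assert topKFrequentWords(words2, 4) == ["the", "is", "sunny", "day"]
print('all checks passed')
```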
{"location":"Interview-Questions/data-structures-algorithms/#data-structure-and-algorithms-dsa","text":"","title":"Data Structures and Algorithms (DSA)"},{"location":"Interview-Questions/data-structures-algorithms/#to-do","text":"Add https://leetcode.com/discuss/interview-question/344650/Amazon-Online-Assessment-Questions","title":"To-do"},{"location":"Interview-Questions/data-structures-algorithms/#easy","text":"","title":"\ud83d\ude01 Easy"},{"location":"Interview-Questions/data-structures-algorithms/#two-number-sum","text":"Write a function that takes in a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the function should return an empty array. # O(n) time | O(n) space def twoNumberSum ( array , targetSum ): avail = set () for i , v in enumerate ( array ): if targetSum - v in avail : return [ targetSum - v , v ] else : avail . add ( v ) return [] # O(nlog(n)) time | O(1) space def twoNumberSum ( array , targetSum ): array . sort () n = len ( array ) left = 0 right = n - 1 while left < right : currSum = array [ left ] + array [ right ] if currSum == targetSum : return [ array [ left ], array [ right ]] elif currSum < targetSum : left += 1 elif currSum > targetSum : right -= 1 return [] # O(n^2) time | O(1) space def twoNumberSum ( array , targetSum ): n = len ( array ) for i in range ( n - 1 ): for j in range ( i + 1 , n ): if array [ i ] + array [ j ] == targetSum : return [ array [ i ], array [ j ]] return []","title":"Two Number Sum"},{"location":"Interview-Questions/data-structures-algorithms/#validate-subsequence","text":"Given two non-empty arrays of integers, write a function that determines whether the second array is a subsequence of the first one. A subsequence of an array is a set of numbers that aren't necessarily adjacent in the array but that are in the same order as they appear in the array. For instance, the numbers [1, 3, 4] form a subsequence of the array [1, 2, 3, 4], and so do the numbers [2, 4]. Note that a single number in an array and the array itself are both valid subsequences of the array. # O(n) time | O(1) space - where n is the length of the array def isValidSubsequence ( array , sequence ): pArray = pSequence = 0 while pArray < len ( array ) and pSequence < len ( sequence ): if array [ pArray ] == sequence [ pSequence ]: pArray += 1 pSequence += 1 else : pArray += 1 return pSequence == len ( sequence )","title":"Validate Subsequence"},{"location":"Interview-Questions/data-structures-algorithms/#nth-fibonacci","text":"The Fibonacci sequence is defined as follows: any number in the sequence is the sum of the previous 2, i.e. fib[n] = fib[n-1] + fib[n-2] The 1 st and 2 nd numbers are fixed at 0, 1. Find the Nth Fibonacci number # O(n) time | O(n) space def getNthFib ( n ): dp = [ 0 , 1 ] while len ( dp ) < n : dp . append ( dp [ - 1 ] + dp [ - 2 ]) return dp [ n - 1 ] # O(n) time | O(1) space def getNthFib ( n ): last_two = [ 0 , 1 ] count = 2 while count < n : currFib = last_two [ 0 ] + last_two [ 1 ] last_two [ 0 ] = last_two [ 1 ] last_two [ 1 ] = currFib count += 1 return last_two [ 1 ] if n > 1 else last_two [ 0 ]","title":"Nth Fibonacci"},{"location":"Interview-Questions/data-structures-algorithms/#product-sum","text":"Write a function that takes in a \"special\" array and returns its product sum. A \"special\" array is a non-empty array that contains either integers or other \"special\" arrays. The product sum of a \"special\" array is the sum of its elements, where \"special\" arrays inside it are summed themselves and then multiplied by their level of depth. For example, the product sum of [x, y] is x + y ; the product sum of [x, [y, z]] is x + 2y + 2z Eg: Input: [5, 2, [7, -1], 3, [6, [-13, 8], 4]] Output: 12 # calculated as: 5 + 2 + 2 * (7 - 1) + 3 + 2 * (6 + 3 * (-13 + 8) + 4) # O(n) time | O(d) space - where n is the total number of elements in the array, # including sub-elements, and d is the greatest depth of \"special\" arrays in the array def productSum ( array , depth = 1 ): sum = 0 for i , v in enumerate ( array ): if type ( v ) is list : sum += productSum ( v , depth + 1 ) else : sum += v return sum * depth","title":"Product Sum"},{"location":"Interview-Questions/data-structures-algorithms/#medium","text":"","title":"\ud83d\ude42 Medium"},{"location":"Interview-Questions/data-structures-algorithms/#top-k-frequent-words","text":"Given a non-empty list of words, return the k most frequent elements. Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. Example 1: Input: [\"i\", \"love\", \"leetcode\", \"i\", \"love\", \"coding\"], k = 2 Output: [\"i\", \"love\"] Explanation: \"i\" and \"love\" are the two most frequent words. Note that \"i\" comes before \"love\" due to a lower alphabetical order. Example 2: Input: [\"the\", \"day\", \"is\", \"sunny\", \"the\", \"the\", \"the\", \"sunny\", \"is\", \"is\"], k = 4 Output: [\"the\", \"is\", \"sunny\", \"day\"] Explanation: \"the\", \"is\", \"sunny\" and \"day\" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively. Note: You may assume k is always valid, 1 \u2264 k \u2264 number of unique elements. Input words contain only lowercase letters. Follow up: Try to solve it in O ( n log k ) time and O ( n ) extra space. # Count the frequency of each word, and # sort the words with a custom ordering relation # that uses these frequencies. Then take the best k of them. # Time Complexity: O(N log N), where N is the length of words.
# We count the frequency of each word in O(N) time, # then we sort the given words in O(N log N) time. # Space Complexity: O(N), the space used to store our uniqueWords. from typing import List def topKFrequentWords ( words , k ) -> List [ str ]: from collections import Counter wordsFreq = Counter ( words ) uniqueWords = list ( wordsFreq . keys ()) uniqueWords . sort ( key = lambda x : ( - wordsFreq [ x ], x )) return uniqueWords [: k ] # Time Complexity: O(N log k), where N is the length of words. # We count the frequency of each word in O(N) time, then we add N words to the heap, # each in O(log k) time. Finally, we pop from the heap up to k times. # As k \u2264 N, this is O(N log k) in total. # In Python, we improve this to O(N + k log N): our heapq.heapify operation and # counting operations are O(N), and # each of the k heapq.heappop operations is O(log N). # Space Complexity: O(N), the space used to store our wordsFreq. # Count the frequency of each word, then add it to a heap that stores the best k candidates. # Here, \"best\" is defined with our custom ordering relation, # which puts the best candidates at the top of the heap. # At the end, we pop off the heap up to k times # so that the best candidates come out first. # In Python, we use heapq.heapify, which can turn a list into a heap in linear time, # simplifying our work. def topKFrequentWords ( words , k ) -> List [ str ]: import heapq from collections import Counter wordsFreq = Counter ( words ) heap = [( - freq , word ) for word , freq in wordsFreq . items ()] heapq . heapify ( heap ) return [ heapq . heappop ( heap )[ 1 ] for _ in range ( k )]","title":"Top K Frequent Words"},{"location":"Interview-Questions/data-structures-algorithms/#hard","text":"","title":"\ud83e\udd28 Hard"},{"location":"Interview-Questions/data-structures-algorithms/#very-hard","text":"","title":"\ud83d\ude32 Very Hard"},{"location":"Machine-Learning/ARIMA/","text":"","title":"ARIMA"},{"location":"Machine-Learning/Activation%20functions/","text":"","title":"Activation functions"},{"location":"Machine-Learning/Collaborative%20Filtering/","text":"","title":"Collaborative Filtering"},{"location":"Machine-Learning/Confusion%20Matrix/","text":"","title":"Confusion Matrix"},{"location":"Machine-Learning/DBSCAN/","text":"","title":"DBSCAN"},{"location":"Machine-Learning/Decision%20Trees/","text":"","title":"Decision Trees"},{"location":"Machine-Learning/Gradient%20Boosting/","text":"","title":"Gradient Boosting"},{"location":"Machine-Learning/K-means%20clustering/","text":"","title":"K means clustering"},{"location":"Machine-Learning/Linear%20Regression/","text":"","title":"Linear Regression"},{"location":"Machine-Learning/Logistic%20Regression/","text":"","title":"Logistic Regression"},{"location":"Machine-Learning/Loss%20Function%20MAE%2C%20RMSE/","text":"","title":"Loss Function MAE, RMSE"},{"location":"Machine-Learning/Neural%20Networks/","text":"","title":"Neural Networks"},{"location":"Machine-Learning/Normal%20Distribution/","text":"","title":"Normal Distribution"},{"location":"Machine-Learning/Normalization%20Regularisation/","text":"","title":"Normalization Regularisation"},{"location":"Machine-Learning/Overfitting%2C%20Underfitting/","text":"","title":"Overfitting, Underfitting"},{"location":"Machine-Learning/PCA/","text":"","title":"PCA"},{"location":"Machine-Learning/Random%20Forest/","text":"","title":"Random
Forest"},{"location":"Machine-Learning/Support%20Vector%20Machines/","text":"","title":"Support Vector Machines"},{"location":"Machine-Learning/Unbalanced%2C%20Skewed%20data/","text":"","title":"Unbalanced, Skewed data"},{"location":"Machine-Learning/kNN/","text":"","title":"kNN"},{"location":"Online-Material/Online-Material-for-Learning/","text":"","title":"Online Study Material"},{"location":"Online-Material/popular-resouces/","text":"","title":"Popular Blogs"},{"location":"as-fast-as-possible/","text":"","title":"Introduction"},{"location":"as-fast-as-possible/Deep-CV/","text":"","title":"Deep Computer Vision"},{"location":"as-fast-as-possible/Deep-NLP/","text":"","title":"Deep Natural Language Processing"},{"location":"as-fast-as-possible/Neural-Networks/","text":"","title":"Neural Networks"},{"location":"as-fast-as-possible/TF2-Keras/","text":"","title":"Tensorflow 2 with Keras"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index c94a470..1816fcf 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,237 +2,242 @@ None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 daily None - 2022-08-03 + 2023-04-12 + daily + + + None + 2023-04-12 daily \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 2f670fe..4ea5314 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ