21TN2/README.md

Hi there, I'm Theresa Nguyen 👋

text here

About Me:

text

I am always eager to learn and expand my knowledge in these fields, and I have a passion for using data and technology to make a positive impact. My portfolio on GitHub showcases a selection of my projects and experiences and demonstrates my capabilities as a Data Science, Data Engineering, and Machine Learning professional. Whether it's building predictive models, computer vision systems, recommendation systems, A/B tests, text mining, or data pipelines, I am dedicated to using my skills to drive innovation and solve complex problems. I am excited to share my work with others and contribute to the Data Science and Machine Learning community. Thank you for considering my portfolio. 🙂

🔧 🪚 My Skills:

I started learning Python in my first semester, which helped me understand data science and machine learning. I also learned R, one of the most important languages in statistics and machine learning. Below is a list of skills that I have gained through my experience in data science and machine learning. Learning these essential skills and techniques helped me build and deploy projects successfully.
Python, R, SQLite, MySQL, MongoDB, Microsoft SQL Server, Apache Cassandra, Postgres, AWS, Azure, Google Cloud, Spyder, Keras, Matplotlib, NumPy, Pandas, PyTorch, scikit-learn, SciPy, TensorFlow, Microsoft Excel, Power BI, Apache Airflow, Apache, Git, Microsoft Office, Microsoft PowerPoint, Anaconda, Plotly, MLflow, PyCharm, Tableau, Dialogflow, OpenCV, Docker, Grafana, Kubernetes

🔖 Portfolio Overview

Thank you for visiting my portfolio! I have had an awesome experience working on machine learning and deep learning projects and am excited to share them with you. Below is a summary of my entire portfolio, with links to each project. For more detailed information about a project, including results and descriptions, please click on its link. Thank you for considering my work. 🙂


Machine Learning Projects

Bank Customers Churn Prediction Model
Gym Members Attendance Prediction Model
RFM Analysis and Customer Segmentation
Time Series Modelling using ARIMA & SARIMAX Model
Movies Recommendation System
Collaborative Filtering for Movies using Matrix Factorisation
House Price Prediction Model
Popular Recipe Prediction Model
Speech Emotion Recognition

Computer Vision Projects

Animated Face Generation using GAN
Fashion Clothes Classification Using Convolutional Neural Network
Sign Language Identification

Unsupervised Machine Learning Projects

Credit Card Customer Segmentation

Big Data Projects

Analyzing Customer Shopping Behavior using AWS Services
End to End ML Model Using PySpark
Analysing Car's Performance using PySpark
Performance Comparison between MongoDB and Cassandra

Natural Language Processing (NLP) Projects

Twitter Sentiment Analysis
Extracting Stock Sentiment from News Headlines

Data Mining Projects

Market Basket Analysis for Grocery Store
Restaurant Recommendation System based on Yelp Data

Data Visualisation Projects

Netflix Movies Analysis

SQL Case Study

Danny's Diner Case Study
Foodie-Fi Case Study
Online News Exhibition
Best Selling Video Games

Statistics

Multi-Variate Regression Using Statsmodels and Gradient Descent Optimisation
Finding the best version for Mobile Game using A/B Testing

Education

  📚 Bachelor's in Computer Science | [California State University, Long Beach, California] [August 2022 - May 2026]

List of Certifications

  🔖 Data Analyst

  🔖 Data Scientist

  🔖 Machine Learning for Time Series Data in Python

  🔖 Analyzing Marketing Campaigns Using Pandas

  🔖 Introduction to Deep Learning in Python

  🔖 Analysing Data in Tableau

  🔖 Deep Learning in PyTorch

Work Experience

  💻 Data Engineer | Roku Inc [May 2023 - Present]

  💻 Senior Executive | Oil and Natural Gas Corporation Ltd [July 2018 - April 2021]

  💻 Intern | Wipro Ltd [April 2017 - June 2017]

  💻 SAP Consultant | Tata Consultancy Services Ltd [September 2014 - June 2016]

Assignments and Coursework

  📙 DATA-255 Deep Learning

      📖 Homework 1

      📖 Homework 2

      📖 Homework 3

      📖 Homework 4

  📙 DATA-245 Machine Learning

      📖 Homework 1

      📖 Homework 2

      📖 Homework 3

      📖 Homework 4

      📖 Homework 5

  📙 DATA-228 Big Data

      📖 Homework 1

      📖 Homework 2

      📖 Homework 3

      📖 Homework 4

      📖 Homework 5

Competencies

   👩‍🏫 Leadership Skills
   👩‍🏫 Bias for Action
   👩‍🏫 Communication Skills
   👩‍🏫 Teamwork
   👩‍🏫 Curiosity
   👩‍🏫 Problem-solving Skills
   👩‍🏫 Time Management
   👩‍🏫 Accountability

Blogs On Medium

   📃 Learn Python in 30 Days

   📃 How to Prepare for SQL Interviews in 15 Days

   📃 What is Dummy Variable Trap and How it can be avoided?

   📃 Hyper-Parameter Tuning for Machine Learning Models Using Optuna

   📃 Detecting and Resolving Duplicate Records using Record Linkage

   📃 Two Simple Tests to check Normality of the data

   📃 How to perform Market Basket Analysis using Apriori Algorithm and Association Rules

   📃 How to determine the order of ARIMA or SARIMA Models

   📃 How to Perform Hyper-Parameter Tuning in Artificial Neural Networks

   📃 Different Loss Functions used in Regression

   📃 Using Hexbin Plots to visualise relationship between two variables

   📃 Different Correlation Coefficients to measure the relationship between two variables

   📃 Different Methods to replace Missing Values in Data

   📃 How to find Optimal Parameters for Regression Model using SciPy

   📃 What is Pandas Profiler and Why it is used

   📃 OLAP Operations in SQL

   📃 Feature Encoding Using K-Fold Target Encoding

   📃 Different Linkage Methods used in Hierarchical Clustering

   📃 What is LIME and how it can be used?

   📃 Benefits of DropBlock Over Dropout in CNN

   📃 Difference between Prediction Interval and Confidence Interval

   📃 Understanding Keras Embedding for NLP

   📃 Augmenting Text Using Large Language Models GPT-2 GPT-3 BERT

   📃 Equiangular Basis Vectors: a better alternative to softmax for classification tasks

   📃 Federated Learning by Google bringing Privacy to Machine Learning

   📃 Different Types of Filters in Tableau

   📃 What is Struct in SQL

   📃 Topic Modelling using LDA

   📃 Difference Between Stateful and Stateless RNNs

   📃 Power of Continuous Integration and Continuous Deployment (CI/CD) in Data Engineering

   📃 Useful Data Science Libraries in Python

   📃 What is Lambda and Kappa Architecture

   📃 Difference between Data Lake and Data Lakehouse

   📃 AWS Dynamic Frames

   📃 Thompson Sampling for Multi-Armed Bandit Problem

   📃 Confounding and Instrumental Variables

   📃 Comprehensive Overview of Introduction to MLOps by O'Reilly

   📃 Responsibleai Rainwidget: a tool for interpretable fair and accurate Machine Learning

   📃 Grouped Data Cross Validation with Scikit-learn: GroupShuffleSplit and GroupKFold

State-of-the-art (SOTA) Techniques in Artificial Intelligence and Statistics:

  SHAP (SHapley Additive exPlanations) is an explainable-AI framework for interpreting and visualizing machine learning models. It explains an individual prediction by assigning each feature a score that indicates its contribution to that prediction. The scores are Shapley values, a game-theoretic measure of how much each feature contributes to a model's output.

SHAP values are calculated for each prediction and are designed to be both model-agnostic and locally accurate. Model-agnostic means they can be used to interpret any machine learning model, not just specific types of models. Locally accurate means the feature scores for a prediction sum to the difference between that prediction and the model's baseline (expected) output. This allows for detailed, local interpretation of individual predictions rather than relying only on global feature importance values.
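To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values for one prediction by enumerating every feature coalition. It does not use the `shap` library; `shapley_values`, `toy_model`, the input and the all-zero baseline are hypothetical names and values chosen for illustration, and "removing" a feature is assumed to mean replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for tiny examples.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    """Hypothetical model with an interaction between features a and c."""
    a, b, c = features
    return 2.0 * a + 3.0 * b + a * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Local accuracy: the attributions add up to f(x) - f(baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

In this toy case the interaction term `a * c` is split equally between the two features involved, which is the kind of fairness property that motivates Shapley values.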

  Record Linkage, also known as entity resolution or data matching, is the process of identifying records in different databases that refer to the same real-world entity, despite differences in their representation or encoding. The goal of record linkage is to merge duplicate records into a single, accurate representation of the entity. This is an important step in data cleaning, which is a crucial part of data preparation for data analysis and machine learning.

Record linkage can be accomplished using various techniques, including deterministic methods, such as exact or fuzzy matching, and probabilistic methods, such as statistical matching and Bayesian inference. The choice of method will depend on the specific requirements of the data, such as the size of the data set, the amount of noise in the data, and the desired level of accuracy.

In practice, record linkage is used in a wide range of applications, including data integration, data quality improvement, fraud detection, customer relationship management, and market research.
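As a small illustration of fuzzy matching, the sketch below links records from two lists by name similarity using only the standard library's `difflib.SequenceMatcher`. The record lists, the `name` field, the helper names and the 0.85 threshold are all made up for this example; a real pipeline would usually compare several fields and use blocking to avoid comparing every pair.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two strings after simple normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best right match at or above the threshold."""
    links = []
    for i, a in enumerate(left):
        best_j, best_score = None, 0.0
        for j, b in enumerate(right):
            score = similarity(a["name"], b["name"])
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            links.append((i, best_j, round(best_score, 2)))
    return links


# Hypothetical records from two systems describing the same customers
crm = [{"name": "Jonathan Smith"}, {"name": "Maria Garcia"}]
billing = [{"name": "maria garcia "}, {"name": "Jonathon Smith"}, {"name": "Wei Chen"}]

links = link_records(crm, billing)
for left_idx, right_idx, score in links:
    print(crm[left_idx]["name"], "<->", billing[right_idx]["name"].strip(), score)
```

The threshold controls the trade-off between missed matches and false links, so it is normally tuned on a labelled sample rather than fixed in advance.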

  Optuna is an open-source library for hyperparameter optimization that supports Bayesian optimization, grid search, and random search. It works with a wide range of machine learning frameworks, including TensorFlow, PyTorch, and XGBoost. Optuna provides a high-level API for defining objectives and constraints, as well as a set of built-in samplers for choosing the next set of hyperparameters to evaluate. It also integrates with popular visualization libraries such as Matplotlib and Plotly for easy visualization of the optimization process. Optuna is designed to be easy to use and customizable, allowing users to implement their own optimization algorithms or extend the existing ones with custom functions.
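The objective-and-trial loop that such a library automates can be sketched in plain Python. The block below is not Optuna's API: it is a hand-written random search, the simplest of the strategies mentioned above, and the `objective` function, parameter names and search ranges are hypothetical stand-ins for a real training-and-validation run.

```python
import random


def objective(params):
    """Hypothetical validation loss, smallest at learning_rate=0.1 and depth=5."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["depth"] - 5) ** 2


def random_search(objective, n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial seen so far."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.001, 0.5),  # continuous parameter
            "depth": rng.randint(2, 10),               # integer parameter
        }
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


best_params, best_value = random_search(objective, n_trials=200)
print(best_params, best_value)
```

A Bayesian sampler differs from this loop in one place: instead of drawing each candidate independently, it uses the results of earlier trials to decide where to sample next.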

  CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique used in A/B testing to reduce the variance of estimated treatment effects. The basic idea is to use a covariate measured before the experiment started, typically the same metric over a pre-experiment period, that is correlated with the outcome but cannot be affected by the treatment. Each user's outcome Y is replaced by the adjusted value Y - θ(X - mean(X)), where X is the covariate and θ = Cov(X, Y) / Var(X), and the treatment effect is then estimated as the difference in adjusted means between the treatment and control groups. Because randomization makes X independent of the group assignment, the adjustment leaves the expected treatment effect unchanged while removing the part of the outcome's variance that X explains. CUPED has two main advantages over a plain comparison of means. First, the lower variance gives tighter confidence intervals around the treatment effect estimate. Second, it improves the power of the test, reducing the sample size needed to detect a significant treatment effect.
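Here is a minimal sketch of the standard CUPED adjustment, in which each outcome Y becomes Y - θ(X - mean(X)) with θ = Cov(X, Y) / Var(X) and X a metric measured before the experiment. The data are simulated purely for illustration: the sample size, the noise levels and the true effect of 0.5 are assumptions, and `cuped_adjust` and `effect` are hypothetical helper names.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Simulated experiment (made-up numbers): the second half of users is treated
rng = random.Random(42)
n = 2000
pre = [rng.gauss(10, 3) for _ in range(2 * n)]        # pre-experiment metric X
treated = [i >= n for i in range(2 * n)]
post = [p + rng.gauss(0, 1) + (0.5 if t else 0.0)     # outcome Y, true effect 0.5
        for p, t in zip(pre, treated)]

adjusted = cuped_adjust(post, pre)  # theta is estimated on the pooled data


def effect(values):
    """Difference in means between treated and control users."""
    treated_mean = mean(v for v, t in zip(values, treated) if t)
    control_mean = mean(v for v, t in zip(values, treated) if not t)
    return treated_mean - control_mean


raw_effect = effect(post)
cuped_effect = effect(adjusted)
print(raw_effect, cuped_effect, variance(post), variance(adjusted))
```

Both estimates target the same effect, but the adjusted outcomes have much lower variance because most of the user-to-user variation is explained by the pre-experiment metric.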

  BERT (Bidirectional Encoder Representations from Transformers) is a language model that has had a major influence on natural language processing (NLP). It uses a Transformer encoder to learn contextual relationships between words, drawing on both the left and right context of each token, and produces high-quality representations of text. BERT is pre-trained on large amounts of text and then fine-tuned for tasks such as sentiment analysis, text classification, and question answering. Its effectiveness has made it a popular choice for NLP researchers and practitioners alike.

  LIME (Local Interpretable Model-Agnostic Explanations) is a technique for interpreting individual predictions made by machine learning models. It perturbs the input around the instance being explained, queries the model on the perturbed samples, and fits a simple interpretable model, such as a sparse linear model, to those samples weighted by their proximity to the original instance. The explanations are model-agnostic, meaning LIME can be used with a wide range of machine learning models, and local, meaning they describe the model's behavior near one prediction in a form that is easy for humans to understand. LIME is valuable in fields such as healthcare, finance, and law, where interpretability of machine learning models is crucial for decision-making.

  GPT API by OpenAI provides access to large language models that use deep learning to generate natural language text. The models are based on the GPT (Generative Pre-trained Transformer) architecture and have been pre-trained on massive amounts of text data. The API allows users to generate text with just a few lines of code, making it a popular choice for applications such as chatbots, text summarization, and content creation. The models' ability to use context and generate coherent, natural-sounding text has made them widely used in natural language processing.

Contact Information

   📱 Phone No: 669-230-9604

   🖇 LinkedIn: https://www.linkedin.com/in/iqra-bismi/

   📫 Email: [email protected]

   ✍🏻 Medium: https://medium.com/@iqra.bismi
