Database for Project #43

Open
abhinavtripathy opened this issue May 25, 2020 · 10 comments
Labels: development (Something that needs to be worked on), research


@abhinavtripathy
Member

We need a database solution for the search engine, something scalable and fast. Some notes

@abhinavtripathy added the development and research labels May 25, 2020
@abhinavtripathy changed the title from "Database for Search Engine" to "Database for Project" Jul 8, 2020
@abhinavtripathy
Member Author

abhinavtripathy commented Jul 8, 2020

@kevinmsmith131 There are many tasks for you to get started with.

  • Research SQL vs. NoSQL databases and their types. I think for this project we might want to use Postgres as our SQL database.
  • There are several database requirements for this project. These include: a database for storing user information such as logins and basic user info (this is not urgent or important); storage for search engine queries so we can find "trending searches" or "searches that yielded no results"; and asking users for feedback on whether they found what they were looking for, then reviewing why the results were or weren't accurate (there could be other analytics too, think of some). All of these analytics need to be stored. For trending queries, we should check whether graph databases would be a useful resource.
  • We might also need a caching database to speed things up; Redis could be a good option.
  • Come up with an architecture design for all our database requirements and what we need to do. We will most likely use docker-compose for the whole project, which I will set up soon.

@kevinmsmith131
Collaborator

kevinmsmith131 commented Jul 8, 2020

Requirements Specification:

There will be a Tweets database that stores a tweet, the tweet id, the location the tweet was made from, the user that posted the tweet, the user's gender, the user's age, and the user's ethnicity. This database will be a SQL relational database where each row is a tweet with all the information for that tweet in the same row.
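A minimal sketch of the one-row-per-tweet layout, assuming Python's built-in `sqlite3` as a stand-in for the SQL database (Postgres was suggested above, and its DDL would look very similar). The helper names `create_tweets_db`, `insert_tweet`, and `tweets_from` are hypothetical, for illustration only.

```python
import sqlite3

# Hypothetical schema: sqlite3 stands in for the planned SQL database.
# Each row holds one tweet plus all of its metadata.
SCHEMA = """
CREATE TABLE tweets (
    tweet_id       INTEGER PRIMARY KEY,
    tweet          TEXT NOT NULL,
    tweet_location TEXT,
    username       TEXT NOT NULL,
    user_gender    TEXT,
    user_age       INTEGER,
    user_ethnicity TEXT
)
"""


def create_tweets_db() -> sqlite3.Connection:
    """Create an in-memory database containing the tweets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def insert_tweet(conn, tweet_id, tweet, location, username, gender, age, ethnicity):
    """Store one tweet and everything known about it as a single row."""
    conn.execute(
        "INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?, ?)",
        (tweet_id, tweet, location, username, gender, age, ethnicity),
    )


def tweets_from(conn, location):
    """Return (tweet_id, tweet) pairs posted from the given location."""
    rows = conn.execute(
        "SELECT tweet_id, tweet FROM tweets WHERE tweet_location = ? ORDER BY tweet_id",
        (location,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = create_tweets_db()
    insert_tweet(conn, 1, "Hello world", "Boston", "alice", "F", 30, None)
    insert_tweet(conn, 2, "Another tweet", "Denver", "bob", "M", 25, None)
    print(tweets_from(conn, "Boston"))  # [(1, 'Hello world')]
```

Keeping everything in one row makes per-tweet lookups a single-table query; if user info starts repeating across many tweets, it could later be split into a separate users table.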

The database that stores search queries will be a NoSQL Graph Database. A graph database will allow us to connect queries to each other if they share a common theme, and if the number of connections is high for a query, then it will be labeled as a trending query.
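To make the "connections mean trending" idea concrete, here is a small in-memory sketch. It assumes a hypothetical notion of "common theme" (two queries share at least one non-stopword keyword) and a hypothetical `min_connections` threshold; a real graph database such as Neo4j would store the same nodes and edges and answer the degree question with a query instead.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical stopword list; a real theme extractor might use topic modeling.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "is"}


def themes(query: str) -> set[str]:
    """Hypothetical theme extractor: lowercase keywords minus stopwords."""
    return {word for word in query.lower().split() if word not in STOPWORDS}


def build_query_graph(queries: list[str]) -> dict[str, set[str]]:
    """Nodes are distinct queries; an edge joins two queries sharing a theme."""
    unique = sorted(set(queries))
    graph: dict[str, set[str]] = {q: set() for q in unique}
    for a, b in combinations(unique, 2):
        if themes(a) & themes(b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def trending(graph: dict[str, set[str]], min_connections: int) -> list[str]:
    """Label a query trending when its connection count reaches the threshold."""
    return sorted(q for q, neighbors in graph.items() if len(neighbors) >= min_connections)


if __name__ == "__main__":
    searches = [
        "election results",
        "election polls",
        "election news",
        "weather today",
        "polls closing time",
    ]
    graph = build_query_graph(searches)
    print(trending(graph, min_connections=3))  # ['election polls']
```

Comparing every pair is quadratic, which is fine for a sketch; at scale the graph database would index themes as their own nodes so queries connect through them.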

The database that stores analytics will be a SQL relational database in which each row is a search query, and that row holds all the analytics collected for that query. The analytics to be stored are the query itself, whether the query returned any results, whether the results of the query were useful (which would require asking the user for feedback), and whether the information found comes from a credible source (which may also require user feedback).
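A hedged sketch of the analytics table, again using `sqlite3` as a stand-in for the SQL database. Column names follow the ANALYTICS data set in this thread (time/date columns omitted for brevity); storing unanswered feedback as NULL is an assumption, as are the helper names. The two queries show how the "searches that yielded no results" report and a usefulness rate could be computed.

```python
import sqlite3
from typing import List, Optional


def create_analytics_db() -> sqlite3.Connection:
    """In-memory stand-in for the analytics database: one row per search query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE analytics (
            query_id             INTEGER PRIMARY KEY,
            search_query         TEXT NOT NULL,
            returned_results     INTEGER NOT NULL,  -- 1 if the query returned anything
            useful_results       INTEGER,           -- user feedback; NULL if unanswered
            credible_information INTEGER            -- user feedback; NULL if unanswered
        )
        """
    )
    return conn


def _flag(value: Optional[bool]) -> Optional[int]:
    """Store booleans as 0/1 and keep missing feedback as NULL."""
    return None if value is None else int(value)


def record_query(conn, query_id: int, query: str, returned: bool,
                 useful: Optional[bool] = None, credible: Optional[bool] = None) -> None:
    """Insert the analytics row for one search query."""
    conn.execute(
        "INSERT INTO analytics VALUES (?, ?, ?, ?, ?)",
        (query_id, query, int(returned), _flag(useful), _flag(credible)),
    )


def no_result_queries(conn) -> List[str]:
    """The 'searches that yielded no results' report."""
    rows = conn.execute(
        "SELECT search_query FROM analytics WHERE returned_results = 0 ORDER BY query_id"
    )
    return [row[0] for row in rows]


def useful_rate(conn) -> Optional[float]:
    """Share of answered feedback saying results were useful (None if no answers)."""
    (rate,) = conn.execute(
        "SELECT AVG(useful_results) FROM analytics WHERE useful_results IS NOT NULL"
    ).fetchone()
    return rate
```

For example, after recording one useful query, one query with no results, and one query rated not useful, `no_result_queries` returns only the empty search and `useful_rate` returns 0.5.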

The caching database will use Redis with the cache-aside caching strategy. Since Redis is a key-value store rather than a relational database, it will hold frequently accessed tweets, search queries, and analytics as cached entries.
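A minimal sketch of cache-aside, assuming a small in-memory class as a stand-in for Redis so the example runs without a server. With the real redis-py client, the `get`, `set(..., ex=...)`, and `delete` calls would map onto the Redis `GET`, `SET ... EX`, and `DEL` commands. `InMemoryCache`, `cache_aside_get`, `cache_aside_write`, and the 300-second TTL are hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for Redis: a key-value store with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, self._clock() + ex)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cache_aside_get(cache, key, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache; on a miss, read the DB and fill the cache."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit
    value = load_from_db(key)  # cache miss: the application queries the database
    if value is not None:
        cache.set(key, value, ex=ttl_seconds)
    return value


def cache_aside_write(cache, db: dict, key: str, value: str) -> None:
    """Cache-aside write: update the database, then invalidate the cached copy."""
    db[key] = value
    cache.delete(key)
```

The application, not the cache, talks to the database in this pattern, so a Redis outage degrades to slower reads instead of failures; the trade-off is that the first read of each key after a miss or expiry always pays the database cost.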

@kevinmsmith131
Collaborator

kevinmsmith131 commented Jul 20, 2020

Data Set:

TWEETS (tweet_id, tweet, tweet_time, tweet_date, tweet_location, username, user_gender, user_age, user_ethnicity)
SEARCH_QUERIES (query_id, search_query, query_time, query_date)
ANALYTICS (query_id, search_query, query_time, query_date, trending_status, returned_results, useful_results, credible_information)
CACHE (frequent_tweet, frequent_query, frequent_analytic)

@kevinmsmith131
Collaborator

kevinmsmith131 commented Jul 23, 2020

Conceptual Data Model:
[Screenshot: conceptual data model diagram, 2020-07-23]

@AdiNar1106
Member

AdiNar1106 commented Jul 23, 2020

Great job on getting the database architecture together; it looks great and pretty robust! Can you resend that Drive link? It doesn't seem to be working.

@kevinmsmith131
Collaborator

Yeah, the link wasn't working for me either, so I ended up just taking a screenshot and pasting it here.

@kevinmsmith131
Collaborator

Logical Data Model:
[Image: Social Insights Logical Data Model]

@abhinavtripathy
Member Author

Amazing work @kevinmsmith131

@kevinmsmith131
Collaborator

Thanks @abhinavtripathy!

@abhinavtripathy
Member Author

@kevinmsmith131 could you go ahead and create a branch on the repo and start putting all this code into files? That way we can do code reviews more easily.
