To interact with my project, simply run final.py by: python final.py. If the program cannot detect a cache file, it will automatically craw data from steampower, which will takes ~15 minutes since we need to crawl 5000 games. The project requires pandas, bs4 and requests library
For the program, I use graphs to store all the information, where the game, the tag, price and rating are all considered as vertices, and games with same tags, similar price or rating are grouped together and form a row of a matrix. I create 4 dictionary, game_all stores the game along with its rate, price and tag; tag_all stores all the 436 tags along with the games that fit the tag; rate_all stores different rate interval and the games that fall into this interval; price_all stores different price interval and the games that fall into this interval. These data structure can be considered as adjacency matrix.