StackOverflow Look Back Tool

ARCHIVED: Very happy with this tool, but I've stopped using it, especially because we have LLMs now.

🛠 Extract and search the posts you've up-voted on StackOverflow. Look back on your data.

Description

I need to quickly browse and re-learn from questions I've up-voted in the past. This is a browser extension and search UI for doing that. See Background for more information.

NOTE: This project was developed on macOS. It is for my own personal use.

Design

The overall flow of the tool breaks down like this:

Scrape your votes data from https://stackoverflow.com
Expand the votes data into posts data using https://data.stackexchange.com
View and search the posts
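
As a rough illustration of the first two steps, the sketch below turns scraped vote links into post IDs and then into a SEDE query. It is not the extension's actual code. The `Vote` shape and the `parseVote` and `buildExpandQuery` names are assumptions made for this example; the column names come from the SEDE `Posts` table.

```typescript
// Hypothetical shape for a scraped vote; the extension's real types may differ.
interface Vote {
  postId: number;
  postType: "question" | "answer";
}

// Question links look like /questions/<id>/<slug>. Answer links append the
// answer ID: /questions/<questionId>/<slug>/<answerId>#<answerId>.
function parseVote(href: string): Vote | null {
  const match = /\/questions\/(\d+)(?:\/[^\/#?]*)?(?:\/(\d+))?/.exec(href);
  if (match === null) {
    return null; // not a link to a post
  }
  const answerId = match[2];
  if (answerId !== undefined) {
    return { postId: Number(answerId), postType: "answer" };
  }
  return { postId: Number(match[1]), postType: "question" };
}

// Build a SEDE query that expands the vote IDs into full post rows.
function buildExpandQuery(votes: Vote[]): string {
  const ids = votes
    .map((vote) => vote.postId)
    .filter((id, index, all) => all.indexOf(id) === index) // de-duplicate
    .sort((a, b) => a - b);
  if (ids.length === 0) {
    throw new Error("No votes to expand");
  }
  return `SELECT Id, PostTypeId, ParentId, Title, Body, Tags FROM Posts WHERE Id IN (${ids.join(", ")})`;
}
```

Links that do not point at a post (profile links, tag links) come back as `null` and can be skipped before building the query.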

The application is made up of several distinct programs:

A browser extension
- The code is in src/
A search frontend
- This is implemented as a React application with NextJS.
- The code is in search-ui/
A search backend
- Algolia is used as the search back-end
- For local development and experimentation, there is also a Lucene-based web server in search-api/. This acts as a substitute for Algolia.

The source code of the browser extension is grouped by the execution context that the code runs in. This layout leaves room for future additions like Manifest V3 support or a Safari browser extension.

util/
- Miscellaneous utility code that is not specific to the Look Back Tool.
src/
- The code in this directory is specific to the Look Back Tool.
src/web-page/
- The code in this directory runs on the web page.
src/backend/
- The code in this directory runs in the extension backend contexts: background workers, popups, and content scripts.
src/chromium-manifest-v2/
- Code that supports a Manifest V2 web extension developed for Chromium browsers.
src/firefox-manifest-v2/
- Code that supports a Manifest V2 web extension developed for Firefox.

There is one library dependency for the extension: https://github.com/dgroomes/browser-extension-framework. The BrowserExtensionFramework is an RPC-centric web extension framework that was originally developed as part of the Look Back Tool codebase.

The extension has been verified to work in the checked [x] browsers:

My Bias Against Content Scripts

In my opinion, content scripts are not compelling and I don't quite get their necessity in browser extension technical architecture. From my perspective there of course needs to be one isolated JavaScript execution environment that powers an extension. Why do there need to be two? The extension context has access to powerful browser APIs. The web page itself is a powerful execution environment because it has access to the DOM and the application source code. So what is the place of content scripts? I know that, by design, they have access to the DOM while the extension environment does not. But why? I'm sure there are good reasons. But the three different execution environments and their unique capabilities and restrictions have made it difficult to design and implement my own code.
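
To make the boundary between the content script and the web page concrete, here is a minimal sketch of the usual workaround: the content script adds a script element to the page so that the code runs in the page's own JavaScript environment. The `DocumentLike` and `ScriptElementLike` interfaces and the `injectPageScript` name are assumptions for this example, declared so the sketch type-checks outside a browser. In a real content script the `doc` argument would be the global `document`, and the URL would typically come from `chrome.runtime.getURL(...)` for a file listed under `web_accessible_resources` in the manifest.

```typescript
// Minimal structural stand-ins for the DOM types used here (assumptions for this sketch).
interface ScriptElementLike {
  src: string;
  onload: ((event: any) => any) | null;
  remove(): void;
}

interface DocumentLike {
  createElement(tag: "script"): ScriptElementLike;
  documentElement: { appendChild(element: ScriptElementLike): unknown };
}

// Inject a script into the page. The injected code runs with the page's
// variables in scope, which a content script cannot reach directly.
function injectPageScript(doc: DocumentLike, url: string): ScriptElementLike {
  const script = doc.createElement("script");
  script.src = url;
  // After the script has loaded and run, the tag itself is no longer needed.
  script.onload = () => script.remove();
  doc.documentElement.appendChild(script);
  return script;
}
```

The content script's only job here is to act as a bootstrap; everything after the injection happens on the web page with standard Web APIs.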

My Bias for Web APIs

A corollary to my bias against content scripts is my bias for Web APIs. Most of the source code for this extension actually executes on the web page, where standard Web APIs can be used. This code executes the domain logic like the data scraping and HTML generation. As such, this code is perfectly portable to other "evergreen" browsers because it just relies on standard web APIs instead of non-standard browser extension APIs (i.e. Manifest V2 and V3).

Instructions

Follow these instructions to install the tool as a Chrome browser extension and use it:

Install npm
Clone the BrowserExtensionFramework (BEF) Git submodule:
- ```
git submodule update --init
```
Build the BEF distribution
- Follow the build instructions in the BEF README. It is located at browser-extension-framework/README.md.

Install BEF:

npm install browser-extension-framework/framework/dgroomes-browser-extension-framework-0.1.0.tgz

Run the Webpack build:
- ```
npm run build
```
Build the extension distributions:
- ```
./build.sh
```
- This takes about a minute! I'm assuming the TypeScript type checking takes a lot of time.
Open Chrome's extension settings page
- Open Chrome to the URL: chrome://extensions
- Alternatively, follow the instructions in the Firefox section below to install the extension in Firefox
- Alternatively, follow the instructions in the Opera section below to install the extension in Opera
Enable developer mode
- Enable the Developer mode toggle control in the upper right corner of the page
Install the extension
- Click the Load unpacked button
- In the file finder window that opens, find the extension distribution directory build/chromium-manifest-v2-web-extension/, single click it to highlight it, and click the Select button.
- It's installed!
Open StackOverflow
- Go to https://stackoverflow.com/ in your browser
Log in
Open your profile
- Click your picture in the top right corner to open your profile
Open the "Votes" tab
- Find the "Votes" tab and click it.
- For me, my Votes tab navigates to this URL: https://stackoverflow.com/users/1333713/david-groomes?tab=votes
Scrape the votes data
- Open the extensions menu by pressing the puzzle icon in the top right of the window
  - Alternatively, for Opera, it is a cube button
  - Alternatively, for Firefox, there is NOT an extensions menu and instead you invoke the extension directly by clicking a puzzle icon button on the right side of the URL bar.
- Click the "stackoverflow-look-back" extension entry
- A popup will show up with buttons titled "Scrape votes" and "Expand posts". Click "Scrape votes" and check the console logs. The votes data will have been scraped and saved to browser storage.
Expand the post data
- Go to the Stack Exchange Data Explorer
  - If not logged in, then log in and navigate back to the original page.
- Repeat the earlier steps to open the extension entry
- The same popup will appear. Click "Expand posts". The post data will be expanded and saved into browser storage.
Download the posts data
- While on the same StackExchange page, repeat the earlier steps to open the extension entry
- Click the "View posts" button
- Click the download button. Now, you have a copy of the data in a JSON file.
Upload to Algolia
- You're on your own for this step. Algolia is really easy to use.
Search the posts
- Follow the instructions in search-ui to run the search UI.
- Finally, search for that one post you up-voted that has the magic incantation of code that you urgently need!
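
For the upload step, one possible preparation is to map each downloaded post to a flat search record. The sketch below is only an illustration: the `Post` shape is a guess at what the downloaded JSON might contain, and `stripHtml`, `toSearchRecords` and the `maxTextLength` default are made-up names and values. The one fixed point is that Algolia identifies records by an `objectID` attribute.

```typescript
// Hypothetical shape of one entry in the downloaded JSON file.
interface Post {
  id: number;
  type: "question" | "answer";
  title: string | null; // answers have no title of their own
  tags: string[];
  htmlBody: string;
}

interface SearchRecord {
  objectID: string;
  type: string;
  title: string;
  tags: string[];
  text: string;
}

// Naive tag stripping for indexing purposes. It does not decode HTML entities.
function stripHtml(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Convert posts into records. Long bodies are truncated because search
// services limit record size; the default cutoff here is a placeholder.
function toSearchRecords(posts: Post[], maxTextLength: number = 5000): SearchRecord[] {
  return posts.map((post) => ({
    objectID: String(post.id),
    type: post.type,
    title: post.title === null ? "" : post.title,
    tags: post.tags,
    text: stripHtml(post.htmlBody).slice(0, maxTextLength),
  }));
}
```

The resulting array can be written back to a JSON file and uploaded with whatever Algolia tooling you prefer.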

Firefox

The tool can also be installed as a web extension in Firefox! Follow these instructions to install it:

Open Firefox to the debug page
- Open Firefox
- Paste and go to this URL: about:debugging#/runtime/this-firefox
Load the plugin
- Click the button with the words Load Temporary Add-on…
- In the file finder window that opens, find the file build/firefox-manifest-v2-web-extension/manifest.json and click Open
- It's installed!

Opera

The extension can also run in Opera.

Follow these instructions to install it in Opera:

Open Opera to the debug page:
- Open Opera
- Paste and go to this URL: opera:extensions
Enable developer mode
- Toggle on the Developer mode control in the top right corner
Load the plugin
- Click the "Load unpacked" button
- In the file finder window that opens, find the directory build/chromium-manifest-v2-web-extension/ and click Select
- It's installed!

Wish List

General clean ups, TODOs and things I wish to implement for this project:

Finished Wish List items

These are the finished items from the Wish List:

Background

Here is some background on this project and some of my research which contextualizes the "why" of this project.

Does StackOverflow already support this? stackoverflow.com does not have search functionality for posts that you've up-voted. By contrast, there is a way to search for posts that you've bookmarked (née favorited) using the search option inbookmarks:mine. See the search page https://stackoverflow.com/search for all search options. I've bookmarked 121 posts whereas I've up-voted 2,200 posts! I want search coverage on my votes. (Hello StackOverflow, if you see this, consider this a feature request, or at least a user experience data point! Thank you.) Here are some related questions by other people:
- "How do I search for posts I've interacted on, with a particular word in them?"
- Search Q or A's I've upvoted
Why scrape the HTML for this data and not just query it via the Stack Exchange Data Explorer (SEDE)? Unfortunately, up-vote and down-vote data is private. It is anonymized in SEDE. The StackOverflow API also does not expose this data. So, it must be scraped from the HTML.
This is a fun project for me
I like JavaScript and the browser
- Why do I like the browser so much? Among other things, the MDN Web Docs are so amazing 🤩⭐️ and make it fun and rewarding to develop using Web APIs.
This is a vehicle for me to learn TypeScript on a non-trivial project. I'm learning TypeScript with the help of Deno and its bundle command.

Notes

The Chrome extension development experience is overall pretty good. I imagine it's much better than it was in the early years of Chrome. That said, it's difficult to debug the JavaScript code that runs in a service worker (the one defined by the background.service_worker field in the manifest). I find that 1) when it errors, there are no logs but just the infamous "Service worker registration failed" message in the "chrome://extensions" page and 2) I can't attach a debugger. The only thing I can do is comment out the whole file, then uncomment lines little by little and add console.log statements.
How many execution contexts are there? 1) The JavaScript execution environment in the page, 2) the JavaScript execution environment that executes the extension code like the popups and 3) the JavaScript execution environment that runs the content scripts? I need to understand this because I'm hitting a roadblock where I want to make a Proxy over jQuery on the web page, but a content script's execution environment doesn't have access to the web page's variables even though it does have access to the DOM (it seems arbitrary to allow one but block the other, but there is probably a good reason). And there is a way to work around this problem anyway: inject a script element into the page itself from a content script. See this StackOverflow question and answer.
The let that = this trick I have to use in the ES6 classes is a bit disappointing... how else could this code be designed? Is there an idiomatic ES6 class way? Or is this a quirk of classes? Answer: no, see this SO question. Update 2: well, in all cases arrow functions actually solve my problem (not sure if that's a good thing but I'll take it)!
One of the significant changes of Chrome's Manifest V3 over Manifest V2 is the Action API unification
I'm not sure how to do global state anymore since I've incorporated modules. In a browser extension context especially, a content script might be loaded multiple times, a web page script might be loaded multiple times, and it's important for the subsequent loads to not have a negative effect. For example, the first load might initialize a listener object, and subsequent loads must not initialize a new listener object because that leads to "double listens" and other unintended side effects. Plus, I'm confused about how to declare global variables in TypeScript. I should stick them to the window, right?
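
Two of the notes above (the let that = this trick and the repeated-load problem) can be illustrated together. The following is one possible sketch, not the code used in this project. The `VoteListener` class, the `__lookBackListener` key and the `getOrCreateListener` function are hypothetical names. It uses `globalThis`, which in a browser page refers to the same object as `window`.

```typescript
type Handler = (message: string) => void;

class VoteListener {
  readonly received: string[] = [];

  // Arrow-function property: `this` is bound lexically to the instance, so the
  // handler can be passed around without `let that = this` or `.bind(this)`.
  readonly handle: Handler = (message) => {
    this.received.push(message);
  };
}

// Typed view of the global object with one extra, hypothetical property.
type LookBackGlobal = typeof globalThis & { __lookBackListener?: VoteListener };

// Return the listener created by an earlier load of this script, or create it
// on the first load. Later loads reuse the same object, which avoids "double listens".
function getOrCreateListener(): VoteListener {
  const globalObject = globalThis as LookBackGlobal;
  const existing = globalObject.__lookBackListener;
  if (existing !== undefined) {
    return existing;
  }
  const created = new VoteListener();
  globalObject.__lookBackListener = created;
  return created;
}
```

Note that this only guards against repeated loads within a single execution context. A content script and the web page each have their own global object, so each would hold its own listener.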

Reference

Materials I referenced when building this tool and deep diving on learning.

MDN Web Docs

MDN Web docs: API docs for NodeList
MDN Web docs: API docs for MutationObserver
MDN Web docs: JavaScript modules
MDN Web docs: toJSON() behavior
MDN Web docs: "page_action"
- Note that the Manifest property show_matches (of page_actions) is only supported in Firefox. By default, page actions are hidden in Firefox but by contrast, page actions are shown by default in other browsers. This was a surprising find to me because I couldn't see the page action in the URL bar and I was confused! I need to explicitly enable it with the show_matches property.
MDN Web Docs: Manifest property "externally_connectable"
- The externally_connectable property is not supported in Firefox. An alternative must be used for message passing between the web page and the extension. See https://github.com/mdn/webextensions-examples/tree/master/page-to-extension-messaging.
MDN Web Docs: the EventTarget APIs
MDN Web Docs: Window postMessage API
MDN Web Docs: runtime.sendMessage()
MDN Web Docs: browserAction.onClicked
MDN Web Docs: tabs.sendMessage()
- Send messages from background scripts to content scripts
- Chrome equivalent
MDN Web Docs: extension storage API
- Enables extensions to store and retrieve data, and listen for changes to stored items.
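
Where externally_connectable is unavailable, a common alternative is for the web page to call `window.postMessage` and for a content script to listen for `message` events and relay them to the extension. The sketch below shows that pattern under some assumptions: the `CHANNEL` value, the envelope shape and the function names are invented for illustration, and `WindowLike` is a minimal stand-in so the code type-checks outside a browser. In a real content script, the `forward` callback would typically call `runtime.sendMessage()`.

```typescript
const CHANNEL = "look-back-tool"; // hypothetical channel name

interface Envelope {
  channel: string;
  payload: unknown;
}

interface MessageEventLike {
  source: unknown;
  data: unknown;
}

// Minimal stand-in for the parts of `window` used here (an assumption of this sketch).
interface WindowLike {
  location: { origin: string };
  postMessage(message: unknown, targetOrigin: string): void;
  addEventListener(type: "message", listener: (event: MessageEventLike) => void): void;
}

function isEnvelope(data: unknown): data is Envelope {
  return typeof data === "object" && data !== null &&
    (data as { channel?: unknown }).channel === CHANNEL;
}

// Web page side: post a message to the page's own window.
function sendFromPage(win: WindowLike, payload: unknown): void {
  win.postMessage({ channel: CHANNEL, payload }, win.location.origin);
}

// Content script side: relay matching messages onward to the extension.
function relayToExtension(win: WindowLike, forward: (payload: unknown) => void): void {
  win.addEventListener("message", (event) => {
    if (event.source !== win) return; // ignore messages from other windows or frames
    if (!isEnvelope(event.data)) return; // ignore unrelated traffic on the page
    forward(event.data.payload);
  });
}
```

Checking both the event source and the channel matters because any script on the page can post messages to the same window.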

Chrome extension docs

Chrome extensions docs
Chrome extension docs: chrome.webRequest
- Consider using this API to intercept requests instead of using a Proxy object on the web page
Chrome extension docs: Manifest V2 Getting started
Chrome extension docs: chrome.browserAction

Other

Meta Stack Exchange: Database schema for the Stack Exchange Data Explorer (SEDE)
StackExchange: What are tags, and how should I use them?
- This describes the tag naming convention. E.g. command-line, powershell
Multiple references on recommended/possible ways to render HTML dynamically from JS code in the browser (there are many but there is not an obvious choice!)
dgroomes/web-playground/browser-extensions
- My own reference project for Chrome extensions
Extension Workshop: Porting a Google Chrome extension
- Shoot, Firefox doesn't support Manifest v3 and I spent all this time writing a Chrome extension in Manifest v3. I wish I had implemented it in Manifest v2 so that I could have compatibility with Firefox.
Extension Workshop
- A special Firefox site that is focused entirely on extension development.
- Get help creating and publishing Firefox add-ons that make browsing smarter, safer, and faster.
Bugzilla (Firefox bug tracker)
- You can't use symlinks in web extensions. This works in Chrome, so this type of issue wasn't on my radar and I spent a lot of time trying to track this issue down. I wonder if symlinks might work in Firefox Developer Edition? Update: no, it is the same in Firefox Developer Edition.
GitHub repo: mozilla/web-ext
- I'm purposely choosing to not use this tool. I want to keep the dependencies to an absolute minimum and this tool is not critical.
Opera dev docs: The Basics of Making an Extension
Deno: "A modern runtime for JavaScript and TypeScript."
