Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Bounce Tracking #116

Merged
merged 16 commits into from
Dec 12, 2024
Merged

Adding Bounce Tracking #116

merged 16 commits into from
Dec 12, 2024

Conversation

S-Makrod
Copy link
Contributor

@S-Makrod S-Makrod commented Nov 28, 2024

Description

Adds tracking of bounces to collect.ts and query.ts. The implementation follows the approach outlined in the article https://notes.normally.com/cookieless-unique-visitor-counts/. I modified the existing implementation in collect.ts a bit to actually increment the Last-Modified header by one second instead of putting the current time. This is closer to the implementation in the article and allows the code to dynamically calculate the visits and determine the bounce value. Once the bounce value is determined it is written by the analytics engine.

Side note: I did not want to touch the checkVisitorSession function but maybe it would be a good idea to merge this with the getBounce function?

In query.ts I updated the queries to also return the bounce value.

In resources.stats.tsx I updated the route to display the bounce value.

I have the changes deployed at: https://bounce-tracking.counterscale-7bi.pages.dev/dashboard?site=counterscale-dev

Note

Hi Ben, I am a student from the University of Toronto and attended the guest lecture you held for CSCD01 for the Fall 2024 semester. One of the things you mentioned was that you are working on Counterscale. I got pretty interested and saw this issue which I thought I could tackle. I am new to the repository so I am not too sure if I did everything correctly and I may have missed some details but please let me know what you think when you get a chance!

Issue

This closes #47

Copy link
Owner

@benvinegar benvinegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off – I'm impressed you tackled this. Thanks for the contribution!

The changes seem pretty reasonable. I left some comments I'd like addressed before merging. I'll also run this patch on my website for a bit and see how the production data turns out.

app/analytics/__tests__/collect.test.ts Outdated Show resolved Hide resolved
app/analytics/__tests__/collect.test.ts Outdated Show resolved Hide resolved
app/analytics/__tests__/collect.test.ts Outdated Show resolved Hide resolved
app/analytics/collect.ts Outdated Show resolved Hide resolved
app/analytics/collect.ts Outdated Show resolved Hide resolved
@S-Makrod
Copy link
Contributor Author

S-Makrod commented Nov 28, 2024

Hi @benvinegar, I did some thinking and actually realized the implementation I proposed had an oversight. In the article, it was assumed that we are keeping track of each visit; however, I didn't realize that in the Counterscale implementation, a visit is tracked if they visit in a new session, which is 30 minutes. This means we have to preserve time, and as a result, the implementation in the article does not actually satisfy all the needs.

I got around this by tweaking the implementation so that we include milliseconds. This allows the code to track the bounce values in the milliseconds while preserving the time (up to the accuracy of seconds). By doing this, the implementation can keep track of the unique visits, sessions, and bounces. Note that to do this I needed to use toISOString() instead of toUTCString(). I'm not sure if that will be an issue.

New deployment: https://50205ade.counterscale-7bi.pages.dev 

@benvinegar
Copy link
Owner

I think you've highlighted this is complicated and I need to refresh myself on the whole thing again before I can review this properly.

I'll add that if it does deviate from the website (sounds like it), then that highlights a painful lack of commenting that should have indicated as such. Oops.

@benvinegar
Copy link
Owner

a visit is tracked if they visit in a new session, which is 30 minutes

Note the language around "visit" and "session" is very specific:

  • A new session occurs if a hit is recorded without an earlier hit occurring in the previous 30 minutes
  • A new visit occurs if a hit is recorded without an earlier hit occurring in the current day

Copy link
Owner

@benvinegar benvinegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my individual feedback items.

Note that to do this I needed to use toISOString() instead of toUTCString(). I'm not sure if that will be an issue.

It might. Whenever you get into things like HTTP headers, I recommend looking at the relevant MDN docs, or even going straight to the spec.

Here's the expected syntax:

If-Modified-Since: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT

So, it looks like toISOString does always output UTC, so that's fine. But the spec doesn't support a millisecond component, which is bad.

Now given that this string is returned from the server, will it be properly saved and resent by the client? It seems you've verified it still works. But it's no longer HTTP compliant ... which could cause other issues (e.g. maybe some browser/other client might drop the millisecond component, or consider the whole thing malformed and not send it).

Hmm. Maybe we just have to abandon "sessions" and stick to visits and bounces? Do we gamble on this "just working"? 🤷

app/analytics/collect.ts Outdated Show resolved Hide resolved
app/analytics/collect.ts Outdated Show resolved Hide resolved
app/analytics/query.ts Outdated Show resolved Hide resolved
app/analytics/query.ts Outdated Show resolved Hide resolved
@benvinegar
Copy link
Owner

benvinegar commented Nov 30, 2024

For what it's worth, the current implementation was designed to mirror how Google Analytics describes terms.

So it would be nice if a bounce was recorded within the context of a 30 minute session (which AFAICT is what your patch does). That means if a user loads a page for the first time today, and doesn't load another page within 30 minutes, that is a bounce. This also matches GA's bounce-tracking behavior.

Plausible also hosts their own definitions.

A session (also known as a visit) is a set of actions that a user takes on your site. A visit is started when a visitor first lands on your page and ends when no action is taken on your site for 30 minutes.

Fathom uses visitor (and presumably sessions, which they don't break out) to mean 24 hours / once a day:

Visitors are the number of people who visited your website. Let’s say Sally and John both visit your website, and each of them clicks on three pages. You’d have two visitors (John and Sally) and six views (two people visit three pages each).

Note: due to privacy, we reset how we determine a visitor every 24 hours (you can read about salts and hashes here.

@benvinegar
Copy link
Owner

benvinegar commented Nov 30, 2024

I've tested the "transmit bounces as the undefined millisecond component of if-modified-since" implementation with both Chrome and Firefox and it seems to work (they send the full datetime string with the millisecond component). Still nervous about the undefined behavior though.

My instinct right now is that we should abandon the 30-minute session concept, and strictly record 1) visitors (once a day), 2) views, and 3) bounces. That way we avoid trying to overload if-modified-since to use milliseconds, and use seconds to track bounces (per your original implementation). It means we basically abandon the isSession column in Clickhouse.

I don't know that Fathom does exactly the same, but based on their feature set and description of how they record visits, I suspect they do (see above).

Thoughts? This has some serious product implications so need some input from users. @funkaoshi @jmorrell @silent1mezzo

@S-Makrod
Copy link
Contributor Author

Hi Ben, thanks for all your feedback! I'll work on getting through it and hopefully have the changes done sometime today/tomorrow.

The sources you pointed out were helpful and I never took the time considered them earlier on, it is interesting to learn that milliseconds are not included in the HTTP standard of the header. Once you decide on how you would like to proceed with the session vs views and bounces, please let me know, and I'll update the implementation accordingly. Thanks again!

@funkaoshi
Copy link

Tangent: this is a cute way to count visits. I didn't realize this was how the application worked.

On topic, I also think you should drop the 30 minute session concept.

@silent1mezzo
Copy link

I think it ultimately comes down to who Counterscale is for. Dropping the 30-minute session concept works for me (and I suspect the majority of hobby customers but probably doesn't work for people who'd need more granular concept for advertising/monetary reasons. Based on the first line "Counterscale is a simple web analytics tracker" I think it makes sense to drop the session concept in favour of bounces.

@benvinegar
Copy link
Owner

Okay, I generally agree – the extra detail of sessions vs. isn't important for users. Plus it's privacy focused so not having additional detail about "when" a visitor came seems also ideal.

So I propose dropping the 30 minute tracking and use the original midnight implementation.

@S-Makrod if you can bring it back to your original implementation, I can take it the rest of the way, because there will be other wider product ramifications.

@S-Makrod
Copy link
Contributor Author

S-Makrod commented Dec 4, 2024

Hi @benvinegar, I have reverted back to the original implementation

@funkaoshi
Copy link

None of my business, but I think you should still fold back in some of the suggestions from the previous implementation.

@benvinegar benvinegar self-requested a review December 12, 2024 03:10
@benvinegar benvinegar merged commit db24a2b into benvinegar:main Dec 12, 2024
1 check passed
@benvinegar
Copy link
Owner

benvinegar commented Dec 12, 2024

@S-Makrod It's done!

Here's what else happened:

  • I removed the session concept but kept that column around.
  • I wrote some code to show n/a if bounce rate cannot be calculated for the current period (needs to collect data first).

I merged to master but going to test this a bit live before publishing a version.

@S-Makrod
Copy link
Contributor Author

@benvinegar thanks for all your help getting this merged! This was a cool issue to work on, and I'm glad I could contribute. Looking forward to seeing it published!

benvinegar added a commit that referenced this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Determine and show bounce rate in dashboard
4 participants