Limit stored articles #184

Closed
lamescholar opened this issue Mar 18, 2024 · 11 comments

Comments

@lamescholar

When you have a lot of feeds, the storage database grows quickly and the program slows down, so you have to delete the storage database and start with a clean slate.

A solution might be an option to choose the number of articles stored per feed, say 100 or 200, so that old ones are automatically deleted after a refresh.
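The suggested per-feed limit can be sketched as plain SQLite. This is a hypothetical illustration only: the schema below (an items table with feed_id and date columns) is invented for the demo and is not yarr's actual schema, and the window function needs SQLite 3.25+.

```shell
# Hypothetical sketch: keep only the KEEP newest items per feed.
# The schema is invented for this demo; it is NOT yarr's actual schema.
DB=$(mktemp)
KEEP=2

sqlite3 "$DB" "
CREATE TABLE items(id INTEGER PRIMARY KEY, feed_id INTEGER, date TEXT);
INSERT INTO items(feed_id, date) VALUES
  (1,'2024-01-01'),(1,'2024-02-01'),(1,'2024-03-01'),
  (2,'2024-01-15'),(2,'2024-02-15');
-- rank items per feed by recency, then drop everything past the limit
DELETE FROM items WHERE id IN (
  SELECT id FROM (
    SELECT id,
           row_number() OVER (PARTITION BY feed_id ORDER BY date DESC) AS rn
    FROM items
  ) WHERE rn > $KEEP
);
VACUUM;  -- reclaim freed pages so the file actually shrinks
"
sqlite3 "$DB" "SELECT count(*) FROM items;"   # prints 4: the 2 newest per feed
rm -f "$DB"
```

Note that DELETE alone does not shrink the file; VACUUM (or auto_vacuum) is what returns the space to the filesystem.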

@xfzv
Contributor

xfzv commented Mar 19, 2024

Most likely never going to happen (#57) but I wish there was a way to achieve this too.

@nkanaev
Owner

nkanaev commented Apr 9, 2024

@lamescholar what's the scale of your subscription? I initially tested it against a subscription with 1000 feeds, which it handled fine, though I didn't run a long-term test to see how it would behave performance-wise.

Currently it does delete old articles to clean up space and prevent performance degradation (see here).

@lamescholar
Author

I have about 600 feeds. After a month, when the storage database grows past 1 GB, the program starts to slow down.

@nkanaev
Owner

nkanaev commented Apr 10, 2024

I have a feeling that SQLite is perfectly capable of handling dbs larger than 1 GB. This may be an app issue that needs fine-tuning.

I can do some profiling to find out the root cause, but I need a sample db (one causing slowness) for that. If you'd be comfortable sharing your local db, drop me a message at nkanaev [at] live [dot] dom.

@lamescholar
Author

I deleted the .db recently. At the moment it's 300 MB. I will send it when the lag pops up again.

@sjehuda

sjehuda commented Apr 25, 2024

Most likely never going to happen (#57) but I wish there was a way to achieve this too.

I wish for the same thing.

This is one of the biggest challenges in managing a database for a feed reader: everything must be stored in order to compare it with the currently pulled items.

Have you considered removing all data* and keeping only identifying information like ID, Link and Dates** until they are no longer valid***?


* Items older than X days.
** By Date I refer to the property "Updated", which is subject to updates, unlike the property "Published", which should be fixed.
*** Until an item no longer exists on the server.
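The footnoted approach could look something like the following hypothetical sketch: strip the heavy content of old items while keeping the identifying fields used for duplicate detection. Table and column names here are invented for illustration, not taken from yarr.

```shell
# Hypothetical sketch: blank out the body of old items but keep id/link/date,
# so the reader can still recognise previously-seen items.
# Invented schema -- NOT yarr's actual one.
DB=$(mktemp)

sqlite3 "$DB" "
CREATE TABLE items(id INTEGER PRIMARY KEY, link TEXT, date TEXT, content TEXT);
INSERT INTO items(link, date, content) VALUES
  ('https://example.com/old', '2020-01-01', 'long article body ...'),
  ('https://example.com/new', date('now'), 'recent article body');
-- items older than 30 days lose their content but remain known to the reader
UPDATE items SET content = '' WHERE date < date('now', '-30 days');
"
sqlite3 "$DB" "SELECT link FROM items WHERE content = '';"   # https://example.com/old
rm -f "$DB"
```

This trades disk space for the ability to re-fetch content on demand; the identifiers alone are usually a tiny fraction of the stored bytes.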

@thanhnguyen2187

Hi. Is there any update on this? I'm cleaning up my VPS and found that storage.db's size has reached 600 MB. I want to preserve the starred articles while cleaning up the rest, and did that by clearing data from items, but I'm unsure how to handle search_.... What do you think? Thanks!

@Skabit

Skabit commented Jun 19, 2024

Only for educational purposes: can I change the date of the general configuration if I apply this command?

sqlite3 /path/to/your/local.db "update items set status = 1 where date < date('now', '-90 days')"

And in the same way, is it possible to change the number of news items to keep? (Default 50)

@nkanaev
Owner

nkanaev commented Jun 20, 2024

Leaving notes for visibility:

@lamescholar has generously provided me with a 1.8 GB database file containing >600 feeds. His complaint was that refreshing feeds and adding a feed were slow. In addition, I discovered that loading feed articles with pagination was slow in certain cases.

Refreshing feeds on my laptop took ~20 min (with the UI open in the browser) and ~10 min (with the browser closed). I managed to bring the refresh time down to ~1 min simply by removing the db max-connection limit and switching to WAL journaling mode.

I'll be investigating further to see if any other improvements could be made.
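For reference, the journaling change described above boils down to a single PRAGMA at the SQLite level; in yarr it would be issued from the Go side when opening the connection, and the removed max-connection limit is a Go database/sql setting with no SQL equivalent. WAL mode is persistent, so it sticks to the database file once set:

```shell
# Enable WAL journaling on an SQLite database. The pragma echoes the
# resulting mode, so a successful switch prints "wal".
DB=$(mktemp)
sqlite3 "$DB" "PRAGMA journal_mode=WAL;"   # prints: wal
rm -f "$DB" "$DB-wal" "$DB-shm"
```

WAL lets readers proceed while a writer is active, which is why it helps when the browser UI polls the database during a refresh.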

@lamescholar
Author

Thank you. As thanhnguyen2187 pointed out, there is also a problem with db size.

@nkanaev
Owner

nkanaev commented Jun 21, 2024

the storage database grows fast in size and the program starts to work slower

this isn't the case anymore.

  • refreshing feeds is blazingly fast now (switched to WAL journal)
  • listing articles with pagination is fast now (changed indexing)
  • adding a feed may be slow depending on the network and the feed server latency
  • db size depends on the number of feeds and their respective content size. yarr periodically cleans up old articles, but the settings for that are hardcoded. I may consider exposing them, but that won't guarantee keeping the size under control. If size is a concern, please consider manually backing up/deleting the database file.

@nkanaev nkanaev closed this as completed Jun 21, 2024
6 participants