Add direct database insertion option for fast visit generation #139

Merged
merged 8 commits into from
Mar 4, 2024


@bx80 bx80 commented Feb 23, 2024

Description:

This PR adds a new console command to the visitor generator which will directly insert fake visits into the database without using the tracker API.

This is intended for quickly creating large datasets where speed of data generation is more important than a perfect simulation of tracking requests.

Usage:

./console visitorgenerator:generate-visits-db [--idsite IDSITE] [--days DAYS] 
[--start-date START-DATE] 
[--limit-visits LIMIT-VISITS] 
[--limit-random-percent LIMIT-RANDOM-PERCENT] 
[--threads THREAD-COUNT]
[--conversion-percent CONVERSION-PERCENT]
[--min-actions MIN-ACTIONS]
[--max-actions MAX-ACTIONS]
[--actions-pool-size ACTIONS-POOL-SIZE]
[--v] [--vv] [--vvv]

Unoptimized performance on basic hardware is currently 350 to 700 visits per second (~2,500 inserts/s), compared to ~7 visits per second for visits generated using the normal tracker API method.
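
To put those rates in perspective, here is a rough back-of-the-envelope calculation in Python. The target of one million visits is a hypothetical figure chosen for illustration; only the rates (7, 350 and 700 visits per second) come from the description above.

```python
def hours_needed(visits: int, rate_per_second: float) -> float:
    """Hours required to generate `visits` at a steady rate."""
    return visits / rate_per_second / 3600


target_visits = 1_000_000  # hypothetical dataset size, not from the PR

rates = {
    "tracker API": 7,            # ~7 visits/s (from the description)
    "direct DB (low end)": 350,  # from the description
    "direct DB (high end)": 700, # from the description
}

for label, rate in rates.items():
    print(f"{label}: {hours_needed(target_visits, rate):.1f} h")

# Speed-up relative to the tracker API method.
speedup_low = rates["direct DB (low end)"] / rates["tracker API"]
speedup_high = rates["direct DB (high end)"] / rates["tracker API"]
print(f"speed-up: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

At the quoted rates this works out to roughly 40 hours through the tracker API versus well under an hour with direct insertion, a speed-up of about 50x to 100x.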

[Image: vgdb-demo]

Fixes #138

Review

@bx80 bx80 self-assigned this Feb 23, 2024
@bx80

bx80 commented Feb 26, 2024

Added support for multiple threads using the Symfony Process component, which this PR adds to the plugin's composer dependencies. It is only used when the --threads=x option is given and divides the daily limit by the number of specified threads. For example, when generating data for 3 days with a limit of 1,000 visits per day and 4 threads, each thread runs simultaneously for all three days with a limit of 250 visits per day. All output from the child processes is captured and displayed by the parent process, along with a total summary of the generated data.
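
The split described in this comment is simple arithmetic. A minimal Python sketch follows; it is not the plugin's PHP code. The function name `per_thread_limit` is hypothetical, and using floor division for limits that do not divide evenly is an assumption, since the comment does not say how remainders are handled.

```python
def per_thread_limit(daily_limit: int, threads: int) -> int:
    """Daily visit limit given to each thread.

    Assumption: floor division when the limit does not divide evenly.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return daily_limit // threads


# Example from the comment: 3 days, 1,000 visits per day, 4 threads.
days, daily_limit, threads = 3, 1000, 4

limit = per_thread_limit(daily_limit, threads)
total = limit * threads * days

print(limit)  # 250 visits per day for each thread
print(total)  # 3000 visits across all threads and days
```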

@michalkleiner

Nice work! There's somehow more to it than I envisaged 🙈
Had an initial look over the code and left a few questions, will try to run it locally as well.


@AltamashShaikh AltamashShaikh left a comment


@bx80 Since we are not prefixing the table names, I am getting the error below:

In GenerateVisitsDb.php line 495:

  SQLSTATE[42S02]: Base table or view not found: 1146 Table 'matomo_local.log_action' doesn't exist
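
This error is what a query against a bare table name produces on an install configured with a table prefix. In Matomo's PHP code the usual remedy is `Common::prefixTable()`. The idea is sketched below in Python; the helper `prefix_table` and the prefix value `matomo_` are hypothetical and stand in for the configured prefix, and this is not the fix that was actually applied in the PR.

```python
def prefix_table(name: str, prefix: str) -> str:
    """Return the table name with the configured prefix prepended."""
    return f"{prefix}{name}"


tables_prefix = "matomo_"  # hypothetical configured prefix

# Unprefixed name: fails on installs that use a table prefix.
bad_sql = "SELECT COUNT(*) FROM log_action"

# Prefixed name: resolves to the table that actually exists.
good_sql = f"SELECT COUNT(*) FROM {prefix_table('log_action', tables_prefix)}"

print(good_sql)  # SELECT COUNT(*) FROM matomo_log_action
```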

@AltamashShaikh

@bx80 I left comments on the Slack thread about warnings that appear when the site has no goals and also when it has one goal.


@AltamashShaikh AltamashShaikh left a comment


@bx80 functional testing now works as expected

./console visitorgenerator:generate-visits-db --idsite 4 --limit-visits=200 --days=3 --threads=8
8 threads requested
Starting threads........ [8 threads were started]
........................
Site Id                             4
Time taken                      3.03s
Visits generated                  192
Visits actions generated          842
Actions generated                 842
Conversions generated               6
Visits per second                  63 / sec
Queries per second                621 / sec

@bx80 bx80 requested a review from michalkleiner March 3, 2024 22:18
@bx80

bx80 commented Mar 4, 2024

Thanks for the review @AltamashShaikh 👍

If there is no more review feedback, can someone in @matomo-org/plugin-reviewers merge this? (I don't have access).

@snake14 snake14 merged commit 0414bf7 into 5.x-dev Mar 4, 2024
4 checks passed
@snake14 snake14 deleted the vg138-direct-db-option branch March 4, 2024 22:44
Successfully merging this pull request may close these issues.

Support a direct database insertion option for fast visit generation
4 participants