Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce RecordBuilder concept to split up Archiver code #20394

Merged
merged 65 commits into from
Jun 5, 2023
Merged
Show file tree
Hide file tree
Changes from 55 commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
6a388a0
introduce RecordBuilder concept and re-organize Goals archiving code …
diosmosis Feb 23, 2023
5f42975
fix loop iteration bug
diosmosis Feb 23, 2023
a9c3168
Merge branch '5.x-dev' into record-builders-poc
diosmosis Feb 25, 2023
5921f9d
split ecommerce records recordbuilder into 3 separate records
diosmosis Feb 26, 2023
b2d8e10
make sure Goals::getRecordMetadata() behaves like old archiver code
diosmosis Feb 26, 2023
2be26f2
make sure recordbuilder archive processor is restored after being use…
diosmosis Feb 26, 2023
0db7cb5
just make ArchiveProcessor a parameter
diosmosis Feb 26, 2023
fda7ee7
check for plugin before calling buildMultiplePeriod()
diosmosis Feb 26, 2023
b2ea252
do not invoke record builders if archiver has no plugin (happens duri…
diosmosis Feb 26, 2023
6cfe3ee
insert empty DataTables (as this appears to be the existing behavior …
diosmosis Feb 26, 2023
43bd346
add RecordBuilder class name to aggregation query hint
diosmosis Feb 26, 2023
c24d3aa
clear up in-source todo
diosmosis Feb 26, 2023
c2de813
attempt only archiving requested report if range archive and the reco…
diosmosis Mar 3, 2023
0139de3
refactor ArchiveSelector::getArchiveIds() to provide result with stri…
diosmosis Mar 4, 2023
f7e3831
when all found archives are partial archives, check that requested da…
diosmosis Mar 4, 2023
70b5bab
return correct value in Model::getRecordsContainedInArchives()
diosmosis Mar 4, 2023
d05f243
fix if formatting
diosmosis Mar 4, 2023
1609d31
existingArchives can be falsy
diosmosis Mar 4, 2023
234467a
existing archives can be null if the check is not relevant to the cur…
diosmosis Mar 5, 2023
cb1cf08
do not archive dependent segments if only processing the specific req…
diosmosis Mar 5, 2023
21b62d8
fix more tests
diosmosis Mar 5, 2023
99599e9
fix LoaderTest
diosmosis Mar 5, 2023
46abe79
make sure if archiving specific reports for a single plugin that arch…
diosmosis Mar 5, 2023
4c2701b
add filterRecordBuilders event
diosmosis Mar 15, 2023
c817f79
if it looks like the requested records are numeric, prioritize the nu…
diosmosis Mar 15, 2023
d18b61e
fix copy-paste error
diosmosis Mar 15, 2023
f6714dc
add dummy test for numeric values
diosmosis Mar 16, 2023
52c4e93
add test for partial archiving of numeric records for ranges and fix …
diosmosis Mar 16, 2023
e1938de
Merge branch '5.x-dev' into record-builders-poc
diosmosis Apr 20, 2023
ce8f7a0
lessen code redundancy in Archive.php, use Piwik\\Request and do not …
diosmosis Apr 20, 2023
5b719b1
fix type hint
diosmosis Apr 20, 2023
8b48227
fix php-cs errors
diosmosis Apr 20, 2023
16c5857
fix failing tests
diosmosis Apr 20, 2023
0d5bd3d
fix failing tests (really)
diosmosis Apr 20, 2023
a30c89e
Merge branch '5.x-dev' into record-builders-poc
diosmosis May 7, 2023
e7f60b0
Merge branch '5.x-dev' into record-builders-poc
diosmosis May 11, 2023
81e9ead
fix isEnabled calls
diosmosis May 11, 2023
94b98da
only add idarchive to Archive.php idarchive cache if it is not alread…
diosmosis May 11, 2023
63146a3
remove unneeded TODO
diosmosis May 11, 2023
6b09b25
when forcing new archive because timestamp is too old, do not report …
diosmosis May 11, 2023
23e2e37
report no existing archives if done flag is different + add tests
diosmosis May 12, 2023
eea3211
remove unneeded unset
diosmosis May 14, 2023
8e780aa
Merge branch '5.x-dev' into record-builders-poc
sgiehl May 15, 2023
6c610fc
fix phpcs
sgiehl May 15, 2023
5eea5a1
remove unneeded newline
diosmosis May 15, 2023
2317316
use siteAware cache for RecordBuilder array
diosmosis May 17, 2023
ef4ca18
better typehints in RecordBuilder
diosmosis May 19, 2023
80a270e
ignore any records that are not declared in the record metadata (whic…
diosmosis May 21, 2023
8a7c75d
apply review feedback
diosmosis May 22, 2023
831e000
remove stray debugging change
diosmosis May 22, 2023
cf3def7
Merge branch '5.x-dev' into record-builders-poc
michalkleiner May 25, 2023
fd5f69e
Merge remote-tracking branch 'origin/5.x-dev' into record-builders-poc
michalkleiner May 26, 2023
b8bccd7
Update variable name for consistency
michalkleiner May 26, 2023
76a2823
Remove unnecessary array_filter since a valid class name never has an…
michalkleiner May 26, 2023
e59621d
Add TODOs
michalkleiner May 26, 2023
bb1b2f3
add comment on why we look for data within partial archives prior to …
diosmosis May 26, 2023
94c9796
typehint fixes + make insertBlobRecord (formerly insertRecord) protec…
diosmosis May 26, 2023
0a30895
more typehints
diosmosis May 27, 2023
6a5c2fd
in aggregateNumericMetrics() allow operationsToApply to be array mapp…
diosmosis May 27, 2023
b99f84d
optimization: when getting recordbuilders, only post Archiver.addReco…
diosmosis May 27, 2023
0b197bb
default to null if default column aggregation operation is not specified
diosmosis May 30, 2023
f06453b
Merge branch '5.x-dev' into record-builders-poc
sgiehl Jun 2, 2023
264c500
add check for invalid record name to Record
diosmosis Jun 2, 2023
540e6a2
allow dashes in record name since entity IDs can be used in them
diosmosis Jun 3, 2023
cdafadc
Merge branch '5.x-dev' into record-builders-poc
sgiehl Jun 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 46 additions & 21 deletions core/Archive.php
Original file line number Diff line number Diff line change
Expand Up @@ -436,20 +436,23 @@ public function getDataTableExpanded($name, $idSubtable = null, $depth = null, $
}

/**
* Returns the list of plugins that archive the given reports.
* Returns the given reports grouped by the plugin name that archives them.
*
* @param array $archiveNames
* @return array
* @return array `['MyPlugin' => ['MyPlugin_metric1', 'MyPlugin_report1'], ...]`
*/
private function getRequestedPlugins($archiveNames)
{
$result = [];

foreach ($archiveNames as $name) {
$result[] = self::getPluginForReport($name);
$plugin = self::getPluginForReport($name);
$result[$plugin][] = $name;
}

return array_unique($result);
$result = array_map('array_unique', $result);

return $result;
diosmosis marked this conversation as resolved.
Show resolved Hide resolved
}

/**
Expand Down Expand Up @@ -601,7 +604,8 @@ protected function get($archiveNames, $archiveDataType, $idSubtable = null)
*/
private function getArchiveIds($archiveNames)
{
$plugins = $this->getRequestedPlugins($archiveNames);
$archiveNamesByPlugin = $this->getRequestedPlugins($archiveNames);
$plugins = array_keys($archiveNamesByPlugin);

// figure out which archives haven't been processed (if an archive has been processed,
// then we have the archive IDs in $this->idarchives)
Expand All @@ -627,15 +631,13 @@ private function getArchiveIds($archiveNames)
$globalDoneFlag = Rules::getDoneFlagArchiveContainsAllPlugins($this->params->getSegment());
$doneFlags[$globalDoneFlag] = true;

$archiveGroups = array_unique($archiveGroups);

// cache id archives for plugins we haven't processed yet
if (!empty($archiveGroups)) {
if (
Rules::isArchivingEnabledFor($this->params->getIdSites(), $this->params->getSegment(), $this->getPeriodLabel())
&& !$this->forceFetchingWithoutLaunchingArchiving
) {
$this->cacheArchiveIdsAfterLaunching($archiveGroups, $plugins);
$this->cacheArchiveIdsAfterLaunching($archiveNamesByPlugin);
} else {
$this->cacheArchiveIdsWithoutLaunching($plugins);
}
Expand All @@ -651,10 +653,9 @@ private function getArchiveIds($archiveNames)
* This function will launch the archiving process for each period/site/plugin if
* metrics/reports have not been calculated/archived already.
*
* @param array $archiveGroups @see getArchiveGroupOfReport
* @param array $plugins List of plugin names to archive.
* @param array $archiveNamesByPlugin @see getRequestedPlugins
*/
private function cacheArchiveIdsAfterLaunching($archiveGroups, $plugins)
private function cacheArchiveIdsAfterLaunching($archiveNamesByPlugin)
{
foreach ($this->params->getPeriods() as $period) {
$twoDaysAfterPeriod = $period->getDateEnd()->addDay(2);
Expand Down Expand Up @@ -697,7 +698,7 @@ private function cacheArchiveIdsAfterLaunching($archiveGroups, $plugins)
continue;
}

$this->prepareArchive($archiveGroups, $site, $period);
$this->prepareArchive($archiveNamesByPlugin, $site, $period);
}
}
}
Expand Down Expand Up @@ -744,6 +745,15 @@ private function cacheArchiveIdsWithoutLaunching($plugins)
*/
private function getDoneStringForPlugin($plugin, $idSites)
{
$requestedReport = $this->getRequestedReport();

$shouldOnlyProcessRequestedArchives = empty($requestedReport)
&& Rules::shouldProcessOnlyReportsRequestedInArchiveQuery($this->getPeriodLabel());

if ($shouldOnlyProcessRequestedArchives) {
return Rules::getDoneFlagArchiveContainsOnePlugin($this->params->getSegment(), $plugin);
}

return Rules::getDoneStringFlagFor(
$idSites,
$this->params->getSegment(),
Expand Down Expand Up @@ -881,44 +891,50 @@ public static function getPluginForReport($report)
}

/**
* @param $archiveGroups
* @param $archiveNamesByPlugin
* @param $site
* @param $period
*/
private function prepareArchive(array $archiveGroups, Site $site, Period $period)
private function prepareArchive(array $archiveNamesByPlugin, Site $site, Period $period)
{
$coreAdminHomeApi = API::getInstance();

$requestedReport = null;
if (SettingsServer::isArchivePhpTriggered()) {
$requestedReport = Common::getRequestVar('requestedReport', '', 'string');
}
$requestedReport = $this->getRequestedReport();

$shouldOnlyProcessRequestedArchives = empty($requestedReport)
michalkleiner marked this conversation as resolved.
Show resolved Hide resolved
&& Rules::shouldProcessOnlyReportsRequestedInArchiveQuery($period->getLabel());

$periodString = $period->getRangeString();
$periodDateStr = $period->getLabel() == 'range' ? $periodString : $period->getDateStart()->toString();

$idSites = [$site->getId()];

// process for each plugin as well
foreach ($archiveGroups as $plugin) {
foreach ($archiveNamesByPlugin as $plugin => $archiveNames) {
$doneFlag = $this->getDoneStringForPlugin($plugin, $idSites);
$this->initializeArchiveIdCache($doneFlag);

$reportsToArchiveForThisPlugin = (empty($requestedReport) && $shouldOnlyProcessRequestedArchives) ? $archiveNames : $requestedReport;

$prepareResult = $coreAdminHomeApi->archiveReports(
$site->getId(),
$period->getLabel(),
$periodDateStr,
$this->params->getSegment()->getOriginalString(),
$plugin,
$requestedReport
$reportsToArchiveForThisPlugin
);

if (
!empty($prepareResult)
&& !empty($prepareResult['idarchives'])
) {
foreach ($prepareResult['idarchives'] as $idArchive) {
$this->idarchives[$doneFlag][$periodString][] = $idArchive;
if (empty($this->idarchives[$doneFlag][$periodString])
|| !in_array($idArchive, $this->idarchives[$doneFlag][$periodString])
) {
$this->idarchives[$doneFlag][$periodString][] = $idArchive;
}
}
}
}
Expand Down Expand Up @@ -956,4 +972,13 @@ public function forceFetchingWithoutLaunchingArchiving()
{
$this->forceFetchingWithoutLaunchingArchiving = true;
}

private function getRequestedReport(): ?string
michalkleiner marked this conversation as resolved.
Show resolved Hide resolved
{
$requestedReport = null;
if (SettingsServer::isArchivePhpTriggered()) {
$requestedReport = Request::fromRequest()->getStringParameter('requestedReport', '');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The new request class doesn't do any sanitizing. I had a rough look where this parameter is passed through. Looks like there shouldn't be any risk of using the possibly user provided value.

}
return $requestedReport;
}
}
6 changes: 6 additions & 0 deletions core/ArchiveProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -740,6 +740,12 @@ public function processDependentArchive($plugin, $segment)
}

$params = $this->getParams();
// range archives are always processed on demand, so pre-processing dependent archives is not required
// here
if (Rules::shouldProcessOnlyReportsRequestedInArchiveQuery($params->getPeriod()->getLabel())) {
return;
}

$idSites = [$params->getSite()->getId()];

// important to use the original segment string when combining. As the API itself would combine the original string.
Expand Down
50 changes: 39 additions & 11 deletions core/ArchiveProcessor/Loader.php
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ private function prepareArchiveImpl($pluginName)
if (sizeof($data) == 2) {
return $data;
}
list($idArchives, $visits, $visitsConverted) = $data;
list($idArchives, $visits, $visitsConverted, $foundRecords) = $data;

// only lock meet those conditions
if (ArchiveProcessor::$isRootArchivingRequest && !SettingsServer::isArchivePhpTriggered()) {
Expand All @@ -153,15 +153,15 @@ private function prepareArchiveImpl($pluginName)
return $data;
}

list($idArchives, $visits, $visitsConverted) = $data;
list($idArchives, $visits, $visitsConverted, $foundRecords) = $data;

return $this->insertArchiveData($visits, $visitsConverted);
return $this->insertArchiveData($visits, $visitsConverted, $idArchives, $foundRecords);
} finally {
$lock->unlock();
}
} else {

return $this->insertArchiveData($visits, $visitsConverted);
return $this->insertArchiveData($visits, $visitsConverted, $idArchives, $foundRecords);
}
}

Expand All @@ -171,17 +171,27 @@ private function prepareArchiveImpl($pluginName)
* @param $visitsConverted
* @return array|false[]
*/
protected function insertArchiveData($visits, $visitsConverted)
protected function insertArchiveData($visits, $visitsConverted, $existingArchives, $foundRecords)
{
if (SettingsServer::isArchivePhpTriggered()) {
$this->logger->info("initiating archiving via core:archive for " . $this->params);
}

if (!empty($foundRecords)) {
$this->params->setFoundRequestedReports($foundRecords);
}

list($visits, $visitsConverted) = $this->prepareCoreMetricsArchive($visits, $visitsConverted);
list($idArchive, $visits) = $this->prepareAllPluginsArchive($visits, $visitsConverted);

if ($this->isThereSomeVisits($visits) || PluginsArchiver::doesAnyPluginArchiveWithoutVisits()) {
return [[$idArchive], $visits];
if ($this->isThereSomeVisits($visits)
|| PluginsArchiver::doesAnyPluginArchiveWithoutVisits()
) {
$idArchivesToQuery = [$idArchive];
if (!empty($foundRecords)) {
$idArchivesToQuery = array_merge($idArchivesToQuery, $existingArchives ?: []);
}
return [$idArchivesToQuery, $visits];
}

return [false, false];
Expand All @@ -208,11 +218,22 @@ protected function loadArchiveData()
// this hack was used to check the main function goes to return or continue
// NOTE: $idArchives will contain the latest DONE_OK/DONE_INVALIDATED archive as well as any partial archives
// with a ts_archived >= the DONE_OK/DONE_INVALIDATED date.
list($idArchives, $visits, $visitsConverted, $isAnyArchiveExists, $tsArchived, $value) = $this->loadExistingArchiveIdFromDb();
$archiveInfo = $this->loadExistingArchiveIdFromDb();
$idArchives = $archiveInfo['idArchives'];
sgiehl marked this conversation as resolved.
Show resolved Hide resolved
$visits = $archiveInfo['visits'];
$visitsConverted = $archiveInfo['visitsConverted'];
$tsArchived = $archiveInfo['tsArchived'];
$doneFlagValue = $archiveInfo['doneFlagValue'];
$existingArchives = $archiveInfo['existingRecords'];

$requestedRecords = $this->params->getArchiveOnlyReportAsArray();
$isMissingRequestedRecords = !empty($requestedRecords) && is_array($existingArchives) && count($requestedRecords) != count($existingArchives);

if (!empty($idArchives)
&& !Rules::isActuallyForceArchivingSinglePlugin()
&& !$this->shouldForceInvalidatedArchive($value, $tsArchived)) {
&& !$this->shouldForceInvalidatedArchive($doneFlagValue, $tsArchived)
&& !$isMissingRequestedRecords
) {
// we have a usable idarchive (it's not invalidated and it's new enough), and we are not archiving
// a single report
return [$idArchives, $visits];
Expand All @@ -232,7 +253,7 @@ protected function loadArchiveData()
}
}

return [$idArchives, $visits, $visitsConverted];
return [$idArchives, $visits, $visitsConverted, $existingArchives];
}

/**
Expand Down Expand Up @@ -331,7 +352,14 @@ public function loadExistingArchiveIdFromDb()

// return no usable archive found, and no existing archive. this will skip invalidation, which should
// be fine since we just force archiving.
return [false, false, false, false, false, false];
return [
'idArchives' => false,
'visits' => false,
'visitsConverted' => false,
'archiveExists' => false,
'tsArchived' => false,
'doneFlagValue' => false,
];
}

$minDatetimeArchiveProcessedUTC = $this->getMinTimeArchiveProcessed();
Expand Down
32 changes: 30 additions & 2 deletions core/ArchiveProcessor/Parameters.php
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,11 @@ class Parameters
*/
private $isArchiveOnlyReportHandled;

/**
* @var string[]|null
*/
private $foundRequestedReports;

/**
* Constructor.
*
Expand All @@ -71,7 +76,7 @@ public function __construct(Site $site, Period $period, Segment $segment)
* If we want to archive only a single report, we can request that via this method.
* It is up to each plugin's archiver to respect the setting.
*
* @param string $archiveOnlyReport
* @param string|string[] $archiveOnlyReport
* @api
*/
public function setArchiveOnlyReport($archiveOnlyReport)
Expand Down Expand Up @@ -267,7 +272,11 @@ public function logStatusDebug()

public function __toString()
{
return "[idSite = {$this->getSite()->getId()}, period = {$this->getPeriod()->getLabel()} {$this->getPeriod()->getRangeString()}, segment = {$this->getSegment()->getString()}, plugin = {$this->getRequestedPlugin()}, report = {$this->getArchiveOnlyReport()}]";
$requestedReports = $this->getArchiveOnlyReport();
if (is_array($requestedReports)) {
$requestedReports = implode(', ', $requestedReports);
}
return "[idSite = {$this->getSite()->getId()}, period = {$this->getPeriod()->getLabel()} {$this->getPeriod()->getRangeString()}, segment = {$this->getSegment()->getString()}, plugin = {$this->getRequestedPlugin()}, report = {$requestedReports}]";
}

/**
Expand Down Expand Up @@ -295,4 +304,23 @@ public function setIsPartialArchive($isArchiveOnlyReportHandled)
{
$this->isArchiveOnlyReportHandled = $isArchiveOnlyReportHandled;
}

public function getArchiveOnlyReportAsArray()
{
$requestedReport = $this->getArchiveOnlyReport();
if (empty($requestedReport)) {
return [];
}
return is_array($requestedReport) ? $requestedReport : [$requestedReport];
}

public function setFoundRequestedReports(array $foundRecords)
{
$this->foundRequestedReports = $foundRecords;
}

public function getFoundRequestedReports()
{
return $this->foundRequestedReports ?: [];
}
}
12 changes: 12 additions & 0 deletions core/ArchiveProcessor/PluginsArchiver.php
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,19 @@ public function callAggregateAllPlugins($visits, $visitsConverted, $forceArchivi

$archivers = static::getPluginArchivers();

$archiveOnlyPlugin = $this->params->getRequestedPlugin();
$archiveOnlyReports = $this->params->getArchiveOnlyReport();

foreach ($archivers as $pluginName => $archiverClass) {
// if we are archiving specific reports for a single plugin then we don't need or want to create
// Archiver instances, since they will set the archive to partial even if the requested reports aren't
// handled by the Archiver
if (!empty($archiveOnlyReports)
&& $archiveOnlyPlugin != $pluginName
) {
continue;
}

// We clean up below all tables created during this function call (and recursive calls)
$latestUsedTableId = Manager::getInstance()->getMostRecentTableId();

Expand Down
Loading