-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Post Import Service #4370
base: 2.x
Are you sure you want to change the base?
Conversation
…it over to the postimport service class, refactored and improved tests
34e4851
to
2d86f0d
Compare
Going through the QA, I hit an issue at Unsuccessful Post Import (Drop Resource) > Confirm that the dataset cedcd327-4e5d-43f9-8eb1-c11850fa7c55 has a Store value of waiting(0%) with an error in the Post Import column. When I ran cron I got the following error: $ ddev drush cron
[notice] Localization of resource 3a187a87dc6cd47c48b6b4c4785224b7:
[notice] ImportService for 3a187a87dc6cd47c48b6b4c4785224b7__1735074165 completed.
[error] SQLSTATE[22007]: Invalid datetime format: 1292 Incorrect date value: '2049' for column 'objectid' at row 1
[error] Failed to drop datastore for aba79c7a62fec6c2e49bc1cebd0d7233. No datastore storage found for 3a187a87dc6cd47c48b6b4c4785224b7:1735068191. On the dashboard, I do see the right error in the postImport column but the store says done: I can confirm the datastore was not dropped and the objectId has reverted back to text (it was int after the previous QA steps): GET https://kaise.ddev.site/api/1/datastore/query/cedcd327-4e5d-43f9-8eb1-c11850fa7c55/0?results=false&count=false returns {
"schema": {
"699890c9-480c-5f2e-89c5-06e9723303fb": {
"fields": {
"objectid": {
"type": "text",
"mysql_type": "text",
"description": "OBJECTID"
},
"roadway": {
"type": "text",
"mysql_type": "text",
"description": "ROADWAY"
},
"road_side": {
"type": "text",
"mysql_type": "text",
"description": "ROAD_SIDE"
},
"lncd": {
"type": "text",
"mysql_type": "text",
"description": "LNCD"
},
"descr": {
"type": "text",
"mysql_type": "text",
"description": "DESCR"
},
"begin_post": {
"type": "text",
"mysql_type": "text",
"description": "BEGIN_POST"
},
"end_post": {
"type": "text",
"mysql_type": "text",
"description": "END_POST"
},
"shape_leng": {
"type": "text",
"mysql_type": "text",
"description": "Shape_Leng"
}
}
}
},
"query": {
"results": false,
"count": false,
"resources": [
{
"id": "699890c9-480c-5f2e-89c5-06e9723303fb",
"alias": "t"
}
],
"limit": 500,
"offset": 0,
"schema": true,
"keys": true,
"format": "json",
"rowIds": false,
"properties": [
"objectid",
"roadway",
"road_side",
"lncd",
"descr",
"begin_post",
"end_post",
"shape_leng"
]
}
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! See the question on QA in the PR conversation.
Also, reviewing how the code works there is something off to me in the back and forth between PostImport
and PostImportResult
. It's just strange to me that PostImport
puts this data into an object, which has a storeResult()
method, but the storeResult()
method just calls another method that's back in PostImport()
. It's essentially a circular dependency, though Drupal won't notice because PostImportResult
is not a service. I think the correct way to do this would be one of two things:
- Make the entire storeResult() method on PostImportResult be essentially self-contained, so it contains the DB query and just needs to be passed a Connection object (ie the
database
service). Remove PostImport::storeJobStatus(). This would also remove Connection as a dependency for PostImport. - Remove the
storeResult()
method fromPostImportResult
, and simply havePostImport::storeJobStatus
take aPostImportResult
object as its argument, rather than all those individual arguments. This means that rather than rather than calling$result->storeResult()
,PostImportResourceProcessor::processItem()
would call$this->postImport->storeJobStatus($result)
.
I'd like to see this changed but if it feels out of scope we can make a new ticket for it, tell me what you think.
* @param mixed $resourceId | ||
* A resource ID. | ||
*/ | ||
public function invalidateCacheTags($resourceId): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a redundant method. It's identical to DatastoreService::invalidateCacheTags, and we are already injecting DatastoreService. Remove this method, remove ReferenceLookup as a dependency, and call the function as $this->datastoreService->invalidateReferencerCacheTags().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(need to make the DatastoreService method public, obviously)
) { | ||
$this->configFactory = $configFactory; | ||
$this->logger = $logger; | ||
$this->resourceMapper = $resourceMapper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to think about how to get the number of dependencies down here. ResourceMapper is also part of DatastoreService. I think it would be safe to just make $datastoreService->resourceMapper public, since there are no public properties or setter methods on resourceMapper. Or we could add a getResourceMapper() method to $datastoreService.
$this->resourceMapper = $resourceMapper; | ||
$this->resourceProcessorCollector = $resourceProcessorCollector; | ||
$this->dataDictionaryDiscovery = $dataDictionaryDiscovery; | ||
$this->referenceLookup = $referenceLookup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See commnent below for a suggestion how we can eliminate the referenceLookup dependency.
$timeout = (int) $plugin_definition['cron']['lease_time']; | ||
$alter_table_query_builder->setConnectionTimeout($timeout); | ||
$this->referenceLookup = $referenceLookup; | ||
$this->configFactory = $config_factory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong but I don't think this is used for anything?
@dafeder Can you confirm you selected Drop the datastore table if the post import queue reports an error. on the Resource settings page (admin/dkan/resources) before attempting to run a failing post import? I attempted the QA steps and was able to see "waiting (0%)" under the store column. |
QA Steps
Run the following tests.
ddev dkan-phpunit --filter PostImportTest
ddev dkan-phpunit --filter dictionaryenforcertest
Successful Post Import
Unsuccessful Post Import (Drop Resource)
Data Dictionary Disabled
Data Dictionary via Distribution Reference
Resource does not have dictionary
QA:
Helpful script:
You should see messages like this:
You should then also see a message like this in the dataset import status dashboard: