Timestamp import too sensitive #45

Open
therourke opened this issue Jan 24, 2013 · 2 comments

@therourke

Since my last update I have been having a few problems with imports. Items from both my Delicious and LibraryThing feeds often have timestamps very close together, like this:

Posted: Wed, 23 Jan 2013 08:35:44 -0500
Posted: Wed, 23 Jan 2013 08:35:42 -0500
Posted: Wed, 23 Jan 2013 08:35:40 -0500
Posted: Wed, 23 Jan 2013 08:35:39 -0500

They used to import fine, but now only one item makes it into the database (usually the last one). Any hints on how to change this behavior?

MitchellMcKenna added a commit that referenced this issue Jan 25, 2013
- Fix detection of duplicate items during import; stopping duplicates
  based on the minute they were made is too sensitive, base it on the
  actual unix timestamp.
- Note this will cause duplicates during import, a migration will be
  needed to remove those duplicates.

Fixes #45
@MitchellMcKenna
Owner

Cause
During the import of new feed items, SimplePie's get_date() returns the date in the format 'j F Y, g:i a' by default, and we then convert that string to a unix timestamp using strtotime() in Lifepress.php. Because that format is only precise to the minute, LifePress treats anything posted from the same feed during the same minute as a duplicate.
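
To make the loss of precision concrete, here is a minimal sketch that uses only PHP's built-in date() and strtotime() on the four "Posted:" values from the report. It does not call SimplePie; it assumes that formatting a timestamp with 'j F Y, g:i a' and parsing it back is a fair stand-in for what the import does, and the variable names are illustrative only.

```php
<?php
// Fixed timezone so date() and strtotime() agree with each other.
date_default_timezone_set('UTC');

// The four "Posted:" values quoted in the original report.
$posted = [
    'Wed, 23 Jan 2013 08:35:44 -0500',
    'Wed, 23 Jan 2013 08:35:42 -0500',
    'Wed, 23 Jan 2013 08:35:40 -0500',
    'Wed, 23 Jan 2013 08:35:39 -0500',
];

$minutePrecision = [];
$secondPrecision = [];
foreach ($posted as $raw) {
    $exact = strtotime($raw);                    // full unix timestamp
    $formatted = date('j F Y, g:i a', $exact);   // minute-precision string
    $minutePrecision[] = strtotime($formatted);  // seconds are gone here
    $secondPrecision[] = $exact;
}

// Distinct values under each scheme.
$distinctByMinute = count(array_unique($minutePrecision));
$distinctBySecond = count(array_unique($secondPrecision));

echo "distinct at minute precision: $distinctByMinute\n";
echo "distinct at second precision: $distinctBySecond\n";
```

All four items fall inside 08:35, so the minute-precision round trip leaves a single distinct value, while the raw timestamps stay distinct.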

Solution
get_date() accepts any date format, however, so changing Lifepress.php line 102 to request the actual unix timestamp of the feed item solves this issue:

$new->item_date = $item->get_date('U');

However, this will cause LifePress to create duplicates of all the feed items during the next import, because the minute-precision timestamps stored by old imports don't match the second-precision timestamps from the new code. Duplicate items can be removed by running the following query:

/* Mark duplicate LifePress items as deleted, keeping the lowest-ID copy */
UPDATE items i1
INNER JOIN items i2
    ON i1.`item_permalink` = i2.`item_permalink`
    AND i1.`item_feed_id` = i2.`item_feed_id`
SET i1.`item_status` = 'deleted'
WHERE i1.`item_status` = 'publish'
AND i2.`item_status` = 'publish'
AND i1.ID > i2.ID;
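
For anyone who wants to check the rule before touching the database, the query's logic can be mirrored in plain PHP: among published rows that share a feed ID and permalink, every row except the one with the lowest ID is marked deleted. This is only a sketch. The function name markDuplicates() and the sample rows are hypothetical; only the column names come from the query above.

```php
<?php
/**
 * Hypothetical in-memory mirror of the duplicate-removal query.
 * Each row is an array with ID, item_feed_id, item_permalink, item_status.
 */
function markDuplicates(array $items)
{
    // Lowest published ID seen for each (feed id, permalink) pair.
    $keep = [];
    foreach ($items as $item) {
        if ($item['item_status'] !== 'publish') {
            continue;
        }
        $key = $item['item_feed_id'] . '|' . $item['item_permalink'];
        if (!isset($keep[$key]) || $item['ID'] < $keep[$key]) {
            $keep[$key] = $item['ID'];
        }
    }

    // Mark every other published copy as deleted.
    foreach ($items as $i => $item) {
        if ($item['item_status'] !== 'publish') {
            continue;
        }
        $key = $item['item_feed_id'] . '|' . $item['item_permalink'];
        if ($item['ID'] > $keep[$key]) {
            $items[$i]['item_status'] = 'deleted';
        }
    }
    return $items;
}

// Made-up sample rows: IDs 10 and 57 are the same item imported twice.
$rows = [
    ['ID' => 10, 'item_feed_id' => 1, 'item_permalink' => 'http://example.com/a', 'item_status' => 'publish'],
    ['ID' => 11, 'item_feed_id' => 1, 'item_permalink' => 'http://example.com/b', 'item_status' => 'publish'],
    ['ID' => 57, 'item_feed_id' => 1, 'item_permalink' => 'http://example.com/a', 'item_status' => 'publish'],
];

$result = markDuplicates($rows);
foreach ($result as $row) {
    echo $row['ID'], ' ', $row['item_status'], "\n";
}
```

Keeping the lowest ID means the original import survives and the copy created by the re-import is the one that gets hidden.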

This query should be moved into a DB migration that gets merged into the develop branch along with the above change. The migration should force a new import from the feeds, then run the query.

This means this ticket should not be merged into master until the DB Migration Admin Page (#19) is created. Until then, if you are having this issue, feel free to make the code change manually in Lifepress.php and, after fetching new items, run the above query to remove the duplicates.

@ghost ghost assigned MitchellMcKenna Jan 25, 2013
@therourke
Author

Good stuff!
