This repository has been archived by the owner on Oct 4, 2024. It is now read-only.

Merge pull request #61 from gilesknap/filesystem-search
Fix shared folders and improve comparison
gilesknap authored Mar 4, 2019
2 parents 37bd4aa + 1ce7f10 commit 3ac5798
Showing 10 changed files with 154 additions and 84 deletions.
6 changes: 3 additions & 3 deletions Pipfile.lock

Some generated files are not rendered by default.

73 changes: 37 additions & 36 deletions README.rst
@@ -7,16 +7,6 @@
Google Photos Sync
==================

Version 2.0 Major Upgrade
==============================
Google has released a new Google Photos API and this project is now based on that API. The myriad issues with the
previous approach using Drive API and Picasa API are now resolved. However, see new known issues below.

In addition to this, the new code uses parallel processing to speed up downloads considerably.

Description
===========

Google Photos Sync downloads your Google Photos to the local file system. It will back up all the photos the
user uploaded to Google Photos, but also the album information and additional Google Photos 'Creations' (animations, panoramas,
@@ -27,62 +17,73 @@ After doing a full sync you will have 2 directories off of the specified root:
* **photos** - contains all photos and videos from your Google Photos Library organized into folders with the
structure 'photos/YYYY/MM' where 'YYYY/MM' is the date the photo/video was taken. The filenames within a folder
will be as per the original upload except that duplicate names will have a suffix ' (n)' where n is the duplicate number
of the file (this matched the approach used in the official Google tool for Windows).
of the file (this matches the approach used in the official Google tool for Windows).

* **albums** - contains a folder hierarchy representing the set of albums and shared albums in your library. All
the files are symlinks to content in one of the other folders. The folder names will be
'albums/YYYY/MM Original Album Name'.

In the root folder a sqlite database holds an index of all media and albums. It is useful for finding out about the
state of your photo store; you can open it with the sqlite3 tool and run any SQL queries.
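As a minimal sketch of inspecting that index from Python instead of the sqlite3 CLI (the database filename here is an assumption for illustration; use whatever file sits in your root folder):

```python
import sqlite3

# Open the index database that gphotos-sync keeps in the sync root folder
# (the filename is an assumption for illustration).
conn = sqlite3.connect("gphotos.sqlite")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # lists the index tables kept by gphotos-sync
conn.close()
```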
In addition there will be further folders when using the --compare-folder option. The option is used to compare
the contents of your library with a local folder such as a previous backup. The comparison does not require
that the files are arranged in the same folders; it uses metadata in the files, such as create date and
exif UID, to match pairs of items. The additional folders after a comparison will be:

This has been tested against my photo store of nearly 100,000 photos.
* **comparison** - a new folder off of the specified root containing the following:

* **missing_files** - contains symlinks to the files in the comparison folder that were not found in the Google
Photos Library. The folder structure is the same as that in the comparison folder. These are the
files that you would upload to Google Photos via the Web interface to restore from backup.

Currently Download Only
-----------------------
``gphotos-sync`` currently does not have upload features. I do intend to provide an upload facility so that it would
be possible to download your library and upload it to another account, or to upload new photos. Full two-way
synchronization is a much bigger challenge and at present I've not come up with a robust enough approach
for this. UPDATE: there are a couple of limitations on the API that will stop me from implementing upload until they are
addressed: (1) all uploads count against quota - Google probably won't address this; (2) you can only add media to
albums at upload time, not rearrange existing media into albums.
* **extra_files** - contains symlinks to the files in the photos folder which appear in the Library but not in the
comparison folder. The folder structure is the same as the photos folder.

* **duplicates** - contains symlinks to any duplicate files found in the comparison folder. This is a flat structure
and the symlink filenames have a numeric prefix to make them unique and group the duplicates together.
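A hypothetical sketch of the pairing rule described above - items are matched on metadata (exif UID plus create date), never on folder location. All names and sample data here are invented for illustration:

```python
# Hypothetical sketch: pair library and backup items on (uid, create_date).
def match_items(library, backup):
    """Given lists of (uid, create_date, path), return (matched, missing, extra)."""
    lib_index = {(uid, date): path for uid, date, path in library}
    backup_keys = {(uid, date) for uid, date, _ in backup}

    matched, missing = [], []
    for uid, date, path in backup:
        if (uid, date) in lib_index:
            matched.append((path, lib_index[(uid, date)]))
        else:
            missing.append(path)   # would go under comparison/missing_files
    extra = [path for uid, date, path in library
             if (uid, date) not in backup_keys]  # comparison/extra_files
    return matched, missing, extra

library = [("uid1", "2018-01-01", "photos/2018/01/a.jpg"),
           ("uid2", "2018-02-01", "photos/2018/02/b.jpg")]
backup = [("uid1", "2018-01-01", "backup/a.jpg"),
          ("uid3", "2017-05-01", "backup/old.jpg")]
matched, missing, extra = match_items(library, backup)
# missing == ["backup/old.jpg"]; extra == ["photos/2018/02/b.jpg"]
```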

Primary Goals
-------------
* Provide a file system backup so it is easy to monitor for accidental deletions (or deletions caused by bugs)
in very large photo collections.

* Make it feasible to switch to a different photo management system in future if this ever becomes desirable/necessary.
NOTES:

* Provide a comparison function so that your current Photos library can be verified against a historical backup.
* the comparison code uses an external tool, 'ffprobe'. gphotos-sync will run without it, but it will then be unable to
extract metadata from video files and will revert to relying on Google Photos metadata and the file modified date (this is
a much less reliable way to match video files, but the results should be OK if the backup folder
was originally created using gphotos-sync).
* If the library contains two separate items that have the same exif UID, this will show up as one
pair of duplicates, and one of the pair will also appear in the extra_files list.
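A hedged sketch of that ffprobe fallback (the function name is my own, not gphotos-sync's; the ffprobe flags shown are real options):

```python
import json
import shutil
import subprocess

def video_create_date(path):
    """Return a video's creation_time tag via ffprobe, or None without it.

    Mirrors the fallback described above: when ffprobe is absent (or yields
    nothing), the caller must fall back to Google Photos metadata and the
    file modified date.
    """
    if shutil.which("ffprobe") is None:
        return None  # ffprobe not installed: use the less reliable fallback
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", str(path)],
        capture_output=True, text=True)
    info = json.loads(result.stdout or "{}")
    return info.get("format", {}).get("tags", {}).get("creation_time")

print(video_create_date("holiday.mp4"))  # a creation_time string, or None
```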

Known Issues
------------
A few outstanding limitations of the Google API restrict what can be achieved. All these issues have been reported
to Google and this project will be updated once they are resolved.

* There is no way to discover the modified date of library media items. Currently ``gphotos-sync`` will refresh your local
copy with any new photos added since the last scan but will not update any photos that have been modified in Google
Photos. A feature request has been submitted to Google; see https://issuetracker.google.com/issues/122737849.
* Some types of video will not download using the new API. This is mostly restricted to old video file formats (in
my library it is a subset of videos shot before 2010). Google is looking at this problem; see
https://issuetracker.google.com/issues/116842164.
* The API strips GPS data from images; see https://issuetracker.google.com/issues/80379228.
* Video download transcodes the videos even if you ask for the original file (=vd parameter) see https://issuetracker.google.com/issues/80149160. My experience is that the result is indistinguishable visually but it is a smaller file with approximately 60% bitrate (same resolution).

* Video download transcodes the videos even if you ask for the original file (=vd parameter); see
https://issuetracker.google.com/issues/80149160. My experience is that the result looks similar to the original
but the compression is more clearly visible. It is a smaller file with approximately 60% of the bitrate (same resolution).


Install and configure
---------------------
To install latest published version from PyPi, simply::
To install the latest published version from PyPi, simply::

pipenv install gphotos-sync

Or if you don't want to use pipenv::

pip install gphotos-sync
sudo pip install gphotos-sync

To work from the source code, clone the git repository and run setup.py from the source
directory. (if required use a virtualenv) ::
To work from the source code, clone the git repository and use pipenv to create a virtual environment and run
the code. (If you don't have pipenv, I recommend getting it - but you can use
'sudo python setup.py install' instead.) ::

git clone https://github.com/gilesknap/gphotos-sync.git
cd gphotos-sync
sudo python3 setup.py install
pipenv install .
pipenv run gphotos-sync

In order to work, ``gphotos-sync`` first needs a valid client id linked to a project
authorized to use the 'Photos Library API'. It is not provided in the distribution. Each client id
52 changes: 40 additions & 12 deletions gphotos/GoogleAlbumsSync.py
@@ -2,14 +2,15 @@
# coding: utf8
import shutil
from datetime import datetime
from typing import Dict
from typing import Dict, Callable
from pathlib import Path
import os.path

from . import Utils
from .GoogleAlbumMedia import GoogleAlbumMedia
from .GooglePhotosMedia import GooglePhotosMedia
from .GoogleAlbumsRow import GoogleAlbumsRow
from .GooglePhotosRow import GooglePhotosRow
from .LocalData import LocalData
from .restclient import RestClient
import logging
@@ -31,6 +32,7 @@ def __init__(self, api: RestClient, root_folder: Path, db: LocalData,
"""
self._root_folder: Path = root_folder
self._links_root = self._root_folder / 'albums'
self._photos_root = self._root_folder / 'photos'
self._db: LocalData = db
self._api: RestClient = api
self.flush = flush
@@ -45,7 +47,8 @@ def make_search_parameters(cls, album_id: str,
}
return body

def fetch_album_contents(self, album_id: str) -> (datetime, datetime):
def fetch_album_contents(self, album_id: str,
add_media_items: bool) -> (datetime, datetime):
first_date = Utils.maximum_date()
last_date = Utils.minimum_date()
body = self.make_search_parameters(album_id=album_id)
@@ -62,6 +65,14 @@ def fetch_album_contents(self, album_id: str) -> (datetime, datetime):
self._db.put_album_file(album_id, media_item.id)
last_date = max(media_item.create_date, last_date)
first_date = min(media_item.create_date, first_date)
# this adds other users' photos from shared albums
log.debug('Adding album media item %s %s %s',
media_item.relative_path, media_item.filename,
media_item.duplicate_number)
if add_media_items:
media_item.set_path_by_date(self._photos_root)
self._db.put_row(
GooglePhotosRow.from_media(media_item), False)
next_page = items_json.get('nextPageToken')
if next_page:
body = self.make_search_parameters(album_id=album_id,
@@ -72,34 +83,47 @@ def fetch_album_contents(self, album_id: str,
return first_date, last_date
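The paging loop in `fetch_album_contents` follows the standard Google Photos search pattern: keep re-issuing the search with the returned `nextPageToken` until none comes back. A self-contained sketch with a faked two-page response (the function and callback names are illustrative, not the project's API):

```python
# Sketch of the nextPageToken paging pattern; `search` stands in for one
# mediaItems.search call returning a decoded JSON page.
def iter_album_items(search, album_id, page_size=100):
    body = {"pageSize": page_size, "albumId": album_id}
    while True:
        page = search(body)
        yield from page.get("mediaItems", [])
        token = page.get("nextPageToken")
        if not token:
            return  # last page reached
        body = {**body, "pageToken": token}

# Fake two-page response for illustration.
pages = [{"mediaItems": [{"id": "a"}], "nextPageToken": "t"},
         {"mediaItems": [{"id": "b"}]}]
calls = iter(pages)
items = list(iter_album_items(lambda body: next(calls), "album-1"))
# items == [{"id": "a"}, {"id": "b"}]
```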

def index_album_media(self):
self.index_albums_type(self._api.sharedAlbums.list.execute,
'sharedAlbums', "Shared (titled) Albums",
False, True)
self.index_albums_type(self._api.albums.list.execute,
'albums', "Albums", True, False)

def index_albums_type(self, api_function: Callable, item_key: str,
description: str, allow_null_title: bool,
add_media_items: bool):
"""
query google photos interface for a list of all albums and index their
contents into the db
"""
log.warning('Indexing Albums ...')
log.warning('Indexing {} ...'.format(description))

# there are no filters in album listing at present so it is always a
# full rescan - it's quite quick

count = 0
response = self._api.albums.list.execute(pageSize=50)
response = api_function(pageSize=50)
while response:
results = response.json()
for album_json in results['albums']:
for album_json in results.get(item_key, []):
count += 1

album = GoogleAlbumMedia(album_json)
indexed_album = self._db.get_album(album_id=album.id)
already_indexed = indexed_album.size == album.size if \
indexed_album else False

if already_indexed:
if not allow_null_title and album.description == 'none':
log.debug('Skipping no-title album, photos: %d',
album.size)
elif already_indexed and not self.flush:
log.debug('Skipping Album: %s, photos: %d', album.filename,
album.size)
else:
log.info('Indexing Album: %s, photos: %d', album.filename,
album.size)
first_date, last_date = self.fetch_album_contents(album.id)
first_date, last_date = self.fetch_album_contents(
album.id, add_media_items)
# write the album data down now that we know the contents'
# date range
gar = GoogleAlbumsRow.from_parm(
@@ -119,11 +143,11 @@ def index_album_media(self):

next_page = results.get('nextPageToken')
if next_page:
response = self._api.albums.list.execute(pageSize=50,
pageToken=next_page)
response = api_function(pageSize=50,
pageToken=next_page)
else:
break
log.warning('Indexed %d Albums', count)
log.warning('Indexed %d %s', count, description)

def album_folder_name(self, album_name: str, end_date: datetime) -> Path:
year = Utils.safe_str_time(end_date, '%Y')
@@ -143,7 +167,7 @@ def create_album_content_links(self):
shutil.rmtree(self._links_root)
re_download = not self._links_root.exists()

for (path, file_name, album_name, end_date_str, rid) in \
for (path, file_name, album_name, end_date_str, rid, created) in \
self._db.get_album_files(download_again=re_download):
if current_rid == rid:
album_item += 1
@@ -168,10 +192,14 @@ def create_album_content_links(self):
log.debug('new album folder %s', link_folder)
link_folder.mkdir(parents=True)

created_date = Utils.string_to_date(created)
link_file.symlink_to(relative_filename)
os.utime(str(link_file),
(Utils.safe_timestamp(created_date),
Utils.safe_timestamp(created_date)),
follow_symlinks=False)
count += 1
except FileExistsError:
log.error('bad link to %s', full_file_name)

log.warning("Created %d new album folder links", count)
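The new lines above create the album symlink and then stamp the link itself (note `follow_symlinks=False`) with the item's create date. A minimal self-contained sketch of that step, using temporary paths and an assumed create date:

```python
import os
import tempfile
from pathlib import Path

# Sketch: symlink the album entry to the real photo, then set the link's
# own timestamps to the photo's create date. Paths and timestamp invented.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    photo = root / "photos" / "2019" / "03" / "img.jpg"
    photo.parent.mkdir(parents=True)
    photo.write_bytes(b"fake-jpeg")

    link = root / "albums" / "2019" / "03 Holiday" / "img.jpg"
    link.parent.mkdir(parents=True)
    link.symlink_to(os.path.relpath(photo, link.parent))

    create_ts = 1551657600  # an assumed create date (2019-03-04)
    if os.utime in os.supports_follow_symlinks:
        # stamp the symlink itself, not its target
        os.utime(link, (create_ts, create_ts), follow_symlinks=False)

    ok = link.resolve() == photo.resolve()
```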

1 change: 0 additions & 1 deletion gphotos/GooglePhotosDownload.py
Original file line number Diff line number Diff line change
@@ -223,7 +223,6 @@ def do_download_file(self, base_url: str, media_item: DatabaseMedia):
temp_file.close()
response.close()
t_path.rename(local_full_path)
# todo is there a path lib equivalent
os.utime(str(local_full_path),
(Utils.safe_timestamp(media_item.modify_date),
Utils.safe_timestamp(media_item.create_date)))
21 changes: 10 additions & 11 deletions gphotos/LocalData.py
@@ -129,8 +129,13 @@ def put_row(self, row: DbRow, update=False):
query = "UPDATE {0} Set {1} WHERE RemoteId = '{2}'".format(
row.table, row.update, row.RemoteId)
else:
query = "INSERT INTO {0} ({1}) VALUES ({2})".format(
row.table, row.columns, row.params)
# EXISTS - allows for no action when trying to re-insert
# noinspection PyUnresolvedReferences
query = \
"INSERT INTO {0} ({1}) SELECT {2} " \
"WHERE NOT EXISTS (SELECT * FROM SyncFiles " \
"WHERE RemoteId = '{3}')".format(
row.table, row.columns, row.params, row.RemoteId)
self.cur.execute(query, row.dict)
row_id = self.cur.lastrowid
except lite.IntegrityError:
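The new EXISTS guard above makes re-inserting an already-indexed RemoteId a no-op instead of an error. That can be demonstrated in isolation with an in-memory table mimicking SyncFiles (named parameters are used here instead of the project's `row.params`, purely for illustration):

```python
import sqlite3

# Minimal demo of "INSERT ... SELECT ... WHERE NOT EXISTS" on a table that
# mimics SyncFiles; the second insert for the same RemoteId does nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SyncFiles (RemoteId TEXT, FileName TEXT)")

def put_once(remote_id, name):
    conn.execute(
        "INSERT INTO SyncFiles (RemoteId, FileName) "
        "SELECT :rid, :name "
        "WHERE NOT EXISTS (SELECT * FROM SyncFiles WHERE RemoteId = :rid)",
        {"rid": remote_id, "name": name})

put_once("r1", "a.jpg")
put_once("r1", "a.jpg")   # re-insert is silently skipped, not an error
count = conn.execute("SELECT COUNT(*) FROM SyncFiles").fetchone()[0]
# count == 1
```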
@@ -274,7 +279,7 @@ def put_album_downloaded(self, album_id: str, downloaded: bool = True):
"WHERE RemoteId IS ?;", (downloaded, album_id))

def get_album_files(self, album_id: str = '%', download_again: bool = False
) -> (str, str, str, str):
) -> (str, str, str, str, str):
""" Join the Albums, SyncFiles and AlbumFiles tables to get a list
of the files in an album or all albums.
Parameters
@@ -289,12 +294,12 @@ def get_album_files(self, album_id: str = '%', download_again: bool = False

query = """
SELECT SyncFiles.Path, SyncFiles.Filename, Albums.AlbumName,
Albums.EndDate, Albums.RemoteId FROM AlbumFiles
Albums.EndDate, Albums.RemoteId, SyncFiles.CreateDate FROM AlbumFiles
INNER JOIN SyncFiles ON AlbumFiles.DriveRec=SyncFiles.RemoteId
INNER JOIN Albums ON AlbumFiles.AlbumRec=Albums.RemoteId
WHERE Albums.RemoteId LIKE ?
{}
ORDER BY AlbumName, SyncFiles.CreateDate;""".format(extra_clauses)
ORDER BY Albums.RemoteId, SyncFiles.CreateDate;""".format(extra_clauses)

self.cur.execute(query, (album_id,))
results = self.cur.fetchall()
@@ -311,12 +316,10 @@ def put_album_file(self, album_rec: str, file_rec: str):
"?) ;",
(album_rec, file_rec))


def remove_all_album_files(self):
# noinspection SqlWithoutWhere
self.cur.execute("DELETE FROM AlbumFiles")


# ---- LocalFiles Queries -------------------------------------------

def get_missing_paths(self):
@@ -330,7 +333,6 @@ def get_missing_paths(self):
pth = Path(r.relative_path.parent / r.filename)
yield pth


def get_duplicates(self):
self.cur2.execute(Queries.duplicate_files)
while True:
@@ -342,7 +344,6 @@ def get_duplicates(self):
pth = r.relative_path.parent / r.filename
yield r.id, pth


def get_extra_paths(self):
self.cur2.execute(Queries.extra_files)
while True:
@@ -354,15 +355,13 @@ def get_extra_paths(self):
pth = r.relative_path.parent / r.filename
yield pth


def local_exists(self, file_name: str, path: str):
self.cur.execute(
"SELECT COUNT() FROM main.LocalFiles WHERE FileName = ?"
"AND PATH = ?;", (file_name, path))
result = int(self.cur.fetchone()[0])
return result


def find_local_matches(self):
# noinspection SqlWithoutWhere
for q in Queries.match:
