-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/migrate images #265
Changes from 50 commits
9c83f35
4032485
8101d52
baf2fb5
282b22d
1c67449
37323a6
ef0d31e
e8baf1d
fb6bb36
6722c04
b79d68b
83f84f3
fe04bc5
e7f85c0
2b466a5
4037eb7
722d2e2
2788225
446ffc3
60d5763
52a33f5
5af7481
c80d02d
3947ff5
026da5c
723dc2b
3883d3a
c990005
00ff73a
9b33d29
7ddbf9e
ce2cfdd
60baf76
dcb6068
ba4d226
a91713c
8086c62
0de07ab
a807447
2ec73c2
276c9f5
dfbd135
11d675f
ab0a119
55ce56f
7210971
7d7f957
576a2e3
4328ebc
71e357c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,23 @@ | ||
"""Convert mezzanine-based pages to wagtail page models.""" | ||
|
||
import filecmp | ||
import glob | ||
import json | ||
import os | ||
import os.path | ||
import shutil | ||
from collections import defaultdict | ||
|
||
from cdhweb.pages.models import ContentPage, HomePage, LandingPage | ||
from django.conf import settings | ||
from django.core.files.images import ImageFile | ||
from django.core.management.base import BaseCommand | ||
from django.db.models import Q | ||
from mezzanine.core.models import CONTENT_STATUS_PUBLISHED | ||
from mezzanine.pages import models as mezz_page_models | ||
from wagtail.core.blocks import RichTextBlock | ||
from wagtail.core.models import Page, Site | ||
from wagtail.core.models import Page, Site, Collection, get_root_collection_id | ||
from wagtail.images.models import Image | ||
|
||
from cdhweb.pages.models import ContentPage, HomePage, LandingPage | ||
|
||
|
||
class Command(BaseCommand): | ||
|
@@ -41,6 +50,7 @@ def create_landingpage(self, page): | |
return LandingPage( | ||
title=page.title, | ||
tagline=page.landingpage.tagline, # landing pages have a tagline | ||
header_image=self.get_wagtail_image(page.landingpage.image), | ||
slug=self.convert_slug(page.slug), | ||
seo_title=page._meta_title or page.title, | ||
body=json.dumps([{ | ||
|
@@ -81,6 +91,9 @@ def handle(self, *args, **options): | |
Page.objects.filter(depth__gt=2).delete() | ||
# PageRevision.objects.all().delete() | ||
|
||
# convert media images to wagtail images | ||
self.image_exodus() | ||
|
||
# create the new homepage | ||
old_homepage = mezz_page_models.Page.objects.get(slug="/") | ||
homepage = self.create_homepage(old_homepage) | ||
|
@@ -138,7 +151,7 @@ def handle(self, *args, **options): | |
.filter(Q(slug__startswith="events/") | Q(slug="year-of-data")) | ||
for page in event_pages: | ||
self.migrate_pages(page, events) | ||
|
||
if projects: | ||
# - migrate project pages but specify new projects list page as parent | ||
# - process about page last so project pages don't nest | ||
|
@@ -198,7 +211,6 @@ def migrate_pages(self, page, parent): | |
# recursively create and add new versions of all this page's children | ||
for child in page.children.all(): | ||
self.migrate_pages(child, new_page) | ||
|
||
|
||
def form_pages(self): | ||
# migrate embedded google forms from mezzanine templates | ||
|
@@ -219,3 +231,124 @@ def form_pages(self): | |
{"type": "migrated", "value": '<iframe title="Cosponsorship Request Form" height="3250" src="https://docs.google.com/forms/d/e/1FAIpQLSeP40DBM7n8GYgW_i99nRxY5T5P39DrIWyIwq9LggIwu4r5jQ/viewform?embedded=true">Loading...</iframe>'} | ||
]) | ||
cosponsor.save() | ||
|
||
# cached collections used for migrated media | ||
collections = { | ||
# get root collection so we can add children to it | ||
'root': Collection.objects.get(pk=get_root_collection_id()) | ||
} | ||
|
||
def get_collection(self, name): | ||
# if we don't already have this collection, get it | ||
if name not in self.collections: | ||
# try to get it if it already exists | ||
coll = Collection.objects.filter(name=name).first() | ||
# otherwise, create it | ||
if not coll: | ||
coll = Collection(name=name) | ||
self.collections['root'].add_child(instance=coll) | ||
self.collections['root'].save() | ||
|
||
self.collections[name] = coll | ||
|
||
return self.collections[name] | ||
|
||
def image_exodus(self): | ||
# generate wagtail images for all uploaded media | ||
|
||
# mezzanine/filebrowser_safe doesn't seem to have useful objects | ||
# or track metadata, so just import from the filesystem | ||
|
||
# delete all images prior to run (clear out past migration attempts) | ||
Image.objects.all().delete() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should we also do this for collections? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe? I have some extraneous collections right now that I think the current version of the script wouldn't create but older versions did; but once the script is stable I don't think it matters or gets us much. (I'm open to it, LMK) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think for consistency let's do it. gives me peace of mind to know that everything is being wiped, haha |
||
# also delete any wagtail image files, since they are not deleted | ||
# by removing the objects | ||
shutil.rmtree('%s/images' % settings.MEDIA_ROOT, ignore_errors=True) | ||
shutil.rmtree('%s/original_images' % settings.MEDIA_ROOT, ignore_errors=True) | ||
|
||
# get media filenames to migrate, with duplicates filtered out | ||
media_filenames = self.get_media_files() | ||
|
||
for imgpath in media_filenames: | ||
extension = os.path.splitext(imgpath)[1] | ||
# skip unsupported files based on file extension | ||
# NOTE: leaving this here in case we want to handle | ||
# documents the same way | ||
if extension in ['.pdf', '.svg', '.docx']: | ||
continue | ||
Comment on lines
+278
to
+281
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, this is handy - thx for the note There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah — thought about filtering them out when I filter out the duplicates, but wasn't sure yet how we'll handle the files |
||
|
||
# if image is in a subdirectory under uploads (e.g. projects, blog) | ||
# add it to a collection with that name | ||
relative_path = os.path.dirname(imgpath) \ | ||
.replace('%s/uploads/' % settings.MEDIA_ROOT, '') | ||
|
||
# there are two variants of Slavic DH, one with and one | ||
# without a space; remove the space so they'll be in one collection | ||
basedir = relative_path.split('/')[0].replace(' ', '') | ||
collection = None | ||
if basedir: | ||
collection = self.get_collection(basedir) | ||
|
||
with open(imgpath, 'rb') as imgfilehandle: | ||
title = os.path.basename(imgpath) | ||
# passing collection=None errors, so | ||
# only specify collection option when we have one | ||
extra_opts = {} | ||
if collection: | ||
extra_opts['collection'] = collection | ||
try: | ||
Image.objects.create( | ||
title=title, | ||
file=ImageFile(imgfilehandle, name=title), | ||
**extra_opts) | ||
except Exception as err: | ||
# seems to mean that height/width calculation failed | ||
# (usually non-images) | ||
print('Error creating image for %s: %s' % (imgpath, err)) | ||
|
||
def get_media_files(self): | ||
# wagtail images support: GIF, JPEG, PNG, WEBP | ||
imgfile_path = '%s/**/*.*' % settings.MEDIA_ROOT | ||
# get filenames for all uploaded files | ||
filenames = glob.glob(imgfile_path, recursive=True) | ||
# aggregate files by basename to identify files with the same | ||
# name in different locations | ||
unique_filenames = defaultdict(list) | ||
for path in filenames: | ||
unique_filenames[os.path.basename(path)].append(path) | ||
|
||
# check files with the same name in multiple locations | ||
for key, val in unique_filenames.items(): | ||
if len(val) > 1: | ||
samefile = filecmp.cmp(val[0], val[1], shallow=False) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this only compares the first two, right? |
||
# if the files are the same | ||
if samefile: | ||
# keep the first one and remove the others from the | ||
# list of files to be migrated | ||
extra_copies = val[1:] | ||
|
||
# if not all the same, identify the largest | ||
# (all are variant/cropped versions of the same image) | ||
else: | ||
largest_file = None | ||
largest_size = 0 | ||
for filepath in val: | ||
size = os.stat(filepath).st_size | ||
if size > largest_size: | ||
largest_size = size | ||
largest_file = filepath | ||
|
||
extra_copies = [f for f in val if f != largest_file] | ||
Comment on lines
+333
to
+344
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did we ever end up with situations where there are totally unrelated images with the same name (e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, nvm - I see your report on the issue that confirms this isn't a problem. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right — here's the list #248 (comment) |
||
|
||
# remove duplicate and variant images that | ||
# will not be imported into wagtail | ||
for extra_copy in extra_copies: | ||
filenames.remove(extra_copy) | ||
|
||
return filenames | ||
|
||
def get_wagtail_image(self, image): | ||
# get the migrated wagtail image for a foreign-key image | ||
# using image file basename, which is migrated as image title | ||
return Image.objects.get(title=os.path.basename(image.name)) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
{% extends 'base.html' %} | ||
{% load wagtailcore_tags %} | ||
{% load wagtailcore_tags wagtailimages_tags %} | ||
|
||
{% block page-title %}{{ page.title }}{% endblock %} | ||
|
||
|
@@ -12,7 +12,15 @@ | |
{% endblock %} | ||
|
||
{% block content-header %} | ||
<header {% if page.header_image %}style="background-image:url('{{ MEDIA_URL }}{{ page.header_image }}')" | ||
{% comment %} | ||
image asset guidelines for landing page | ||
L,M @1x W: 2560px H: 680px, @2x W: 5120px H: 1360px | ||
S @1x W: 736px H: 320px, @2x W: 1472px H: 640px | ||
Comment on lines
+17
to
+18
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fwiw the L/M 2x seems huge; I don't think we ever want to be serving stuff larger than 2500px. but fine for now; we can always revisit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are Xinyi's guidelines, maybe they need to be revisited. I agree it's huge (and I think what we've always been serving, even for mobile — although I guess before we just served out whatever we had; will wagtail make images larger?). This is why I asked you about the background image and different sizes. Do you want to make a new issue for addressing this and add it to 3.1 ? It would be a good improvement to the site. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. created #266 and added to 3.1 |
||
{% endcomment %} | ||
{% if page.header_image %} | ||
{% image page.header_image fill-5120x1360 as header_img %} | ||
{% endif %} | ||
<header {% if page.header_image %}style="background-image:url('{{ header_img.url }}')" | ||
{% else %}class="no-background"{% endif %}> | ||
<div> | ||
<a href="{% url 'home' %}" class="home">/</a><h1>{{ page.title }}</h1> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be a
get_or_create()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I tried that — doesn't work because it has to be added to the parent to set the hierarchy properly