Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/migrate images #265

Merged
merged 51 commits into from
Dec 22, 2020
Merged
Show file tree
Hide file tree
Changes from 50 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
9c83f35
begin adding page models
thatbudakguy Jan 7, 2020
4032485
begin homepage migration testing
thatbudakguy Jan 7, 2020
8101d52
fix tests for homepage creation migration
thatbudakguy Jan 8, 2020
baf2fb5
suppress pytest warning about junit
thatbudakguy Jan 8, 2020
282b22d
unregister wagtail models from django admin
thatbudakguy Jan 8, 2020
1c67449
move migration utilities to module
thatbudakguy Jan 8, 2020
37323a6
add blockquote feature to draftail editor
thatbudakguy Jan 8, 2020
ef0d31e
update migration tests
thatbudakguy Jan 8, 2020
e8baf1d
basic page model tests
thatbudakguy Jan 8, 2020
fb6bb36
make tests module to fix pytest naming
thatbudakguy Jan 8, 2020
6722c04
remove unused views.py
thatbudakguy Jan 8, 2020
b79d68b
update pages models
thatbudakguy Jan 8, 2020
83f84f3
remove LinkableSectionBlock from tests
thatbudakguy Jan 8, 2020
fe04bc5
update homepage template
thatbudakguy Jan 8, 2020
e7f85c0
fixup initial pages migration
thatbudakguy Jan 8, 2020
2b466a5
update app label to fix NoReverseMatch issue
thatbudakguy Jan 8, 2020
4037eb7
add sample pages fixture
thatbudakguy Jan 8, 2020
722d2e2
update page model tests
thatbudakguy Jan 8, 2020
2788225
add site changes to homepage migration
thatbudakguy Jan 8, 2020
446ffc3
minor edits
thatbudakguy Jan 8, 2020
60d5763
update migration utils and tests
thatbudakguy Jan 9, 2020
52a33f5
PEP8 comment style cleanup
thatbudakguy Jan 9, 2020
5af7481
implement create_revision, add logging to migration utils
thatbudakguy Jan 9, 2020
c80d02d
add wagtail site to sample pages fixture
thatbudakguy Jan 9, 2020
3947ff5
fix homepage creation migration test
thatbudakguy Jan 9, 2020
026da5c
add page_to_dict helper and update create_revision
thatbudakguy Jan 9, 2020
723dc2b
improve migration_utils and tests
thatbudakguy Jan 23, 2020
3883d3a
add utils for converting btw. html and StreamField
thatbudakguy Jan 28, 2020
c990005
Proof-of-concept exodus script
thatbudakguy Dec 15, 2020
00ff73a
Add homepage exodus
thatbudakguy Dec 15, 2020
9b33d29
Make exodus script idempotent
kmcelwee Dec 16, 2020
7ddbf9e
Restructure exodus command; fix slug conversion
thatbudakguy Dec 16, 2020
ce2cfdd
Update exodus script to use BFS traversal
thatbudakguy Dec 17, 2020
60baf76
Remove visited check for exodus script
thatbudakguy Dec 17, 2020
dcb6068
Add handling for project/event child pages
thatbudakguy Dec 17, 2020
ba4d226
Recursive version of exodus script
rlskoeser Dec 17, 2020
a91713c
Fix project page double-nesting
rlskoeser Dec 17, 2020
8086c62
Convert all pages/links to generic wagtail page; don't migrate home 2x
rlskoeser Dec 17, 2020
0de07ab
Placeholder for consult/cosponsor form logic
rlskoeser Dec 17, 2020
a807447
Create "migration" block and alter allowed tags
thatbudakguy Dec 18, 2020
2ec73c2
Update exodus script to use new migrated field and transfer form iframes
thatbudakguy Dec 18, 2020
276c9f5
Add basic content page template
thatbudakguy Dec 18, 2020
dfbd135
Update home page template
thatbudakguy Dec 18, 2020
11d675f
Update landing page template
thatbudakguy Dec 18, 2020
ab0a119
Generalize iframe css, shift form height into attributes
thatbudakguy Dec 18, 2020
55ce56f
Image exodus: convert media uploads to wagtail images #248
rlskoeser Dec 18, 2020
7210971
Update landing page creation to include wagtail header image
rlskoeser Dec 18, 2020
7d7f957
Use wagtail image tags to generate landing page header background image
rlskoeser Dec 18, 2020
576a2e3
Merge branch 'develop' into feature/migrate-images
rlskoeser Dec 21, 2020
4328ebc
Remove old files not removed by merge from develop
rlskoeser Dec 21, 2020
71e357c
Clear out collections (except root collection) when running exodus
rlskoeser Dec 22, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 138 additions & 5 deletions cdhweb/pages/management/commands/exodus.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,23 @@
"""Convert mezzanine-based pages to wagtail page models."""

import filecmp
import glob
import json
import os
import os.path
import shutil
from collections import defaultdict

from cdhweb.pages.models import ContentPage, HomePage, LandingPage
from django.conf import settings
from django.core.files.images import ImageFile
from django.core.management.base import BaseCommand
from django.db.models import Q
from mezzanine.core.models import CONTENT_STATUS_PUBLISHED
from mezzanine.pages import models as mezz_page_models
from wagtail.core.blocks import RichTextBlock
from wagtail.core.models import Page, Site
from wagtail.core.models import Page, Site, Collection, get_root_collection_id
from wagtail.images.models import Image

from cdhweb.pages.models import ContentPage, HomePage, LandingPage


class Command(BaseCommand):
Expand Down Expand Up @@ -41,6 +50,7 @@ def create_landingpage(self, page):
return LandingPage(
title=page.title,
tagline=page.landingpage.tagline, # landing pages have a tagline
header_image=self.get_wagtail_image(page.landingpage.image),
slug=self.convert_slug(page.slug),
seo_title=page._meta_title or page.title,
body=json.dumps([{
Expand Down Expand Up @@ -81,6 +91,9 @@ def handle(self, *args, **options):
Page.objects.filter(depth__gt=2).delete()
# PageRevision.objects.all().delete()

# convert media images to wagtail images
self.image_exodus()

# create the new homepage
old_homepage = mezz_page_models.Page.objects.get(slug="/")
homepage = self.create_homepage(old_homepage)
Expand Down Expand Up @@ -138,7 +151,7 @@ def handle(self, *args, **options):
.filter(Q(slug__startswith="events/") | Q(slug="year-of-data"))
for page in event_pages:
self.migrate_pages(page, events)

if projects:
# - migrate project pages but specify new projects list page as parent
# - process about page last so project pages don't nest
Expand Down Expand Up @@ -198,7 +211,6 @@ def migrate_pages(self, page, parent):
# recursively create and add new versions of all this page's children
for child in page.children.all():
self.migrate_pages(child, new_page)


def form_pages(self):
# migrate embedded google forms from mezzanine templates
Expand All @@ -219,3 +231,124 @@ def form_pages(self):
{"type": "migrated", "value": '<iframe title="Cosponsorship Request Form" height="3250" src="https://docs.google.com/forms/d/e/1FAIpQLSeP40DBM7n8GYgW_i99nRxY5T5P39DrIWyIwq9LggIwu4r5jQ/viewform?embedded=true">Loading...</iframe>'}
])
cosponsor.save()

# cached collections used for migrated media
collections = {
# get root collection so we can add children to it
'root': Collection.objects.get(pk=get_root_collection_id())
}

def get_collection(self, name):
# if we don't already have this collection, get it
if name not in self.collections:
# try to get it if it already exists
coll = Collection.objects.filter(name=name).first()
# otherwise, create it
if not coll:
coll = Collection(name=name)
Comment on lines +245 to +248
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be a get_or_create()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I tried that — doesn't work because it has to be added to the parent to set the hierarchy properly

self.collections['root'].add_child(instance=coll)
self.collections['root'].save()

self.collections[name] = coll

return self.collections[name]

def image_exodus(self):
# generate wagtail images for all uploaded media

# mezzanine/filebrowser_safe doesn't seem to have useful objects
# or track metadata, so just import from the filesystem

# delete all images prior to run (clear out past migration attempts)
Image.objects.all().delete()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also do this for collections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe? I have some extraneous collections right now that I think the current version of the script wouldn't create but older versions did; but once the script is stable I don't think it matters or gets us much. (I'm open to it, LMK)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think for consistency let's do it. gives me peace of mind to know that everything is being wiped, haha

# also delete any wagtail image files, since they are not deleted
# by removing the objects
shutil.rmtree('%s/images' % settings.MEDIA_ROOT, ignore_errors=True)
shutil.rmtree('%s/original_images' % settings.MEDIA_ROOT, ignore_errors=True)

# get media filenames to migrate, with duplicates filtered out
media_filenames = self.get_media_files()

for imgpath in media_filenames:
extension = os.path.splitext(imgpath)[1]
# skip unsupported files based on file extension
# NOTE: leaving this here in case we want to handle
# documents the same way
if extension in ['.pdf', '.svg', '.docx']:
continue
Comment on lines +278 to +281
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this is handy - thx for the note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah — thought about filtering them out when I filter out the duplicates, but wasn't sure yet how we'll handle the files


# if image is in a subdirectory under uploads (e.g. projects, blog)
# add it to a collection with that name
relative_path = os.path.dirname(imgpath) \
.replace('%s/uploads/' % settings.MEDIA_ROOT, '')

# there are two variants of Slavic DH, one with and one
# without a space; remove the space so they'll be in one collection
basedir = relative_path.split('/')[0].replace(' ', '')
collection = None
if basedir:
collection = self.get_collection(basedir)

with open(imgpath, 'rb') as imgfilehandle:
title = os.path.basename(imgpath)
# passing collection=None errors, so
# only specify collection option when we have one
extra_opts = {}
if collection:
extra_opts['collection'] = collection
try:
Image.objects.create(
title=title,
file=ImageFile(imgfilehandle, name=title),
**extra_opts)
except Exception as err:
# seems to mean that height/width calculation failed
# (usually non-images)
print('Error creating image for %s: %s' % (imgpath, err))

def get_media_files(self):
# wagtail images support: GIF, JPEG, PNG, WEBP
imgfile_path = '%s/**/*.*' % settings.MEDIA_ROOT
# get filenames for all uploaded files
filenames = glob.glob(imgfile_path, recursive=True)
# aggregate files by basename to identify files with the same
# name in different locations
unique_filenames = defaultdict(list)
for path in filenames:
unique_filenames[os.path.basename(path)].append(path)

# check files with the same name in multiple locations
for key, val in unique_filenames.items():
if len(val) > 1:
samefile = filecmp.cmp(val[0], val[1], shallow=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only compares the first two, right?

# if the files are the same
if samefile:
# keep the first one and remove the others from the
# list of files to be migrated
extra_copies = val[1:]

# if not all the same, identify the largest
# (all are variant/cropped versions of the same image)
else:
largest_file = None
largest_size = 0
for filepath in val:
size = os.stat(filepath).st_size
if size > largest_size:
largest_size = size
largest_file = filepath

extra_copies = [f for f in val if f != largest_file]
Comment on lines +333 to +344
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we ever end up with situations where there are totally unrelated images with the same name (e.g. Untitled-1 or Upload or something generic)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nvm - I see your report on the issue that confirms this isn't a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right — here's the list #248 (comment)


# remove duplicate and variant images that
# will not be imported into wagtail
for extra_copy in extra_copies:
filenames.remove(extra_copy)

return filenames

def get_wagtail_image(self, image):
# get the migrated wagtail image for a foreign-key image
# using image file basename, which is migrated as image title
return Image.objects.get(title=os.path.basename(image.name))

12 changes: 10 additions & 2 deletions cdhweb/pages/templates/cdhpages/landing_page.html
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{% extends 'base.html' %}
{% load wagtailcore_tags %}
{% load wagtailcore_tags wagtailimages_tags %}

{% block page-title %}{{ page.title }}{% endblock %}

Expand All @@ -12,7 +12,15 @@
{% endblock %}

{% block content-header %}
<header {% if page.header_image %}style="background-image:url('{{ MEDIA_URL }}{{ page.header_image }}')"
{% comment %}
image asset guidelines for landing page
L,M @1x W: 2560px H: 680px, @2x W: 5120px H: 1360px
S @1x W: 736px H: 320px, @2x W: 1472px H: 640px
Comment on lines +17 to +18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw the L/M 2x seems huge; I don't think we ever want to be serving stuff larger than 2500px. but fine for now; we can always revisit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are Xinyi's guidelines, maybe they need to be revisited. I agree it's huge (and I think what we've always been serving, even for mobile — although I guess before we just served out whatever we had; will wagtail make images larger?). This is why I asked you about the background image and different sizes.

Do you want to make a new issue for addressing this and add it to 3.1 ? It would be a good improvement to the site.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #266 and added to 3.1

{% endcomment %}
{% if page.header_image %}
{% image page.header_image fill-5120x1360 as header_img %}
{% endif %}
<header {% if page.header_image %}style="background-image:url('{{ header_img.url }}')"
{% else %}class="no-background"{% endif %}>
<div>
<a href="{% url 'home' %}" class="home">/</a><h1>{{ page.title }}</h1>
Expand Down
4 changes: 2 additions & 2 deletions cdhweb/pages/tests/test_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@


class TestHomePage(WagtailPageTests):

def test_can_create(self):
root = Page.objects.get(title='Root')
self.assertCanCreate(root, HomePage, nested_form_data({
Expand All @@ -30,7 +30,7 @@ def test_subpages(self):

class TestLandingPage(WagtailPageTests):
fixtures = ['test_pages']

def test_can_create(self):
home = HomePage.objects.get(title='Home')
self.assertCanCreate(home, LandingPage, nested_form_data({
Expand Down