Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an index for tweet_id in the _media tables for new bins #303

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

xmacex
Copy link
Contributor

@xmacex xmacex commented Feb 15, 2018

Fixes #299 for new bins, by considerably speeding up the

SELECT * FROM the_bin_name_media WHERE tweet_id = 916039373991361504

queries which are done from analysis/mod.export_tweets.php to look up fields from the media table.

Does not solve the issue with already existing bins, for which the same index must be created. I wrote following Python program (improvements welcome of course), but am unsure how to best contribute something similar to TCAT:

"""
Add tweet_id indices to a TCAT database so that export can feasibly be
done with additional fields.
"""

import logging
import sqlalchemy

CONNECTION = "mysql://username:password@hostname/database"

def add_index(table, engine):
    """Add the tweet_id index for table table."""
    if 'tweet_id' not in [idx.name for idx in table.indexes]:
        mindex = sqlalchemy.Index('tweet_id', table.c.tweet_id)
        logging.info("Indexing %s", table)
        return mindex.create(engine)

    logging.info("Already exists for %s", table)
    return None


def add_indices(suffix):
    """Add indices for all tables which have the suffix suffix."""
    logging.info("Adding indices")
    engine = sqlalchemy.create_engine(CONNECTION)
    results = []
    for table_name in engine.table_names():
        if table_name.endswith(suffix):
            meta = sqlalchemy.MetaData()
            table = sqlalchemy.Table(table_name, meta, autoload=True, autoload_with=engine)
            results.append(add_index(table, engine))
    return results


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    print(add_indices("_media"))
    print(add_indices("_mentions"))
    print(add_indices("_hashtags"))

@xmacex xmacex changed the title Added an index for tweet_id in the _media tables for new bins. Fixes #299 Added an index for tweet_id in the _media tables for new bins Feb 15, 2018
…erous, suggesting modifying existing tables, so someone please watch after me :)
…ns. Honestly speaking, I am cannibalizing the '24/05/2016 Fix index of tweet_id on withheld tables', commit 16110ec, which looks most like the update I suggest here.
…d3bd27, which looks like it might have caused the export with media slowness by removing the index for tweet_id, but instead I am building incrementally on top of all the upgrades, in chronological order of this script
@xmacex
Copy link
Contributor Author

xmacex commented Feb 18, 2018

LOL please someone review my code for the upgrade script. There's a couple of layers of conditional logic and I am not entirely confident about the contexts in which the upgrade script run. I feel that what I propose could be simplified, but I'm sticking close to code I cannibalized from 16110ec. Of course I tested it on my toy installation, and it seems to work, but the upgrade path is long and I didn't test all execution paths though I hope I reasoned myself through all of them... a refactoring opportunity maybe?

I also made an inappropriate change in f0df436 by poking around an earlier 2015 change, but cancelled the change in 21c6a8b after I figured out the logic of the upgrade script, and wrote what I consider a more appropriate change by imitating what the upgrade script seems to be doing, ie. layering changes upon one another. Sorry and thank you 😺

@niczem
Copy link
Contributor

niczem commented Jul 3, 2020

Thanks @xmacex and sorry for the long time to review!

The code looks solid! Would it be possible to update the upgrade.php, so that your change can be rolled out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slow download when media is flagged
2 participants