Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New behaviour of S3 storage exists() breaks current workflow #1430

Closed
violuke opened this issue Jul 9, 2024 · 37 comments · Fixed by #1484
Closed

New behaviour of S3 storage exists() breaks current workflow #1430

violuke opened this issue Jul 9, 2024 · 37 comments · Fixed by #1484

Comments

@violuke
Copy link

violuke commented Jul 9, 2024

Since upgrading from 1.14.3 to 1.14.4, we're having a problem with this code:

from django.core.files.base import ContentFile
from django.core.files.storage import storages

storage = storages["default"]

file_name = storage.save("example/example.txt", ContentFile(content=b"hello there!"))

if not storage.exists(file_name):
    raise Exception(f"File {file_name} does not exist")

print("All good!")

On 1.14.3, I get the "All good!" print but with 1.14.4, I get the Exception.

Django 5.0.6. I suspect this is a problem with #1381 edit: see below

I've downgraded again for now, but any help would be much appreciated. Thanks to everyone involved for a really helpful package 🙏

@violuke violuke changed the title Bug introduced in 1.14.4 with S3 (related to #1381 maybe) Bug introduced in 1.14.4 with S3 Jul 9, 2024
@violuke
Copy link
Author

violuke commented Jul 9, 2024

This is probably related to #1429 that was posted while I was typing my issue up 😄 The fact that #1422 was not in the changelog is why I wrongly assumed the issue might be related to a different PR. Thank you.

@jschneier
Copy link
Owner

jschneier commented Jul 10, 2024

Thanks for posting. The reason I made this change was two-fold:

  1. A security issue (which I reported) that was patched upstream: https://www.djangoproject.com/weblog/2024/jul/09/security-releases/
  2. The overwrite functionality was added to the upstream FilesystemStorage (part of how I found the issue) django/django@0b33a3a and as part of this they implemented it in terms of exists, see this comment: Fixed #35326 -- added allow_overwrite parameter to FileSystemStorage. django/django#18020 (comment)

I'm not surprised people were relying on the previous behavior and I frankly find the Django documentation bizarre but I feel a slight bit paralyzed.

@jschneier jschneier changed the title Bug introduced in 1.14.4 with S3 New behaviour of S3 storage exists() breaks current workflow Jul 10, 2024
@saemideluxe
Copy link

Sorry, I am not versatile enough to get the full picture of the security issue.
However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change.
However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

@jschneier
Copy link
Owner

Sorry, I am not versatile enough to get the full picture of the security issue. However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change. However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

Before the upstream fix, yes. Now that Django has patched, no.

The issue, in short, is that if the Storage overrides get_available_name and does not mirror all of the security checks in it (which this library respectively did and did not because the CVE fixes were added later) then calling storage.save(name) with a user supplied name was vulnerable. NOTE that field.save was not vulnerable because we were not overwriting generate_filename (although there were folks via GitHub search who did so).

Also, I still provide support for unsupported versions of Django (though they will be dropped shortly) which did not get the patch.

@jschneier
Copy link
Owner

The corresponding fix in Django: django/django@fe4a0bb

@saemideluxe
Copy link

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

@jschneier
Copy link
Owner

If the fix in Django is there then the following I think is fine because we are no longer relying on Storage.get_available_name for any of the security requirements.

def get_available_name(self, name, max_length=None):
  # TODO: something with max_length?
  if self.file_overwrite:
    return name
  return super().get_available_name(name, max_length=max_length)

@violuke
Copy link
Author

violuke commented Jul 10, 2024

Thanks everyone for the comments and time spent on this 🙏

I think I understand the reasoning behind the change originally, and we don't want a security issue, of course, but I personally feel that this should be reverted (or partially, at least).

In my opinion, it's a bug of this package that I can create a file, then call .exists() and get a False returned value? I should get the right boolean response to match the Django docs. Especially with file_overwrite being set to its default value of True and it not being obviously a setting that impacts if the exists() function works or not.

If people are bothered about having security fixes (which they should be) then they should be updating this package and Django. If they don't do updates properly and continue to use old versions of this package and/or Django, then that's their (ill-advised) choice.

If this change is reverted, then my thinking is that they have the security risk because they haven't also updated Django, as the fix is in Django and as long as they use the latest patch release of a supported version, they'll have the security fix. How do others feel about this logic?

Thank you.

@craiga
Copy link

craiga commented Jul 10, 2024

Yeah, the documentation is a bit confusing:

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

It's not clear what should happen when a file referenced by the given name already exists and the name is available for a new file.

I suppose I'd lean toward emulating whatever happens with a local filesystem, and maybe issue a warning if the user is using an unpatched version of Django.

Another solution would be to add a head method to the S3 storage which implements the old functionality. Possibly with a log message in exists to try and explain what's going on?

I'd be happy to try making a contribution of either of these fixes.

@th3hamm0r
Copy link
Contributor

I just stumbled over this issue too, so I'm also a bit late to the changes in Django, which I'm not real fan of. I think the basic "exists" operation of a storage class should return exactly that information.

But since this change is now part of Django too, I think the best option is to also set file_overwrite to False by default analog to the FileSystemStorage.

We're using wagtail a lot and setting file_overwrite to False is actually required there (see docs), but we've missed it for some other usages... So this default now seems to be the safer option, also because its analog to Django's handling.

@jschneier
Copy link
Owner

I think we should probably open a bug with Django. If someone has the time, please also to link to this issue.

@th3hamm0r
Copy link
Contributor

Good idea, done: https://code.djangoproject.com/ticket/35604
I hope it is explained clearly, otherwise you are welcome to edit it.

@sarahboyce
Copy link
Contributor

Thank you all for this discussion 👍

From what I understand, #1422 is motivated by django/django@0b33a3a.
This aligns the behaviour of django-storages as if the storages defined in this package were inheriting from FileSystemStorage with the new flag allow_overwrite=True (which is added in Django 5.1).
This is also motivated by Django discouraging overwriting get_available_name due to security concerns

However, after the security release to Django, having get_available_name to be defined similar to as described by @jschneier should be secure and django-storages can keep the previous behaviour of .exists()

The "side question" is "what does .exists() mean when you can overwrite files" given that the docs here are a little ambiguous

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

Given the engagement on this discussion around what .exists() should do in the overwrite case, and the recent security patch to Django, it makes sense for FileSystemStorage to also take this approach and overwrite get_available_name but leave the existing exists behaviour.
A clarification to the FileSystemStorage docs also make sense

This is roughly my conclusion 👍 I will also write on the ticket

So, in short, I hear you and will work on this.
Changes to django-storages are independent to this effort and can be worked on seperately

@nitsujri
Copy link

Hi all, I also attempted to upgrade to 1.14.4 and, as well, was hit by:

if self.file_overwrite:
return False

For us, this behavior based on variable naming is quite confusing because we want both exists() to return True AND file_overwrite to be True.

Doubt it matters, but our use case is uploading compiled web assets to S3.

I'm reading this thread and am very confused (I understand a lot of people are). At the root, is unintuitive that exists() is affected by file_overwrite because those should be independent options?

@jschneier
Copy link
Owner

Given the way Django is moving there will be another release that partially undoes this change.

derlin added a commit to divio/django-storage-url that referenced this issue Jul 19, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
derlin added a commit to divio/django-storage-url that referenced this issue Jul 19, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
@berkcoker
Copy link

Seconding the opinion that this should have been included in the changelog. We had this commit break the functionality of our app unexpectedly

@mindcruzer
Copy link

mindcruzer commented Jul 22, 2024

This change breaks the ManifestFilesMixin when collecting static files. Before attempting to get the file blob to compute the hash, it first checks if the file exists. If file overwrite is enabled it's not possible for this mixin to function.

@sarahboyce
Copy link
Contributor

django/django@8d6a20b just landed in Django

I want to thank @jschneier for all his work on the recent Django improvements and security release.
I know it's generated extra work (and possibly stress), and a lot of that work is behind closed doors. You've been fantastic and t's very appreciated, thank you 👏 👏 🥇

derlin added a commit to divio/django-storage-url that referenced this issue Jul 25, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
@MaximePETIT-code
Copy link

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

jbpenrath added a commit to openfun/joanie that referenced this issue Sep 26, 2024
In version 1.14.4 there is one modification made that lead to a
breaking change in Joanie as explain in jschneier/django-storages#1430.
We have to wait a newer version with a fix to have the previous behaviour. This
fix is related to a security issue in django. This security is fixed in version
4.2.14, and we already use this version, so we are safe.
jbpenrath added a commit to openfun/joanie that referenced this issue Sep 26, 2024
In version 1.14.4 there is one modification made that lead to a
breaking change in Joanie as explain in jschneier/django-storages#1430.
We have to wait a newer version with a fix to have the previous behaviour. This
fix is related to a security issue in django. This security is fixed in version
4.2.14, and we already use this version, so we are safe.
@UlvacMoscow
Copy link

Hello guys, i am updating django from 3.0.14 to 3.2.25.
And in one step i have problem with django-compressor(not important version, with all, my case 2.4.1 and 4.4)
Job correct worked python manage.py compress
and in Botos3Storage i saw my compressed manifest ("CACHE/manifest.json").
I callabled method from compessor.templatetags.compress import get_offline_manifest
and got empty dict, and code up raise error OfflineGenerationError
You have offline compression enabled but key "c94109d08dfb2a6d0d20e54e22e06888713051f85a7c27250bf88fd863e41112" is missing from offline manifest. You may need to run "python manage.py compress". Here is the original content:
Then i downgrade to django-storages==1.14.3
the error disappeared

@rcludwick
Copy link

So I hit this week because it broke our APIs, and unfortunately spent a couple hours trying to figure out why this broke.

I feel like this should have been a version bump to 1.15.0 with a warning emitted to the logs about this at a minimum when the return False is hit, or maybe raise an exception instead. Errors shouldn't really fail silently unless explicitly silenced.

Further we're still on 1.14.4 with this behavior still even though Django 5.0.7 is out. Could you check the version of django using import django; django.VERSION or provide an override through an environment variable?

A lot of people are pinning the older version of django-storages which seems bad if there's a security hole.

@dalpii
Copy link

dalpii commented Nov 26, 2024

Maybe the new S3 Put-If-Match will help with overwrites:
https://aws.amazon.com/pt/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/

@dopry
Copy link

dopry commented Jan 23, 2025

.exists should simply check if the file exists. allow_overwrite has absolutely nothing to do with whether a file exists. This is a straight forward bug. allow_overwrite should only impact methods writing the file. whoever merged that exists implementation should be more mindful.

@ddelange
Copy link

django/django@8d6a20b just landed in Django

I want to thank @jschneier for all his work on the recent Django improvements and security release. I know it's generated extra work (and possibly stress), and a lot of that work is behind closed doors. You've been fantastic and t's very appreciated, thank you 👏 👏 🥇

@sarahboyce does this mean that this issue can be closed? since that commit has been released in django 5.2a1, when we upgrade to django 5.2 we can also use django-storages 1.14.4?

@micmarc
Copy link

micmarc commented Jan 23, 2025

@sarahboyce does this mean that this issue can be closed? since that commit has been released in django 5.2a1, when we upgrade to django 5.2 we can also use django-storages 1.14.4?

As someone who has also been impacted by this, I think the behavior of the S3 exists() needs to be reverted before this issue can be closed.

@sarahboyce
Copy link
Contributor

I think the behavior of the S3 exists() needs to be reverted

This is also my understanding. 5.1 already has the fix from Django's end 👍

@koleror
Copy link

koleror commented Feb 2, 2025

Hey there,
Any update on this?
I'd like to unlock this requirement (currently locked on django-storages[s3]<1.14.4 due to this issue).

@brycedrennan
Copy link

brycedrennan commented Feb 5, 2025

Work around:

from botocore.exceptions import ClientError
from storages.backends.s3 import S3Storage


class FixedS3Storage(S3Storage):
    """
    Custom S3Storage class that fixes the exists() bug in django-storages 1.14.4.

    The bug causes exists() to return False for files that do in fact exist.
    See: https://github.com/jschneier/django-storages/issues/1430
    """

    def exists(self, name):
        try:
            self.connection.meta.client.head_object(
                Bucket=self.bucket_name,
                Key=self._normalize_name(self._clean_name(name)),
            )
            return True
         except ClientError as err: 
           if err.response["ResponseMetadata"]["HTTPStatusCode"] == 404: 
               return False 
    
           # Some other error was encountered. Re-raise it. 
           raise 
  

(now fixed to match django-storages function thanks to @ddelange pointer)

@ddelange
Copy link

ddelange commented Feb 5, 2025

@brycedrennan good pointer!

To avoid a blind except Exception and to avoid ClientError like missing credentials, only catch the actual 404 Not Found:

from botocore.exceptions import ClientError
from storages.backends.s3 import S3Storage


class FixedS3Storage(S3Storage):
    """
    Custom S3Storage class that fixes the exists() bug in django-storages 1.14.4.

    The bug causes exists() to return False for files that do in fact exist.
    See: https://github.com/jschneier/django-storages/issues/1430
    """

    def exists(self, name):
        try:
            self.connection.meta.client.head_object(
                Bucket=self.bucket_name,
                Key=self._normalize_name(self._clean_name(name)),
            )
        except ClientError as exc:
            if exc.response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 404:
                return False
            else:
                raise
        else:
            return True

@brycedrennan
Copy link

You are correct. I should never trust an LLM to do a copy/paste (like I told it to). I intended to have it perfectly match source but with slight modification. Here is the source code it was supposed to mostly copy:

def exists(self, name):
if self.file_overwrite:
return False
name = self._normalize_name(clean_name(name))
try:
self.connection.meta.client.head_object(Bucket=self.bucket_name, Key=name)
return True
except ClientError as err:
if err.response["ResponseMetadata"]["HTTPStatusCode"] == 404:
return False
# Some other error was encountered. Re-raise it.
raise

@koleror
Copy link

koleror commented Feb 13, 2025

So what's the plan here?
Will all S3 users need to override the exist method, or will it eventually be fixed in a release?
Is it for retrocompat that these changes don't get reverted?
In that case, maybe we could change the behavior depending on django's version. A bit dirty I guess though. Or make it a separate setting maybe?

@rcludwick
Copy link

rcludwick commented Feb 13, 2025

@koleror I think most people have pinned the django-storages at a non-borked version if they need that function to work.

I think a better approach would have been a settings config which needed an override if you really want that behavior. And then it would be your decision.

I hope the devs take that approach in the future, as it broke us, and it wasn't clear in the Changelog that exists() was broken.

@koleror
Copy link

koleror commented Feb 14, 2025

Yes, of course, I followed the same approach as I think most people, which is fine for now because we're basically skipping only one version. But this can't stay like this for very long. At some point, there might be some compatibility issues with other packages or even python versions that would probably be handled in newer versions.
My question is more about long-term maintenance.
Yes a setting, or a revert if no longer needed, would be a much better approach I agree.

@jschneier
Copy link
Owner

So what's the plan here? Will all S3 users need to override the exist method, or will it eventually be fixed in a release? Is it for retrocompat that these changes don't get reverted? In that case, maybe we could change the behavior depending on django's version. A bit dirty I guess though. Or make it a separate setting maybe?

The plan is that I wait for someone to provide a good PR that fixes the issue. I am happy to accept such a fix and the lack of it is why I haven't released a new version. I'm unlikely to do the work myself in the near-term future though.

I also don't feel comfortable shipping a version that leaves the minor security issue I reported. Time will solve that problem eventually but I agree it's not ideal to wait.

@rcludwick
Copy link

@jschneier

I appreciate the effort that you've put into this project. But on the other hand it broke us in production. And there wasn't anything in the changelog about why the behavior was changed. It was very frustrating on a Friday afternoon last year when an api wasn't working in production to find out that it was broken on purpose.

First it's not your issue, it's django's issue to fix. This maybe a poor analogy but we don't shut down the internet because MacOS suddenly has a vulnerability. It's Apple's job to fix it for their devices.

Second, everyone here is all grown adults. We can make informed choices. If this is a "minor" issue, why did you make the major change without a major version bump?

@jschneier
Copy link
Owner

@jschneier

I appreciate the effort that you've put into this project. But on the other hand it broke us in production. And there wasn't anything in the changelog about why the behavior was changed. It was very frustrating on a Friday afternoon last year when an api wasn't working in production to find out that it was broken on purpose.

Yes, completely understand the frustration, and my apologies about that.

First it's not your issue, it's django's issue to fix. This maybe a poor analogy but we don't shut down the internet because MacOS suddenly has a vulnerability. It's Apple's job to fix it for their devices.

I get your point but Apple can also proactively push & inform about fixes. It's honestly heartening that people are upgrading as much as they are. Part of the pickle I've gotten myself into here is that I want to drop older versions of Django but also want to fix this problem first.

Second, everyone here is all grown adults. We can make informed choices. If this is a "minor" issue, why did you make the major change without a major version bump?

I made this change because that was what the upstream maintainers of Django decided was meant by the API contract. Thankfully, users of this package were able to convince them that it was the wrong decision and they updated again to the less-surprising definition before releasing because I made the change (I did try myself beforehand).

@jschneier
Copy link
Owner

Fix for this is out, thanks for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.