Download Error · due to Content-length #457

Open
siwhitehouse opened this issue Jun 12, 2024 · 12 comments

@siwhitehouse
Contributor

siwhitehouse commented Jun 12, 2024

As a publisher,
I want to be informed via email when an activity file exceeds the maximum content-length
so that I can resolve the problem and have my activities appear in IATI products.

Acceptance criteria
When the Registry identifies that an activity file is too big and displays an error message on the data set page in the form:

Download Error · (_date_)
Content-length (_figure_) exceeds maximum allowed value 60000000

then it should also send an email to the contact email address to inform them. This email should only be sent when an update takes an activity file from below the maximum file size to above it.

That email should be similar to:

Dear {user_name}

The latest update of the {dataset_name} for {organisation} exceeds the maximum allowed size for IATI Activity files. The maximum size is 60MiB (62914560B), whereas your file is {file_size}.

This means your latest activity file will not appear in some IATI products, such as the IATI Datastore. It also shows an error on its IATI Registry webpage.

To resolve this you should split your IATI publication into multiple activity files.

Kind regards,

IATI Support

where {dataset_name}, {organisation}, {file_size} and {URL} are placeholder variables.
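As an illustration only, the draft could be filled in with Python's built-in `str.format`. The function name `render_size_email` and the sample values are hypothetical, and `{URL}` is left out because it does not appear in the body of the draft above.

```python
# Minimal sketch, not Registry code: the wording follows the draft email above,
# while the function name and its arguments are assumptions for illustration.
EMAIL_TEMPLATE = (
    "Dear {user_name}\n\n"
    "The latest update of the {dataset_name} for {organisation} exceeds the "
    "maximum allowed size for IATI Activity files. The maximum size is 60MiB "
    "(62914560B), whereas your file is {file_size}.\n\n"
    "This means your latest activity file will not appear in some IATI "
    "products, such as the IATI Datastore. It also shows an error on its "
    "IATI Registry webpage.\n\n"
    "To resolve this you should split your IATI publication into multiple "
    "activity files.\n\n"
    "Kind regards,\n\n"
    "IATI Support\n"
)


def render_size_email(user_name, dataset_name, organisation, file_size):
    """Return the email body with every placeholder replaced."""
    return EMAIL_TEMPLATE.format(
        user_name=user_name,
        dataset_name=dataset_name,
        organisation=organisation,
        file_size=file_size,
    )
```

Calling `render_size_email("Alex", "example-activities", "Example Org", "61.3MiB")` returns the full text with no braces left in it.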

We need the following to be in a position to raise a pull request, please:

  1. @cormachallinanderilinx to identify where the trigger for this needs to be
  2. @cormachallinanderilinx to replace the placeholder variables
  3. @Bjwebb to review the text of the email, with specific reference to the second paragraph
  4. I will check for guidance on splitting activity files and include a link or more information as relevant.

Also cc-ing @dan-odsc and @robredpath for info and comment

@robredpath

Thanks for this @siwhitehouse .

We'll want to make sure that this email only gets sent once - we wouldn't want to email someone every day about this. Ideally, it would be a bit more nuanced than that, but we're looking at how we contact people more broadly within our work at ODS so I don't think it's worth building out any sort of complex system in the Registry.

> The maximum size is 60000000

I think this should just say 60MB, and {file_size} should also be expressed in MB. A suitable algorithm might be to always round up to the next tenth of an MB, so that it's always a bigger number than the maximum size.
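A rough Python sketch of that rounding rule, assuming decimal megabytes (1 MB = 1,000,000 bytes); the function name `format_size_mb` is made up for this example. It uses integer arithmetic so that floating-point error cannot round a value down.

```python
BYTES_PER_MB = 1_000_000  # assumption: "MB" means decimal megabytes here


def format_size_mb(size_bytes: int) -> str:
    """Express a byte count in MB, rounded UP to the next tenth of an MB."""
    # Ceiling division on tenths of an MB: -(-a // b) rounds towards +infinity.
    tenths = -(-size_bytes * 10 // BYTES_PER_MB)
    return f"{tenths // 10}.{tenths % 10}MB"
```

With this rule a file of 60,000,001 bytes is reported as `60.1MB`, so the figure shown is always larger than the limit it exceeded.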

> To resolve this you should split your IATI publication into multiple activity files.

I think that "guidance on splitting activity files" is essential here - I had a conversation just yesterday about this and I don't think it's clear how this should be carried out. I may well have missed something, mind!

@siwhitehouse
Contributor Author

Thanks @robredpath

I've added some additional text to the specification:

> This email should only be sent when an update takes an activity file from below the maximum file size to above it.

This means that if an organisation amends an activity file to below the maximum file size they should again be notified if it subsequently goes back over, but they shouldn't get regular emails once they have breached.
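To make the intended behaviour concrete, here is a small Python sketch that walks through a series of file sizes and reports which updates would trigger an email. The function name and the sample sizes are hypothetical, and the default limit is an assumption taken from the figure in the draft email.

```python
MAX_BYTES = 60 * 1024 ** 2  # assumption: 60MiB, as stated in the draft email


def updates_that_trigger_email(sizes, limit=MAX_BYTES):
    """Return the indexes of updates that move a file from within the limit to over it."""
    triggers = []
    previously_over = False
    for index, size in enumerate(sizes):
        over = size > limit
        # Notify only on the transition, not on every update while still over.
        if over and not previously_over:
            triggers.append(index)
        previously_over = over
    return triggers
```

For the sizes `[50_000_000, 70_000_000, 75_000_000, 40_000_000, 65_000_000]` this returns `[1, 4]`: one email at the first breach, none while the file stays too big, and another when it goes back over after being reduced.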

@cormachallinanderilinx can you propose how this email is triggered based on this specification, please?

Searching the IATI Standard website led me to How To Create Your IATI Data Files, which only includes:

"Please ensure your IATI XML files are less than 40MB. Larger IATI activity files can be split down into multiple sub-files e.g. split by country, region or date, with each activity contained in only one file."

This is not as specific as the advice I'd like us to be able to point to.

@Bjwebb
Contributor

Bjwebb commented Jun 20, 2024

> The maximum size is 60000000

This is less than the 60MiB, or 62914560B listed in the draft IATI data policy ("60MB" is ambiguous and could be used to mean either).

But it's quite a bit more than the 40MB listed on https://iatistandard.org/en/guidance/publishing-data/creating-files/how-to-create-your-iati-data-files/
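For reference, the three figures side by side as a quick Python check. The variable names are illustrative, and the 40MB guidance figure is read as decimal megabytes, which is an assumption.

```python
MB = 1000 ** 2   # decimal megabyte
MIB = 1024 ** 2  # binary mebibyte

registry_limit = 60_000_000  # value shown in the Registry error message
policy_limit = 60 * MIB      # 60MiB from the draft IATI data policy
guidance_limit = 40 * MB     # 40MB from the guidance page (assumed decimal)

# The policy limit is larger than the enforced limit by this many bytes.
gap = policy_limit - registry_limit
```

Here `policy_limit` is 62914560 and `gap` is 2914560, so a file between those two sizes passes the draft policy but fails the Registry check.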

@Bjwebb
Contributor

Bjwebb commented Jun 20, 2024

> This means your latest activity file will not appear in IATI products, such as the IATI Datastore. It also shows an error on its IATI Registry webpage.

This is correct for the datastore, I'm not sure about D-Portal.

@cormachallinanderilinx
Collaborator

@siwhitehouse should be able to tell from the package dict if there is already a content length error.
If there is already an error: we ignore it.
If there isn't one: we send the email.

@siwhitehouse
Contributor Author

siwhitehouse commented Jul 23, 2024

> > The maximum size is 60000000
>
> This is less than the 60MiB, or 62914560B listed in the draft IATI data policy ("60MB" is ambiguous and could be used to mean either).
>
> But it's quite a bit more than the 40MB listed on https://iatistandard.org/en/guidance/publishing-data/creating-files/how-to-create-your-iati-data-files/

Thanks @Bjwebb

What do you suggest we use in the email, please? cc @robredpath

@siwhitehouse
Contributor Author

> > This means your latest activity file will not appear in IATI products, such as the IATI Datastore. It also shows an error on its IATI Registry webpage.
>
> This is correct for the datastore, I'm not sure about D-Portal.

I could rewrite this as

"This means your latest activity file will not appear in some IATI products, such as the IATI Datastore."

@siwhitehouse
Contributor Author

> @siwhitehouse should be able to tell from the package dict if there is already a content length error.
> If there is already an error: we ignore it.
> If there isn't one: we send the email.

Thanks @cormachallinanderilinx

Does this work if there has been an error previously, the file has been successfully reduced in size below the maximum and then goes back over?

@Bjwebb
Contributor

Bjwebb commented Jul 24, 2024

> > > The maximum size is 60000000
> >
> > This is less than the 60MiB, or 62914560B listed in the draft IATI data policy ("60MB" is ambiguous and could be used to mean either).
> > But it's quite a bit more than the 40MB listed on https://iatistandard.org/en/guidance/publishing-data/creating-files/how-to-create-your-iati-data-files/
>
> Thanks @Bjwebb
>
> What do you suggest we use in the email, please? cc @robredpath

Personally, I would increase the limit to 60MiB (62914560B), and then state that, as I think this would be the least likely to confuse people. You could say we suggest files should be less than 40MiB, but the registry enforces a hard limit of 60MiB to give some leeway.

@cormachallinanderilinx
Collaborator

> Does this work if there has been an error previously, the file has been successfully reduced in size below the maximum and then goes back over?

So once the error is fixed it will be removed from the package dict.
So if it's reduced the error will be removed; if it goes back over again it will be re-added and, based on the above logic, an email will be sent.
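A minimal Python sketch of that check, under stated assumptions: the key `download_error` and both function names are hypothetical stand-ins, since the thread does not say where the package dict stores the error. The only detail taken from the issue is that the message starts with "Content-length".

```python
from typing import Optional

ERROR_KEY = "download_error"  # hypothetical key; the real field name is not given here


def has_size_error(package: dict) -> bool:
    """True if the stored package dict already records a content-length error."""
    return "Content-length" in (package.get(ERROR_KEY) or "")


def should_notify(package: dict, new_error: Optional[str]) -> bool:
    """True only when this update introduces a content-length error that was not already recorded."""
    is_size_error = new_error is not None and "Content-length" in new_error
    return is_size_error and not has_size_error(package)
```

Because a fixed file has its error removed from the dict, a later breach looks like a new error again and `should_notify` returns `True` once more.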

@robredpath

> Personally, I would increase the limit to 60MiB (62914560B), and then state that

Agreed. I think we should just be straightforward and consistent.

@siwhitehouse
Contributor Author

Thank you @Bjwebb @cormachallinanderilinx @robredpath
I have rewritten the acceptance criteria as per our discussion above. I haven't added anything about a recommended 40 MiB limit as we don't have an email that goes out at 40 MiB to the best of my knowledge and mentioning it here might cause confusion.

We agreed to add this to the project and to mark it as ready. Having reviewed the issue I'm going to put it at the top of Proposed as we haven't resolved the question Rob raised about advice on splitting files.
