Encoding not handled correctly on Shapefile upload #12981

Closed
marlowp opened this issue Mar 17, 2025 · 2 comments · Fixed by #12982

@marlowp
Contributor

marlowp commented Mar 17, 2025

Expected Behavior

Given I have a Shapefile that is encoded in ISO-8859-1 and has a .cst file that declares this encoding (but no .cpg file)
When I upload this Shapefile using GeoNode with a PostgreSQL/PostGIS backend set to UTF-8 encoding
Then the Shapefile is converted from ISO-8859-1 to UTF-8 during the upload process and is created as a new layer in GeoNode.

Actual Behavior

The ogr2ogr command throws the following warning and error:

Warning 6: dataset XXX does not support layer creation option ENCODING
ERROR 1: Non UTF-8 content found when writing feature -1 of layer XXX
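
To illustrate why the write fails, here is a minimal sketch in plain Python. It only reproduces the symptom (ISO-8859-1 bytes are generally not valid UTF-8); it is not what ogr2ogr does internally, and the sample string and the `is_valid_utf8` helper are made up for the illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Made-up attribute value containing accented characters.
text = "Hôpital Saint-André"

# Bytes as they would be stored in an ISO-8859-1 encoded attribute table.
latin1_bytes = text.encode("iso-8859-1")

# Passing those bytes through untouched fails UTF-8 validation.
raw_is_valid = is_valid_utf8(latin1_bytes)

# Decoding with the correct source encoding first, then re-encoding, succeeds.
transcoded = latin1_bytes.decode("iso-8859-1").encode("utf-8")
transcoded_is_valid = is_valid_utf8(transcoded)
```

If the source encoding is not communicated to the reader, the accented bytes reach the UTF-8 database unchanged, which matches the error above.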

I can see that there is a function which attempts to get the encoding from the .cst file if a .cpg file isn't provided:

def _get_encoding(files):
    if files.get("cpg_file"):
        # prefer cpg file which is handled by gdal
        return None
    encoding = None
    if files.get("cst_file"):
        # GeoServer exports cst-file
        encoding_file = files.get("cst_file")
        with open(encoding_file, "r") as f:
            encoding = f.read()
        try:
            codecs.lookup(encoding)
        except LookupError as e:
            encoding = None
            logger.error(f"Will ignore invalid encoding: {e}")
    return encoding

The problem appears to be that the encoding read from the .cst file is then passed to the ogr2ogr command as the ENCODING layer creation option:

if encoding:
    additional_options.append(f"-lco ENCODING={encoding}")

This doesn't work because the target in this instance is the PostgreSQL database where the encoding is not managed at the layer (table) level, but rather at the database level. I think instead it should be setting a --config option, and as this is the handler for Shapefiles, it probably makes sense to use the SHAPE_ENCODING parameter, so something like:

if encoding:
  additional_options.append(f"--config SHAPE_ENCODING {encoding}")
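
A slightly fuller sketch of that idea follows. `build_ogr2ogr_options` is a hypothetical helper used only for illustration, not an existing GeoNode function; the whitespace stripping and shell quoting are extra precautions that are assumptions on top of the snippet above.

```python
import shlex


def build_ogr2ogr_options(encoding=None):
    """Return extra ogr2ogr arguments for a Shapefile source (hypothetical helper)."""
    additional_options = []
    # The .cst content may carry surrounding whitespace such as a trailing newline.
    encoding = (encoding or "").strip()
    if encoding:
        # SHAPE_ENCODING is a GDAL configuration option, so it is passed with
        # --config rather than as a layer creation option (-lco).
        additional_options.append(
            f"--config SHAPE_ENCODING {shlex.quote(encoding)}"
        )
    return additional_options
```

With this shape, `build_ogr2ogr_options("ISO-8859-1")` yields a single `--config SHAPE_ENCODING ISO-8859-1` entry, and a missing or blank encoding yields no extra options.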

Alternatively, the _get_encoding function could be updated so if there isn't a .cpg file then it creates one from the contents of the .cst file (as providing a .cpg file with the ISO-8859-1 encoding is handled correctly by ogr2ogr). This is probably more work though.
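
For comparison, the .cpg alternative might look roughly like the sketch below. `cpg_from_cst` is a hypothetical helper, and it assumes the .cst file contains nothing but the encoding name; whether GDAL accepts every name written this way is not verified here, only the ISO-8859-1 case described above.

```python
import codecs
from pathlib import Path


def cpg_from_cst(cst_path):
    """Write a .cpg next to the .cst if none exists (hypothetical helper).

    Returns the path of the .cpg file, or None if the .cst content is not
    a usable encoding name.
    """
    cst = Path(cst_path)
    cpg = cst.with_suffix(".cpg")
    if cpg.exists():
        # An existing .cpg takes precedence and is left untouched.
        return cpg
    encoding = cst.read_text(encoding="ascii", errors="ignore").strip()
    if not encoding:
        return None
    try:
        # Only accept names that Python recognises as a codec.
        codecs.lookup(encoding)
    except LookupError:
        return None
    cpg.write_text(encoding + "\n", encoding="ascii")
    return cpg
```

The trade-off is that this writes a new file into the upload directory, whereas the `--config` approach only changes the command line.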

The following ZIP file can be used to replicate this issue. I've done some testing using the --config SHAPE_ENCODING method mentioned above and it resolves the issue in this instance.
hospitals_8859.zip

Specifications

  • GeoNode version: 4.3.1
  • Installation type (vanilla, geonode-project):
  • Installation method (manual, docker):
  • Platform:
  • Additional details:
@giohappy
Contributor

Thanks @marlowp. Could you send a PR with the proposed enhancement?

@marlowp
Contributor Author

marlowp commented Mar 17, 2025

> Thanks @marlowp. Could you send a PR with the proposed enhancement?

Yep, will take a look. I'll also look at backporting the fix to the geonode-importer repository as I'm currently using GeoNode 4.3.1.

marlowp added a commit to marlowp/geonode that referenced this issue Mar 18, 2025
Use SHAPE_ENCODING parameter to inform the ogr2ogr command of the encoding for the Shapefile so it can transform accordingly.
@mattiagiupponi mattiagiupponi linked a pull request Mar 18, 2025 that will close this issue