Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how to deal with type mismatch #40

Closed
jj0hns0n opened this issue Feb 13, 2015 · 6 comments
Closed

Figure out how to deal with type mismatch #40

jj0hns0n opened this issue Feb 13, 2015 · 6 comments

Comments

@jj0hns0n
Copy link
Owner

In this example, the tract field is an int in the table and a varchar in the layer.

https://dpaste.de/CbXC

https://dpaste.de/zy5Q#L9,35

@raprasad
Copy link

Possible Strategy: Require Data Formatting Before Table Upload

Thoughts about formatting before table upload

  1. API User (Dataverse system) is cleaning/formatting data
  2. The Dataverse system needs to know how to clean/format the data for each JoinTarget.
  3. GeoNode JoinTarget needs to provide some type of formatting information.
    1. How do you do this? (not sure of best path)

Example (brainstorming)

  1. Change JoinTarget objects to have a single Attribute (instead of a M2M relation)
  • change: attributes = models.ManyToManyField(Attribute)
  • to: attribute = models.ForeignKey(Attribute)
  1. Require JoinTarget objects to specify a JoinTargetFormatType.
  • This might be an FK to an object that specifies the Format.
  • Curator of the JoinTargets creates the JoinTargetFormatTypes--regulated by GeoNode admin
    • e.g. This is an expert user
  • For the API user (Dataverse system), the JoinTargetFormatType can offer information on what is expected
  • Very rough idea:
    • Note: regex_replacement_string, python_code_snippet, and tranformation_function_name are just brainstorming ideas
class JoinTargetFormatType(models.Model)

    name = models.CharField(max_length=255, help_text='Census Tract (6 digits, no decimal)') 
    description_shorthand = models.CharField(max_length=255, help_text='dddddd') 
    clean_steps = models.TextField(help_text='verbal description. e.g. Remove non integers. Check for empty string. Pad with zeros until 6 digits.')

    # ??regex_replacement_string - could get ugly
    regex_replacement_string = models.CharField(help_text='"[^0-9]"; Usage: re.sub("[^0-9]", "", "1234.99"'\
                                , max_length=255)

    # ??python_code_snippet - dangerous - not to run directly
    #   
    #   val_list = [re.sub("[^0-9]", "", `x`) for x in val_list]
    #   val_list = [x for x in val_list if len(x) > 0]
    #   val_list = [x.zfill(6) for x in val_list if len(x) > 0]
    #
    python_code_snippet = models.TextField(blank=True)

    # ??tranformation_function_name - do cleaning on GeoNode side with a set of predefined transformation functions
    #
    #   Viable - e.g. Use top geospatial identifiers and formats
    #
    tranformation_function_name = models.CharField(max_length=255, blank=True, choices=TRANSFORMATION_FUNCTIONS)

@jj0hns0n
Copy link
Owner Author

This should work just fine. Not sure what you would do with the regex on the other side, but it should work. Lets discuss.

@jj0hns0n
Copy link
Owner Author

Ok, I've implemented the basic model for this (without actually hooking anything up so that it does something).

The jointarget endpoint now returns this.

[
    {
        "category": "census-tract",
        "attribute": {
            "attribute": "TRACTCE",
            "type": "xsd:string"
        },
        "layer": "geonode:tl_2014_25_tract",
        "type": {
            "clean_steps": "Cast from Integer to string\r\nalter table <tablename> alter column tract type character varying(6);\r\nupdate <tablename> set tract = lpad(\"tract\", 6, '000000');",
            "name": "Census Tract (6 digits, no decimal, left pad with 0)",
            "description": "dddddd"
        },
        "id": 2
    }
]

@jj0hns0n
Copy link
Owner Author

This endpoint is now returning a meaningful error now as well on type mismatch.

Error Creating Join: Error Joining table boston_income_73g_5fihmpx to layer geonode:tl_2014_25_tract: operator does not exist: character varying = integer\nLINE 1: ..._income_73g_5fihmpx on tl_2014_25_tract."TRACTCE" = boston_i...\n ^\nHINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.\n

@jj0hns0n
Copy link
Owner Author

Note that for testing purposes, Im doing this manually on the server side by doing an explicit type cast from integer to string and then left padding with 0s.

@jj0hns0n
Copy link
Owner Author

Closing as dupe of #49

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants