Refactor converting ODK XML to OSM XML #259

Merged 32 commits, Jun 16, 2024
32 commits
6dace61
fix: Set self.file to None so we don't get duplicate footers
rsavoye May 30, 2024
969d7ea
fix: Add leisure fields, ignore cellular
rsavoye Jun 2, 2024
8625197
fix: Add function to parse select_multiple
rsavoye Jun 2, 2024
904445c
fix: Use new convertMultiple() to support select_multiple in XForms
rsavoye Jun 2, 2024
edea8ba
fix: Refactor test case for select_multiple, now it actually works
rsavoye Jun 2, 2024
77ac3d9
fix: Minor reformatting and updating of code comment blocks, also add…
rsavoye Jun 2, 2024
d8b004c
fix: Update and reformat all code comment blocks
rsavoye Jun 2, 2024
2adbfa9
fix: Move parseXLS to the Convert class so it can be shared
rsavoye Jun 2, 2024
ca88b6b
fix: Move createEntry() to Convert class so it can be shared
rsavoye Jun 2, 2024
864feed
fix: refactor converting a JSON file from Central to OSM XML and add …
rsavoye Jun 3, 2024
9ec1ec6
fix: Move code for writing to output files to its own class
rsavoye Jun 6, 2024
6eda12a
fix: Correctly parse an instance file from ODK Collect, and make a dict
rsavoye Jun 6, 2024
98a3430
fix: Major refactoring, it now works like the other conversion classes
rsavoye Jun 6, 2024
9bf510a
fix: Parse the XLS file so conversion is better
rsavoye Jun 6, 2024
3499231
fix: Convert ODK XML to OSM XML
rsavoye Jun 6, 2024
eda2ada
fix: move basename from convert and make a standalone function
rsavoye Jun 10, 2024
1763ce5
fix: Move all file output and other code to a shareable class
rsavoye Jun 10, 2024
18f4601
fix: Move all file output and other code to a shareable class
rsavoye Jun 10, 2024
86cd1ce
fix: Now that more code is shareable, merge all parsers programs into…
rsavoye Jun 10, 2024
7be8ce3
fix: Start moving all parsers to this file
rsavoye Jun 10, 2024
29bde7c
fix: Drop now unused command line utilities
rsavoye Jun 14, 2024
653b43f
fix: Add a comment about the files about to be deleted since they've…
rsavoye Jun 15, 2024
97dc7fe
fix: Improve basemap() to split on : as well as -
rsavoye Jun 15, 2024
1bacb97
fix: Add XMLParser, derived from ODKInstance
rsavoye Jun 15, 2024
b253a9d
fix: be less verbose
rsavoye Jun 15, 2024
3647f12
fix: Use new XMLParser() instead of ODKInstance
rsavoye Jun 15, 2024
3f7b94d
fix: Use new Parsers() class instead of the old files
rsavoye Jun 15, 2024
acef7b5
fix: Cleanup conflicts caused by minor reformatting of the comment bl…
rsavoye Jun 15, 2024
9953555
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 15, 2024
cb0fe7a
fix: Fix tests to work with new ODKParsers() class
rsavoye Jun 15, 2024
0e42198
fix: Make sure value also isn't NULL
rsavoye Jun 15, 2024
5e93da4
Merge branch 'multi' of github.com:hotosm/osm-fieldwork into multi
rsavoye Jun 15, 2024
69 changes: 26 additions & 43 deletions osm_fieldwork/ODKInstance.py
@@ -18,14 +18,13 @@
#

import argparse
import json
import logging
import os
import re
import sys

# from shapely.geometry import Point, LineString, Polygon
from collections import OrderedDict

import flatdict
import xmltodict

# Instantiate logger
@@ -38,8 +37,8 @@ def __init__(
filespec: str = None,
data: str = None,
):
"""This class imports a ODK Instance file, which is in XML into a data
structure.
"""This class imports a ODK Instance file, which is in XML into a
data structure.

Args:
filespec (str): The filespec to the ODK XML Instance file
@@ -50,6 +49,7 @@ def __init__(
"""
self.data = data
self.filespec = filespec
self.ignore = ["today", "start", "deviceid", "nodel", "instanceID"]
Review comment (Member):
Probably nodel --> model?

Reply (Collaborator Author):
ODKInstance.py is about to be deleted, as parsers.py is now used instead. Also, the tags to ignore are now in the xforms.yaml file.

Reply (Collaborator Author, rsavoye, Jun 15, 2024):
Well, json2osm.py is about to be deleted; odk2osm.py now handles all 3 formats from ODK.

if filespec:
self.data = self.parse(filespec=filespec)
elif data:
@@ -59,7 +59,7 @@ def parse(
self,
filespec: str,
data: str = None,
):
) -> dict:
"""Import an ODK XML Instance file into a data structure. The input is
either a filespec to the Instance file copied off your phone, or
the XML that has been read in elsewhere.
@@ -69,9 +69,9 @@ def parse(
data (str): The XML data

Returns:
(list): All the entries in the IOPDK XML Instance file
(dict): All the entries in the OSM XML Instance file
"""
rows = list()
row = dict()
if filespec:
logging.info("Processing instance file: %s" % filespec)
file = open(filespec, "rb")
@@ -80,47 +80,29 @@ def parse(
elif data:
xml = data
doc = xmltodict.parse(xml)
import json

json.dumps(doc)
tags = dict()
data = doc["data"]
for i, j in data.items():
if j is None or i == "meta":
flattened = flatdict.FlatDict(data)
rows = list()
pat = re.compile("[0-9.]* [0-9.-]* [0-9.]* [0-9.]*")
for key, value in flattened.items():
if key[0] == "@" or value is None:
continue
print(f"tag: {i} == {j}")
pat = re.compile("[0-9.]* [0-9.-]* [0-9.]* [0-9.]*")
if pat.match(str(j)):
if i == "warmup":
continue
gps = j.split(" ")
tags["lat"] = gps[0]
tags["lon"] = gps[1]
if re.search(pat, value):
gps = value.split(" ")
row["lat"] = gps[0]
row["lon"] = gps[1]
continue
if type(j) == OrderedDict or type(j) == dict:
for ii, jj in j.items():
pat = re.compile("[0-9.]* [0-9.-]* [0-9.]* [0-9.]*")
if pat.match(str(jj)):
gps = jj.split(" ")
tags["lat"] = gps[0]
tags["lon"] = gps[1]
continue
if jj is None:
continue
print(f"tag: {i} == {j}")
if type(jj) == OrderedDict or type(jj) == dict:
for iii, jjj in jj.items():
if jjj is not None:
tags[iii] = jjj
# print(iii, jjj)
else:
print(ii, jj)
tags[ii] = jj
else:
if i[0:1] != "@":
tags[i] = j
rows.append(tags)
return rows

# print(key, value)
tmp = key.split(":")
if tmp[len(tmp) - 1] in self.ignore:
continue
row[tmp[len(tmp) - 1]] = value

return row


if __name__ == "__main__":
@@ -147,3 +129,4 @@ def parse(

inst = ODKInstance(args.infile)
data = inst.parse(args.infile)
# print(data)
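
The rewritten parse() above flattens the nested instance document, keeps only the last component of each key, pulls lat/lon out of any geopoint value, and skips the ignored tags. A minimal sketch of that idea follows. It is not the code from this PR: it walks the tree with the standard library's xml.etree.ElementTree instead of xmltodict and flatdict, the names parse_instance, IGNORE and GEOPOINT are hypothetical, and the ignore list is an assumed subset (the review thread says the real list now lives in xforms.yaml). The geopoint pattern is also stricter than the one in the diff: it is anchored and allows signed values, on the assumption that a geopoint is "lat lon altitude accuracy".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ignore list, standing in for the one kept in xforms.yaml.
IGNORE = {"today", "start", "deviceid", "instanceID"}

# Assumed geopoint layout: "lat lon altitude accuracy". The pattern is anchored
# so that ordinary text containing spaces is not mistaken for coordinates.
GEOPOINT = re.compile(r"^-?\d+(\.\d+)? -?\d+(\.\d+)? -?\d+(\.\d+)? \d+(\.\d+)?$")


def parse_instance(xml: str) -> dict:
    """Flatten an ODK instance document into one dict of leaf values."""
    row = {}
    root = ET.fromstring(xml)
    for elem in root.iter():
        # Only leaf elements carry answers; groups just nest other elements.
        if len(elem) > 0 or elem.text is None:
            continue
        value = elem.text.strip()
        if not value:
            continue
        # Drop any "{namespace}" prefix so only the local name remains.
        key = elem.tag.split("}")[-1]
        if GEOPOINT.match(value):
            lat, lon = value.split(" ")[:2]
            row["lat"], row["lon"] = lat, lon
            continue
        if key in IGNORE:
            continue
        row[key] = value
    return row
```

Returning a single dict rather than a list of dicts mirrors the change of the return annotation to dict in this file.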
36 changes: 25 additions & 11 deletions osm_fieldwork/convert.py
@@ -100,7 +100,7 @@ def privateData(
self,
keyword: str,
) -> bool:
"""Search he private data category for a keyword.
"""Search the private data category for a keyword.

Args:
keyword (str): The keyword to search for
@@ -207,14 +207,14 @@ def convertEntry(
# If the tag is in the config file, convert it.
if self.convertData(newtag):
newtag = self.convertTag(newtag)
if newtag != tag:
logging.debug(f"Converted Tag for entry {tag} to {newtag}")
# if newtag != tag:
# logging.debug(f"Converted Tag for entry {tag} to {newtag}")

# Truncate the elevation, as it's really long
if newtag == "ele":
value = value[:7]
newval = self.convertValue(newtag, value)
logging.debug("Converted Value for entry '%s' to '%s'" % (value, newval))
# logging.debug("Converted Value for entry '%s' to '%s'" % (value, newval))
# there can be multiple new tag/value pairs for some values from ODK
if type(newval) == str:
all.append({newtag: newval})
@@ -287,7 +287,7 @@ def convertTag(
if low in self.convert:
newtag = self.convert[low]
if type(newtag) is str:
logging.debug("\tTag '%s' converted tag to '%s'" % (tag, newtag))
# logging.debug("\tTag '%s' converted tag to '%s'" % (tag, newtag))
tmp = newtag.split("=")
if len(tmp) > 1:
newtag = tmp[0]
@@ -315,18 +315,20 @@ def convertMultiple(
Returns:
(list): The new tags
"""
tags = list()
tags = dict()
for tag in value.split(" "):
low = tag.lower()
if self.convertData(low):
newtag = self.convert[low]
# tags.append({newtag}: {value})
if newtag.find("=") > 0:
tmp = newtag.split("=")
tags.append({tmp[0]: tmp[1]})
if tmp[0] in tags:
tags[tmp[0]] = f"{tags[tmp[0]]};{tmp[1]}"
else:
tags.update({tmp[0]: tmp[1]})
else:
tags.append({low: "yes"})
logging.debug(f"\tConverted multiple to {tags}")
tags.update({low: "yes"})
# logging.debug(f"\tConverted multiple to {tags}")
return tags

def parseXLS(
@@ -396,6 +398,8 @@ def createEntry(
"action",
)

if key in self.ignore:
continue
# When using existing OSM data, there's a special geometry field.
# Otherwise use the GPS coordinates where you are.
if key == "geometry" and len(value) > 0:
@@ -412,8 +416,18 @@ def createEntry(
attrs[key] = value
# log.debug("Adding attribute %s with value %s" % (key, value))
continue

if value is not None and value != "no" and value != "unknown":
if key == "username":
tags["user"] = value
continue
items = self.convertEntry(key, value)
if key in self.types:
if self.types[key] == "select_multiple":
vals = self.convertMultiple(value)
if len(vals) > 0:
for tag in vals:
tags.update(tag)
continue
if key == "track" or key == "geoline":
# refs.append(tags)
# log.debug("Adding reference %s" % tags)
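
The convertMultiple() hunk above switches the return value from a list of one-item dicts to a single dict, and joins repeated keys with ";" instead of emitting duplicates. A self-contained sketch of that merging rule is below. The function name convert_multiple and the CONVERT table are hypothetical stand-ins for the method and for the mapping loaded from xforms.yaml, and the plain membership test is an assumption about what convertData() checks.

```python
# Hypothetical mapping from XLSForm choice names to OSM "key=value" strings,
# standing in for the conversion data loaded from xforms.yaml.
CONVERT = {
    "bbq": "amenity=bbq",
    "toilets": "amenity=toilets",
    "picnic": "leisure=picnic_table",
    "firepit": "firepit",
}


def convert_multiple(value: str, convert: dict = CONVERT) -> dict:
    """Turn a space-separated select_multiple answer into one dict of tags."""
    tags = {}
    for choice in value.split():
        low = choice.lower()
        if low not in convert:
            # Choices with no known conversion are dropped.
            continue
        newtag = convert[low]
        if "=" in newtag:
            key, val = newtag.split("=", 1)
            # Join repeated keys with ';' instead of overwriting them.
            tags[key] = f"{tags[key]};{val}" if key in tags else val
        else:
            # A bare choice with no value becomes a yes/no style tag.
            tags[low] = "yes"
    return tags
```

Because the result is already one dict, a caller can merge it into its own tags with a single tags.update(vals) call rather than looping over the items.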
145 changes: 3 additions & 142 deletions osm_fieldwork/CSVDump.py → osm_fieldwork/csvdump.py
@@ -25,10 +25,8 @@
import sys
from datetime import datetime

from geojson import Feature, FeatureCollection, Point, dump

from osm_fieldwork.convert import Convert
from osm_fieldwork.osmfile import OsmFile
from osm_fieldwork.support import basename
from osm_fieldwork.xlsforms import xlsforms_path

# Instantiate logger
@@ -59,124 +57,6 @@ def __init__(
self.entries = dict()
self.types = dict()

def lastSaved(
self,
keyword: str,
) -> str:
"""Get the last saved value for a question.

Args:
keyword (str): The keyword to search for

Returns:
(str): The last saved value for the question

"""
if keyword is not None and len(keyword) > 0:
return self.saved[keyword]
return None

def updateSaved(
self,
keyword: str,
value: str,
) -> bool:
"""Update the last saved value for a question.

Args:
keyword (str): The keyword to search for
value (str): The new value

Returns:
(bool): If the new value got saved

"""
if keyword is not None and value is not None and len(value) > 0:
self.saved[keyword] = value
return True
else:
return False

def createOSM(
self,
filespec: str,
):
"""Create an OSM XML output files.

Args:
filespec (str): The output file name
"""
log.debug("Creating OSM XML file: %s" % filespec)
self.osm = OsmFile(filespec)
# self.osm.header()

def writeOSM(
self,
feature: dict,
):
"""Write a feature to an OSM XML output file.

Args:
feature (dict): The OSM feature to write to
"""
out = ""
if "id" in feature["tags"]:
feature["id"] = feature["tags"]["id"]
if "lat" not in feature["attrs"] or "lon" not in feature["attrs"]:
return None
if "refs" not in feature:
out += self.osm.createNode(feature)
else:
out += self.osm.createWay(feature)
self.osm.write(out)

def finishOSM(self):
"""Write the OSM XML file footer and close it."""
# This is now handled by a destructor in the OsmFile class
# self.osm.footer()

def createGeoJson(
self,
filespec: str = "tmp.geojson",
):
"""Create a GeoJson output file.

Args:
filespec (str): The output file name
"""
log.debug("Creating GeoJson file: %s" % filespec)
self.json = open(filespec, "w")

def writeGeoJson(
self,
feature: dict,
):
"""Write a feature to a GeoJson output file.

Args:
feature (dict): The OSM feature to write to
"""
# These get written later when finishing , since we have to create a FeatureCollection
if "lat" not in feature["attrs"] or "lon" not in feature["attrs"]:
return None
self.features.append(feature)

def finishGeoJson(self):
"""Write the GeoJson FeatureCollection to the output file and close it."""
features = list()
for item in self.features:
if len(item["attrs"]["lon"]) == 0 or len(item["attrs"]["lat"]) == 0:
log.warning("Bad location data in entry! %r", item["attrs"])
continue
poi = Point((float(item["attrs"]["lon"]), float(item["attrs"]["lat"])))
if "private" in item:
props = {**item["tags"], **item["private"]}
else:
props = item["tags"]
features.append(Feature(geometry=poi, properties=props))
collection = FeatureCollection(features)
dump(collection, self.json)

def parse(
self,
filespec: str,
@@ -201,9 +81,9 @@ def parse(
tags = dict()
# log.info(f"ROW: {row}")
for keyword, value in row.items():
if keyword is None or (value and len(value) == 0):
if keyword is None or len(value) == 0:
continue
base = self.basename(keyword).lower()
base = basename(keyword).lower()
# There's many extraneous fields in the input file which we don't need.
if base is None or base in self.ignore or value is None:
continue
@@ -228,7 +108,6 @@ def parse(
if base == "longitude" and len(value) == 0:
value = row["warmup-Longitude"]
items = self.convertEntry(base, value)

# log.info(f"ROW: {base} {value}")
if len(items) > 0:
if base in self.saved:
@@ -253,24 +132,6 @@ def parse(
all_tags.append(tags)
return all_tags

def basename(
self,
line: str,
) -> str:
"""Extract the basename of a path after the last -.

Args:
line (str): The path from the json file entry

Returns:
(str): The last node of the path
"""
tmp = line.split("-")
if len(tmp) == 0:
return line
base = tmp[len(tmp) - 1]
return base


def main():
"""Run conversion directly from the terminal."""
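
This file now imports basename from osm_fieldwork.support instead of defining it as a method, and one commit title in this PR mentions splitting on ":" as well as "-". The body of the shared function is not shown in this diff, so the following is only a guess at its shape: a standalone helper that returns the last component of a path delimited by either character.

```python
import re


def basename(line: str) -> str:
    """Return the last component of a path delimited by '-' or ':'.

    Hypothetical sketch; the real osm_fieldwork.support.basename
    is not part of this diff and may differ.
    """
    # re.split always returns at least one element, so [-1] is safe
    # even when the input contains no delimiter at all.
    return re.split(r"[-:]", line)[-1]
```

Making it a module-level function lets the CSV, JSON and XML parsers share one definition without inheriting from a common class.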