-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Common: Upgrade yauaa to 7.x #678
Conversation
oguzhanunlu
commented
Aug 16, 2022
•
edited by peel
Loading
edited by peel
- add yauaa_context/1-0-4 schema to commonFs2 test/iglu-client-embedded schemas
- change com.snowplowanalytics.snowplow.enrich.common.fs2.enrichments.YauaaEnrichmentSpec to use new version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the original issue (#639) is about more than simply bumping the yauaa library version. I believe the issue was also requesting us to make a code change to use a new feature of yauaa. See my comment in the issue.
definitely, I'll update this PR, thanks! |
a1cdb6e
to
f533bce
Compare
@@ -9,7 +9,7 @@ | |||
"vendorPrefixes": [ "com.snowplowanalytics" ], | |||
"connection": { | |||
"http": { | |||
"uri": "http://iglucentral.com" | |||
"uri": "http://iglucentral-dev.com.s3-website-us-east-1.amazonaws.com/feature/yauaa-1-0-4/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line will be removed when the iglu central PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this comment, we're supposed to be filtering out a field. But I didn't see any changes to that effect.
In addition, I've added a few comments about things that can be improved. The main one being, that at present, if we have an explicit null
or ''
useragent, then we ignore all the client hint headers. I just want to ensure that this is indeed what we want to be doing.
@@ -40,7 +40,8 @@ object Dependencies { | |||
val hikariCP = "5.0.1" | |||
val jaywayJsonpath = "2.7.0" | |||
val iabClient = "0.2.0" | |||
val yauaa = "5.23" | |||
val yauaa = "7.7.0" | |||
val log4jToSlf4j = "2.18.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this required by the yauaa bump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the related issue in yauaa.
|
||
/** | ||
* Gets the map of attributes retrieved by YAUAA from the user agent. | ||
* @return Map with all the fields extracted by YAUAA by parsing the user agent. | ||
* If the input is null or empty, a map with just the DeviceClass set to Unknown is returned. | ||
*/ | ||
def parseUserAgent(userAgent: String): Map[String, String] = | ||
def parseUserAgent(userAgent: String, headers: List[String]): Map[String, String] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's annoying but it feels like this could be named better, since it now parses more than the useragent string. Perhaps something like extractUserAgent
.
The userAgent
arg is presumably coming from being explicitly added as a field in the event, rather than from a header. If it's null
or empty, we are ignoring all headers. Is that what we want?
Alternatively, if the userAgent
is coming from headers, then this function can maybe only take a single headers: List[String]
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The userAgent
field can come from two places:
- Extracted from the header
- Explicitly set as a field in the event.
By the time we reach this enrichment, we have dropped information about where userAgent
originally came from. This cannot be easily changed without quite a big re-write of the enrichment logic.
If the user agent is null or empty, then this request did not original from a browser. So it is fine to ignore client hint headers, because they are only relevant for web.
We cannot change the function to take a single headers
field. Imagine the case of server-side tracking where the headers contains the user agent of the server, but the tp2 payload explicitly sets the user agent of the end client. In this case, we need to override the User-Agent
header in the list. I put in a test for this case, so we don't get any accidental regressions if someone tries to change the code.
Renaming the function is a good idea. How about analyzeUserAgent
-- afterall that's what the 3rd a
in yauaa stands for.
} | ||
|
||
/** Yauaa 7.x added many new fields which are not in the 1-0-4 schema */ | ||
private val validFields = Set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we get these fields from the properties of the schema? These hardcoded values will not be fun to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this too! If you think more widely, it's a bigger piece of functionality that is currently missing from Iglu libs:
Given some json (the output of yauaa enrichment) and given a schema (the yauaa 1-0-4) schema, cast the json so it is valid against the schema. In this case, drop fields and keep just the valid fields.
I did not implement it here for a few reasons:
- Yauaa enrichment would then have a dependency on Iglu client, to get the schema. It makes the code change quite a lot bigger, because it introduces a
F[_]
and we have to start handling failures to fetch the schema. - It is a temporary problem that we cannot upgrade the yauaa schema to add extra fields. Once that problem is fixed then we don't need to maintain this list.
It's a balance. Weighing up 1 vs 2, and considering we have users waiting for this feature, I decided to implement it the easy way.
@@ -0,0 +1,233 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we have a duplicate of this schema here and in common-fs2/src/test
. Can we not use a single file? And do we need it hardcoded once the iglu-central PR is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the tip of something bigger. We have a lot of iglu central schemas hardcoded into Enrich's test resources. Several of them are duplicated between common
and commonFs2
.
Yes I could deduplicate those duplicates, but then it would look a bit odd that commonFs2
has some gaps in the long list of schemas that it maintains.
do we need it hardcoded once the iglu-central PR is merged?
I like how they are locally available when the tests run. For unit tests especially, it is good to avoid going to the network. But that doesn't mean they need to be hardcoded in Enrich!
One solution could be to download selected iglu central schemas before the tests run, and add them to a shared resources directory so they are available for all tests. It could be implemented as a sbt task, so that test
depends on this new task. It would provide a better guarantee that our code is valid against the real Iglu Central.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be great! It exposes us to some rogue iglu central patch; but that seems less likely than us forgetting to update a schema somewhere in the hardcoded files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to avoid confusion: do you want to do it now or is that an idea for the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea for the future, but I will create an issue now.
We are filtering out the field. This is the relevant line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
b907e1f
to
dbe12a2
Compare
Will merge as soon as the Iglu Central schema is released. |
dbe12a2
to
66dbeea
Compare