Missing state/city in iOS first seen and active users tables prior to 5/17/2024 #6555

Open
data-sync-user opened this issue Nov 25, 2024 · 6 comments

Comments

@data-sync-user
Collaborator

data-sync-user commented Nov 25, 2024

https://sql.telemetry.mozilla.org/queries/102596/source

The state/city info is missing in mozdata.firefox_ios.baseline_clients_first_seen prior to 5/17/2024. Is there any other way we can get the new profiles in a specific city (like Chicago, IL) prior to that date?

┆Issue is synchronized with this Jira Story
┆Attachments: image-20241220-132922.png | image-20250116-145956.png


➤ George Kaberere commented:

Hey Krzysztof Ignasiak could you take a look at this ticket?


➤ Krzysztof Ignasiak commented:

Hey Alex He,

I was just having a look at this. Could you please provide more details about the problem you’re seeing? I just tried running the query, and the geo data appears to be there, going back as far as 2020. I specifically took a look at Chicago: https://sql.telemetry.mozilla.org/queries/104583/source#257586

!image-20241220-132922.png|width=1376,height=758,alt="image-20241220-132922.png"!

The query I used:

SELECT
  first_seen_date,
  COUNT(DISTINCT client_id) AS cnt
FROM
  mozdata.firefox_ios.baseline_clients_first_seen
WHERE
  submission_date < "2024-12-20"
  AND country = "US"
  AND city = "Chicago"
GROUP BY 1
ORDER BY 1 DESC


➤ Alex He commented:

Hi Kik, when I run the query below, it seems to me that the iOS new profiles data starts on May 17, 2024. No city level geo info was available prior to that.

!image-20250116-145956.png|width=1747,height=1083,alt="image-20250116-145956.png"!


➤ Krzysztof Ignasiak commented:

Alex He, OK, I had a look, and it appears that the geo_subdivision filter is what limits the data in your query to May 2024 onward. I will check when the geo_subdivision information was added. If the data is present in the stable ping for dates prior to May 2024, it just means the field was not added to our derived ETL pipeline until that date. At that point the question becomes whether it is worth the cost of recalculating all the baseline tables to make this field available. That would require both money and time.
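One way to see where the field starts being populated is to count profiles per day with and without a subdivision value. This is only a sketch: it assumes the derived table exposes the field as a column named `geo_subdivision` (the name used in this thread) and that missing values are stored as NULL rather than as an empty string or a placeholder.

```sql
-- Sketch, not a verified query: the column name geo_subdivision and the
-- NULL representation of missing values are assumptions.
SELECT
  first_seen_date,
  COUNT(DISTINCT client_id) AS profiles,
  COUNT(DISTINCT IF(geo_subdivision IS NOT NULL, client_id, NULL)) AS profiles_with_subdivision
FROM
  mozdata.firefox_ios.baseline_clients_first_seen
WHERE
  submission_date < "2024-12-20"
  AND country = "US"
  AND city = "Chicago"
GROUP BY
  first_seen_date
ORDER BY
  first_seen_date
```

If the second count is zero up to a given date and then tracks the first one, that date is when the field began to be written to the derived table.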


➤ Alex He commented:

For my study I joined the table with mozdata.org_mozilla_ios_firefox.baseline to get the geo_subdivision (state) info. The workaround works for now. If it would take a huge effort to fix, I don’t think it is worth it.
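For anyone hitting the same gap, a join of this kind might look roughly like the sketch below. It is not the query actually used in the study. The field paths `client_info.client_id`, `metadata.geo.country`, `metadata.geo.subdivision1`, and `metadata.geo.city`, the `"IL"` subdivision code, and the date range are all assumptions to check against the table schema before relying on the result.

```sql
-- Sketch of the join workaround; field paths and date range are assumptions.
WITH first_seen AS (
  SELECT
    client_id,
    first_seen_date
  FROM
    mozdata.firefox_ios.baseline_clients_first_seen
  WHERE
    submission_date < "2024-05-17"
),
ping_geo AS (
  -- Geo fields taken from the baseline ping table instead of the derived table.
  SELECT
    client_info.client_id AS client_id,
    DATE(submission_timestamp) AS ping_date,
    metadata.geo.country AS country,
    metadata.geo.subdivision1 AS geo_subdivision,
    metadata.geo.city AS city
  FROM
    mozdata.org_mozilla_ios_firefox.baseline
  WHERE
    DATE(submission_timestamp) BETWEEN "2024-01-01" AND "2024-05-16"
)
SELECT
  fs.first_seen_date,
  COUNT(DISTINCT fs.client_id) AS new_profiles
FROM
  first_seen AS fs
JOIN
  ping_geo AS g
  ON g.client_id = fs.client_id
  AND g.ping_date = fs.first_seen_date
WHERE
  fs.first_seen_date >= "2024-01-01"
  AND g.country = "US"
  AND g.geo_subdivision = "IL"
  AND g.city = "Chicago"
GROUP BY
  fs.first_seen_date
ORDER BY
  fs.first_seen_date
```

Matching on the first-seen day keeps the geo lookup tied to the ping that made the profile "new"; a client whose first-day pings came from several locations could still match through any one of them, and `COUNT(DISTINCT ...)` only prevents double counting within a day.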


➤ Krzysztof Ignasiak commented:

Hey Alex He, I just took a look, and this data does indeed exist in the stable dataset. The geo_subdivision field was only added to our derived ETL (which creates tables like baseline_clients_daily and baseline_clients_first_seen) on May 17th, funnily enough by me: b460280 (b4602805d9655a170b83ed4c30f401acfbd030c7)

If I recall correctly, it was added to make the global_outages dataset more accurate, and at the time there was no need to recalculate historical data. My main concern here is that we’d have to rebuild all of the baseline tables for this data to be available prior to May 2024 in the original table.
