BI-OA: Open Access KPI #7

Open
9 of 11 tasks
ErnestaP opened this issue Apr 22, 2024 · 1 comment

Comments

ErnestaP commented Apr 22, 2024

Collect from Anne all the queries needed to gather the data (Google Doc):

  • Closed access
  • Bronze access
  • Green access
  1. After collecting feedback from Anne, we found that the initial query was not fully correct; however, Anne is not sure what the required query should look like. S̶h̶e̶ ̶c̶o̶n̶t̶a̶c̶t̶e̶d̶ ̶C̶D̶S̶.̶ ̶I̶f̶ ̶t̶h̶e̶y̶ ̶a̶r̶e̶n̶'̶t̶ ̶b̶e̶ ̶a̶b̶l̶e̶ ̶t̶o̶ ̶p̶r̶o̶v̶i̶d̶e̶ ̶t̶h̶e̶ ̶q̶u̶e̶r̶y̶ ̶w̶e̶ ̶n̶e̶e̶d̶,̶ ̶w̶i̶l̶l̶ ̶n̶e̶e̶d̶ ̶t̶o̶ ̶w̶r̶i̶t̶e̶ ̶t̶h̶e̶ ̶p̶a̶r̶s̶e̶r̶ ̶w̶h̶i̶c̶h̶ ̶w̶i̶l̶l̶ ̶h̶e̶l̶p̶ ̶t̶o̶ ̶g̶e̶t̶ ̶d̶a̶t̶a̶ ̶w̶e̶ ̶n̶e̶e̶d̶. We need to write a parser. In this case: green_open_access = these_articles - overlapping gold_open_access_articles
  • Gold access
  1. The same problem as before. We need a query that lets us select the articles that have 540__3:'publication' AND 540__a:'CC-BY' in the same 540 field, or 540__3:'publication' AND 540__a:'CC BY' in the same 540 field. The current query returns articles that contain the required subfields, but not necessarily within the same 540 field.
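
The same-field check and the green/gold subtraction described above could be sketched roughly as follows. This is a minimal illustration, not the parser that was eventually written: it assumes the records are available as MARCXML, the sample records are invented, the helper names (`local_name`, `record_id`, `is_gold`) are hypothetical, and treating any licence that starts with "CC BY" or "CC-BY" as a match is an assumption about how the original queries were meant to behave.

```python
import xml.etree.ElementTree as ET

# Invented sample records in MARCXML form; real CDS records carry many more fields.
SAMPLE = """
<collection>
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="540" ind1=" " ind2=" ">
      <subfield code="3">publication</subfield>
      <subfield code="a">CC-BY-4.0</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">2</controlfield>
    <datafield tag="540" ind1=" " ind2=" ">
      <subfield code="3">publication</subfield>
      <subfield code="a">Publisher licence</subfield>
    </datafield>
    <datafield tag="540" ind1=" " ind2=" ">
      <subfield code="3">preprint</subfield>
      <subfield code="a">CC BY 4.0</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">3</controlfield>
  </record>
</collection>
"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}datafield'."""
    return tag.rsplit("}", 1)[-1]


def record_id(record):
    """Return the value of controlfield 001, or None if it is missing."""
    for child in record:
        if local_name(child.tag) == "controlfield" and child.get("tag") == "001":
            return (child.text or "").strip()
    return None


def is_gold(record):
    """True only if ONE 540 field holds both 3='publication' and a CC BY licence."""
    for field in record:
        if local_name(field.tag) != "datafield" or field.get("tag") != "540":
            continue
        material = [(s.text or "").strip().lower()
                    for s in field if s.get("code") == "3"]
        # Normalise 'CC-BY' and 'CC BY' to the same spelling (assumed matching rule).
        licences = [(s.text or "").strip().upper().replace("-", " ")
                    for s in field if s.get("code") == "a"]
        if "publication" in material and any(l.startswith("CC BY") for l in licences):
            return True
    return False


root = ET.fromstring(SAMPLE)
records = [el for el in root.iter() if local_name(el.tag) == "record"]

# Here every sample record stands in for "these_articles" from the green query.
candidate_ids = {record_id(r) for r in records}
gold_ids = {record_id(r) for r in records if is_gold(r)}
green_ids = candidate_ids - gold_ids
```

Record 2 is the case the current query gets wrong: it has 'publication' and a CC BY licence, but in two different 540 fields, so the sketch does not count it as gold.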

Setup Airflow workflow:

  • Add tables migration
  • Write a DAG

DB

  • Connect Airflow workflow to DB
  • Connect Superset BI to DB

Feedback regarding results

  • Ask Anne for feedback regarding the results we are getting from CDS. Adjust if needed

Colors

Anne wants to have specific colors for the pie chart pieces:

closed_access -> 🔵
green_open_access -> 🟤
bronze_open_access -> 🟢
gold_open_access -> 🟡

However, there are limitations on the Superset side. Most likely we will need to add configuration on the Kubernetes side:
chat
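
One possible shape for such a configuration, assuming it ends up as a `superset_config.py` override mounted into the Superset deployment, is a custom categorical color scheme. The sketch below uses Superset's `EXTRA_CATEGORICAL_COLOR_SCHEMES` setting; the scheme id, the label, and the hex values are placeholders chosen to approximate the requested colors and do not come from this issue.

```python
# Placeholder hex values approximating the requested colors (assumptions).
ACCESS_COLORS = {
    "closed_access": "#1E88E5",       # blue
    "green_open_access": "#795548",   # brown
    "bronze_open_access": "#43A047",  # green
    "gold_open_access": "#FDD835",    # yellow
}

# Custom scheme that would be picked up from superset_config.py.
EXTRA_CATEGORICAL_COLOR_SCHEMES = [
    {
        "id": "openAccessKpi",  # hypothetical scheme id
        "description": "Colors for the Open Access KPI pie chart",
        "label": "Open Access KPI",
        "isDefault": False,
        "colors": list(ACCESS_COLORS.values()),
    }
]
```

A color scheme only supplies an ordered palette, so on its own it may not pin each color to a specific category. Pinning by label is usually done through the `label_colors` key in the dashboard's JSON metadata, which could reuse the same mapping.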


Deploy

  • Deploy to QA
  • Deploy to PROD
@ErnestaP
Author

In the end, we could not get the data we wanted from CDS using queries alone. Instead, we decided to fetch a larger set of records and parse it to extract the green and gold open access articles.
PR: cern-sis/bi-dags#28
Ticket: https://cern.service-now.com/service-portal?id=ticket&table=u_request_fulfillment&n=RQF2650918
