Jiraone delta extraction #116

Open
juliariza opened this issue Sep 4, 2023 · 2 comments
juliariza commented Sep 4, 2023

Hello!

I have been using the jiraone Python module to extract the change history of issues.
Example code from docs:

from jiraone import LOGIN, PROJECT

user = "email"
password = "token"
# The URL must be a quoted string
link = "https://yourinstance.atlassian.net/"
LOGIN(user=user, password=password, url=link)

if __name__ == '__main__':
    # Extract the change history of every issue matched by the JQL
    jql = "project in (PYT) ORDER BY Rank DESC"
    PROJECT.change_log(jql=jql)

It's great, except that every time I run it, it extracts from the beginning, which takes a long time. I was wondering whether it is possible to extract just the delta/updates rather than all of the information.

Thanks!

@princenyeche
Owner

Hi @juliariza
The change_log method doesn't do that, but you can alter the JQL so that it only returns issues that have been updated. To your point, though, it will still extract the entire history of those issues. I think it is time for me to update that method so that multiple runs can append to the same document.
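As a rough illustration of the JQL approach, the sketch below builds a query that only matches issues updated since a given time. The helper name `delta_jql` and the chosen date are made up for this example and are not part of jiraone. Jira interprets the date in the timezone of the user running the search, so treat the cutoff as approximate.

```python
from datetime import datetime


def delta_jql(project_key: str, since: datetime) -> str:
    """Build a JQL query for issues in a project updated at or after `since`.

    Hypothetical helper for illustration; it is not part of jiraone.
    """
    # JQL accepts dates in the form "yyyy-MM-dd HH:mm"
    stamp = since.strftime("%Y-%m-%d %H:%M")
    return f'project in ({project_key}) AND updated >= "{stamp}" ORDER BY Rank DESC'


jql = delta_jql("PYT", datetime(2023, 9, 1, 0, 0))
print(jql)
# After LOGIN(...), the query can be passed on as before:
# PROJECT.change_log(jql=jql)
```

This narrows which issues are fetched, but each matched issue is still extracted with its full history.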

@princenyeche princenyeche self-assigned this Sep 21, 2023
@princenyeche princenyeche added the enhancement New feature or request label Sep 21, 2023
@princenyeche
Owner

Hey @juliariza

About your initial ask, I think it's doable, but there are 3 problems to solve. Each extraction would have to store some state in order to know:

  • When a specific issue key was last updated, so that it can be compared with what's in the Jira environment
  • Which history item was last recorded per issue key, and whether new items exist whenever it is checked (this is how to know the delta)
  • The filename that was used to store the history data, and where in the saved file the new insertion should start when a new history item is found.
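To make the three points above more concrete, here is a minimal sketch of what such stored state could look like. This is not how jiraone works and none of these functions exist in the library. The JSON layout, the function names and the sample values are all assumptions, and the sketch assumes each history item carries a numeric, increasing "id". Finding the insertion point inside the saved file is not covered; the state only remembers the filename.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved extraction state, or an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"file": None, "issues": {}}


def save_state(path: Path, state: dict) -> None:
    """Persist the state so the next run can compare against it."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def changed_keys(state: dict, current: dict) -> list:
    """Issue keys whose `updated` value differs from the stored one.

    `current` maps issue key -> `updated` string as reported by Jira.
    """
    saved = state["issues"]
    return sorted(
        key for key, updated in current.items()
        if saved.get(key, {}).get("updated") != updated
    )


def new_history_items(state: dict, key: str, history: list) -> list:
    """History items newer than the last one stored for `key`.

    Assumes each item has a numeric, increasing "id".
    """
    last_id = state["issues"].get(key, {}).get("last_history_id", 0)
    return [item for item in history if int(item["id"]) > last_id]


# Hypothetical state: one issue was already extracted into sample.csv
state = {
    "file": "sample.csv",
    "issues": {"PYT-1": {"updated": "2023-09-01T10:00", "last_history_id": 10010}},
}
current = {"PYT-1": "2023-09-04T08:30", "PYT-2": "2023-09-03T12:00"}
print(changed_keys(state, current))

history = [{"id": "10010"}, {"id": "10020"}]
print(new_history_items(state, "PYT-1", history))
```

Only the keys returned by `changed_keys` would need a new request, and only the items returned by `new_history_items` would need to be written.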

While I like the challenge of creating such a feature, I don't think I will be doing that anytime soon. However, with the new version 0.7.9, you can run the history extraction as asynchronous requests, which reduces the long waiting time.
For example: minimal style

from jiraone import LOGIN, PROJECT

# LOGIN(...) and jql are set up as in the example at the top of this issue
PROJECT.async_change_log(
    jql, folder="TEST", file="sample.csv"
)

If you need the extraction to be faster, you can increase the number of workers that run extraction requests simultaneously; the default is 4.
For example: comprehensive style

from jiraone import LOGIN, PROJECT

# LOGIN(...) and jql are set up as in the example at the top of this issue
PROJECT.async_change_log(
    jql, folder="TEST", file="sample.csv", workers=20, flush=10
)

How it works

Let's say your JQL search returns 100 issues. The code above takes 20 issue keys from that list and runs their requests at the same time. It then does this 4 more times, in batches of 20 requests, instead of the 1 request at a time that the change_log method makes, so the data is extracted faster.

You can increase the number of workers to 50 or even 100. However, it is recommended to keep it at a number that doesn't use too much CPU or send too many requests to your Jira environment. The flush argument adds a delay, in seconds, to let any asynchronous requests that are still running finish before the file is written to disk.
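The batching idea can be pictured with the standard library alone. The sketch below is only an illustration of the concept, not jiraone's actual implementation. `fetch_history` is a made-up stand-in for the per-issue request, and the thread pool is my own choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items: list, size: int):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_history(key: str) -> str:
    """Hypothetical stand-in for the request that fetches one issue's history."""
    return f"history for {key}"


def extract(keys: list, workers: int = 4) -> list:
    """Process the keys in batches of `workers`, each batch running concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(keys, workers):
            # pool.map keeps input order; extend() waits for the whole batch
            results.extend(pool.map(fetch_history, batch))
    return results


keys = [f"PYT-{n}" for n in range(1, 101)]  # 100 issue keys
print(len(list(batches(keys, 20))))  # 5 batches of 20
print(len(extract(keys, workers=20)))  # 100 results
```

With 100 keys and 20 workers this gives 5 batches, matching the "first batch plus 4 more" described above.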

I believe this will improve performance if you're extracting data frequently.
