Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDL-13967: Add ProfitAndLoss report stream #37

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

savan-chovatiya
Copy link
Contributor

@savan-chovatiya savan-chovatiya commented Jul 26, 2021

Description of change

TDL-13967: Add ProfitAndLoss report stream

Finalized approach:

  • Fetch records for a minimum of 30 days if the bookmark is used from the state in sync.
    • If the bookmark is more than 30 days away then fetch all records up to the current day.
    • If the bookmark is less than 30 days away then fetch records for the last 30 days.
  • Fetch records from start date to current date if the start date is used in sync.
  • Bookmark is required so consider it as an INCREMENTAL stream.
  • Consider ‘ReportDate’ as a primary key and replication key.
  • Only consider Accrual accounting method for the report as of now.

Manual QA steps

  • Verified that record for every day is emitted from the start date to current date in start date scenario.
  • Verified that a minimum 30-day record is emitted if a bookmark from the state is used.

Risks

Rollback steps

  • revert this branch

@luandy64 luandy64 force-pushed the TDL-13967-add-profit-loss-report-stream branch from 8f786ff to b7b19cd Compare August 26, 2021 15:58
@luandy64
Copy link

luandy64 force-pushed the TDL-13967-add-profit-loss-report-stream branch from 8f786ff to b7b19cd

Ignore me. I was trying to reason about the report parser and wrote a test, but did not see that one was written in test_report_parser.py

params["end_date"] = end_tm_str

resp = self.client.get(self.endpoint, params=params)
self.parse_report_metadata(resp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having parse_report_metadata() be recursive is excessive. The response object here is just three keys, Header (which we ignore), Rows, and Columns.

Parsing resp['Columns'] seems like a loop over resp['Columns']['Column'] and filtering the 'Metadata'.

Parsing resp['Rows'] is the only recursive part, so we can repurpose most of the logic in parse_report_metadata() into parse_rows() or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split down parse_report_metadata() into two functions parse_report_columns() and parse_report_rows().

parse_report_columns() will parse resp['Columns'] with a loop over resp['Columns']['Column'] and filters Metadata.

parse_report_rows() is the recursive one which will parse resp['Rows'].

'data': []
}

start_tm_str = strftime(start_dttm)[0:10]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use str(start_dttm.date()) to get date

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated date retrieval as per suggestion.

@savan-chovatiya savan-chovatiya requested review from zachharris1 and removed request for KAllan357 September 23, 2021 10:44
columns = pileOfColumns.get('Column', [])
for column in columns:
metadatas = column.get('MetaData', [])
for md in metadatas:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer instead of using md have more descriptive variable name like metadata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated md with proper variable name metadata

'''

if isinstance(pileOfRows, list):
for x in pileOfRows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop using variable names like x. Please provide more descriptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated x with descriptive name row

self.parse_report_rows(pileOfRows['Summary'])

if 'ColData' in pileOfRows.keys():
d = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop using variable names like d. Provide more descriptive variable name. Bad coding practice

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep in mind using descriptive variable names and updated above d with a proper name.

Comment on lines 322 to 325
for x in pileOfRows['ColData'][1:]:
vals.append(x['value'])
d['values'] = vals
self.parsed_metadata['data'].append(d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No x, d variable names. Provide descriptive names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated x,d types of variables with proper descriptive names.

reports = self.day_wise_reports()
if reports:
for report in reports:
yield report
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between yield report and yield from report? I've seen in many places using yield from report

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • yield from reports and yield inside for loops, both return generators with every element of reports.

  • The python generator will flush out after iteration over it and here, reports is a generator, and tap use last report from reports for storing bookmark.

  • If we use yield from reports then no way to retrieve the last report after it so we used yield with for loop here to utilize the loop's local variable report to retrieve the last report.

self.state = singer.write_bookmark(self.state, self.stream_name, 'LastUpdatedTime', strptime_to_utc(report.get('ReportDate')).isoformat())

@@ -193,6 +198,151 @@ class Vendors(Stream):
table_name = 'Vendor'
additional_where = "Active IN (true, false)"

class ReportStream(Stream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more comments to the code to explain at each section what is being done

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments to the code for better code explanation.

Comment on lines +309 to +313
if 'Rows' in pileOfRows.keys():
self.parse_report_rows(pileOfRows['Rows'])

if 'Row' in pileOfRows.keys():
self.parse_report_rows(pileOfRows['Row'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between Rows and Row here?

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rows is top-level information about profit and loss reports while Row is information about entries of reports. Documentation of the profit and loss report's metadata: documentation.

Copy link
Contributor

@kspeer825 kspeer825 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion.

@@ -230,3 +244,14 @@ def minimum_record_count_by_stream(self):
record_counts["vendors"] = 26

return record_counts

def dt_to_ts(self, dtime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an exception after this for loop would prevent the test from getting back a None and failing in a confusing way in the case where the datetime format is unaccounted for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants