-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for replace option when using insert_rows with MsSqlHook #40836
Added support for replace option when using insert_rows with MsSqlHook #40836
Conversation
… can also support the replace option and generate a MERGE INTO statement for MSSQL so replace with insert_rows also works on MSSQL
…de, just an sql file
@potiuk Is there a way to ignore the required license header in sql files within tests? Searched through the project but didn't find any example, saw other sql file and seemed that one wasn't impacted so wonder why |
Never mind the hive.sql also has the header, problem is I use the content of the sql for my assertion in the test, as I could have it as a python string as it would be sensitive to formatting and thus make false assertion failures |
I've added the license header in the sql file and written a function which filters out the license header so the assertion still works |
… header whe reading a file with the load_file method from conftest module
It's a pitty that I can't use the functools.lru_cache decorator? I could use the cachetools cache decorator, but then that would add an extra dependency. Dunno why unctools.lru_cache is banned, as per default it has a maxsize of 128 entries per cached method? This method needs to be cache as it will be called multiple times per chunked insert rows, otherwise we could end up having the same issue as in #40609. A decorator also looks nicer instead of manually implementing it with a class member as it is a cross cutting concern. |
Functools.lru_cache should not be used on object methods only on functions. https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-instance-methods-release-object And we are using methodtools.lru_cache instead (already our dependency). See #37757. Maybe a good idea (feel free to add PR) is to update the description of error and suggest to use methodtools insteadl |
Thank you Jarek good to know didn't catch that one but should have ;) |
Integration test for mssql is failing, but don't know why cause I didn't change anything related to connection here for MsSqlHook.
|
Not the most stable server via docker compose - it happens. Rerun it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature.
This PR adds support for the replace option when using insert_rows with the MsSqlHook. Unfortunately MSSQL doesn't have a REPLACE or an UPSERT SQL syntax. In order to achieve the same functionality, you have to use the MERGE INTO statement. In order to be able to use that statement in a dynamic way, you have to get the primary keys of the table you want to replace into. This means the current implementation of the _generate_insert_sql with the _replace_statement_format field won't work, as some assumptions have been made there which makes it impossible to achieve the same with the current _generate_insert_sql implementation of the DbApiHook.
So to make it work for MsSqlHook, I've override the _generate_insert_sql method in the MsSqlHook and also added a method named get_primary_keys which dynamically retrieves the PK associated to the given table as this will be needed for the MERGE INTO SQL statement. If the _generate_insert_sql method is called for insert statement, then the original implementation is being called, otherwise the MSSQL specific implementation is being executed.
Another advantage is that you can benefit from fast performance while upserting with the executemany option.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.