
bug: Cache SQL columns and schemas does not work with a shared connector #2325

Closed
1 task done
raulbonet opened this issue Mar 18, 2024 · 7 comments · Fixed by #2352 or raulbonet/sdk#1
Labels
kind/Bug Something isn't working valuestream/SDK

Comments

@raulbonet
Contributor

raulbonet commented Mar 18, 2024

Singer SDK Version

0.35.2

Is this a regression?

  • Yes

Python Version

3.9

Bug scope

Taps (catalog, state, etc.)

Operating System

Linux

Description

Meltano throws an exception whenever there is a schema change within the same run.

Schema changes no longer seem to work since the release of the caching feature.

Whenever a schema changes, the connector's cache should no longer be valid. But currently, upon sink initialization, the connector is reused, cache included.

Indeed, if you look at the add_sqlsink() method, the sink is initialized with the existing connector:

class SQLTarget(Target):
    def get_sink(self, stream_name, *, record=None, schema=None, key_properties=None):
        # This part of the code calls `add_sqlsink()`
        ...

    def add_sqlsink(self, stream_name, schema, key_properties=None):
        sink = sink_class(
            target=self,
            stream_name=stream_name,
            schema=schema,
            key_properties=key_properties,
            # HERE: the Sink is being initialized with the EXISTING connector
            connector=self.target_connector,
        )
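To make the failure mode concrete, here is a minimal, self-contained sketch. The CachingConnector class and its attributes are invented for illustration and are not actual SDK names: a connector that memoizes column metadata keeps serving stale columns after a mid-run schema change, which is what happens when the shared connector is reused.

```python
# Hypothetical illustration of the stale-cache problem; names are invented
# and do not correspond to real Meltano SDK classes.

class CachingConnector:
    """Connector that memoizes column metadata per table."""

    def __init__(self, db: dict):
        self._db = db              # stand-in for a live database
        self._column_cache: dict = {}

    def get_table_columns(self, table: str) -> dict:
        # The first call "queries the database"; later calls return the
        # cached copy, even if the table's schema changed in the meantime.
        if table not in self._column_cache:
            self._column_cache[table] = dict(self._db[table])
        return self._column_cache[table]


db = {"users": {"id": "int"}}
conn = CachingConnector(db)
print(conn.get_table_columns("users"))   # {'id': 'int'}

db["users"]["email"] = "text"            # schema change during the run
print(conn.get_table_columns("users"))   # still {'id': 'int'} -- stale
```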

Code

No response

@raulbonet raulbonet added kind/Bug Something isn't working valuestream/SDK labels Mar 18, 2024
@raulbonet
Contributor Author

As per @pnadolny13's request here, I opened an issue for this.

I have been giving all of this some thought, and with the current version of the SDK I am not sure we can optimize the calls to prepare_columns(), get_column_type(), etc. any further, which I understand was the original reason for developing the caching.

  1. The mentioned methods are, I think, called only as part of the prepare_table() method. But this method is only called by the add_sink() method (or I think that's how it should be, as I mention here), which in turn is only called by the get_sink() method.

  2. But the get_sink() method already checks for schema changes, and only calls add_sink() when there has been a schema change. If there were no changes, an existing sink is returned.

And if there has been a schema change, I want to invalidate the cache.

So I don't think that this process can be optimized any more, and therefore the caching cannot be done.
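The dispatch described in points 1-2 can be sketched as follows. This is a simplified model with invented names, not the real SDK signatures: a new sink (and with it the prepare_* calls) is only ever built when the schema changed, which is precisely when cached metadata is already stale.

```python
class MiniTarget:
    """Simplified model of the sink dispatch (hypothetical names)."""

    def __init__(self):
        self._sinks = {}
        self.add_sink_calls = 0

    def add_sink(self, stream_name, schema):
        # Fresh sink: this is where prepare_table() -> prepare_columns()
        # would run, and exactly where any cached metadata is stale.
        self.add_sink_calls += 1
        sink = {"stream": stream_name, "schema": schema}
        self._sinks[stream_name] = sink
        return sink

    def get_sink(self, stream_name, schema):
        existing = self._sinks.get(stream_name)
        if existing is not None and existing["schema"] == schema:
            # No schema change: reuse the sink; prepare_* is never re-run,
            # so the cache has nothing to save.
            return existing
        return self.add_sink(stream_name, schema)


t = MiniTarget()
s1 = t.get_sink("users", {"id": "int"})
s2 = t.get_sink("users", {"id": "int"})              # same schema: reused
t.get_sink("users", {"id": "int", "email": "text"})  # change: new sink
print(s1 is s2, t.add_sink_calls)                    # True 2
```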

Any thoughts? I am new to the Meltano SDK, so maybe I am misunderstanding the code entirely!

@edgarrmondragon
Collaborator

cc @BuzzCutNorman in case you have thoughts about this

@BuzzCutNorman
Contributor

BuzzCutNorman commented Mar 20, 2024

@edgarrmondragon I don't have any initial thoughts. I will need to find a way to reproduce the issue.

@raulbonet
Contributor Author

raulbonet commented Mar 22, 2024

I agree with the approach and reasoning of @BuzzCutNorman: from an OOD perspective, the SQLConnector class was intended to serve as a wrapper/interface around the connection, encapsulating common operations like "rename", "delete" or "get_ddl".

I think the SQLConnector implementation then has to stay as close as possible to the underlying database connector. It makes sense to leverage connection pooling, as the underlying connection does.

But I wonder whether caching logic is the connector's responsibility at all. In any case, I don't think any caching can be done, as I explained above.

Would it be OK if I just remove the caching implemented here?

@BuzzCutNorman
Contributor

@raulbonet Thank you for creating the issue and starting the conversation. I understand what you are saying in theory; I just have not had time to reproduce the issue you are describing. Once I can observe it and work with it in a testing scenario, I can attempt to have a knowledgeable discussion on ways to resolve this. If you could provide the error you are seeing, or its traceback, that would be very helpful. My guess is that you are running your PR version of the meltanolabs target-postgres, but I don't know whether you see the error when you run the test suite or when you feed target-postgres data from a particular tap.

@raulbonet
Contributor Author

Hello @BuzzCutNorman and company,

Sorry, I have been sick and unable to follow up on this until now. In this PR I replaced (almost) all the custom overridden methods of target-postgres with the native Meltano SDK functions.

If you do:

docker compose up
pytest

You will see that 2 tests of the test suite fail:

  • TestTargetPostgres.test_target_schema_updates
  • test_schema_updates

I am attaching the full log. Interestingly, the tests only fail once: the test fails but still manages to create the new column, so if the tests are re-executed without restarting the Postgres instance, they succeed.

If I remove the part of the code belonging to the caching, the tests are successful.
pytest.log

@raulbonet
Contributor Author

Closed by mistake when merging to my personal fork, re-opened.
