HAWQ-1605. Support INSERT in PXF JDBC plugin
(closes apache#1353)

Fix incorrect TIMESTAMP handling

PXF JDBC plugin update

* Add support for INSERT queries:
    * The INSERT queries are processed by the same classes as the SELECT queries;
    * INSERTs are processed by the JDBC PreparedStatement;
    * INSERTs support batching (by means of JDBC);
* Minor changes in WhereSQLBuilder and JdbcPartitionFragmenter:
    * Removed 'WHERE 1=1';
    * The same pattern of spaces around operators everywhere ('a = b', not 'a=b');
    * JdbcPartitionFragmenter.buildFragmenterSql() made static to avoid extra checks of InputData (proposed by @sansanichfb);
* Refactoring and some microoptimizations;

PXF JDBC refactoring

* The README.md is completely rewritten;
* Lots of changes in comments and javadoc comments;
* Code refactoring and minor changes in codestyle

Fixes proposed by @sansanichfb

Add DbProduct for Microsoft SQL Server

Notes on consistency in README and errors

* Add an explicit note on consistency of INSERT queries (it is not guaranteed).
* Change error message on INSERT failure
* Minor corrections of README

The fixes were proposed by @sansanichfb

Improve WhereSQLBuilder

* Add support of TIMESTAMP values;
* Add support of operations <>, LIKE, IS NULL, IS NOT NULL.

Fix proposed by @sansanichfb

Throw an exception when trying to open an already open connection when writing to an external database using `openForWrite()`.

Although the behaviour is different in case of `openForRead()`, it does not apply here. The second call to `openForWrite()` could be made from another thread, and that would result in a race: the `PreparedStatement` we use to write to an external database is the same object for all threads, and the procedure `writeNextObject()` is not `synchronized` (or "protected" some other way).
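
The guard described above can be sketched as follows. This is a minimal illustration only: the class `WriteGuardSketch`, the method `closeForWrite()`, the flag `openedForWrite`, and the choice of `IllegalStateException` are assumptions made for the example and are not taken from the plugin code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the accessor; it is not the plugin's JdbcAccessor. */
public class WriteGuardSketch {
    // Set once openForWrite() succeeds. compareAndSet makes "check and mark" one atomic
    // step, so two threads cannot both pass the check.
    private final AtomicBoolean openedForWrite = new AtomicBoolean(false);

    public boolean openForWrite() {
        if (!openedForWrite.compareAndSet(false, true)) {
            // Assumed exception type; the commit message only says "throw an exception".
            throw new IllegalStateException("The connection is already open for write");
        }
        // A real accessor would open the JDBC connection and prepare the INSERT here.
        return true;
    }

    public void closeForWrite() {
        // Allow a later openForWrite() once the write session is finished.
        openedForWrite.set(false);
    }

    public static void main(String[] args) {
        WriteGuardSketch accessor = new WriteGuardSketch();
        accessor.openForWrite();
        try {
            accessor.openForWrite(); // second call without closeForWrite()
        } catch (IllegalStateException e) {
            System.out.println("Second open rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the second call keeps the single shared `PreparedStatement` from being filled by two writers at once, which is the race the paragraph above describes.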
Simplify logging; BatchUpdateException

Simplify logging so that the logs produced by pxf-jdbc do not grow too big when DEBUG is enabled. The removed logging calls reported the field types and names, which in most cases are the same as in the data provided. Exceptions are still logged.

Add processing of BatchUpdateException, so that the real cause of an exception is returned to the user.

PXF JDBC thread pool support

Implement support of multi-threaded processing of INSERT queries, using a thread pool. To use the feature, set the parameter POOL_SIZE in the LOCATION clause of an external table:

* <1: Pool size is equal to the number of CPUs available to the JVM;
* =1: Disable thread pool;
* >1: Use the given size of a pool.

Not all operations are processed by pool threads: pool threads only execute() the queries, but they do not fill the PreparedStatement from OneRow.

Redesign connection pooling

* Redesign connection pooling: move OneRow objects processing to threads from the pool. This decreases the load on the single-threaded part of PXF;
* Introduce WriterCallable & related. This significantly simplifies the code of JdbcAccessor, makes it easy to introduce new methods of processing INSERT queries, and enables fast hardcoded tweaks for the same purpose.
* Add docs on thread pool feature

Support long values in PARTITION clause

Support values of Java primitive type 'long' in PARTITION clause (both for RANGE and INTERVAL variables).
* Modify JdbcPartitionFragmenter (convert all int variables to long)
* Move parsing of INTERVAL values for PARTITION_TYPE "INT" to class constructor (and add a parse exception handler)
* Simplify ByteUtil (remove methods to deal with values of type 'int')
* Update JdbcPartitionFragmenterTest
* Minor changes in comments

Fix pxf-profiles-default.xml

Remove ampersand from a description of JDBC profile from pxf-profiles-default.xml

Remove explicit throws of IllegalArgumentException

Remove explicit references to 'IllegalArgumentException', as the caller is probably unable to recover from them. 'IllegalStateException' is left unchanged, as it is thrown when the caller must perform an action that will resolve the problem ('WriterCallable' is full). Other runtime exceptions are explicitly listed in function definitions as before; their causes are usually known to the caller, so it could do something about them or at least send a more meaningful message about the error cause to the user.

Proposed by Alex Denissov <[email protected]>

Simplify isCallRequired()

Make function 'isCallRequired()' body a one-line expression in all implementations of 'WriterCallable'.

Proposed by Alex Denissov <[email protected]>

Remove rollback and change BATCH_SIZE logic

Remove calls to 'tryRollback()' and all processing of rollbacks in INSERT queries.

The reason for the change is that rollback is effective in only one case: the INSERT is performed from one PXF segment that uses one thread to perform that INSERT, and the external database supports transactions. In most cases, more than one PXF segment performs the INSERT, and rollback is of no use then. On the other hand, rollback logic is cumbersome and notably increases code complexity.

Due to the removal of rollback, there is no longer a need to keep BATCH_SIZE infinite as often as possible (when BATCH_SIZE is infinite, the number of scenarios in which rollback() fails is lower, though not zero).
Thus, setting a recommended (https://docs.oracle.com/cd/E11882_01/java.112/e16548/oraperf.htm#JJDBC28754) value makes sense. The old logic of infinite batch size also remains active.

Modify README.md: minor corrections, new BATCH_SIZE logic

Proposed by Alex Denissov <[email protected]>

Change BATCH_SIZE logic

* Modify BATCH_SIZE parameter processing according to new proposals apache#1353 (comment)
* Update README.md
* Restore fallback to non-batched INSERTs in case the external database (or JDBC connector) does not support batch updates

Proposed by Alex Denissov <[email protected]>
Proposed by Dmitriy Pavlov <[email protected]>

Modify processing of BATCH_SIZE parameter

Modify BATCH_SIZE parameter processing according to the proposal apache#1353 (comment):

* Update allowed values of BATCH_SIZE and their meanings
* Introduce an explicit flag for the presence of a BATCH_SIZE parameter
* Introduce DEFAULT_BATCH_SIZE constant in JdbcPlugin
* Move processing of BATCH_SIZE values to JdbcAccessor
* Update README.md

Proposed by @divyabhargov, @denalex

Fix column type for columns converted to TEXT

Modify column type processing so that the column type is set correctly for fields that:

* Are represented as columns of type TEXT by GPDBWritable, but whose actual type is different
* Contain NULL value

Before, the column type code was not set correctly for such columns due to a check of NULL field value.

Proposed and authored by @divyabhargov

removed parseUnsignedInt
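
For illustration, batched INSERTs with a fallback to non-batched INSERTs could look roughly like the sketch below. It is not the plugin's JdbcAccessor code: the class `BatchInsertSketch`, the method `writeRows`, and its parameters are hypothetical. The allowed BATCH_SIZE values are not spelled out in this message, so the sketch simply assumes that a value greater than 1 is the number of rows per batch. The `batchSupported` flag is expected to come from the standard JDBC call `connection.getMetaData().supportsBatchUpdates()`.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {
    /**
     * Writes rows through an already prepared INSERT statement.
     * Returns the number of round trips (executeBatch() or executeUpdate() calls).
     * The batchSize semantics are an assumption made for this sketch.
     */
    public static int writeRows(PreparedStatement statement, List<Object[]> rows,
                                int batchSize, boolean batchSupported) throws SQLException {
        // Fall back to row-by-row INSERTs when batching is unavailable or pointless.
        boolean useBatch = batchSupported && batchSize > 1;
        int roundTrips = 0;
        int pending = 0;
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                statement.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
            }
            if (!useBatch) {
                statement.executeUpdate();
                roundTrips++;
                continue;
            }
            statement.addBatch();
            if (++pending == batchSize) {
                statement.executeBatch(); // flush a full batch
                roundTrips++;
                pending = 0;
            }
        }
        if (pending > 0) {
            statement.executeBatch(); // flush the last, partially filled batch
            roundTrips++;
        }
        return roundTrips;
    }
}
```

With JDBC, a failed `executeBatch()` raises `java.sql.BatchUpdateException`, a subclass of `SQLException`; a caller can follow `getNextException()` to reach the underlying cause before reporting the error.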