-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow the user to override parameter metadata #518
Comments
Thanks. Perhaps this works better with R 4.2.1? |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Somewhat simpler/clearer reprex: library(DBI)
con <- dbConnect(
odbc::odbc(),
dsn = "MicrosoftSQLServer",
uid = "SA",
pwd = "BoopBop123"
)
dbGetQuery(con, "SELECT ?", params = list("\u2915"))[[1]]
#> [1] "?"
df <- data.frame(x = "\u2915")
dbWriteTable(con, "testunicode", df)
dbReadTable(con, "testunicode")
#> x
#> 1 ?
dbRemoveTable(con, "testunicode") Created on 2023-12-12 with reprex v2.0.2.9000 |
Hi @erikvona: Have you considered setting the
|
@detule I have. However, after turning library(DBI)
library(odbc)
conn <- dbConnect(odbc(), driver = "ODBC Driver 17 for SQL Server", server = "localhost", trusted_connection = "yes", AutoTranslate = "no")
# Specifying collation can cause incorrect interpretation of non-wide characters
# Actually, getting the correct character is hard on my system and requires both changing the SQL server side and the R side
dbGetQuery(conn, "SELECT ?", params = list("\u2915"))[[1]]
#> [1] "G\xf1\xf2"
dbGetQuery(conn, "SELECT ? COLLATE Latin1_General_100_CI_AI_SC_UTF8", params = list("\u2915"))[[1]]
#> [1] "⤕"
# Specifying collation does not affect wide characters
dbGetQuery(conn, "SELECT N'\u2915'")[[1]]
#> [1] "⤕"
dbGetQuery(conn, "SELECT N'\u2915' COLLATE Latin1_General_100_CI_AI_SC_UTF8")[[1]]
#> [1] "⤕"
# Reproduction information
sessionInfo()$locale
#> [1] "LC_COLLATE=Dutch_Netherlands.utf8;LC_CTYPE=Dutch_Netherlands.utf8;LC_MONETARY=Dutch_Netherlands.utf8;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.utf8"
dbGetQuery(conn, "SELECT CONVERT (varchar(256), SERVERPROPERTY('collation'))")
#>
#> 1 SQL_Latin1_General_CP1_CI_AS
dbDisconnect(conn) If we have native support for unicode characters in ODBC, we should be able to get the correct character regardless of database collation. If we don't, we may get code working for a specific database and a specific configuration of R, but it will be very hard to get it working for all users regardless of machine language and database configuration. |
Hey @erikvona: Thanks. I don't have a Windows machine, but will try to replicate your environment on my Linux box this week. Before I try that, however, what version of SQL Server are you connecting to? FWIW I would be interested in necking down what exactly the ask is. "Adding unicode support to SQL server" seems a bit overly broad. |
Most of this testing is done on SQL Server Developer version 16.0.1000.6. I'm not that familiar with nanodbc, but using the Windows ODBC API, this would entail:
I believe that in nanodbc, this would entail storing those strings as |
Thanks @erikvona: I am not convinced, but happy to be wrong. Is the summary below correct?
Does that sound reasonable? Thanks again for iterating. I want to make sure we are on the same page in terms of what you are seeing as not working. Cheers |
@detule Thanks for your summary, and the time invested. Indeed, if I'm working with strings where the data type is well-defined, then it's working, except if they have a non-unicode type. The main problem I'm having is when working with parameters with an undefined data type, In the following example, we can see using a library(DBI)
conn <- dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server", server = "localhost", trusted_connection = "yes", AutoTranslate = "no", encoding = "utf8")
dbExecute(conn, "SELECT ? As Col1 INTO tmp", params = list("\u2915"))
#> [1] 1
dbGetQuery(conn, "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'tmp'")
#> COLUMN_NAME DATA_TYPE
#> 1 Col1 varchar
dbReadTable(conn, "tmp")
#> Error in eval(expr, envir, enclos): Incomplete multibyte sequence
#> Warning in dbClearResult(rs): Result already cleared
dbExecute(conn, "DROP TABLE tmp")
#> [1] 0 In other ODBC implementations, e.g. .NET/C/etc., I'm allowed to specify the data type when passing a parameter, so I can ensure the right type is used. E.g.: OdbcCommand cmd = conn.CreateCommand();
cmd.CommandText = "SELECT ?";
cmd.Parameters.Add("@par1", OdbcType.NVarChar).Value = "\u2915" Regarding unicode support for SQL server, as I understand it, it is supported as a storage medium, not as a code page. See the following example: library(DBI)
conn <- dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server", server = "localhost", trusted_connection = "yes", AutoTranslate = "no", encoding = "utf8")
dbExecute(conn, "CREATE TABLE tmp (col1 VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8);")
#> [1] 0
dbExecute(conn, "INSERT INTO tmp SELECT ? COLLATE Latin1_General_100_CI_AI_SC_UTF8", params = list("\u2915"))
#> [1] 1
dbReadTable(conn, "tmp")
#> col1
#> 1 ⤕ As we can see, even though both the table and the insert query use an UTF-8 collation, and I specify AFAIK, all UTF-8 collations still use code pages to interpret data passed as VARCHAR (source indicating the code page part is separate from whether or not to use UTF-8). There unfortunately are no code page 65001 collations in SQL server (yet), all UTF-8 collations specify a different code page as well, and only specify the use of UTF-8 at the end, not in place of the code page. This means I can't adjust collation on the server to get this to work for all clients. This may be a more niche problem than I originally thought, since now that I understand it better it is easier to work around and I thought initially that more users would encounter this, but it's still a problem. |
Thanks for the write up. I think I have a better idea what the issue you are facing is. BLUF: If you have the ability to install the development/master@github version of this package, you could try the following
In a bit more detail: You are right,
Not surprising, I guess, as this call happens before the data is bound to the buffer, and the driver can only guess based on the target. At any rate, historically we have had some issues with In terms of what to do about this.
At any rate, if you agree, I think we can close this issue and open a more targeted feature request. Perhaps something like "Add an argument to |
I certainly agree the title could be better. I've renamed it to Allow the user to override parameter metadata to leave it open if this should happen in I think only doing this in For your interest, a quick reprex of the suggestion on my system: library(DBI)
conn <- dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server", server = "localhost", trusted_connection = "yes", AutoTranslate = "no", encoding = "utf8")
dbExecute(conn, "CREATE TABLE tmp (col1 VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8);")
#> [1] 0
res <- dbSendQuery(conn, "INSERT INTO tmp SELECT ? COLLATE Latin1_General_100_CI_AI_SC_UTF8")
df <- data.frame( "param_index" = 1, "data_type" = -9, "column_size"=10, decimal_digits=0)
odbc:::result_describe_parameters(res@ptr, df)
dbBind(res, params = list("\u2915"))
dbClearResult(res)
dbReadTable(conn, "tmp")
#> col1
#> 1 ⤕
dbGetQuery(conn, "EXEC sp_describe_undeclared_parameters @tsql=N'INSERT INTO tmp SELECT @par COLLATE Latin1_General_100_CI_AI_SC_UTF8'")$suggested_system_type_name
#> [1] "varchar(50)"
dbGetQuery(conn, "EXEC sp_describe_undeclared_parameters @tsql=N'SELECT @par COLLATE Latin1_General_100_CI_AI_SC_UTF8'")
#> Error in `dbGetQuery()`:
#> ! ODBC failed with error 42000 from [Microsoft][ODBC Driver 17 for SQL
#> Server][SQL Server].
#> ✖ The parameter type for '@par' cannot be uniquely deduced; two possibilities
#> are 'sql_variant' and 'xml'.
#> • <SQL> 'EXEC sp_describe_undeclared_parameters @tsql=N'SELECT @par COLLATE
#> Latin1_General_100_CI_AI_SC_UTF8''
#> ℹ From 'nanodbc/nanodbc.cpp:1726'.
dbExecute(conn, "DROP TABLE tmp")
#> [1] 0 So the suggestion is certainly working, now I can properly bind When SQL server can infer the type as However, for these cases, I can now also make sure the parameter type is set to NVARCHAR, so the following: res <- dbSendQuery(conn, "SELECT ? As Col1 INTO tmp")
df <- data.frame( "param_index" = 1, "data_type" = -9, "column_size"=10, decimal_digits=0)
odbc:::result_describe_parameters(res@ptr, df)
dbBind(res, params = list("\u2915"))
dbClearResult(res) Allows me to create a table with Col1 set as dbGetQuery(conn, "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'tmp'")
#> COLUMN_NAME DATA_TYPE
#>1 Col1 nvarchar It does mean I need to manually specify types when binding parameters (and in other functions such as |
Currently, the implementation for SQL server does not allow for unicode support when writing tables, quoting literals, or using parameters.
This leads to tables being created without unicode support or special unicode characters, and queries being created with literals without special unicode characters, leading to a comparison with a non-ascii unicode character returning the wrong results.
There have been some issues, like r-dbi/DBI#215, where it's suggested to encode strings as bytes, but that results in fragile code that relies on specific SQL server settings and locales, and only solves specific issues like writing tables. I'd like to write code that "just works" if at all possible.
Created on 2022-11-04 with reprex v2.0.2
Session info
The text was updated successfully, but these errors were encountered: