forked from r-dbi/odbc
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
355 lines (289 loc) · 12.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
---
output:
github_document:
html_preview: false
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-",
eval = as.logical(Sys.getenv("ODBC_EVAL_README", "false"))
)
```
# odbc
<!-- badges: start -->
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/odbc)](https://cran.r-project.org/package=odbc)
[![R build status](https://github.com/r-dbi/odbc/workflows/R-CMD-check/badge.svg)](https://github.com/r-dbi/odbc)
[![Coverage Status](https://img.shields.io/codecov/c/github/r-dbi/odbc/master.svg)](https://codecov.io/github/r-dbi/odbc?branch=master)
<!-- badges: end -->
The goal of the odbc package is to provide a DBI-compliant interface to [Open
Database
Connectivity](https://docs.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver15)
(ODBC) drivers. This allows for an efficient, easy to setup connection to any
database with ODBC drivers available, including [SQL
Server](https://www.microsoft.com/en-us/sql-server/),
[Oracle](https://www.oracle.com/database), [MySQL](https://www.mysql.com/),
[PostgreSQL](https://www.postgresql.org/), [SQLite](https://sqlite.org/index.html) and
others. The implementation builds on the
[nanodbc](https://nanodbc.github.io/nanodbc/) C++ library.
- [Installation](#installation)
- [Windows](#windows)
- [MacOS](#macos)
- [Linux - Debian / Ubuntu](#linux---debian-ubuntu)
- [R](#r)
- [Connecting to a Database](#connecting-to-a-database)
- [Connection Strings](#connection-strings)
- [DSN Configuration files](#dsn-configuration-files)
- [Usage](#usage)
- [Table and Field information](#table-and-field-information)
- [Reading](#reading)
- [Writing](#writing)
- [Querying](#querying)
- [Benchmarks](#benchmarks)
- [Reading](#reading-1)
- [Writing](#writing-1)
## Installation
For Unix and MacOS ODBC drivers should be compiled against
[unixODBC](http://www.unixodbc.org/). Drivers compiled against [iODBC](http://www.iodbc.org/) _may_ also
work, but are not fully supported.
After installation of the driver manager and driver, you will have to
register the driver in a [odbcinst.ini](#dsn-configuration-files) file for it
to appear in `odbc::odbcListDrivers()`.
**odbc** and it's dependencies use C++11 features. Therefore you need [gcc
4.8](https://gcc.gnu.org/), [clang 3.3](https://clang.llvm.org) or [Rtools
3.3](https://CRAN.R-project.org/bin/windows/Rtools/) or later.
### Windows
Windows is bundled with ODBC libraries however drivers for each database
need to be installed separately. Windows ODBC drivers typically include an
installer that needs to be run and will install the driver to the proper
locations.
### MacOS
[homebrew](https://brew.sh/) can be used to easily install database drivers on MacOS.
#### UnixODBC - Required for all databases
```shell
# Install the unixODBC library
brew install unixodbc
```
#### Common DB drivers
```shell
# Microsoft SQL Server ODBC Drivers (Recommended)
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install msodbcsql17 mssql-tools
# SQL Server ODBC Drivers (Free TDS)
brew install freetds
# PostgreSQL ODBC ODBC Drivers
brew install psqlodbc
# MySQL ODBC Drivers (and database)
brew install mysql
# SQLite ODBC Drivers
brew install sqliteodbc
```
### Linux - Debian / Ubuntu
[apt-get](https://wiki.debian.org/Apt) can be used to easily install database
drivers on Linux distributions which support it, such as Debian and Ubuntu.
#### UnixODBC - Required for all databases
```shell
# Install the unixODBC library
apt-get install unixodbc unixodbc-dev
```
#### Common DB drivers
For SQL Server on Linux it is recommended you use the [Microsoft
Drivers](https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15#ubuntu).
```shell
# SQL Server ODBC Drivers (Free TDS)
apt-get install tdsodbc
# PostgreSQL ODBC ODBC Drivers
apt-get install odbc-postgresql
# MySQL ODBC Drivers
apt-get install libmyodbc
# SQLite ODBC Drivers
apt-get install libsqliteodbc
```
### R
```R
# Install the latest odbc release from CRAN:
install.packages("odbc")
# Or the the development version from GitHub:
# install.packages(devtools)
devtools::install_github("r-dbi/odbc")
```
## Connecting to a Database
Databases can be connected by specifying a connection string directly, or with
DSN configuration files.
### Connection Strings
Specify a connection string as named arguments directly in the `dbConnect()` method.
```r
library(DBI)
con <- dbConnect(odbc::odbc(),
driver = "PostgreSQL Driver",
database = "test_db",
uid = "postgres",
pwd = "password",
host = "localhost",
port = 5432)
```
Alternatively you can pass a complete connection string as the
`.connection_string` argument. [The Connection Strings
Reference](https://www.connectionstrings.com) is a useful resource that has
example connection strings for a large variety of databases.
```r
library(DBI)
con <- dbConnect(odbc::odbc(),
.connection_string = "Driver={PostgreSQL Driver};Uid=postgres;Pwd=password;Host=localhost;Port=5432;Database=test_db;")
```
### DSN Configuration files
ODBC configuration files are another option to specify connection parameters
and allow one to use a Data Source Name (DSN) to make it easier to connect to a database.
```r
con <- dbConnect(odbc::odbc(), "PostgreSQL")
```
#### Windows
The [ODBC Data Source
Administrator](https://docs.microsoft.com/en-us/sql/odbc/admin/odbc-data-source-administrator?view=sql-server-ver15)
application is used to manage ODBC data sources on Windows.
#### MacOS / Linux
On MacOS and Linux there are two separate text files that need to be edited.
UnixODBC includes a command line executable `odbcinst` which can be used to
query and modify the DSN files. However these are plain text files you
can also edit by hand if desired.
There are two different files used to setup the DSN information.
- `odbcinst.ini` - which defines driver options
- `odbc.ini` - which defines connection options
The DSN configuration files can be defined globally for all users of the
system, often at
`/etc/odbc.ini` or `/opt/local/etc/odbc.ini`, the exact location depends on
what option was used when compiling unixODBC. `odbcinst -j` can be used to find
the exact location. Alternatively the `ODBCSYSINI` environment variable can be
used to specify the location of the configuration files. Ex. `ODBCSYSINI=~/ODBC`
A local DSN file can also be used with the files `~/.odbc.ini` and `~/.odbcinst.ini`.
##### odbcinst.ini
Contains driver information, particularly the name of the driver library.
Multiple drivers can be specified in the same file.
```ini
[PostgreSQL Driver]
Driver = /usr/local/lib/psqlodbcw.so
[SQLite Driver]
Driver = /usr/local/lib/libsqlite3odbc.dylib
```
##### odbc.ini
Contains connection information, particularly the username, password, database
and host information. The Driver line corresponds to the driver defined in
`odbcinst.ini`.
```ini
[PostgreSQL]
Driver = PostgreSQL Driver
Database = test_db
Servername = localhost
UserName = postgres
Password = password
Port = 5432
[SQLite]
Driver = SQLite Driver
Database=/tmp/testing
```
See also: [unixODBC without the GUI](http://www.unixodbc.org/odbcinst.html) for more information and examples.
## Usage
All of the following examples assume you have already created a connection `con`.
See [Connecting to a database](#connecting-to-a-database) for more information on establishing a
connection.
### Table and Field information
`dbListTables()` is used for listing all existing tables in a database.
```r
dbListTables(con)
# List tables beginning with f
dbListTables(con, table_name = "f%")
# List all fields in the 'flights' database
dbListFields(con, "flights")
```
### Reading
`dbReadTable()` will read a full table into an R `data.frame()`.
```r
data <- dbReadTable(con, "flights")
```
### Writing
`dbWriteTable()` will write an R `data.frame()` to an SQL table.
```r
data <- dbWriteTable(con, "iris", iris)
```
### Querying
`dbGetQuery()` will submit a query and fetch the results. It is also possible
to submit the query and fetch separately with `dbSendQuery()` and `dbFetch()`.
The `n=` argument to `dbFetch()` can be used to fetch only the part of a query result (the next *n* rows).
```r
result <- dbSendQuery(con, "SELECT flight, tailnum, origin FROM flights ORDER BY origin")
# Retrieve the first 100 results
first_100 <- dbFetch(result, n = 100)
# Retrieve the rest of the results
rest <- dbFetch(result)
```
## Benchmarks
The *odbc* package is often much faster than the existing
[RODBC](https://cran.r-project.org/package=RODBC) and DBI compatible
[RODBCDBI](https://cran.r-project.org/package=RODBCDBI) packages.
### Reading
Reading a table from a SQL Server database with the 'flights' dataset (336,776
rows, 19 columns) of the package
[nytflights13](https://github.com/hadley/nycflights13).
```{r}
# First using RODBC / RODBCDBI
library(DBI)
rodbc <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = Sys.getenv("SQLSERVER_UID"), password = Sys.getenv("SQLSERVER_PWD"))
system.time(rodbc_result <- dbReadTable(rodbc, "flights"))
#> user system elapsed
#> 13.986 1.173 15.192
# Now using odbc
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", UID = Sys.getenv("SQLSERVER_UID"), PWD = Sys.getenv("SQLSERVER_PWD"))
system.time(odbc_result <- dbReadTable(odbc, "flights"))
#> user system elapsed
#> 1.334 0.065 1.406
tibble::as_tibble(odbc_result)
#> # A tibble: 336,776 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time
#> <int> <int> <int> <int> <int> <dbl> <int>
#> 1 2013 1 1 517 515 2 830
#> 2 2013 1 1 533 529 4 850
#> 3 2013 1 1 542 540 2 923
#> 4 2013 1 1 544 545 -1 1004
#> 5 2013 1 1 554 600 -6 812
#> 6 2013 1 1 554 558 -4 740
#> 7 2013 1 1 555 600 -5 913
#> 8 2013 1 1 557 600 -3 709
#> 9 2013 1 1 557 600 -3 838
#> 10 2013 1 1 558 600 -2 753
#> # … with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>
identical(dim(rodbc_result), dim(odbc_result))
#> [1] TRUE
rm(rodbc_result, odbc_result, odbc, rodbc)
gc(verbose = FALSE)
```
### Writing
Writing the same dataset to the database.
```{r, echo = FALSE, results = "hide"}
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", UID = Sys.getenv("SQLSERVER_UID"), PWD = Sys.getenv("SQLSERVER_PWD"))
if (dbExistsTable(odbc, "flights2")) { dbRemoveTable(odbc, "flights2") }
if (dbExistsTable(odbc, "flights3")) { dbRemoveTable(odbc, "flights3") }
rm(odbc)
gc(verbose = FALSE)
```
```{r, cache = TRUE}
library(nycflights13)
# rodbc does not support writing timestamps natively, so we remove that column
flights2 <- as.data.frame(flights[, names(flights) != "time_hour"])
rodbc <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = Sys.getenv("SQLSERVER_UID"), password = Sys.getenv("SQLSERVER_PWD"))
system.time(dbWriteTable(rodbc, "flights2", flights2))
#> user system elapsed
#> 11.891 6.269 765.269
# Now using odbc
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", UID = Sys.getenv("SQLSERVER_UID"), PWD = Sys.getenv("SQLSERVER_PWD"))
flights3 <- as.data.frame(flights)
system.time(dbWriteTable(odbc, "flights3", flights3))
#> user system elapsed
#> 2.617 0.275 9.649
```