There are several R packages to read excel files, each of which using different languages or resources, as summarized in the following table:
R package | Uses |
---|---|
xlsx | Java |
XLconnect | Java |
openxlsx | C++ |
readxl | C++ |
RODBC | ODBC |
gdata | Perl |
For the packages that use Java or ODBC it is important to know details about your system because you may have compatibility issues depending on your R version and OS. For instance, if you are using R 64 bits then you also must have Java 64 bits to use xlsx
or XLconnect
.
Some examples of reading excel files with each package are provided below. Note that many of the packages have the same or very similar function names. Therefore, it is useful to state the package explicitly, like package::function
. The package openxlsx
requires prior installation of RTools.
library(xlsx)
The index or name of the sheet is required to import.
xlsx::read.xlsx("Book1.xlsx", sheetIndex=1)
xlsx::read.xlsx("Book1.xlsx", sheetName="Sheet1")
library(XLConnect)
wb <- XLConnect::loadWorkbook("Book1.xlsx")
# Either, if Book1.xlsx has a sheet called "Sheet1":
sheet1 <- XLConnect::readWorksheet(wb, "Sheet1")
# Or, more generally, just get the first sheet in Book1.xlsx:
sheet1 <- XLConnect::readWorksheet(wb, getSheets(wb)[1])
XLConnect
automatically imports the pre-defined Excel cell-styles embedded in Book1.xlsx
. This is useful when you wish to format your workbook object and export a perfectly formatted Excel document. Firstly, you will need to create the desired cell formats in Book1.xlsx
and save them, for example, as myHeader
, myBody
and myPcts
. Then, after loading the workbook in R
(see above):
Headerstyle <- XLConnect::getCellStyle(wb, "myHeader")
Bodystyle <- XLConnect::getCellStyle(wb, "myBody")
Pctsstyle <- XLConnect::getCellStyle(wb, "myPcts")
The cell styles are now saved in your R
environment. In order to assign the cell styles to certain ranges of your data, you need to define the range and then assign the style:
Headerrange <- expand.grid(row = 1, col = 1:8)
Bodyrange <- expand.grid(row = 2:6, col = c(1:5, 8))
Pctrange <- expand.grid(row = 2:6, col = c(6, 7))
XLConnect::setCellStyle(wb, sheet = "sheet1", row = Headerrange$row,
col = Headerrange$col, cellstyle = Headerstyle)
XLConnect::setCellStyle(wb, sheet = "sheet1", row = Bodyrange$row,
col = Bodyrange$col, cellstyle = Bodystyle)
XLConnect::setCellStyle(wb, sheet = "sheet1", row = Pctrange$row,
col = Pctrange$col, cellstyle = Pctsstyle)
Note that XLConnect
is easy, but can become extremely slow in formatting. A much faster, but more cumbersome formatting option is offered by openxlsx
.
Excel files can be imported with package openxlsx
library(openxlsx)
openxlsx::read.xlsx("spreadsheet1.xlsx", colNames=TRUE, rowNames=TRUE)
#colNames: If TRUE, the first row of data will be used as column names.
#rowNames: If TRUE, first column of data will be used as row names.
The sheet, which should be read into R can be selected either by providing its position in the sheet
argument:
openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = 1)
or by declaring its name:
openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1")
Additionally, openxlsx
can detect date columns in a read sheet. In order to allow automatic detection of dates, an argument detectDates
should be set to TRUE
:
openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1", detectDates= TRUE)
Excel files can be imported as a data frame into R
using the readxl
package.
library(readxl)
It can read both .xls
and .xlsx
files.
readxl::read_excel("spreadsheet1.xls")
readxl::read_excel("spreadsheet2.xlsx")
The sheet to be imported can be specified by number or name.
readxl::read_excel("spreadsheet.xls", sheet = 1)
readxl::read_excel("spreadsheet.xls", sheet = "summary")
The argument col_names = TRUE
sets the first row as the column names.
readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE)
The argument col_types
can be used to specify the column types in the data as a vector.
readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE,
col_types = c("text", "date", "numeric", "numeric"))
Excel files can be read using the ODBC Excel Driver that interfaces with Windows' Access Database Engine (ACE), formerly JET. With the RODBC package, R can connect to this driver and directly query workbooks. Worksheets are assumed to maintain column headers in first row with data in organized columns of similar types. NOTE: This approach is limited to only Windows/PC machines as JET/ACE are installed .dll files and not available on other operating systems.
library(RODBC)
xlconn <- odbcDriverConnect('Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};
DBQ=C:\\Path\\To\\Workbook.xlsx')
df <- sqlQuery(xlconn, "SELECT * FROM [SheetName$]")
close(xlconn)
Connecting with an SQL engine in this approach, Excel worksheets can be queried similar to database tables including JOIN
and UNION
operations. Syntax follows the JET/ACE SQL dialect. NOTE: Only data access DML statements, specifically SELECT
can be run on workbooks, considered not updateable queries.
joindf <- sqlQuery(xlconn, "SELECT t1.*, t2.* FROM [Sheet1$] t1
INNER JOIN [Sheet2$] t2
ON t1.[ID] = t2.[ID]")
uniondf <- sqlQuery(xlconn, "SELECT * FROM [Sheet1$]
UNION
SELECT * FROM [Sheet2$]")
Even other workbooks can be queried from the same ODBC channel pointing to a current workbook:
otherwkbkdf <- sqlQuery(xlconn, "SELECT * FROM
[Excel 12.0 Xml;HDR=Yes;
Database=C:\\Path\\To\\Other\\Workbook.xlsx].[Sheet1$];")
example here