R Language I/O for foreign tables (Excel, SAS, SPSS, Stata) Importing Excel files

Help us to keep this website almost Ad Free! It takes only 10 seconds of your time:
> Step 1: Go view our video on YouTube: EF Core Bulk Extensions
> Step 2: And Like the video. BONUS: You can also share it!

Example

There are several R packages to read excel files, each of which using different languages or resources, as summarized in the following table:

R packageUses
xlsxJava
XLconnectJava
openxlsxC++
readxlC++
RODBCODBC
gdataPerl

For the packages that use Java or ODBC it is important to know details about your system because you may have compatibility issues depending on your R version and OS. For instance, if you are using R 64 bits then you also must have Java 64 bits to use xlsx or XLconnect.

Some examples of reading excel files with each package are provided below. Note that many of the packages have the same or very similar function names. Therefore, it is useful to state the package explicitly, like package::function. The package openxlsx requires prior installation of RTools.

Reading excel files with the xlsx package

library(xlsx)

The index or name of the sheet is required to import.

xlsx::read.xlsx("Book1.xlsx", sheetIndex=1)

xlsx::read.xlsx("Book1.xlsx", sheetName="Sheet1")

Reading Excel files with the XLconnect package

library(XLConnect)
wb <- XLConnect::loadWorkbook("Book1.xlsx")

# Either, if Book1.xlsx has a sheet called "Sheet1":
sheet1 <- XLConnect::readWorksheet(wb, "Sheet1")
# Or, more generally, just get the first sheet in Book1.xlsx:
sheet1 <- XLConnect::readWorksheet(wb, getSheets(wb)[1])

XLConnect automatically imports the pre-defined Excel cell-styles embedded in Book1.xlsx. This is useful when you wish to format your workbook object and export a perfectly formatted Excel document. Firstly, you will need to create the desired cell formats in Book1.xlsx and save them, for example, as myHeader, myBody and myPcts. Then, after loading the workbook in R (see above):

Headerstyle <- XLConnect::getCellStyle(wb, "myHeader")
Bodystyle <- XLConnect::getCellStyle(wb, "myBody")
Pctsstyle <- XLConnect::getCellStyle(wb, "myPcts")

The cell styles are now saved in your R environment. In order to assign the cell styles to certain ranges of your data, you need to define the range and then assign the style:

Headerrange <- expand.grid(row = 1, col = 1:8)
Bodyrange <- expand.grid(row = 2:6, col = c(1:5, 8))
Pctrange <- expand.grid(row = 2:6, col = c(6, 7))

XLConnect::setCellStyle(wb, sheet = "sheet1", row = Headerrange$row,
             col = Headerrange$col, cellstyle = Headerstyle)
XLConnect::setCellStyle(wb, sheet = "sheet1", row = Bodyrange$row,
             col = Bodyrange$col, cellstyle = Bodystyle)
XLConnect::setCellStyle(wb, sheet = "sheet1", row = Pctrange$row,
             col = Pctrange$col, cellstyle = Pctsstyle)

Note that XLConnect is easy, but can become extremely slow in formatting. A much faster, but more cumbersome formatting option is offered by openxlsx.

Reading excel files with the openxlsx package

Excel files can be imported with package openxlsx

library(openxlsx)

openxlsx::read.xlsx("spreadsheet1.xlsx", colNames=TRUE, rowNames=TRUE)

#colNames: If TRUE, the first row of data will be used as column names.
#rowNames: If TRUE, first column of data will be used as row names.

The sheet, which should be read into R can be selected either by providing its position in the sheet argument:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = 1)

or by declaring its name:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1")

Additionally, openxlsx can detect date columns in a read sheet. In order to allow automatic detection of dates, an argument detectDates should be set to TRUE:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1", detectDates= TRUE)

Reading excel files with the readxl package

Excel files can be imported as a data frame into R using the readxl package.

library(readxl)

It can read both .xls and .xlsx files.

readxl::read_excel("spreadsheet1.xls")
readxl::read_excel("spreadsheet2.xlsx")

The sheet to be imported can be specified by number or name.

readxl::read_excel("spreadsheet.xls", sheet = 1)
readxl::read_excel("spreadsheet.xls", sheet = "summary")

The argument col_names = TRUE sets the first row as the column names.

 readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE)

The argument col_types can be used to specify the column types in the data as a vector.

readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE,
                   col_types = c("text", "date", "numeric", "numeric"))

Reading excel files with the RODBC package

Excel files can be read using the ODBC Excel Driver that interfaces with Windows' Access Database Engine (ACE), formerly JET. With the RODBC package, R can connect to this driver and directly query workbooks. Worksheets are assumed to maintain column headers in first row with data in organized columns of similar types. NOTE: This approach is limited to only Windows/PC machines as JET/ACE are installed .dll files and not available on other operating systems.

library(RODBC)

xlconn <- odbcDriverConnect('Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};
                             DBQ=C:\\Path\\To\\Workbook.xlsx')

df <- sqlQuery(xlconn, "SELECT * FROM [SheetName$]")
close(xlconn)

Connecting with an SQL engine in this approach, Excel worksheets can be queried similar to database tables including JOIN and UNION operations. Syntax follows the JET/ACE SQL dialect. NOTE: Only data access DML statements, specifically SELECT can be run on workbooks, considered not updateable queries.

joindf <-  sqlQuery(xlconn, "SELECT t1.*, t2.* FROM [Sheet1$] t1
                             INNER JOIN [Sheet2$] t2
                             ON t1.[ID] = t2.[ID]")

uniondf <-  sqlQuery(xlconn, "SELECT * FROM [Sheet1$]
                              UNION  
                              SELECT * FROM [Sheet2$]")

Even other workbooks can be queried from the same ODBC channel pointing to a current workbook:

otherwkbkdf <- sqlQuery(xlconn, "SELECT * FROM 
                                 [Excel 12.0 Xml;HDR=Yes;
                                 Database=C:\\Path\\To\\Other\\Workbook.xlsx].[Sheet1$];")

Reading excel files with the gdata package

example here



Got any R Language Question?