Stack Overflow Asked by joshbh on November 18, 2021
I am reading an xlsx file into R using read_excel, and I have a date column with many ambiguous formats. For example, some values are nice and clean like "05/06/20", while others are text like "no date" or "not applicable". When I read in the file (df) and specify the column (date1) as type "date", the text answers like "no date" or "not applicable" get turned into NA. When I read in the file and specify the column as type "text" instead, the answers in proper date formats like "05/06/20" get turned into numbers like "43447". When I then try to convert that column, currently of type character, to a date, I get an error saying the dates are not in an unambiguous format. Any suggestions on how to read in the file, or transform the df$date1 column once imported, so I can have both answers like "05/06/20" and "not applicable"?
The following function takes a vector with a mix of numbers and text and outputs a vector of dates corresponding to the numbers. If any of the input vector elements are not coercible to numeric, the output will be of class "character".
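library(readxl)

date_with_text <- function(x, origin = "1899-12-30"){
  # Non-numeric text becomes NA here; the coercion warning is silenced
  num <- suppressWarnings(as.numeric(x))
  # Treat the numbers as day counts from the origin
  y <- as.Date(num, origin = origin)
  if(anyNA(num)){
    # A vector cannot hold both dates and text, so fall back to character
    y <- as.character(y)
    # Put the original text back where the conversion failed
    y[is.na(num)] <- as.character(x[is.na(num)])
  }
  y
}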
Now test the function. Note that I'm using col_types = "text" to read the column in as text.
df1 <- read_excel("test.xlsx", col_types = "text")
date_with_text(df1$date)
#[1] "2020-06-05" "no date" "not applicable"
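Because the result is a character vector whenever any text is present, date arithmetic is not available on it. If that matters, one option is to keep the parsed dates and the leftover text in two separate columns. The following is only a minimal sketch of that idea: split_date_text is a hypothetical helper name, not part of readxl, and it assumes the same Excel origin used above.

```r
# Hypothetical helper (not from readxl): split a mixed text column into a
# Date column and a character column holding whatever was not numeric.
split_date_text <- function(x, origin = "1899-12-30") {
  num <- suppressWarnings(as.numeric(x))
  data.frame(
    date = as.Date(num, origin = origin),  # NA where x was not numeric
    note = ifelse(is.na(num), as.character(x), NA_character_),
    stringsAsFactors = FALSE
  )
}

# Same values as the test column, as they arrive with col_types = "text"
x <- c("43987", "no date", "not applicable")
res <- split_date_text(x)
res
```

The date column keeps class "Date", so it can be sorted or differenced, while the note column preserves answers such as "not applicable".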
Test data
The test data was an Excel .xlsx file with one column only. The column date had the values
"05/06/2020", "no date", "not applicable"
read in as
"43987", "no date", "not applicable"
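To see why "43987" corresponds to the first value, the serial number can be converted by hand in base R. This is a small check, assuming the workbook uses Excel's default (1900) date system, for which the origin in R is "1899-12-30"; a workbook saved with the 1904 date system would need a different origin.

```r
# The serial arrives as text, so convert it to a number first
serial <- as.numeric("43987")

# Count that many days forward from the Excel origin
d <- as.Date(serial, origin = "1899-12-30")

format(d)              # ISO form: "2020-06-05"
format(d, "%d/%m/%Y")  # day/month/year form: "05/06/2020"
```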
Answered by Rui Barradas on November 18, 2021