Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I have a data frame with roughly 8 million rows and 3 columns. I used strptime() in the following manner:

df$date.time <- strptime(df$date.time, "%m/%d/%y %I:%M:%S %p")

This works for all but 1104 of the rows, which I found with

df[is.na(df$date.time), ]
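Because the call above overwrites df$date.time, the original strings of the failing rows are lost, so it is hard to see what they actually contained. Here is a minimal sketch that keeps the text and lists the strings that failed to convert. It assumes a small toy data frame and a hypothetical column name `raw`; neither comes from the question.

```r
fmt <- "%m/%d/%y %I:%M:%S %p"

# Hypothetical toy data standing in for the real 8-million-row data frame
df <- data.frame(
  id = c("obs1", "obs2", "obs3"),
  raw = c("03/10/12 2:14:01 AM", "03/11/12 2:14:01 AM", "not a date"),
  stringsAsFactors = FALSE
)

# Parse into a separate column so the original text survives for inspection
df$date.time <- as.POSIXct(df$raw, format = fmt, tz = "America/Chicago")

# Raw strings of the rows that failed to convert
failed <- df$raw[is.na(df$date.time)]
print(failed)
```

Whether "03/11/12 2:14:01 AM" appears in `failed` can depend on the platform's time zone handling. The unparseable string always will.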

When I look at these "problem" data, the date.time entries seem to be formatted in the way I would expect. For example, here is an observation that comes up as a problem, but doesn't appear to be an NA:

id                date.time              outcome
observation543490 2012-03-11 02:14:01    C

What could cause is.na(df$date.time) to return TRUE for this row, when it appears to have been converted correctly?

Here's a reproducible example (using US Central time, CST6CDT):

is.na(strptime("03/11/12 2:14:01 AM", "%m/%d/%y %I:%M:%S %p", "CST6CDT"))
#[1] TRUE
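A quick check that the format string is not the culprit is to parse the same string with only the time zone changed. UTC has no daylight saving time, so every wall-clock time exists there:

```r
fmt <- "%m/%d/%y %I:%M:%S %p"
x <- "03/11/12 2:14:01 AM"

# Same string and format as above; only the time zone differs
in_utc <- strptime(x, fmt, tz = "UTC")
print(is.na(in_utc))
print(format(in_utc, "%Y-%m-%d %H:%M:%S"))
```

If this parses cleanly while the CST6CDT version gives NA, the problem is in the time zone rather than in the format string.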


1 Answer

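The problem is likely that all the times that return NA do not exist in the time zone you're using, because they fall in the hour skipped when daylight saving time begins. For example, in US Central time the clocks jumped from 2:00 AM straight to 3:00 AM on 11 March 2012, so 2012-03-11 02:14:01 never occurred there.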
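You can see the gap directly with a short sketch. It steps through that night in half-hour increments of absolute (UTC) time and prints each instant as US Central wall-clock time. It assumes the "America/Chicago" zone is available in your system's time zone database; the question's "CST6CDT" follows the same US DST rules.

```r
# Four instants, 30 minutes apart, starting at 07:00 UTC (01:00 US Central standard time)
instants <- seq(as.POSIXct("2012-03-11 07:00", tz = "UTC"), by = "30 min", length.out = 4)

# Show the same instants as local wall-clock time in US Central
local <- format(instants, "%Y-%m-%d %H:%M:%S %Z", tz = "America/Chicago")
print(local)
# No local time from 02:00:00 to 02:59:59 appears: 08:00 UTC is already 03:00 CDT
```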

Check with the data source to find the time zone the data were recorded in, then pass that value as the tz argument in your call to strptime.
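As an illustration, suppose (hypothetically) that the source logged every timestamp at a fixed UTC-6 offset all year, with no DST. In that case a fixed-offset zone parses the "missing" hour without NAs. "Etc/GMT+6" is the tz database name for UTC-6; the sign is inverted by POSIX convention. This sketch also uses as.POSIXct, which stores each value as a single number instead of a list of fields and is usually the more practical class for a column in a large data frame.

```r
fmt <- "%m/%d/%y %I:%M:%S %p"
x <- c("03/11/12 1:59:59 AM", "03/11/12 2:14:01 AM", "03/11/12 3:00:00 AM")

# Assumption (hypothetical): the data were recorded at fixed UTC-6 with no DST.
# "Etc/GMT+6" means UTC-6; Etc zone names invert the sign.
parsed <- as.POSIXct(x, format = fmt, tz = "Etc/GMT+6")
print(is.na(parsed))

# Show the same instants in UTC for an unambiguous view
print(format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

For the real data this would become something like df$date.time <- as.POSIXct(df$date.time, format = fmt, tz = ...), with the zone the data source confirms in place of "Etc/GMT+6".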

