R: Uniques (or dplyr distinct) + most recent date
I have a data frame consisting of rows of information that include repeats based on name, with different dates. I'd like to filter this df into one that includes only unique names, but also choose the most recent occurrence if given the chance. I am a big fan of dplyr and have used combinations of distinct and select before, but the documentation makes it seem that this cannot be done with distinct alone:
"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."
This seems like a problem that would occur commonly, so I was wondering if anyone had any advice. An example df is below, which reflects that my real data has names as character class and the date as POSIXct, which I generated using the lubridate package.
structure(list(name = c("john", "john", "mary", "john", "mary", "chad"), date = structure(c(1430438400, 1433116800, 1335830400, 1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", "POSIXt"))), .Names = c("name", "date"), row.names = c(NA, -6L), class = "data.frame")
The desired result is:
structure(list(name = c("john", "mary", "chad"), date = structure(c(1433116800, 1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("name", "date"), row.names = c(2L, 5L, 6L), class = "data.frame")
Thank you for the help.
The simplest way would be
df %>% arrange(desc(date)) %>% distinct(name, .keep_all = TRUE) # .keep_all = TRUE is needed in dplyr >= 0.5.0 to keep the date column
If you want the names kept in the same order, these would work (thanks to @akrun):
df %>% group_by(name) %>% slice(which.max(date)) # @akrun's better idea
df %>% group_by(name) %>% filter(date == max(date)) # my idea
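For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data and picks the latest row per name. It assumes dplyr 1.0.0 or later and swaps in slice_max(), which is not part of the original answer; the variable name latest is just for illustration.

```r
library(dplyr)

# Rebuild the example data from the question (dates are seconds since the epoch, UTC)
df <- data.frame(
  name = c("john", "john", "mary", "john", "mary", "chad"),
  date = as.POSIXct(
    c(1430438400, 1433116800, 1335830400, 1422748800, 1435708800, 1427846400),
    origin = "1970-01-01", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Assumption: dplyr >= 1.0.0, where slice_max() is available.
# Keep exactly one row per name, the one with the latest date.
latest <- df %>%
  group_by(name) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

One difference worth knowing when choosing between the variants: filter(date == max(date)) keeps every row that ties for the latest date within a name, while slice(which.max(date)) and slice_max(..., with_ties = FALSE) keep only one.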