R: Uniques (or dplyr distinct) + most recent date

- March 15, 2011

I have a data frame consisting of rows of information that include repeats based on name with different dates. I'd like to filter this df into one that only includes unique names, but also choose the most recent occurrence if given the chance. I'm a big fan of dplyr and have used combinations of distinct and select before, but the documentation makes it seem that this cannot be done with that alone:

"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."

This seems like a problem that would occur commonly, so I was wondering if anyone had any advice. An example df is below, which reflects that my real data has names as a character class and the date as POSIXct, which I generated using the lubridate package.

structure(list(name = c("john", "john", "mary", "john", "mary", "chad"), date = structure(c(1430438400, 1433116800, 1335830400, 1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", "POSIXt"))), .Names = c("name", "date"), row.names = c(NA, -6L), class = "data.frame")

The desired result is:

structure(list(name = c("john", "mary", "chad"), date = structure(c(1433116800, 1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("name", "date"), row.names = c(2L, 5L, 6L), class = "data.frame")

Thanks for any help.

The simplest way would be

df %>% arrange(desc(date)) %>% distinct(name, .keep_all = TRUE)  # .keep_all = TRUE (dplyr >= 0.5.0) keeps the date column
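For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data from the question and applies the sort-then-distinct approach. It assumes a current dplyr, where distinct(name) on its own returns only the name column, so .keep_all = TRUE is passed to carry date along. The object name latest is just an illustrative choice.

```r
library(dplyr)

# Example data from the question, rebuilt from the same epoch seconds
df <- data.frame(
  name = c("john", "john", "mary", "john", "mary", "chad"),
  date = as.POSIXct(
    c(1430438400, 1433116800, 1335830400,
      1422748800, 1435708800, 1427846400),
    origin = "1970-01-01", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Sort newest first, then keep the first row seen for each name.
# .keep_all = TRUE retains the other columns (here, date).
latest <- df %>%
  arrange(desc(date)) %>%
  distinct(name, .keep_all = TRUE)

print(latest)
```

Note that the result comes back ordered by date (newest first) rather than in the original row order.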

If you want the names kept in the same order, these will work (thanks @akrun):

df %>% group_by(name) %>% slice(which.max(date))  # @akrun's better idea
df %>% group_by(name) %>% filter(date == max(date))  # my idea
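One practical difference between the two grouped versions shows up when a name has two rows sharing the same latest date. The sketch below uses a small made-up data frame (ties, with a hypothetical id column added only to tell the rows apart) to illustrate it: which.max returns a single index, so slice keeps one row per name, while the filter version keeps every row that equals the maximum.

```r
library(dplyr)

# Hypothetical data: "john" has two rows with the same latest date
ties <- data.frame(
  name = c("john", "john", "mary"),
  date = as.POSIXct(c("2015-06-01", "2015-06-01", "2015-07-01"), tz = "UTC"),
  id = 1:3,
  stringsAsFactors = FALSE
)

# which.max() returns the index of the first maximum: one row per name
one_per_name <- ties %>%
  group_by(name) %>%
  slice(which.max(date)) %>%
  ungroup()

# date == max(date) is TRUE for every tied row: ties are all kept
all_latest <- ties %>%
  group_by(name) %>%
  filter(date == max(date)) %>%
  ungroup()

print(one_per_name)
print(all_latest)
```

If exactly one row per name is required, the slice form is probably the safer choice. If duplicates at the latest date are meaningful, the filter form preserves them.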
