R - Deleting rows where any of several columns is a duplicate
I have a data frame with an id column and several attribute columns. I want to drop all rows in the data frame where one (or more) of the attribute columns is identical to another attribute column. In other words, I want to keep only the rows where each attribute has a unique value within the row.
For example, using the code:
example = data.frame(id = c("a", "b", "c", "d"), attr1 = seq(1,4), attr2 = c(2, 3, 3, 1), attr3 = c(1, 2, 3, 3))
which results in the data frame:
  id attr1 attr2 attr3
1  a     1     2     1
2  b     2     3     2
3  c     3     3     3
4  d     4     1     3
I want to drop all of the rows except the last one, with id "d".
I've looked for ways to do this, but I'm not sure how to solve this particular problem (unique within rows). If it were unique within columns, it would be easy.
Thanks in advance!
You may try anyDuplicated, which returns 0 for a row with no repeated values:
example[!apply(example[-1], 1, anyDuplicated), ]
#   id attr1 attr2 attr3
# 4  d     4     1     3
Or
example[apply(example[-1],1, function(x) length(unique(x))==3),]
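The length(unique(x)) == 3 variant hard-codes the number of attribute columns. A minimal sketch of a reusable wrapper follows, assuming the attribute columns are all of the same type (apply() coerces the selection to a matrix). The function name keep_distinct_rows and its cols argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: keep only rows whose values in `cols` are all distinct.
# Assumes the selected columns share a type, since apply() coerces to a matrix.
keep_distinct_rows <- function(df, cols) {
  # anyDuplicated() returns 0 when a row has no repeated value,
  # otherwise the index of the first repeat
  has_dup <- apply(df[cols], 1, anyDuplicated) > 0
  df[!has_dup, , drop = FALSE]
}

example <- data.frame(id = c("a", "b", "c", "d"),
                      attr1 = seq(1, 4),
                      attr2 = c(2, 3, 3, 1),
                      attr3 = c(1, 2, 3, 3))

result <- keep_distinct_rows(example, c("attr1", "attr2", "attr3"))
print(result)
```

Passing the column names explicitly means the same call keeps working if attribute columns are added or removed later.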
Or using regex
# pastes each row's attributes into one string; assumes every value is a single digit
example[!nzchar(sub('^(?:([0-9])(?!.*\\1))*$', '', do.call(paste0, example[-1]), perl=TRUE)),]
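To see what the regex approach is doing, it may help to look at the intermediate strings. The sketch below rebuilds the example data, pastes each row's attributes together, and tests the same pattern with grepl() instead of sub() and nzchar(). The variable names keys, pattern and is_distinct are only illustrative. Because the pattern works digit by digit, it only gives the intended answer when every attribute value is a single digit.

```r
example <- data.frame(id = c("a", "b", "c", "d"),
                      attr1 = seq(1, 4),
                      attr2 = c(2, 3, 3, 1),
                      attr3 = c(1, 2, 3, 3))

# One string per row, made of the attribute values
keys <- do.call(paste0, example[-1])
print(keys)         # "121" "232" "333" "413"

# Each digit must not be followed (anywhere later) by the same digit
pattern <- '^(?:([0-9])(?!.*\\1))*$'
is_distinct <- grepl(pattern, keys, perl = TRUE)
print(is_distinct)  # FALSE FALSE FALSE TRUE

# Limitation: the values 1, 11, 2 are distinct, but paste to "1112",
# which repeats the digit 1 and is therefore rejected
multi_digit <- grepl(pattern, paste0(1, 11, 2), perl = TRUE)
print(multi_digit)  # FALSE
```

For multi-digit or non-numeric attributes, the apply() based versions are the safer choice.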
Benchmarks
example1 <- example[rep(1:nrow(example), 1e6), ]

system.time(example1[!apply(example1[-1], 1, anyDuplicated), ])
#    user  system elapsed
#  32.953   0.222  33.239

system.time(example1[!apply(example1[-1], 1, function(x) length(unique(x))==3), ])
#    user  system elapsed
#  35.409   0.185  35.659

system.time(example1[!nzchar(sub('^(?:([0-9])(?!.*\\1))*$', '', do.call(paste0, example1[-1]), perl=TRUE)), ])
#    user  system elapsed
#  10.033   0.020  10.069