R - Deleting rows where any of several columns is a duplicate

- April 15, 2010

I have a data frame with an id column and several attribute columns. I want to drop the rows in which one (or more) of the attribute columns is identical to another attribute column. In other words, I want to keep only the rows where each attribute has a unique value within the row.

For example, using this code:

    example = data.frame(id = c("a", "b", "c", "d"),
                         attr1 = seq(1, 4),
                         attr2 = c(2, 3, 3, 1),
                         attr3 = c(1, 2, 3, 3))

which results in this data frame:

    id  attr1  attr2  attr3
    a   1      2      1
    b   2      3      2
    c   3      3      3
    d   4      1      3

I want to drop all of the rows except the last one, id "d".

I've looked at ways to do this, but for this particular problem (unique within rows) I'm not sure how to solve it. If it were uniqueness within columns, it would be easy.

Thanks in advance!

You may try anyDuplicated:

    # anyDuplicated() returns 0 when a row has no repeated value, so ! keeps those rows
    example[!apply(example[-1], 1, anyDuplicated), ]
    #  id attr1 attr2 attr3
    #4  d     4     1     3

Or count the distinct values in each row (3 is the number of attribute columns here):

    example[apply(example[-1], 1, function(x) length(unique(x)) == 3), ]
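The hard-coded 3 only fits a data frame with exactly three attribute columns. A minimal sketch that derives the count from the data instead, assuming the attribute columns are everything except id (the helper name rows_all_distinct is made up for illustration and is not from the original answer):

```r
# Hypothetical helper: keep rows whose values in `cols` are all distinct.
# Assumes `cols` defaults to every column except "id".
rows_all_distinct <- function(df, cols = setdiff(names(df), "id")) {
  vals <- as.matrix(df[cols])
  # A row qualifies when it has as many distinct values as it has columns.
  keep <- apply(vals, 1, function(x) length(unique(x)) == length(x))
  df[keep, , drop = FALSE]
}

example <- data.frame(id = c("a", "b", "c", "d"),
                      attr1 = seq(1, 4),
                      attr2 = c(2, 3, 3, 1),
                      attr3 = c(1, 2, 3, 3))

result <- rows_all_distinct(example)
print(result)
```

On the example data this should leave only the row with id "d", the same as the two variants above.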

Or using regex:

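    # Paste each row's values into one string; the pattern matches the whole
    # string only if no digit appears again later in it
    example[!nzchar(sub('^(?:([0-9])(?!.*\\1))*$', '',
                        do.call(paste0, example[-1]), perl = TRUE)), ]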
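The regex variant pastes the attribute values together with no separator and matches one digit at a time, so it relies on every value being a single digit, as in this example. For wider values, one possible alternative (not from the original answer, and not timed here) is to compare the attribute columns pairwise, which stays vectorized over rows. The names attrs, pairs, and dup are made up for illustration:

```r
example <- data.frame(id = c("a", "b", "c", "d"),
                      attr1 = seq(1, 4),
                      attr2 = c(2, 3, 3, 1),
                      attr3 = c(1, 2, 3, 3))

attrs <- example[-1]

# All unordered pairs of attribute columns (3 pairs for 3 columns).
pairs <- combn(ncol(attrs), 2)

# For each pair, flag the rows where both columns hold the same value,
# then combine the flags with OR across all pairs.
dup <- Reduce(`|`, lapply(seq_len(ncol(pairs)), function(k) {
  attrs[[pairs[1, k]]] == attrs[[pairs[2, k]]]
}))

# Keep only the rows where no pair of columns matched.
result <- example[!dup, ]
print(result)
```

The number of comparisons grows with the square of the number of attribute columns, so this sketch is aimed at a handful of columns and many rows.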

Benchmarks

    # Repeat the 4-row example one million times (4 million rows)
    example1 <- example[rep(1:nrow(example), 1e6), ]

    system.time(example1[!apply(example1[-1], 1, anyDuplicated), ])
    #   user  system elapsed
    # 32.953   0.222  33.239

    system.time(example1[apply(example1[-1], 1,
        function(x) length(unique(x)) == 3), ])
    #   user  system elapsed
    # 35.409   0.185  35.659

    system.time(example1[!nzchar(sub('^(?:([0-9])(?!.*\\1))*$',
        '', do.call(paste0, example1[-1]), perl = TRUE)), ])
    #   user  system elapsed
    # 10.033   0.020  10.069
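Since the benchmark only measures speed, a quick check that the three approaches select the same rows on the small example can be useful. This is a sanity-check sketch, not part of the original answer; the variable names by_anydup, by_unique, and by_regex are made up for illustration:

```r
example <- data.frame(id = c("a", "b", "c", "d"),
                      attr1 = seq(1, 4),
                      attr2 = c(2, 3, 3, 1),
                      attr3 = c(1, 2, 3, 3))

# Approach 1: anyDuplicated() is 0 for rows without repeated values.
by_anydup <- example[!apply(example[-1], 1, anyDuplicated), ]

# Approach 2: a row is kept when it has 3 distinct values.
by_unique <- example[apply(example[-1], 1,
                           function(x) length(unique(x)) == 3), ]

# Approach 3: regex on the pasted row values (single-digit values only).
by_regex <- example[!nzchar(sub('^(?:([0-9])(?!.*\\1))*$', '',
                                do.call(paste0, example[-1]),
                                perl = TRUE)), ]

# All three should return the same single row.
stopifnot(identical(by_anydup, by_unique),
          identical(by_anydup, by_regex))
print(by_anydup)
```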
