r - Non-uniform `shift`ing: leads/lags with multiple within-period&ID observations -
the use of shift
well-documented way lead , lag values of variables, i'm struggling extending logic situation have multiple observations within each period or ids.
example:
dt<-data.table(id=rep(1:2,c(6,3)), pd=rep(rep(1:3,2),c(1:3,1,1,1)), firm=c(rep(c("01","02"),c(4,2)), c("04","05","06")))
the basic approach wrong:
dt[,paste0("firm_",c("lag","lead")):= .(shift(firm),shift(firm,type="lead")),by=id] > dt id pd firm firm_lag firm_lead 1: 1 1 01 na 01 2: 1 2 01 01 01 3: 1 2 01 01 01 4: 1 3 01 01 02 #all lags should 01 id 1 in pd 3 5: 1 3 02 01 02 #all leads should na id 1 in pd 3 6: 1 3 02 02 na 7: 2 1 04 na 05 8: 2 2 05 04 06 9: 2 3 06 05 na
my current workaround self-merge:
dt[setkey(unique(dt)[,.(pd,shift(firm),shift(firm,type="lead")),by=id],id,pd), c("lag","lead"):=.(i.v2,i.v3)] > dt id pd firm lag lead 1: 1 1 01 na 01 2: 1 2 01 01 01 3: 1 2 01 01 01 4: 1 3 01 01 na 5: 1 3 02 01 na 6: 1 3 02 01 na 7: 2 1 04 na 05 8: 2 2 05 04 06 9: 2 3 06 05 na
this correct, quite messy. there easier way accomplish this?
here's 1 way without merging:
dt[,c("lag","lead"):={ r <- rleid(pd) x <- .sd[,firm[1],by=pd]$v1 lapply(c("lag","lead"), function(y) x[shift(r,type=y)] ) }, by=id]
as far performance goes, there faster ways .sd
line. try ave(firm,pd,fun=function(x)x[1])
, example.
Comments
Post a Comment