parallel processing - Bootstrap a dataset in R
I need to perform bootstrapping on a dataset in R. The data is in the form of a list that contains two matrices and has the following properties:
- Both matrices are n by m and contain non-negative integers (i.e., positive integers and 0).
data <- list(a=matrix(,n,m), b=matrix(,n,m))
- A fixed number of marbles, 10000, is distributed over each matrix, i.e., the 10000 marbles are divided into n*m parts. In other words, the sum of the entries of each matrix is fixed.
> sum(data$a)
[1] 10000
> sum(data$b)
[1] 10000
- The marbles are distributed according to the affinity of the ij-th element for marbles, i.e., how many marbles end up in the ij-th entry of a matrix depends on the probability associated with that cell of the matrix.
- The probabilities associated with the elements are different for the two matrices.
My goal is to estimate the parameters that lead to the underlying probabilities. The model assumes 2n parameters, where n is the number of rows, with one set for each matrix. The parameters combine in a complex manner, so the two matrices must be analyzed together.
parameters <- data.frame(a=numeric(n), b=numeric(n))
Right now, the approach I am using is:
- I define a function, sgen, that takes as input a matrix containing the probabilities associated with the sites, generates a dataset using these probabilities, and returns it.
sgen <- function(freq) {
  # generate sample
  ...
}
- For the non-parametric bootstrap (which I want to implement now), I run the experiment and calculate the observed probability associated with each ij-th element by dividing the observed matrices by 10000. Let us call this freq from now on. So, freq is a list of two matrices.
freq <- list(a=data$a/10000, b=data$b/10000)
- Next, I replicate 100 samples of the data by passing freq to sgen.
- I pass the replicates to a pre-defined function, analyze, which gives me 100 n by 2 matrices containing the parameters.
- Next, I calculate the mean and sd of the entries across these matrices, to get one n by 2 matrix containing the means and another containing the sds. So, the desired value for the (1,5)th element of the mean matrix is the mean of the (1,5)th elements of the 100 replicates.
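The steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the bodies of sgen and analyze are not shown in the question, so sgen_sketch (one multinomial draw per matrix) and analyze_sketch (row proportions as stand-in "parameters") are hypothetical placeholders, and the data are simulated.

```r
set.seed(1)
n <- 4; m <- 3; total <- 10000

# Hypothetical observed data: two n x m count matrices, each summing to `total`
data <- list(
  a = matrix(rmultinom(1, total, runif(n * m)), n, m),
  b = matrix(rmultinom(1, total, runif(n * m)), n, m)
)

# Observed cell probabilities, as in the question
freq <- list(a = data$a / total, b = data$b / total)

# Hypothetical stand-in for sgen: one multinomial draw per matrix
sgen_sketch <- function(freq) {
  lapply(freq, function(p) {
    matrix(rmultinom(1, total, as.vector(p)), nrow(p), ncol(p))
  })
}

# Hypothetical stand-in for analyze: returns an n x 2 matrix of "parameters"
analyze_sketch <- function(d) {
  cbind(a = rowSums(d$a) / sum(d$a), b = rowSums(d$b) / sum(d$b))
}

# 100 replicates, stacked into an n x 2 x 100 array
reps <- replicate(100, analyze_sketch(sgen_sketch(freq)))

# Element-wise mean and sd across the replicates
est_mean <- apply(reps, c(1, 2), mean)
est_sd   <- apply(reps, c(1, 2), sd)
```

Each entry of est_mean and est_sd summarises the same position across the 100 replicate matrices.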
While this approach works, I would like to use the boot package in R to do this job. I want this because I can then use the other functions in the boot package for later analyses, and this way all the essential information is stored in the format of the boot class. Another important reason is that the boot package offers an easy way to make use of the multicore capabilities of my computer. So, can anyone please guide me on how to use boot for this purpose?
You can use the bootstrap function (from the bootstrap package) in the following way (taken from ?bootstrap):
# To bootstrap functions of more complex data structures,
# write theta so that its argument x
# is the set of observation numbers,
# and pass as data to bootstrap the vector 1,2,..n.
# For example, to bootstrap
# the correlation coefficient from a set of 15 data pairs:
library(bootstrap)
xdata <- matrix(rnorm(30),ncol=2)
n <- 15
theta <- function(x,xdata){ cor(xdata[x,1],xdata[x,2]) }
results <- bootstrap(1:n,20,theta,xdata)
Here theta is the function to be bootstrapped.
The problem with this approach is (I believe) that theta can return only a vector (not a data frame or matrix of multiple values in one go). So, if the theta function returns anything other than a vector, it might not work.
Update for the boot package:
The approach is similar using the boot function from the boot package. It takes data, where the data is a vector, matrix, or data frame, and statistic, "a function which when applied to data returns a vector containing the statistic(s) of interest." For the non-parametric bootstrap, the statistic function must take (at least) two arguments: the original data, and a vector of indices, frequencies or weights.
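As a small illustration of that contract, here is a minimal sketch on made-up data. The matrix xdata and the function col_stats are hypothetical names, not from the question. The statistic receives the original data plus the resampled row indices and returns several values at once as a plain vector.

```r
library(boot)

set.seed(42)
# Hypothetical data: 50 observations of two variables
xdata <- matrix(rnorm(100), ncol = 2)

# statistic(data, indices): boot() supplies the resampled row indices
col_stats <- function(data, indices) {
  d <- data[indices, , drop = FALSE]
  c(mean1 = mean(d[, 1]),
    mean2 = mean(d[, 2]),
    cor12 = cor(d[, 1], d[, 2]))
}

res <- boot(xdata, col_stats, R = 200)

res$t0      # the three statistics computed on the original data
dim(res$t)  # one row per replicate, one column per statistic
```

A statistic that naturally produces a matrix can be flattened with as.vector() before being returned and reshaped afterwards.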
So, the key is to write one function that implements steps 1-5 on a subset of the data given by an index, e.g.:
theta <- function(data, indices) {
  ## the exact subsetting operation depends on the format of the data
  subset_data <- data[indices, ]
  ## perform the calculations in steps 1-5 here on subset_data
}
Then you should be able to call theta like this (R, the number of bootstrap replicates, must be supplied):
boot(data, theta, R = 100)
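Because the data in the question is a list of two matrices, row subsetting with indices does not apply directly. One possible alternative, which is a different route from the index-based one above, is boot's sim = "parametric" mode: a ran.gen function draws each replicate dataset and statistic receives only the data. The sketch below assumes that regenerating counts from the observed frequencies is acceptable. gen_counts and analyze_sketch are hypothetical stand-ins (the real sgen and analyze are not shown in the question), and the data are simulated.

```r
library(boot)

set.seed(1)
n <- 4; m <- 3; total <- 10000

# Hypothetical observed data: two n x m count matrices, each summing to `total`
data <- list(
  a = matrix(rmultinom(1, total, runif(n * m)), n, m),
  b = matrix(rmultinom(1, total, runif(n * m)), n, m)
)
freq <- lapply(data, function(x) x / sum(x))

# ran.gen(data, mle): draw new count matrices from the probabilities in `mle`
gen_counts <- function(data, mle) {
  Map(function(obs, p) {
    matrix(rmultinom(1, sum(obs), as.vector(p)), nrow(p), ncol(p))
  }, data, mle)
}

# Hypothetical stand-in for analyze; the n x 2 result is flattened to a vector
analyze_sketch <- function(d) {
  est <- cbind(a = rowSums(d$a) / sum(d$a), b = rowSums(d$b) / sum(d$b))
  as.vector(est)
}

res <- boot(data, analyze_sketch, R = 100, sim = "parametric",
            ran.gen = gen_counts, mle = freq)

# Reshape the per-replicate vectors back into n x 2 summaries
est_mean <- matrix(colMeans(res$t), n, 2)
est_sd   <- matrix(apply(res$t, 2, sd), n, 2)
```

The result is an object of class boot, so it can be passed to other functions in the package. boot() also accepts parallel = "multicore" (not available on Windows) or parallel = "snow", together with ncpus, to spread the replicates over several cores.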