rJava - Controlling R memory usage in doParallel
I use doParallel to run batches of tasks, but R does not seem to free memory even though I call rm() and gc(). The program eventually exits with an out-of-memory error. It may be worth mentioning that I am running R from Java using JRI, but I don't think this is a Java issue, because: (1) I monitored the JVM's memory consumption, and it never went beyond 3 GB (the allocated maximum is 8 GB); (2) I am running the job on a company server, with a total of 36 GB allocated to the job (8 GB of which is for the JVM). The job exits due to a system error (the 36 GB maximum was reached), not Java's OutOfMemoryError. This suggests to me that the R engine Java calls into has gradually used up all the available memory.
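One way to verify that it is R (rather than the JVM) that grows would be to log R's own memory usage after each iteration. A minimal sketch, assuming the org.rosuda.REngine API where parseAndEval returns an REXP:

// gc() returns a matrix whose second column is the memory used (in Mb)
// for Ncells and Vcells; summing that column gives R's total usage in Mb
REXP mem = re.parseAndEval("sum(gc()[, 2])");
System.out.println("R memory used (Mb): " + mem.asDouble());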
Below are the R and Java code. The actual Java code is complex, so I have tried to simplify it by outlining its general structure.
REngine re = ...;
re.parseAndEval("registerDoParallel(" + cpuCores + ")");
while (someCondition) {
    // The file path is different in each iteration of the while loop
    String dataMatrixFile = functionToGetDataMatrixFile("...");

    // Read the data matrix saved on disk
    re.parseAndEval("m <- read.table('" + dataMatrixFile + "', header=FALSE, sep=',', skip=0)");

    // Compute the distance matrix
    re.parseAndEval("d <- dist(m, 'euclidean')");

    // Do hierarchical clustering on the distance matrix
    re.parseAndEval("clustering <- hclust(d, 'average')");

    // A set of numbers of clusters at which to cut the dendrogram. The exact
    // size and values of b depend on a Java function; assume here that it is
    // a vector of integers, and that there can be 20-100 elements.
    re.parseAndEval("b <- c(1, 3, 4, 7, 8, 11, ...)");

    // Line A: use doParallel to iterate through b, cut the dendrogram, and
    // compute the Calinski & Harabasz statistic. The object returned by the
    // foreach block is a vector of double values.
    Object result = re.parseAndEval(
        "foreach(i=b, .combine=c, .packages='fpc') %dopar% {\n" +
        "  subgraph <- cutree(clustering, k=i)\n" +
        "  calinhara(m, subgraph)\n" +
        "}");

    re.parseAndEval("rm(list=ls())");
    re.parseAndEval("gc()"); // Line B
    re.parseAndEval("gc()");

    javaFunctionToProcess(result);
}
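One variation that might help (untested) is to manage the worker cluster explicitly with makeCluster()/stopCluster() from the parallel package, instead of relying on the implicit cluster created by registerDoParallel(cpuCores). The idea is that tearing the workers down after each batch would also release whatever copies of the data they hold:

// Sketch: create and destroy the cluster per batch so the worker processes
// (and any copies of 'clustering' and 'm' they hold) are released
re.parseAndEval("cl <- makeCluster(" + cpuCores + ")");
re.parseAndEval("registerDoParallel(cl)");
// ... run the foreach(...) %dopar% block from Line A here ...
re.parseAndEval("stopCluster(cl)");
re.parseAndEval("rm(cl)");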
From logging, it seems that the block of code indicated as "Line A" is causing the memory leak: each time the program breaks, I can see in the log that it was executing that block. My question is how to fix this; it might be helpful if you can answer some of the specific questions below.
1) In the foreach block, does R create multiple copies of the object "clustering", one for each core it parallelizes over?
2) If the answer to 1) is yes, are these duplicate objects garbage collected automatically? If not, that could cause the memory leak.
3) Is it necessary to change Line A to the following?
Object result = re.parseAndEval(
    "foreach(i=b, .combine=c, .packages='fpc') %dopar% {\n" +
    "  subgraph <- cutree(clustering, k=i)\n" +
    "  ch <- calinhara(m, subgraph)\n" +
    "  gc()\n" +
    "  gc()\n" +
    "  ch\n" + // return the statistic, not gc()'s result
    "}");
4) Is it necessary to "force" R to release memory by exiting the R environment and re-creating the R instance periodically, e.g., like this?

re.parseAndEval("q()");
re = null;
re = [code to create a new REngine];
... // continue processing
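Related to question 1): since foreach ships every referenced object to each worker, another idea (untested) would be to shrink what the workers need. If I understand the docs correctly, cutree() accepts a vector for k and returns a membership matrix with one column per value of k, so the dendrogram could be cut once in the master process, after which the large clustering and d objects can be dropped before the parallel section runs. A rough sketch:

// Cut the dendrogram once for all values in b; cutree() with a vector k
// returns a matrix with one membership column per k
re.parseAndEval("memberships <- cutree(clustering, k=b)");
// Drop the large objects before the workers are spawned
re.parseAndEval("rm(clustering, d); invisible(gc())");
Object result = re.parseAndEval(
    "foreach(j=seq_len(ncol(memberships)), .combine=c, .packages='fpc') %dopar% {\n" +
    "  calinhara(m, memberships[, j])\n" +
    "}");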
Any suggestions are appreciated!