rJava - Controlling R memory usage in doParallel


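I use doParallel to run batches of tasks, and it seems R is not freeing memory even though I call rm() and gc(). Eventually the program exits because it runs out of memory. It may be worth mentioning that I am running R from Java using JRI, but I think this is an R issue rather than a Java one, because (1) I have been monitoring JVM memory consumption, and it never goes beyond 3 GB (the allocated maximum is 8 GB); (2) I am running the job on a company server and allocate a total of 36 GB to it (including the 8 GB for the JVM). The job exits due to a system error (the 36 GB maximum was reached), not because of Java's OutOfMemoryError. To me this suggests that the R engine behind the Java calls has gradually used up all the memory available.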

Below is the R code together with the Java code that calls it. The Java code is complex, so I have tried to simplify it by outlining the general structure.

    Rengine re = ....
    re.parseAndEval("registerDoParallel(" + cpuCores + ")");

    while (someCondition) {
        // The file path is different in each iteration of the while loop.
        String dataMatrixFile = functionToGetDataMatrixFile("...");

        // Read the data matrix saved on disk.
        re.parseAndEval("m <- read.table('" + dataMatrixFile + "', header=FALSE, sep=',', skip=0)");
        // Compute the distance matrix.
        re.parseAndEval("d <- dist(m, 'euclidean')");
        // Do hierarchical clustering on the distance matrix.
        re.parseAndEval("clustering <- hclust(d, 'average')");

        // A set of numbers of clusters at which to cut the dendrogram. The exact
        // size of b and its values depend on a Java function. Let's assume here
        // that it is a vector of integers with 20-100 elements.
        re.parseAndEval("b <- c(1, 3, 4, 7, 8, 11, ......)");

        // Line A. Use doParallel to iterate through b, cut the dendrogram, and
        // compute the Calinski & Harabasz statistic. The object returned by the
        // foreach block is a vector of double values.
        Object result = re.parseAndEval(
            "foreach(i=b, .combine=c, .packages='fpc') %dopar% {\n"
            + "  subgraph <- cutree(clustering, k=i)\n"
            + "  calinhara(m, subgraph)\n"
            + "}");

        re.parseAndEval("rm(list=ls())");
        re.parseAndEval("gc()"); // Line B
        re.parseAndEval("gc()");

        javaFunctionToProcess(result);
    }
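For comparison, one variation on the loop above is to create the worker cluster explicitly at the start of each iteration and stop it at the end, instead of calling registerDoParallel(cpuCores) once up front, so that the worker processes do not outlive a batch. The sketch below only builds the R commands as strings to pass to parseAndEval. The class IterationScript and its method names are hypothetical, the cluster variable name cl is an assumption, and whether this changes the memory growth described here has not been tested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: builds the R commands for one loop iteration as strings. */
class IterationScript {

    /** The foreach call from Line A; the caller keeps the value it returns. */
    static final String FOREACH =
        "foreach(i=b, .combine=c, .packages='fpc') %dopar% {\n"
        + "  subgraph <- cutree(clustering, k=i)\n"
        + "  calinhara(m, subgraph)\n"
        + "}";

    /** Renders a Java int array as an R vector literal, e.g. {1, 3, 4} becomes "c(1, 3, 4)". */
    static String rIntVector(int[] values) {
        StringJoiner joiner = new StringJoiner(", ", "c(", ")");
        for (int v : values) {
            joiner.add(Integer.toString(v));
        }
        return joiner.toString();
    }

    /** Wraps a string in single quotes for R, escaping backslashes and single quotes. */
    static String rQuote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Commands to run before FOREACH: start workers, load data, cluster. */
    static List<String> setup(String dataMatrixFile, int[] clusterCounts, int cpuCores) {
        List<String> cmds = new ArrayList<>();
        cmds.add("library(doParallel)");
        // Assumption: an explicit cluster object named cl, created per iteration.
        cmds.add("cl <- parallel::makeCluster(" + cpuCores + ")");
        cmds.add("registerDoParallel(cl)");
        cmds.add("m <- read.table(" + rQuote(dataMatrixFile)
            + ", header=FALSE, sep=',', skip=0)");
        cmds.add("d <- dist(m, 'euclidean')");
        cmds.add("clustering <- hclust(d, 'average')");
        cmds.add("b <- " + rIntVector(clusterCounts));
        return cmds;
    }

    /** Commands to run after FOREACH: stop the workers first, then clear the workspace. */
    static List<String> teardown() {
        List<String> cmds = new ArrayList<>();
        cmds.add("parallel::stopCluster(cl)"); // must run before rm() removes cl
        cmds.add("rm(list=ls())");
        cmds.add("gc()");
        return cmds;
    }

    public static void main(String[] args) {
        // Print the command sequence for one example iteration.
        for (String cmd : setup("/path/to/matrix.csv", new int[] {1, 3, 4}, 4)) {
            System.out.println(cmd);
        }
        System.out.println(FOREACH);
        for (String cmd : teardown()) {
            System.out.println(cmd);
        }
    }
}
```

In the loop, each string from setup(...) would be passed to parseAndEval, then FOREACH (keeping its return value as result), then each string from teardown(). The trade-off is the cost of starting the workers in every iteration, and if the foreach call fails the teardown still has to run, or the workers are left behind.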

From logging, it seems the block of code marked "Line A" is causing the memory leak: each time the program breaks, I can see in the log that it was executing this block. My question is how to fix this. It might help if someone could explain any of the specific questions below.

1) In the foreach block, does R create multiple copies of the object "clustering", one for each core used to parallelize?

2) If the answer to 1) is yes, are these duplicate objects garbage collected automatically? If not, that could be the cause of the memory leak.

3) Is it necessary, or useful, to change Line A to:

    Object result = re.parseAndEval(
        "foreach(i=b, .combine=c, .packages='fpc') %dopar% {\n"
        + "  subgraph <- cutree(clustering, k=i)\n"
        + "  calinhara(m, subgraph)\n"
        + "  gc()\n"
        + "  gc()\n"
        + "}");

4) Is it necessary, or useful, to "force" R to release memory by exiting the R environment and re-creating the R instance periodically, e.g., like this:

    re.parseAndEval("q()");
    re = null;
    re = [code to create a new Rengine];
    ... // continue processing
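To make the idea in question 4 concrete, here is a minimal sketch of the bookkeeping for replacing the R session after a fixed number of batches. It does not use JRI at all: RSession is a hypothetical stand-in interface, RecyclingSession and batchesPerSession are made-up names, and whether an embedded R engine can really be shut down and re-created inside one running JVM is an open assumption, not something established here.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the embedded R engine; this is not a JRI type. */
interface RSession {
    Object eval(String command);
    void close();
}

/** Hands out a session and replaces it after a fixed number of finished batches. */
class RecyclingSession {
    private final Supplier<RSession> factory;
    private final int batchesPerSession;
    private RSession current;
    private int batchesDone;

    RecyclingSession(Supplier<RSession> factory, int batchesPerSession) {
        if (batchesPerSession < 1) {
            throw new IllegalArgumentException("batchesPerSession must be at least 1");
        }
        this.factory = factory;
        this.batchesPerSession = batchesPerSession;
    }

    /** Returns the live session, creating a fresh one if the last one was closed. */
    RSession session() {
        if (current == null) {
            current = factory.get();
            batchesDone = 0;
        }
        return current;
    }

    /** Call once at the end of each while-loop iteration. */
    void batchFinished() {
        if (current == null) {
            return;
        }
        batchesDone++;
        if (batchesDone >= batchesPerSession) {
            current.close(); // the next call to session() creates a new one
            current = null;
        }
    }
}
```

With this shape, the loop body would call session().eval(...) in place of re.parseAndEval(...) and finish each iteration with batchFinished(). Any state kept in the R workspace is lost when the session is replaced, so one-time setup such as registering the parallel backend would have to move into the factory.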

Any suggestions would be appreciated!

