xml - How do I use Saxon to do multiple search/replace on values, in a way that's efficient -


I used the Saxon v9 profiler on an XSL transformation that converts XML to JSON. The profiler tells me that the function that escapes characters takes 70% of the total processing time. The conversion is important because otherwise the created JSON file is invalid: unescaped characters break the strings.

java -jar saxon9he.jar -xsl:jsontransform.xslt -s:input.xml -o:output.json -tp 

The "method" used to escape values looks like this:

<xsl:template name="escapejson">
    <xsl:param name="string"/>
    <xsl:sequence select="replace(
        replace(
        replace(
        replace(
        replace(
        replace(
        replace(
        replace(
        replace($string, '\\','\\\\'),
        '/', '\\/'),
        '&quot;', '\\&quot;'),
        '&#xa;','\\n'),
        '&#xd;','\\r'),
        '&#x9;','\\t'),
        '\n','\\n'),
        '\r','\\r'),
        '\t','\\t')"/>
</xsl:template>

I received valuable suggestions from Rolf Lear (@rolfl) in this other post, which cut down the number of replace calls:

... replace( '\n|&#xa;','\\n'), replace( '\r|&#xd;','\\r'),       replace( '\t|&#x9;','\\t') ... 

Unfortunately, this still fails to process the data within my time constraints. I compared the original form of the XSL with the modified one, and the time spent is equal.

Because the XSL runs on a software appliance to which I have no file-level access, I need a solution for Saxon 8, because that is the version supposed to be in use there. I also assume that integrating Java into the XSL is not an option, because (although I have not tested it yet) it is prevented on the appliance for security reasons.

All of your replacements act on single characters, and you can exploit that fact. I imagine that the majority of strings contain none of these special characters. The best approach might therefore be an initial examination of the string to see whether it contains any of these characters before doing the replacement. This can be done efficiently using translate: if the following expression is true

$x eq translate($x, '\/"&#xa;....', '') 

then no replacement needs to be done. With luck, this will reduce the number of strings that have to be processed to a tiny fraction of the total, so that the efficiency of the multiple replace no longer matters.
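The check described above could be wired into the existing template roughly as follows. This is only a sketch: the variable names `special` and `s` are made up for illustration, the character list is taken from the question's template, and the redundant `\n`, `\r` and `\t` replacements are left out. It uses only XSLT 2.0 / XPath 2.0 constructs, but it has not been tried on any particular Saxon 8.x release.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper variable: every character that needs escaping
       (backslash, slash, double quote, line feed, carriage return, tab). -->
  <xsl:variable name="special" select="'\/&quot;&#xa;&#xd;&#x9;'"/>

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- Work on the string value so that "eq" compares two strings. -->
    <xsl:variable name="s" select="string($string)"/>
    <xsl:choose>
      <!-- translate() with an empty third argument deletes the special
           characters; if nothing changed, the string contains none. -->
      <xsl:when test="$s eq translate($s, $special, '')">
        <xsl:sequence select="$s"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Slow path, reached only by strings that need escaping.
             The backslash must be replaced first. -->
        <xsl:sequence select="replace(
            replace(
            replace(
            replace(
            replace(
            replace($s, '\\', '\\\\'),
            '/', '\\/'),
            '&quot;', '\\&quot;'),
            '&#xa;', '\\n'),
            '&#xd;', '\\r'),
            '&#x9;', '\\t')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Callers keep using `xsl:call-template name="escapejson"` as before. How much this helps depends entirely on how many of your strings actually contain one of the special characters.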

Another approach is to recode the replace logic as:

string-join(
  for $c in string-to-codepoints($in)
  return
    if ($c eq xx) then "\\"
    else if ($c eq xy) then "\n"
    else if ....
    else codepoints-to-string($c),
  "")
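Filled in for the six characters handled in the question, the codepoint expression might look like the sketch below. The function name `f:escape-json` and its namespace URI are invented for the example, and the decimal codepoints (92, 47, 34, 10, 13, 9) stand for backslash, slash, double quote, line feed, carriage return and tab. Whether it actually beats the nested replace calls on Saxon 8.x is something you would have to measure.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:json-escape"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical function: walks the string once, codepoint by codepoint,
       and emits either a JSON escape or the original character. -->
  <xsl:function name="f:escape-json" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <!-- XPath string literals have no escape mechanism, so '\\' really is
         two backslashes and '\n' is a backslash followed by the letter n. -->
    <xsl:sequence select="string-join(
        for $c in string-to-codepoints($in)
        return
          if ($c eq 92) then '\\'
          else if ($c eq 47) then '\/'
          else if ($c eq 34) then '\&quot;'
          else if ($c eq 10) then '\n'
          else if ($c eq 13) then '\r'
          else if ($c eq 9) then '\t'
          else codepoints-to-string($c),
        '')"/>
  </xsl:function>

</xsl:stylesheet>
```

A call would then look like `<xsl:value-of select="f:escape-json(@name)"/>`, where `@name` is just a placeholder for whatever value you are writing out. The function can also be combined with the translate test, so that it only runs on strings that need escaping.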

Incidentally, I think replace(x, '\n', y) means the same as replace(x, '&#xa;', y), so it is redundant to do both.

You say you need to use Saxon 8. There was in fact a succession of releases from Saxon 8.0 to Saxon 8.9, with an enormous amount of development between them. I'm not going to check my records to see which features were introduced when.

