C# - Ignore Special Characters (tittles) in Examine search


I'm using Umbraco v6 with Examine search (not full-blown Lucene queries). It's a Latin/South American website. I've asked my colleagues how they type the tittles (accent marks) into a search box or URL, and they all said they don't: they just use the "regular" characters (a-z, A-Z).

I know how to strip the special characters out of a string before passing it to Examine, but I need it the other way around: Examine should remove the special characters from the indexed properties so that they match the query. I have numerous "nodes" that have tittles in the name (which is one of the properties I'm searching on).

Posts I've researched:

I've tried writing a Lucene query (or so I think), but I'm not getting any hits.

    // q is the query from the querystring
    var searcher = ExamineManager.Instance.SearchProviderCollection["CustomSearchSearcher"];

    //var query = searcher.CreateSearchCriteria().Field("nodeName", q).Or().Field("description", q).Compile();
    //var searchResults = searcher.Search(query).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.05f);

    var searchResults = searcher.Search(Global.RemoveSpecialCharacters(q), true)
        .OrderByDescending(x => x.Score)
        .TakeWhile(x => x.Score > 0.05f);

The Global class:

    // Keeps digits, ASCII letters, '.', '_' and a few accented vowels plus 'ñ';
    // every other character is dropped.
    public static string RemoveSpecialCharacters(string str)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
        {
            if ((str[i] >= '0' && str[i] <= '9')
                || (str[i] >= 'A' && str[i] <= 'Z')
                || (str[i] >= 'a' && str[i] <= 'z')
                || str[i] == '.' || str[i] == '_'
                || str[i] == 'á' || str[i] == 'é' || str[i] == 'í'
                || str[i] == 'ñ' || str[i] == 'ó' || str[i] == 'ú')
            {
                sb.Append(str[i]);
            }
        }

        return sb.ToString();
    }

As stated above, I need the special characters (tittles) removed by Lucene in the index, not from the query being passed in.

From: https://our.umbraco.org/documentation/reference/searching/examine/overview-explanation

I've read about "analyzers", but I have never worked with them before, nor do I know which one(s) to get/install/add to VS, etc. Is there a better way to go about this?

A custom analyzer is the answer.

This was answered on the Umbraco forum here: https://our.umbraco.org/forum/developers/extending-umbraco/16396-examine-and-accents-for-portuguese-language

Make an analyzer that strips the special characters at index time:

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    // Case-insensitive, accent-insensitive analyzer.
    public class CIAIAnalyser : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            StandardTokenizer tokenizer = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
            tokenizer.SetMaxTokenLength(255);

            TokenStream stream = new StandardFilter(tokenizer);
            stream = new LowerCaseFilter(stream);

            // Folds accented characters to their ASCII equivalents (e.g. 'á' becomes 'a').
            return new ASCIIFoldingFilter(stream);
        }
    }

Then do the same to the search input, so that the query terms match the folded terms in the index:

    using System.Globalization;
    using System.Text;

    public class CleanAccent
    {
        public static string RemoveDiacritics(string input)
        {
            if (string.IsNullOrEmpty(input)) return input;

            // FormD is full canonical decomposition: each accented character
            // becomes a base character followed by its combining marks.
            string inputInFormD = input.Normalize(NormalizationForm.FormD);
            var sb = new StringBuilder();

            for (int idx = 0; idx < inputInFormD.Length; idx++)
            {
                // Skip the combining marks and keep everything else.
                UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(inputInFormD[idx]);
                if (uc != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(inputInFormD[idx]);
                }
            }

            return sb.ToString().Normalize(NormalizationForm.FormC);
        }
    }

Then reference the analyzer in ExamineSettings.config.
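As a rough sketch of that last step, the provider entries in ExamineSettings.config accept an `analyzer` attribute holding an assembly-qualified type name. The namespace and assembly name `MyProject` and the indexer name `CustomSearchIndexer` below are assumptions for illustration (only `CustomSearchSearcher` appears in the question), and any other attributes your existing entries carry are omitted here and should stay as they are:

```xml
<!-- Fragment only: merge the analyzer attribute into your existing entries. -->
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- "CustomSearchIndexer" and "MyProject" are hypothetical names. -->
      <add name="CustomSearchIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           analyzer="MyProject.CIAIAnalyser, MyProject" />
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="CustomSearchSearcher">
    <providers>
      <add name="CustomSearchSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="MyProject.CIAIAnalyser, MyProject" />
    </providers>
  </ExamineSearchProviders>
</Examine>
```

The index was built with the previous analyzer, so it most likely needs to be rebuilt before the accent-folded terms become searchable.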

