C# Regex: remove non-letter characters from a string
My terminology may be a little off here, but I am trying to strip non-letters out of a string in C#: remove dashes, ampersands, etc., but retain things like accented characters and Chinese characters. The C# examples I have seen use a regex like new Regex("[^a-zA-Z0-9 -]");
However, it needs to work beyond ASCII characters.
string input = "i- +am. 相关 azurÉe& /30%";
string output = "i am 相关 azurÉe 30";
As a starting point, you can remove characters according to their Unicode character class. For example, this code removes everything characterized as punctuation, a symbol, or a control character:
string input = "i- +am. 相关 azurÉe& /30%";
var output = Regex.Replace(input, "[\\p{S}\\p{C}\\p{P}]", "");
You can also try a whitelisting approach, allowing only certain classes. For example, this keeps only characters that are letters, diacritics, digits, and spacing:
var output = Regex.Replace(input, "[^\\p{L}\\p{M}\\p{N}\\p{Z}]", "");
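For comparison, here is a minimal, self-contained sketch that runs both patterns on the sample input from the question. The class name NonLetterStripper and the method names RemoveByBlacklist and KeepByWhitelist are made up for illustration; only Regex.Replace and the Unicode category escapes come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class NonLetterStripper
{
    // Blacklist: remove symbols (S), "other" characters such as
    // control codes (C), and punctuation (P).
    public static string RemoveByBlacklist(string input) =>
        Regex.Replace(input, @"[\p{S}\p{C}\p{P}]", "");

    // Whitelist: keep only letters (L), combining marks (M),
    // numbers (N), and separators such as spaces (Z).
    public static string KeepByWhitelist(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{M}\p{N}\p{Z}]", "");

    public static void Main()
    {
        string input = "i- +am. 相关 azurÉe& /30%";

        // Both calls yield the string: i am 相关 azurÉe 30
        Console.WriteLine(RemoveByBlacklist(input));
        Console.WriteLine(KeepByWhitelist(input));
    }
}
```

Note that the category names are case-sensitive, so \p{L} must use an uppercase L. As far as I can tell, Unicode's top-level general categories are exactly L, M, N, P, S, Z, and C, so the two patterns should remove the same characters. The whitelist only starts to behave differently once you drop a class, for example omitting \p{N} to strip digits as well.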