C# - Regex to remove non-letter characters from a string


My terminology may be a little off here, but I'm trying to strip non-letter characters out of a string in C#: remove dashes, ampersands and so on, but retain things like accented characters and Chinese characters. The C# examples I have seen all use something like new Regex("[^a-zA-Z0-9 -]");, but I need this to work beyond ASCII characters.

string input = "i- +am. 相关 azurÉe& /30%";

string output = "i am 相关 azurÉe 30";
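To make the limitation concrete, here is a minimal sketch that runs the ASCII-only pattern quoted in the question against the sample input. The variable names are illustrative only, and the top-level statement style assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "i- +am. 相关 azurÉe& /30%";

// ASCII-only whitelist from the question: keeps a-z, A-Z, 0-9, space and '-'.
// Everything outside that set is removed, including 相关 and É.
var asciiOnly = new Regex("[^a-zA-Z0-9 -]");
string stripped = asciiOnly.Replace(input, "");

Console.WriteLine(stripped);
```

The Chinese characters and the accented É are dropped, while the dash survives because it is listed literally in the character class.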

A starting point would be to remove characters according to their Unicode character class. For example, this code removes everything categorized as a punctuation, symbol or control character:

string input = "i- +am. 相关 azurÉe& /30%";
var output = Regex.Replace(input, "[\\p{S}\\p{C}\\p{P}]", "");
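As a self-contained sketch of the blacklist approach, the snippet below wraps the pattern in a helper. The helper name is hypothetical, and the top-level statement style assumes C# 9 or later. Note that .NET expects the Unicode category names in upper case (\p{S}, \p{C}, \p{P}).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper name; the pattern removes symbols (S),
// control/other characters (C) and punctuation (P).
string RemovePunctuationSymbolsControl(string text) =>
    Regex.Replace(text, @"[\p{S}\p{C}\p{P}]", "");

string input = "i- +am. 相关 azurÉe& /30%";
string output = RemovePunctuationSymbolsControl(input);

// Letters (including 相关 and É), digits and spaces are left untouched.
Console.WriteLine(output);
```

The verbatim string (@"...") avoids doubling the backslashes; the pattern itself is unchanged.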

You could also try a whitelisting approach, allowing only certain classes. For example, this keeps only characters that are letters, diacritics, digits and spacing:

var output = Regex.Replace(input, "[^\\p{L}\\p{M}\\p{N}\\p{Z}]", "");
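The whitelist can be sketched the same way. The helper name and the extra test strings below are assumptions added for illustration, and the top-level statement style assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper name; keeps letters (L), combining marks (M),
// numbers (N) and separators (Z), and removes everything else.
string KeepLettersMarksDigitsSpaces(string text) =>
    Regex.Replace(text, @"[^\p{L}\p{M}\p{N}\p{Z}]", "");

// Sample input from the question.
string sample = KeepLettersMarksDigitsSpaces("i- +am. 相关 azurÉe& /30%");

// "e" followed by U+0301 (combining acute accent): \p{M} keeps the accent.
string decomposed = KeepLettersMarksDigitsSpaces("cafe\u0301!");

// Removed characters leave their neighbouring spaces behind.
string spaced = KeepLettersMarksDigitsSpaces("a & b");

Console.WriteLine(sample);
Console.WriteLine(decomposed);
Console.WriteLine(spaced);
```

Two things to keep in mind: \p{Z} covers space separators but not tabs or newlines, which are control characters and are therefore removed; and stripping a character between two spaces leaves both spaces, so "a & b" becomes "a  b" with a double space.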


