Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'd like to restrict my form input from accepting non-English characters: for example, all Chinese, Japanese, and Cyrillic characters, but also single accented characters like à, ù, ü, ê. Would this be possible? Do I have to set up a locale on my MVC application, or should I rather just do a regex textbox validation? As a side note, I want to be able to enter numbers and other characters. I only want this to exclude letters.

Please advise, thank you.


1 Answer

For this you have to use Unicode character properties and blocks. Each Unicode code point has some properties assigned to it, e.g. that the code point is a letter. Blocks are ranges of code points.

Unicode properties and blocks are written \p{Name}, where "Name" is the name of the property or block.

With an uppercase "P", as in \P{Name}, it is the negation of the property/block, i.e. it matches anything else.

Some of the available properties (only a short excerpt):

  • L ==> All letter characters.
  • Lu ==> Letter, Uppercase
  • Ll ==> Letter, Lowercase
  • N ==> All numbers. This includes the Nd, Nl, and No categories.
  • Pc ==> Punctuation, Connector
  • P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
  • Sm ==> Symbol, Math
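
As a rough illustration of how these categories behave in .NET, here is a minimal sketch using Regex.IsMatch. The sample characters and variable names are my own choices, not part of the original answer, and non-ASCII characters are written as \u escapes only so the snippet does not depend on the source file encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// \p{Lu}: an uppercase letter in any script ("\u00C9" is É).
bool isUpper = Regex.IsMatch("\u00C9", @"^\p{Lu}$");
// \p{N}: any numeric character, including plain digits.
bool isNumber = Regex.IsMatch("7", @"^\p{N}$");
// \P{L} (uppercase P) negates the property: anything that is not a letter.
bool bangIsNonLetter = Regex.IsMatch("!", @"^\P{L}$");
bool zheIsNonLetter = Regex.IsMatch("\u0436", @"^\P{L}$"); // Cyrillic ж is a letter

Console.WriteLine($"{isUpper} {isNumber} {bangIsNonLetter} {zheIsNonLetter}");
// prints "True True True False"
```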

Some of the available blocks (only a short excerpt):

  • 0000 - 007F ==> IsBasicLatin
  • 0400 - 04FF ==> IsCyrillic
  • 1000 - 109F ==> IsMyanmar
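
Block names can be tried out the same way. The sketch below assumes a small helper, AllInBlock, which is a hypothetical name introduced here for illustration; it checks whether every character of a string falls inside one named block.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if every character of s lies in the named Unicode block.
static bool AllInBlock(string s, string blockName) =>
    Regex.IsMatch(s, @"^\p{" + blockName + @"}+$");

bool asciiOnly = AllInBlock("Foo123!", "IsBasicLatin");            // all in 0000 - 007F
bool cyrillicOnly = AllInBlock("\u0436\u0443\u043A", "IsCyrillic"); // жук, all in 0400 - 04FF
bool mixed = AllInBlock("Foo\u0436", "IsBasicLatin");              // ж is outside Basic Latin

Console.WriteLine($"{asciiOnly} {cyrillicOnly} {mixed}");
// prints "True True False"
```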

What I used in the solution:

\P{L} is a character property that matches any character that is not a letter ("L" for Letter).

\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F.

So your regex would be:

^[\P{L}\p{IsBasicLatin}]+$

In plain words:

The pattern matches the whole string, from start to end (^ and $), when it consists of one or more characters that are each either a non-letter or a character from the ASCII table (code points 0000 - 007F).

A short C# test method:

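using System;
using System.Text.RegularExpressions;

string[] myStrings = { "Foobar",
    "Foo@bar!\"§$%&/()",
    "Föobar",
    "fóóè"
};

// Accept only strings made up of non-letters and/or Basic Latin (ASCII) characters.
Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

foreach (string str in myStrings) {
    Match result = reg.Match(str);
    if (result.Success)
        Console.Out.WriteLine("matched ==> " + str);
    else
        Console.Out.WriteLine("failed ==> " + str);
}

// Keep the console window open until Enter is pressed.
Console.ReadLine();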

Prints:

matched ==> Foobar
matched ==> Foo@bar!"§$%&/()
failed ==> Föobar
failed ==> fóóè
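
Since the question mentions an MVC application, one possible way to wire this in is the RegularExpression data annotation from System.ComponentModel.DataAnnotations. The following is only a sketch under that assumption: the property name in the comment is hypothetical, and the attribute is instantiated directly here so the check can run outside a web project.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// On a model this would typically be written as an attribute, e.g. (hypothetical property):
//   [RegularExpression(@"^[\P{L}\p{IsBasicLatin}]+$", ErrorMessage = "English letters only.")]
//   public string Name { get; set; }
// The attribute is created directly here so the rule can be exercised standalone.
var rule = new RegularExpressionAttribute(@"^[\P{L}\p{IsBasicLatin}]+$");

bool plainOk = rule.IsValid("Foo@bar 123");            // ASCII letters, digits, symbols
bool accentedOk = rule.IsValid("f\u00F3\u00F3\u00E8"); // fóóè contains non-ASCII letters

Console.WriteLine($"{plainOk} {accentedOk}");
// prints "True False"
```

Two things to keep in mind with this approach: the attribute treats null and empty strings as valid, so a separate [Required] is needed if the field must not be empty, and this only covers server-side validation. If client-side validation is enabled, the browser's JavaScript regex engine may not understand .NET-specific names such as \p{IsBasicLatin}, so that path is worth testing separately.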

