Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I have cases where user-entered data from an HTML textarea or input is sometimes sent with \u00a0 (non-breaking spaces) instead of regular spaces when encoded as UTF-8 JSON.

I believe that to be a bug in Firefox, as I know that the user isn't intentionally putting in non-breaking spaces instead of spaces.

There are also two bugs in Ruby, one of which can be used to combat the other.

For whatever reason, \s doesn't match \u00a0.

However, [^[:print:]] (which definitely should not match) and \xC2\xA0 both will match, but I consider those to be less-than-ideal ways to deal with the issue.

Are there other recommendations for getting around this issue?



1 Answer

Use /\u00a0/ to match non-breaking spaces. For instance, s.gsub(/\u00a0/, ' ') converts all non-breaking spaces to regular spaces.

Use /[[:space:]]/ to match all whitespace, including Unicode whitespace like non-breaking spaces. This is unlike /\s/, which matches only ASCII whitespace.
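A minimal sketch of both approaches, assuming the input is a UTF-8 encoded String (as it would be after parsing JSON). The helper names replace_nbsp and collapse_whitespace are hypothetical and only used for illustration:

```ruby
# Hypothetical helper: replace only non-breaking spaces (U+00A0)
# with ordinary spaces, leaving all other characters untouched.
def replace_nbsp(text)
  text.gsub(/\u00a0/, ' ')
end

# Hypothetical helper: collapse any run of whitespace (ASCII or Unicode,
# via the POSIX bracket class) into one ordinary space, then trim the ends.
def collapse_whitespace(text)
  text.gsub(/[[:space:]]+/, ' ').strip
end

input = "hello\u00a0world"

# \s covers only ASCII whitespace, so it does not see the non-breaking space.
p input =~ /\s/            # => nil
# [[:space:]] is Unicode-aware for UTF-8 strings.
p input =~ /[[:space:]]/   # => 5

p replace_nbsp(input)                          # => "hello world"
p collapse_whitespace("\u00a0a \u00a0\tb\n")   # => "a b"
```

Which one to use depends on whether you want to touch only non-breaking spaces or normalize every kind of whitespace in the user's input.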

See also: Ruby Regexp documentation

