Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

How can I split a text into an array of sentences?

Example text:

Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End

Should output:

0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End

I tried some solutions that I've found on SO through search, but they all fail, especially at the 4th sentence.

/(?<=[!?.])./

/\.|\?|!/

/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]"))(\s|\n)(?="?[A-Z])/

/(?<=[.!?]|[.!?]['"])\s+/    // <- closest one

1 Answer

Since you want to "split" the text into sentences, why are you trying to match them?

For this case let's use preg_split().

Code:

$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);
print_r($sentences);

Output:

Array
(
    [0] => Fry me a Beaver.
    [1] => Fry me a Beaver!
    [2] => Fry me a Beaver?
    [3] => Fry me Beaver no. 4?!
    [4] => Fry me many Beavers...
    [5] => End
)

Explanation:

Put simply, we split on runs of whitespace (\s+), subject to two conditions:

  1. (?<=[.?!]) Positive lookbehind assertion: the whitespace must be preceded by a period, question mark, or exclamation mark.

  2. (?=[a-z]) Positive lookahead assertion: the whitespace must be followed by a letter (the i modifier makes [a-z] match uppercase letters too). This is a workaround for the "no. 4" problem, because the space before the digit 4 is then not a split point.
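
To see what each assertion contributes, here is a small sketch that runs the split with and without the lookahead. The shortened sample string and the variable names ($withoutLookahead, $withLookahead, $abbreviation) are made up for illustration, and the last call shows a case this workaround probably does not handle the way you would want.

```php
<?php
$str = 'Fry me a Beaver. Fry me Beaver no. 4?! End';

// Lookbehind only: any whitespace after . ? or ! is a split point,
// so "no. 4" gets broken apart.
$withoutLookahead = preg_split('/(?<=[.?!])\s+/', $str);

// Lookbehind plus lookahead: the whitespace must also be followed by a
// letter, so the space before "4" is no longer a split point.
$pattern = '/(?<=[.?!])\s+(?=[a-z])/i';
$withLookahead = preg_split($pattern, $str);

// Limitation: an abbreviation followed by a letter is still split, and a
// sentence that starts with a digit is not split off.
$abbreviation = preg_split($pattern, 'Mr. Smith left. 3 people stayed.');

print_r($withoutLookahead); // 4 items: "no." and "4?!" end up separated
print_r($withLookahead);    // 3 items: "Fry me Beaver no. 4?!" stays whole
print_r($abbreviation);     // 2 items: "Mr." and "Smith left. 3 people stayed."
```

So the lookahead is a heuristic that fits the example text, not a general sentence detector. If your input contains abbreviations such as "Mr." or sentences that begin with digits or quotes, you would need extra rules on top of it.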

