MATLAB: Simple string analysis - Find locations

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

Here I have an example of a piece of literature that I would like to do a simple analysis on. Notice the different sections:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

I'm interested in all the data that can be called "Random info in middle", which comes after a chapter name and before the beginning of the first verse.

I would like to use the function "extractBetween" to extract the information found between "CHAPTER #" and "1"(First Verse).

I know how to use the function "extractBetween", but how can I determine the locations just before "CHAPTER #" and just after "1"(First Verse), for any amount of Chapters?

In the end I would like a result in which the random information for each chapter is placed in a table.

I've tried regexp() and findstr(), but without success. Any help would be appreciated. Thanks!

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:23:20+0000

You can use a regular expression with regexp to match the text.

[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2)); 
    % or: fprintf('%s\t%s\n', tokens{k}); 
end

This will print:

CHAPTER 1   Random info in middle one, Random info still continues. 
CHAPTER 2   Random info in middle two. Random info still continues.
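Since the question asks for the result as a table, the tokens can probably be stacked and passed to table. The following is only a sketch: the shortened str is a stand-in for the text in the question, the column names Chapter and RandomInfo are made up for illustration, and it assumes str is a string (not a char vector) so that each cell of tokens holds a 1x2 string array.

```matlab
% Shortened stand-in for the question's text (assumption: same layout).
str = "Intro. CHAPTER 1. Middle one, continues. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. Continues. 1 Verse four? 2 Verse five.";

% Same pattern as in the answer; each match yields a 1x2 string array:
% [chapter name, text before the first verse].
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the 1x2 rows into an N-by-2 string array and trim the trailing
% space that the lazy token picks up before the "1".
rows = strip(vertcat(tokens{:}));

% Hypothetical column names, chosen only for this example.
T = table(rows(:,1), rows(:,2), 'VariableNames', {'Chapter', 'RandomInfo'});
disp(T)
```

Each row of T then holds one chapter name together with the text between that chapter heading and its first verse.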

To explain the regular expression (CHAPTER \d)\.\s*(.*?)1:

(CHAPTER \d) matches CHAPTER followed by a single digit, and the parentheses around it capture the match in the tokens variable. Use \d+ instead if chapter numbers can have more than one digit.
\. matches the period
\s* matches any whitespace, if present
(.*?)1 captures any text up to the next 1 in the text. Note the question mark, which makes the match lazy; otherwise it would match all the text up to the last 1 in str.
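If the locations themselves are wanted, so that extractBetween can be used as the question describes, regexp can also return the start and end index of each token through its 'tokenExtents' option. This is a minimal sketch rather than part of the original answer; the shortened str and the variable names ext and info are assumptions made for the example.

```matlab
% Shortened stand-in for the question's text (assumption: same layout).
str = "Intro. CHAPTER 1. Middle one. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. 1 Verse four? 2 Verse five.";

% 'tokenExtents' returns one cell per match; each cell holds a
% [start end] row for every token (here a single token per match).
ext = regexp(str, 'CHAPTER \d\.\s*(.*?)1', 'tokenExtents');

info = strings(numel(ext), 1);
for k = 1:numel(ext)
    % With numeric positions, extractBetween includes both end points.
    info(k) = extractBetween(str, ext{k}(1,1), ext{k}(1,2));
end

% The lazy token keeps the space before the "1", so trim it.
info = strip(info);
disp(info)
```

Like the original pattern, this stops at the first character 1 after the chapter heading, so it would cut the text short if the random info itself contained that digit.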
