Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have raw text from a chain of emails.

For all inquiries please reach out
From: abc@abc.com At: 01/27/21 23:29:28To: CompanyA
Cc: 123@123.com, 345@345-YYY.com Subject: this is the subject line
From: CompanyB(company) <mmm@mmm.net>
Sent: Wednesday, January 27, 2021 12:51 PM
From: 999@999.com At: 01/27/21 23:29:28To: CompanyA
Cc: 888@888.com, 777@777.com Subject: tect

Using a regex, I need to capture the email addresses between the first occurrence of From and the first occurrence of Subject. For the text above, the matches should be:
abc@abc.com
123@123.com
345@345-YYY.com

I already have ( ){0,1}([\w.]@[\w+-.]) to match email addresses. I will do the matching with Python's regex library.

Question from: https://stackoverflow.com/questions/66067870/regex-find-email-address-between-first-two-instances-of-strings


1 Answer

One option is to use two patterns with Python's re module.

First, find every match that runs from From: to the nearest following Subject:

(?s)From:.*?Subject:

Then, within each of those matches, find the strings that look like email addresses, excluding < and > from the match:

[^<>\s@]+@[^@\s<>]+
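To see why the character class excludes <, >, and whitespace, here is a minimal sketch that compares it with a looser \S+@\S+ pattern on one header line from the question. The variable names are illustrative only.

```python
import re

line = "From: CompanyB(company) <mmm@mmm.net>"

# A loose pattern keeps the angle brackets around the address.
loose = re.findall(r"\S+@\S+", line)

# Excluding <, >, whitespace and a second @ trims the match to the address.
strict = re.findall(r"[^<>\s@]+@[^@\s<>]+", line)

print(loose)
print(strict)
```

The stricter class does not exclude commas, so an address followed directly by a comma will keep that comma in the match.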

Example

import re

# Sample text from the question; each header line ends with a newline.
s = ("For all inquiries please reach out\n"
     "From: abc@abc.com At: 01/27/21 23:29:28To: CompanyA\n"
     "Cc: 123@123.com, 345@345-YYY.com Subject: this is the subject line\n"
     "From: CompanyB(company) <mmm@mmm.net>\n"
     "Sent: Wednesday, January 27, 2021 12:51 PM\n"
     "From: 999@999.com At: 01/27/21 23:29:28To: CompanyA\n"
     "Cc: 888@888.com, 777@777.com Subject: tect")

# (?s) lets . match newlines, so each match can span several lines.
for match in re.findall(r"(?s)From:.*?Subject:", s):
    print(re.findall(r"[^<>\s@]+@[^@\s<>]+", match))

Output

['abc@abc.com', '123@123.com,', '345@345-YYY.com']
['mmm@mmm.net', '999@999.com', '888@888.com,', '777@777.com']
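
The output above still carries the trailing commas from the Cc line, and it covers every From: ... Subject: block, while the question asks only for the first one. A possible adjustment, offered as a sketch rather than as part of the original answer: use re.search to take only the first block, and add a comma to both negated character classes. The helper name first_block_emails is hypothetical.

```python
import re

# Sample text taken from the question.
s = (
    "For all inquiries please reach out\n"
    "From: abc@abc.com At: 01/27/21 23:29:28To: CompanyA\n"
    "Cc: 123@123.com, 345@345-YYY.com Subject: this is the subject line\n"
    "From: CompanyB(company) <mmm@mmm.net>\n"
    "Sent: Wednesday, January 27, 2021 12:51 PM\n"
    "From: 999@999.com At: 01/27/21 23:29:28To: CompanyA\n"
    "Cc: 888@888.com, 777@777.com Subject: tect"
)


def first_block_emails(text):
    """Return addresses between the first 'From:' and the first 'Subject:'."""
    # re.search stops at the first match, unlike re.findall.
    block = re.search(r"(?s)From:.*?Subject:", text)
    if block is None:
        return []
    # Adding ',' to both negated classes keeps list separators out of a match.
    return re.findall(r"[^<>\s@,]+@[^@\s<>,]+", block.group())


emails = first_block_emails(s)
print(emails)
```

For the sample text this yields the three addresses listed in the question, without commas.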

If you don't want a match to cross another occurrence of From: or Subject:, you can use a negative lookahead that checks, for each intermediate line, that it does not start with From and does not contain Subject:.

^From:.*(?:\r?\n(?!From|.*Subject:).*)*\r?\n.*Subject:
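
The pattern is dense, so it may help to read it in re.VERBOSE form with one comment per part. This is only an annotated sketch of the same expression. The name block_pattern and the short sample text are made up for illustration.

```python
import re

# The same expression, split up with re.VERBOSE so each part has a comment.
block_pattern = re.compile(
    r"""
    ^From:.*                 # a line that starts with 'From:'
    (?:                      # then zero or more whole lines that
        \r?\n
        (?!From|.*Subject:)  # neither start with 'From' nor contain 'Subject:'
        .*
    )*
    \r?\n.*Subject:          # finally a line that contains 'Subject:'
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical input: the first 'From:' is followed by another 'From:'
# before any 'Subject:', so only the second block can match.
text = (
    "From: a@a.com\n"
    "Sent: Monday\n"
    "From: b@b.com\n"
    "Cc: c@c.com Subject: hello"
)

matches = block_pattern.findall(text)
print(matches)
```

Because the repeated group refuses any line starting with From, a block that runs into a second From: before reaching Subject: is dropped and matching restarts at that second From:.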

Example

# Reuses re and s from the first example.
for match in re.findall(r"(?m)^From:.*(?:\r?\n(?!From|.*Subject:).*)*\r?\n.*Subject:", s):
    print(re.findall(r"[^<>\s@]+@[^@\s<>]+", match))

Output

['abc@abc.com', '123@123.com,', '345@345-YYY.com']
['999@999.com', '888@888.com,', '777@777.com']
