Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm trying to extract publication years from ISI-style data exported from the Thomson-Reuters Web of Science. The line for "Publication Year" looks like this (at the very beginning of a line):

PY 2015

For the script I'm writing I have defined the following regex function:

import re

with open('savedrecs.txt') as f:
    wosrecords = f.read()

def findyears():
    # Unanchored: matches "PY dddd" anywhere in the text
    result = re.findall(r'PY (\d\d\d\d)', wosrecords)
    print(result)

findyears()

This, however, gives false positive results because the pattern may appear elsewhere in the data.

So I want to match the pattern only at the beginning of a line. Normally I would use ^ for this purpose, but r'^PY (\d\d\d\d)' fails to match anything. On the other hand, using \n seems to do what I want, but that might lead to further complications for me.

1 Answer

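re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE)

should work. By default ^ matches only at the very start of the string; with re.MULTILINE it also matches immediately after each newline.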
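To see the difference the flag makes, here is a minimal sketch. The sample string is made up for illustration and stands in for the contents of savedrecs.txt; the name find_years is likewise hypothetical. It compares the unanchored pattern, the anchored pattern without the flag, and the anchored pattern with re.MULTILINE.

```python
import re

# Hypothetical sample standing in for the contents of savedrecs.txt.
sample = (
    "TI A STUDY OF COPY 1999 MACHINES\n"
    "PY 2015\n"
    "AB Notes mention COPY 2001 in passing\n"
    "PY 2013\n"
)

def find_years(text):
    # re.MULTILINE makes ^ match at the start of every line,
    # not only at the start of the whole string.
    return re.findall(r'^PY (\d{4})', text, flags=re.MULTILINE)

# Unanchored: also picks up "PY 1999" and "PY 2001" inside "COPY ...".
unanchored = re.findall(r'PY (\d{4})', sample)

# Anchored, no flag: ^ only matches at position 0, where the text is "TI ...".
no_flag = re.findall(r'^PY (\d{4})', sample)

years = find_years(sample)

print(unanchored)
print(no_flag)
print(years)
```

On this sample only the last call returns just the two publication years. The same flag can also be written inline as (?m) at the start of the pattern if passing flags is inconvenient.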

