Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am trying to scrape this website using BeautifulSoup and regex. While doing so, I came across a question that contains double quotes, and I want to remove them before saving the text to a .txt file. However, the double quotes are not being replaced. I tried the .replace() method, but it did not work. The code is as follows:

import re

import requests
from bs4 import BeautifulSoup as bs  # alias assumed from the bs(...) call below

url = 'http://www.sanfoundry.com/operating-system-mcqs-process-scheduling-queue/'
r = requests.get(url)
soup = bs(r.content)
data = soup.find_all('div', {'class':'entry-content'})
data1 = data[0].text
pattern = r'^\d{1,2}[.|)]([\s|\S].*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
#pattern = r'^\d{1,2}[.|)]\s*(.*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
reg = re.compile(pattern)
#with open(r'C:\Users\dhvani\Google Drive\Python\Data Scraping\yb.txt', 'a') as f:
with open(r'C:\Users\Jeri_Dabba\Google Drive\Python\Data Scraping\yb.txt', 'a') as f:

    for i in data1.split('\n'):
        m = reg.search(i)  # None when the line does not match the pattern
        if m and m.group(1):
            y = m.group(1)
            y = y.replace('"', '')
            f.write(y + "\n")

When I checked the .txt file, the double quotes had not been replaced. What might be the problem?

I am new to Python.


1 Answer

The website uses characters that are not the ordinary ASCII double quote character (", U+0022).

Instead, it uses the Unicode left and right double quotation marks, U+201C (“) and U+201D (”), which .replace('"', '') does not match.
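
One way to confirm which characters are actually present is to list the non-ASCII characters in a scraped line together with their code points. The following is a minimal sketch; the helper name describe_non_ascii and the sample string are hypothetical and stand in for a line taken from the page.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# Hypothetical sample line; in the question's code this would be a matched line.
sample = "What is a \u201cready queue\u201d?"
report = describe_non_ascii(sample)
for row in report:
    print(row)
```

If the report shows LEFT DOUBLE QUOTATION MARK and RIGHT DOUBLE QUOTATION MARK rather than QUOTATION MARK, that explains why replacing '"' alone has no visible effect.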

You can replace these:

y = y.replace('"', '')
y = y.replace('“', '')
y = y.replace('”', '')
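
The three replace calls above can also be collapsed into a single pass with str.translate. This is only a sketch of an alternative, not part of the original answer; the names QUOTES_TO_STRIP and strip_double_quotes are made up for illustration.

```python
# Map each quote character's code point to None so that str.translate deletes it.
# Covers the ASCII quote (U+0022) and the curly quotes (U+201C, U+201D).
QUOTES_TO_STRIP = dict.fromkeys(map(ord, '"\u201c\u201d'))


def strip_double_quotes(text):
    """Remove straight and curly double quotes from text."""
    return text.translate(QUOTES_TO_STRIP)


cleaned = strip_double_quotes('The \u201cready queue\u201d holds "runnable" processes')
print(cleaned)
```

In the question's loop this would replace the y.replace(...) lines with y = strip_double_quotes(y). When the output contains other non-ASCII characters, passing an explicit encoding such as encoding='utf-8' to open() may also be worth considering.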
