Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm trying to extract the email address of each restaurant on TripAdvisor.

I've tried this, but it keeps returning an empty list ([]):

response.xpath('//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--89flT6"]')

The relevant snippet from the TripAdvisor page is below:

<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"><span><a href="mailto:info@canopylounge.my?subject=?"><span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span><span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span><span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span></a></span></div>

1 Answer

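First, there is a mistake in the class name: your XPath ends in "contactItem--89flT6", but the page has "contactItem--1flT6".

Second, the class is on the <div>, but the href attribute is on the <a>. The <a> is not a direct child of the <div> (there is a <span> in between), so you need the descendant axis (//):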

'//*[@class="..."]//a/@href'

(I've omitted the class name here because it is too long to display.)
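To illustrate the difference between matching a direct child and matching a descendant, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) rather than Scrapy, and the short class name "contact" is a hypothetical stand-in for the long TripAdvisor one.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the page markup; the class name "contact" is hypothetical.
html = (
    '<div class="contact">'
    '<span><a href="mailto:info@canopylounge.my?subject=?">Email</a></span>'
    '</div>'
)

div = ET.fromstring(html)

# "a" matches only direct children of the <div>; the <a> sits inside a <span>,
# so nothing is found.
direct = [a.get("href") for a in div.findall("a")]

# ".//a" matches <a> elements at any depth below the <div>.
nested = [a.get("href") for a in div.findall(".//a")]

print(direct)
print(nested)
```

The first list comes back empty and the second contains the mailto link, which is the same effect as using / versus // after the class match.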


Instead of matching such a long class name, you can try:

'//a[contains(@href, "mailto")]/@href'

I tested both XPath expressions using lxml:

text = '''<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6">
<span><a href="mailto:info@canopylounge.my?subject=?">
<span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span>
<span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span>
<span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span>
</a></span>
</div>'''

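import lxml.html

# Parse the HTML snippet into an element tree
doc = lxml.html.fromstring(text)

# 1) Match the <div> by its full class name, then take any <a> below it
print(doc.xpath('//*[@class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]//a/@href'))

# 2) Shorter: match any <a> whose href contains "mailto"
print(doc.xpath('//a[contains(@href, "mailto")]/@href'))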
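
Both expressions return the whole href value, which in the snippet above still carries the mailto: scheme and a ?subject=? query string. If only the address is wanted, one possible way to strip the rest is sketched below using the standard library. The helper name email_from_mailto is hypothetical, not part of lxml or Scrapy.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href):
    """Return the address part of a mailto: link, or None if href is not one."""
    parts = urlsplit(href)
    if parts.scheme != "mailto":
        return None
    # The path holds the address; anything after "?" (subject, body) is the query.
    return unquote(parts.path) or None


# Example input: the href value taken from the snippet above.
hrefs = ["mailto:info@canopylounge.my?subject=?"]
emails = [email_from_mailto(h) for h in hrefs]
print(emails)
```

The same helper could be applied to each value returned by the XPath query. It assumes each link carries a single address.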
