Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Can anyone help me get this form selection correct?

When I try to crawl Google, I get the error: mechanize._mechanize.FormNotFoundError: no form matching name 'q'

This is unusual, since I have seen several other tutorials use it.

P.S. I don't plan to slam Google with requests; I just want an automatic selector to take the effort out of finding PDFs of academic citations from time to time.

Here is the form as listed in the interactive console:

<f GET http://www.google.com.tw/search application/x-www-form-urlencoded
  <HiddenControl(ie=Big5) (readonly)>
  <HiddenControl(hl=zh-TW) (readonly)>
  <HiddenControl(source=hp) (readonly)>
  <TextControl(q=)>
  <SubmitControl(btnG=Google 搜尋) (readonly)>
  <SubmitControl(btnI=好手氣) (readonly)>
  <HiddenControl(gbv=1) (readonly)>>
>>> quit()




import os, subprocess
import re
import mechanize
from bs4 import BeautifulSoup
#prepare mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_handle_equiv(False)
br.addheaders = [('User-agent', 'Mozilla/5.0')] 
br.open('http://www.google.com/')
br.select_form('q')
citation = ' www.stackoverflow.com '.strip() 
#citation = GOOGLE_BASE + Citation
print citation
br.open('http://www.google.com/')
br.select_form('q')
br.form['q'] = citation
br.submit()
data = br.read()
soup = BeautifulSoup(data)
print soup


1 Answer

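You are trying to select a form named q, which does not exist: q is the name of the text control inside the form (TextControl(q=) in your listing), not the name of the form itself. Judging from that listing, the form is named f. (However, I was unable to verify this in my browser: even with JavaScript disabled, the form showed a different name.)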
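
To see the difference between a form's name and a control's name without going through mechanize, here is a minimal Python 3 sketch using only the standard library's html.parser. It lists each form together with the names of its controls. The HTML snippet is a hypothetical reconstruction based on the form listing in the question, not Google's actual markup, and FormLister is an illustrative class, not part of mechanize.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect (form name, [control names]) pairs from an HTML document."""

    CONTROL_TAGS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.forms = []        # list of (form_name, control_names)
        self._current = None   # control names of the form currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = []
            self.forms.append((attrs.get("name"), self._current))
        elif tag in self.CONTROL_TAGS and self._current is not None:
            if attrs.get("name"):
                self._current.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical markup modeled on the form listing in the question.
HTML = """
<form name="f" action="http://www.google.com.tw/search" method="GET">
  <input type="hidden" name="ie" value="Big5">
  <input type="hidden" name="hl" value="zh-TW">
  <input type="hidden" name="source" value="hp">
  <input type="text" name="q">
  <input type="submit" name="btnG">
  <input type="submit" name="btnI">
  <input type="hidden" name="gbv" value="1">
</form>
"""

lister = FormLister()
lister.feed(HTML)
for name, controls in lister.forms:
    print(name, controls)
# -> f ['ie', 'hl', 'source', 'q', 'btnG', 'btnI', 'gbv']
```

The output makes the point directly: f is the only form name, and q only appears among its controls, which is why select_form('q') finds nothing.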

A simple Google search can be done like this:

import mechanize
from bs4 import BeautifulSoup

# prepare mechanize
br = mechanize.Browser()
br.set_handle_robots(False)   # do not fetch or obey robots.txt
br.set_handle_equiv(False)    # ignore <meta http-equiv> headers
br.addheaders = [('User-agent', 'Mozilla/5.0')]
br.open('http://www.google.com/')

# do the query
br.select_form(name='f')   # Note: select the FORM named 'f', not the control 'q'
br.form['q'] = 'here goes your query'   # fill in the text control 'q'
response = br.submit()

# parse and output
soup = BeautifulSoup(response.read(), 'html.parser')
print(soup)

This should give you the idea.
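
To make the mechanics a bit more concrete: the listing in the question shows a GET form with application/x-www-form-urlencoded encoding, so submitting it amounts to URL-encoding the control values into the query string of the action URL. The Python 3 sketch below rebuilds such a URL by hand with urllib.parse.urlencode, assuming the hidden values from that listing. It leaves out the submit button, which mechanize typically includes when it clicks the form, so treat it as an illustration rather than an exact copy of what br.submit() sends.

```python
from urllib.parse import urlencode

# Action URL and hidden values taken from the question's form listing;
# the submit buttons (btnG, btnI) are deliberately left out here.
action = "http://www.google.com.tw/search"
fields = [
    ("ie", "Big5"),
    ("hl", "zh-TW"),
    ("source", "hp"),
    ("q", "here goes your query"),
    ("gbv", "1"),
]

# A GET form sends its fields URL-encoded in the query string of the action URL.
url = action + "?" + urlencode(fields)
print(url)
# -> http://www.google.com.tw/search?ie=Big5&hl=zh-TW&source=hp&q=here+goes+your+query&gbv=1
```

Note that urlencode turns spaces into + signs, the usual encoding for form data.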

Update: How to find the right form 'selector'

To print the names of the available forms, you can do:

for form in br.forms():
    print(form.name)

This comes in handy when you use the interactive console.

You are not limited to selecting a form by its name; you can give other hints to pick the right one. For example, on some pages the forms have no name at all. You can then still select a form by its position, e.g. br.select_form(nr=1) for the second form on the page (nr is zero-based). See help(br.select_form) for details. Also, list(br.forms()) gives you a list of all forms, which you can inspect further.
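
The selection hints described above can be modeled offline. The sketch below is a hypothetical Python 3 helper, pick_form, that works on (name, control names) pairs; it is not mechanize's implementation, only an illustration of how a name, a zero-based nr, and a predicate narrow the choice. The two forms in the list are made up, the first one deliberately unnamed. With mechanize itself, a predicate is passed as br.select_form(predicate=...) and receives a form object, whose controls you could check, for example, through form.controls.

```python
# Two hypothetical forms: an unnamed one, then a Google-style form "f".
forms = [
    (None, ["lang"]),
    ("f", ["ie", "hl", "source", "q", "btnG", "btnI", "gbv"]),
]


def pick_form(forms, name=None, nr=0, predicate=None):
    """Return the nr-th (zero-based) form matching name and predicate.

    Hypothetical helper: it only loosely mirrors the hints that
    mechanize's select_form accepts and is not mechanize's own code.
    """
    matches = [
        form for form in forms
        if (name is None or form[0] == name)
        and (predicate is None or predicate(form))
    ]
    if nr >= len(matches):
        raise LookupError("no form matching the given hints")
    return matches[nr]


def has_q(form):
    """Predicate: the form contains a control named 'q'."""
    return "q" in form[1]


print(pick_form(forms, name="f")[0])          # by name        -> f
print(pick_form(forms, nr=1)[0])              # second form    -> f
print(pick_form(forms, predicate=has_q)[0])   # by content     -> f
try:
    pick_form(forms, name="q")                # the mistake from the question
except LookupError as exc:
    print(exc)                                # -> no form matching the given hints
```

Selecting by content (the form that contains q) is often more robust than relying on a name that the site may change.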

Another option would be to inspect the page by hand in your usual browser.


...