python - Scrapy and Selenium error: Element not found in the cache - perhaps the page has changed since it was looked up

- August 15, 2010

I want to extract data from Amazon.
Source code:

    from scrapy.contrib.spiders import CrawlSpider
    from scrapy import Selector
    from selenium import webdriver
    from selenium.webdriver.support.select import Select
    from time import sleep
    import selenium.webdriver.support.ui as ui
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy.http import HtmlResponse, TextResponse
    from extraction.items import ProduitItem

    class RunnerSpider(CrawlSpider):
        name = 'products'
        allowed_domains = ['amazon.com']
        start_urls = ['http://www.amazon.com']

        def __init__(self):
            self.driver = webdriver.Firefox()

        def parse(self, response):
            items = []
            sel = Selector(response)
            self.driver.get(response.url)
            # Search for "a" on the Amazon home page
            recherche = self.driver.find_element_by_xpath('//*[@id="twotabsearchtextbox"]')
            recherche.send_keys("a")
            recherche.submit()
            resultat = self.driver.find_element_by_xpath('//ul[@id="s-results-list-atf"]')
            resultas = resultat.find_elements_by_xpath('//li')
            for result in resultas:
                item = ProduitItem()
                lien = result.find_element_by_xpath('//div[@class="s-item-container"]/div/div/div[2]/div[1]/a')
                lien.click()  # navigates to the product page
                #lien.implicitly_wait(2)
                res = self.driver.find_element_by_xpath('//h1[@id="aiv-content-title"]')
                item['titre'] = res.text
                item['image'] = lien.find_element_by_xpath('//div[@id="dv-dp-left-content"]/div[1]/div/div/img').get_attribute('src')
                items.append(item)
            self.driver.close()
            yield items

When I run the code, I get this error:

    Element not found in the cache - perhaps the page has changed since it was looked up
    Stacktrace:

If you tell Selenium to click on a link, the browser moves from the original page to the page behind the link.

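In your case, you have a search result page with the URLs of products on Amazon. When you click one of the links in the result list, you are taken to the product detail page. At that point the page has changed, and the rest of the elements you want to iterate over in your for loop are no longer there; that's why you get the exception.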
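If you do need data from each detail page, a common way around the error is to copy plain values (such as each link's href) out of the result list before navigating, and then load each URL with `driver.get`. The sketch assumes Selenium 4, where the `find_element_by_xpath` helpers used in the question were replaced by `find_element("xpath", ...)` / `find_elements("xpath", ...)` (`"xpath"` is the value of `By.XPATH`). The function names and XPath arguments are hypothetical placeholders, not taken from the question.

```python
# Sketch (assumes a Selenium 4 WebDriver): avoid stale element references by
# copying plain strings out of the result list *before* leaving the page.


def collect_links(driver, link_xpath):
    """Return the href of every element matching link_xpath on the current page."""
    # get_attribute returns a str, which stays valid after navigation,
    # unlike the WebElement it was read from.
    return [el.get_attribute("href") for el in driver.find_elements("xpath", link_xpath)]


def scrape_detail_pages(driver, link_xpath, title_xpath):
    """Visit each collected URL and read the title text from its detail page."""
    urls = collect_links(driver, link_xpath)  # done while the result page is still loaded
    titles = []
    for url in urls:
        driver.get(url)  # navigating invalidates every WebElement found earlier
        titles.append(driver.find_element("xpath", title_xpath).text)
    return titles
```

Because the loop iterates over strings rather than elements from the old page, navigating inside the loop no longer breaks it.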

Why don't you use the search result page to extract the title and the image? Both are there; you just need to change the XPath expressions to get the right fields of lien.

Update

To get the title from the search result page, extract the text of the h2 element inside the a element you want to click.

To get the image, you need to take the other div in the li element: in your XPath you select div[2], but you need to select div[1] to get the image.
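The extraction described above can be sketched with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset (child paths, `//`, `[position]`, `[@attr='value']`). The markup is a hypothetical, heavily simplified stand-in for the result list (real Amazon HTML differs, changes often, and is not parseable as XML), and the container class and div positions are assumptions modelled on the question's XPath. Each lookup is relative to the current li. The same matters in Selenium: an XPath passed to an element's find method that starts with `//` still searches the whole document, so it should start with `.//` to stay inside that element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each <li> holds a container whose first
# <div> has the image and whose second <div> has the link wrapping an <h2>.
SAMPLE = """
<ul id="s-results-list-atf">
  <li>
    <div class="s-item-container">
      <div><img src="https://example.com/a.jpg"/></div>
      <div><a href="/dp/A"><h2>Product A</h2></a></div>
    </div>
  </li>
  <li>
    <div class="s-item-container">
      <div><img src="https://example.com/b.jpg"/></div>
      <div><a href="/dp/B"><h2>Product B</h2></a></div>
    </div>
  </li>
</ul>
"""

CONTAINER = "div[@class='s-item-container']"


def extract_results(markup):
    """Return (title, image URL) for every <li> in the result list, without navigating."""
    root = ET.fromstring(markup)
    results = []
    for li in root.findall("li"):
        # Paths are relative to the current <li>, so each item reads its own fields.
        h2 = li.find(CONTAINER + "/div[2]//a/h2")  # title: the h2 inside the link
        img = li.find(CONTAINER + "/div[1]//img")  # image: in div[1], not div[2]
        if h2 is None or img is None:
            continue  # skip entries that do not match the expected shape
        results.append(("".join(h2.itertext()).strip(), img.get("src")))
    return results


print(extract_results(SAMPLE))
# [('Product A', 'https://example.com/a.jpg'), ('Product B', 'https://example.com/b.jpg')]
```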

If you open the search result page in a browser and look at the source with the developer tools, you can see which XPath expressions to use for those elements.
