object has no attribute "get"

Question

import fake_useragent
import requests
from bs4 import BeautifulSoup
from time import sleep
import json

ua = fake_useragent.UserAgent()
headers = {
    "User-Agent": ua.random
}
# project = []
for count in range(1, 50):
    sleep(3)
    url = f'https://zeto.ua/category/matritsa/asus/{count}.html'
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    data = soup.find_all('div', class_='col-lg-five col-md-3 col-sm-4 col-xs-6')

    for i in data:
        title = i.find('div', class_='product-title').text.replace('\n', ' ')
        link = 'https://zeto.ua' + i.find('a', class_="product-thumb-link").get('href')
        price = i.find('div', class_="product-price").find('span').text
        data_list = {
            "Model": title,
            "Price": price,
            "Links": link
        }
        print(data_list)
        # project.append(data_list)

        # with open('data_lst.json', 'w', encoding='utf-8') as json_file:
        #     json.dump(data_list, json_file, indent=4, ensure_ascii=False)

Error:

Terminal output:
{'Model': '   Матрица для ноутбука Asus K53SD   ', 'Price': '2197 грн', 'Links': 'https://zeto.ua/product/matrica-dlya-asus-k53sd/185163.html'}
{'Model': '   Матрица для ноутбука Asus Eee PC 1008HA   ', 'Price': '1055 грн', 'Links': 'https://zeto.ua/product/matrica-dlya-asus-eee-pc-1008ha/184642.html'}
    link = 'https://zeto.ua' + i.find('a', class_="product-thumb-link").get('href')
AttributeError: 'NoneType' object has no attribute 'get'

The parser starts and everything works fine at first, but then I get the error: 'NoneType' object has no attribute 'get'
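A minimal offline sketch of what goes wrong, using hypothetical, simplified markup in which the second card has a `<span>` where the first has an `<a>`: `find()` returns `None` when nothing matches, and calling `.get()` on `None` raises exactly this `AttributeError`. It uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the second card has a <span> instead of an <a>.
html = """
<div class="card"><a class="product-thumb-link" href="/product/a.html">A</a></div>
<div class="card"><span class="product-thumb-link">B</span></div>
"""
soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed

hrefs = []
errors = []
for card in soup.find_all("div", class_="card"):
    # find() returns None when there is no matching <a> in this card
    link_tag = card.find("a", class_="product-thumb-link")
    try:
        hrefs.append(link_tag.get("href"))
    except AttributeError as exc:
        # calling .get() on None: the same error as in the question
        errors.append(str(exc))

print(hrefs)   # ['/product/a.html']
print(errors)  # ["'NoneType' object has no attribute 'get'"]
```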

Answer 1

When parsing pages you cannot predict that every element you need will actually have the attribute you are looking for, so you simply need to use a try/except block:

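    # Replaces the inner loop of the question's code;
    # `count` is the page number from the outer loop.
    for elem, i in enumerate(data):
        try:
            title = i.find('div', class_='product-title').text.replace('\n', ' ')
            link = 'https://zeto.ua' + i.find('a', class_="product-thumb-link").get('href')
            price = i.find('div', class_="product-price").find('span').text
            data_list = {
                "element": (count, elem),  # (page number, index of the card on the page)
                "Model": title,
                "Price": price,
                "Links": link
            }
        except AttributeError:
            # one of the find() calls returned None
            print("i.find() == NoneType")
        except Exception as e:
            # any other unexpected error; a bare except would also swallow Ctrl+C
            print("Something else went wrong:", e)
        else:
            print(data_list)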

I added page and element numbers to the printed output, which shows where the error occurs. The first one:

{'element': (3, 23), 'Model': '   Матрица для ноутбука Asus Eee PC 1008HA   ', 'Price': '1055 грн', 'Links': 'https://zeto.ua/product/matrica-dlya-asus-eee-pc-1008ha/184642.html'}
i.find() == NoneType
{'element': (3, 25), 'Model': '   Матрица для ноутбука Asus ZenBook UX430   ', 'Price': '3813 грн', 'Links': 'https://zeto.ua/product/matrica-asus-zenbook-ux430/620881.html'}

Looking at that card (page 3, element 24, the one between the two printed entries), you can see that instead of a link it contains a different element:

<span onclick="if (!window.__cfRLUnblockHandlers) return false; window.location.href='/product/matrica-asus-x751ldv/466064.html'" class="product-thumb-link cursor-pointer" data-cf-modified-121ce7e759cbd7f31e64489a-="">

You can either skip such elements or handle them specially, if you need them.
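If you want to keep such products, one way to handle them specially is sketched below as a hypothetical helper `get_product_link` (not part of either answer). It assumes the two markups seen here: a regular `<a class="product-thumb-link" href="...">`, or a `<span class="product-thumb-link">` whose `onclick` contains `window.location.href='...'`. The regular expression is tied to that exact handler text, so treat it as fragile. The sample cards reuse URLs from the output and the `<span>` above.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://zeto.ua"
# Assumption: the onclick handler always contains window.location.href='...'
ONCLICK_URL = re.compile(r"window\.location\.href='([^']+)'")


def get_product_link(card):
    """Return an absolute product URL for a card, or None if none is found."""
    # Normal case: <a class="product-thumb-link" href="...">
    anchor = card.find("a", class_="product-thumb-link")
    if anchor is not None and anchor.get("href"):
        return urljoin(BASE_URL, anchor["href"])
    # Special case: <span class="product-thumb-link ..." onclick="...window.location.href='...'">
    span = card.find("span", class_="product-thumb-link")
    if span is not None:
        match = ONCLICK_URL.search(span.get("onclick", ""))
        if match:
            return urljoin(BASE_URL, match.group(1))
    return None  # neither form present: the caller can skip this card


html = """
<div class="card"><a class="product-thumb-link" href="/product/matrica-dlya-asus-k53sd/185163.html">A</a></div>
<div class="card"><span onclick="if (!window.__cfRLUnblockHandlers) return false; window.location.href='/product/matrica-asus-x751ldv/466064.html'" class="product-thumb-link cursor-pointer">B</span></div>
<div class="card">no link at all</div>
"""
soup = BeautifulSoup(html, "html.parser")
links = [get_product_link(card) for card in soup.find_all("div", class_="card")]
print(links)
```

In the question's loop, `link = get_product_link(i)` followed by `if link is None: continue` would replace the line that raises the error.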

Answer 2

data = soup.find_all('div', class_="product-info")
for i in data:
    title = i.a.text.strip()                # text of the first <a> in the block, whitespace trimmed
    link = 'https://zeto.ua' + i.a['href']  # href of that same <a>
    price = i.find(attrs={'class': 'product-price'}).span.text
    data_list = {
        "Model": title,
        "Price": price,
        "Links": link
    }
    print(data_list)
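This version relies on two BeautifulSoup shortcuts: `i.a` is equivalent to `i.find('a')`, and `tag['href']` reads an attribute directly. A small offline sketch, on hypothetical markup assumed to resemble the `.product-info` blocks, shows how each behaves when the expected element or attribute is missing. This matters because this version has no try/except.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a normal block, an <a> without href, and a block without <a>.
html = """
<div class="product-info"><a href="/product/x.html">Model X</a></div>
<div class="product-info"><a>No href</a></div>
<div class="product-info"><span>No link</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
blocks = soup.find_all("div", class_="product-info")

first = blocks[0]
print(first.a == first.find("a"))  # True: .a is shorthand for .find("a")
print(first.a["href"])             # /product/x.html

print(blocks[1].a.get("href"))     # None: .get() returns None for a missing attribute
try:
    blocks[1].a["href"]            # [...] raises KeyError for a missing attribute
except KeyError as exc:
    print("KeyError:", exc)

print(blocks[2].a)                 # None: no <a> inside this block, so .a["href"] would fail
```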

BLOG ON HUSL
