Parsing Wikipedia

Question

I'm pulling data from Wikipedia, but my code got stuck in a loop and never exits. How do I break out of it?

from lxml import html
import requests
from bs4 import BeautifulSoup
url = 'https://ru.wikipedia.org/wiki/Категория:Животные_по_алфавиту'
an_list = []

while True:
    soup = BeautifulSoup(url, 'lxml')
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    animals = soup.select('#mw-pages li')
    for i in animals:
        an_list.append(i.text)
    animals2 = soup.find('div', id='mw-pages').find_all('a')
    for a in animals2:
        if a.text == 'Следующая страница':
            url = 'https://ru.wikipedia.org/' + a.get('href')
            page = requests.get(url).text
            break

print(an_list)

Answer 1

The loop in the question never ends because break only leaves the inner for loop; while True itself has no exit condition. On the last page there is no 'Следующая страница' ("Next page") link, so url stays the same and the same page is fetched again and again. A version that stops on the last page:

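import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://ru.wikipedia.org/wiki/Категория:Животные_по_алфавиту'
an_list = []

while True:
    page = requests.get(url)

    soup = BeautifulSoup(page.text, "html.parser")
    animals = soup.select('#mw-pages li')
    # extend (not append) keeps an_list a flat list of names
    an_list.extend(i.text for i in animals)

    # The last link in #mw-pages is the "next page" link, if there is one
    a = soup.find('div', id='mw-pages').find_all('a')[-1]
    if a.text != 'Следующая страница':
        break  # no "next page" link: this was the last page
    # href is site-relative, so resolve it against the current URL
    url = urljoin(url, a.get('href'))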

Extended version: https://ru.stackoverflow.com/a/1430299/470333
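
The stopping rule in the code above (keep following the "next page" link until there isn't one) can be separated from the scraping itself, which makes it easy to check offline. Below is a minimal sketch under assumptions: collect_all, load_page, max_pages and FAKE_PAGES are hypothetical names made up for illustration, not part of requests or BeautifulSoup. A real load_page would presumably wrap the requests.get and BeautifulSoup calls from the answer and return None when no 'Следующая страница' link is found.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical contract: a page loader returns the items found on a page
# and the URL of the next page (None when there is no next page).
PageLoader = Callable[[str], Tuple[List[str], Optional[str]]]


def collect_all(start_url: str, load_page: PageLoader, max_pages: int = 1000) -> List[str]:
    """Follow "next page" links from start_url and return all items in order."""
    items: List[str] = []
    seen = set()  # URLs already visited, guards against link cycles
    url: Optional[str] = start_url
    # Stop when there is no next link, when a URL repeats,
    # or when the safety cap on the number of pages is reached.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_items, url = load_page(url)
        items.extend(page_items)
    return items


# Offline stand-in for the real site: three fake pages chained by "next" links.
FAKE_PAGES = {
    "page1": (["Aardvark", "Albatross"], "page2"),
    "page2": (["Badger"], "page3"),
    "page3": (["Cheetah"], None),
}

result = collect_all("page1", lambda u: FAKE_PAGES[u])
print(result)  # ['Aardvark', 'Albatross', 'Badger', 'Cheetah']
```

The seen set and max_pages cap are extra safeguards beyond what the answer does: even if a page keeps pointing at itself, as happened in the question, the loop still terminates.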
