Need help with a parser

Question

How do I implement this correctly? When I run it, I get the error AttributeError: 'NoneType' object has no attribute 'find_all'

import requests
from bs4 import BeautifulSoup
url = 'https://www.sslproxies.org'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id': 'proxylisttable'})
rows = table.find_all('tr')
for row in rows:
    tds = row.find_all('td')
    if len(tds) > 0:
        ip = tds[0].text.strip()
        port = tds[1].text.strip()
        print(f'{ip}:{port}')
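soup.find returns None when nothing matches, so the script fails one line later, at table.find_all. A minimal sketch of failing with a clearer message instead. It assumes a small hand-written HTML fragment in place of the live page, and the helper name find_table_or_fail is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded page:
# the table exists, but it has no id="proxylisttable".
HTML = """
<div class="fpl-list">
  <table class="table">
    <thead><tr><th>IP Address</th><th>Port</th></tr></thead>
    <tbody><tr><td>198.44.162.8</td><td>45787</td></tr></tbody>
  </table>
</div>
"""


def find_table_or_fail(soup, table_id):
    """Return the <table> with the given id, or raise a readable error."""
    table = soup.find('table', {'id': table_id})
    if table is None:
        # find() returns None when no element matches; calling
        # .find_all() on None is what raises the AttributeError.
        raise LookupError(f'no <table id="{table_id}"> on the page')
    return table


soup = BeautifulSoup(HTML, 'html.parser')
try:
    find_table_or_fail(soup, 'proxylisttable')
except LookupError as exc:
    print(exc)
```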

Answer 1

from bs4 import BeautifulSoup as Soup
import requests
from bs4 import Tag
import json

response = requests.get('https://www.sslproxies.org')
soup = Soup(response.content, 'html.parser')

# Map column index -> normalized header name,
# e.g. 0 -> 'ip_address', 1 -> 'port'
headers = dict(
    enumerate(
        map(
            lambda x: x.get_text(strip=True).lower().replace(' ', '_'),
            soup.select('div.fpl-list thead th')
        )
    )
)


def line_to_dict(val: Tag):
    # Yield (header name, cell text) pairs for one table row;
    # the := operator requires Python 3.8+
    for i in range(len(td := val.find_all('td'))):
        yield headers.get(i), td[i].get_text(strip=True)


# Turn every body row into a dict and save the whole list as JSON
with open('result.json', 'w', encoding='utf-8') as f:
    json.dump(
        list(
            dict(line_to_dict(item))
            for item in soup.select('div.fpl-list tbody tr')
        ),
        f,
        ensure_ascii=False,
        indent=2
    )

Contents of result.json:

[
  {
    "ip_address": "198.44.162.8",
    "port": "45787",
    "code": "JP",
    "country": "Japan",
    "anonymity": "elite proxy",
    "google": "no",
    "https": "yes",
    "last_checked": "1 min ago"
  },
  {
    "ip_address": "200.105.215.22",
    "port": "33630",
    "code": "BO",
    "country": "Bolivia",
    "anonymity": "elite proxy",
    "google": "no",
    "https": "yes",
    "last_checked": "1 min ago"
  },
...
  {
    "ip_address": "213.241.205.2",
    "port": "3129",
    "code": "RU",
    "country": "Russian Federation",
    "anonymity": "anonymous",
    "google": "no",
    "https": "yes",
    "last_checked": "1 min ago"
  }
]
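Answer 1 saves every column, while the question only needed ip:port. A minimal sketch of getting that format from the saved data, assuming the ip_address and port keys shown above. To stay self-contained it parses an inline sample built from two of the records above instead of opening result.json:

```python
import json

# Two records taken from the result.json sample above; with the real file:
# with open('result.json', encoding='utf-8') as f:
#     records = json.load(f)
SAMPLE = """
[
  {"ip_address": "198.44.162.8", "port": "45787", "https": "yes"},
  {"ip_address": "200.105.215.22", "port": "33630", "https": "yes"}
]
"""

records = json.loads(SAMPLE)

# Build "ip:port" strings, the format the question originally printed.
proxies = [f"{r['ip_address']}:{r['port']}" for r in records]
for proxy in proxies:
    print(proxy)
```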

Answer 2

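As Сергей Ш correctly pointed out, searching by proxylisttable finds nothing: there is no element with that id in the dev console either, so it's unclear where you got it from. Because soup.find returns None when nothing matches, the next line, table.find_all, raises the AttributeError.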
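When a selector matches nothing, it helps to list which tables the downloaded HTML actually contains before choosing one. A minimal sketch, assuming a hypothetical fragment in place of response.text; on the real page you would pass the downloaded HTML instead:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text.
HTML = """
<div class="fpl-list">
  <table class="table">
    <thead><tr><th>IP Address</th><th>Port</th></tr></thead>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# Collect the id, the classes and the first header cell of every <table>
# to see which selector will actually match.
summary = []
for table in soup.find_all('table'):
    first_th = table.find('th')
    summary.append((
        table.get('id'),     # None when the table has no id
        table.get('class'),  # bs4 returns class as a list of strings
        first_th.get_text(strip=True) if first_th else None,
    ))
for item in summary:
    print(item)
```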

Anyway, since the site has only a few tables, the simplest option is to use pandas for this task. The code is just three lines, and you get a DataFrame that you can filter, sort and so on, and also save to Excel, CSV or any other format.

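from io import StringIO

import pandas as pd
import requests

pd.set_option('display.expand_frame_repr', False)  # print wide frames on one line

r = requests.get('https://www.sslproxies.org')
# read_html (needs lxml, or bs4 + html5lib) returns one DataFrame per <table>.
# pandas 2.1+ deprecates passing literal HTML, so wrap it in StringIO.
tables = pd.read_html(StringIO(r.text))
print(tables[0])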

Output:

         IP Address   Port Code             Country    Anonymity Google Https         Last Checked
0     190.61.88.147   8080   GT           Guatemala    anonymous    yes   yes            1 min ago
1     8.219.176.202   8080   SG           Singapore  elite proxy     no   yes            1 min ago
2      178.33.3.163   8080   FR              France  elite proxy    yes   yes            1 min ago
3    201.229.250.21   8080   DO  Dominican Republic  elite proxy     no   yes            1 min ago
4    198.59.191.234   8080   US       United States  elite proxy     no   yes            1 min ago
..              ...    ...  ...                 ...          ...    ...   ...                  ...
95   196.202.210.73  32650   KE               Kenya    anonymous     no   yes    5 hours 1 min ago
96     190.152.5.17  39888   EC             Ecuador  elite proxy     no   yes  5 hours 10 mins ago
97   36.137.106.110   7890   CN               China    anonymous    NaN   yes  5 hours 10 mins ago
98  183.221.242.111   8443   CN               China    anonymous     no   yes  5 hours 21 mins ago
99   36.138.114.102   7890   CN               China    anonymous     no   yes  5 hours 21 mins ago
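The DataFrame returned by read_html can be filtered, sorted and saved directly. A minimal sketch, assuming the column names shown in the output above; to stay offline it builds a small DataFrame from three of the rows shown instead of downloading the page:

```python
import pandas as pd

# Three rows copied from the output above, standing in for tables[0].
df = pd.DataFrame({
    'IP Address': ['190.61.88.147', '8.219.176.202', '196.202.210.73'],
    'Port': [8080, 8080, 32650],
    'Code': ['GT', 'SG', 'KE'],
    'Anonymity': ['anonymous', 'elite proxy', 'anonymous'],
    'Https': ['yes', 'yes', 'yes'],
})

# Keep only elite proxies that support HTTPS.
elite = df[(df['Anonymity'] == 'elite proxy') & (df['Https'] == 'yes')]

# Sort all rows by port (ascending), then by country code.
by_port = df.sort_values(['Port', 'Code'])

# With no path, to_csv returns the CSV as a string; pass a file name to
# write a file instead. to_excel works similarly but needs e.g. openpyxl.
csv_text = elite.to_csv(index=False)
print(elite)
```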

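And to get what you originally wanted, instead of print(f'{ip}:{port}') use:

# Take the first two columns (IP Address, Port) and unpack each row.
# Indexing a row positionally as row[0] is deprecated in pandas 2.1+.
for ip, port in tables[0].iloc[:, :2].itertuples(index=False):
    print(f'{ip}:{port}')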

Output:

190.61.88.147:8080
8.219.176.202:8080
178.33.3.163:8080
…
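Instead of iterating row by row, the two columns can also be joined in one vectorized step. A minimal sketch, assuming the IP Address and Port column names from the output above, with a small hand-built DataFrame in place of tables[0]:

```python
import pandas as pd

# Stand-in for tables[0], using rows from the output above.
df = pd.DataFrame({
    'IP Address': ['190.61.88.147', '8.219.176.202', '178.33.3.163'],
    'Port': [8080, 8080, 8080],
})

# Port is usually parsed as a number, so convert it to str before joining.
proxies = df['IP Address'] + ':' + df['Port'].astype(str)
print('\n'.join(proxies))
```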
