[Python] StopGame parser


You choose a category and the number of pages to parse.
P.S.: only the Amazing (Изумительно) category is parsed correctly; in the other categories the short description is often extracted incorrectly.
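In the script, the menu number is looked up in a dictionary of URL slugs, and listing pages are requested as `https://stopgame.ru/review/new/<slug>/p1` up to `p<N>`. Below is a minimal sketch of that mapping that validates the input before any request is sent. The function name `category_page_urls` is hypothetical, and `max_page` is assumed to be read from the pagination on the first listing page, as the script does.

```python
# Menu numbers mapped to the URL slugs the script uses for each category
THEMES = {'1': 'stopchoice', '2': 'izumitelno',
          '3': 'pohvalno', '4': 'prohodnjak', '5': 'musor'}

BASE_URL = 'https://stopgame.ru/review/new'


def category_page_urls(choice, pages, max_page):
    """Return listing-page URLs p1..p{pages} for the chosen category.

    Hypothetical helper. Raises ValueError for an unknown menu number or
    for a page count outside 1..max_page (the same range the script checks).
    """
    slug = THEMES.get(choice)
    if slug is None:
        raise ValueError(f'Unknown category number: {choice!r}')
    if not 0 < pages <= max_page:
        raise ValueError(f'Page count must be between 1 and {max_page}')
    # Pages are numbered from 1, matching the p1, p2, ... suffixes
    return [f'{BASE_URL}/{slug}/p{n}' for n in range(1, pages + 1)]
```

For example, `category_page_urls('2', 3, max_page=10)` returns the URLs of pages p1, p2 and p3 of the `izumitelno` category.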

Python code:

import requests
from bs4 import BeautifulSoup
import time
import json

headers = {
    # you can use your own User-Agent string
    "User-Agent": "Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11"
}

def get_links(how_much, chose_theme):
    # Collect review links from listing pages p1..p{how_much} of the chosen category
    global counter

    if 0 < how_much <= max_page:

        while counter < how_much:
            time.sleep(1)  # pause between requests to avoid overloading the site
            counter += 1

            response_link = f'https://stopgame.ru/review/new/{theme.get(chose_theme)}/p{counter}'
            response = requests.get(url=response_link, headers=headers)
            response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
            soup = BeautifulSoup(response.text, 'lxml')
            items_links = soup.find_all('a', class_='article-image image')

            for item in items_links:
                all_items_links.append(
                    f"https://stopgame.ru{item.get('href')}")

    else:
        print(f'Invalid number of pages: enter a number from 1 to {max_page}')

def parse_mode():
    # Open every collected review page and extract the title, short description and game specs
    for game_page in all_items_links:
        time.sleep(1)

        response = requests.get(url=game_page, headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'lxml')

        game_title = soup.find('h1', class_='article-title').find('a').string
        try:
            game_short_desc = soup.find(
                'section', class_='article article-show').find('p').text
        except AttributeError:
            # the description block or its first paragraph was not found
            game_short_desc = ''
        game_specs = soup.find_all('div', class_='game-spec')
        game_data = {}

        for game_spec in game_specs:
            label = game_spec.find('span', class_='label').string
            # remove every line break, tab and carriage return from the value
            value = game_spec.find('span', class_='value').text
            for char in '\n\t\r':
                value = value.replace(char, '')
            game_data[label] = value

        big_data.append(
            {
                'info': game_data,
                'title': game_title,
                'Short desc': game_short_desc
            }
        )

        print(f'Added {game_title}')

what = ('1) Our choice (Наш выбор)\n'
        '2) Amazing (Изумительно)\n'
        '3) Praiseworthy (Похвально)\n'
        '4) Passable (Проходняк)\n'
        '5) Trash (Мусор)')

chose_theme = input(f'{what}\nYour choice (number): ')

theme = {'1': 'stopchoice', '2': 'izumitelno',
         '3': 'pohvalno', '4': 'prohodnjak', '5': 'musor'}

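if chose_theme not in theme:
    raise SystemExit('Unknown category number')

main_link = f'https://stopgame.ru/review/new/{theme.get(chose_theme)}/p1'
response = requests.get(url=main_link, headers=headers)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
pages_link = soup.find_all('a', class_='item')
all_items_links = []

# assumes the last pagination link holds the number of the last page
max_page = int(pages_link[-1].text)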
counter = 0
big_data = []

how_much = int(
    input(f' Total pages in this category: {max_page}\n How many pages to parse: '))

get_links(how_much=how_much, chose_theme=chose_theme)
parse_mode()

# 'w' overwrites the file; appending a second JSON array with 'a' would make the file invalid
with open(f'{theme.get(chose_theme)}.json', 'w', encoding='utf-8') as file:
    json.dump(big_data, file, indent=4, ensure_ascii=False)
    print('Saved to file!')
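The results are written with `json.dump`. If the output file is opened in append mode (`'a'`) and the script is run twice for the same category, a second top-level array is written after the first, and `json.load` can no longer read the file (it fails with an "Extra data" error). Below is a minimal sketch, assuming you want to keep results from earlier runs rather than overwrite them: load the existing list, extend it, and rewrite the whole file. The helper name `save_reviews` is hypothetical.

```python
import json
import os


def save_reviews(path, new_items):
    """Merge new_items into the JSON list stored at path and rewrite the file.

    Hypothetical helper. Returns the total number of stored items.
    """
    existing = []
    if os.path.exists(path):
        with open(path, encoding='utf-8') as file:
            existing = json.load(file)
    existing.extend(new_items)
    # Write to a temporary file first and then swap it in, so an
    # interrupted write does not destroy the previously saved results
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w', encoding='utf-8') as file:
        json.dump(existing, file, indent=4, ensure_ascii=False)
    os.replace(tmp_path, path)
    return len(existing)
```

It could replace the final `with open(...)` block, e.g. `save_reviews(f'{theme.get(chose_theme)}.json', big_data)`. Repeated runs over the same pages will store the same reviews again; filtering new items on `'title'` before extending is one way to avoid duplicates.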
