Documentation
Manage Your Proxies: Review, Edit and Monitor Your Connections
Authentication
  • API Key Authentication
Proxy Operations
  • Change Proxy IP
  • List Available Locations
  • Change Proxy Location
  • Change Proxy IPv4 Rotation
  • Change Proxy IPv4 Whitelist
  • Append Proxy IPv4 Whitelist
  • Get IP Auth info
Integration
  • iOS
  • macOS
  • Android
  • Windows
  • FoxyProxy
  • SwitchyOmega
Anti-Detect Browsers
  • Chrome
  • Firefox
  • Brave
  • AdsPower
  • Multilogin
  • GoLogin
  • Bit Browser
  • Ghost
  • Sphere
  • Clone Browser
  • Octo
  • Incogniton
  • Dolphin
  • AntBrowser
  • VMLogin
  • HideMyAcc
Other
  • Accepted Payment Methods
  • IP Blocking
  • IP Whitelisting Authentication
  • IP Rotating
  • Scrapy with ProxyPanel
  • Buying with Cryptocurrency
  • Selenium with ProxyPanel
  • Urllib3 with ProxyPanel
  • Requests with ProxyPanel
  • Playwright with ProxyPanel
  • HTTPX with ProxyPanel
  • Beautiful Soup with ProxyPanel

Introduction

BeautifulSoup — An Overview

Beautiful Soup is a powerful Python library designed for extracting data from HTML and XML files. It simplifies the process of web scraping by providing intuitive methods for navigating, searching, and modifying the parse tree of web documents. With Beautiful Soup, you can effortlessly extract and manipulate data from complex web pages, making it an essential tool for data analysts, developers, and anyone involved in data extraction tasks. By working seamlessly with various parsers, Beautiful Soup helps streamline the process of converting raw HTML or XML into structured, usable data.
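As a quick illustration of the API before we connect it to live pages, here is a minimal, self-contained sketch that parses a small inline HTML snippet (the snippet and its class names are invented for this example):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet standing in for a real page
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item">Laptop</li>
    <li class="item">Mouse</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Navigate the parse tree by tag name, and search by tag + class
print(soup.h1.get_text())  # Products
items = [li.get_text() for li in soup.find_all('li', class_='item')]
print(items)  # ['Laptop', 'Mouse']
```

The same two calls — attribute-style navigation (`soup.h1`) and `find_all` with a class filter — are all we will need for the Amazon example later in this guide.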

Prerequisites

  1. Install Python from the official website.

  2. Check if Python is installed correctly using the following command:

    python --version

    The output should be in the following format:

    Python 3.11.2

    If you receive an error or see a version number starting with 2.x, download Python 3.x and follow the installation instructions to set it up.

  3. Install the requests package:

    pip install requests
  4. Install Beautiful Soup:

    pip install beautifulsoup4

    This command installs the beautifulsoup4 package, which provides the Beautiful Soup library for parsing HTML and XML documents. The examples in this guide use Python's built-in html.parser, so no additional parser package is required.
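To confirm that both packages installed correctly, you can run a quick import check (just a sanity test, nothing more):

```python
# Both imports should succeed without errors if installation worked
import requests
import bs4

print('requests', requests.__version__)
print('beautifulsoup4', bs4.__version__)
```

If either import raises `ModuleNotFoundError`, re-run the corresponding `pip install` command for the Python interpreter you are using.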

Configuring the Requests Package with ProxyPanel

We will use ProxyPanel to set up the `requests` package for parsing Amazon items with Beautiful Soup. Begin by retrieving your proxy information from the ProxyPanel dashboard.

import requests

# URL to scrape
url = 'https://www.amazon.com/s?k=laptops'  # Replace with the desired website URL

# Proxy configuration with login and password
proxy_host = '154.128.31.247'
proxy_port = 8083
proxy_login = 'john.MyProxy-1'
proxy_password = 'proxypanel!123'
proxy = f'http://{proxy_login}:{proxy_password}@{proxy_host}:{proxy_port}'

# Headers to mimic a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

proxies = {
    'http': proxy,
    'https': proxy
}

# Send a GET request using the proxy
response = requests.get(url, proxies=proxies, headers=headers, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Process the response content
    print(response.text)
else:
    print('Request failed with status code:', response.status_code)

Make sure to replace the placeholders with your actual proxy details to complete the setup. This ensures that `requests` routes all traffic through the specified proxy.

If the output contains the page’s HTML, CSS, and JavaScript source, you have successfully fetched the Amazon results page through ProxyPanel proxies. Next, let’s proceed with parsing the HTML content using Beautiful Soup.
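Amazon often answers automated traffic with 429 or 503 responses, so a small retry helper can smooth over transient failures. The sketch below is an assumption-laden convenience, not part of ProxyPanel or `requests` itself; the name `get_with_retries` and its parameters are introduced here for illustration:

```python
import time
import requests

def get_with_retries(url, proxies, headers, attempts=3, backoff=2.0):
    """Retry transient failures (connection errors, 429/503) with a growing delay."""
    for attempt in range(attempts):
        try:
            resp = requests.get(url, proxies=proxies, headers=headers, timeout=10)
            # Treat rate-limit and service-unavailable responses as retryable
            if resp.status_code not in (429, 503):
                return resp
        except requests.RequestException:
            # Connection errors, timeouts, proxy errors: retry
            pass
        time.sleep(backoff * (attempt + 1))
    return None  # All attempts failed
```

You could then replace the bare `requests.get(...)` call in the script above with `get_with_retries(url, proxies, headers)` and check for `None` before reading `response.status_code`.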

Parsing Content with Beautiful Soup

After scraping the content, the next step is to parse it using Beautiful Soup. This package specializes in parsing HTML and XML content, making it easy to extract and manipulate the data you need.

Each laptop title is contained within a `<span>` element with the classes `a-size-medium a-color-base a-text-normal`. You can confirm these classes by inspecting the page with the browser’s Developer Tools (press F12). Our task is to extract all `<span>` elements with these classes to retrieve the titles.

The following code will guide you through parsing all these <span> elements using Beautiful Soup, allowing you to efficiently extract and work with the data.

import requests
from bs4 import BeautifulSoup

# URL to scrape
url = 'https://www.amazon.com/s?k=laptops'  # Replace with the desired website URL

# Proxy configuration with login and password
proxy_host = '154.128.31.247'
proxy_port = 8083
proxy_login = 'john.MyProxy-1'
proxy_password = 'proxypanel!123'
proxy = f'http://{proxy_login}:{proxy_password}@{proxy_host}:{proxy_port}'

# Headers to mimic a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

# Proxy settings
proxies = {
    'http': proxy,
    'https': proxy
}

# Send a GET request using the proxy
response = requests.get(url, proxies=proxies, headers=headers, timeout=10)

if response.status_code == 200:
    # Process the response content with Beautiful Soup
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all elements with the specified class
    titles = soup.find_all('span', class_='a-size-medium a-color-base a-text-normal')
    
    # Print each title
    for title in titles:
        print(title.get_text())
else:
    print('Request failed with status code:', response.status_code)
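One subtlety worth knowing: when `class_` is given a space-separated string, Beautiful Soup matches the class attribute verbatim, so the match is order-sensitive. A CSS selector via `soup.select` matches the same classes in any order, which is often more robust against markup changes. A small self-contained sketch (the inline snippet is invented for illustration):

```python
from bs4 import BeautifulSoup

# Inline snippet standing in for the scraped Amazon HTML;
# the second span lists the same classes in a different order.
html = (
    '<span class="a-size-medium a-color-base a-text-normal">Laptop A</span>'
    '<span class="a-color-base a-size-medium a-text-normal">Laptop B</span>'
)
soup = BeautifulSoup(html, 'html.parser')

# Verbatim string match: only the first span's attribute matches exactly
exact = [s.get_text() for s in
         soup.find_all('span', class_='a-size-medium a-color-base a-text-normal')]

# CSS selector: matches all three classes regardless of order
css = [s.get_text() for s in
       soup.select('span.a-size-medium.a-color-base.a-text-normal')]

print(exact)  # ['Laptop A']
print(css)    # ['Laptop A', 'Laptop B']
```

If Amazon reorders or appends classes, the `find_all` string match silently returns nothing, while the selector keeps working — a reason to prefer `soup.select` for multi-class lookups.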

Conclusion: Extracting and Utilizing Laptop Titles

The final output is a list of laptop titles extracted from the webpage. This demonstrates the effectiveness of Beautiful Soup in parsing and retrieving specific data from HTML content. With this approach, you can efficiently gather and utilize the information you need. Thank you for following along with this guide!