Website scraping, price analysis, competitor monitoring in Python worth 17569 hours

May 27, 2025

Amid fierce competition in the consumer market, a shop's pricing strategy depends on the offers of multiple retailers. The price of a product can change in real time, several times a day. Every seller wants to attract buyers by running promotions and lowering prices on popular categories of merchandise. However, if competitors don't offer similar products, the seller can raise the price. Commodity experts must analyse prices and the assortment of goods in the market to maintain their reputation and retain regular customers. It is extremely difficult to adjust pricing in a timely manner without computer algorithms, especially when your warehouses are stocked with thousands of products.

We can develop parsers in Python, Java, PHP or C# for .NET.

The service solves the following tasks:

  • Economic intelligence
  • Competitor intelligence
  • Competitor price comparison
  • Monitoring MSRP and MAP for brands
  • Shop pricing
  • Marketplace analytics service
  • Monitoring on-shelf availability

How does it work?

Day and night, the marketplace analytics service scans several hundred online stores that sell electronics, home appliances, children's and sports goods, building supplies, books, clothing, jewellery, and many other items. Website parsing is connected to the largest platforms, managing more than ten million products, such as Amazon, eBay, and AliExpress. Over 700 parsers have been developed in Python. Our clients include well-known European companies.

Replenishing the database, updating prices, assortments, and stock balances is automated through repetitive ‘parsing’ (or scraping) of websites – the so-called information ‘donors’.

The client's products are compared to the competitor's products in a semi-automatic mode.

  • The texts of names and descriptions are analysed considering the morphology of the language.
  • Exact product matches are made with the help of ‘assessors’ – operators who select the correct pair of products from the list of relevance-sorted variants offered by the system. Both the client's employees and the service's workers can act as assessors.
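The relevance-sorted list offered to assessors can be sketched roughly as follows. This is a minimal illustration, not the service's actual matcher: it uses `difflib.SequenceMatcher` from the Python standard library on normalised tokens as a crude stand-in for morphological analysis, and the names `normalise` and `rank_candidates` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lowercase a title and sort its alphanumeric tokens.

    A crude stand-in for morphological analysis (no lemmatisation).
    """
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(tokens))


def rank_candidates(client_title: str, competitor_titles: list[str], limit: int = 5):
    """Return up to `limit` (score, title) pairs, most similar first."""
    target = normalise(client_title)
    scored = [
        (SequenceMatcher(None, target, normalise(title)).ratio(), title)
        for title in competitor_titles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Illustrative titles only; an assessor would confirm the final pair.
candidates = [
    "Garden hose 20 m",
    "Apple iPhone 15 128GB Black",
    "iPhone 15 Pro Max 256 GB",
]
ranked = rank_candidates("iPhone 15 128GB black Apple", candidates)
```

The score only orders the suggestions; the decision on whether a pair is identical or similar stays with the assessor.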

The results of monitoring product prices are:

  • Lists of changed prices for similar or identical goods
  • Information on the availability of similar or identical goods
  • Full exports of the product catalogue
  • Partial exports based on selected criteria

The monitoring data is made available:

  • In your personal area
  • As periodic CSV, XLSX, JSON, or XML exports
  • As customized exports

Architecture

A client's personal area is implemented based on the Django framework in Python. The data is stored in a decentralized PostgreSQL-based storage. The cloud parsers are run in a local cloud inside Docker containers. File exchange between processes running on different servers is organized via a network repository – a shared folder over the network. Vertical and horizontal service scaling is managed through the Nginx load balancer. The parsers' code is executed on different servers as part of a limited task queue via the Celery scheduler. Chrome and Firefox browsers are launched in containers on different servers within a Selenium farm.

A recommended configuration for server equipment for a similar service on hosting is provided below. Up to 350 browsers can run on two parser farms simultaneously, with approximately 15 browsers per server in each parser farm.

01 Web application server
  • 24 CPU
  • 128 GB RAM
  • 2 TB SSD
02 Database server
  • 24 CPU
  • 128 GB RAM
  • 2 TB SSD
03 Parser server
  • 24 CPU
  • 256 GB RAM
  • 2 TB SSD
  • 8 TB HDD
04 Selenium or Selenoid farm with USB modems
  • 1 balancer, 10 servers with web browsers
  • CPU with 14–24 cores
  • RAM 12–64 GB
05 Selenium or Selenoid farm using proxies
  • 1 balancer, 10 servers
  • CPU with 14–24 cores
  • RAM 12–64 GB

Proxy classification

One way to bypass a site's protection for further scraping is to route traffic through different IP addresses. The access scheme is adjusted individually for each donor site. Some may require a large pool of IP addresses, while others may only need regular pauses. A server that accesses a site on the client's behalf from another address is called a proxy server. Unlike HTTP proxy servers, SOCKS proxies transmit all the data from the client without adding anything of their own. From the destination server's perspective, the data received from the SOCKS proxy is identical to the data the client would have sent directly, without proxying.

Protocol

According to the application protocol, we differentiate between HTTP, SOCKS4, or SOCKS5 proxies. According to the network protocol, we differentiate between IPv4 or IPv6 proxies.
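The two classification axes above can be read off a proxy URL. The sketch below assumes proxies are written as URLs with a literal IP address as the host; the function name `classify_proxy` and the example addresses (taken from documentation-reserved ranges) are illustrative.

```python
import ipaddress
from urllib.parse import urlsplit

APPLICATION_PROTOCOLS = {"http", "socks4", "socks5"}


def classify_proxy(proxy_url: str) -> tuple[str, str]:
    """Return (application protocol, network protocol) for a proxy URL.

    The host must be a literal IP address; hostnames raise ValueError.
    """
    parts = urlsplit(proxy_url)
    scheme = parts.scheme.lower()
    if scheme not in APPLICATION_PROTOCOLS:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    address = ipaddress.ip_address(parts.hostname)
    return scheme, f"IPv{address.version}"


ipv4_proxy = classify_proxy("socks5://203.0.113.7:1080")
ipv6_proxy = classify_proxy("http://[2001:db8::1]:3128")
```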

Payment methods

01 Free
02 Unlimited traffic: a proxy with unlimited traffic and a fixed subscription fee
03 Traffic package: a proxy with a subscription fee for a traffic package, with additional payment for traffic overrun

Type of server devices

01 Dedicated (private)
  • Servers in data centres
02 Mobile
  • Smartphone applications
  • USB adapters with a SIM card
  • Routers with a SIM card
  • Routers with several SIM cards or several USB ports
03 Residential
  • Internet provider's routers in blocks of flats
  • Legal botnets on Internet users' PCs
  • Greyware operating on computers
What we recommend

Private servers in data centres are typically in the same subnetwork, and for that reason they are more likely to be blocked by anti-bot tools. Residential proxies are the best and most expensive proxies, with the least chance of being blocked. The providers listed below accept payments in dollars or euros.

01 Dedicated (private) proxies
  • PROXY6.NET
  • FineProxy
02 Mobile proxies
  • MobileProxy.space
  • proxyzzz.com
  • SOAX
  • Bright Data
03 Residential proxies
  • SOAX
  • Bright Data

Anti-captcha

CAPTCHA stands for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’. The name is a comprehensive definition in itself, and there's nothing more to add on that front.

Captcha

Google's CAPTCHA uses algorithms to add noise to images. Neural networks have great difficulty recognizing such images and often make significant errors. This is why automatic services capable of solving complex CAPTCHAs don't exist and likely won't be developed anytime soon. All CAPTCHA-solving services rely on human resources.

Anti-Captcha.com is the most efficient service.

Parsing

Database replenishment, stock balance renewal, and price updates are conducted through repetitive parsing of online stores. Prices are updated via partial parsing of preset URLs, which are retrieved by searching for identical products in competitors' stores.

Partial parsing starts automatically according to the schedule set in the system settings. URLs leading to identical goods become irrelevant over time. To update them, we must perform a full parse, and sometimes even search for matching product pairs all over again.
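A partial parse that also notices when preset URLs have gone stale could look roughly like this. It is a sketch under stated assumptions: `TrackedUrl`, `partial_parse`, the `max_misses` threshold, and the injected `fetch_price` callable are all hypothetical, and the real network request is replaced by a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrackedUrl:
    url: str
    price: Optional[float] = None
    misses: int = 0  # consecutive partial parses that found no product


def partial_parse(
    tracked: list[TrackedUrl],
    fetch_price: Callable[[str], Optional[float]],
    max_misses: int = 3,
) -> list[TrackedUrl]:
    """Refresh prices for preset URLs and return those that look stale.

    `fetch_price` returns the current price, or None when the product
    is no longer found at that URL.
    """
    stale = []
    for item in tracked:
        price = fetch_price(item.url)
        if price is None:
            item.misses += 1
            if item.misses >= max_misses:
                stale.append(item)  # candidate for a full parse or re-matching
        else:
            item.price = price
            item.misses = 0
    return stale


# A dictionary stands in for the competitor's site in this example.
fake_site = {"https://shop.example/p/1": 19.99}
urls = [TrackedUrl("https://shop.example/p/1"), TrackedUrl("https://shop.example/p/2")]
stale = partial_parse(urls, fake_site.get, max_misses=1)
```

Counting consecutive misses, rather than reacting to the first one, keeps a temporary outage from triggering an expensive full parse.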

Full parsing of large online stores can take a long time, sometimes up to several weeks, especially if the site has complex anti-scraping protection. New clients, due to their business specifics, often need the same store parsed in full, repeatedly, and in multiple regions. To avoid overloading the server fleet, the parsing of large data sources is initiated manually.

Unprecedented acts of generosity

It turns out that shops rarely use EAN codes. They are more likely to identify products using an internal, invented code. Some sellers make the mistake of using the same SKU twice, sometimes even for completely different products. For example, because SKU ‘134-ADC’ was used twice, price monitoring data was attached to the wrong product, and the shop started selling iPhones at the price of a pizza.
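A simple guard against this kind of mix-up is to check a catalogue for SKUs that point to more than one product before matching prices. The sketch below is illustrative; `find_sku_collisions` is a hypothetical helper and the product names are made up around the SKU from the story.

```python
from collections import defaultdict


def find_sku_collisions(products):
    """Map each SKU used by more than one distinct product name to those names."""
    names_by_sku = defaultdict(set)
    for sku, name in products:
        names_by_sku[sku].add(name)
    return {sku: sorted(names) for sku, names in names_by_sku.items() if len(names) > 1}


catalogue = [
    ("134-ADC", "iPhone 15 128GB"),
    ("134-ADC", "Pizza Margherita"),
    ("200-XYZ", "USB-C cable"),
]
collisions = find_sku_collisions(catalogue)
```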

Designers of popular websites often change the site's appearance and internal structure, while e-commerce developers implement new and more advanced methods of protection against bots. These circumstances compel engineers to respond to parser failures using the following tools:

  • Scheduling of cloud parser launches
  • Parser status monitoring
  • Error monitoring

Planner

This webpage contains a list of clients whose parsers are scheduled to be launched. The planner's list includes the following data:

  • Time elapsed since the parsers' termination
  • The date and time of the next planned parser launch
  • Client's name
  • Information parsing mode
  • Parsing completion percentage, calculated based on the number of successfully completed cloud parsers
  • The percentage of data obsolescence in the client's personal area
Inspector - the parsing planner

When a planner's entry is selected, the list of parsers is displayed, including the following information:

  • The name of the shop
  • The date and time of the last launch
  • Parsing modes – full or partial
  • Completion status and existing errors

The page specifying parser actions includes the following:

  • Manual launch
  • Termination

Session log

It shows the launch history and the results of the parsers' work, as well as which cloud parsers are currently running.

The session log includes:

  • Internet shop and region
  • Parsing mode
  • Launch date and time
  • Total operating time of the parser
  • The number of products found during the current and previous sessions
  • The number of new products
  • The number of deactivated products that failed to be found
  • The number of errors and warnings

In the top right corner, the sessions can be filtered by:

  • Successful completion
  • Critical error
  • Active sessions
Inspector - parser session log

Runtime log

Clicking on the session completion time will download an archive containing the logs of the finished parser. If the session is active, the browser will open a text runtime log.

Parser messages include:

  • The time of the event
  • Module name
  • Logging level: DEBUG, INFO, WARN, ERROR
  • Message text
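A message layout with these four fields can be produced with Python's standard `logging` module. This is a sketch, not the service's configuration: the format string, the logger name `parsers.example_shop`, and the helper `make_parser_logger` are assumptions. Python names the level WARNING by default, so the example renames it to WARN to match the list above.

```python
import io
import logging

# Time of the event, module name, logging level, message text.
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"

# Python calls this level WARNING; rename it to WARN (affects the whole process).
logging.addLevelName(logging.WARNING, "WARN")


def make_parser_logger(module_name: str, stream=None) -> logging.Logger:
    """Create a logger that writes formatted parser messages to `stream`."""
    logger = logging.getLogger(module_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()
log = make_parser_logger("parsers.example_shop", buffer)
log.warning("price block not found, retrying")
line = buffer.getvalue()
```

Calling `make_parser_logger` twice with the same name would attach two handlers, so a real setup would configure each logger once at start-up.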

The runtime log window header includes:

  • Parser name
  • Parser operation time
  • Number of products found
  • Number of errors
  • Termination button
Realtime log

Selenium farm

Some donor sites are built using reactive JavaScript frameworks, such as Angular, Vue.js, and React. Programmers are working to upgrade their e-commerce sites, which were originally built using older web technologies.

The content of a reactive web page is generated by JavaScript code directly in the user's browser and is not available in the source HTML code. Parsing dynamic websites by running HTTP requests is either highly limited or completely impossible.
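The difference is easy to see by looking at what a plain HTTP client receives. The sketch below uses only `html.parser` from the standard library and two made-up page sources: one rendered on the server and one whose content exists only inside a script. The class `TextCollector` and the helper `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a user would see if no JavaScript were executed."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# Made-up page sources for illustration.
static_page = "<html><body><h1>Kettle</h1><span class='price'>49.90</span></body></html>"
reactive_page = (
    "<html><body><div id='app'></div>"
    "<script>render({title: 'Kettle', price: 49.90})</script></body></html>"
)
static_text = visible_text(static_page)
reactive_text = visible_text(reactive_page)
```

For the static page the name and price are in the markup; for the reactive page the markup contains no visible text at all until a browser runs the script.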

Chrome or Firefox browsers, with browser automation technology, can be used for scraping dynamic websites. Automation allows bypassing anti-bot protection by mimicking the actions of a human user. The browsers are launched in Docker containers across multiple servers in a Selenium farm. Instead of loading HTML pages, the parser code sends commands to the browser via the Selenium API.

Selenium-farm
An example of a mobile proxy farm

Selenoid, which is written in Go, offers a web interface with a list of running VNC sessions, allowing users to view the browser window in real time.

Parsing from a browser is several times slower than classic scraping. Selenium parsing typically operates in multiple processes, with one parser interacting with several copies of a browser launched in different containers.

Example of parsing
Detailed parsing window

The main purpose of both Selenium and Selenoid is automated software testing. Advanced protection mechanisms can detect Selenium.

In addition to Selenium, there are Puppeteer from Google and Playwright from Microsoft, which interact with the browser being tested via WebSocket and the DevTools protocol. On one hand, Puppeteer and Playwright are less prone to detection. On the other hand, they cannot execute actions prohibited by the JavaScript security policy.

If a website checks the browser fingerprint, we use a customized browser, Dolphin{anty}, to bypass its protection.

Modem farm
An example of a mobile ‘backstreet’ proxy-farm

Personal area

The main page of the client's personal area includes a list of shops, featuring:

  • Shop name and logo
  • Total number of products
  • Average product price
  • Date and time of the last parsing of web pages
Personal account - a list of sites connected to monitoring

Product catalogue

It displays a list of products retrieved by parsing the client's and competitors' shops, together with the parsed product images. The catalogue can be viewed in table mode or grid mode. In table mode, extended parameters are shown:

  • Shop name
  • City
  • SKU
  • Product name
  • Availability indicator
  • Discounted price
  • Full price
  • Discount amount in percentage
  • Date of price and availability update

In grid mode, you can view the product image and a short list of parameters:

  • Product name
  • Product image
  • Discounted price
  • Full price (without discount)
  • Brand
  • Date of price and availability update
Personal account - product catalog

Full-text search and product filtering are available by:

  • Shop name
  • City
  • Product category
  • Brand
  • Availability for sale
  • Price (from and up to)
  • Discount amount in percentage terms (from and up to)
  • Date of monitoring

The product card page contains full information about the product:

  • Shop name
  • Product category
  • Product image
  • SKU
  • Title
  • Description
  • Properties
  • Discounted price
  • Price without discount
  • Availability
  • Date of price and availability update
  • Graph of price behaviour throughout the entire monitoring period
Personal account - product card

Competitor price comparison

Price change monitoring in shops is conducted through regular price parsing and comparing the client's products to similar or identical products in competitors' shops. Semi-automated product comparison is carried out first through morphological text analysis, and then by operators who select the correct product pair from several variants suggested by the system, sorted by relevance.

A price comparison table is formed using the monitoring results:

  • Information on the client's product (rows):
    • SKU
    • Title
    • Type of product pair – identical or similar
  • Product prices (columns):
    • Name and location (city) of the shop
    • Minimum price of all similar products found, including discounts

The degree of price difference between the client's price and the competitors' prices is visualized through the background colour:

  • A higher price is highlighted with red background tones
  • A lower price is highlighted with green background tones

The rock-bottom price

When comparing prices, stock manager Jake didn't pay attention to how halogen bulbs were sold on the market: some sellers sold them individually, while others sold them in packs of six. After comparing the prices, Jake's shop offered a ‘generous’ discount and started selling bulbs at a loss.
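One way to avoid Jake's mistake is to compare prices per unit rather than per listing. The sketch below extracts a pack size from the product title; the regular expressions, the function names, and the example titles are illustrative assumptions, and real titles would need a broader set of patterns.

```python
import re

# Illustrative patterns only: "pack of 6", "6 pcs", "6 pieces", "6 pack".
PACK_PATTERNS = [
    re.compile(r"\bpack of (\d+)\b", re.IGNORECASE),
    re.compile(r"\b(\d+)\s*(?:pcs|pieces|pack)\b", re.IGNORECASE),
]


def pack_size(title: str) -> int:
    """Return the number of items in the listing, defaulting to 1."""
    for pattern in PACK_PATTERNS:
        match = pattern.search(title)
        if match:
            return max(int(match.group(1)), 1)
    return 1


def unit_price(title: str, price: float) -> float:
    """Price of a single item, so that packs and singles are comparable."""
    return price / pack_size(title)


single = unit_price("Halogen bulb 42W", 2.5)
six_pack = unit_price("Halogen bulb 42W, pack of 6", 12.0)
```

Here the pack of six works out cheaper per bulb than the single one, even though its listing price is higher.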

We have developed the ability for full-text search and product filtering by:

  • Shop name
  • Product category
  • City
  • Brand
  • Availability
  • The ratio of the client's price to the competitor's price
  • Type of product pair – identical or similar

The price comparison chart, based on the filters and the search query, can be exported in CSV or XLSX formats. The background colour fill of the cells in the web version of the table is retained in the XLSX exported file.

Personal account - price comparison with competitor's products

The product card includes detailed information about the price comparison results.

  • Information about the client's product:
    • Product image
    • SKU
    • Name
    • Description
    • Discounted price
    • Price without discount
    • Availability
    • Date and time of monitoring
  • Information about identical and similar products:
    • Shop name
    • City
    • SKU
    • Product name
    • Type of product pair – identical or similar
    • Discounted price
    • Price without discount
    • Availability
    • Date and time of monitoring
  • Graph:
    • Client's price comparison with competitors' prices throughout the entire monitoring period

Bar chart

It visualizes the results of comparing the number of the client's products with higher and lower prices in relation to similar products. Along the X-axis, the number of goods with higher or lower prices compared to similar products is plotted. The list of online shops is plotted on the Y-axis. The red part of the bar represents the number of the client's products with a higher price than that of the competitors, while the green part represents those with a lower price.

Product selection can be sorted by:

  • Product category
  • Availability
  • Type of product pair – identical or similar
  • Brand
Personal account - quantitative comparison of products with a higher and lower price

Pie chart

It visualizes the results of comparing the number of the client's products with similar products in the price difference segment. Each sector of the pie chart represents the number of products in that segment. For example, the segment ‘min–50%’ shows the number of products with a price that is at least 50% lower than the minimum price of a similar product from all price quotes.

The selection can be sorted by:

  • Product category
  • Availability
  • Type of product pair – identical or similar
  • Brand
Personal account - the quantitative ratio of goods in the price range

Box plot

It visualizes, in percentage terms, the results of comparing the client's prices with the prices of similar products from competitors. Along the X-axis, the price difference in percentage terms is plotted, while the list of online shops is plotted on the Y-axis. The price difference is represented in the form of Japanese candlesticks, where the shadows (or wicks) show the minimum and maximum price difference, the body of the candlestick spans the lower and upper quartiles, and the bold line inside the body represents the median.
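The five values drawn for each shop can be computed with `statistics.quantiles` from the standard library. This is a sketch: the function name `box_plot_stats`, the choice of the inclusive quantile method, and the sample differences are assumptions, not the service's documented calculation.

```python
import statistics


def box_plot_stats(differences_pct):
    """Return min, quartiles, median, and max of price differences (in %)."""
    data = sorted(differences_pct)
    if len(data) < 2:
        raise ValueError("need at least two price differences")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Made-up differences between the client's and one competitor's prices, in percent.
stats = box_plot_stats([30, -10, 10, 0, 20])
```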

It is possible to filter the selection by:

  • Product category
  • Availability
  • Type of product pair – identical or similar
  • Brand
Personal account - price range comparison

Line chart

First, average product prices are determined in each product category for each shop. Then, the relationship between the average prices of products in competitors' shops and the client's shop is expressed as a percentage. The chart displays the changes in prices in competitors' shops over time.
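The two steps above can be sketched for a single monitoring date as follows; repeating the calculation per date gives the points of the line. The row layout `(shop, category, price)`, the function names, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_prices(rows):
    """rows: iterable of (shop, category, price). Returns {(shop, category): mean price}."""
    grouped = defaultdict(list)
    for shop, category, price in rows:
        grouped[(shop, category)].append(price)
    return {key: mean(prices) for key, prices in grouped.items()}


def ratio_to_client_pct(averages, client_shop):
    """Express each competitor's category average as a percentage of the client's."""
    ratios = {}
    for (shop, category), avg in averages.items():
        client_avg = averages.get((client_shop, category))
        if shop == client_shop or not client_avg:
            continue  # skip the client itself and categories the client lacks
        ratios[(shop, category)] = avg * 100 / client_avg
    return ratios


rows = [
    ("client", "kettles", 50.0),
    ("client", "kettles", 70.0),
    ("rival", "kettles", 45.0),
    ("rival", "kettles", 63.0),
]
ratios = ratio_to_client_pct(average_prices(rows), client_shop="client")
```

In this example the rival's average kettle price is 90% of the client's.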

The selection can be filtered by:

  • Product category
  • Type of product pair – identical or similar
  • Brand
Personal account - comparative analysis of price dynamics for the period

Product range comparison

The comparison of the product range in competitors' shops is conducted based on product classification and the results of assigning attributes to each product by assessors. In the ‘Range of Goods’ section, you can find the quantity of products for sale and their average price, broken down by product categories in competitors' shops. A stock manager analyses the product range in the table and on a bubble chart, both displayed on the same page.

Table view includes:

  • Title of product category
  • Columns with the names of the shops
  • The average price and the number of goods for sale in the chosen category (in the table cell)

Filtering of comparative analysis results is available by:

  • Shop name
  • Product category in the classification
  • City
  • Brand
  • Availability
  • Presence of a discount
  • Product price – from and up to

The bubble chart provides an opportunity to quantitatively evaluate the difference in the client's product range in a given category compared to their competitors. A list of shops is plotted horizontally, and a list of product classification categories is plotted vertically. The number of goods in the shops is represented by circles of varying diameters, depending on the quantity of goods. Clicking on the name of a category will open the analysis of related categories from the client's product classification.

Personal account - product range comparison

Other types of charts and graphs, displaying the results of the analysis of the product range, are available in the personal area section.

Bar chart

This chart presents the results of the comparison of the product range according to the product classification in quantitative terms. The quantity of goods is plotted along the X-axis, while the list of competitors' shops and the client's product classification categories are plotted along the Y-axis.

The selection of goods can be filtered by:

  • Product category
  • Availability of goods
  • Availability of a discount
  • Price range
  • Brand
Personal account - product range comparison

Bubble chart

This chart presents the results of the product range comparison in terms of price segments, in accordance with the client's product classification. A list of the client's and competitors' shops is plotted along the X-axis. The price range and quantity of goods in the given price segment are plotted along the Y-axis.

The selection of products can be filtered by:

  • Product category
  • Availability for sale
  • Availability of a discount
  • Price range
  • Brand
Personal account - quantitative comparison of the product range by price segments

Candlestick chart

This chart displays the results of comparing the product range in terms of the minimum, maximum, and average prices. The price range is shown along the X-axis, and the list of competitors' shops is shown along the Y-axis. The minimum, maximum, and average prices of goods are represented as candles.

The selection of products can be filtered by:

  • Product category
  • Availability of goods
  • Availability of a discount
  • Price range
  • Brand
Personal account - price range comparison

Calculator of the market value of goods

This tool allows you to set a mathematical formula for the price of each product. For example: MIN(ALL) - 300, where ALL represents an array of price offers from all competitors connected to the monitoring. The formula can use competitor names as variables, along with a necessary set of mathematical operations, such as:

  • MIN, MAX, AVG functions
  • Addition, subtraction, multiplication, division
  • Left and right parentheses

One formula can be applied to a group of several selected products. The current market price will appear in the ‘Market Price’ column.
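A formula such as MIN(ALL) - 300 can be evaluated without resorting to `eval` by walking the expression tree with Python's `ast` module and allowing only the listed operations. This is a sketch of one possible approach, not the service's implementation; the function `evaluate`, the competitor names `shop_a` and `shop_b`, and the prices are hypothetical.

```python
import ast
import operator
from statistics import mean

BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
FUNCTIONS = {"MIN": min, "MAX": max, "AVG": mean}


def evaluate(formula: str, prices: dict[str, float]) -> float:
    """Evaluate a pricing formula against competitor prices.

    `prices` maps competitor names (valid identifiers) to prices;
    ALL is the list of all of them.
    """
    variables = dict(prices, ALL=list(prices.values()))

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS
            and not node.keywords
        ):
            args = []
            for arg in node.args:
                value = visit(arg)
                # ALL expands to its individual prices.
                args.extend(value if isinstance(value, list) else [value])
            return FUNCTIONS[node.func.id](args)
        raise ValueError(f"unsupported element in formula: {ast.dump(node)}")

    return visit(ast.parse(formula, mode="eval"))


offers = {"shop_a": 12990, "shop_b": 12500}
market_price = evaluate("MIN(ALL) - 300", offers)
```

Anything outside the whitelist, such as attribute access or calls to other functions, raises `ValueError` instead of being executed.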

Data can be exported to an external system via manual file export or by publishing the data to a specific URL that the client's system can access.

Personal account - market value calculator

Reports and periodic exports

In the personal area, users can export the data presented on the graphs in CSV or XLSX file formats. The service generates periodic exports upon completion of web parsing and when various events occur. Periodic exports are sent to the client's employees via email. The service stores all compiled reports and periodic exports, which are available in the ‘Reports’ section.

Personal account - periodic uploads