Service Announcements

Crawler Update: Better Redirect Handling & Deeper Error Page Checks

Published on Feb 19, 2026

Over the past few days, I’ve been refining the crawling engine to make scan results more precise and technically reliable. This update focuses on how redirects and error pages are handled internally. Here’s what changed.

Redirects Are Now Analyzed Explicitly

If a URL was on the blacklist, it was supposed to be ignored entirely. However, when that URL returned a redirect, the crawler still followed it and processed the target.
Example:

  • URL A → listed in blacklist
  • URL A responds with 301 → redirect to URL B
  • URL B returns 404
  • The 404 from URL B was included in the scan statistics, even though the chain started at a blacklisted URL

What Changed

Blacklist checks now happen before any redirect is processed.

The updated logic:

  • If URL A is blacklisted → it is skipped entirely
  • No redirect is followed
  • No downstream URLs (B, C, …) are evaluated
  • No error codes from redirect targets enter the statistics

Redirect chains originating from excluded URLs are now completely ignored.
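
To make the ordering concrete, here is a minimal sketch of the skip-before-redirect logic in Python. The is_blacklisted() helper, the BLACKLIST entry, and the use of the requests library are illustrative assumptions; the crawler’s actual internals are not published.

    # Minimal sketch, assuming a hypothetical is_blacklisted() helper and
    # the third-party `requests` library; not the crawler's actual code.
    from urllib.parse import urljoin

    import requests

    BLACKLIST = {"https://example.com/url-a"}  # hypothetical entry

    def is_blacklisted(url: str) -> bool:
        return url in BLACKLIST

    def fetch(url: str, hops: int = 10) -> requests.Response | None:
        if hops == 0:
            return None  # guard against redirect loops

        # The blacklist check runs BEFORE any request is made, so a
        # blacklisted URL is skipped entirely and its redirect is never
        # followed.
        if is_blacklisted(url):
            return None

        # allow_redirects=False surfaces 3xx responses instead of
        # following them silently; each hop re-enters fetch() and is
        # therefore checked against the blacklist as well (an assumption
        # consistent with the behavior described above).
        response = requests.get(url, allow_redirects=False, timeout=10)
        if response.is_redirect:
            return fetch(urljoin(url, response.headers["Location"]), hops - 1)
        return response

With URL A blacklisted, fetch() returns None immediately: the 301 is never requested, URL B is never fetched, and no 404 enters the statistics.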

Error Pages Are No Longer Ignored

In the past, an HTTP error response like 404 or 500 stopped the parsing process, so the response body was never inspected. That’s technically correct behavior, but not helpful for a website scan.

Now the crawler:

  • Accepts 4xx and 5xx responses
  • Parses HTML error pages
  • Extracts and validates links on those pages
  • Detects broken assets even on 404/500 templates

Most websites return fully styled error pages containing scripts, stylesheets, images, and navigation links. These are now checked just like any other page.
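
As a rough illustration of that behavior, the sketch below reads and parses the body of a 4xx/5xx response instead of discarding it. It uses only the Python standard library; the tag and attribute selection and the helper names are simplified stand-ins for the crawler’s real link extraction.

    # Minimal sketch: parse an HTML error page instead of discarding it.
    # Standard library only; helper names are illustrative.
    from html.parser import HTMLParser
    from urllib.error import HTTPError
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects link-like references: anchors, stylesheets, scripts, images."""

        def __init__(self) -> None:
            super().__init__()
            self.links: list[str] = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag in ("a", "link") and attrs.get("href"):
                self.links.append(attrs["href"])
            elif tag in ("script", "img") and attrs.get("src"):
                self.links.append(attrs["src"])

    def links_on_page(url: str) -> list[str]:
        try:
            body = urlopen(url, timeout=10).read()
        except HTTPError as err:
            # A 4xx/5xx used to end processing here. Now the error body
            # itself is read and parsed like any other page.
            body = err.read()
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        return parser.links

Every reference collected this way, including those found on a styled 404 or 500 template, can then be validated like a link from any normal page.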