Investigator Use
Siteliner is a free website analysis tool that identifies duplicate content, broken links, and internal link structure issues within a website. For OSINT investigators, web developers, and SEO professionals, it provides automated scanning that reveals a site's composition, content patterns, and technical characteristics without requiring server access.
From an OSINT perspective, Siteliner reveals information about how a website is structured that may not be obvious from normal browsing. The duplicate content analysis identifies pages on a website that share substantial text — which may indicate automatically generated content, copied content from other sources, or template-based pages used for spam or SEO manipulation. Sites with very high duplicate content ratios may be content farms or fraudulent sites using programmatic content generation.
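To make the idea of "pages that share substantial text" concrete, the sketch below scores the overlap between two page texts using word shingles and Jaccard similarity. This is a generic illustration, not Siteliner's own algorithm, which is not described here. The function names and the shingle size of 5 words are assumptions chosen for the example.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 5) -> float:
    """Score text overlap from 0.0 (no shared shingles) to 1.0 (identical sets)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 across many page pairs would be consistent with template-based or programmatically generated content, though the threshold that counts as "high" is a judgment call for the investigator.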
The internal link analysis maps the website's navigation structure — which pages link to which others, which pages are most prominently linked from the site's navigation, and which pages are "orphaned" (not linked from anywhere else). Orphaned pages are particularly interesting for investigators: they may be test pages, forgotten content, hidden resources, or staging content left publicly accessible but not listed in normal navigation.
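The orphan check described above can be sketched as a set difference. A link-following crawl can only reach pages that something links to, so this example assumes you also hold a list of known URLs from another source (for example a sitemap or archived snapshots). The function name, the `entry_points` parameter, and the sample paths are hypothetical.

```python
def find_orphans(
    known_urls: set[str],
    links: dict[str, set[str]],
    entry_points: set[str],
) -> list[str]:
    """Return known URLs that no other page links to.

    links maps each source page to the set of pages it links to.
    entry_points (such as the home page) are excluded from the result,
    since they are reached directly rather than through a link.
    """
    linked: set[str] = set()
    for source, targets in links.items():
        # Ignore self-links: a page linking to itself is still orphaned.
        linked.update(target for target in targets if target != source)
    return sorted(known_urls - linked - entry_points)
```

For example, with known URLs `/`, `/about`, `/contact`, and `/old-promo`, where only `/about` and `/contact` are linked from the home page, the function returns `/old-promo` as a candidate for closer inspection.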
The broken links report identifies links on the site that return errors or fail to load. This provides intelligence about the site's maintenance state, whether referenced resources still exist, and whether the operator is actively managing the site.
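To re-verify a handful of reported broken links independently, a minimal sketch using only the Python standard library might look like the following. It is not how Siteliner performs its checks. The function names, the `User-Agent` string, and the three result labels are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if no response was received."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def classify(status: Optional[int]) -> str:
    """Map a status code to a coarse label."""
    if status is None:
        return "unreachable"
    if status >= 400:
        # Caveat: a 405 can mean the server rejects HEAD, not that the page is gone.
        return "broken"
    return "ok"


def check_links(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> dict[str, str]:
    """Classify each URL; fetch is injectable so the logic can be tested offline."""
    return {url: classify(fetch(url)) for url in urls}
```

Requests like these touch the target's server directly, so consider whether that contact is acceptable for the investigation before running them.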
Siteliner reports page-level statistics including word count, unique content percentage, and internal link count — useful for characterizing different sections of large websites and identifying statistical anomalies in content patterns.
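One way to look for statistical anomalies in page-level figures is a median-based outlier score. The sketch below assumes the word counts have already been copied out of a report into a dictionary keyed by page path. The function name, the 3.5 threshold, and the sample values are illustrative assumptions, not Siteliner output.

```python
from statistics import median


def flag_outliers(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return keys whose value is far from the median, using a modified z-score.

    The score is 0.6745 * |value - median| / MAD, where MAD is the median
    absolute deviation. It is less distorted by extreme values than a
    mean-based z-score.
    """
    if not values:
        return []
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return sorted(key for key, v in values.items() if v != med)
    return sorted(
        key for key, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical word counts per page.
word_counts = {"/a": 500, "/b": 520, "/c": 480, "/d": 510, "/e": 5000}
print(flag_outliers(word_counts))  # prints ['/e']
```

A flagged page is only a lead: an unusually long or short page may simply be a legal notice or a landing page.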
The free service crawls up to 250 pages per site; the paid Siteliner Premium service raises that limit. Very large sites will require sectional analysis.
For OSINT investigations, running Siteliner against a target website as part of initial reconnaissance provides a structured view of site architecture that complements other tools such as BuiltWith (technology fingerprinting), WHOIS (registration data), and the Wayback Machine (historical versions).
Document the scan date and crawled page count alongside Siteliner findings for investigation records.
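A small structured record is one way to keep the scan date and crawled page count attached to the findings. The sketch below is only a suggestion: the `ScanRecord` class, its field names, and the JSON layout are hypothetical and should be adapted to whatever case-management format is already in use.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ScanRecord:
    target: str          # site that was scanned
    tool: str            # tool that produced the findings
    scanned_at: str      # ISO 8601 timestamp, in UTC
    pages_crawled: int   # page count reported by the scan
    notes: str = ""


def make_record(
    target: str,
    pages_crawled: int,
    notes: str = "",
    scanned_at: Optional[datetime] = None,
) -> ScanRecord:
    """Build a record, defaulting the timestamp to the current UTC time."""
    when = scanned_at or datetime.now(timezone.utc)
    return ScanRecord(
        target=target,
        tool="Siteliner",
        scanned_at=when.isoformat(timespec="seconds"),
        pages_crawled=pages_crawled,
        notes=notes,
    )


def to_json(record: ScanRecord) -> str:
    """Serialize the record with stable key order for easy diffing."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Recording the page count matters because a scan that stopped at the crawl limit covers only part of a larger site.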
Before You Pivot
Record Context
Capture the target, search terms, and why this source is relevant before you leave the page.
Preserve Evidence
Archive volatile pages, save screenshots, and keep timestamps for anything that may change.
Corroborate
Treat one tool as a lead source. Confirm important findings with independent sources.
Related Tools
ArchiveBox
Web & URL OSINT
ArchiveBox is self-hosted open-source web archiving for preserving websites, social posts, and online evidence for investigations.
BuiltWith
Web & URL OSINT
Web technology information profiler tool. Find out what a website is built with.
CheckShortURL
Web & URL OSINT
CheckShortURL expands shortened URLs to reveal the final destination before clicking, supporting safe analysis of potentially malicious links.
CuteStat
Web & URL OSINT
CuteStat provides website analytics including traffic estimates, Alexa rank, server details, WHOIS data, and SEO metrics for any domain.
Down for who?
Web & URL OSINT
Down For Everyone Or Just Me confirms whether a website is globally offline or unavailable locally during OSINT investigations.
Fast OSINT Crawler
Web & URL OSINT
Photon is a fast OSINT crawler extracting URLs, emails, files, subdomains, and metadata from any target website for investigators.