
AUTOMATING WHOIS LOOKUPS: DOMAIN INTELLIGENCE AT SCALE

Published: 2026-05-24

WHAT WHOIS RETURNS IN 2025

GDPR enforcement beginning in 2018 and subsequent ICANN policy changes gutted the registrant data available in most gTLD WHOIS records. Before that, a single WHOIS query on a .com domain would return the registrant's name, organization, street address, phone number, and email. Today, the same query on most .com, .net, .org, and other ICANN-managed TLDs returns a stripped record.

What you reliably get: registrar name and IANA ID, creation date, expiration date, last-updated timestamp, name servers, and domain status codes. What is almost always redacted: registrant name, organization, email, physical address, and phone number. The privacy proxy email provided (like contact@withheldforprivacy.com) is a forwarding address managed by the registrar's privacy service, not the actual registrant contact.

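Country-code TLDs operate under their own registries and follow their own rules. Some ccTLDs (.ru and .cn among them) still expose registrant information, or expose it selectively based on the registrant type (individual vs. business). Policies differ by registry, and several European registries have also cut back public output since GDPR, so check what a given registry returns today rather than assuming. If the target owns infrastructure under a ccTLD that still publishes registrant data, the WHOIS record may contain the full registration detail that gTLDs no longer provide.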

Despite redaction, the remaining fields contain more intelligence than they appear to. A domain created 28 days before a phishing campaign, registered through a registrar known for bulk registrations, using nameservers associated with bulletproof hosting, and expiring in 1 year reads very differently from a 15-year-old domain registered through a corporate registrar account with custom authoritative nameservers. Registration metadata tells a story about intent and infrastructure even without registrant identity.
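To make the age and term comparison concrete, here is a minimal sketch that turns the three dates every record still exposes into a small profile. The function name and the 30-day and 1-year thresholds are illustrative assumptions, not established cutoffs; tune them against your own data.

```python
from datetime import datetime, timezone


def registration_profile(created: datetime, expires: datetime, observed: datetime) -> dict:
    """Summarize domain age and registration term from WHOIS/RDAP dates."""
    age_days = (observed - created).days
    term_years = round((expires - created).days / 365.25, 1)
    return {
        "age_days": age_days,
        "term_years": term_years,
        # Hypothetical thresholds, chosen only for illustration
        "newly_registered": age_days <= 30,
        "minimum_term": term_years <= 1.0,
    }


# A domain registered 28 days before it was observed in a campaign
profile = registration_profile(
    created=datetime(2025, 3, 1, tzinfo=timezone.utc),
    expires=datetime(2026, 3, 1, tzinfo=timezone.utc),
    observed=datetime(2025, 3, 29, tzinfo=timezone.utc),
)
print(profile)
```

Neither flag means much alone; they become useful when combined with registrar and nameserver context.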

RDAP: THE MODERN REPLACEMENT FOR PLAINTEXT WHOIS

WHOIS is a plaintext TCP protocol from 1982. It returns unstructured text with inconsistent field names across different registrars and TLD operators — some write "Creation Date:", others write "Registered:", others write "Domain Name Commencement Date:". Parsing it reliably requires regex soup and constant maintenance.
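A small sketch of what that parsing looks like in practice, covering only the three label variants quoted above. The function name is hypothetical, and real servers use many more labels, which is exactly the maintenance burden described.

```python
import re
from typing import Optional

# Label variants taken from the examples above; real servers use many more.
CREATION_LABELS = re.compile(
    r"^\s*(Creation Date|Registered|Domain Name Commencement Date)\s*:\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_creation_date(raw_whois: str) -> Optional[str]:
    """Return the raw creation-date string from a WHOIS response, or None."""
    match = CREATION_LABELS.search(raw_whois)
    return match.group(2) if match else None


sample = "Domain Name: EXAMPLE.COM\nCreation Date: 1995-08-14T04:00:00Z\n"
print(extract_creation_date(sample))
```

The value comes back as a string because date formats vary as much as the labels do; converting it to a datetime is a second normalization problem.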

RDAP (Registration Data Access Protocol) is the JSON-native successor, standardized in RFCs 7480–7484 (2015); the JSON response format originally defined in RFC 7483 has since been updated by RFC 9083. RDAP returns structured JSON with consistent field semantics, handles authentication for tiered access (allowing registrars to return full data to credentialed law enforcement or registrant-self-queries while restricting public access), and uses HTTPS rather than raw TCP. Most TLDs now support RDAP, and ICANN mandates it for all gTLD operators.

Query RDAP directly without any library:

import requests

def rdap_lookup(domain: str) -> dict:
    """Query RDAP bootstrap for structured domain registration data."""
    resp = requests.get(
        f"https://rdap.org/domain/{domain}",
        timeout=10,
        headers={"Accept": "application/rdap+json"},
    )
    resp.raise_for_status()
    return resp.json()

data = rdap_lookup("example.com")

# Extract events (registration, expiration, last changed)
for event in data.get("events", []):
    print(f"{event['eventAction']}: {event['eventDate']}")

# Extract nameservers
for ns in data.get("nameservers", []):
    print(f"NS: {ns['ldhName']}")

The rdap.org bootstrap service handles RDAP server discovery automatically — you don't need to know which RDAP server handles .com vs .io vs .de. For production use, implement the bootstrap lookup yourself using IANA's RDAP bootstrap registry to avoid the single-point dependency.
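A minimal sketch of that bootstrap step, assuming the IANA registry at https://data.iana.org/rdap/dns.json keeps the layout described in RFC 9224: a "services" array whose entries pair a list of TLDs with a list of base URLs. The function names are hypothetical, and the sketch uses only the standard library so it runs without extra dependencies.

```python
import json
import urllib.request
from typing import Optional

IANA_BOOTSTRAP_URL = "https://data.iana.org/rdap/dns.json"


def fetch_bootstrap() -> dict:
    """Download the IANA DNS bootstrap registry (cache this; it changes rarely)."""
    with urllib.request.urlopen(IANA_BOOTSTRAP_URL, timeout=10) as resp:
        return json.load(resp)


def rdap_domain_url(bootstrap: dict, domain: str) -> Optional[str]:
    """Build the RDAP query URL for a domain, or None if no server is listed."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    # Try the longest suffix first, as the bootstrap RFC asks for
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        for tlds, urls in bootstrap["services"]:
            if suffix in tlds:
                # Prefer an HTTPS base URL when one is offered
                base = next((u for u in urls if u.startswith("https://")), urls[0])
                return base.rstrip("/") + "/domain/" + name
    return None


# Sample registry data with hypothetical server URLs, for illustration only
sample_bootstrap = {
    "services": [
        [["com"], ["https://rdap.registry-a.example/v1/"]],
        [["de"], ["https://rdap.registry-b.example/"]],
    ]
}
print(rdap_domain_url(sample_bootstrap, "Example.COM"))
```

In a real run you would pass the result of fetch_bootstrap() instead of the sample dictionary, then request the returned URL with the same Accept header used in rdap_lookup.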

BULK WHOIS AUTOMATION WITH PYTHON

For investigations involving multiple domains — a threat actor's infrastructure cluster, a company's registered assets, a lookalike domain set — bulk WHOIS automation is essential. The python-whois library handles the majority of gTLDs and provides Python-native parsed output:

import whois
import time
from datetime import datetime

domains = [
    "target.com",
    "target.net",
    "target-security.com",
    "target-login.com",
]

results = []
for domain in domains:
    try:
        w = whois.whois(domain)
        created = w.creation_date
        # creation_date can be a list for some TLDs
        if isinstance(created, list):
            created = created[0]
        results.append({
            "domain": domain,
            "registrar": w.registrar,
            # Keep a missing date as None instead of the string "None"
            "created": created.isoformat() if isinstance(created, datetime) else (str(created) if created else None),
            "nameservers": w.name_servers,
        })
        print(f"[+] {domain}: {w.registrar}, created {created}")
    except Exception as e:
        print(f"[-] {domain}: {e}")
    time.sleep(2)  # WHOIS servers rate-limit aggressively; respect them

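Rate limiting is non-negotiable. Limits vary by server and are rarely published, but hard throttles in the range of 10–20 queries per minute per IP are common. Exceeding them results in temporary bans or TCP connection resets. A 2–3 second delay works out to 20–30 queries per minute, which is only safe when consecutive queries land on different servers; when many domains share a TLD (and therefore a WHOIS server), budget 3–6 seconds per query to stay inside a 10–20 per minute ceiling. For large-scale lookups (hundreds of domains), spread queries across sessions and consider rotating through multiple query paths (direct WHOIS + RDAP bootstrap) to distribute load.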
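One way to enforce a per-server budget is a throttle keyed by server rather than a flat sleep. The sketch below is an assumption-laden illustration: the class name is hypothetical, it treats the TLD as a rough stand-in for the WHOIS server behind it, and the 6-second interval corresponds to the conservative end of the range (10 queries per minute).

```python
import time


class PerServerThrottle:
    """Enforce a minimum interval between queries to the same server."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # server key -> time of the most recent query

    def wait(self, server: str) -> float:
        """Block until the server may be queried again; return seconds slept."""
        now = self._clock()
        last = self._last.get(server)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay:
            self._sleep(delay)
        self._last[server] = now + delay
        return delay


def server_key(domain: str) -> str:
    """Rough proxy: assume each TLD is served by a single WHOIS server."""
    return domain.lower().rstrip(".").rsplit(".", 1)[-1]


throttle = PerServerThrottle(min_interval=6.0)  # about 10 queries per minute
```

Inside the lookup loop, calling throttle.wait(server_key(domain)) before each query replaces the fixed time.sleep(2), so domains under different TLDs are not slowed down by each other.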

OpenOSINT's search_whois tool wraps python-whois with enforced timeouts, error handling for malformed records, and output formatting for terminal readability. The AI agent invokes it as part of domain investigation chains where the raw data feeds into Claude's reasoning about infrastructure age and registrar patterns.

READING BETWEEN THE LINES OF REDACTED RECORDS

When registrant identity is fully redacted, the remaining metadata contains correlatable signals that privacy protection does not hide.

Nameserver patterns: Custom nameservers (ns1.targetcorp.com) indicate direct infrastructure ownership. Shared nameservers (hostnames under ns.cloudflare.com, or ns1.digitalocean.com) indicate infrastructure providers. Certain nameservers are strongly associated with specific hosting ecosystems, and recognizing these patterns lets you cluster domains that use the same infrastructure even without shared registrant data.

Registrar clustering: Bulk domain buyers tend to register through the same registrar for volume discounts or API access. If 30 lookalike domains all show Namecheap as registrar with privacy protection enabled, that's a different infrastructure signature than 30 domains through different enterprise registrars. GoDaddy, Namecheap, NameSilo, and Porkbun each have distinct user populations.

Registration date density: A cluster of similar domains (target-login.com, target-secure.com, target-account.com) all registered within a 48-hour window, all expiring in 1 year, all using the same nameservers — this pattern is strong evidence of coordinated infrastructure even with all registrant data hidden.

Privacy proxy attribution: The specific privacy proxy service used is not anonymous. "Withheld for Privacy ehf" (an Icelandic company) is distinct from "Contact Privacy Inc. Customer" (Tucows/OpenSRS) which is distinct from GoDaddy's proxy service. Some privacy proxy providers have been more cooperative with legal process than others — this matters for law enforcement cases.
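The nameserver, registrar, and registration-date signals above can be combined into a simple clustering pass. This is a sketch under stated assumptions: records are dictionaries shaped like the ones built in the bulk lookup loop but with "created" kept as a datetime, the function names are hypothetical, and the provider is approximated by the last two labels of the nameserver hostname, which is wrong for suffixes like .co.uk.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def ns_provider(nameserver: str) -> str:
    """Naive provider key: the last two labels of the nameserver hostname."""
    labels = nameserver.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def find_clusters(records, window=timedelta(hours=48)):
    """Group domains sharing registrar and nameserver provider, registered within the window."""
    groups = defaultdict(list)
    for rec in records:
        providers = frozenset(ns_provider(ns) for ns in rec["nameservers"])
        groups[(rec["registrar"], providers)].append(rec)

    clusters = []
    for members in groups.values():
        members.sort(key=lambda r: r["created"])
        run = [members[0]]
        for rec in members[1:]:
            # The window is anchored at the first registration in the run
            if rec["created"] - run[0]["created"] <= window:
                run.append(rec)
            else:
                if len(run) > 1:
                    clusters.append([r["domain"] for r in run])
                run = [rec]
        if len(run) > 1:
            clusters.append([r["domain"] for r in run])
    return clusters


# Hypothetical sample data for illustration
NS = ["ns1.dns-host.example", "ns2.dns-host.example"]
records = [
    {"domain": "target-login.com", "registrar": "Registrar A", "nameservers": NS, "created": datetime(2025, 3, 1, 10)},
    {"domain": "target-secure.com", "registrar": "Registrar A", "nameservers": NS, "created": datetime(2025, 3, 2, 9)},
    {"domain": "target-account.com", "registrar": "Registrar A", "nameservers": NS, "created": datetime(2025, 3, 2, 20)},
    {"domain": "target-old.net", "registrar": "Registrar A", "nameservers": NS, "created": datetime(2024, 1, 5)},
    {"domain": "target.com", "registrar": "Registrar B", "nameservers": ["ns1.target.com"], "created": datetime(2010, 6, 1)},
]
print(find_clusters(records))
```

Only the three lookalike domains come back as a cluster; the older domain with the same registrar and nameservers falls outside the 48-hour window, and the corporate domain shares nothing with the rest.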

WHEN WHOIS IS COMPLETELY HIDDEN: FALLBACK SOURCES

Some registrars provide effectively nothing beyond nameservers and dates. For these cases, orthogonal data sources often recover the information WHOIS no longer provides.

Certificate Transparency: Publicly trusted TLS certificates are logged in public CT logs. The certificate subject and SAN fields contain the domain names, and because the logs are append-only, certificates issued before privacy protection was applied remain visible. Query crt.sh for a target domain and look at certificates issued years ago: older certificates were often organization-validated (OV) or extended-validation (EV), and those carry the organization name in the subject regardless of what WHOIS shows. Also see subdomain enumeration via CT logs for the secondary benefit of certificate data.
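A hedged sketch of the crt.sh step, using its JSON output (the q and output=json query parameters). It assumes each entry carries id, not_before, and a newline-separated name_value field, which matches the responses crt.sh returns at the time of writing but is not a documented contract. As far as I can tell the JSON listing does not include the subject organization, so the sketch surfaces the oldest certificate IDs to open individually on crt.sh. Function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_crtsh(domain: str) -> list:
    """Fetch certificate entries for a domain from crt.sh (can be slow)."""
    url = "https://crt.sh/?" + urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def oldest_certificates(entries: list, limit: int = 5) -> list:
    """Return the earliest distinct certificates with their normalized names."""
    seen = set()
    rows = []
    # ISO-8601 timestamps sort correctly as plain strings
    for entry in sorted(entries, key=lambda e: e["not_before"]):
        if entry["id"] in seen:
            continue  # the same certificate can appear more than once
        seen.add(entry["id"])
        names = sorted({n.strip().lower() for n in entry["name_value"].splitlines() if n.strip()})
        rows.append({"id": entry["id"], "not_before": entry["not_before"], "names": names})
    return rows[:limit]


# Hypothetical sample entries shaped like crt.sh output
sample_entries = [
    {"id": 2, "not_before": "2021-01-01T00:00:00", "name_value": "www.example.com"},
    {"id": 1, "not_before": "2015-06-01T00:00:00", "name_value": "example.com\nmail.example.com"},
    {"id": 1, "not_before": "2015-06-01T00:00:00", "name_value": "example.com\nmail.example.com"},
]
for row in oldest_certificates(sample_entries):
    print(row["not_before"], row["id"], row["names"])
```

For a live query, pass fetch_crtsh("target.com") to oldest_certificates and inspect each returned ID for its subject fields.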

Passive DNS: Services like PassiveTotal, SecurityTrails, and VirusTotal Passive DNS log which IP addresses a domain has historically resolved to. Correlating historical IP addresses across multiple domains can reveal shared infrastructure even when WHOIS is empty. If three domains with redacted registrants all pointed to the same IP range in 2022, that's a cluster signal.
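Once you have exported historical resolutions from whichever passive DNS service you use, the correlation itself is simple. The sketch below assumes the export has been reduced to (domain, IPv4 address) pairs, which is a hypothetical format rather than any provider's actual schema, and it groups by /24 as an arbitrary illustration of "same IP range".

```python
import ipaddress
from collections import defaultdict


def shared_networks(resolutions, prefix: int = 24) -> dict:
    """Map each network seen by two or more domains to the domains that used it."""
    by_network = defaultdict(set)
    for domain, ip in resolutions:
        # strict=False lets a host address stand in for its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        by_network[str(network)].add(domain)
    return {net: sorted(doms) for net, doms in by_network.items() if len(doms) > 1}


# Hypothetical historical resolutions (documentation address ranges)
pairs = [
    ("a.example", "203.0.113.10"),
    ("b.example", "203.0.113.77"),
    ("c.example", "198.51.100.5"),
    ("a.example", "203.0.113.11"),
]
print(shared_networks(pairs))
```

Shared hosting and CDNs will produce false positives here, so treat an overlap as a lead to verify rather than proof of common ownership.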

Historical WHOIS snapshots: ViewDNS.info, DomainTools (commercial), and domain.glass maintain historical WHOIS snapshots. Domains registered before GDPR-era redaction often have complete registrant records in historical snapshots. The privacy protection your target applied in 2019 doesn't erase the 2015 record from archives.

The IP address a domain currently resolves to is always queryable via DNS — that's outside WHOIS scope entirely. Resolving the domain and feeding the IP into IP intelligence tools frequently recovers organization attribution even when the domain registration is fully anonymized.
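The resolution step needs nothing beyond the standard library. A minimal sketch, with a hypothetical function name, that returns the distinct addresses a hostname currently resolves to and an empty list when resolution fails:

```python
import socket
from typing import List


def resolve_ips(hostname: str) -> List[str]:
    """Return the distinct IP addresses a hostname currently resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # name does not resolve (or resolver unavailable)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ips("target.com") gives the addresses to feed into IP intelligence tools; keep in mind that a domain behind a CDN resolves to the CDN's addresses, not the origin's.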
