How GeoIPHub classifies an IP address.

Name: GeoIPHub API
Author: GeoIPHub

Every flag in a GeoIPHub response can be traced to a method, a source, and a timestamp. This page documents all of them: the seven verification method families, the 30+ data sources behind them, how often each one refreshes, and how the evidence becomes a score.

01 — Philosophy

Verified over aggregated.

Most IP intelligence is aggregation: buy a few lists, merge them, sell the union. The problem is that lists disagree, decay, and never say why. When a customer gets blocked, “a vendor list said so” is not an answer.

GeoIPHub is built the other way around. A flag is set when we can verify it — a protocol that answered a handshake, a provider's own published server list, a registry record, an official exit list. Each classification stores its evidence trail: the method that fired, the source it used, the port and protocol where relevant, a confidence value, and the timestamp it was established.

That evidence surfaces in the response itself. scoring.detection_methods names the exact signals behind a verdict, detection.vpn_provider names the operator when its own server list identifies the IP, and meta carries the classification and scan timestamps. If we can't substantiate a field, it returns null — an honest gap beats a confident guess.
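As a rough illustration of reading that evidence on the client side, here is a minimal sketch. Only the field paths scoring.detection_methods, detection.vpn_provider, and meta come from this page; the sample values, the key names inside meta, and the helper summarize_evidence are assumptions made for the example.

```python
import json

# Hypothetical response body. The values and the two key names inside
# "meta" are invented for illustration; they are not documented fields.
SAMPLE = """
{
  "detection": {"vpn_provider": null},
  "scoring": {"detection_methods": ["asn_hosting", "whois_hosting_keywords"]},
  "meta": {"classified_at": "2025-01-01T00:00:00Z", "scanned_at": null}
}
"""


def summarize_evidence(body: str) -> dict:
    """Extract the evidence fields, keeping JSON null as None."""
    data = json.loads(body)
    return {
        "methods": data.get("scoring", {}).get("detection_methods") or [],
        "vpn_provider": data.get("detection", {}).get("vpn_provider"),
        "meta": data.get("meta", {}),
    }


summary = summarize_evidence(SAMPLE)
```

A null vpn_provider arrives as None, so callers can tell "not attributed" apart from an empty string.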

02 — Methods

Seven method families.

No single method decides a classification. These seven families each contribute evidence, and the strongest evidence — direct verification — carries the most weight.

ASN classification

Every IP belongs to an autonomous system, and the AS tells you a lot before you look at anything else. GeoIPHub maintains a curated table of 890 hosting ASNs — networks whose business is renting servers, not connecting households — plus 14 ASNs operated by VPN providers, and separate tables for satellite, anycast, and CDN networks.

The BGP/ASN table is rebuilt every 3 days from the global routing table, so an IP that moves between networks is reclassified within days, not quarters.
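A table-driven ASN check of this kind could look roughly like the sketch below. The ASNs are placeholders from the range reserved for documentation (64496 to 64511), and the table contents, the precedence order, and the name classify_asn are assumptions, not our actual data or code.

```python
from enum import Enum


class AsnKind(Enum):
    VPN = "vpn"
    HOSTING = "hosting"
    SATELLITE = "satellite"
    ANYCAST = "anycast"
    CDN = "cdn"
    UNCLASSIFIED = "unclassified"


# Placeholder tables. Dict order doubles as precedence here, which is an
# assumption: a VPN-operated ASN is reported as VPN before anything else.
CURATED_TABLES = {
    AsnKind.VPN: {64497},
    AsnKind.HOSTING: {64496},
    AsnKind.SATELLITE: {64498},
    AsnKind.ANYCAST: {64499},
    AsnKind.CDN: {64500},
}


def classify_asn(asn: int) -> AsnKind:
    """Return the first curated table that contains the ASN."""
    for kind, table in CURATED_TABLES.items():
        if asn in table:
            return kind
    return AsnKind.UNCLASSIFIED
```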

Official provider lists

Where an operator publishes its own infrastructure, we use that publication directly — it is the strongest evidence available. Server lists are pulled from 10 VPN providers' own published APIs: NordVPN, Mullvad, Surfshark, Private Internet Access, Windscribe, AirVPN, IVPN, PrivadoVPN, Riseup, and FastestVPN. An IP on one of these lists is attributed by name in detection.vpn_provider.

The same principle covers clouds and crawlers. Official IP ranges come from 10 cloud providers — AWS, Google Cloud, Azure, Oracle, Cloudflare, Fastly, GitHub, DigitalOcean, Linode, and Alibaba. Crawlers and AI bots are verified against 9 official IP-range feeds: Googlebot, Google Special Crawlers, Bingbot, Applebot, DuckDuckBot, GPTBot, ChatGPT-User, OAI-SearchBot, and CCBot. Tor exits come from the Tor Project's official exit list, and privacy relays are matched against iCloud Private Relay and Cloudflare WARP ranges.
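Matching an address against published ranges reduces to a CIDR containment test. The sketch below uses Python's standard ipaddress module; the operator names and the prefixes (all reserved documentation ranges) are placeholders, and match_operator is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges drawn from documentation prefixes, not real feeds.
PUBLISHED_RANGES = {
    "ExampleVPN": ["192.0.2.0/25", "2001:db8:1::/48"],
    "ExampleCloud": ["198.51.100.0/24"],
}


def build_index(ranges: dict) -> dict:
    """Parse CIDR strings once so lookups only do containment checks."""
    return {
        name: [ipaddress.ip_network(cidr) for cidr in cidrs]
        for name, cidrs in ranges.items()
    }


def match_operator(ip: str, index: dict):
    """Return the operator whose published range contains ip, else None."""
    addr = ipaddress.ip_address(ip)
    for name, networks in index.items():
        for net in networks:
            if addr.version == net.version and addr in net:
                return name
    return None


INDEX = build_index(PUBLISHED_RANGES)
```

A linear scan is fine for a sketch; a production index over many prefixes would more likely use a prefix tree or sorted interval search.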

Active protocol probing

Lists tell you what an IP was. A handshake tells you what it is. The scanner attempts real protocol handshakes on 25+ ports across 11 VPN and proxy protocols, including OpenVPN, WireGuard, IKEv2, PPTP, L2TP, Shadowsocks, SOCKS5, SOCKS4, HTTP CONNECT, and HTTP forward proxies.

A completed handshake is close to conclusive: a server that answers an OpenVPN handshake on a reachable port is running OpenVPN, whatever its WHOIS record says. This is why GeoIPHub flags VPN and proxy infrastructure only after verification — a probe result, a provider list match, or both — rather than inheriting someone else's stale label.
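To make the idea of a handshake probe concrete, here is a minimal sketch for one protocol, SOCKS5, whose greeting and reply format are defined in RFC 1928. It is not the production scanner (which is written in Rust); the function names and the timeout are assumptions.

```python
import socket

# SOCKS5 greeting: version 5, one auth method offered, method 0x00 (none).
SOCKS5_GREETING = b"\x05\x01\x00"


def is_socks5_reply(reply: bytes) -> bool:
    """A SOCKS5 server answers with exactly two bytes: 0x05 and a method.

    Method 0xFF ("no acceptable methods") still proves the protocol.
    """
    return len(reply) == 2 and reply[0] == 0x05


def probe_socks5(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send the greeting and report whether the peer speaks SOCKS5."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(SOCKS5_GREETING)
            # A short read is treated as "no answer" in this sketch.
            return is_socks5_reply(conn.recv(2))
    except OSError:  # refused, unreachable, or timed out
        return False
```

The reply check is kept separate from the network call so it can be exercised without opening a socket.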

Threat & blocklist feeds

Threat flags come from public feeds we name, so you can audit them yourself: Spamhaus DROP and EDROP for hijacked and criminal-controlled ranges, FireHOL Level 1 for aggregate attack sources, Feodo Tracker for botnet command-and-control servers, and Team Cymru's bogon data for unallocated space that should never appear as a source address. All are refreshed daily.

Each IP is additionally checked against 4 DNS blocklist zones; the response's dnsbl_sources field names every zone that listed it. Known internet-wide scanner ranges — including Shodan and Censys — are flagged separately as scanners, because measurement traffic is not the same as attack traffic.
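DNS blocklists are conventionally queried by reversing the IPv4 octets and appending the zone name; an answer means listed, and a name-not-found error means not listed. The sketch below shows that convention only. The zone name in the usage note is a placeholder, since the four zones are not named on this page, and both helpers are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def listed_zones(ip: str, zones: list) -> list:
    """Return the zones that answered for ip (performs live DNS queries)."""
    hits = []
    for zone in zones:
        try:
            socket.gethostbyname(dnsbl_query_name(ip, zone))
            hits.append(zone)
        except socket.gaierror:
            # Treated as "not listed"; a real client would distinguish
            # NXDOMAIN from resolver failures.
            pass
    return hits
```

For example, dnsbl_query_name("192.0.2.1", "dnsbl.example.org") yields "1.2.0.192.dnsbl.example.org".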

WHOIS / RDAP analysis

Registration records are pulled over RDAP from all 5 Regional Internet Registries (ARIN, RIPE, APNIC, LACNIC, and AFRINIC) and analyzed against 46 VPN keywords and 27 hosting keywords. Operators often describe their business plainly in the record itself: a network registered as "VPN hosting services" is not hiding what it does.

Keyword matches surface as has_vpn_keywords and has_hosting_keywords in the whois group. They are weighted as supporting evidence, never as a sole reason to flag.
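A keyword pass over registry text might be sketched as follows. The output names has_vpn_keywords and has_hosting_keywords come from this page; the short keyword lists are invented stand-ins for the real 46 and 27 entries, and whois_flags is a hypothetical helper.

```python
import re

# Illustrative keywords only; the real lists are not published here.
VPN_KEYWORDS = ("vpn", "proxy", "anonymizer")
HOSTING_KEYWORDS = ("hosting", "datacenter", "colocation", "vps")


def _has_keyword(text: str, keywords: tuple) -> bool:
    """Whole-word, case-insensitive match against any keyword."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in keywords
    )


def whois_flags(record_text: str) -> dict:
    """Return the two supporting-evidence flags for a registry record."""
    return {
        "has_vpn_keywords": _has_keyword(record_text, VPN_KEYWORDS),
        "has_hosting_keywords": _has_keyword(record_text, HOSTING_KEYWORDS),
    }
```

Whole-word matching avoids accidental hits inside longer words, which matters when a flag feeds a score.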

Verified crawler & AI-bot identification

A User-Agent header is a claim, not proof. Crawlers and AI bots are verified against their operators' own published IP ranges — an IP that claims to be Googlebot only counts as verified when it sits inside Google's official ranges. This is the check that separates evidence from decoration.

The published ranges are pulled from 9 official crawler and AI-bot feeds. A request that claims a crawler identity but falls outside the operator's ranges sets crawler.spoofed instead of slipping through, and a crawler-like identity we cannot confirm sets crawler.unverified.
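The three outcomes can be sketched as a small decision function. The ranges below are documentation prefixes rather than any operator's real ranges, the token list is invented, and treating "no published ranges to check against" as the unverified case is our assumption for the example.

```python
import ipaddress

# Placeholder data: not real operator ranges.
OFFICIAL_RANGES = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
}
# Crawler-like identities recognized in a User-Agent (hypothetical list).
CRAWLER_TOKENS = ("googlebot", "examplebot")


def classify_crawler(user_agent: str, ip: str) -> str:
    """Return 'verified', 'spoofed', 'unverified', or 'not_a_crawler'."""
    ua = user_agent.lower()
    claimed = next((tok for tok in CRAWLER_TOKENS if tok in ua), None)
    if claimed is None:
        return "not_a_crawler"
    networks = OFFICIAL_RANGES.get(claimed)
    if networks is None:
        return "unverified"  # assumption: nothing official to confirm against
    addr = ipaddress.ip_address(ip)
    if any(addr.version == net.version and addr in net for net in networks):
        return "verified"
    return "spoofed"
```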

Live classification

When a lookup hits an IP that isn't in the database yet, the request doesn't fail and it doesn't get a guess. The Rust scanner classifies the address live — registry data, list matches, probes — in under 2.5 seconds, returns the result, and persists it. Every subsequent lookup of that IP is served from the memory-mapped database in sub-millisecond time.
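The lookup path described here is a read-through pattern: serve from the database, and on a miss classify live and persist. A minimal sketch, with a plain dict standing in for the memory-mapped database and a caller-supplied function standing in for the scanner (both are assumptions):

```python
from typing import Callable, Dict


class Classifier:
    """Read-through lookup: database first, live scan on a miss."""

    def __init__(self, live_scan: Callable[[str], dict]):
        self._db: Dict[str, dict] = {}  # stand-in for the real database
        self._live_scan = live_scan
        self.live_scans = 0  # counts how often the slow path ran

    def lookup(self, ip: str) -> dict:
        record = self._db.get(ip)
        if record is None:
            record = self._live_scan(ip)  # slow path: classify now
            self.live_scans += 1
            self._db[ip] = record  # persist for subsequent lookups
        return record
```

The second lookup of the same address never reaches the scanner, which is the property the paragraph above relies on.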

03 — Freshness

Refresh cadence.

IP intelligence decays at different rates — a botnet C2 list goes stale in days, a WHOIS record in months. Each feed family refreshes on its own schedule.

Feed family	Cadence
Threat feeds (Spamhaus DROP/EDROP, FireHOL L1, Feodo Tracker, bogons)	Daily
Tor exit list	Daily
VPN / proxy protocol probes	Every 2 days
Crawler & AI-bot IP-range feeds	Every 2 days
BGP / ASN table	Every 3 days
WHOIS / RDAP records	Weekly
Cloud provider IP ranges	Weekly
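The cadences in the table translate directly into a staleness check. In the sketch below the intervals come from the table; the family keys and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Intervals taken from the cadence table; the key names are invented.
CADENCE = {
    "threat_feeds": timedelta(days=1),
    "tor_exit_list": timedelta(days=1),
    "protocol_probes": timedelta(days=2),
    "crawler_ranges": timedelta(days=2),
    "bgp_asn_table": timedelta(days=3),
    "whois_rdap": timedelta(weeks=1),
    "cloud_ranges": timedelta(weeks=1),
}


def is_due(family: str, last_refreshed: datetime, now: datetime) -> bool:
    """True when at least one full cadence interval has elapsed."""
    return now - last_refreshed >= CADENCE[family]


def due_families(last_refreshed: dict, now: datetime) -> list:
    """Return the feed families that need a refresh, sorted by name."""
    return sorted(
        family for family, ts in last_refreshed.items()
        if is_due(family, ts, now)
    )
```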

04 — Scoring

How evidence becomes a score.

The 0–100 fraud_score combines 40+ weighted signals. Direct verification — a completed handshake, a provider-list match — weighs more than circumstantial evidence like a WHOIS keyword.

The model also subtracts. A residential ISP removes 20 points; a verified crawler removes 30. Exonerating evidence matters because the cost of a false positive is a real customer turned away. For the same reason, scores on carrier-grade NAT ranges are capped: thousands of people share a CGNAT address, and one bad actor shouldn't condemn all of them.
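The arithmetic can be sketched as a signed weighted sum, clamped to 0 through 100, with an optional ceiling for CGNAT ranges. Only the two subtractions (20 for a residential ISP, 30 for a verified crawler) come from this page; the positive weights, the signal names, and the cap value of 50 are invented for illustration.

```python
WEIGHTS = {
    "protocol_handshake": 45,   # hypothetical weight
    "provider_list_match": 45,  # hypothetical weight
    "whois_vpn_keyword": 10,    # hypothetical weight
    "residential_isp": -20,     # stated on this page
    "verified_crawler": -30,    # stated on this page
}
CGNAT_CAP = 50  # hypothetical ceiling; the real cap is not published here


def fraud_score(signals, cgnat: bool = False) -> int:
    """Sum the weights of the fired signals and clamp to 0..100."""
    raw = sum(WEIGHTS[name] for name in set(signals))
    score = max(0, min(100, raw))
    if cgnat:
        score = min(score, CGNAT_CAP)
    return score
```

With these made-up weights, a handshake plus a list match scores 90, while a lone WHOIS keyword on a residential ISP clamps to 0.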

Every verdict ships with its reasoning: scoring.detection_methods lists exactly which signals fired, and scoring.confidence reports how much evidence sits behind the result.

Score	recommended_action	Meaning
0–25	allow	No meaningful risk evidence. Let the request through.
26–50	review	Some signals fired. Log it, watch it, don't block it.
51–75	step_up	Material evidence. Add friction — a challenge or verification step.
76–100	block	Strong, multi-signal evidence of anonymization or abuse.
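For clients that want to reproduce the banding locally, the table maps onto a few comparisons. The thresholds and action names are from the table; the function name is hypothetical.

```python
def recommended_action(score: int) -> str:
    """Map a 0-100 score to the action band from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 25:
        return "allow"
    if score <= 50:
        return "review"
    if score <= 75:
        return "step_up"
    return "block"
```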

05 — Architecture

Architecture in brief.

The pipeline is a Rust scanner running 10 scheduled steps — feed ingestion, list reconciliation, protocol probing, RDAP pulls, DNS resolution, and classification among them. Postgres is the source of truth for every classification and its evidence trail.

The serving path is deliberately boring: classifications are compiled into a memory-mapped database that the API reads directly, swapped atomically once a day. That's what makes sub-millisecond lookups possible — no query planner, no cache misses to a remote store, just a structured read from mapped memory. IPs not yet in the database are classified live in under 2.5 seconds and persisted for next time.
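One common way to get an atomic swap is to write the new build to a temporary file in the same directory and rename it over the live file, so readers see either the old build or the new one and never a partial write. The sketch below shows that general technique with Python's os.replace and mmap; it is an illustration under our own assumptions, not a description of the actual file format or swap mechanism.

```python
import mmap
import os
import tempfile


def publish(db_path: str, payload: bytes) -> None:
    """Write a new build beside the live file, then swap it in by rename."""
    directory = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, db_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_all(db_path: str) -> bytes:
    """Map the file read-only and copy out its contents (non-empty files)."""
    with open(db_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[:])
```

A long-running reader would keep its mapping open and remap after a swap; existing mappings continue to see the old build until they do.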

06 — Limits

What we don't claim.

IP intelligence is probabilistic. An IP address identifies a network interface, not a person; geolocation resolves to a city at best, never a street address; and any IP's behavior can change the day after it was scanned. Anyone selling certainty in this domain is selling something else.

So we bound our claims. We don't claim street-level precision — geo_confidence exists precisely because city-level answers vary in reliability, and it returns null when the evidence is thin. We don't identify individuals, and we don't try. We flag infrastructure based on what it verifiably is, and we show the evidence so you can disagree with us.

If you find a classification we got wrong, tell us — corrections make the dataset better for everyone. Write to app@geoiphub.com with the IP and the request_id from your lookup.

See the methodology's output for yourself — run a lookup in the free IP lookup tool or read the full API reference.
