SYNOPSIS
    extractDomainName [URL]

EXAMPLES
    extractDomainName http://www.amazon.com/
        returns: amazon.com
    extractDomainName eemadges.com
        returns: eemadges.com
    extractDomainName http://en.wikipedia.org?search=%s
        returns: en.wikipedia.org
    extractDomainName http://seek.sing365.com:8080/cgi-bin/s.cgi?q=ladytron
        returns: sing365.com
    extractDomainName https://www.cia.gov/cia/publications/factbook/geos/.html
        returns: cia.gov
        (thanks to Frank Raiser for noticing the https bug!)

DESCRIPTION
    Extracts the domain name from the given URL.

    It's a tad more complex than that. I made this command explicitly as a
    building block for another command (">"), so it has some quirks to fit my
    needs. For instance, I usually wanted the domain name with all of its
    subdomains (e.g. en.wikipedia.org, not just wikipedia.org), unless a
    subdomain corresponded to the search section of a website (e.g. I
    preferred nytimes.com over query.nytimes.com). Details are in the code
    below.

    Here's the basic regexp behind extractDomainName:

        def extractDomainName(url)
          # Capture the host: everything after the optional scheme, up to the first "/" or "?"
          r = url =~ (/^(?:\w+:\/\/)?([^\/?]+)(?:\/|\?|$)/) ? $1 : 'Not a valid URL!'
          # Drop a www., seek., query. or search. prefix and keep the rest of the host
          r = r.gsub(/((?:www)|(?:seek)|(?:query)|(?:search))\.(([^\.]+)\.([^\.]+)(\.([^\.]+))?)/, '\2')
          # Drop a trailing port number. gsub is used instead of gsub! because
          # gsub! returns nil when nothing is replaced (i.e. when the URL has no port)
          r.gsub(/\:\d+$/, '')
        end

    Please email me (ely[dot]parra[gmail]) if you find bugs or have
    suggestions.

    -elzr.com

==========
Old implementation: http://eemadges.com/extractDomainName?id=%s
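
The three steps described in the DESCRIPTION (capture the host, drop a search-style prefix, drop the port) can also be sketched as separate methods, which makes each quirk easier to try on its own. This is only an illustrative sketch, not the code the command runs: the helper names host_of, strip_search_prefix, strip_port and extract_domain_name_sketch are made up for this example, and the patterns are copied from the regexp above with the unused capture groups trimmed.

```ruby
# Hypothetical helper names; the real command does all of this in one method.

# Step 1: capture everything between the optional scheme and the first "/" or "?".
# Returns nil when nothing host-like is found.
def host_of(url)
  url =~ %r{^(?:\w+://)?([^/?]+)(?:/|\?|$)} ? $1 : nil
end

# Step 2: drop a www/seek/query/search label and keep what follows it.
def strip_search_prefix(host)
  host.sub(/(?:www|seek|query|search)\.(([^.]+)\.([^.]+)(\.([^.]+))?)/, '\1')
end

# Step 3: drop a trailing ":port", if there is one.
def strip_port(host)
  host.sub(/:\d+$/, '')
end

# Chain the three steps, using the same error string as the original.
def extract_domain_name_sketch(url)
  host = host_of(url)
  return 'Not a valid URL!' if host.nil?
  strip_port(strip_search_prefix(host))
end

puts extract_domain_name_sketch('http://seek.sing365.com:8080/cgi-bin/s.cgi?q=ladytron')
```

Running the inputs from EXAMPLES through extract_domain_name_sketch should give the same answers listed there. Subdomains such as "en." are left alone because only the four listed prefixes are stripped.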