JavaScript Regex To Get Parts Of URL

I recently came across an extremely useful JavaScript regex for getting the parts of a URL, and I thought I would document it because it took some searching to find. I was trying to extract the host from URLs in any number of forms. For example, given http://www.adamthings.com/ I wanted to get back only adamthings.com.

The use case was that I needed to ensure the host domains matched. So if I got an input of, say, http://www.alienwarefxthemes.com/ I wanted to fail it, because adamthings.com does not equal alienwarefxthemes.com.

Now I realize this regex may be a bit overkill for this situation, but it was one of the few that seemed to handle the majority of my cases successfully.

Now, the regex. I’ll be the first to admit I don’t understand everything that is going on, as I didn’t write it and regexes are not my thing.

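var _host_from_url = function (url) {
    // jq is assumed to be the page's jQuery alias (e.g. from jQuery.noConflict())
    var clean_url = jq.trim(url);
    var match = clean_url.match(/^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?([^\/\.]+\.[^:\/\s\.]{2,3}(\.[^:\/\s\.]{2,3})?)(:\d+)?($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$/i);

    // match is null when the URL does not fit the pattern; capture group 4 is the host
    return match ? match[4] : null;
};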

Sorry for the wrapping, but I wanted it all to be on the screen without scrolling. In my case I return match[4], the fifth element of the match array (index 0 holds the whole match), because capture group 4 is the host name. Let's look at some outcomes.

Url To Test:
http://www.adamthings.com/post/2014/02/17/hello-world-angularjs/ has 10 groups:

  1. http://
  2. http
  3. www.
  4. adamthings.com
  7. /
  8. post/2014/02/17/hello-world-angularjs/

The numbers are the capture group indices; groups 5, 6, 9 and 10 came back empty for this URL and are left out. As you can see, there is a lot more information you can gather from this regex. I recommend playing with it and seeing how it works for you. Here are some other examples. (I left out the blanks to conserve space, but you can see the URL used and which parts it found.)

www.adamthings.com has 10 groups:
www.
adamthings.com

http://www.subdomain.adamthings.com has 10 groups:
http://
http
subdomain.
adamthings.com

https://www.adamthings.com has 10 groups:
https://
https
www.
adamthings.com

adamthings.com has 10 groups:
adamthings.com

http://adamthings.com has 10 groups:
http://
http
adamthings.com

https://adamthings.com has 10 groups:
https://
https
adamthings.com
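
The listings above can be reproduced with a couple of small helpers. This is only a sketch: listGroups and sameHost are names I made up for illustration, and it uses the native String trim instead of jQuery so it runs on its own. It reuses the same pattern, lists the non-empty capture groups with their indices, and compares two URLs by capture group 4.

```javascript
// Same pattern as in the post.
const URL_PARTS = /^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?([^\/\.]+\.[^:\/\s\.]{2,3}(\.[^:\/\s\.]{2,3})?)(:\d+)?($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$/i;

// Hypothetical helper: return the non-empty capture groups with their indices.
function listGroups(url) {
  const match = url.trim().match(URL_PARTS);
  if (!match) return [];
  const found = [];
  for (let i = 1; i < match.length; i++) {
    // Skip groups that did not participate or matched the empty string.
    if (match[i]) found.push({ group: i, value: match[i] });
  }
  return found;
}

// Hypothetical helper: true when both URLs parse and share the same host (group 4).
function sameHost(a, b) {
  const ma = a.trim().match(URL_PARTS);
  const mb = b.trim().match(URL_PARTS);
  if (!ma || !mb) return false;
  return ma[4].toLowerCase() === mb[4].toLowerCase();
}

console.log(listGroups("http://adamthings.com"));
console.log(sameHost("http://www.adamthings.com/", "http://www.alienwarefxthemes.com/"));
```

Returning false when either URL fails to parse keeps the check on the safe side for the "fail it if the hosts differ" use case. Keep in mind the pattern only accepts two or three character endings, so a host ending in something longer (for example .info) will not match at all.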

Regex to Split Single Line Address

A need came up in a project to split an address that is entered on a single line with no validation. Apart from the obvious fix of requiring validation and saving the fields in separate columns, I had to come up with the best way to split out that address so that we could wrap it in a rich snippet. This is a regex I found but cannot find again, so if someone knows the original source or author, please let me know so I can give the correct credit.

Since I am having such a hard time finding anything close to this, I figured it warranted a blog post to get the information out there.

// Splits an address formatted like this: 1045 E Test Lane, Gilbert, AZ 85296
// Requires: using System.Text.RegularExpressions;
Regex splitAddressRegex =
    new Regex(@"(?(^[^,]*,[^,]*,[\w\s]*$) # Condition: does the input have two commas? If so, match below
(?<Line1>[^,]*)    # Place into capture group Line1
(?:,\s)            # Match but don't place into capture.
(?<City>[^,]*)     # Place into capture group City
(?:,\s)            # Match but don't place into capture.
(?<State>\w\w)     # Place into capture group State
(?:\s*)            # Ignore spaces
(?<Zip>[\d\-]*)    # Place into capture group Zip
(?:$|\r\n)         # Match, without capturing, either end of input or \r\n
|                  # Else it's a bigger address
(?<Line1>[^,]*)    # Place into capture group Line1
(?:,\s)            # Match but don't place into capture.
(?<Line2>[^,]*)    # Place into capture group Line2 (name assumed; the source comment repeats Line1)
(?:,\s)            # Match but don't place into capture.
(?<City>[^,]*)     # Place into capture group City
(?:,\s)            # Match but don't place into capture.
(?<State>\w\w)     # Place into capture group State
(?:\s*)            # Ignore spaces
(?<Zip>[\d\-]*)    # Place into capture group Zip
(?:$|\r\n)         # Match, without capturing, either end of input or \r\n
)", RegexOptions.IgnorePatternWhitespace);

It’s a pretty in-depth and ugly Regex but you end up with the address split into groups. So if you run the address: 1045 E Test Lane, Gilbert, AZ 85296

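You end up with groups such as:

Match matchedAddress = splitAddressRegex.Match("1045 E Test Lane, Gilbert, AZ 85296");
string line1 = matchedAddress.Groups["Line1"].Value; // 1045 E Test Lane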

The comments in the Regex snippet show you the group names. Again, I did not make this and would love to give credit but can’t find the original source and definitely want to have it published as it saved me a lot of time.
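
For anyone who needs the same split in JavaScript, the language of the URL example, here is a rough sketch using named capture groups. It is not a port of the original author's code: JavaScript regexes have no if/else conditional, so I swapped it for two patterns tried in order. The names shortAddress, longAddress and splitAddress are made up for illustration, Line2 is my assumed name for the extra line in the longer form, and the "Suite 2" address is an invented test value.

```javascript
// "Line1, City, ST ZIP" (two commas)
const shortAddress =
  /^(?<Line1>[^,]*),\s(?<City>[^,]*),\s(?<State>\w\w)\s*(?<Zip>[\d-]*)$/;

// "Line1, Line2, City, ST ZIP" (three commas); Line2 is an assumed group name
const longAddress =
  /^(?<Line1>[^,]*),\s(?<Line2>[^,]*),\s(?<City>[^,]*),\s(?<State>\w\w)\s*(?<Zip>[\d-]*)$/;

// Hypothetical helper: returns the named parts, or null when neither form fits.
function splitAddress(address) {
  const match = shortAddress.exec(address) || longAddress.exec(address);
  // Copy the named groups into a plain object for easier use.
  return match ? { ...match.groups } : null;
}

console.log(splitAddress("1045 E Test Lane, Gilbert, AZ 85296"));
console.log(splitAddress("1045 E Test Lane, Suite 2, Gilbert, AZ 85296"));
```

Like the original, this does no real validation. It only relies on the commas and on a two-character state code, so anything that strays from those two layouts comes back as null.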