Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Parsing string for Domain / hostName

Our customers can enter website addresses as domain names. They can also enter the email addresses of their contacts.

Now we need to find customers whose website domain can be associated with the domains of those email addresses.

So my idea is to extract the host from the web address and from the email address and compare them.

So what's the most reliable algorithm to get the hostname from a URL?

For example, the input can be:

foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com

The result should always be foo.com


1 Reply


Rather than relying on a fragile regex, let System.Uri do the parsing for you. Uri needs a scheme to parse an absolute address, so add one when it is missing:

string uriStr = "www.foo.com";
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
    uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // will return www.foo.com

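Note that uri.Host keeps the www. prefix. GetLeftPart does not remove it either; it returns the scheme together with the authority:

string authority = uri.GetLeftPart( UriPartial.Authority ); // will return http://www.foo.com

To get foo.com for the inputs in the question, strip a leading "www." from uri.Host. For hosts with other subdomains, or suffixes such as co.uk, System.Uri alone cannot tell you the registrable domain; that requires a lookup against the Public Suffix List.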
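
Since uri.Host still contains the www. prefix, the comparison the question asks for needs one more normalization step. The following is only a minimal sketch under stated assumptions: the class HostNormalizer and its methods GetComparableHost, GetMailDomain and SameDomain are hypothetical names introduced here, not part of System.Uri, and only a single leading "www." is stripped (other subdomains such as shop.foo.com are left untouched).

```csharp
using System;

// Hypothetical helper class for illustration; not part of the original answer.
public static class HostNormalizer
{
    private const string WwwPrefix = "www.";

    // Returns the lower-cased host without a leading "www.",
    // or null when the input cannot be parsed as a URI.
    public static string GetComparableHost(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return null;
        string candidate = input.Trim();

        // System.Uri needs a scheme for absolute URIs, so prepend http:// if missing.
        if (!candidate.Contains(Uri.SchemeDelimiter))
        {
            candidate = Uri.UriSchemeHttp + Uri.SchemeDelimiter + candidate;
        }

        // TryCreate avoids the UriFormatException that "new Uri(...)" would throw.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri)) return null;

        return StripWww(uri.Host.ToLowerInvariant());
    }

    // Assumption: the mail domain is everything after the last '@'.
    public static string GetMailDomain(string mailAddress)
    {
        if (string.IsNullOrWhiteSpace(mailAddress)) return null;
        int at = mailAddress.LastIndexOf('@');
        if (at < 0 || at == mailAddress.Length - 1) return null;
        return StripWww(mailAddress.Substring(at + 1).Trim().ToLowerInvariant());
    }

    // True when both inputs parse and normalize to the same host.
    public static bool SameDomain(string website, string mailAddress)
    {
        string host = GetComparableHost(website);
        string mailDomain = GetMailDomain(mailAddress);
        return host != null && string.Equals(host, mailDomain, StringComparison.Ordinal);
    }

    private static string StripWww(string host)
    {
        return host.StartsWith(WwwPrefix, StringComparison.Ordinal)
            ? host.Substring(WwwPrefix.Length)
            : host;
    }
}
```

With this sketch, HostNormalizer.SameDomain("https://www.foo.com", "jane@foo.com") should be true, and each of the five inputs from the question should normalize to foo.com.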
