Hello Programmers! In this post, you will learn how to solve the HackerRank Detect the Domain Name problem. This problem is part of the Regex HackerRank series.
One more thing: don't jump straight to the solutions. Try to solve the problem yourself first, and look at the solutions only if you are still stuck after several attempts. We are going to solve the Regex HackerRank problems using the C++, Java, Python, JavaScript, and PHP programming languages.

HackerRank Detect the Domain Name Solution
Problem
You will be provided with a chunk of HTML markup. Your task is to identify the unique domain names from the links or URLs present in the markup fragment.
For example, if the link http://www.hackerrank.com/contest is present in the markup, you should detect the domain hackerrank.com. If second-level or higher-level domains are present in the markup, each of them must be treated as a unique domain. For instance, if the links http://www.xyz.com/news, https://abc.xyz.com/jobs, and http://abcd.xyz.com/jobs2 are present in the markup, then [xyz.com, abc.xyz.com, abcd.xyz.com] should all be identified as unique domains. Prefixes like “www.” and “ww2.”, if present, should be stripped from the domain name.
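The extraction rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not one of the submitted solutions: the names `DOMAIN_RE` and `extract_domains` are hypothetical, and the pattern assumes that every link starts with `http://` or `https://` and that host names use only letters, digits, and hyphens.

```python
import re

# Assumed pattern: scheme, optional "www."/"ww2." prefix (not captured),
# then one or more dotted labels ending in an alphabetic top-level domain.
DOMAIN_RE = re.compile(
    r"https?://(?:www\.|ww2\.)?((?:[a-z0-9-]+\.)+[a-z]+)",
    re.IGNORECASE,
)


def extract_domains(markup):
    """Return the set of unique domains found in a string of markup."""
    return {match.group(1) for match in DOMAIN_RE.finditer(markup)}


links = (
    '<a href="http://www.xyz.com/news">a</a> '
    '<a href="https://abc.xyz.com/jobs">b</a> '
    '<a href="http://abcd.xyz.com/jobs2">c</a>'
)
print(sorted(extract_domains(links)))
# prints: ['abc.xyz.com', 'abcd.xyz.com', 'xyz.com']
```

Collecting the matches in a set removes duplicates, and the prefix sits outside the capturing group, so "www." and "ww2." never reach the result.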
Input Format
An integer N, equal to the number of lines in the HTML fragment that follows.
A chunk of HTML markup with embedded links, N lines long.
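Reading this input format could look like the sketch below. The helper name `read_fragment` is hypothetical, and an in-memory stream stands in for standard input; on HackerRank you would pass `sys.stdin` instead.

```python
import io


def read_fragment(stream):
    """Read the line count N, then return the next N lines joined into one string."""
    n = int(stream.readline().strip())
    return "\n".join(stream.readline().rstrip("\n") for _ in range(n))


# A two-line fragment used purely for illustration.
sample = io.StringIO(
    '2\n<a href="http://www.xyz.com/news">news</a>\n<p>no links here</p>\n'
)
print(read_fragment(sample))
```

Joining the lines into a single string lets one regex pass cover the whole fragment, although matching line by line works equally well.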
Output Format
One line containing the list of detected domains, separated by semicolons, in lexicographical order. Do not leave any leading or trailing spaces, either at the ends of the line or around the individual domain names.
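As a small illustration of this output format, the sketch below sorts the detected domains and joins them with semicolons. The function name `format_domains` is hypothetical, and the sketch assumes plain ASCII domain names, for which Python's default string ordering matches lexicographical order.

```python
def format_domains(domains):
    """Join unique domains with semicolons in lexicographical order."""
    return ";".join(sorted(set(domains)))


print(format_domains(["xyz.com", "abc.xyz.com", "xyz.com", "abcd.xyz.com"]))
# prints: abc.xyz.com;abcd.xyz.com;xyz.com
```

Because `str.join` places the separator only between items, the line has no leading or trailing semicolons or spaces.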
Sample Input
10
<div class="reflist" style="list-style-type: decimal;">
<ol class="references">
<li id="cite_note-1"><span class="mw-cite-backlink"><b>^ ["Train (noun)"](http://www.askoxford.com/concise_oed/train?view=uk). <i>(definition – Compact OED)</i>. Oxford University Press<span class="reference-accessdate">. Retrieved 2008-03-18</span>.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.atitle=Train+%28noun%29&rft.genre=article&rft_id=http%3A%2F%2Fwww.askoxford.com%2Fconcise_oed%2Ftrain%3Fview%3Duk&rft.jtitle=%28definition+%E2%80%93+Compact+OED%29&rft.pub=Oxford+University+Press&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-2"><span class="mw-cite-backlink"><b>^</b></span> <span class="reference-text"><span class="citation book">Atchison, Topeka and Santa Fe Railway (1948). <i>Rules: Operating Department</i>. p. 7.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.au=Atchison%2C+Topeka+and+Santa+Fe+Railway&rft.aulast=Atchison%2C+Topeka+and+Santa+Fe+Railway&rft.btitle=Rules%3A+Operating+Department&rft.date=1948&rft.genre=book&rft.pages=7&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-3"><span class="mw-cite-backlink"><b>^ [Hydrogen trains](http://www.hydrogencarsnow.com/blog2/index.php/hydrogen-vehicles/i-hear-the-hydrogen-train-a-comin-its-rolling-round-the-bend/)</span></li>
<li id="cite_note-4"><span class="mw-cite-backlink"><b>^ [Vehicle Projects Inc. Fuel cell locomotive](http://www.bnsf.com/media/news/articles/2008/01/2008-01-09a.html)</span></li>
<li id="cite_note-5"><span class="mw-cite-backlink"><b>^</b></span> <span class="reference-text"><span class="citation book">Central Japan Railway (2006). <i>Central Japan Railway Data Book 2006</i>. p. 16.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.au=Central+Japan+Railway&rft.aulast=Central+Japan+Railway&rft.btitle=Central+Japan+Railway+Data+Book+2006&rft.date=2006&rft.genre=book&rft.pages=16&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-6"><span class="mw-cite-backlink"><b>^ ["Overview Of the existing Mumbai Suburban Railway"](http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm). _Official webpage of Mumbai Railway Vikas Corporation_. Archived from [the original](http://www.mrvc.indianrail.gov.in/overview.htm) on 2008-06-20<span class="reference-accessdate">. Retrieved 2008-12-11</span>.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.atitle=Overview+Of+the+existing+Mumbai+Suburban+Railway&rft.genre=article&rft_id=http%3A%2F%2Fwww.mrvc.indianrail.gov.in%2Foverview.htm&rft.jtitle=Official+webpage+of+Mumbai+Railway+Vikas+Corporation&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal" class="Z3988"><span style="display:none;"> </span></span></span></li>
</ol>
</div>
Sample Output
askoxford.com;bnsf.com;hydrogencarsnow.com;mrvc.indianrail.gov.in;web.archive.org
HackerRank Detect the Domain Name Solution in Cpp
#include <iostream>
#include <string>
#include <regex>
#include <set>
#include <algorithm>
#include <iterator>

// Returns the text from stdin without newline characters
std::string get_text() {
    std::string text;
    std::string line;
    while (std::getline(std::cin, line))
        text += line;
    return text;
}

// Prints the range [first, last) with `del` between consecutive elements
template <class ForwardIt>
void print_delimited(ForwardIt first, ForwardIt last, char del) {
    if (first == last)
        return;
    std::cout << *first++;
    for (; first != last; ++first)
        std::cout << del << *first;
}

// The dots in the optional prefix are escaped so they match a literal '.'
const auto regex_expr =
    "https?://(?:www\\.|ww2\\.)?"              // Begin
    "((?:[-[:alnum:]]+\\.)+[[:alpha:]]+)";     // Domain name

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);

    const auto text = get_text();
    std::regex regex(regex_expr);

    const auto begin = std::sregex_iterator(text.begin(), text.end(), regex);
    const auto end = std::sregex_iterator();

    // std::set keeps the domain names unique and sorted
    std::set<std::string> domain_names;
    std::for_each(begin, end, [&](const std::smatch& m) {
        domain_names.insert(m[1]);
    });

    print_delimited(domain_names.begin(), domain_names.end(), ';');
    std::cout << '\n';
}
HackerRank Detect the Domain Name Solution in Java
import java.util.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        // Group 3 + group 4 form the domain; the dots in the prefix are escaped
        String format = "(http|https)\\://(www\\.|ww2\\.|)([a-zA-Z0-9\\-\\.]+)(\\.[a-zA-Z]+)(/\\S*)?";
        Pattern pattern = Pattern.compile(format);
        ArrayList<String> links = new ArrayList<String>();

        int testcase = in.nextInt();
        in.nextLine(); // consume the rest of the first line

        for (int i = 0; i < testcase; i++) {
            String assessed = in.nextLine();
            Matcher match = pattern.matcher(assessed);
            while (match.find()) {
                String domain = match.group(3) + match.group(4);
                if (!links.contains(domain)) {
                    links.add(domain);
                }
            }
        }

        Collections.sort(links);
        System.out.println(String.join(";", links));
    }
}
HackerRank Detect the Domain Name Solution in Python
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Written for Python 3 (input() and print() replace raw_input and the print statement)
import re

N = int(input().strip())
tags = set()
for i in range(N):
    line = input().strip()
    # A link must be preceded by '=', a single quote, or a double quote
    t = re.findall(r"[=\'\"](?:https{0,1}\:\/\/(?:ww[w0-9]\.){0,1})([0-9a-zA-Z][0-9a-zA-Z_\-\.]+\.[a-zA-Z]+)", line)
    tags.update(t)
print(';'.join(sorted(tags)))
HackerRank Detect the Domain Name Solution in JavaScript
function processData(input) {
    var lines = input.split('\n');
    var N = parseInt(lines.shift(), 10);
    var text = lines.join(' ');
    var domainREStr = 'https?://(?:ww[a-zA-Z0-9_-]+\\.)?([a-zA-Z0-9_-]+(?:\\.[a-zA-Z0-9_-]+)+)[/?"\']';
    var re = new RegExp(domainREStr, 'ig');
    var domains = {};
    var arr = null;
    while ((arr = re.exec(text)) != null) {
        domains[arr[1].trim()] = 0;
    }
    var res = [];
    for (var i in domains) {
        res.push(i);
    }
    res.sort();
    process.stdout.write(res.join(';') + '\n');
}

process.stdin.resume();
process.stdin.setEncoding("ascii");
_input = "";
process.stdin.on("data", function (input) {
    _input += input;
});
process.stdin.on("end", function () {
    processData(_input);
});
HackerRank Detect the Domain Name Solution in PHP
<?php
$lin = fgets(STDIN);
$content = "";
for ($i = 0; $i < $lin; $i++) {
    $content .= fgets(STDIN);
}
//echo substr($content,strpos($content,"imshopping.rediff.com")-250,500);
$urls = array();
if (preg_match_all('/https{0,1}\:\/\/([\.a-z0-9\-]+)/im', $content, $matches)) {
    foreach ($matches[1] as $urlunfiltered) {
        $urlfiltered = preg_replace("/^www\./i", "", $urlunfiltered);
        $urlfiltered = preg_replace("/^ww[0-9]+\./i", "", $urlfiltered);
        if (preg_match("/[a-z0-9]+\.[a-z0-9]+/i", $urlfiltered)) {
            $urls[($urlfiltered)] = 1;
        }
    }
}
//print_r($urls);
$urls = array_keys($urls);
sort($urls);
echo implode(";", $urls);
?>
Disclaimer: This problem (Detect the Domain Name) is generated by HackerRank but the solution is provided by Chase2learn. This tutorial is only for Educational and Learning purposes.