Unraveling the Web's Addressing System: A deep dive into the DNS infrastructure

Demystifying the Technology that Translates Domain Names into IP Addresses

Introduction to Browsers and the Internet

Browsers are a gateway to the World Wide Web. They allow users to access and interact with websites. However, the process of displaying a website on your screen involves a complex sequence of events that users are typically unaware of.

In this article, we will explore the inner workings of browsers. Before we start, we need to understand that the internet is a vast network of interconnected computers, facilitating the exchange of information globally. At the heart of this lies the Domain Name System (DNS).

Understanding the Domain Name System (DNS)

The Domain Name System (DNS) acts as the internet's global address book. It is a critical component that translates human-readable domain names like google.com into IP addresses so that browsers can load internet resources. DNS removes the need to remember IP addresses.

Each device connected to the internet has a unique IP address, which other machines use to identify and locate the device. In other words, IP addresses are how computers identify other computers on the internet.

While DNS helps translate domain names to IP addresses for accessing websites, it also plays a crucial role in email delivery, Voice over IP (VoIP) communications, and other internet services that rely on mapping domain names to network resources. The DNS system is a distributed database that maps human-friendly domain names to their respective IP addresses.

How DNS Works

When you first type or enter a website address in your browser, the DNS system kicks in. The DNS is hierarchical, and it consists of root servers, Top-Level Domain (TLD) servers, and authoritative name servers.
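To make the hierarchy concrete, here is a minimal Python sketch that lists the zones a lookup passes through, from the root down to the full name. The function name zone_chain is made up for illustration; it only manipulates strings and does not contact any server.

```python
def zone_chain(domain: str) -> list[str]:
    """Return the zones from the root down to the full name."""
    labels = [label for label in domain.strip(".").split(".") if label]
    chain = ["."]  # the root zone is written as a single dot
    # Build each zone by taking progressively more labels from the right.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(zone_chain("www.example.com"))
# ['.', 'com.', 'example.com.', 'www.example.com.']
```

Reading the result left to right mirrors the order in which the servers described below are consulted: root, then TLD, then the domain's own name servers.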

  1. Local Cache

When you enter a URL in the address bar, the browser and the operating system first check their local caches, which map domain names to IP addresses. These caches stay small because they evict domains you have not visited in a while, as well as entries whose expiration time (set by the domain's DNS records) has passed.

  • Chrome: Type "chrome://net-internals/#dns" in the address bar to open its DNS cache page.

  • Firefox: Type "about:networking#dns" in the address bar.

  • Safari: There is no built-in page for viewing the DNS cache.
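The expiry behaviour of such a cache can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser or operating system implements it: the class name DnsCache and its methods are invented here, and only time-based expiry is modelled (not eviction of rarely visited domains).

```python
import time


class DnsCache:
    """A tiny in-memory cache that forgets entries once their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behaviour can be tested
        self._entries = {}    # domain -> (ip_address, expires_at)

    def put(self, domain: str, ip: str, ttl_seconds: float) -> None:
        """Store an answer together with the moment it stops being valid."""
        self._entries[domain.lower()] = (ip, self._clock() + ttl_seconds)

    def get(self, domain: str):
        """Return the cached IP, or None if it is missing or expired."""
        key = domain.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: evict and report a miss
            return None
        return ip
```

A miss (None) is the signal to move on to the next step and ask a resolver.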

  2. Root Servers

However, if you have never visited the website before, your computer does not have the IP address for that domain in its cache. This is where the recursive resolver comes in. Recursive resolvers are typically provided by your Internet Service Provider (ISP) and keep their own cache.

If the resolver's own cache has no answer either, it contacts a root server. There are 13 root server addresses, served from locations around the globe. Root servers do not store records for individual domains; they keep track of which name servers are responsible for each top-level domain. The resolver asks a root name server which name server knows the .com domains.

The root name server responds with the IP address of a TLD name server that tracks ".com" domains, directing the resolver to it.

These root servers sit at the top of the DNS hierarchy and are named [letter].root-servers.net, where [letter] ranges from A to M, like a.root-servers.net, b.root-servers.net, and so on. Although there are only 13 root server names, each one is backed by multiple physical servers distributed around the globe to support the entire internet.
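The naming scheme is regular enough that the full list of hostnames can be generated rather than memorized. A short Python sketch, where the variable name ROOT_SERVERS is just a label chosen for this example:

```python
import string

# Letters a through m give the 13 root server hostnames.
ROOT_SERVERS = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]

print(len(ROOT_SERVERS))   # 13
print(ROOT_SERVERS[0])     # a.root-servers.net
print(ROOT_SERVERS[-1])    # m.root-servers.net
```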

  3. Top-Level Domain (TLD) Servers

The TLD server knows which authoritative name servers are responsible for each domain registered under its TLD. The coordination of most top-level domains (TLDs) belongs to the Internet Corporation for Assigned Names and Numbers (ICANN).

Some common types of TLDs include:

  • Country Code TLDs (ccTLDs): Usually two-letter ISO codes, like .uk for the United Kingdom or .in for India.

  • Internationalized Domain Names (IDNs): TLDs written in native languages, like .भारत for India in the Devanagari script.

  • Generic TLDs (gTLDs): .com, .net, .org, .edu, and others.

  • Infrastructure and special-use TLDs:

    .arpa - This TLD is primarily used for reverse DNS lookups, which is the opposite of regular DNS lookups. Reverse DNS allows mapping an IP address back to a domain name.

    .root - Reserved for the root zone of the DNS hierarchy.

    .local - Used for local area network hostnames that aren't supposed to be publicly accessible over the internet.

    .localhost - A special-use domain name reserved for the loopback IP address (127.0.0.1) on a local computer.

    .test - Designated for use in testing of current or prospective domain name resolution.
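For the reverse lookups mentioned under .arpa, the name that gets queried is built by reversing the IP address and appending a suffix under .arpa. Python's standard ipaddress module can produce that name; the wrapper function reverse_name below is only a convenience for this example, and 192.0.2.1 and 2001:db8::1 are documentation-only addresses.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Return the .arpa name that a reverse DNS lookup would query."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_name("192.0.2.1"))    # 1.2.0.192.in-addr.arpa
print(reverse_name("2001:db8::1"))  # a much longer name ending in ip6.arpa
```

This only constructs the name; actually resolving it to a hostname would still require a DNS query.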

In addition to the common TLDs mentioned, there are also brand TLDs like .aaa, .ibm, and .intel, which are reserved for specific organizations or companies. Recently, a large number of new generic TLDs (ngTLDs) like .app, .online, and .cloud have been introduced to provide more options for domain registration.

  4. Authoritative Name Servers

Authoritative name servers answer the resolver's queries with the IP address it is searching for. When a domain is purchased, the domain registrar reserves the name and communicates the authoritative name servers to the TLD registry.

Authoritative name servers are the ultimate authority on domain information. They provide answers to DNS queries related to websites, emails, and other services associated with the domain. There are usually multiple name servers attached to a domain for load balancing and increased availability, especially in case of failure.

To find the authoritative name servers for your domain, you can run a WHOIS query.

The DNS Resolution Process

Here's a summary of the entire DNS resolution process:

  • When the IP address cannot be found in the local cache, the resolver is called.

  • The resolver goes to the root server.

  • The root server tells the resolver where to find the TLD server for the domain (e.g., .com).

  • The TLD server gives the addresses of the authoritative name servers for the domain.

  • The authoritative name server provides the IP address the resolver needs.

  • The resolver returns to the operating system with the IP address.

  • The operating system saves the IP address in its cache and hands it to the browser.

  • The browser sends a request to the IP address to retrieve the website's HTML, CSS, and other content.
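The steps above can be sketched as a toy simulation in Python. Everything here is invented for illustration: the dictionaries stand in for real servers, the server labels are made up, and the address comes from a documentation-only range. A real resolver speaks the DNS protocol over the network and handles many more cases.

```python
# Toy data standing in for real servers (all names and addresses are invented).
ROOT = {"com": "tld-com"}                                  # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}   # domain -> authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}

os_cache = {}   # the operating system's cache: name -> IP


def resolve(name: str) -> tuple[str, list[str]]:
    """Return (ip, steps) following cache -> root -> TLD -> authoritative."""
    steps = []
    if name in os_cache:
        steps.append("cache hit")
        return os_cache[name], steps

    labels = name.split(".")
    tld = labels[-1]                  # e.g. "com"
    domain = ".".join(labels[-2:])    # e.g. "example.com"

    steps.append("ask root")
    tld_server = ROOT[tld]
    steps.append(f"ask TLD server for .{tld}")
    auth_server = TLD_SERVERS[tld_server][domain]
    steps.append(f"ask authoritative server for {domain}")
    ip = AUTHORITATIVE[auth_server][name]

    os_cache[name] = ip               # save for next time
    steps.append("store in cache")
    return ip, steps


print(resolve("www.example.com"))   # first call walks the whole chain
print(resolve("www.example.com"))   # second call is answered from the cache
```

The second call returns immediately with a single "cache hit" step, which is why repeat visits to a site resolve so much faster than the first one.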

But how can the resolver find the IP address for 'ns1.example.com' before knowing where to look up 'example.com' itself? Since 'ns1.example.com' is a subdomain of 'example.com', this could lead to a circular dependency.

To break this circular dependency, DNS uses something called 'glue records'. When asking the TLD servers about 'example.com', they provide some extra "glue" data including the IP addresses of the domain's nameservers like 'ns1.example.com'. This "glue" information lets the resolver break the circular chain and go directly to the authoritative nameservers, because it already has the IP address for ns1.example.com.
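Here is a rough Python sketch of how a resolver might use glue. The referral dictionary is a made-up, simplified stand-in for what a TLD server returns, the function name pick_nameserver is hypothetical, and the address is from a documentation-only range.

```python
from typing import Optional

# A made-up referral, loosely shaped like a TLD server's answer for example.com.
referral = {
    "nameservers": ["ns1.example.com", "ns2.example.net"],
    "glue": {"ns1.example.com": "192.0.2.53"},   # extra addresses sent along
}


def pick_nameserver(referral: dict) -> tuple[str, Optional[str]]:
    """Prefer a nameserver whose address arrived as glue.

    Returns (hostname, ip). ip is None when no glue was supplied, meaning the
    resolver must first look up the nameserver's own name separately.
    """
    for host in referral["nameservers"]:
        ip = referral["glue"].get(host)
        if ip is not None:
            return host, ip
    return referral["nameservers"][0], None


print(pick_nameserver(referral))   # ('ns1.example.com', '192.0.2.53')
```

Without the glue entry, the function returns None for the address, which is exactly the situation that would send the resolver back into the circular lookup.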

This entire DNS resolution process typically takes less than a second, but it involves multiple steps and servers across the globe. Caching plays a crucial role in speeding up the process by storing frequently accessed DNS records for a set period, reducing the need to query authoritative servers repeatedly.

Punycode and IDNA

You know, one of the quirks of the Domain Name System (DNS) is its limitation to using ASCII characters. Now, that's all well and good if you're working with languages that use the Roman alphabet, but what about those that don't?

This is where Punycode comes in. It's this clever encoding system that lets you take any old Unicode character and turn it into something that fits into the ASCII mold. This means you can have domain names with all sorts of characters, even ones that aren't part of the standard ASCII set.
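You can try the conversion yourself with Python's built-in "idna" codec. One caveat: that codec implements an older version of the IDNA rules than IDNA 2008, so treat this as a demonstration of the encoding rather than a validity check. The helper names to_ascii and to_unicode are chosen just for this example.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain into its ASCII (Punycode) form, label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII domain with xn-- labels back into Unicode."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("bücher.example"))            # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))   # bücher.example
```

The "xn--" prefix marks a label as Punycode-encoded; plain ASCII labels such as "example" pass through unchanged.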

Have you ever noticed how when you try to visit a website with non-ASCII characters in the domain, your browser's address bar starts looking like it's been hit by a tornado? That's Punycode at work, translating those funky characters into something DNS-friendly. But not all browsers show these Punycoded strings. Take Safari on iOS, for example; it might just show you the original domain name, especially if there's an emoji involved.

[Screenshots omitted: the same domain as shown in the address bar of Chrome and of Safari.]

Speaking of emojis, wouldn't it be cool to have them in domain names? Some registrars will tell you it's possible, but when you try to register one, it's like hitting a brick wall. Why? Well, it's all down to a little something called the Internationalized Domain Names in Applications (IDNA) protocol.

You see, allowing any old Unicode character in domain names opens up a whole can of worms, particularly when it comes to security. There's this thing called a homograph attack, where someone could use characters that look similar to trick you into visiting a fake website. It's like replacing the letter 'm' with 'rn' – looks legit, right? Squint a little and they look the same. But with Unicode characters, it's even trickier to spot the fakes.

There are almost 150,000 Unicode characters, and some of these are almost indistinguishable from ASCII ones.

For instance, take:

my-dοmain.example vs my-domain.example

These look the same but they aren't - the 'o' in the first one is not an 'o' at all, it is a Greek small letter omicron. But depending on your font, you might not be able to tell these apart either.

Allowing any Unicode characters in the domain name could be problematic.
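Your eyes may not catch the difference, but code can. The following Python sketch uses the standard unicodedata module to list every non-ASCII character in a domain along with its official Unicode name. The function name non_ascii_characters is made up for this example, and flagging non-ASCII characters is only a first-pass check, not a complete homograph defense.

```python
import unicodedata


def non_ascii_characters(domain: str) -> list[tuple[int, str, str]]:
    """List (position, character, Unicode name) for each non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(domain)
        if ord(char) > 127
    ]


suspicious = "my-d\u03bfmain.example"   # \u03bf is Greek small letter omicron
genuine = "my-domain.example"

print(suspicious == genuine)              # False
print(non_ascii_characters(suspicious))   # [(4, 'ο', 'GREEK SMALL LETTER OMICRON')]
print(non_ascii_characters(genuine))      # []
```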

This applies to emojis too: many emojis look very similar to one another.

There are also emojis with different skin tones that can be combined, and different browsers might render the same emoji differently.

To combat this, IDNA 2008 laid down the law on what characters are allowed in domain names – and emojis? Sorry, they didn't make the cut. Sure, Punycode can still do its thing, but most top-level domains (TLDs) stick to the IDNA rules, meaning no emojis are allowed.

There are a few exceptions, though. Some country-code TLDs (ccTLDs) like .ws, .to, and .tk don't follow IDNA 2008.

For more information on the IDNA protocol and the characters allowed in domain names, you can check out the official Unicode Consortium website.

Manipulating DNS

a. DNS Overriding

One common use of DNS manipulation is 'DNS overriding'.

DNS overriding allows you to locally reroute a domain name to an IP address of your choosing. On Linux, you edit the /etc/hosts file as root to add the IP-to-domain mapping you want. Mac users can do this easily through Gas Mask, a hosts file manager. For Windows users, it involves opening Notepad as an administrator and editing the hosts file in the C:\Windows\System32\drivers\etc folder.

By putting a domain override entry in the hosts file, you're essentially hard-coding that domain to resolve to a specific IP address before your computer even has a chance to query DNS servers. So whenever you try visiting that website in your browser, it'll just pull up the server you mapped it to instead of whatever it normally resolves to.
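To show what such an entry looks like and how it is read, here is a small Python sketch that parses hosts-file text into a lookup table. The sample entries are hypothetical, the function name parse_hosts is invented for this example, and letting the first entry for a name win is an assumption about typical behaviour rather than a guarantee for every operating system.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file text into a {hostname: ip} mapping.

    Each entry is "IP name [alias ...]"; "#" starts a comment.
    Assumption: the first entry for a given name wins.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


SAMPLE = """
127.0.0.1   localhost
# Hypothetical override: point the production name at a local dev server
127.0.0.1   www.example.com example.com
"""

overrides = parse_hosts(SAMPLE)
print(overrides.get("www.example.com"))   # 127.0.0.1
print(overrides.get("api.example.com"))   # None: every name must be listed explicitly
```

The second lookup returns nothing because the file maps exact names only, so each sub-domain you want to override needs its own entry.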

Web developers frequently employ this when building sites, forcing their browser to load the work-in-progress code from their local dev environment instead of the live production server.

b. Disadvantages of DNS Overriding

While convenient for development, DNS overrides only work on the machine you set them on. When demonstrating to a client, you may need to ask them to update their own hosts file.

DNS caches can sometimes cause issues, even after making changes to the hosts file. You may need to verify whether you are dealing with a development or production server before making any adjustments.

The hosts file doesn't support wildcard domains, so you'd need separate entries for sub-domains. It can also cause conflicts with other local services using the same domain, or when the same domain is used for different purposes, such as web browsing and email. And the more entries you stuff in there, the trickier it becomes to maintain, update, and debug, especially in larger organizations.

c. Alternative DNS Resolvers

By default, personal computers use DNS resolvers provided by their Internet Service Providers (ISPs). However, these resolvers can be unreliable, take longer to resolve addresses (even if they are geographically closer to your home), raise privacy concerns, and may not resolve sites that the ISP has chosen to block.

Therefore, it is generally recommended to use alternative DNS resolvers like 1.1.1.1 (and 1.0.0.1) by Cloudflare or Google's 8.8.8.8 and 8.8.4.4 on personal computers. These third-party services tend to be faster, more robust, and without the same privacy and censorship concerns as ISP-provided resolvers.

But they're not a perfect solution either. Putting all your DNS eggs in one basket creates a single point of failure. If that public resolver ever goes down or gets blocked, you're suddenly cut off from the internet altogether until it's back online. Therefore, it's important to consider redundancy and failover mechanisms when using third-party DNS resolvers.
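A simple failover strategy is to keep a list of resolvers and fall through to the next one when a lookup fails. The Python sketch below shows only that control flow. The function resolve_with_failover and the stand-in fake_query are hypothetical; in practice the query function would perform a real DNS lookup against the given resolver, which is left out here.

```python
from typing import Callable, Iterable, Optional


def resolve_with_failover(
    name: str,
    resolvers: Iterable[str],
    query: Callable[[str, str], str],
) -> Optional[str]:
    """Try each resolver in order and return the first successful answer.

    query(resolver_ip, name) is supplied by the caller. It returns an IP
    address, or raises OSError when the resolver cannot be reached.
    """
    for resolver in resolvers:
        try:
            return query(resolver, name)
        except OSError:
            continue   # this resolver failed; try the next one
    return None        # every resolver failed


def fake_query(resolver_ip: str, name: str) -> str:
    # Stand-in for a real lookup: pretend 1.1.1.1 is unreachable.
    if resolver_ip == "1.1.1.1":
        raise OSError("resolver unreachable")
    return "192.0.2.10"


print(resolve_with_failover("example.com", ["1.1.1.1", "8.8.8.8"], fake_query))
# 192.0.2.10
```

Operating systems offer the same idea natively by letting you configure a primary and a secondary DNS server, ideally from different providers.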