Saturday, December 12, 2009

Introduction to the DNS protocol (part 1)

Now let's have a look at a very important protocol of the TCP/IP stack: the DNS (Domain Name System) protocol. As you know, each host on an IP network (like the Internet) has a unique IP address, which in IPv4 is a 32-bit number, usually written like 209.85.229.106. As IP addresses are difficult to memorize, a more convenient way to identify hosts on the network is to use domain names, like "www.google.com", "www.wikipedia.org", etc. The DNS protocol is used to convert domain names into IP addresses, and conversely, thanks to DNS tables stored in DNS servers. The process of converting a domain name into an IP address is called "DNS resolution".
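To make the "32-bit number" idea concrete, here is a minimal Python sketch that converts between the dotted notation and the underlying integer. It relies on the standard ipaddress module; the helper names ipv4_to_int and int_to_ipv4 are only illustrative, and the sketch covers IPv4 only.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Return the 32-bit integer behind a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ipv4(value: int) -> str:
    """Return the dotted-quad notation of a 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    number = ipv4_to_int("209.85.229.106")
    # Each of the four decimal fields is one byte of the 32-bit value.
    print(number, hex(number))
    print(int_to_ipv4(number))
```

Each decimal field is simply one byte of the address, so 209.85.229.106 is the byte sequence d1 55 e5 6a in hexadecimal.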

To understand how the DNS protocol works, you should read domain names from right to left: for instance in the name "www.google.com", the first part (on the right) is ".com", and is called the top-level domain, then "google.com" is called a subdomain of the "com" domain, and finally "www.google.com" is called a host name, or FQDN (Fully Qualified Domain Name). There are quite a few top-level domains, like ".com", ".net", ".org", ".fr", ".uk", ".ca", etc. (there is one reserved top-level domain for each country in the world), and each one has many subdomains, which can themselves have other subdomains, and so on. In the end, there is a hierarchy of domain names, which can be represented by a tree, like this:
ROOT
 |
 |-> com
 |    |-> google.com
 |    |    |-> www.google.com
 |    |    |-> mail.google.com
 |    |
 |    |-> yahoo.com
 |    |-> ...
 |
 |-> org
 |    |-> wikipedia.org
 |    |-> ...
 |
 |-> net
 |    |-> ...
 |
 |-> fr
 |    |-> yahoo.fr
 |         |-> www.yahoo.fr
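
The right-to-left reading described above can be sketched in a few lines of Python. The function name domain_hierarchy is hypothetical; it just lists the path from the top-level domain down to the full host name, i.e. one branch of the tree.

```python
from typing import List


def domain_hierarchy(fqdn: str) -> List[str]:
    """List the domains a name belongs to, from the top-level domain down."""
    # A trailing dot (the root) is allowed in an FQDN, so drop it first.
    labels = fqdn.rstrip(".").split(".")
    # Walk the labels from right to left, keeping everything to the right.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


if __name__ == "__main__":
    for domain in domain_hierarchy("www.google.com"):
        print(domain)
```

For "www.google.com" this yields "com", then "google.com", then "www.google.com", which is exactly the path followed in the tree from ROOT to the host.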

Something which makes the DNS protocol powerful is that it is a decentralized protocol, which means there is (fortunately!) not a single DNS server in the world, but many DNS servers which can talk to each other. Each DNS server controls a set of domain names, called a DNS zone, which means it knows the IP address of each host in this zone.
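
As a rough illustration of the zone idea, the sketch below picks the most specific zone that covers a given name. Everything in it is an assumption made for the example: the ZONES table, the server descriptions and the function name responsible_zone are hypothetical, and real DNS delegation involves more than a dictionary lookup.

```python
from typing import Dict, Optional

# Hypothetical zone table: zone name -> description of the server in charge.
ZONES: Dict[str, str] = {
    "com": "server in charge of com (hypothetical)",
    "google.com": "server in charge of google.com (hypothetical)",
    "fr": "server in charge of fr (hypothetical)",
}


def responsible_zone(name: str, zones: Dict[str, str]) -> Optional[str]:
    """Return the most specific zone containing `name`, or None."""
    labels = name.rstrip(".").lower().split(".")
    # Try the longest suffix first, then shorter and shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


if __name__ == "__main__":
    for host in ("www.google.com", "www.yahoo.com", "www.wikipedia.org"):
        print(host, "->", responsible_zone(host, ZONES))
```

With this table, "www.google.com" falls in the "google.com" zone, "www.yahoo.com" only matches "com", and "www.wikipedia.org" matches nothing because no "org" zone was declared.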

Now let's imagine we want to know the IP address corresponding to the host name "www.google.com". This can be done easily with the command "host":
# host www.google.com
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 209.85.229.106
www.l.google.com has address 209.85.229.99
www.l.google.com has address 209.85.229.104
www.l.google.com has address 209.85.229.103
www.l.google.com has address 209.85.229.105
www.l.google.com has address 209.85.229.147
First surprise, "www.google.com" is actually not a host name, but an alias to another host name. A DNS alias (called a CNAME in the DNS protocol) is just a shortcut to another domain name, and can be used for instance to make a domain name easier to remember.
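
Following an alias can be sketched as a small loop. The RECORDS table below only copies what the "host" output above shows; the table layout and the function name follow_aliases are assumptions made for this example, not the way a real resolver stores its data.

```python
from typing import Dict, List, Tuple

# Records copied from the "host" output above: (record type, values).
RECORDS: Dict[str, Tuple[str, List[str]]] = {
    "www.google.com": ("CNAME", ["www.l.google.com"]),
    "www.l.google.com": ("A", [
        "209.85.229.106", "209.85.229.99", "209.85.229.104",
        "209.85.229.103", "209.85.229.105", "209.85.229.147",
    ]),
}


def follow_aliases(name: str, records: Dict[str, Tuple[str, List[str]]],
                   max_hops: int = 8) -> Tuple[List[str], List[str]]:
    """Follow CNAME records until A records are found.

    Returns the chain of names visited and the final list of addresses.
    """
    chain = [name]
    for _ in range(max_hops):
        record_type, values = records[name]  # KeyError if the name is unknown
        if record_type == "A":
            return chain, values
        name = values[0]  # a CNAME points to exactly one other name
        chain.append(name)
    raise RuntimeError("alias chain too long (possible loop)")


if __name__ == "__main__":
    names, addresses = follow_aliases("www.google.com", RECORDS)
    print(" -> ".join(names))
    print(addresses)
```

The max_hops limit is only a safety net for this sketch, so that two aliases pointing at each other cannot make the loop run forever.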
The second surprise is that there are actually several IP addresses for the host www.l.google.com, and if you execute this "host" command several times, you will see that the addresses do not always appear in the same order. This mechanism is called round-robin DNS, and its goal is to distribute the web traffic of Google over several servers: each time you type "www.google.com" in your web browser, you actually use one of the 6 IP addresses listed above, each with a probability of roughly 1/6.
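
Here is a minimal simulation of the round-robin idea, assuming the server simply rotates its list by one position for each answer and the client uses the first address it receives. Real servers may shuffle the list differently, so take this as a sketch of the principle; round_robin_answers is a hypothetical name.

```python
from collections import deque
from typing import List

# The six addresses listed in the "host" output above.
ADDRESSES = [
    "209.85.229.106", "209.85.229.99", "209.85.229.104",
    "209.85.229.103", "209.85.229.105", "209.85.229.147",
]


def round_robin_answers(addresses: List[str], queries: int) -> List[List[str]]:
    """Simulate `queries` successive answers, rotating the list each time."""
    ring = deque(addresses)
    answers = []
    for _ in range(queries):
        answers.append(list(ring))
        ring.rotate(-1)  # the next answer starts with the following address
    return answers


if __name__ == "__main__":
    for answer in round_robin_answers(ADDRESSES, 6):
        # A client that always takes the first address gets a different
        # server for each of these six queries.
        print(answer[0])
```

Over six successive queries, each address comes first exactly once, which is where the "1/6" share of the traffic comes from.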

But how did your computer find these addresses? The first thing to know is that if your computer is connected to an IP network, it is probably configured to use one or two DNS servers; of course a DNS server has to be identified by its IP address, as you cannot use the DNS protocol to find the IP address of your own DNS server! On Linux or Mac OS X, you can see the address of your DNS server(s) in the file /etc/resolv.conf:
# cat /etc/resolv.conf
nameserver 84.103.237.140
nameserver 86.64.145.140
Here there are two servers, in case the first one is not available. If you are connected to the Internet through an ISP (Internet Service Provider), the DNS servers you use are probably hosted by this provider. You can check this with a reverse DNS lookup, i.e. by getting the domain name associated with an IP address; this can be done with the host command too, or with the dig command, which is a much more powerful command to perform DNS queries:
# dig -x 84.103.237.140

; <<>> DiG 9.6.0-APPLE-P2 <<>> -x 84.103.237.140
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28600
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;140.237.103.84.in-addr.arpa. IN PTR

;; ANSWER SECTION:
140.237.103.84.in-addr.arpa. 4236 IN PTR ns1.rslv.n9uf.net.


;; Query time: 10 msec

Here you can see, in the ANSWER SECTION, that the IP address 84.103.237.140 corresponds to the domain name "ns1.rslv.n9uf.net".
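
Notice in the QUESTION SECTION that the address is not queried as is: its four fields are reversed and the suffix "in-addr.arpa" is appended. The small Python sketch below builds that name in two ways, by hand and with the reverse_pointer attribute of the standard ipaddress module; the function names are only illustrative, and the manual version handles IPv4 only.

```python
import ipaddress


def reverse_name_manual(ip: str) -> str:
    """Build the PTR query name of an IPv4 address by hand."""
    fields = ip.split(".")
    # Reverse the four fields, then append the special reverse-lookup domain.
    return ".".join(reversed(fields)) + ".in-addr.arpa."


def reverse_name(ip: str) -> str:
    """Same result, using the standard library."""
    # reverse_pointer has no trailing dot; add it to match dig's output.
    return ipaddress.ip_address(ip).reverse_pointer + "."


if __name__ == "__main__":
    print(reverse_name_manual("84.103.237.140"))
    print(reverse_name("84.103.237.140"))
```

Reversing the fields puts the most general part of the address on the right, so reverse names fit in the same right-to-left hierarchy as ordinary domain names.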

Let's see what happens on the network when you type a command like "host www.google.com" (the capture below was taken while resolving "www.google.fr"). The DNS protocol is a protocol of the application layer of the TCP/IP stack, and is mostly based on the UDP transport protocol (see previous article). A DNS server (nearly) always listens on UDP port 53, so let's use the following tcpdump command to see what happens on the network during a DNS request:
# tcpdump -nv -X -p udp port 53

13:15:47.636801 IP (tos 0x0, ttl 64, id 50466, offset 0,
flags [none], proto UDP (17), length 59) 192.168.1.2.64419 >
84.103.237.140.53: 3099+ A? www.google.fr. (31)

00: 4500 003b c522 0000 4011 b1f1 c0a8 0102 E..;."..@.......
10: 5467 ed8c fba3 0035 0027 03d7 0c1b 0100 Tg.....5.'......
20: 0001 0000 0000 0000 0377 7777 0667 6f6f .........www.goo
30: 676c 6502 6672 0000 0100 01 gle.fr.....


13:15:47.666368 IP (tos 0x0, ttl 58, id 2756, offset 0,
flags [DF], proto UDP (17), length 203) 84.103.237.140.53 >
192.168.1.2.64419: 3099 8/0/0 www.google.fr. CNAME
www.google.com., www.google.com. CNAME www.l.google.com.,
www.l.google.com. A 209.85.229.147, www.l.google.com.
A 209.85.229.99, www.l.google.com. A 209.85.229.106,
www.l.google.com. A 209.85.229.103, www.l.google.com.
A 209.85.229.105, www.l.google.com. A 209.85.229.104 (175)

00: 4500 00cb 0ac4 4000 3a11 31c0 5467 ed8c E.....@.:.1.Tg..
10: c0a8 0102 0035 fba3 00b7 9841 0c1b 8180 .....5.....A....
20: 0001 0008 0000 0000 0377 7777 0667 6f6f .........www.goo
30: 676c 6502 6672 0000 0100 01c0 0c00 0500 gle.fr..........
40: 0100 03fa cd00 1003 7777 7706 676f 6f67 ........www.goog
50: 6c65 0363 6f6d 00c0 2b00 0500 0100 086a le.com..+......j
60: 5400 0803 7777 7701 6cc0 2fc0 4700 0100 T...www.l./.G...
70: 0100 0000 6c00 04d1 55e5 93c0 4700 0100 ....l...U...G...
80: 0100 0000 6c00 04d1 55e5 63c0 4700 0100 ....l...U.c.G...
90: 0100 0000 6c00 04d1 55e5 6ac0 4700 0100 ....l...U.j.G...
a0: 0100 0000 6c00 04d1 55e5 67c0 4700 0100 ....l...U.g.G...
b0: 0100 0000 6c00 04d1 55e5 69c0 4700 0100 ....l...U.i.G...
c0: 0100 0000 6c00 04d1 55e5 68 ....l...U.h
The first message is the DNS query, and the second message is the corresponding DNS reply. You can see the number "3099" at the beginning of both the query and the reply; this number is the identifier of the query, and is used by the client to know which query the reply corresponds to. Here the query is an "A?" query, which means "give me the IP address of the given host", but there are many types of DNS queries; the basic ones are detailed in RFC 1035.
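
To connect the hexadecimal dump with the protocol, here is a minimal Python sketch that rebuilds the 31 bytes of the captured query and decodes the 12-byte header of the reply, following the message layout of RFC 1035. The function names are hypothetical, only "A" queries are built, and nothing is sent on the network.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_a_query(query_id: int, name: str) -> bytes:
    """Build a DNS query for the A record of `name`."""
    flags = 0x0100  # only the RD (recursion desired) bit is set
    # Header: id, flags, then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a DNS message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "is_reply": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,           # 0 means NOERROR
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }


if __name__ == "__main__":
    query = build_a_query(3099, "www.google.fr")
    print(len(query), query.hex())
    # First 12 bytes of the DNS part of the captured reply.
    reply_header = bytes.fromhex("0c1b 8180 0001 0008 0000 0000")
    print(parse_header(reply_header))
```

The identifier 3099 is 0x0c1b in hexadecimal, which is why both dumps contain the bytes "0c1b" right after the UDP header, and the answer count of 8 matches the "8/0/0" printed by tcpdump.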
