Sunday, December 13, 2009

Introduction to the DNS protocol (part 2)

So far, you have seen what a DNS query to a name server looks like. But this does not explain how the name server of your internet provider manages to find the IP address of "www.google.com", to keep the same example. As already explained, the DNS protocol is decentralized, and when you send a query to your name server, it may have to communicate with one or several other name servers to get the result. Let's see how it works.

When a name server receives a query to get the IP address corresponding to a domain name, it first checks if the domain belongs to its own zone. A zone is a domain controlled by a name server, which means the server can give an authoritative answer to all queries related to this domain. If the domain does not belong to a zone owned by the name server, it has to find which name server has the information, and then sends the query to this server.
In order to do this, the name server reads the domain name from right to left: in our example, the first part of the domain is "com", so the server has to find which name servers in the world know the "com" domain. The name servers of all the top-level domains, such as "com", are known by special name servers called the "root servers". There are 13 official root server names (from "a" to "m"), which have fixed IP addresses, so that every DNS server in the world can know them. You can see the list of root servers with the following command:
# dig -t ns .

;; QUESTION SECTION:
;. IN NS

;; ANSWER SECTION:
. 185710 IN NS a.root-servers.net.
. 185710 IN NS d.root-servers.net.
. 185710 IN NS f.root-servers.net.
. 185710 IN NS b.root-servers.net.
. 185710 IN NS i.root-servers.net.
. 185710 IN NS e.root-servers.net.
. 185710 IN NS h.root-servers.net.
. 185710 IN NS c.root-servers.net.
. 185710 IN NS k.root-servers.net.
. 185710 IN NS m.root-servers.net.
. 185710 IN NS j.root-servers.net.
. 185710 IN NS l.root-servers.net.
. 185710 IN NS g.root-servers.net.

;; ADDITIONAL SECTION:
b.root-servers.net. 3599912 IN A 192.228.79.201
d.root-servers.net. 3599912 IN A 128.8.10.90
k.root-servers.net. 3599912 IN A 193.0.14.129
g.root-servers.net. 3599912 IN A 192.112.36.4
h.root-servers.net. 3599912 IN A 128.63.2.53
c.root-servers.net. 3599912 IN A 192.33.4.12
i.root-servers.net. 3599912 IN A 192.36.148.17
l.root-servers.net. 3599912 IN A 199.7.83.42
m.root-servers.net. 3599912 IN A 202.12.27.33
e.root-servers.net. 3599912 IN A 192.203.230.10
a.root-servers.net. 3599912 IN A 198.41.0.4
j.root-servers.net. 3599912 IN A 192.58.128.30
f.root-servers.net. 3599912 IN A 192.5.5.241
So, your name server already knows the IP addresses of the root servers, and can send a DNS query (called an "NS" query, in that case) directly to one of them, to know which name servers are authoritative for the "com" domain. You can simulate this DNS query with the dig command, by adding the '@' option, which specifies the IP address of the name server you want to query. For instance, let's query the root server "A":
# dig -t ns com @198.41.0.4

;; QUESTION SECTION:
;com. IN NS

;; AUTHORITY SECTION:
com. 172800 IN NS K.GTLD-SERVERS.NET.
com. 172800 IN NS I.GTLD-SERVERS.NET.
com. 172800 IN NS H.GTLD-SERVERS.NET.
com. 172800 IN NS G.GTLD-SERVERS.NET.
com. 172800 IN NS F.GTLD-SERVERS.NET.
com. 172800 IN NS A.GTLD-SERVERS.NET.
com. 172800 IN NS D.GTLD-SERVERS.NET.
com. 172800 IN NS E.GTLD-SERVERS.NET.
com. 172800 IN NS B.GTLD-SERVERS.NET.
com. 172800 IN NS J.GTLD-SERVERS.NET.
com. 172800 IN NS C.GTLD-SERVERS.NET.
com. 172800 IN NS L.GTLD-SERVERS.NET.
com. 172800 IN NS M.GTLD-SERVERS.NET.

;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET. 172800 IN A 192.5.6.30
B.GTLD-SERVERS.NET. 172800 IN A 192.33.14.30
C.GTLD-SERVERS.NET. 172800 IN A 192.26.92.30
D.GTLD-SERVERS.NET. 172800 IN A 192.31.80.30
E.GTLD-SERVERS.NET. 172800 IN A 192.12.94.30
F.GTLD-SERVERS.NET. 172800 IN A 192.35.51.30
G.GTLD-SERVERS.NET. 172800 IN A 192.42.93.30
H.GTLD-SERVERS.NET. 172800 IN A 192.54.112.30
I.GTLD-SERVERS.NET. 172800 IN A 192.43.172.30
J.GTLD-SERVERS.NET. 172800 IN A 192.48.79.30
K.GTLD-SERVERS.NET. 172800 IN A 192.52.178.30
L.GTLD-SERVERS.NET. 172800 IN A 192.41.162.30
M.GTLD-SERVERS.NET. 172800 IN A 192.55.83.30
Once again, there is a long list of name servers in the reply: these are the 13 name servers which are authoritative for the "com" top-level domain.

Then the same process goes on: your name server can send a query to any of the above servers, for instance 192.5.6.30, to get the address of the name server which owns the domain "google.com". Let's do this with dig:
# dig -t ns google.com @192.5.6.30

;; QUESTION SECTION:
;google.com. IN NS

;; ANSWER SECTION:
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.

;; ADDITIONAL SECTION:
ns1.google.com. 172800 IN A 216.239.32.10
ns2.google.com. 172800 IN A 216.239.34.10
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN A 216.239.38.10
Now, the reply is much smaller, and shows the addresses of the 4 DNS servers which are authoritative for the zone "google.com".

Finally, your name server can send a query to one of these servers, to get the IP address of "www.google.com". This time, it is an "A" query and not an "NS" query, but you don't need to pass "-t a" to dig, as this is the default query type:
# dig www.google.com @216.239.32.10

;; QUESTION SECTION:
;www.google.com. IN A

;; ANSWER SECTION:
www.google.com. 604800 IN CNAME www.l.google.com.
www.l.google.com. 300 IN A 209.85.227.147
www.l.google.com. 300 IN A 209.85.227.103
www.l.google.com. 300 IN A 209.85.227.106
www.l.google.com. 300 IN A 209.85.227.104
www.l.google.com. 300 IN A 209.85.227.105
www.l.google.com. 300 IN A 209.85.227.99
Here the reply shows that www.google.com is actually an alias (a "CNAME") to www.l.google.com, and gives 6 different IP addresses, as already seen in the previous article.
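The walk from the root down to "www.google.com" can be summed up with a short Python sketch. The function name `delegation_chain` is made up for this illustration; it only lists the successive domains a resolver asks about when it reads a name from right to left, and does not send any query on the network.

```python
def delegation_chain(name):
    """Return the domains visited when reading `name` from right to left.

    The root is written "." and every other entry ends with a dot,
    as in the output of dig.
    """
    labels = [label for label in name.strip(".").split(".") if label]
    chain = ["."]
    # Add one label at a time, starting from the rightmost one (the TLD).
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


for zone in delegation_chain("www.google.com"):
    print(zone)
```

Each printed line corresponds to one of the dig commands above: the root, then "com", then "google.com", and finally the host name itself.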

So, your name server has now learned the IP address of www.google.com, and can reply to your query. By the way, let's see what happens with dig when you query your own DNS server, i.e. the one of your internet provider:
# dig www.google.com

;; QUESTION SECTION:
;www.google.com. IN A

;; ANSWER SECTION:
www.google.com. 471995 IN CNAME www.l.google.com.
www.l.google.com. 95 IN A 209.85.229.106
www.l.google.com. 95 IN A 209.85.229.147
www.l.google.com. 95 IN A 209.85.229.104
www.l.google.com. 95 IN A 209.85.229.103
www.l.google.com. 95 IN A 209.85.229.105
www.l.google.com. 95 IN A 209.85.229.99

;; Query time: 29 msec
;; SERVER: 84.103.237.140#53(84.103.237.140)
;; WHEN: Sun Dec 13 11:20:59 2009
;; MSG SIZE rcvd: 148
You can see in the SERVER line the IP address of the name server which has been queried. As the address was not passed as a parameter to dig, the default name server has been used (the one in /etc/resolv.conf). The interesting difference compared to the query sent directly to the Google name server is the number in the second column of each record, called the TTL (Time To Live) of the DNS record. The TTL is the number of seconds for which the reply is valid: if your computer needs the IP address of www.google.com again after the TTL has expired, it has to send another query to the name server; before that, it can just reuse the address it already knows (which is called caching). The goal of this TTL mechanism is to reduce the traffic to the name servers, while guaranteeing that the server will be asked again after a given delay, in case the IP address has changed in the meantime. If you send the same query to your name server a few seconds later, you can see that the TTL has decreased:
# dig www.google.com

;; QUESTION SECTION:
;www.google.com. IN A

;; ANSWER SECTION:
www.google.com. 471989 IN CNAME www.l.google.com.
www.l.google.com. 89 IN A 209.85.229.104
www.l.google.com. 89 IN A 209.85.229.99
www.l.google.com. 89 IN A 209.85.229.147
www.l.google.com. 89 IN A 209.85.229.106
www.l.google.com. 89 IN A 209.85.229.105
www.l.google.com. 89 IN A 209.85.229.103
If you look again at the query sent to the Google name server, you can see that the TTL was 300s for all the www.l.google.com records. It means that all the name servers which are not authoritative for the google.com domain (for instance the name server of your provider) must query the Google name server again once these 300s have elapsed, and that the TTL they give in their own replies cannot be higher than 300s.
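To make the caching behaviour more concrete, here is a minimal Python sketch of a cache which forgets a record once its TTL has elapsed. The class `TtlCache` and its methods are invented for this example and do not correspond to any real resolver; the clock is passed as a parameter only so that the example can simulate the passing of time without waiting.

```python
import time


class TtlCache:
    """A toy DNS cache: each record is kept until its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can fake time
        self._records = {}       # name -> (addresses, expiry time)

    def put(self, name, addresses, ttl):
        self._records[name] = (list(addresses), self._clock() + ttl)

    def get(self, name):
        """Return (addresses, remaining TTL), or None if absent or expired."""
        entry = self._records.get(name)
        if entry is None:
            return None
        addresses, expiry = entry
        remaining = expiry - self._clock()
        if remaining <= 0:
            del self._records[name]   # too old: the server must be asked again
            return None
        return addresses, int(remaining)


# Simulated clock, so that no real waiting is needed.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.l.google.com.", ["209.85.229.106"], ttl=300)

now[0] = 205.0
print(cache.get("www.l.google.com."))   # still cached, 95 seconds left

now[0] = 301.0
print(cache.get("www.l.google.com."))   # expired, so None
```

The remaining TTL returned by `get` plays the same role as the decreasing number seen in the replies of the provider's name server.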

These notions of caching and TTL are very important, as they explain why different name servers can give different results to the same query at the same time. They also explain why a change of IP address can take some time to propagate to the whole internet: if you own a DNS server and change your DNS table, you have to wait for the TTL to expire to be sure that everyone is aware of the change. That's why the TTL should be as low as possible if the IP address changes often, for instance if you use Dynamic DNS, which is a way to have a fixed domain name for your computer even if it is connected to the internet through a cheap connection without a fixed IP address. On the other hand, a low TTL increases the traffic to the DNS servers, so a balance has to be found.

That's all for this introduction, I hope you now understand better how this important protocol works!

Saturday, December 12, 2009

Introduction to the DNS protocol (part 1)

Now let's have a look at a very important protocol of the TCP/IP stack: the DNS (Domain Name System) protocol. As you know, each host on an IP network (like the internet) has a unique IP address, which is a 32-bit number (in IPv4), like 209.85.229.106. As IP addresses are difficult to memorize, a more convenient way to identify hosts on the network is to use domain names, like "www.google.com", "www.wikipedia.org", etc. The DNS protocol is used to convert domain names into IP addresses, and conversely, thanks to DNS tables stored in DNS servers. The process of converting a domain name into an IP address is called "DNS resolution".

To understand how the DNS protocol works, you should read domain names from right to left: for instance in the name "www.google.com", the first part (on the right) is ".com", and is called the top-level domain, then "google.com" is called a subdomain of the "com" domain, and finally "www.google.com" is called a host name, or FQDN (Fully Qualified Domain Name). There are quite a few top-level domains, like ".com", ".net", ".org", ".fr", ".uk", ".ca", etc. (there is one reserved top-level domain for each country in the world), and each one has many subdomains, which can themselves have other subdomains, and so on. In the end, there is a hierarchy of domain names, which can be represented by a tree, like this:
ROOT
 |
 |-> com
 |    |-> google.com
 |    |    |-> www.google.com
 |    |    |-> mail.google.com
 |    |
 |    |-> yahoo.com
 |    |-> ...
 |
 |-> org
 |    |-> wikipedia.org
 |    |-> ...
 |
 |-> net
 |    |-> ...
 |
 |-> fr
 |    |-> yahoo.fr
 |    |    |-> www.yahoo.fr

Something which makes the DNS protocol powerful is that it is a decentralized protocol, which means there is (fortunately!) not a single DNS server in the world, but many DNS servers which can talk to each other. Each DNS server controls a set of domain names, called a DNS zone, which means it knows the IP address of each host in this zone.
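The hierarchy of domain names drawn above can also be built by a few lines of Python, simply by reading each host name from right to left. This is only a sketch: the function names `build_tree` and `show` are invented for the example, and the tree is a plain nested dictionary, not the way a real DNS server stores its zones.

```python
def build_tree(names):
    """Build a nested dict from host names, reading each one right to left."""
    root = {}
    for name in names:
        node = root
        for label in reversed(name.strip(".").split(".")):
            node = node.setdefault(label, {})
    return root


def show(node, suffix="", depth=0):
    """Return the tree as indented lines, one full domain name per line."""
    lines = []
    for label in sorted(node):
        full = label + ("." + suffix if suffix else "")
        lines.append("  " * depth + full)
        lines.extend(show(node[label], full, depth + 1))
    return lines


tree = build_tree(["www.google.com", "mail.google.com", "yahoo.com",
                   "wikipedia.org", "www.yahoo.fr"])
print("\n".join(show(tree)))
```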

Now let's imagine we want to know the IP address corresponding to the host name "www.google.com". This can be done easily with the command "host":
# host www.google.com
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 209.85.229.106
www.l.google.com has address 209.85.229.99
www.l.google.com has address 209.85.229.104
www.l.google.com has address 209.85.229.103
www.l.google.com has address 209.85.229.105
www.l.google.com has address 209.85.229.147
First surprise, "www.google.com" is actually not a host name, but an alias to another host name. A DNS alias (called a CNAME in the DNS protocol) is just a shortcut to another domain name, and can be used for instance to make a domain name easier to remember.
The second surprise is that there are actually several IP addresses for the host www.l.google.com, and if you execute this "host" command several times, you will see that the addresses do not always appear in the same order. This mechanism is called round-robin, and its goal is to distribute the web traffic of Google over several servers, as each time you type "www.google.com" in your web browser, you actually use one of the 6 IP addresses listed above, with a probability of 1/6.
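The round-robin idea can be sketched in a few lines of Python. This is an assumption about how such a rotation could be done, not a description of what the Google name servers actually do (a real server may well shuffle the list differently); the generator name `round_robin` is made up for the example.

```python
from collections import deque


def round_robin(addresses):
    """Yield the address list again and again, rotated by one each time."""
    ring = deque(addresses)
    while True:
        yield list(ring)
        ring.rotate(-1)   # the first address moves to the end


answers = round_robin(["209.85.229.106", "209.85.229.99", "209.85.229.104"])
for _ in range(3):
    reply = next(answers)
    print(reply[0], "is tried first; full reply:", reply)
```

As a client usually tries the first address of the reply, rotating the list spreads the clients over all the servers.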

But how did your computer find these addresses? The first thing to know is that if your computer is connected to an IP network, it is probably configured to use one or two DNS servers; of course a DNS server has to be identified by its IP address, as you cannot use the DNS protocol to find the IP address of your own DNS server! On Linux or Mac OS X, you can see the address of your DNS server(s) in the file /etc/resolv.conf:
# cat /etc/resolv.conf
nameserver 84.103.237.140
nameserver 86.64.145.140
Here there are two servers, in case the first one is not available. If you are connected to the internet through an ISP (Internet Service Provider), the DNS servers you use are probably hosted by this provider. You can check this with a reverse DNS lookup, i.e. by getting the domain name associated with an IP address; this can be done with the host command too, or with the dig command, which is a much more powerful command to perform DNS queries:
# dig -x 84.103.237.140

; <<>> DiG 9.6.0-APPLE-P2 <<>> -x 84.103.237.140
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28600
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;140.237.103.84.in-addr.arpa. IN PTR

;; ANSWER SECTION:
140.237.103.84.in-addr.arpa. 4236 IN PTR ns1.rslv.n9uf.net.


;; Query time: 10 msec

Here you can see in the ANSWER SECTION that the IP address 84.103.237.140 corresponds to the domain name "ns1.rslv.n9uf.net".
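The QUESTION SECTION above shows that a reverse lookup is in fact a query for a special domain name, made of the four bytes of the IP address in reverse order followed by "in-addr.arpa". Here is a small Python sketch of this conversion; the function name `reverse_name` is made up for the example.

```python
import ipaddress


def reverse_name(ipv4):
    """Build the in-addr.arpa name used for the reverse lookup of `ipv4`."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")   # also validates
    return ".".join(reversed(octets)) + ".in-addr.arpa."


print(reverse_name("84.103.237.140"))
```

The standard `ipaddress` module offers the same conversion (without the final dot) through the `reverse_pointer` attribute of an address object.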

Let's see what happens on the network when you type a command like "host www.google.fr". The DNS protocol is a protocol of the application layer of the TCP/IP stack, and is mostly based on the UDP transport protocol (see previous article). A DNS server (nearly) always listens on UDP port 53, so let's use the following tcpdump command to see what happens on the network during a DNS request:
# tcpdump -nv -X -p udp port 53

13:15:47.636801 IP (tos 0x0, ttl 64, id 50466, offset 0,
flags [none], proto UDP (17), length 59) 192.168.1.2.64419 >
84.103.237.140.53: 3099+ A? www.google.fr. (31)

00: 4500 003b c522 0000 4011 b1f1 c0a8 0102 E..;."..@.......
10: 5467 ed8c fba3 0035 0027 03d7 0c1b 0100 Tg.....5.'......
20: 0001 0000 0000 0000 0377 7777 0667 6f6f .........www.goo
30: 676c 6502 6672 0000 0100 01 gle.fr.....


13:15:47.666368 IP (tos 0x0, ttl 58, id 2756, offset 0,
flags [DF], proto UDP (17), length 203) 84.103.237.140.53 >
192.168.1.2.64419: 3099 8/0/0 www.google.fr. CNAME
www.google.com., www.google.com. CNAME www.l.google.com.,
www.l.google.com. A 209.85.229.147, www.l.google.com.
A 209.85.229.99, www.l.google.com. A 209.85.229.106,
www.l.google.com. A 209.85.229.103, www.l.google.com.
A 209.85.229.105, www.l.google.com. A 209.85.229.104 (175)

00: 4500 00cb 0ac4 4000 3a11 31c0 5467 ed8c E.....@.:.1.Tg..
10: c0a8 0102 0035 fba3 00b7 9841 0c1b 8180 .....5.....A....
20: 0001 0008 0000 0000 0377 7777 0667 6f6f .........www.goo
30: 676c 6502 6672 0000 0100 01c0 0c00 0500 gle.fr..........
40: 0100 03fa cd00 1003 7777 7706 676f 6f67 ........www.goog
50: 6c65 0363 6f6d 00c0 2b00 0500 0100 086a le.com..+......j
60: 5400 0803 7777 7701 6cc0 2fc0 4700 0100 T...www.l./.G...
70: 0100 0000 6c00 04d1 55e5 93c0 4700 0100 ....l...U...G...
80: 0100 0000 6c00 04d1 55e5 63c0 4700 0100 ....l...U.c.G...
90: 0100 0000 6c00 04d1 55e5 6ac0 4700 0100 ....l...U.j.G...
a0: 0100 0000 6c00 04d1 55e5 67c0 4700 0100 ....l...U.g.G...
b0: 0100 0000 6c00 04d1 55e5 69c0 4700 0100 ....l...U.i.G...
c0: 0100 0000 6c00 04d1 55e5 68 ....l...U.h
The first message is the DNS query, and the second one is the corresponding DNS reply. You can see the number "3099" at the beginning of both the query and the reply; this number is the identifier of the query, and is used by the client to know to which query the reply corresponds. Here the query is an "A?" query, which means "give me the IP address of the given host", but there are many types of DNS queries, which are all detailed in RFC 1035.
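To see how simple the query format is, the following Python sketch rebuilds by hand the 31 bytes of the DNS query captured above (the part which follows the IP and UDP headers). The function name `build_query` is made up for the example, and only the fields visible in the capture are handled: the identifier, the flags with the "recursion desired" bit set, one question, and the encoded name.

```python
import struct


def build_query(query_id, name, qtype=1, qclass=1):
    """Encode a DNS query for `name` (qtype 1 = A, qclass 1 = IN)."""
    # Header: id, flags (0x0100 = recursion desired), then four counters:
    # questions, answers, authority records, additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label of the name is prefixed with its length; a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


query = build_query(3099, "www.google.fr")
print(len(query), query.hex())
```

The hexadecimal string printed here can be compared with the end of the first hexadecimal dump, starting at "0c1b" (3099 in hexadecimal).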

Monday, December 7, 2009

The transport layer: TCP and UDP

The central layer of the internet protocol stack is the transport layer, which in the TCP/IP model can be TCP (Transmission Control Protocol) or UDP (User Datagram Protocol). The TCP protocol, briefly introduced in the previous article, is actually quite a complex protocol, whose goal is to make sure that data will be received by the recipient, in the right order, and without any transmission error. UDP is a simpler protocol, which just acts as a container to transport any data in small packets, with a light mechanism to detect transmission errors, but with no guarantee that packets will be received by the recipient.
Usually, TCP is used when data integrity is more important than speed, for instance to transport e-mails, files, or Web pages, and UDP is used when speed is the priority, even at the price of a possible loss of data, for instance for real time online games or to transport voice. A data packet is usually called a datagram in the UDP protocol, and a segment in the TCP protocol.

Both TCP and UDP bring the important concept of port number: a port is a 16-bit number (i.e. with a value between 0 and 65535), which corresponds to a service offered by a host on the network. To take our well-known post office analogy, if the IP address corresponds to the address of a letter (country, city, street and number), the port corresponds to the first name of the recipient. For instance, a single computer can act as both a Web (HTTP) server and a mail (SMTP) server at the same time. When such a computer receives an IP packet, it has to know if the packet must be sent to the Web server or to the mail server: this is done thanks to the destination port number, which is a part of the TCP or UDP header of the packet; for instance, a Web server is usually associated with port 80, and a mail server with port 25. The association between a given service and a port number is standard, and the whole list can be found at the address http://www.iana.org/assignments/port-numbers, or in the file /etc/services on Linux or Mac OS X (C:\WINDOWS\system32\drivers\etc\services on Windows). So, each TCP or UDP packet contains a destination port, so that the destination host can know what it has to do with the packet, and also contains a source port, which will be the destination port used by the recipient when it sends back a reply packet to the sender (if a reply is needed).
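The role of the two port numbers can be illustrated with a tiny Python sketch. Nothing is sent on the network here: the table `SERVICES` and the function `deliver` are invented for the example, and simply show how a host could choose a service from the destination port, and how the ports are swapped in the reply.

```python
# Hypothetical services running on one host, keyed by (protocol, port).
SERVICES = {
    ("tcp", 80): "web server (HTTP)",
    ("tcp", 25): "mail server (SMTP)",
    ("udp", 53): "name server (DNS)",
}


def deliver(protocol, src_port, dst_port):
    """Pick the service for an incoming packet and the ports of its reply."""
    service = SERVICES.get((protocol, dst_port))
    if service is None:
        return None   # nothing listens on this port
    # The reply swaps the two ports: it is sent from dst_port to src_port.
    return {"service": service, "reply_src": dst_port, "reply_dst": src_port}


print(deliver("tcp", 53236, 80))
print(deliver("udp", 64419, 53))
print(deliver("tcp", 53236, 8080))
```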

You can see all the TCP and UDP ports used by your computer with the netstat command (the exact options depend on the Operating System), for instance:
# netstat -an
Active Internet connections (servers and established)
Proto  Local Address       Foreign Address      State
tcp    0.0.0.0:22          0.0.0.0:*            LISTEN
tcp    127.0.0.1:631       0.0.0.0:*            LISTEN
tcp    192.168.1.3:22      192.168.1.2:56347    ESTABLISHED
udp    0.0.0.0:5353        0.0.0.0:*
udp    0.0.0.0:32920       0.0.0.0:*
Each address is written in the standard form <IP address>:<port number>. The lines ending with "LISTEN" correspond to services running on the computer: for instance the first line with port 22 corresponds to a running SSH (Secure Shell) server, and the third line shows that another computer with IP address 192.168.1.2 is connected to this SSH server. The last two lines mean that two UDP services are also running on the computer, on ports 5353 and 32920.

To conclude this article, here is the format of a UDP datagram (also described in RFC 768), which contains a header of 8 bytes followed by the data:
bits | 0 - 15           | 16 - 31           |
-----+------------------+-------------------|
0    | Source port      | Destination port  |
-----+------------------+-------------------|
32   | Length           | Checksum          |
-----+------------------+-------------------|
64   | Data
-----+----------------- - - -
The format of a TCP segment is more complex and is described in RFC 793 (the details of the TCP protocol are beyond the scope of this short introduction).
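As the UDP header is only made of four 16-bit fields, it can be decoded with a single call to `struct.unpack`. The following Python sketch does so on the 8 header bytes of the DNS query shown in the hexadecimal dump of the DNS article (part 1); the function name `parse_udp_header` is made up for the example.

```python
import struct


def parse_udp_header(datagram):
    """Decode the 8-byte UDP header found at the start of `datagram`."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src, "dst_port": dst,
            "length": length, "checksum": checksum}


# UDP header of the captured DNS query, in hexadecimal.
header = bytes.fromhex("fba3 0035 0027 03d7")
print(parse_udp_header(header))
```

The decoded ports are 64419 and 53, as printed by tcpdump in that capture, and the length of 39 bytes is the 8-byte header plus the 31 bytes of the DNS query.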

Thursday, December 3, 2009

Overview of the TCP/IP stack

Now I will show you in more detail how data is transmitted over an IP network. There are several protocols used on the internet, which are standardized ways of exchanging information on the network. Ethernet and IP are such protocols, but there are many other ones, like TCP, UDP, HTTP, DNS, FTP, SMTP, RTP, ICMP, IGMP, and so on. These protocols are organized in layers, which is why the set of protocols used on the internet is usually called the "TCP/IP stack".

To understand this concept of protocol layers, let's use once again the post office analogy. When you send a letter to someone in another city, you write something on a paper sheet, put the sheet in an envelope, and drop the letter in a mailbox. Later, this letter will be put with other ones in a big bag, and the bag will itself be transported with other bags in a car. In the internet world, this concept of putting data in a container (like a paper sheet in an envelope, or a letter in a bag) is called encapsulation. At the top of the TCP/IP protocol stack, you will find the application protocols, which are a first level of encapsulation of the data. For instance e-mails are encapsulated in the SMTP protocol (Simple Mail Transfer Protocol), web pages are encapsulated in the HTTP protocol (Hyper Text Transfer Protocol), etc. The application protocol depends on the kind of data you want to exchange: e-mails, files, video, etc. In the post office analogy, the application protocol corresponds to the language in which you write your letter. Exactly as a long letter has to be split into several paper sheets, the application data (like an e-mail) is split into small packets called segments, thanks to a transport protocol, which is usually TCP (Transmission Control Protocol). The goal of a transport protocol such as TCP is to make sure that no information is lost on the network, and that it is received in the right order. Then the data is again encapsulated in another protocol layer, called the network protocol, which is IP here (Internet Protocol). This network protocol corresponds to the envelope of your letter, and is responsible for the delivery of your message to the right recipient, thanks to its address. Finally, the IP packet (the letter) is once again encapsulated in another protocol called the link protocol, which is Ethernet for instance.
This protocol layer is responsible for the actual transport of the data on a physical medium (a cable for instance), and corresponds to the car in the post office analogy.
To sum up, data is encapsulated in several layers of protocols:
  • The application layer (HTTP, SMTP, FTP, etc.)
  • The transport layer (TCP, UDP)
  • The network layer (IP)
  • The link layer (Ethernet)
As this notion of layers may seem a bit abstract, let's look at a concrete example: you open your Web browser and enter the address "http://www.wikipedia.org". Your browser will send an HTTP request to the Wikipedia web server, to retrieve the Wikipedia welcome page. This request will involve the protocol layers introduced above: HTTP, TCP, IP and Ethernet.
To see what these protocols look like, let's use the UNIX tool "tcpdump", which is an extremely powerful tool to analyse network traffic. I will not explain the syntax of tcpdump now, the goal is just to see what an HTTP request looks like:
# tcpdump -n -vvv -e -XX -s 1500 -i eth0 tcp port 80

19:36:52.958551 00:19:e3:09:25:05 (oui Unknown) >
00:18:39:c9:a8:a2 (oui Unknown), ethertype IPv4 (0x0800),
length 512: (tos 0x0, ttl 64, id 3047, offset 0, flags [DF],
proto TCP (6), length 498)
192.168.1.4.53236 > 91.198.174.2.80: Flags [P.],
cksum 0xd994 (correct), seq 1:447, ack 1, win 65535, options
[nop,nop,TS val 699779696 ecr 1329605586], length 446

000: 0018 39c9 a8a2 0019 e309 2505 0800 4500 ..9.......%...E.
010: 01f2 0be7 4000 4006 61aa c0a8 0104 5bc6 ....@.@.a.....[.
020: ae02 cff4 0050 4e86 55c8 cd6e c021 8018 .....PN.U..n.!..
030: ffff d994 0000 0101 080a 29b5 ca70 4f40 ..........)..pO@
040: 2bd2 4745 5420 2f20 4854 5450 2f31 2e31 +.GET./.HTTP/1.1
050: 0d0a 486f 7374 3a20 7777 772e 7769 6b69 ..Host:.www.wiki
060: 7065 6469 612e 6f72 670d 0a55 7365 722d pedia.org..User-
070: 4167 656e 743a 204d 6f7a 696c 6c61 2f35 Agent:.Mozilla/5
080: 2e30 2028 4d61 6369 6e74 6f73 683b 2055 .0.(Macintosh;.U
090: 3b20 496e 7465 6c20 4d61 6320 4f53 2058 ;.Intel.Mac.OS.X
0a0: 2031 302e 363b 2066 723b 2072 763a 312e .10.6;.fr;.rv:1.
0b0: 392e 312e 3529 2047 6563 6b6f 2f32 3030 9.1.5).Gecko/200
0c0: 3931 3130 3220 4669 7265 666f 782f 332e 91102.Firefox/3.
0d0: 352e 350d 0a41 6363 6570 743a 2074 6578 5.5..Accept:.tex
0e0: 742f 6874 6d6c 2c61 7070 6c69 6361 7469 t/html,applicati
0f0: 6f6e 2f78 6874 6d6c 2b78 6d6c 2c61 7070 on/xhtml+xml,app
100: 6c69 6361 7469 6f6e 2f78 6d6c 3b71 3d30 lication/xml;q=0
110: 2e39 2c2a 2f2a 3b71 3d30 2e38 0d0a 4163 .9,*/*;q=0.8..Ac
120: 6365 7074 2d4c 616e 6775 6167 653a 2066 cept-Language:.f
130: 722c 6672 2d66 723b 713d 302e 382c 656e r,fr-fr;q=0.8,en
140: 2d75 733b 713d 302e 352c 656e 3b71 3d30 -us;q=0.5,en;q=0
150: 2e33 0d0a 4163 6365 7074 2d45 6e63 6f64 .3..Accept-Encod
160: 696e 673a 2067 7a69 702c 6465 666c 6174 ing:.gzip,deflat
170: 650d 0a41 6363 6570 742d 4368 6172 7365 e..Accept-Charse
180: 743a 2049 534f 2d38 3835 392d 312c 7574 t:.ISO-8859-1,ut
190: 662d 383b 713d 302e 372c 2a3b 713d 302e f-8;q=0.7,*;q=0.
1a0: 370d 0a4b 6565 702d 416c 6976 653a 2033 7..Keep-Alive:.3
1b0: 3030 0d0a 436f 6e6e 6563 7469 6f6e 3a20 00..Connection:.
1c0: 6b65 6570 2d61 6c69 7665 0d0a 4966 2d4d keep-alive..If-M
1d0: 6f64 6966 6965 642d 5369 6e63 653a 204d odified-Since:.M
1e0: 6f6e 2c20 3233 204e 6f76 2032 3030 3920 on,.23.Nov.2009.
1f0: 3036 3a33 353a 3139 2047 4d54 0d0a 0d0a 06:35:19.GMT....

The second part of this output of tcpdump is the contents of one Ethernet frame (a "frame" is the name of a packet in the Ethernet protocol), in hexadecimal on the left side, and in ASCII on the right side. The frame starts with the header of the Ethernet frame (offsets 000 to 00d), followed by the header of the IP packet contained in this Ethernet frame (offsets 00e to 021), then the header of the TCP segment contained in the IP packet (offsets 022 to 041); the rest is the actual contents of the HTTP request contained in the TCP segment, called the payload. Before the contents of the frame, you can see a bit of information coming from the headers, decoded by tcpdump.
With this example, you can see that the Ethernet header is very simple: it contains the destination MAC address (6 bytes), then the source MAC address (6 bytes), then the type of data (the "ethertype", 2 bytes). Then there are 20 bytes for the IP header, and 32 bytes for the TCP header. Finally, the remaining 446 bytes are the HTTP request itself.
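The sizes of the successive headers can be checked with a short Python sketch, which decodes the first 48 bytes of the frame shown above. It is only an illustration with hard-coded offsets: it assumes an Ethernet header of 14 bytes (two 6-byte MAC addresses and a 2-byte ethertype) followed by an IPv4 packet carrying TCP, and the helper name `mac` is made up for the example.

```python
import struct

# First 48 bytes of the frame shown above (offsets 000 to 02f).
frame = bytes.fromhex(
    "0018 39c9 a8a2 0019 e309 2505 0800 4500"
    "01f2 0be7 4000 4006 61aa c0a8 0104 5bc6"
    "ae02 cff4 0050 4e86 55c8 cd6e c021 8018"
)


def mac(raw):
    """Format 6 raw bytes as a MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


# Ethernet header: destination MAC, source MAC, ethertype (14 bytes).
ETH_LEN = 14
dst_mac, src_mac = mac(frame[0:6]), mac(frame[6:12])
(ethertype,) = struct.unpack("!H", frame[12:14])

# IP header: the low 4 bits of its first byte give its length in 32-bit words,
# and bytes 2-3 give the total length of the IP packet.
ip_len = (frame[ETH_LEN] & 0x0F) * 4
(ip_total,) = struct.unpack("!H", frame[ETH_LEN + 2:ETH_LEN + 4])

# TCP header: ports first; the high 4 bits of byte 12 give its length in words.
tcp = ETH_LEN + ip_len
src_port, dst_port = struct.unpack("!HH", frame[tcp:tcp + 4])
tcp_len = (frame[tcp + 12] >> 4) * 4

payload_len = ip_total - ip_len - tcp_len
print(src_mac, ">", dst_mac, hex(ethertype))
print("IP header:", ip_len, "TCP header:", tcp_len, "payload:", payload_len)
print("ports:", src_port, ">", dst_port)
```

The values found this way (20 bytes, 32 bytes, 446 bytes, ports 53236 and 80) are the ones decoded by tcpdump at the top of the capture.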

This was just a brief overview of the protocols used on the internet; there are of course many things to say about each of them, which I will do in the next articles ;-)

Wednesday, December 2, 2009

Essentials of IP routing (part 2)

Now let's imagine another IP network 192.168.12.0/24 ("network B"), to which are connected computer E (with IP address 192.168.12.5) and computer F (with address 192.168.12.6). These computers are also connected through an Ethernet network, but a different one from network A. What happens if computer C wants to send an IP packet to computer F? As they are not connected to the same Ethernet network, the ARP protocol cannot work here, and computer C has no way to know how to communicate with computer F. The solution is to use a router. A router is a device with several network interfaces, each one being connected to a different IP network, and able to transport IP packets from one network to another one (this is called routing or IP forwarding). In our example, a router can be used to connect networks A and B.

Each interface of the router has its own IP address in a given IP network: here the router has an interface in network A, with the address 192.168.1.1, and an interface in network B, with the address 192.168.12.1. These interfaces are called the gateways of the networks, as they allow IP packets to go outside the network. By convention, the address of a gateway is usually the second IP of the address range of the network, for example the gateway of the network 192.168.1.0/24 has the address 192.168.1.1 (the address 192.168.1.0 is the address of the network itself, and is not assigned to a host).
So, when computer C wants to send a packet to computer F, it has to know that computer F is outside its own IP network (this is easy to know, as the IP address 192.168.12.6 is not in the address range of network 192.168.1.0/24), and in that case it sends the packet to the gateway 192.168.1.1. To take again the post office analogy: if you want to send a letter to someone who lives in your street (in your IP network), you can bring the letter directly to their mailbox. But if you want to send a letter to someone in another city, you cannot bring the letter directly, so you drop it in the mailbox of the nearest post office (the gateway of the router).
When a router receives an IP packet, it looks at the destination address, and forwards the packet to the interface which is in the corresponding IP network. In this example, the router can send the packet directly to computer F thanks to the Ethernet protocol, as they are both part of the same Ethernet network.
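The decision taken by computer C can be written in a few lines of Python with the standard `ipaddress` module. The function name `next_hop` is made up for this sketch, which only shows the test "is the destination in my own network?" and ignores everything else (ARP, routing tables with several entries, and so on).

```python
import ipaddress


def next_hop(destination, local_network, gateway):
    """Return the address to which the frame carrying the packet is sent."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        return str(dest)      # same network: deliver directly
    return gateway            # other network: hand the packet to the router


# Computer C (192.168.1.2) on network A, whose gateway is 192.168.1.1.
print(next_hop("192.168.1.3", "192.168.1.0/24", "192.168.1.1"))
print(next_hop("192.168.12.6", "192.168.1.0/24", "192.168.1.1"))
```

The first call corresponds to a packet for computer D, which is delivered directly, and the second one to a packet for computer F, which goes through the gateway.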

Any IP router, and actually any computer, phone, etc. connected to an IP network, has a routing table. The routing table allows the device to know on which network interface it has to send an IP packet, depending on the destination IP address. On Linux, the routing table can be displayed with the command "route -n" (the -n option avoids DNS resolution; if you don't know what it means, just add the option ;-) ). On Windows or Mac OS X, you can display the routing table with the "netstat -r" command. Here is an example of a routing table on Linux:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Iface
172.16.110.0    0.0.0.0         255.255.254.0   eth0
192.168.1.0     0.0.0.0         255.255.0.0     eth1
127.0.0.0       0.0.0.0         255.0.0.0       lo
0.0.0.0         172.16.110.1    0.0.0.0         eth0
The columns "Destination" and "Genmask" give the network and its mask. As explained in the previous article, the network mask is a way to represent the number of significant bits of the network address; for instance 172.16.110.0 with a mask of 255.255.254.0 actually means 172.16.110.0/23. The column "Iface" gives the network interface on which IP packets must be sent for the given network, and the column "Gateway" is the address of the gateway to use for this network.
When the gateway address has the special value 0.0.0.0 (or "*"), it means that no gateway is needed to send a packet on this network (the packet can be sent directly to its destination), as all the hosts of this network are directly connected to the same Ethernet network as the corresponding interface. In this example, the network interface "eth0" is connected to an Ethernet network in which all the hosts have an IP address in the range 172.16.110.0 -> 172.16.111.255.
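The equivalence between a mask and a number of significant bits, and the address range of the network connected to "eth0", can be checked with the standard `ipaddress` module of Python. This is just a quick verification of the figures given above:

```python
import ipaddress

network = ipaddress.ip_network("172.16.110.0/255.255.254.0")
print(network)                        # the same network in CIDR notation
print(network.prefixlen)              # number of significant bits
print(network[0], "->", network[-1])  # first and last address of the range
```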
The destination address 0.0.0.0 is also special, and is called the default address; the line which has this address in the routing table gives the interface and gateway (called the default gateway) which have to be used when no other line in the routing table matches the destination address of the packet. For instance an IP packet with a destination address of 138.15.7.145 doesn't match any of the networks 172.16.110.0/23, 192.168.1.0/16 or 127.0.0.0/8, so it will be sent on the interface eth0, to the Ethernet device which has the IP address 172.16.110.1, which is the gateway of a router. Note that (on Linux) if your routing table has no default gateway, you will get an error such as "Network is unreachable" if you try to send an IP packet to an unknown destination.
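Here is a minimal Python sketch of such a lookup, using the routing table of the example. It is a simplification under a few assumptions: the function name `lookup` is invented, the most specific matching line (the one with the longest mask) is chosen, which makes the default line win only when nothing else matches, and the line "192.168.1.0" with mask 255.255.0.0 is read as the network 192.168.0.0/16.

```python
import ipaddress

# (destination, gateway, genmask, interface), copied from the table above.
ROUTES = [
    ("172.16.110.0", "0.0.0.0", "255.255.254.0", "eth0"),
    ("192.168.1.0", "0.0.0.0", "255.255.0.0", "eth1"),
    ("127.0.0.0", "0.0.0.0", "255.0.0.0", "lo"),
    ("0.0.0.0", "172.16.110.1", "0.0.0.0", "eth0"),
]


def lookup(destination):
    """Return (interface, gateway) for `destination`; gateway None = direct."""
    dest = ipaddress.ip_address(destination)
    best = None
    for network, gateway, mask, iface in ROUTES:
        # strict=False masks out host bits, so 192.168.1.0 with 255.255.0.0
        # is read as 192.168.0.0/16.
        net = ipaddress.ip_network(f"{network}/{mask}", strict=False)
        if dest in net and (best is None or net.prefixlen > best[0]):
            via = None if gateway == "0.0.0.0" else gateway
            best = (net.prefixlen, iface, via)
    if best is None:
        raise LookupError("no route to " + destination)
    return best[1], best[2]


print(lookup("138.15.7.145"))    # only the default line matches
print(lookup("172.16.111.20"))   # directly reachable through eth0
print(lookup("192.168.5.9"))     # directly reachable through eth1
```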

Now try to display and understand the routing table of your own computer!

Essentials of IP routing (part 1)

Now you (hopefully) know a bit more about what an IP network is, but what can you do with it? The purpose of a network is to allow machines to communicate, so the purpose of an IP network is to transport data between its hosts. To understand how hosts can communicate over an IP network, let's take the usual analogy of the post office. Let's say you want to send a letter from New York to your cousin in Paris. You write your letter, put it in an envelope, write your cousin's address on the envelope, and if you expect a reply, you also write your own address on it. Then you drop your letter in a mailbox at the post office, and thanks to the recipient's address on the envelope, your letter first goes to the New York airport (probably by car), then flies to Paris by plane, and then arrives at your cousin's mailbox by car again.
This is exactly how data is transported on an IP network: the data is first split into IP packets (or datagrams), which correspond to letters, and the envelope is called the IP header. As on a real envelope, the IP header contains the (IP) address of the sender and that of the recipient. In the IP world, the mailbox is called a gateway, the post office a router, and the road from the post office to the airport, or the flight from New York to Paris, are different networks.
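
To make the "envelope" more concrete, here is a minimal sketch in Python that builds a 20-byte IPv4 header and reads the sender and recipient addresses back from it. The helper names build_header and addresses are hypothetical, and several fields are simplified assumptions: the checksum is left at 0, the TTL is set to 64 and the protocol number to 17 (UDP), so this header is for illustration only and would not be accepted on a real network.

```python
import socket
import struct


def build_header(src, dst, payload_len):
    """Pack a simplified IPv4 header (no options, checksum left at 0)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                      # type of service
        20 + payload_len,       # total length: header + data
        0,                      # identification
        0,                      # flags and fragment offset
        64,                     # time to live (arbitrary choice)
        17,                     # protocol number (17 = UDP, arbitrary choice)
        0,                      # header checksum, not computed in this sketch
        socket.inet_aton(src),  # sender address, 4 bytes
        socket.inet_aton(dst),  # recipient address, 4 bytes
    )


def addresses(header):
    """Return (source, destination) read from bytes 12 to 19 of the header."""
    src, dst = struct.unpack("!4s4s", header[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)


header = build_header("192.168.1.2", "192.168.1.3", payload_len=100)
print(len(header), addresses(header))
```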

Now let's imagine an IP network 192.168.1.0/24 (called "network A"), with two computers: computer C with the IP address 192.168.1.2, and computer D with the IP address 192.168.1.3. These computers can be linked together through an Ethernet connection, for instance. Ethernet is a low-level network protocol called a "link protocol", and is used to link devices (computers, phones, etc.) which can be physically connected together. By "physically", I mean with a network cable, an optical fiber, or a WiFi link for instance. Each device on an Ethernet network has an address called a "MAC address", which is a 48-bit number usually represented in hexadecimal (for instance: "00:19:e3:8d:f5:08"). Unlike the IP address, the MAC address is a characteristic of the network card of your computer; it should be unique in the world and is not supposed to be changed. When several computers are connected through an Ethernet network, each computer "learns" (thanks to a protocol called "ARP": Address Resolution Protocol) the MAC addresses of the other computers on the network, and the associated IP addresses. As with the IP protocol, data is sent inside Ethernet packets, which start with a header containing the source and destination MAC addresses. Communication on an Ethernet network is very simple: when a computer wants to send an Ethernet packet to another one, this packet is actually sent to all the computers on the network (this mechanism is called "broadcast"), but only the one whose MAC address is equal to the destination address of the packet will read it; the other computers will just ignore the packet.
As this broadcast mechanism is not very efficient if there are many computers in the network (it causes a problem called "packet collision"), there is another mechanism called bridging (or switching), which makes it possible to divide an Ethernet network into smaller segments and to limit the broadcast mechanism to these segments; as it is a bit off topic, I will not go into details now.
The thing to remember is that hosts which are connected to the same Ethernet network (computers C and D in the example) can be part of the same IP network, and can talk directly to each other. On Linux, you can see which other hosts are on the same Ethernet network(s) as your computer with the command "arp" ("arp -a" on Windows or Mac OS X). It shows the ARP table, which maps IP addresses to MAC addresses:

# arp
IP address HW address Iface
192.168.1.1 00:18:39:C1:A8:A2 eth0
192.168.1.4 00:19:E3:02:2A:03 wl0

By the way, a computer can have several network interfaces (network cards), and in that case each interface has its own MAC address, and (usually) its own IP address. On Linux or Mac OS X you can see all your network interfaces with the command "ifconfig -a" (use "ipconfig /all" on Windows):
# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:18:39:C9:A4:A6
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

wl0 Link encap:Ethernet HWaddr 00:18:39:C9:A1:A2
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Here there are two interfaces: "eth0" (an Ethernet card) and "wl0" (a WiFi interface).
On each "inet" line, you can see 3 addresses. "addr" is the IP address of the interface. "Bcast" is the broadcast address, i.e. the destination address used when the computer wants to send a message to all the hosts of this network; the broadcast address is always the last IP address of the address range of the network. The third one, "Mask", is the network mask; the mask is another way to represent the number of bits of the network part of an address: it is the address obtained with all the bits of the network part set to "1", and all the bits of the host part set to "0". For instance, a network mask of 255.255.255.0 can be written "11111111 11111111 11111111 00000000" in binary, which means the network part of the address has 24 bits (so here the network is 192.168.1.0/24).
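
As a quick check of the relation between "addr", "Mask" and "Bcast", the following Python sketch uses the standard ipaddress module to derive the network, the number of network bits and the broadcast address from an address and its mask. The helper names describe and mask_bits are hypothetical, chosen for this example only.

```python
import ipaddress


def describe(addr, mask):
    """Return (network in CIDR notation, number of network bits, broadcast address)."""
    iface = ipaddress.IPv4Interface(f"{addr}/{mask}")
    net = iface.network
    return str(net), net.prefixlen, str(net.broadcast_address)


def mask_bits(mask):
    """Return the mask written as 4 groups of 8 binary digits."""
    bits = format(int(ipaddress.IPv4Address(mask)), "032b")
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


print(mask_bits("255.255.255.0"))
print(describe("192.168.1.2", "255.255.255.0"))
```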

IP addresses and networks

Before studying network tunneling in detail, it is highly recommended to first understand how IP networks work. But what is an IP network? IP stands for "Internet Protocol", and is the network protocol at the heart of the whole internet, but also of most private networks out there. For instance, at home you probably own a computer connected to the internet via an ADSL modem; if your modem is configured to be a "router" (this is probably the case if you own a recent one), then your computer communicates with the modem using the IP protocol, inside a private IP network (which only contains the modem and your computers - you may have several).

Each "host" of an IP network (a host could be a computer, a router, a phone, etc.) is identified by a unique "IP address", which is a 32-bit number (at least with version 4 of the IP protocol, which is the most commonly used). It is common to represent an IP address in its decimal form, i.e. with 4 numbers separated by dots, each number having a value between 0 and 255. For instance, 192.168.1.2 or 74.125.77.104 or 255.255.255.255 are valid IP addresses, but 312.0.15.10 is not.
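
If you want to check these examples yourself, here is a small Python sketch based on the standard ipaddress module. The function name is_valid_ipv4 is a hypothetical helper introduced for this illustration.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is 4 dot-separated numbers, each between 0 and 255."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        # Raised for out-of-range numbers, missing parts, stray characters, etc.
        return False
    return True


for candidate in ("192.168.1.2", "74.125.77.104", "255.255.255.255", "312.0.15.10"):
    print(candidate, is_valid_ipv4(candidate))
```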

Though an IP address is usually written in its decimal form (to be human-readable), it is important to remember that it is just a 32-bit number; for instance the address 192.168.1.2 can actually be written "11000000 10101000 00000001 00000010" in binary. It is important to understand this representation, because an IP address is actually made of two parts: the network part and the host part. The network part corresponds to the first n bits of the address, and the remaining bits are the host part. For instance, if n=24 (24 bits, i.e. 3 bytes), the network part of the address 192.168.1.2 is "192.168.1", and the host part is ".2".
In the early days of the internet, the number n had to be a multiple of 8, which corresponded to so-called "network classes"; there are 3 main classes:
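
The binary form and the network/host split can be reproduced with a few lines of Python. This is just a sketch; to_binary and split are hypothetical helper names, and the split is done with plain bit shifting on the 32-bit number.

```python
import ipaddress


def to_binary(addr):
    """Write an IPv4 address as 4 groups of 8 binary digits."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


def split(addr, n):
    """Return (network part, host part) as integers, for a network part of n bits."""
    value = int(ipaddress.IPv4Address(addr))
    host_bits = 32 - n
    network_part = value >> host_bits            # keep the first n bits
    host_part = value & ((1 << host_bits) - 1)   # keep the remaining bits
    return network_part, host_part


print(to_binary("192.168.1.2"))  # 11000000 10101000 00000001 00000010
print(split("192.168.1.2", 24))  # the host part is 2
```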
  • Class A: n=8
  • Class B: n=16
  • Class C: n=24
This notion of class is now obsolete, and n can take any value between 0 and 32 (as an IP address is a 32-bit number).
But now you wonder: "Where does this value of n come from? Why did he choose 24 in his example??" Good question ;-) Actually the choice of n is up to the network administrator, and determines how many hosts can be connected to an IP network. For instance if n=24, there are 8 remaining bits for the host part of the IP address, which means there are 256 possible addresses in the network (254 of which can be assigned to hosts, since the first one identifies the network itself and the last one is the broadcast address), all sharing the same network part. For instance 192.168.1.2, 192.168.1.5, 192.168.1.115 are all hosts of the same IP network, noted "192.168.1.0/24". In this notation (called CIDR notation, for "Classless Inter-Domain Routing"), the first part is the decimal representation of the n bits of the network, and the second part (after the slash) is precisely the number n. So "192.168.1.0/24" represents an IP network containing addresses within the range 192.168.1.0 -> 192.168.1.255. Another example: the network "172.16.110.0/23" contains addresses within the range 172.16.110.0 -> 172.16.111.255 (this network contains 512 addresses).
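
The two ranges above can be verified with Python's standard ipaddress module. The helper name address_range is hypothetical; it simply returns the first address, the last address and the number of addresses of a network written in CIDR notation.

```python
import ipaddress


def address_range(cidr):
    """Return (first address, last address, number of addresses) of a network."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1]), net.num_addresses


for cidr in ("192.168.1.0/24", "172.16.110.0/23"):
    first, last, count = address_range(cidr)
    print(f"{cidr}: {first} -> {last} ({count} addresses)")
```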

Now comes the interesting part: an IP network can be divided into smaller IP networks, usually called subnets. For instance, the networks 192.168.1.0/24 and 192.168.15.0/24 are both subnets of the bigger network 192.168.0.0/16, as their addresses are within the range of this bigger network (the range 192.168.1.0 -> 192.168.1.255 is included in the range 192.168.0.0 -> 192.168.255.255). The administrator of a given IP network can choose to divide it into several subnets in any way they want, by choosing the appropriate value of n for the subnets. For instance the network 192.168.0.0/16 could be divided into 256 subnets (192.168.0.0/24, 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24, etc.), or into 2 subnets (192.168.0.0/17 and 192.168.128.0/17), or anything else.
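
Here is a short Python sketch of the same divisions, using the standard ipaddress module (the subnet_of method requires Python 3.7 or later). The variable names are chosen for this example only.

```python
import ipaddress

big = ipaddress.ip_network("192.168.0.0/16")

# Dividing the /16 into 2 subnets, then into 256 subnets.
halves = [str(subnet) for subnet in big.subnets(new_prefix=17)]
slash24 = [str(subnet) for subnet in big.subnets(new_prefix=24)]

print(halves)                     # the two /17 subnets
print(len(slash24), slash24[:4])  # 256 subnets, starting with 192.168.0.0/24

# Checking that a smaller network is included in the bigger one.
is_subnet = ipaddress.ip_network("192.168.15.0/24").subnet_of(big)
print(is_subnet)
```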

And the good news is that you are the administrator of this 192.168.0.0/16 network ;-) Indeed, this range of IPs (192.168.0.0 -> 192.168.255.255) is reserved for private networks, which means everyone can use it to create their own IP network. However, such a network cannot be connected directly to the internet, as someone else in the world could use the same addresses as you, and this is not possible as IP addresses have to be unique in an IP network. Actually there is a way to connect a private network to the internet, by using a mechanism called Network Address Translation (NAT), but this will be discussed later.
There are 3 networks reserved for private usage:
  • 10.0.0.0/8
  • 172.16.0.0/12
  • 192.168.0.0/16
In addition to these networks, there is also a special IP address, 127.0.0.1, called the loopback address. This address is used when a host needs to talk to itself, for instance when two applications on a computer want to use TCP/IP to communicate but do not want to expose themselves to the outside world.
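
To find out whether a given address belongs to one of these reserved networks, a membership test is enough. The sketch below uses Python's standard ipaddress module; PRIVATE_NETWORKS and in_private_range are hypothetical names, and the list only contains the 3 networks given above.

```python
import ipaddress

# The 3 networks reserved for private usage, as listed above.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_private_range(addr):
    """Return True if addr belongs to one of the 3 private networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETWORKS)


for addr in ("192.168.1.2", "172.31.255.255", "172.32.0.1", "74.125.77.104"):
    print(addr, in_private_range(addr))

# The loopback address can be recognized with a built-in property.
print(ipaddress.ip_address("127.0.0.1").is_loopback)
```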

The subnet 192.168.0.0/24 is commonly used by default on ADSL or WiFi routers, so there is a good chance that your computer at home is already configured to use this network.