Thursday, December 3, 2009

Overview of the TCP/IP stack

Now I will show you in more detail how data is transmitted over an IP network. The internet relies on many protocols, which are standardized ways of exchanging information over the network. Ethernet and IP are two of them, but there are many others, like TCP, UDP, HTTP, DNS, FTP, SMTP, RTP, ICMP, IGMP, and so on. These protocols are organized in layers, which is why the set of protocols used on the internet is usually called the "TCP/IP stack".

To understand this concept of protocol layers, let's use the post office analogy once again. When you send a letter to someone in another city, you write something on a sheet of paper, put the sheet in an envelope, and drop the letter in a mailbox. Later, this letter will be put with other ones in a big bag, and the bag will itself be transported with other bags in a car. In the internet world, this concept of putting data in a container (like a sheet of paper in an envelope, or a letter in a bag) is called encapsulation.

At the top of the TCP/IP protocol stack, you will find the application protocols, which form the first level of encapsulation of the data. For instance, e-mails are encapsulated in the SMTP protocol (Simple Mail Transfer Protocol), web pages are encapsulated in the HTTP protocol (Hypertext Transfer Protocol), etc. The application protocol depends on the kind of data you want to exchange: e-mails, files, video, etc. In the post office analogy, the application protocol corresponds to the language in which you write your letter. Just as a long letter has to be split into several sheets of paper, the application data (like an e-mail) is split into small packets called segments by a transport protocol, which is usually TCP (Transmission Control Protocol). TCP makes sure that no information is lost on the network and that the data is received in the right order (UDP, the other common transport protocol, offers no such guarantees). Then the data is encapsulated again in another protocol layer, called the network protocol, which here is IP (Internet Protocol). This network protocol corresponds to the envelope of your letter, and is responsible for delivering your message to the right recipient, using its address. Finally, the IP packet (the letter) is encapsulated once more in another protocol called a link protocol, for instance Ethernet.
This protocol layer is responsible for the actual transport of the data over a physical medium (a cable for instance), and corresponds to the car in the post office analogy.
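To make the two jobs given to TCP above more concrete (splitting data into segments, and putting them back in the right order at the other end), here is a toy Python sketch. It is not real TCP: `segment` and `reassemble` are made-up names, sequence numbers start at 0 (real TCP starts from a random initial sequence number), and there are no acknowledgements or retransmissions; a missing segment is only detected.

```python
import random


def segment(data: bytes, mss: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence number, chunk) pairs of at most mss bytes.

    Like in TCP, the sequence number is the offset of the chunk's first byte;
    unlike in TCP, this toy starts counting at 0.
    """
    return [(seq, data[seq:seq + mss]) for seq in range(0, len(data), mss)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the data from segments received in any order.

    A gap in the sequence numbers raises ValueError; real TCP would instead
    wait for the missing bytes to be retransmitted.
    """
    data = bytearray()
    for seq, chunk in sorted(segments):
        if seq != len(data):
            raise ValueError(f"bytes {len(data)} to {seq - 1} are missing")
        data += chunk
    return bytes(data)


message = b"Dear Bob, this e-mail is too long to fit in a single segment."
segments = segment(message, mss=16)
random.shuffle(segments)                   # the network may reorder segments
print(reassemble(segments) == message)     # True
```

The sequence number is what lets the receiver both restore the order and notice a hole, which is the same idea TCP uses (with byte offsets) in the "seq" values tcpdump prints.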
To sum up, data is encapsulated in several layers of protocols:
  • The application layer (HTTP, SMTP, FTP, etc.)
  • The transport layer (TCP, UDP)
  • The network layer (IP)
  • The link layer (Ethernet)
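The order of these layers can be illustrated with a toy Python sketch. The headers here are made-up placeholder strings (`<TCP>`, `<IP>`, `<ETH>`), not real binary headers, and `send`/`receive` are hypothetical names: the only point is that each layer wraps what it gets from the layer above, and that the receiver unwraps in the reverse order, outermost layer first.

```python
# Placeholder headers (made up for this sketch), from the top of the stack down.
LAYERS = [("transport", b"<TCP>"), ("network", b"<IP>"), ("link", b"<ETH>")]


def send(app_data: bytes) -> bytes:
    """Going down the stack: each layer prepends its header to what it receives."""
    data = app_data
    for _name, header in LAYERS:
        data = header + data
    return data


def receive(frame: bytes) -> bytes:
    """Going up the stack: each layer removes its own header, outermost first."""
    for name, header in reversed(LAYERS):
        if not frame.startswith(header):
            raise ValueError(f"unexpected {name} header")
        frame = frame[len(header):]
    return frame


frame = send(b"GET / HTTP/1.1")
print(frame)           # b'<ETH><IP><TCP>GET / HTTP/1.1'
print(receive(frame))  # b'GET / HTTP/1.1'
```

The link-layer header ends up first on the wire, exactly like the envelope ends up inside the bag: the outermost container is the one added last.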
As this notion of layers may seem a bit abstract, let's look at a concrete example: you open your web browser and enter the address "http://www.wikipedia.org". Your browser will send an HTTP request to the Wikipedia web server to retrieve the Wikipedia welcome page. This request involves all the protocol layers introduced above: HTTP, TCP, IP and Ethernet.
To see what these protocols look like, let's use the UNIX tool "tcpdump", an extremely powerful tool for analysing network traffic. I will not explain the syntax of tcpdump now; the goal is just to see what an HTTP request looks like:
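At the application layer, such an HTTP request is just text. Here is a minimal Python sketch of the smallest valid HTTP/1.1 GET request for this page; `http_get_request` is a hypothetical helper, and a real browser such as Firefox adds many more headers (User-Agent, Accept and others).

```python
def http_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # Host is mandatory in HTTP/1.1
    ]
    # Every line ends with CRLF (the bytes 0d 0a), and an empty line
    # (a second CRLF) marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = http_get_request("www.wikipedia.org")
print(request)  # b'GET / HTTP/1.1\r\nHost: www.wikipedia.org\r\n\r\n'
```

A program would hand these bytes to the transport layer, for instance by writing them to a socket connected to port 80 of the server (Python's `socket.create_connection`); the operating system then adds the TCP, IP and Ethernet headers.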
# tcpdump -n -vvv -e -XX -s 1500 -i eth0 tcp port 80

19:36:52.958551 00:19:e3:09:25:05 (oui Unknown) >
00:18:39:c9:a8:a2 (oui Unknown), ethertype IPv4 (0x0800),
length 512: (tos 0x0, ttl 64, id 3047, offset 0, flags [DF],
proto TCP (6), length 498)
192.168.1.4.53236 > 91.198.174.2.80: Flags [P.],
cksum 0xd994 (correct), seq 1:447, ack 1, win 65535, options
[nop,nop,TS val 699779696 ecr 1329605586], length 446

000: 0018 39c9 a8a2 0019 e309 2505 0800 4500 ..9.......%...E.
010: 01f2 0be7 4000 4006 61aa c0a8 0104 5bc6 ....@.@.a.....[.
020: ae02 cff4 0050 4e86 55c8 cd6e c021 8018 .....PN.U..n.!..
030: ffff d994 0000 0101 080a 29b5 ca70 4f40 ..........)..pO@
040: 2bd2 4745 5420 2f20 4854 5450 2f31 2e31 +.GET./.HTTP/1.1
050: 0d0a 486f 7374 3a20 7777 772e 7769 6b69 ..Host:.www.wiki
060: 7065 6469 612e 6f72 670d 0a55 7365 722d pedia.org..User-
070: 4167 656e 743a 204d 6f7a 696c 6c61 2f35 Agent:.Mozilla/5
080: 2e30 2028 4d61 6369 6e74 6f73 683b 2055 .0.(Macintosh;.U
090: 3b20 496e 7465 6c20 4d61 6320 4f53 2058 ;.Intel.Mac.OS.X
0a0: 2031 302e 363b 2066 723b 2072 763a 312e .10.6;.fr;.rv:1.
0b0: 392e 312e 3529 2047 6563 6b6f 2f32 3030 9.1.5).Gecko/200
0c0: 3931 3130 3220 4669 7265 666f 782f 332e 91102.Firefox/3.
0d0: 352e 350d 0a41 6363 6570 743a 2074 6578 5.5..Accept:.tex
0e0: 742f 6874 6d6c 2c61 7070 6c69 6361 7469 t/html,applicati
0f0: 6f6e 2f78 6874 6d6c 2b78 6d6c 2c61 7070 on/xhtml+xml,app
100: 6c69 6361 7469 6f6e 2f78 6d6c 3b71 3d30 lication/xml;q=0
110: 2e39 2c2a 2f2a 3b71 3d30 2e38 0d0a 4163 .9,*/*;q=0.8..Ac
120: 6365 7074 2d4c 616e 6775 6167 653a 2066 cept-Language:.f
130: 722c 6672 2d66 723b 713d 302e 382c 656e r,fr-fr;q=0.8,en
140: 2d75 733b 713d 302e 352c 656e 3b71 3d30 -us;q=0.5,en;q=0
150: 2e33 0d0a 4163 6365 7074 2d45 6e63 6f64 .3..Accept-Encod
160: 696e 673a 2067 7a69 702c 6465 666c 6174 ing:.gzip,deflat
170: 650d 0a41 6363 6570 742d 4368 6172 7365 e..Accept-Charse
180: 743a 2049 534f 2d38 3835 392d 312c 7574 t:.ISO-8859-1,ut
190: 662d 383b 713d 302e 372c 2a3b 713d 302e f-8;q=0.7,*;q=0.
1a0: 370d 0a4b 6565 702d 416c 6976 653a 2033 7..Keep-Alive:.3
1b0: 3030 0d0a 436f 6e6e 6563 7469 6f6e 3a20 00..Connection:.
1c0: 6b65 6570 2d61 6c69 7665 0d0a 4966 2d4d keep-alive..If-M
1d0: 6f64 6966 6965 642d 5369 6e63 653a 204d odified-Since:.M
1e0: 6f6e 2c20 3233 204e 6f76 2032 3030 3920 on,.23.Nov.2009.
1f0: 3036 3a33 353a 3139 2047 4d54 0d0a 0d0a 06:35:19.GMT....

The second part of this tcpdump output is the content of one Ethernet frame (a "frame" is the name of a packet in the Ethernet protocol), in hexadecimal on the left and in ASCII on the right, where bytes that cannot be displayed, including spaces, are shown as dots. The first 14 bytes (from 0018 to 0800) are the header of the Ethernet frame, the next 20 bytes (from 4500 to ae02) are the header of the IP packet contained in this Ethernet frame, and the next 32 bytes (from cff4 to 2bd2) are the header of the TCP segment contained in the IP packet. The rest, starting with "GET", is the actual content of the HTTP request carried by the TCP segment, called the payload. Before the content of the frame, tcpdump prints some of the information from these headers in decoded form: the MAC addresses and ethertype, some IP fields (TTL, id, flags, length), then the IP addresses, the TCP ports, and the TCP flags, sequence numbers and options.
With this example, you can see that the Ethernet header is very simple: it contains the destination MAC address (6 bytes), then the source MAC address (6 bytes), then the type of data (the "ethertype", 2 bytes; 0x0800 means IPv4). Then there are 20 bytes for the IP header, and 32 bytes for the TCP header (20 fixed bytes plus 12 bytes of options). Finally, the remaining 446 bytes are the HTTP request itself, which adds up to the 512-byte frame length reported by tcpdump (14 + 20 + 32 + 446).
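To check these boundaries, the hex dump can be turned back into bytes and cut at the header sizes of this particular frame (14, 20 and 32 bytes). This is a small Python sketch: `dump_to_bytes` is a hypothetical helper that assumes the layout of the dump shown here (an offset followed by ":", then eight groups of four hex digits per row, every row being full), and only the first five rows are pasted.

```python
# First five rows of the hex dump above, pasted as-is.
DUMP = """\
000: 0018 39c9 a8a2 0019 e309 2505 0800 4500 ..9.......%...E.
010: 01f2 0be7 4000 4006 61aa c0a8 0104 5bc6 ....@.@.a.....[.
020: ae02 cff4 0050 4e86 55c8 cd6e c021 8018 .....PN.U..n.!..
030: ffff d994 0000 0101 080a 29b5 ca70 4f40 ..........)..pO@
040: 2bd2 4745 5420 2f20 4854 5450 2f31 2e31 +.GET./.HTTP/1.1
"""


def dump_to_bytes(text: str) -> bytes:
    """Keep only the hex columns: drop the offset before ':' and the ASCII column.

    Assumes every row is a full 16-byte row (8 groups of 4 hex digits).
    """
    out = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        groups = line.split(":", 1)[1].split()[:8]
        out += bytes.fromhex("".join(groups))
    return bytes(out)


frame = dump_to_bytes(DUMP)

# Header sizes of this particular frame: Ethernet 14, IP 20, TCP 32 bytes.
layers = {
    "ethernet": frame[0:14],
    "ip": frame[14:34],
    "tcp": frame[34:66],
    "payload": frame[66:],
}
for name, chunk in layers.items():
    print(f"{name:8} {len(chunk):3} bytes, starting with {chunk[:4].hex()}")
```

The payload slice starts exactly at "GET / HTTP/1.1", which confirms that 14 + 20 + 32 = 66 bytes of headers come before the HTTP request.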
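The header fields can also be decoded programmatically. The sketch below uses Python's `struct` module on the first 66 bytes of the dump, following the IPv4 and TCP header layouts; `mac` and `decode` are names I made up, and the code assumes IPv4 over Ethernet without VLAN tags. The header lengths are not hard-coded: the IP header length comes from its IHL field (low 4 bits of the first IP byte, counted in 32-bit words), and the TCP header length from its data offset field (high 4 bits of byte 12 of the TCP header), so the payload length can be computed from the IP total length.

```python
import ipaddress
import struct

# First 66 bytes of the frame above: all three headers, no payload.
HEADERS = bytes.fromhex(
    "0018 39c9 a8a2 0019 e309 2505 0800"                 # Ethernet: 14 bytes
    "4500 01f2 0be7 4000 4006 61aa c0a8 0104 5bc6 ae02"  # IP: 20 bytes
    "cff4 0050 4e86 55c8 cd6e c021 8018 ffff d994 0000"  # TCP: 20 fixed bytes
    "0101 080a 29b5 ca70 4f40 2bd2"                      # TCP options: 12 bytes
)

TCP_FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
             (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]


def mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode(h: bytes) -> dict:
    # Ethernet: destination MAC, source MAC, 2-byte ethertype
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", h[:14])

    # IP: first 20 bytes of the header (this packet has no IP options)
    (ver_ihl, _tos, ip_total_len, _ident, _frag, ttl, proto,
     _csum, src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", h[14:34])
    ip_hdr_len = (ver_ihl & 0x0F) * 4

    # TCP: starts right after the IP header; unpack its 20 fixed bytes
    t = 14 + ip_hdr_len
    (sport, dport, _seq, _ack, offset, flags, window,
     _tcsum, _urg) = struct.unpack("!HHIIBBHHH", h[t:t + 20])
    tcp_hdr_len = (offset >> 4) * 4

    return {
        "dst_mac": mac(dst_mac),
        "src_mac": mac(src_mac),
        "ethertype": f"0x{ethertype:04x}",
        "src": f"{ipaddress.IPv4Address(src_ip)}:{sport}",
        "dst": f"{ipaddress.IPv4Address(dst_ip)}:{dport}",
        "ttl": ttl,
        "proto": proto,
        "ip_hdr_len": ip_hdr_len,
        "tcp_hdr_len": tcp_hdr_len,
        "flags": "+".join(name for bit, name in TCP_FLAGS if flags & bit),
        "window": window,
        # payload = whole IP packet minus the IP and TCP headers
        "payload_len": ip_total_len - ip_hdr_len - tcp_hdr_len,
        # frame = Ethernet header + whole IP packet
        "frame_len": 14 + ip_total_len,
    }


for key, value in decode(HEADERS).items():
    print(f"{key:12} {value}")
```

Running it should give the same values as tcpdump's summary: 192.168.1.4:53236 to 91.198.174.2:80, TTL 64, protocol 6 (TCP), flags PSH+ACK (shown as [P.] by tcpdump, where the dot stands for ACK), and a 446-byte payload in a 512-byte frame.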

This was just a brief overview of the protocols used on the internet; there is of course much more to say about each of them, which I will do in future articles ;-)
