The File Transfer Protocol
Created: 18th August 2010
FTP (File Transfer Protocol) is one of the oldest protocols still in use on the Internet today. First drafted as an RFC back in 1971 (but not really practical until 1980), it remains a popular choice for managing web servers and hosting large file stores. The aim of this guide is to shed some light on the inner workings of FTP, and on how it separates control commands from actual data transfers.
At a Glance
FTP does exactly what it says on the tin. Its primary purpose is to move files between remote systems, using a simple command-like structure. Initially FTP was incredibly simple but, like most protocols, it has undergone many changes and enhancements. The main implementation of FTP we see today is based on RFC 959 (http://tools.ietf.org/html/rfc959), which provides for the majority of users' needs. FTP works over TCP, typically on port 21. As this guide shows, though, very little data is actually transferred over port 21, since it is just a control port (used to issue control commands).
FTP is a client/server based protocol. The FTP client issues FTP commands; the FTP server listens for them and executes them. Both can send and receive files. Among the many reasons why people choose to use FTP, the following stand out:
- It works across different file systems; an FTP server running Linux (Ext3) can serve files to a Windows machine (running NTFS).
- It uses TCP, therefore providing a reliable transfer mechanism.
- It's able to adapt to allow transfers through firewalls by means of active/passive mode.
Perhaps above all, there is an abundance of good free FTP clients and servers out there, from small lightweight ones to complex, fully featured paid-for packages. It's common knowledge that FTP uses port 21, but what many may not know is that this is only a control port, meaning FTP uses it to send the commands required for operation. The actual data is transferred over a separate TCP socket.
The Control Port
When your client establishes a connection to the FTP server, it traditionally targets port 21. The FTP server listens on this port, sends its initial banner and (if configured to do so) an authentication prompt, and then processes the commands sent by the client.
The control port handles all requests to the server. Only the FTP client may issue commands; the server simply responds. A common example, logging into the FTP server using the Windows FTP client (run from the command line) and executing the 'pwd' command (print working directory), would look as follows:
220 Ubuntu FTP Server (Version 6.4/OpenBSD/Linux-ftpd-0.17) ready.
USER john
331 Password required for john.
PASS pass123
230 User john logged in.
XPWD
257 "/home/john" is current directory.
QUIT
221 Goodbye.
Like SMTP, FTP is also a polite protocol which says goodbye.
In this conversation, we can see that the initial connection to port 21 triggered the server to respond with a 220 response. Like HTTP, FTP uses numerical codes to indicate the nature of the response. '220' indicates that the service is ready for a new user login.
Note:
The full list of FTP response codes, including the commonly used ones, can be found in RFC 959 (http://tools.ietf.org/html/rfc959).
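To make the response codes a little more concrete, here is a minimal Python sketch that splits a single-line reply into its code and text, and labels it by its first digit using the reply classes described in RFC 959. The function name parse_reply and the dictionary REPLY_CLASSES are made up for this illustration, and multi-line replies are deliberately not handled.

```python
# First digit of an FTP reply code -> general meaning (as grouped in RFC 959).
REPLY_CLASSES = {
    1: "positive preliminary",
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def parse_reply(line):
    """Split a single-line reply such as '230 User john logged in.'

    Returns (code, text, reply_class). Assumes the simple
    'NNN text' form; multi-line replies are not handled here.
    """
    code_text, _, text = line.strip().partition(" ")
    if len(code_text) != 3 or not code_text.isdecimal():
        raise ValueError("not an FTP reply line: %r" % line)
    code = int(code_text)
    return code, text, REPLY_CLASSES.get(code // 100, "unknown")


if __name__ == "__main__":
    for reply in ("220 Ubuntu FTP Server ready.",
                  "331 Password required for john.",
                  "230 User john logged in."):
        print(parse_reply(reply))
```

Seen this way, '331' is an intermediate reply (the server wants more input, in this case the password), while '220' and '230' both report that a step completed successfully.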
In the conversation above, the lines beginning with a three-digit code are the server's responses, and the remaining lines are the client's commands. Most commands are self-explanatory: the client issues the 'USER' command followed by the username. Assuming the user is valid, the server responds requesting further authentication. The client then issues the 'PASS' command, with the password in plain text. This is intentional: FTP makes no attempt to mask or encrypt its traffic, and is therefore an insecure protocol.
Finally, when the user is logged in, we get the '230' response code stating all is well, and that the server is ready for new commands.
When we issued the 'pwd' command on the command line, the FTP client sent the 'XPWD' command. This command isn't technically part of the FTP RFC; it comes from RFC 775, "Directory oriented FTP commands", which extends the directory commands to account for UNIX-style directory structures. Almost all modern FTP clients and servers support it. However, you could also send the standard 'PWD' command to get the same result (as FileZilla Client does).
All commands go over the control port, and the server responds to each with a numerical code and a text string stating whether the command was successful.
A good way to learn the commands without actually transferring data is to telnet to the FTP server, which allows you to enter raw commands by hand. You can telnet to the FTP server on port 21 and issue the commands above to gain access to it (obviously using your own account details).
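If you would rather script the conversation than type it, the same raw exchange can be sketched in Python with a plain TCP socket. This is only a rough illustration: the function names are made up, it assumes every reply fits on a single line, and the host name in the usage note is a placeholder rather than a real server.

```python
import socket


def read_reply(reader):
    """Read one single-line reply and return (code, text)."""
    line = reader.readline().decode("ascii", errors="replace").rstrip("\r\n")
    return int(line[:3]), line[4:]


def send_command(sock, reader, command):
    """Send one raw command (terminated by CRLF) and return the reply."""
    sock.sendall(command.encode("ascii") + b"\r\n")
    return read_reply(reader)


def login_and_pwd(sock, user, password):
    """Replay the USER / PASS / PWD / QUIT conversation shown earlier.

    'sock' must already be connected to the server's control port.
    Returns the list of (code, text) replies, starting with the banner.
    """
    reader = sock.makefile("rb")
    replies = [read_reply(reader)]  # the server speaks first (220 banner)
    for command in ("USER " + user, "PASS " + password, "PWD", "QUIT"):
        replies.append(send_command(sock, reader, command))
    return replies
```

To try it, you might connect with socket.create_connection(("ftp.example.com", 21)), where ftp.example.com stands in for your own server, and pass the resulting socket to login_and_pwd along with your own account details. Remember that the password crosses the network in plain text, exactly as it does with telnet.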
The Data Port – Putting the 'T' in FTP
FTP wouldn't be much use if it just issued commands; we want file transfers, dammit! That's where the data port comes in. The data port doesn't issue or receive commands; it simply transfers the bytes requested by the commands on the control port. The FTP client/server software keeps track of which data ports are open, and can open more than one if needed, but traditionally there is a single data connection, with the server using port 20.
Before a transfer can take place, the client and server need to agree on how the data connection will be opened. This can be done in one of two ways: active or passive.
Active FTP
Active FTP means that the FTP client takes on the role of the 'server' for the data connection. Before the client requests a file, it may send the PORT command, signalling that it intends to use active FTP. The PORT command tells the server which address and port on the FTP client's host to connect to in order to send the data.
With active FTP, the client issues the 'PORT' command, followed by its own IP address and its port number. Both are written as comma-separated decimal bytes: four for the IP address, then two for the port (high byte first). An example would be:
PORT 192,168,1,30,19,142
This command states that the FTP client is listening on port 5006, using the IP address of the client, 192.168.1.30. Where does 5006 come from? A quick way to work out the port number is to apply the following method to the last two numbers: (19 * 256) + 142 = 4864 + 142 = 5006.
So you simply multiply the first number by 256, then add the second number.
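The arithmetic is easy to check in code. The following Python sketch converts between an address/port pair and the six comma-separated numbers used by PORT; the two function names are invented for this example.

```python
def encode_port_argument(ip, port):
    """Turn ('192.168.1.30', 5006) into '192,168,1,30,19,142'."""
    high, low = divmod(port, 256)  # high byte first, then low byte
    return ",".join(ip.split(".") + [str(high), str(low)])


def decode_port_argument(argument):
    """Turn '192,168,1,30,19,142' back into ('192.168.1.30', 5006)."""
    fields = [int(field) for field in argument.split(",")]
    if len(fields) != 6 or any(not 0 <= field <= 255 for field in fields):
        raise ValueError("expected six numbers between 0 and 255")
    ip = ".".join(str(field) for field in fields[:4])
    port = fields[4] * 256 + fields[5]  # multiply by 256, add the second
    return ip, port


if __name__ == "__main__":
    print(encode_port_argument("192.168.1.30", 5006))
    print(decode_port_argument("192,168,1,30,19,142"))
```

Run against the example from this guide, the first call gives back the same six numbers seen in the PORT command, and the second recovers port 5006.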
Now that the client is listening on this port, the server establishes a connection to it and transfers the data.
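The client's half of this can be sketched in a few lines of Python. This is a simplified illustration, not how any particular FTP client is written: the function name is made up, the listener is bound to the loopback address by default, and the operating system is left to pick a free port.

```python
import socket


def open_active_listener(local_ip="127.0.0.1"):
    """Open a listening socket and build the matching PORT command.

    Returns (listener, command). The caller would send 'command' over
    the control connection, then call listener.accept() to receive the
    server's incoming data connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((local_ip, 0))  # port 0 lets the OS choose a free port
    listener.listen(1)
    ip, port = listener.getsockname()
    fields = ip.split(".") + [str(port // 256), str(port % 256)]
    return listener, "PORT " + ",".join(fields)


if __name__ == "__main__":
    listener, command = open_active_listener()
    print(command)
    listener.close()
```

Because the server is the one dialling in, the listener has to be reachable from the server's side of the network, which is not the case on the loopback address used here for demonstration.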
Passive FTP
Passive FTP is the opposite mode, where the FTP client retains the role of a client and requests that the server open a data port, which the client then connects to in order to transfer the file.
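In RFC 959 the client makes this request with the PASV command, and the server answers with a 227 reply carrying its address and port in the same six-number form that PORT uses. A rough Python sketch for pulling the address out of such a reply follows; the function name and the sample address are made up, and the exact wording of the 227 text varies between servers.

```python
import re

# Six comma-separated numbers inside parentheses: h1,h2,h3,h4,p1,p2
PASV_PATTERN = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = PASV_PATTERN.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a usable 227 reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(number) for number in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # same rule as the PORT command
    return host, port


if __name__ == "__main__":
    # Hypothetical reply; the address and port are placeholders.
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)"))
```

The client would then open an ordinary outgoing TCP connection to that host and port, which is why this mode tends to get along with firewalls on the client's side.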
Originally, FTP preferred to use PORT mode. However, that mode requires the server to initiate a connection to the client, something which firewalls generally don't allow these days, so PASSIVE mode is more acceptable in most modern networks. A common problem is using PORT mode behind a firewall, which results in the incoming server connection being blocked. If you're having problems getting FTP to work correctly, switch to PASSIVE mode.
FTP is really easy to set up and use. For Windows, I highly recommend FileZilla's free FTP server and client software; for Linux, ftpd and ProFTPd are two fully functional FTP server daemons.