Thursday, September 15, 2005

How do download managers work?

I have often wondered how download managers work. I wanted to know the internal workings that make it possible for these programs to download faster.

The main functions of a download manager are:
• Resuming interrupted downloads (i.e., downloading only the rest of the file instead of restarting the process from the very beginning);
• Scheduled operation: connecting to the Internet, downloading a list of specific files and disconnecting according to a user-defined schedule (e.g. at night when the connection quality is usually higher, while the connection rates are lower);
• Some download managers have additional functions: searching for files on WWW and FTP servers by name, downloading files in several "streams" from one or from different mirror servers, etc.

Most download managers use the concept of "Multi-connection downloading" - the file is downloaded in several segments through multiple connections and reassembled at the user's PC.
To understand how this would work, we first need to understand a feature of web-servers.
A lot of web servers (HTTP and FTP) today support the "resume download" function: if your download is interrupted or stopped, you can resume downloading the file from where you left off. But the question now arises: how does the client (the web browser) tell the server what part of the file it wants, or where to resume the download? Is this a standard, or is it proprietary to each server? I was surprised to find out that the HTTP protocol itself has support for "range downloads", i.e. when you request a resource, you can also say which portion/segment of the resource you want to download. This information is passed from the client as an HTTP header:

See the HTTP header snippet below:

GET http://lrc.aiha.com/English/Training/Dldmgrs-Eng.pdf?Cache HTTP/1.1
Host: lrc.aiha.com
Accept: */*
User-Agent: DA 7.0
Proxy-Authorization: Basic bmFyZW5kcjpuYXJlbjEyNDM=
Connection: Close
Range: bytes=0-96143

What download managers do is start a number of threads, each of which downloads a different portion of the resource. So the download manager will make another request with the header:

GET http://lrc.aiha.com/English/Training/Dldmgrs-Eng.pdf?Cache HTTP/1.1
Host: lrc.aiha.com
Accept: */*
User-Agent: DA 7.0
Proxy-Authorization: Basic bmFyZW5kcjpuYXJlbjEyNDM=
Connection: Close
Range: bytes=96143-192286

This solves the mystery of how the download managers are able to simultaneously download different portions of the resource.
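To make the segment arithmetic concrete, here is a minimal sketch of how a client might compute the byte ranges for its connections. The function names are my own, and the total size of 192287 bytes is an assumption inferred from the last byte offset in the headers above. HTTP byte ranges are inclusive at both ends, so in this sketch each segment starts one byte after the previous one ends.

```python
def split_ranges(total_size, parts):
    """Return inclusive (start, end) byte offsets that cover total_size bytes."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty segments
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` segments take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format the value of an HTTP Range header for one segment."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # 192287 is an assumed file size, not a value taken from the server.
    for start, end in split_ranges(192287, 2):
        print("Range:", range_header(start, end))
```

Each thread would then send its own GET request carrying one of these Range values.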

Important note: To resume interrupted downloads, it is not enough to use a download manager: the server from which the file is being downloaded should support download resumption. Unfortunately, some servers do not support this function, and are called “non-resumable.”
With such servers your download manager won't help (no increase in speed either), as they ignore the HTTP "Range" header.
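A client can tell from the response whether the server honoured the Range header: a server that does so answers with status 206 (Partial Content) and a Content-Range header, while a server that ignores it answers with a plain 200 and the whole file. The sketch below only inspects a status code and a header dictionary; the function names are hypothetical and no real request is made.

```python
def advertises_ranges(headers):
    """True if the server announced byte-range support via Accept-Ranges."""
    names = {key.lower(): value for key, value in headers.items()}
    return names.get("accept-ranges", "").strip().lower() == "bytes"


def server_honoured_range(status_code, headers):
    """True if the response is a partial one (206 plus Content-Range)."""
    names = {key.lower(): value for key, value in headers.items()}
    return status_code == 206 and "content-range" in names


def resume_header(bytes_already_saved):
    """Range value asking for everything after the bytes already on disk."""
    if bytes_already_saved < 0:
        raise ValueError("bytes_already_saved must not be negative")
    return f"bytes={bytes_already_saved}-"


if __name__ == "__main__":
    # Example values only; a real client would read these from the response.
    print(server_honoured_range(206, {"Content-Range": "bytes 96144-192286/192287"}))
    print(server_honoured_range(200, {"Content-Length": "192287"}))
    print("Range:", resume_header(96144))
```

If the check fails, the safe fallback is a single connection that starts again from byte 0.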

But I was still confused about how exactly this increases the speed. After all, if the available bandwidth is already "fully utilized" by one connection, how does making more connections help? The answers I found on the net are below:

Normal TCP connections, as used by HTTP, encounter a maximum connection throughput well below that of the available bandwidth in circumstances with even moderate amounts of packet loss and signal latency (because of packet loss and latency, some packets have to be retransmitted and the connection slows itself down). Multiple TCP connections can help to alleviate these effects and, in doing so, provide faster downloads and better utilization of available bandwidth.
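One way to put rough numbers on this is the well-known approximation by Mathis et al., which bounds the steady-state throughput of a single TCP connection by about (MSS / RTT) * (C / sqrt(p)), where p is the packet loss rate and C is roughly 1.22. This model is not from the answers quoted above; I am using it here only as an illustration, and the link speed, MSS, RTT and loss values are assumed examples.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate, c=1.22):
    """Approximate upper bound on one TCP connection's throughput, in bits/s."""
    if rtt_seconds <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_seconds must be > 0 and loss_rate in (0, 1]")
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


def aggregate_bps(per_connection_bps, connections, link_bps):
    """N connections add up, but never beyond the capacity of the link."""
    return min(per_connection_bps * connections, link_bps)


if __name__ == "__main__":
    # Assumed example: 1460-byte MSS, 100 ms round trip, 1% loss, 10 Mbit/s link.
    single = mathis_throughput_bps(1460, 0.100, 0.01)
    for n in (1, 4, 8):
        total = aggregate_bps(single, n, 10_000_000)
        print(f"{n} connection(s): about {total / 1e6:.2f} Mbit/s")
```

Under these assumed numbers a single connection stays well under the link capacity, which is the gap that extra connections fill.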

Opening more connections also means getting a larger share of the server's bandwidth. Web servers are typically set up to split their bandwidth into several streams to support as many downloading users as possible. As an example, if the download manager creates eight connections to the server, the server thinks it is transmitting to eight different users and delivers all eight streams to the same user. Each of the eight requests asks for data starting at a different location in the file.
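The eight-connection idea, including the reassembly on the user's PC, can be simulated without any network at all. In the sketch below, fetch_range is a hypothetical stand-in for one ranged HTTP request and FAKE_FILE stands in for the remote file; a real download manager would issue GET requests with Range headers instead.

```python
from concurrent.futures import ThreadPoolExecutor

# 10,240 bytes of sample data standing in for a file on a remote server.
FAKE_FILE = bytes(range(256)) * 40


def fetch_range(start, end):
    """Hypothetical stand-in for one ranged request (inclusive offsets)."""
    return start, FAKE_FILE[start:end + 1]


def download_in_segments(total_size, connections):
    """Fetch the file in parallel segments and reassemble it in order."""
    segment = -(-total_size // connections)  # ceiling division
    ranges = [
        (start, min(start + segment, total_size) - 1)
        for start in range(0, total_size, segment)
    ]
    buffer = bytearray(total_size)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # Each worker plays the role of one connection to the server.
        for start, data in pool.map(lambda r: fetch_range(*r), ranges):
            # Write every segment at its own offset, whatever the arrival order.
            buffer[start:start + len(data)] = data
    return bytes(buffer)


if __name__ == "__main__":
    result = download_in_segments(len(FAKE_FILE), 8)
    print("reassembled correctly:", result == FAKE_FILE)
```

Writing each segment at its own offset is what lets the pieces arrive in any order and still produce the original file.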

Here are the links to some good download managers:

www.getright.com
www.internetdownloadmanager.com/
www.netants.com
www.alwaysfreeware.co.uk/dload.html