The Greased Turkey Document [1]
or
How to set up a load-sharing server

Release History: 0.01alpha - Rob Thomas - rob@rpi.net.au [Bootstrap of the documentation]

This document was written with [homepage link] ippvs version 0.5 and Linux Kernel [kernel.org link] 2.0.35 in mind.

1: Overview

This document covers the basics of what ippvs does, how it works, and how to set it up. I expect it to expand to cover a decent man(8) page, and a FAQ.

2: What does it do?

ippvs is a kernel modification that offers NAT-style load sharing for multiple virtual servers. What we mean by this is that you have one 'listening' machine that transparently (and incredibly quickly) redirects client connection requests to other machines. The advantage of doing this is that it allows you to have huge arrays of redundant and load-sharing servers.
A good example of this (and the example that we will be following through this entire document) is the setting up of a cluster of load-sharing proxy servers, at a very, very low cost-per-tps rate. It's also perfectly suited to serving normal web traffic, or almost anything that can be served over TCP or UDP. The only caveat is that it will NOT work with ftp services, because ftp services are too smart for their own good. [quick overview of how ftpd tells the client which ip and port to connect to, and how that will break the NAT]
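
To make the ftp caveat a little more concrete: in passive mode, an FTP server answers with a '227' reply that spells out its own IP address and port inside the data stream, where header rewriting never reaches. The following is a minimal Python sketch, not part of ippvs. The reply string is made up for illustration and parse_pasv_reply is a hypothetical helper name. It shows the address a client would be told to connect to if the reply came from a clustered machine on the private LAN:

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from an FTP '227 Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a 227 passive-mode reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


# Illustrative reply from a clustered machine at 10.1.1.2 (assumed values).
reply = "227 Entering Passive Mode (10,1,1,2,39,15)"
ip, port = parse_pasv_reply(reply)
print(ip, port)
```

The client is handed a 10.x.x.x address that only exists on the private LAN, so its data connection has nowhere to go.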

3: How does it work?

In this document, as mentioned above, we will be going through how to set up an array of proxy servers, that appear to the clients as one physical machine. The first thing you should realise is how the machines should be wired together. [2]

                                    [ ---  HUB  --- ]
   [proxy server 1]<-eth0------------+ | | | | | | +--------eth0->[proxy server 4]
   [proxy server 2]<-eth0--------------+ | | | | +----------eth0->[proxy server 5]
   [proxy server 3]<-eth0----------------+ | | +------------eth0->[proxy server 6]
                                           | |
                                           | |
                                           | +--eth1->[ippvs server 0]<-eth0-------...local network...
                                           +----eth1->[ippvs server 1]<-eth0-------...local network...


[I realise that I use a -very- wide screen, so that'll probably look like crap on a 80x24 display - looks good on a 128x24 8)]

You should have a look at this map, and take notice of a few things:

1: The proxy servers are -not- connected to your LAN - they're on their own separate LAN
2: The machines are connected to the rest of the network THROUGH the ippvs server. Make sure their default route is set up that way

In this demonstration, the IP addresses of the machines are:
ippvs server 0:
   eth0:  203.1.1.2  [Machine's IP address]
   eth0:0 203.1.1.10 [Permanent load-sharing IP address]
   eth0:1 203.1.1.11 [Only up if ippvs1 dies - usually DOWN]
   eth1:  10.1.1.254 [Private LAN IP address - non-routable, as only the proxy servers see it]
   eth1:0 10.1.1.253 [Only up if ippvs1 dies - usually DOWN]
ippvs server 1:
   eth0:  203.1.1.3  [Machine's IP address]
   eth0:0 203.1.1.11 [Permanent load-sharing IP address]
   eth0:1 203.1.1.10 [Only up if ippvs0 dies - usually DOWN]
   eth1:  10.1.1.253 [Private LAN IP address - non-routable, as only the proxy servers see it]
   eth1:0 10.1.1.254 [Only up if ippvs0 dies - usually DOWN]
proxy server 1:
   eth0: 10.1.1.1 
   default route to 10.1.1.254
proxy server 2:
   eth0: 10.1.1.2 
   default route to 10.1.1.254
proxy server 3:
   eth0: 10.1.1.3 
   default route to 10.1.1.254
proxy server 4:
   eth0: 10.1.1.4 
   default route to 10.1.1.253
proxy server 5:
   eth0: 10.1.1.5 
   default route to 10.1.1.253
proxy server 6:
   eth0: 10.1.1.6 
   default route to 10.1.1.253
This looks a bit complex, but if you're not interested in setting up a fault-tolerant network you don't need the second ippvs server, or to have half the servers talking to one machine, and the other half talking to the other machine.

[XXX - I'm aware that no auto-failover exists, but it'll only be a few 'ping' scripts to make it work. - XXX]
[XXX - Should I take out the redundancy stuff until I write some more documentation for it? - XXX]

Let's track a packet that's coming from a client machine, to port 8080 on 203.1.1.10.

Header: Request connection to port 8080 on 203.1.1.10 from 203.2.3.4 port 9999

The first thing that happens is that ippvs0, which owns 203.1.1.10, looks at the headers and realises that port 8080 is set up as a load-sharing port. It picks a machine to send the packet to and scribbles over the headers, changing the DESTINATION address of the packet (the SOURCE address stays the same), and fires it out.

Header: Request connection to port 8080 on 10.1.1.2 from 203.2.3.4 port 9999

The machine 10.1.1.2 accepts the connection, and sends the data back:

Header: Connection accept, 203.2.3.4 port 9999, and here's the data, love from 10.1.1.2 port 8080.

The packet then heads back along the wire to the default route, which is ippvs0. The machine then glues the original headers back on and sends the packet on its merry way:

Header: Connection accept, 203.2.3.4 port 9999, and here's the data, love from 203.1.1.10 port 8080.

All the client sees is a normal connection to 203.1.1.10:8080, as though nothing magic was going on behind the scenes.
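
The walk-through above can be sketched as a toy model in Python. This is only an illustration of the header rewriting, not the kernel code. The class and method names are made up, the simple rotation used to pick a machine is an assumption (the real scheduling is not covered here), and the servers are listed with 10.1.1.2 first so that the trace matches the example above.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")


class VirtualServer:
    """Toy model of the header rewriting described above (not the kernel code)."""

    def __init__(self, virtual_ip, virtual_port, real_servers):
        self.virtual = (virtual_ip, virtual_port)
        self.real_servers = list(real_servers)
        self.next_index = 0
        self.connections = {}  # (client ip, client port) -> (real ip, real port)

    def inbound(self, packet):
        """Client -> cluster: rewrite the DESTINATION, leave the SOURCE alone."""
        if (packet.dst_ip, packet.dst_port) != self.virtual:
            return packet  # not a load-shared port, pass through untouched
        client = (packet.src_ip, packet.src_port)
        if client not in self.connections:
            # Simple rotation, purely for illustration.
            self.connections[client] = self.real_servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.real_servers)
        real_ip, real_port = self.connections[client]
        return packet._replace(dst_ip=real_ip, dst_port=real_port)

    def outbound(self, packet):
        """Cluster -> client: put the virtual address back as the SOURCE."""
        client = (packet.dst_ip, packet.dst_port)
        if self.connections.get(client) != (packet.src_ip, packet.src_port):
            return packet  # not part of a tracked connection
        return packet._replace(src_ip=self.virtual[0], src_port=self.virtual[1])


vs = VirtualServer(
    "203.1.1.10", 8080,
    [("10.1.1.2", 8080), ("10.1.1.1", 8080), ("10.1.1.3", 8080)],
)
request = Packet("203.2.3.4", 9999, "203.1.1.10", 8080)
forwarded = vs.inbound(request)
reply = Packet(forwarded.dst_ip, forwarded.dst_port, request.src_ip, request.src_port)
returned = vs.outbound(reply)
print(forwarded)
print(returned)
```

The connection table is what lets the reply be matched back to the virtual address. It is also why the reply has to pass through the same ippvs machine that handled the request.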


4: Wow. This rocks. How do I set it up?

The only 'setting up' is done on the actual ippvs server(s) - You need to pick out your IP addresses for your private LAN, obviously, and configure the machines. This document will pretend that you're using the IP addresses specified above - and there's no reason at all why you shouldn't. This is exactly what 10.x.x.x and 192.168.x.x are set aside for.
On ippvs0:
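  ipfwadm -F -a m -S 10.1.1.0/24 -D 0.0.0.0/0  - Masquerade traffic from the private LAN ('-a m' appends a rule with the masquerade policy; man ipfwadm does not describe the 'm' shorthand)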
  ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.1:8080  - Redirect _T_CP connections to 203.1.1.10:8080 to 10.1.1.1:8080
  ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.2:8080  - and 10.1.1.2:8080
  ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.3:8080  - and 10.1.1.3:8080

On ippvs1:
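  ipfwadm -F -a m -S 10.1.1.0/24 -D 0.0.0.0/0  - Masquerade traffic from the private LAN ('-a m' appends a rule with the masquerade policy; man ipfwadm does not describe the 'm' shorthand)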
  ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.4:8080  - Redirect _T_CP connections to 203.1.1.11:8080 to 10.1.1.4:8080
  ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.5:8080  - and 10.1.1.5:8080
  ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.6:8080  - and 10.1.1.6:8080

That's all you have to do. Now, when you try to make a connection to 203.1.1.10 or .11 on port 8080, it will be automatically, and invisibly, redirected to a random machine. There are various algorithms that are used to balance the load, which are out of the scope of this document at this stage of play.

5: Things you should be aware of that will bite you if you're not careful.

Always make sure that the default route of the clustered machines (the proxy servers here) points to the ippvs server.
0.5 supports tunneling, which I haven't played with, so therefore I don't know how it works yet 8-)
Always make sure that the default route of the clustered machines points to the ippvs server. (Yes, twice. Don't forget!)
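
Since the default route is the thing most likely to bite, it can be worth checking the address plan before touching any machine. The following is a small Python sketch under stated assumptions: the tables are copied by hand from the example addresses in section 3, the /24 netmask is assumed, and route_problems is a hypothetical helper, not something shipped with ippvs. It reports every clustered machine whose default route does not point back at the ippvs server that redirects to it.

```python
import ipaddress

PRIVATE_LAN = ipaddress.ip_network("10.1.1.0/24")  # assumed /24 netmask

# Private (eth1) address of each ippvs server, from the address plan.
IPPVS_PRIVATE = {"ippvs0": "10.1.1.254", "ippvs1": "10.1.1.253"}

# Clustered machine address -> its default route.
DEFAULT_ROUTES = {
    "10.1.1.1": "10.1.1.254",
    "10.1.1.2": "10.1.1.254",
    "10.1.1.3": "10.1.1.254",
    "10.1.1.4": "10.1.1.253",
    "10.1.1.5": "10.1.1.253",
    "10.1.1.6": "10.1.1.253",
}

# Which clustered machines each ippvs server redirects to.
REDIRECTS = {
    "ippvs0": ["10.1.1.1", "10.1.1.2", "10.1.1.3"],
    "ippvs1": ["10.1.1.4", "10.1.1.5", "10.1.1.6"],
}


def route_problems(ippvs_private, default_routes, redirects, lan=PRIVATE_LAN):
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    for name, targets in redirects.items():
        gateway = ippvs_private[name]
        for target in targets:
            if ipaddress.ip_address(target) not in lan:
                problems.append("%s is outside %s" % (target, lan))
            route = default_routes.get(target)
            if route != gateway:
                problems.append(
                    "%s is served by %s (%s) but routes via %s"
                    % (target, name, gateway, route)
                )
    return problems


for problem in route_problems(IPPVS_PRIVATE, DEFAULT_ROUTES, REDIRECTS):
    print(problem)
```

With the example addresses, nothing is printed. If ippvs1 redirected to 10.1.1.1 instead, the check would complain, because replies from that machine would head back through ippvs0.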

6: That hi-av thing looks cool. How does that work?

Hi-av isn't all that hard. When I get some time I'm going to whack together a couple of scripts and a database that can keep track of machines and automatically remove them from the redirection list, and have another machine (a la ippvs1) take over from a failed other ippvs. It's easy to do it manually. Switch ippvs0 off, run 'ifup eth0:1' and 'ifup eth1:0' on the other machine (if you have it set up that way) and then run the ippfvsadm commands that the other machine used to do, and it'll take over invisibly. Go look at the IP addresses above if you don't understand what I mean.
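
The manual takeover described above can be written down as a dry run. The following Python sketch only builds and prints the commands, in the order given above, for ippvs1 taking over from ippvs0. It runs nothing. takeover_commands is a hypothetical helper name, and the addresses are the example ones from section 3.

```python
def takeover_commands(alias_interfaces, virtual_address, real_servers):
    """List the commands the surviving ippvs machine would run, in order."""
    # Bring up the aliases that carry the failed machine's addresses first.
    commands = ["ifup %s" % alias for alias in alias_interfaces]
    # Then repeat the redirections the failed machine used to do.
    for real in real_servers:
        commands.append("ippfvsadm -A -t %s -R %s" % (virtual_address, real))
    return commands


commands = takeover_commands(
    ["eth0:1", "eth1:0"],
    "203.1.1.10:8080",
    ["10.1.1.1:8080", "10.1.1.2:8080", "10.1.1.3:8080"],
)
for command in commands:
    print(command)
```

Printing the list rather than running it keeps the sketch harmless. Detecting that ippvs0 has actually died (the 'ping' scripts mentioned earlier) is left out on purpose.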

Questions, comments and suggestions about this document, please, send to rob@rpi.net.au
The Virtual Server mailing list is currently hosted at linux-virtualserver@iinchina.net - to subscribe to the mailing list, send a message to 'majordomo@iinchina.net' with the message BODY (not subject) of 'subscribe' - it'll all be taken care of from there. Any messages sent to the list saying 'I'm not subscribed to this list, so can you email the reply to me privately' will be ignored, as it's very, very bad manners.

--Robert Thomas - 28/11/98

[1] - Kernel versions 2.1.129 and 2.1.130 have earned themselves the names of 'Greased Weasel' and 'Basted Turkey', due to some light-hearted banter of Linus Torvalds in the kernel release notes. This document was prepared over these two kernel revisions!
[2] - This is an 'optimal' diagram. There's no -physical- reason why the ippvs server, the clustered machines, and the clients can't be on the same segment. It's just nicer this way. Go buy a $50 hub. Trust us. It's better.