Load balancing with Nginx

I have a standalone web application running on a CentOS server in three instances on ports 8080, 8081, and 8082. All three instances are running fine and operating as expected on their own, but I cannot figure out how to configure Nginx to load balance between them. Here is what I currently have as the nginx config:

upstream myproject {
    ip_hash;
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
}

server {
    listen 80;
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://myproject;
    }
}

I am passing along the header information because I suspect that the Xojo server doesn’t like the request that nginx is sending. I am using ip_hash to keep all connections from one user going to the same instance so the sessions don’t break (I’m not sure if this is necessary, or the best way to do it). Anyhow, this is my first deployment where I am trying multi-instance load balancing, so if any of you gurus out there have pulled this off before, please let me know what you are doing.

Thank you in advance!

P.S. I know that the request is getting through to the Xojo app because I can see the load increase on the app when I hit Nginx - but Nginx never responds. It just hangs there doing nothing, as if it is forever waiting for a reply from Xojo.

I’m sorry, but I don’t see the point of “load-balancing” to the same server. Is this just an exercise?

Because Xojo is 32-bit, the amount of memory available to a WE app is limited; running multiple instances on the same server extends that. Once the new 64-bit compiler is available this may become irrelevant, but there is also the maximum concurrent connection limit, and I don’t know if that will be increased with 64-bit memory addressing.

I’ve done a similar thing, but wrote my own load-balancer in Xojo which also scaled across multiple servers.

Pushing updates to web pages also becomes more complicated when you can’t hold references to sessions on other instances.

Can you add a timeout and then check the logs? For example:
server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
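
For context, here is a minimal sketch of the upstream block with those parameters applied (the values are illustrative, not tested recommendations):

upstream myproject {
    ip_hash;
    # Take an instance out of rotation after 3 failed attempts,
    # then retry it after 30 seconds.
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8082 max_fails=3 fail_timeout=30s;
}

Failed attempts should then show up in the nginx error log (commonly /var/log/nginx/error.log, depending on your build).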

Off the top of my head, you’ll probably want to add a “proxy_buffering off;” line to the above proxy configuration. By default, Nginx tries to buffer responses when acting as a proxy, but that doesn’t work well for the kind of interactive communication that happens in a web app…
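
Applied to the config above, that would look something like this (a sketch only):

location / {
    proxy_buffering off;   # stream responses straight through instead of buffering
    proxy_set_header Host $host;
    proxy_pass http://myproject;
}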

This still does not make sense to me, Wayne. Having 3 apps in parallel in the same 32-bit limited space just divides that limited space by 3, which should not be a gain. Unless… each instance gets its own limited address space, in which case having more instances would be a gain.

The different applications will each have access to their own memory. You can sneak about 3 GB per 32-bit app, so 3 of them will allow you to use 9 GB of physical memory.

I think CPU matters even more than memory here. Xojo apps currently run on only one CPU core, so running multiple app instances allows you to utilize multiple cores; in that sense you are load-balancing among the available cores on a single server.

[quote=28808:@Jay Madren]Xojo apps currently run on only one CPU core.[/quote] Good point. Concurrent connections have always been the issue for me, but using multiple cores is an added bonus.

@John Joyce
Thinking out loud… This ip hash that you are creating… How is that going to behave with multiple connections from the same machine?

@Rick A. This is a proof of concept. The idea here is to manually spawn multiple child instances, each running on its own core, on an 8-core server. I am currently only trying three to start with.

If I can get it working I will do some tests and post the results back here if anyone is interested.

But besides that, once this base setup is complete, it is trivial to add server instances as well.

@Antonis Vakondios , @Travis Hill - Good ideas, thanks! I will try them out.

@Greg Olone, it should channel all connections from one machine to the same specific instance of the app… hopefully. I don’t know if it is really necessary, but it seems like it would be, in order to maintain sessions.

Not that easy, Wayne. To break the 4 GB limit, the CPU must support PAE (Physical Address Extension) in 32-bit mode, the OS must be prepared to use this feature, and YOUR app should pre-allocate a huge chunk of memory at startup (forcing the OS to choose an upper page, beyond the 4 GB limit, for the next process). If those 3 conditions are not all satisfied, it’s very likely that the 3 processes will be placed in the first 4 GB block. In fact, if PAE is not enabled for some reason, all memory above 4 GB is never used. But… thinking optimistically, you may be right. :wink:
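
If anyone wants to check their own box, here is a rough sketch (assuming a typical CentOS install; exact kernel naming varies):

[code]# Does the CPU advertise PAE?
grep -q pae /proc/cpuinfo && echo "CPU supports PAE"

# Is the kernel 64-bit (in which case PAE is moot) or a 32-bit PAE build?
uname -m    # x86_64 means a 64-bit kernel
uname -r    # on 32-bit CentOS a PAE kernel usually shows "PAE" here
[/code]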

Yes, it makes perfect sense for a multi-core system. On a multi-core machine a single Xojo app is simply ineffective nowadays, so it makes perfect sense to try to circumvent this Xojo limitation of not using all the available CPU power.

OK - so the results are in. This made a MAJOR difference in usability for any app that is at all likely to get temporarily locked up (processing a large file or something). The fix was what @Travis Hill had suggested - thanks, Travis. In the end, this works pretty much as I hoped: it allows you to run separate instances of standalone apps across different processors while still keeping the connections Xojo needs to maintain sessions.

Nginx Conf:

upstream myproject {
    ip_hash;
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    # add a line for any port you are running an instance on
}

server {
    listen 80;
    location / {
        proxy_buffering off;
        proxy_read_timeout 5m;
        proxy_pass http://myproject;
    }
}

ip_hash; is required to maintain sessions; the app fails to launch without it.

proxy_buffering off; is needed to keep data flowing from the app to the client through the proxy.

proxy_read_timeout 5m; keeps the connection open for the app’s “pings”. If left at the default, the app will disconnect after 60 seconds. I tried several settings between 1 and 10 minutes, and this was the lowest setting that worked consistently; the app remains open.
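
If you want to confirm that ip_hash is really pinning a client to one instance, one option (my own untested suggestion, but both directives are standard nginx) is to temporarily expose the chosen backend in a debug header:

location / {
    proxy_buffering off;
    proxy_read_timeout 5m;
    # Debug only: report which backend handled the request
    add_header X-Upstream $upstream_addr;
    proxy_pass http://myproject;
}

Repeated requests from one machine, e.g. curl -sI http://yourserver/ | grep X-Upstream, should then always show the same port; remove the header once you are satisfied.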

To start the app instances, I used Upstart. I am on CentOS, but this should also work on RHEL and possibly Fedora and many others. This is my method of choice because it automatically respawns instances when they fail, allows easy starting and stopping of the whole group, and can exec your app as a non-root user. If you are not familiar with Upstart, here are my app .conf files for your reference:

Upstart, first app instance:

[code]# /etc/init/myapp.conf

description "First instance of app"
author "john-joyce.com"
env LOG_FILE=/path/to/logfiles/myapp.log
env USER=

start on runlevel [2345]
stop on runlevel [016]
respawn

script
touch $LOG_FILE
chown $USER:$USER $LOG_FILE
exec su -s /bin/sh -c 'exec "$0" "$@"' $USER -- /path/to/myapp --port=8080 >> $LOG_FILE 2>&1
end script
[/code]

Upstart, additional instances:

[code]# /etc/init/myapp1.conf

description "Additional instance of app"
author "john-joyce.com"
env LOG_FILE=/path/to/logfiles/myapp.log
env USER=

start on starting myapp
stop on stopping myapp
respawn

script
exec su -s /bin/sh -c 'exec "$0" "$@"' $USER -- /path/to/myapp --port=8081 >> $LOG_FILE 2>&1
end script
[/code]

This is just how I did it; if you are launching 20 instances you might want to generate the files from a script rather than keep a file for each instance, but this was easy for just a few. You just duplicate the second conf file for as many instances as you want and change the port.
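
If you do end up with a lot of instances, something like this (an untested sketch - all paths and names are placeholders) could generate the extra conf files for you:

[code]#!/bin/sh
# Generate upstart conf files for additional instances 1-4
# (ports 8081-8084). Adjust paths, names, and the count to taste.
for i in 1 2 3 4; do
    port=$((8080 + i))
    cat > "/etc/init/myapp$i.conf" <<EOF
# /etc/init/myapp$i.conf
description "Additional instance of app"
env LOG_FILE=/path/to/logfiles/myapp.log
env USER=

start on starting myapp
stop on stopping myapp
respawn

script
exec su -s /bin/sh -c 'exec "\$0" "\$@"' \$USER -- /path/to/myapp --port=$port >> \$LOG_FILE 2>&1
end script
EOF
done
[/code]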

Then when you do ‘initctl start myapp’ (or stop/restart), all of the instances will start, stop, or restart together.
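
For reference, the whole group is controlled through the first job, since the additional instances follow it via their start on/stop on stanzas:

[code]sudo initctl start myapp     # starting the first job triggers myapp1, myapp2, ...
sudo initctl stop myapp      # stopping it stops the rest
sudo initctl status myapp
[/code]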

Observations:

This worked surprisingly well and I would recommend this kind of setup for any app that gets decent usage. Just opening and rendering an image file, I was easily able to get the CPU up to 100% with a single browser, and as most of you know, Xojo doesn’t spawn its own child processes or anything, so the instance is effectively locked up for anyone else trying to use it - even if you are running it on a 24-core server with plenty of memory. Just creating 2 additional instances makes a remarkable difference in performance in a multi-user situation.

In the screenshot above you can see 1 instance of the app maxed out and unavailable (CPU 11) while the others are on CPUs 18 and 19. Instance 1 is totally unresponsive even though there are still 20+ idle cores, but the instances on 18 and 19 are unaffected and running fine. I have to say I was surprised how easy it was to overload a single instance of the app.

It should be noted that because processes are grouped and routed by IP (the same process receives all requests from any one IP), users in an office behind NAT might all appear as one IP to the server and all be routed to the same instance, bypassing the load balancing for that office. I haven’t tested that, though.

Let me know if this helps anyone out there - hope it does.
John

Apologies for my ignorance, but how do you ensure the application runs on a different processor?

(Badly worded - I mean, how do you ensure each application instance runs on a different processor?)

…sigh…and of course, again, I mean core …

@david The OS starts them on different processors automatically in my experience (more or less at random), although you can force a specific ‘processor affinity’ using a command like taskset.
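
For example, a sketch of pinning three instances to specific cores with taskset (paths, ports, and core numbers are placeholders):

[code]# Pin each instance to its own core (cores 0-2 here)
taskset -c 0 /path/to/myapp --port=8080 &
taskset -c 1 /path/to/myapp --port=8081 &
taskset -c 2 /path/to/myapp --port=8082 &
[/code]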

Here I have started 5 copies of my app on different ports. The OS has selected the processors and they are all different. I have never seen it put more than one on a single processor so far.

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
500      29930 29926  0  3925 11736  13 13:26 ?        00:00:00 /home/johnjoyce/photos/myapp --port=8084
500      29931 29925  0  3925 11736  14 13:26 ?        00:00:00 /home/johnjoyce/photos/myapp --port=8083
500      29933 29927  0  3925 11736  15 13:26 ?        00:00:00 /home/johnjoyce/photos/myapp --port=8081
500      29935 29928  0  3925 11736   3 13:26 ?        00:00:00 /home/johnjoyce/photos/myapp --port=8082
500      29936 29929  0  3925 11732   1 13:26 ?        00:00:00 /home/johnjoyce/photos/myapp --port=8080
root     29943  7194  0  3342  1060   0 13:26 pts/2    00:00:00 ps -aF -u johnjoyce

The “PSR” column is the processor core number, and you can see above that the OS has picked 1, 3, 13, 14, and 15 automatically (this is a 24-core server).
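
A quick way to re-check the core assignments later (assuming the binary is named myapp):

[code]# PID, current core (PSR), and command for every instance
ps -o pid,psr,comm -C myapp
[/code]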

It’s OK, I am using processor and core interchangeably too… I am referring to cores here, if it matters.

Thank you very much John! :slight_smile:

I guess all instances connect to the same database. What database do you use? Is there any problem with that?

In this context, I guess using SQLite is not possible, and it must be a server database.