Tuesday, November 25, 2008

I switched from Squid to Sun Java System Web Proxy Server

I've been running Squid Web Proxy Cache for quite a while and also documented some basic setup in another article. But the last time we set up a server I decided to try Sun Java System Web Proxy Server. Since then, I have switched the remaining Squid servers to Sun's proxy and lived happily ever after.

Why? Well, Squid was giving me no problems, but setting it up and managing it was sometimes boring and error prone. Sun's Web Proxy Server has the (familiar) administrator's web interface and I practically never touch a configuration file by hand. Creating a basic setup is really a matter of clicking a couple of buttons and the proxy's up and running.

Installation is pretty straightforward. I downloaded the Sun Java Enterprise System and launched the installer. Once it was running, I just checked the Sun Java System Web Proxy Server and the installer did it all. The installer can also automatically create a proxy server with the default configuration values, which is a good starting point if you need one.

Creating a server.
This was easy too. I had to create two different web proxies because we're serving two subnets with different requirements. Once the installer finishes its work, you can connect to the administration console using the configuration values you provided during the installation:
  • administration port
  • admin password
Open your favorite browser and launch the console. Once you're in, you'll find yourself in the Server/Manage Server section:

Adding a server is pretty easy; it just asks you for (very) basic information:

Inspecting default configuration.
Once you're done with creating your server(s), you can inspect the default configuration with the Manage Servers/Preferences/View server settings option:

Configuring system preferences.
Using the Manage Servers/Preferences/Configure system preferences tab you can modify basic preferences for your proxy:

In this page you can set:
  • server user: by default, it's nobody, and it's a value I usually don't need to change.
  • processes: the number of background processes used to serve incoming requests.
  • listen queue size: the maximum number of pending connections on a socket.
  • request throttle: the number of concurrent transactions that the proxy can handle.
  • enable DNS: this is useful mostly for logging and for managing access control. If you enable DNS, the proxy will resolve IP addresses into host names.
There are other configurable options, many of which are useful if you plan to implement distributed caching, which I won't cover in this post.

Adding listen sockets.
The next thing you'll probably want to do is set up listen sockets, which are the endpoints of the proxy to which your clients will connect. If a default server was created for you during the installation, you'll probably want to edit the default port value for its listen socket:

Setting up cache properties.
The last thing you'll probably do to set up this basic web proxy server is configure the cache. You can start in the Manage Servers/Cache section of the admin application. The first panel is Set cache specifics, where you can set the most common properties for your cache.

The first thing I usually do is change the cache working directory. Remember that when you change the cache directory, the proxy user (in my case nobody) must be able to write into that directory; otherwise the cache won't work.
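
The permission requirement above can be checked before pointing the proxy at a new directory. What follows is only a minimal sketch, not something the proxy provides: the function names are made up for illustration, it looks only at the classic owner/group/other permission bits of the directory itself (no ACLs, no checks on parent directories), and it assumes a Unix system where the proxy user is called nobody.

```python
import os
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gids: set[int]) -> bool:
    """Return True if the permission bits let the user write into and enter a directory.

    Hypothetical helper: only the owner/group/other bits are considered.
    """
    if uid == owner_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return mode & needed == needed


def user_can_write_dir(path: str, user: str = "nobody") -> bool:
    """Apply can_write() to a real directory and a real account (Unix only)."""
    import grp
    import pwd

    account = pwd.getpwnam(user)
    # Primary group plus any supplementary groups that list the user.
    gids = {account.pw_gid} | {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}
    info = os.stat(path)
    return can_write(info.st_mode, info.st_uid, info.st_gid, account.pw_uid, gids)
```

Calling user_can_write_dir() with the path of the new cache directory (whatever you chose) tells you whether it is worth restarting the proxy yet or whether the ownership or mode of the directory has to be fixed first.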

Once you have chosen your favorite directory, you can set the cache capacity either with the provided drop-down list or via the Cache capacity configurator.

On this page you can also configure basic caching behavior for the HTTP, FTP and Gopher protocols. As far as the HTTP protocol is concerned:
  • Always check if the document is up to date: this option does exactly what it says. Every time a document is requested from the proxy, the proxy will check that the version it is caching is up to date. This may be useful in some circumstances but will raise the number of outgoing connections from the proxy server.
  • Check only if last check more than: if you choose this option, the proxy server will open a connection to check if the document is up to date only if the time elapsed since the last check is longer than the interval you specify. The default is two hours and, depending on the situation, I sometimes raise it to one entire day.
  • Using: this option controls how the proxy server checks if the document is up to date. You can choose either the last-modified factor, which estimates freshness from the Last-Modified header that the web server sends along with the document, or the explicit expiration information, which relies only on the expiration data attached to the document.
  • Never report accesses to remote server: this option tells the proxy server not to report cache hits to remote servers.
  • Report cache hits to remote server: this option tells the proxy server to report to the remote server the number of times a document has been hit in the cache and accessed from there. This option raises the number of outgoing connections from the proxy server and may hurt latency and performance.
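
To make the two up-to-date checks above more concrete, here is a minimal sketch of the decisions involved. It is not the proxy's actual code: the function names are hypothetical, the two-hour interval is the default mentioned above, and the last-modified heuristic shown (freshness lifetime as a fraction of the document's age when it was fetched) is the one commonly used by HTTP caches. That Sun's proxy computes it exactly this way, and the 0.1 factor, are assumptions.

```python
from datetime import datetime, timedelta


def needs_check(last_check: datetime, now: datetime,
                interval: timedelta = timedelta(hours=2)) -> bool:
    """'Check only if last check more than': revalidate only once the interval has elapsed."""
    return now - last_check > interval


def lm_factor_lifetime(fetched: datetime, last_modified: datetime,
                       factor: float = 0.1) -> timedelta:
    """Estimate how long a document stays fresh from how old it was when fetched.

    The 0.1 factor is an illustrative assumption, not a documented default.
    """
    return (fetched - last_modified) * factor
```

With the default interval, a document checked three hours ago triggers a new check and one checked an hour ago is served straight from the cache; raising the interval to a day trades fresher content for fewer outgoing connections.
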
Cache partitions.
The cache partitions are the parts of the disk reserved for caching purposes by the proxy server. You'll need to edit the cache partition properties if, for example, you raise the cache capacity and need to reserve more space on disk by adding a new cache partition.

In the previous screenshot the cache partition is 1.6 GB, which is the cache capacity I set up for this server. Adding a cache partition is trivial: you're only asked for the directory which will host the partition.

Set garbage collection.
As long as you use the proxy server, it will cache the documents you request and the cache will keep growing, while the allocated space is supposed to stay within the range specified by the caching configuration. Garbage collection is the process that cleans up documents from the proxy cache and must be performed periodically. By default, this property is set to Automatic. I observed in my proxy server instances that if the cache hits are high and you are caching big documents, even if garbage collection is automatic, it seems to never take place and the cache keeps growing. For this reason I suggest you plan and schedule regular garbage collection cycles. You may schedule them via the system cron or via the internal proxy scheduler. I usually use the system cron. Once you have chosen the manual configuration option, explicit garbage collection cycles can be scheduled in the Schedule garbage collection panel.
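
If you want to verify whether the cache really is outgrowing its capacity, a small script run from cron can measure it. This is only a sketch under my own assumptions: the function names are hypothetical, the cache is assumed to be a plain directory tree on disk, and the script only measures size, it does not trigger garbage collection.

```python
import os


def directory_size(path: str) -> int:
    """Sum the sizes, in bytes, of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file disappeared while we were walking the tree
    return total


def over_capacity(path: str, capacity_bytes: int) -> bool:
    """Return True if the files below path take more space than capacity_bytes."""
    return directory_size(path) > capacity_bytes
```

Comparing the result against the capacity you configured (1.6 GB on the server above) before and after a scheduled cycle shows whether garbage collection is actually reclaiming space.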

Caching configuration.
Other useful options you may want to set up can be found on the Set caching configuration panel. By default, the caching default is the derived configuration. If you want to explicitly set up every option, you can set cache as the caching default value. Once you have done that and pushed the OK button, a new form will appear:

The options you'll find usually are:
  • The cache default
  • How to cache pages that require authentication
  • How to cache queries
  • The minimum and maximum cache file sizes
  • When to refresh a cached document
  • The cache expiration policy
  • The caching behavior for client interruptions
  • The caching behavior for failed connections to origin servers
Two options that are often overlooked, and that might be pretty important for your proxy's performance, are the last two, which rule what happens when a proxy connection is broken. This may happen if, for example, your user exits the browser or cancels a connection: the proxy may continue downloading the entire file even if the client is not retrieving it any more, and this effect may add up when many clients are connected, leading to proxy saturation and loss of performance. I saw this happen many times, even with multimedia content such as Flash-based solutions that deliver content, like YouTube. For this reason, I usually set 100% for the caching behavior for client interruptions, which in effect has the proxy close the remote connection whenever a client disconnects.
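
The effect of that percentage can be pictured with a few lines of code. This is a sketch of how I understand the setting, not the proxy's implementation: the function name is hypothetical, and the exact comparison the proxy performs is an assumption.

```python
def keep_downloading(bytes_received: int, total_bytes: int,
                     threshold_percent: float) -> bool:
    """Decide whether to finish a download after the client has gone away.

    The download continues only if at least threshold_percent of the
    document has already been retrieved.
    """
    if total_bytes <= 0:
        return False  # unknown size: treat the download as not far enough along
    percent_done = 100.0 * bytes_received / total_bytes
    return percent_done >= threshold_percent
```

With a threshold of 100, any download that is still incomplete when the client disconnects is dropped, which matches the behavior described above; a lower value lets the proxy finish, and cache, documents that were almost complete.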

With just a few simple steps you've set up an enterprise-grade web proxy server. I suggest you check the official documentation at the Sun documentation center to fine-tune your setup and read about more advanced configurations such as connecting to an LDAP server to authenticate users, setting up SOCKS and setting up proxy arrays for distributed caching.

Now, enjoy your new proxy server!
