AdSense Mobile Ad

Showing posts with label apache. Show all posts
Showing posts with label apache. Show all posts

Wednesday, May 12, 2010

HTTP Compression: With Safari, Compress Just Text

As I've outline in another post I've been configuring a couple of internal Apache HTTP Server to use HTTP 1.1 response compression. Since these web server act as a front-end proxy to a good number of web applications deployed in distinct application server, that was the good place to centralize such a configuration without having to modify every server, one by one.

To my surprise, after applying the new configuration, I discovered that it wasn't working correctly with Safari (4.0.5). Every browser I could text (Firefox, Internet Explorer, Google Chrome, Opera) in a bunch of different operating systems (Solaris, GNU/Linux, Windows) were working correctly. Safari did not and the problem manifested as random blank pages, missing images on page, incorrect CSSs and so on.

I fiddled a while with the Apache configuration and at the end the only working solution was adding the following sad line to httpd.conf:

BrowserMatch Safari gzip-only-text/html

So sad.


Monday, May 10, 2010

Speeding Up Web Access and Reducing Traffic With Apache

One of the parameters that might affect our web sites response times that we, developers or system administrators, do have under our control is the size of the HTTP Response. This size may be reduced after careful analysis and engineering so that responses are non redundant and efficient. Nevertheless developers often forget that, just before our web server returns the HTTP response to our clients, there's one last thing that can be done in the case you're using at least HTTP/1.1 (which will almost invariably be the case): apply a compression algorithm.

Compression algorithms are everywhere and the HTTP protocol is no exception. Although you should carefully analyze your application's resource consumption to discover potential bottlenecks of your application,  users typically spend much of their time waiting for page to load. Images, scripts, embedded objects, the page markup: all of them contribute to a bandwidth usage that affect your web application response time. The same way you spare hard disk storage when you compress your images or your music with an appropriate compression algorithm, you'll spare bandwidth (and hence time) if you compress your responses.

A Short Introduction

Let's make a short introduction before going on. For compressed output to be understood by agents, coordination between the server and the browser must take place: that's why HTTP/1.1 formalized and standardized how and when compression can be used. Basically, servers and clients exchange information to determine whether compressed requests and responses can be used and, if both support a common algorithm, they use it. Most of the time this information exchange is made with the Accept-Encoding and Content-Encoding HTTP headers.

HTTP/1.1 specifies three compression methods that can be used: gzip, deflate and compress. Many clients and servers support gzip: notably, the Apache HTTP server does so. Others do support deflate although its usage by browsers is more quirky than gzip's.

gzip, that is surely known to UNIX users, will produce good compression rates for text: it's not uncommon to achieve compression rates of 70% and above when compressing text files, typical HTML markup or JavaScript code.

Configuring Your Apache Web Server

Configuring the Apache HTTP Server to compress its output is pretty easy. One of the things you should take into account is almost obvious: not every content type will compress well and compression has a cost. So, depending on the content served by your application, consider configuring your web server accordingly so that precious CPU cycles aren't wasted compressing something that should not be. Text, hence HTML markup, JavaScript, CSSs and so on will quite surely compress well. Compressed images such as JPEGs, PDFs, compressed multimedia files such as mp3, ogg, flac, will not.

Enabling mod_deflate

mod_deflate is a module, bundled with standard Apache 2 distibutions, that will provide the filter you need to compress your traffic. To enable mod_deflate you must modify your Apache configuration file accordingly. Open httpd.conf and verify that mod_deflate is enabled:

[...snip...]
LoadModule deflate_module libexec/mod_deflate.so
[...snip...]

Deciding When and What To Compress

The next choice you have to make is when and what to compress. Apache is pretty flexible and you can apply compression at distinct levels of your configurations such as, for example:
  • Apply it to everything.
  • Apply it at multiple <Location/> level.
  • Apply it at <VirtualHost/> level.

The "best" configuration will depending on how you're using your Apache HTTP server. If you're using your Apache HTTP Server as a proxy and to manage different virtual hosts, you might be interested on reducing configuration complexity:
  • Disable compression on every web server proxied by your front-end Apache server.
  • Configure compression on Apache by using appropriate <Location/> sections on at a virtual host level.

The last web server I configured for a client of mine acted as a proxy for a great number of virtual hosts. Since every virtual host was serving compressible content, we applied just one configuration at the / location:

<Location />
[...snip...]
# mod_deflate configuration here
</Location>

Take time in analyzing the characteristics of your traffic before blindingly turning on compression: you may save CPU cycles. Do remember to disable compression behind your Apache server: there's probably no point in compressing twice or more times, you should do it just before your response is sent to your clients.

An Example Configuration

Typical configuration will take into account:
  • Browsers non-compliant behaviors.
  • Content types not to compress.

The configuration we're running usually is more or less the same configuration exemplified in mod_deflate official documentation:

[...snip...]
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

BrowserMatch Safari gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
[...snip...]

A brief explanation of the configuration example is the following:
  • The first line sets the DEFLATE output filter.
  • The next four lines, beginning with the BrowserMatch directive, tells Apache to check its clients' browser version to solve some well-known quirks.
  • The sixth line is a regular expression to match the request URI with: if it matches, compression it's not applied. In this case, as you may see, common poorly compressible image formats are matched.
  • The last line tells Apache to append an additional header so that proxies will not deliver cached (compressed) responses to clients that cannot accept them.

Next Steps

Needless to say, that's just a basic configuration and much finer tunings can be done. One of the first thing you might want to tweak is using a better way to control which files will be compressed. Instead of using the SetEnvIfNoCase directive as shown in the example above, you could for example use the AddOutputFilterByType to register the DEFLATE filter and associated to the MIME Types of the files you want to compress. To do that, remove the SetOutputFilter directive from the example above and use the following instead:

[...snip...]
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
[...snip...]

and so on.

If you're managing many application with a variety of web and applications server on your boxes, consider using a front-end Apache to centralize such a configuration. Instead of configuring each of your servers you'll reduce your infrastructure complexity and improve its maintainability. If you want to know how to configure Apache Virtual Hosts, a previous blog post is a good starting point.

Wednesday, February 10, 2010

Setting up Apache SSL on Solaris 10

Solaris 10 is almost ready to run an SSL-secured Apache instance out of the box. What you really need is just the server certificate. The certificate, basically, contains the public key your clients will use to encrypt the communication with your SSL-secured server. If you're setting up a production site, chances are you already have a certificate from a trusted Certificate Authority. If you don't, go and get one. Instead, if you're running a non critical, internal or testing site, you can build a self-signed certificate and use it for your site.

Stop apache

Stop apache! ;)

# svcadm disable svc:/network/http:apache2

Enabling SSL

Solaris 10 uses SMF to manage its services and the bundled Apache is no exception. To enable SSL for the bundled Apache instance, you've got to modify the service configuration:

svccfg -s apache2 setprop httpd/ssl = boolean: 'true'

Creating a certificate


Safe harbor statement: This step, as explained in the introduction, will not generate a certificate suitable for production use.

Solaris 10 provides a bundled OpenSSL package which is just what you need to produce a self-signed certificate. The openssl binary is installed by default at /usr/sfw/bin/openssl.

To create the certificate, issue the following command:

$ openssl req -new -x509 -out server.crt -keyout server.key

When filling in the questions made by openssl, please note that the Common Name field must contain the name of the server you're creating the certificate for.

The server.key file produced in the previous step is a just a plain text file. If you want (I do) to protect your key with a passphrase, then launch openssl once more:

$ openssl rsa -des3 -in server.key -out server.key.crypt

You can now safely delete server.key and store server.key.crypt in a secure place. However, Apache won't start unless you type a pass phrase and can be a pain. I usually store the key with a very restrictive permission mask (400) and install it unencrypted. Another option you might use if you don't like letting the key unencrypted is using the SSLPassPhraseDialog directive in ssl.conf and built a script to output the pass phrase. Please note, however, that this method is not inherently more secure than leaving the key unencrypted.

Tell Apache where the certificate and the key are

To tell Apache where the certificate and the key are you have to use the

SSLCertificateFile
SSLCertificateKeyFile

directives. Solaris 10 ships with a functional /etc/apache2/ssl.conf file: edit the file and make sure the SSLCertificate* directive are pointing to your certificate and its key.

Reviewing your configuration

You're probably going to spend some minutes reviewing your ssl.conf file and learn about mod_ssl directive you'll find in there in case you need further customization.

Start Apache

Start the Apache service issuing a:

# svcadm enable svc:/network/http:apache2

and test your site with openssl:

$ openssl s_client -connect localhost:443 -state -debug

A note about virtual hosts

If you're using Apache name-based virtual hosts you might be thinking that the same mechanism applies for SSL-secured name-based virtual hosts. I'm sorry but the answer is no. Basically, SSL encapsulates HTTP and Apache won't be able to decide which host the request is directed to because there won't be any Host header before decrypting the communication, which can only be accomplished at the destination server. Moreover, Apache wouldn't be able to choose a certificate to decrypt the communication just because of the same reason: indeed, Apache will ignore multiple SSLCertificate* directives in <VirtualHost/> block and default to the first directive encountered.. If you're looking for more information on the subject, you can start here: Name-based VirtualHosts and SSL. Unless you can accept the restrictions outlined in this article, the only viable options to deploy SSL-secured virtual hosts are using IP-based (or port-based) virtual hosts.

Tuesday, November 17, 2009

Apache HTTP Server Virtual Hosts (on Solaris)

Some posts ago (Poor man's web redirection using a servlet filter) I described how I'm sending an HTTP Redirect Status Code back to a client. Such a solution was easy for me to implement because we're already running Java EE applications on our application servers and, on the other hand, we have no other web server available (if you're thinking about Apache HTTP Server). The previous post fails in pointing out that, implementing such a solution from scratch, is really overkill.

If you're one of the many users of Apache HTTP Server you should be aware of a functionality called Virtual Hosts. Virtual hosts let you run multiple web sites on a single Apache HTTP server instance and requests will be forwarded to the appropriate web site by using either the target IP address or the name you used to connect to the site. The last step will be configuring Apache so that such a Virtual Host will be served by proxying the destination server.

With such a proxy/gateway you'll be able, for example:
  • To serve different domains, subdomains or even specific URLs with just one Apache HTTP Server instance.
  • To offer a gateway in the case you've got a reduced number of public IPs and you don't want to publish HTTP services on ports other than 80.

DNS configuration

First of all I substituted the old DNS record with a CNAME which points to this Apache HTTP Server instance. Now, whenever a client requests www.domainA.com, the connection will be established with the target Apache.

Apache HTTP Server Startup

On (Open)Solaris, check if you've got an Apache HTTP Server instance running:
# svcs \*apache\*
STATE          STIME    FMRI
legacy_run     Mar_13   lrc:/etc/rc3_d/S50apache
disabled       Mar_13   svc:/network/http:apache2

If it isn't running, create a suitable configuration file in /etc/apache2:
# cp httpd.conf-example httpd.conf

Once the configuration file is created, the service should start normally:
# svcadm enable svc:/network/http:apache2
# svcs http:apache2
STATE          STIME    FMRI
online         Nov_15   svc:/network/http:apache2

Apache HTTP Server Configuration

The last thing to do is creating virtual hosts:

NameVirtualHost *

<VirtualHost *>
ServerName domainA.com
DocumentRoot /var/apache2/htdocs
</VirtualHost>

<VirtualHost *>
ServerName subdomain.domainA.es
ProxyPreserveHost On
ProxyPass / http://localhost:8083/
ProxyPassReverse / http://localhost:8083/
</VirtualHost>

In the previous fragment you can notice the following:
  • the NameVirtualHost directive lets you configure Apache to listen on a specific address and port. In this case, any IP address and any port (*) have been configured.
  • The VirtualHost sections let you define virtual hosts. Please note that the NameVirtualHost value and the VirtualHost value must be the same (in this case, *).
  • The ServerName directive is used to assign the domain name a virtual host should serve.
  • ProxyPreserveHost is used to tell Apache not to override the Host HTTP header when connecting to the proxied host.
  • ProxyPass and ProxyPassReverse lets you map proxied URL spaces. In this case, everything (/) is sent to the proxied host (http://localhost:8083/).

Further Readings
If you want to go into deeper detail, please read the following: