Wednesday, January 22, 2014

Clean and Optimise the ElasticSearch Indexes of Logstash

ElasticSearch index files grow large quickly and one of the most common questions about them is how to optimise them and clean them, getting rid of old records you're not interested in any longer. A very easy way to accomplish these tasks is using the following two scripts:
  • logstash_index_optimize.py
  • logstash_index_cleaner.py
The former optimises the indexes newer than the specified number of days, while the latter cleans the indexes older than the specified number of days. The complete synopsis of either command can be obtained using the -h option.
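To make the age-based selection concrete, here is a minimal sketch of how indexes older than a given number of days could be picked out. It is not the code of the scripts above: it assumes the indexes follow the daily logstash-YYYY.MM.DD naming pattern, and the function name and its today parameter are hypothetical.

```python
from datetime import date, datetime, timedelta

PREFIX = "logstash-"  # assumed prefix of the daily Logstash indexes


def indexes_older_than(names, days, today=None):
    """Return the index names whose date suffix is more than `days` days old."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    selected = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # not a Logstash index, leave it alone
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # unexpected suffix, skip rather than guess
        if day < cutoff:
            selected.append(name)
    return selected
```

Selecting the indexes newer than the cutoff, as the optimise script does, would only require inverting the final comparison.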

Installing the Dependencies

These scripts depend on the following components:
  • The Python runtime (at least version 2).
  • The pyes package.
The pyes package, in turn, can be installed using pip:

# pip install pyes

Beware that the ElasticSearch instance bundled with Logstash is not supported by the latest pyes release (0.90.x), which requires ElasticSearch 0.90. If you're using the ElasticSearch instance bundled with Logstash, you must install version 0.20.1:

# pip install pyes==0.20.1

Installation on FreeBSD

The FreeBSD ports collection ships all the required dependencies as binary packages. The Python runtime can be installed with the following command:

# pkg install python

Assuming Python 2.7 has been installed (as is the case in FreeBSD 9.2 and 10.0), pip can be installed using:

# pkg install py27-pip

Once pip is installed, it can be used to install pyes in a platform-independent way as explained in the previous section.

Running the Scripts

The simplest way to run the scripts is by:

  • Passing the --host option to specify the ElasticSearch server to connect to.
  • Passing the -d option to specify the desired number of days.

$ python /path/to/logstash_index_cleaner.py \
  --host es-host \
  -d 30

Given the periodic nature of these tasks, I usually schedule them as cron jobs in a crontab file.
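Since both scripts take the same two options, a small wrapper can run them in sequence from a single cron entry. The following is only a sketch: the script directory, the host name and the default day counts are placeholders to adapt, and build_command and run_maintenance are hypothetical helper names.

```python
import subprocess

SCRIPT_DIR = "/path/to"  # placeholder: directory holding the two scripts
ES_HOST = "es-host"      # placeholder: ElasticSearch server to connect to


def build_command(script, host, days):
    """Build the argument list for one script using the --host and -d options."""
    return ["python", "%s/%s" % (SCRIPT_DIR, script), "--host", host, "-d", str(days)]


def run_maintenance(optimize_days=2, clean_days=30):
    """Optimise recent indexes, then clean old ones; stop at the first failure."""
    jobs = (("logstash_index_optimize.py", optimize_days),
            ("logstash_index_cleaner.py", clean_days))
    for script, days in jobs:
        # check_call raises CalledProcessError if the script exits with an error,
        # so a failed optimisation prevents the cleaning step from running.
        subprocess.check_call(build_command(script, ES_HOST, days))
```

A script that calls run_maintenance() can then be scheduled in the crontab like any other command.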



Installing Logstash on FreeBSD

Some time ago, I decided to centralise all the logs generated by a client's production systems to a syslog server and, after assessing a bunch of products, we chose Logstash (now part of the ElasticSearch family) as the tool to organise the unstructured logs into meaningful data structures which can then be searched, filtered and exploited.

The platform chosen to run Logstash is FreeBSD (9.x at the time), a rock-solid and very well documented UNIX-like operating system. Besides, it also ships a production-ready ZFS implementation, which always comes in handy in the data center.

Installation

Installing Logstash on FreeBSD is very easy because a good Logstash port exists and the binary package can be installed with a one-liner:

# pkg install logstash

The FreeBSD package manager will take care of installing Logstash and all its dependencies.

If the OpenJDK is installed during this process, a warning will instruct the administrator to mount the fdesc and proc file systems, since they are required for the correct operation of the OpenJDK. To make the change permanent, the following lines can be added to /etc/fstab:

fdesc  /dev/fd  fdescfs  rw  0  0
proc   /proc    procfs   rw  0  0

To mount them once after this change, execute:

# mount -a

To check they are mounted correctly, the mount command can be used:

# mount
[...snip...]
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
[...snip...]
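The same check can be automated. The sketch below parses the text printed by the mount command and reports which of the two required file systems are missing. It assumes the output format shown above ("device on mount-point (type, options)"); the function name is hypothetical.

```python
# Required mount points and the file system type expected on each of them.
REQUIRED = {"/dev/fd": "fdescfs", "/proc": "procfs"}


def missing_mounts(mount_output):
    """Return the required mount points not found in the given mount output."""
    mounted = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Only consider lines shaped like "<device> on <mount point> (<type>...".
        if len(parts) >= 4 and parts[1] == "on":
            mounted[parts[2]] = parts[3].strip("(),")
    return sorted(point for point, fs in REQUIRED.items()
                  if mounted.get(point) != fs)
```

An empty list means both file systems are mounted with the expected type. The input can be obtained, for instance, by capturing the output of mount with the subprocess module.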

Configuration

The Logstash port performs a very good initial installation of Logstash and very few customisations are required in most cases.

Operation Mode

The Logstash service rc script is installed in /usr/local/etc/rc.d/logstash and supports three modes of operation (further information can be found in the official Logstash documentation):
  • standalone
  • agent
  • web
The operation mode can be set with the logstash_mode variable in the /etc/rc.conf file:

logstash_mode="standalone"

The default operation mode is standalone, which does the following:
  • It launches a local ElasticSearch instance.
  • It launches the Logstash agent.
  • It launches the bundled Kibana web interface.
A standalone Logstash service is the easiest way to bootstrap a fully functional Logstash server with an embedded ElasticSearch instance. A more complex setup, for example, could contemplate a separate ElasticSearch server.

Embedded ElasticSearch

If the standalone operation mode is used, then an embedded ElasticSearch instance is launched. This instance stores its index files in the directory specified by the logstash_elastic_datadir configuration variable which, by default, is /var/db/logstash.

If you use the embedded ElasticSearch instance, you may want to mount a separate disk on /var/db/logstash for easier management in case you want to dedicate more space to it as time passes. A ZFS dataset, in this case, is possibly the most flexible option available on FreeBSD (and other systems).

If you don't plan to store ElasticSearch indexes indefinitely, you're likely looking for a way to optimise them and remove older index entries.

Updating a Stale Logstash JAR

The Logstash port is not updated as often as Logstash is and you are likely going to get a pretty old version. Fortunately, the port lets you very easily run an updated Logstash binary. Beware of the following:
  • Changing a port's files may cause problems while upgrading the port.
  • Index files generated (or updated) by a newer ElasticSearch instance may not be backwards compatible, and rolling back a manual JAR upgrade may not be easy, or possible at all.
The Logstash service rc script uses the logstash_jar configuration variable to set the path of the Logstash JAR archive, the default value being

/usr/local/logstash/logstash-${version}-flatjar.jar

where ${version} is the version of the Logstash port.

An updated Logstash JAR archive can be downloaded and the value of the logstash_jar variable can be overridden in the /etc/rc.conf configuration file:

logstash_jar="/usr/local/logstash/logstash-1.3.3-flatjar.jar"

Always test a newer Logstash instance in a test environment.

Edited on June 20th: Since version 1.4, Logstash is no longer distributed as a single JAR file. I explored the tasks required to install the new version of Logstash on FreeBSD in a new blog post.

Adding Custom Java VM Options

Very often it's desirable to pass additional options to the Java VM but unfortunately the Logstash port service rc script doesn't allow you to do so. Unless, of course, you modify it.

I usually define a new, empty configuration variable called logstash_java_opts in the service rc file /usr/local/etc/rc.d/logstash (the added line is the one defining logstash_java_opts):

[...snip...]
: ${logstash_java_home="/usr/local/openjdk6"}
: ${logstash_java_opts=""}
: ${logstash_log="NO"}
[...snip...]

and then update the launch command (the added fragment is ${logstash_java_opts}):

[...snip...]
command_args="-f -p ${pidfile} ${java_cmd} ${logstash_java_opts} [...snip...]"
required_files="${java_cmd} ${logstash_config}"

run_rc_command "$1"

Now, you can override the value of the logstash_java_opts variable in /etc/rc.conf:

logstash_java_opts="-Xmx2048M"

Once again, beware of the consequences of changing the files of an installed binary package. In the meantime, I've asked the current port maintainer to consider a modification to support this use case.

Configuring Logstash

Logstash should now be configured according to your needs. The configuration file used by this package rc script is specified by the logstash_config variable whose default value is set by the rc script (only the relevant lines are shown):

name=logstash
: ${logstash_config="/usr/local/etc/${name}/${name}.conf"}

This value can be overridden by setting the logstash_config variable in the /etc/rc.conf configuration file.

Detailed information and working examples can be found in the official Logstash documentation.

Testing Logstash

To start the Logstash service to test its configuration, the following command can be used:

# service logstash onestart

To stop it, use:

# service logstash onestop

To enable the Logstash log, the logstash_log variable should be set to YES in the /etc/rc.conf file:

logstash_log="YES"

The log file location is specified by the logstash_log_file variable, whose default value is set by the service rc file (only the relevant lines are shown):

name=logstash
logdir="/var/log"
: ${logstash_log_file="${logdir}/${name}.log"}

The log file location can be overridden by setting the logstash_log_file variable in the /etc/rc.conf file.

Enabling the Logstash Service

When the configuration is correct, the service can be enabled so that it is started at system startup. To enable the Logstash service, add the following line to /etc/rc.conf:

logstash_enable="YES"

Further Reading

An updated version of this post was published on June 20th 2014 to describe the installation procedure of Logstash v. 1.4 (and greater) on FreeBSD.

Logstash (Up to at Least 1.4) Fails to Start on FreeBSD 10.0

Logstash (any version from 1.2.1 to 1.4.2) fails to start on FreeBSD 10.0 with the following exception:

Exception in thread "LogStash::Runner" org.jruby.exceptions.RaiseException: (NotImplementedError) stat.st_dev unsupported or native support failed to load
at org.jruby.RubyFileStat.dev_major(org/jruby/RubyFileStat.java:394)
at RUBY._discover_file(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:140)
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1617)
at RUBY._discover_file(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:122)
at RUBY.watch(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:34)
at RUBY.tail(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/tail.rb:58)
at RUBY.run(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/inputs/file.rb:125)
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1617)
at RUBY.run(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/inputs/file.rb:125)
at RUBY.inputworker(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/pipeline.rb:151)
at RUBY.start_input(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/pipeline.rb:145)

It turns out that a long-standing issue (affecting Solaris rather than FreeBSD) has existed for quite a while: https://logstash.jira.com/browse/LOGSTASH-665. As far as I can see, the same problem now affects FreeBSD 10.0. I added a comment on that issue and opened another one: https://logstash.jira.com/browse/LOGSTASH-1819.

For the moment, Logstash should be run on FreeBSD 9.2.

Edit: I've managed to patch Logstash to work on FreeBSD 10 and I've sent a pull request upstream. Until the pull request is merged and Logstash updated (which can take forever), you can use a new FreeBSD port I've created to install Logstash on FreeBSD 10. This port is meant to replace the outdated Logstash port in the FreeBSD ports collection. I'm in talks with the port maintainer and hopefully it should not take long.

Update and Workaround

Since the stack trace and the nature of the error itself suggested this is a JRuby bug rather than a Logstash one, I opened an issue (Issue #1754) on JRuby's GitHub repository. Kevin Menard quickly replied and put me on the right track by pointing me to several existing issues regarding a JRuby dependency (jnr-ffi) choking on the FreeBSD 10.0 libc.so ld script (libc.so is a symbolic link in earlier FreeBSD releases).

I couldn't try it until today, when Michael (no more details are given) left a comment on this blog post (see below) pointing to the same cause Kevin gave and suggesting that I try it in a FreeBSD jail using ezjail.

I confirm that running Logstash in a FreeBSD 10 jail where the existing libc.so is substituted with a symbolic link to the corresponding binary in /lib solves the problem and provides an easy-to-implement workaround to install the latest Logstash release in a FreeBSD 10 environment.
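Before and after applying the workaround, it can be useful to check what kind of file libc.so actually is. The following diagnostic sketch distinguishes a symbolic link from an ELF binary (recognised by its 4-byte magic number) and from anything else, such as a text ld script. The path /usr/lib/libc.so is an assumption, not taken from the discussion above, and the function name is hypothetical.

```python
import os

LIBC_PATH = "/usr/lib/libc.so"  # assumed location of libc.so, adjust if needed


def libc_so_kind(path=LIBC_PATH):
    """Classify path as 'symlink', 'missing', 'elf' or 'other' (e.g. an ld script)."""
    if os.path.islink(path):
        return "symlink"
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as handle:
        magic = handle.read(4)
    # Every ELF file starts with the bytes 0x7f 'E' 'L' 'F'.
    return "elf" if magic == b"\x7fELF" else "other"
```

Inside a jail where the workaround has been applied, the expected result is symlink; a result of other on a stock FreeBSD 10 system would be consistent with the ld script described above.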

Current Status

I wish to update this post since I've been asked many times about the status of this issue. It turns out that the problem with running Logstash on FreeBSD 10 seems to lie in a jnr-ffi bug for which pull requests have been sent at least three times.

I hope the pull request is finally included upstream. If you are waiting for this fix as well, please vote for it and make your voice heard.