
Showing posts with label elasticsearch. Show all posts

Friday, June 20, 2014

Installing Logstash v 1.4 (and Greater) on FreeBSD

In a previous post I described how to install Logstash (v. 1.3 and earlier) on FreeBSD. In this post I will describe how to install Logstash v. 1.4 and greater.

Up to and including version 1.3, Logstash was distributed as a single JAR file; version 1.4 introduced a new packaging style. As a consequence, new instructions are required to properly set up Logstash on FreeBSD and register it as a service. Further information about the new distribution layout can be found in the Logstash release notes.

As seen in the previous post, a Logstash FreeBSD port exists, but it is currently outdated since it bundles Logstash v. 1.2. While the port could be used as a starting point for JAR-based Logstash installations (as we have seen, the update process only required replacing the Logstash JAR), this is not possible with the new Logstash distribution because the rc script included in the port will fail to work.

Edit: I've managed to patch the broken Logstash dependency and I'm waiting for the pull request to be merged upstream. In the meantime you can use an updated FreeBSD port I've created to install Logstash on FreeBSD 10.

Edit: The Logstash port has been updated and I'm the new maintainer. Also, the pull request solving a JRuby bug on FreeBSD has been merged to upstream and will hopefully hit a Logstash release soon.

Skimming through the original post is recommended because it provides general information about Logstash and FreeBSD which is required to properly plan and execute a Logstash setup.

Prerequisites

The essential prerequisites required to execute Logstash are:
  • A Java runtime.
  • ElasticSearch.
The former is required because Logstash is a JRuby application while the latter, although not technically a requirement, is the recommended output for Logstash.

Installing Java

To install OpenJDK on FreeBSD you can use pkg to install a ready-to-use binary package:

# pkg install openjdk

Currently, this command will install an instance of OpenJDK v. 7 in both FreeBSD 9 and 10. If you'd rather install a different version, you can search the available packages and pick the one you prefer (command output has been filtered for brevity):

# pkg search openjdk
openjdk-7.60.19,1
openjdk6-b31_3,1
openjdk8-8.5.13_7
# pkg install openjdk8-8.5.13_7

Installing ElasticSearch

Logstash includes an embedded ElasticSearch instance you can use for standalone installations (see my previous post for an introduction to the Logstash operation modes). The configuration required to bootstrap the embedded ElasticSearch instance and to have Logstash use it as its output is described in the following sections.

Although simpler from the standpoint of the configuration, Logstash installations using separate ElasticSearch instances are out of the scope of this post.

Installing Logstash

The Logstash installation procedure is fairly simple since Logstash is distributed as a tarball:
  • Download Logstash from the official website.
  • Extract the tarball in the designated installation directory (my personal suggestion is to avoid /usr/local because it is used by ports and to use /opt instead).
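
The extraction step can be scripted along these lines. This is only a minimal sketch: install_tarball is a hypothetical helper name, and the tarball file name in the usage comment is an assumption. The download itself is left out because the exact URL depends on the release you pick.

```shell
#!/bin/sh
# Hypothetical helper: extract a previously downloaded Logstash tarball
# into an installation directory (e.g. /opt), creating it if needed.
install_tarball() {
    tarball="$1"   # path to the downloaded .tar.gz file
    dest="$2"      # installation directory, e.g. /opt

    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C change to directory first
    tar -xzf "$tarball" -C "$dest"
}

# Example (run as root; the file name is an assumption):
#   install_tarball /tmp/logstash-1.4.1.tar.gz /opt
```

With the example above, the distribution would end up in a directory such as /opt/logstash-1.4.1, which matches the default installation directory used by the rc script described in the next section.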

Creating an rc.d Script

An rc.d script is required in a BSD system to register a service, define its configuration and have the rc framework manage its lifetime. The following script can be used as is or as a starting point to customise your own. If used as is, be aware that the script uses the following default values:
  • Installation directory: ${logstash14_home="/opt/logstash-1.4.1"}
  • Configuration file path: ${logstash14_config="/usr/local/etc/${name}/${name}.conf"}
  • ElasticSearch data directory: ${logstash14_elastic_datadir="/var/db/logstash14"}
  • Java home: ${logstash14_java_home="/usr/local/openjdk6"}
You can override any of the supported configuration values in the /etc/rc.conf file. If, for example, you want to use an alternate Java home path, just add the following line to /etc/rc.conf setting the desired value:

logstash14_java_home="/usr/local/openjdk7"
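
The override works because of the way rc scripts conventionally assign their defaults: the settings from /etc/rc.conf are loaded first, and the `: ${variable="default"}` idiom then assigns a value only if the variable is still unset. The following sketch illustrates that behaviour outside the rc framework, reusing some of the defaults listed above; the rc.conf line is simulated by a plain assignment.

```shell
#!/bin/sh
name=logstash14

# Simulates the line added to /etc/rc.conf.
logstash14_java_home="/usr/local/openjdk7"

# Defaults as listed above: each assignment only takes effect when the
# variable has not been set already.
: ${logstash14_home="/opt/logstash-1.4.1"}
: ${logstash14_config="/usr/local/etc/${name}/${name}.conf"}
: ${logstash14_java_home="/usr/local/openjdk6"}

echo "home:      ${logstash14_home}"
echo "config:    ${logstash14_config}"
echo "java home: ${logstash14_java_home}"
```

Only the Java home differs from its default here, because it is the only variable that was set before the defaults were applied.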

Testing the Service

To test the Logstash service, the following command can be used:

# service logstash14 onestart

To stop it, use:

# service logstash14 onestop

To help troubleshoot any problem you might find, you can enable the Logstash log by setting the logstash14_log variable to YES in the /etc/rc.conf file:

logstash14_log="YES"

The log file location is specified by the logstash14_log_file variable, whose default value is set by the service rc file (only the relevant lines are shown):

name=logstash14
logdir="/var/log"
: ${logstash14_log_file="${logdir}/${name}.log"}

The log file location can be overridden by setting the logstash14_log_file variable in the /etc/rc.conf file.

Enabling the Service

Note that the rc script described above does not enable the Logstash service:

: ${logstash14_enable="NO"}

If everything works, you can enable the Logstash service by adding the following line to /etc/rc.conf:

logstash14_enable="YES"
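
To double-check the setting before the next reboot, a small helper can read the configuration file back. The sketch below is only an approximation of what the rc framework does: is_enabled is a hypothetical function that sources an rc.conf-style file in a subshell and accepts the same values as the checkyesno function of rc.subr (YES, TRUE, ON or 1, in any letter case).

```shell
#!/bin/sh
# Hypothetical helper: report whether <service>_enable is switched on in
# an rc.conf-style file. The file is sourced in a subshell so that it
# cannot alter the calling environment.
is_enabled() {
    (
        conf="$1"      # e.g. /etc/rc.conf
        service="$2"   # e.g. logstash14
        [ -r "$conf" ] || exit 1
        . "$conf"
        # Read <service>_enable, falling back to NO when it is not set.
        eval "value=\${${service}_enable:-NO}"
        case "$value" in
            [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}

# Example:
#   is_enabled /etc/rc.conf logstash14 && echo "logstash14 is enabled"
```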

Wednesday, January 22, 2014

Clean and Optimise the ElasticSearch Indexes of Logstash

ElasticSearch index files grow large quickly and one of the most common questions about them is how to optimise them and clean them, getting rid of old records you're not interested in any longer. A very easy way to accomplish these tasks is using the following two scripts:
  • logstash_index_optimize.py
  • logstash_index_cleaner.py
The former optimises the indexes newer than the specified number of days, while the latter cleans the indexes older than the specified number of days. The complete synopsis of either command can be obtained using the -h option.

Installing the Dependencies

These scripts depend on the following components:
  • The Python runtime (at least version 2).
  • The pyes package.
The pyes package, in turn, can be installed using pip:

# pip install pyes

Beware that the ElasticSearch instance bundled by Logstash is not supported by the latest pyes release (0.90.x) which requires ElasticSearch 0.90. If you're using the ElasticSearch instance bundled in Logstash, you must install version 0.20.1:

# pip install pyes==0.20.1

Installation on FreeBSD

The FreeBSD ports collection ships all the required dependencies as binary packages. The Python runtime can be installed with the following command:

# pkg install python

pip can be installed using (assuming Python 2.7 has been installed, as in FreeBSD 9.2 and 10.0):

# pkg install py27-pip

Once pip is installed, it can be used to install pyes in a platform-independent way as explained in the previous section.

Running the Scripts

The simplest way to run the scripts is:

  • Passing the --host option to specify the ElasticSearch server to connect to.
  • Passing the -d option to specify the desired number of days.

$ python /path/to/logstash_index_cleaner.py \
  --host es-host \
  -d 30

Given the periodic nature of these tasks, I usually schedule them as cron jobs in a crontab file.
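
As a sketch of such a job, the wrapper below runs both scripts in sequence and could be called from a crontab entry. Everything specific in it is an assumption made for illustration: the directory /opt/logstash-scripts, the host name es-host, the number of days, and the fact that the optimiser accepts the same --host and -d options shown above for the cleaner. Setting DRY_RUN to a non-empty value prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical wrapper around the two index maintenance scripts.
SCRIPTS_DIR="${SCRIPTS_DIR:-/opt/logstash-scripts}"  # assumed location
ES_HOST="${ES_HOST:-es-host}"                        # assumed host name
OPTIMIZE_DAYS="${OPTIMIZE_DAYS:-2}"                  # assumed value
CLEAN_DAYS="${CLEAN_DAYS:-30}"                       # assumed value

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Optimise the recent indexes, then clean the old ones; stop at the
# first failure.
maintain_indexes() {
    run python "${SCRIPTS_DIR}/logstash_index_optimize.py" \
        --host "${ES_HOST}" -d "${OPTIMIZE_DAYS}" &&
    run python "${SCRIPTS_DIR}/logstash_index_cleaner.py" \
        --host "${ES_HOST}" -d "${CLEAN_DAYS}"
}

# Example crontab entry (every day at 03:00), assuming this file is saved
# as /usr/local/sbin/logstash-maintenance and ends with a call to
# maintain_indexes:
#   0 3 * * * /usr/local/sbin/logstash-maintenance
```

Keeping the options in variables makes it easy to try the job by hand with different values before scheduling it.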



Installing Logstash on FreeBSD

Some time ago, I decided to centralise all the logs generated by a client's production systems to a syslog server and, after assessing a bunch of products, we chose Logstash (now part of the ElasticSearch family) as the tool to organise the unstructured logs into meaningful data structures which can then be searched, filtered and exploited.

The platform chosen to run Logstash is FreeBSD (9.x at the time), a rock-solid and very well documented UNIX-like operating system. Besides, it also ships a production-ready ZFS implementation which always comes in handy in the data center.

Installation

Installing Logstash on FreeBSD is very easy because a good Logstash port exists and the binary package can be installed with a one-liner:

# pkg install logstash

The FreeBSD package manager will take care of installing Logstash and all its dependencies.

If the OpenJDK is installed during this process, a warning will instruct the administrator to mount the fdesc and proc file systems, since they are required for the correct operation of the OpenJDK. To make the change permanent, the following lines can be added to /etc/fstab:

fdesc  /dev/fd  fdescfs  rw  0  0
proc   /proc    procfs   rw  0  0

To mount them once after this change, execute:

# mount -a

To check they are mounted correctly, the mount command can be used:

# mount
[...snip...]
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
[...snip...]
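
The same check can be scripted. In this sketch has_mount is a hypothetical helper that scans the output of mount, read from standard input, for a given file system and mount point, relying on the line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `mount` output read from standard
# input lists file system $1 mounted on $2.
has_mount() {
    grep -q "^$1 on $2 ("
}

# Example, on the FreeBSD host:
#   mount | has_mount fdescfs /dev/fd || echo "fdescfs is not mounted" >&2
#   mount | has_mount procfs /proc    || echo "procfs is not mounted" >&2
```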

Configuration

The Logstash port performs a very good initial installation of Logstash and very few customisations are required in most cases.

Operation Mode

The Logstash service rc script is installed in /usr/local/etc/rc.d/logstash and supports three modes of operation (further information can be found in the official Logstash documentation):
  • standalone
  • agent
  • web
The operation mode can be set with the logstash_mode variable in the /etc/rc.conf file:

logstash_mode="standalone"

The default operation mode is standalone, which does the following:
  • It launches a local ElasticSearch instance.
  • It launches the Logstash agent.
  • It launches the bundled Kibana web interface.
A standalone Logstash service is the easiest way to bootstrap a fully functional Logstash server with an embedded ElasticSearch instance. A more complex setup could, for example, involve a separate ElasticSearch server.

Embedded ElasticSearch

If the standalone operation mode is used, then an embedded ElasticSearch instance is launched. This instance stores its index files in the directory specified by the logstash_elastic_datadir configuration variable which, by default, is /var/db/logstash.

If you use the embedded ElasticSearch instance, you may want to mount a separate disk on /var/db/logstash for easier management, in case you want to dedicate more space to it as time passes. A ZFS dataset, in this case, is possibly the most flexible option available on FreeBSD (and other systems).

If you don't plan to store ElasticSearch indexes indefinitely, you're likely looking for a way to optimise them and remove older index entries.

Updating a Stale Logstash JAR

The Logstash port is not updated as often as Logstash is and you are likely going to get a pretty old version. Fortunately, the port lets you very easily run an updated Logstash binary. Beware of the following:
  • Changing a port's files may cause problems when upgrading the port.
  • Index files generated (or updated) by a newer ElasticSearch instance may not be backwards compatible, and rolling back a manual JAR upgrade may not be easy, or even possible.
The Logstash service rc script uses the logstash_jar configuration variable to set the path of the Logstash JAR archive, the default value being:

/usr/local/logstash/logstash-${version}-flatjar.jar

where ${version} is the version of the Logstash port.

An updated Logstash JAR archive can be downloaded and the value of the logstash_jar variable can be overridden in the /etc/rc.conf configuration file:

logstash_jar="/usr/local/logstash/logstash-1.3.3-flatjar.jar"

Always test a newer Logstash instance in a test environment.

Edited on June 20th: Since version 1.4, Logstash is no longer distributed as a single JAR file. I explored the tasks required to install the new version of Logstash on FreeBSD in a new blog post.

Adding Custom Java VM Options

Very often it's desirable to pass additional options to the Java VM but unfortunately the Logstash port service rc script doesn't allow you to do so. Unless, of course, you modify it.

I usually define a new empty configuration variable, called logstash_java_opts, in the service rc file /usr/local/etc/rc.d/logstash (the added line is the one that sets logstash_java_opts):

[...snip...]
: ${logstash_java_home="/usr/local/openjdk6"}
: ${logstash_java_opts=""}
: ${logstash_log="NO"}
[...snip...]

and then update the launch command (the added fragment is ${logstash_java_opts}):

[...snip...]
command_args="-f -p ${pidfile} ${java_cmd} ${logstash_java_opts} [...snip...]"
required_files="${java_cmd} ${logstash_config}"

run_rc_command "$1"

Now, you can override the value of the logstash_java_opts variable in /etc/rc.conf:

logstash_java_opts="-Xmx2048M"

Once again, beware of the consequences of changing the files of an installed binary package. In the meantime, I've asked the current port maintainer to consider a modification to support this use case.

Configuring Logstash

Logstash should now be configured according to your needs. The configuration file used by the rc script of this package is specified by the logstash_config variable, whose default value is set by the rc script (only the relevant lines are shown):

name=logstash
: ${logstash_config="/usr/local/etc/${name}/${name}.conf"}

This value can be overridden by setting the logstash_config variable in the /etc/rc.conf configuration file.

Detailed information and working examples can be found in the official Logstash documentation.

Testing Logstash

To start the Logstash service to test its configuration, the following command can be used:

# service logstash onestart

To stop it, use:

# service logstash onestop

To enable the Logstash log, the logstash_log variable should be set to YES in the /etc/rc.conf file:

logstash_log="YES"

The log file location is specified by the logstash_log_file variable, whose default value is set by the service rc file (only the relevant lines are shown):

name=logstash
logdir="/var/log"
: ${logstash_log_file="${logdir}/${name}.log"}

The log file location can be overridden by setting the logstash_log_file variable in the /etc/rc.conf file.

Enabling the Logstash Service

When the configuration is correct, the service can be enabled so that it is started at system startup. To enable the Logstash service, add the following line to /etc/rc.conf:

logstash_enable="YES"

Further Reading

An updated version of this post was published on June 20th 2014 to describe the installation procedure of Logstash v. 1.4 (and greater) on FreeBSD.