Wednesday, January 22, 2014

Logstash (Up to at Least 1.4) Fails to Start on FreeBSD 10.0

Logstash (any version from 1.2.1 to 1.4.2) fails to start on FreeBSD 10.0 with the following exception:

Exception in thread "LogStash::Runner" org.jruby.exceptions.RaiseException: (NotImplementedError) stat.st_dev unsupported or native support failed to load
at org.jruby.RubyFileStat.dev_major(org/jruby/RubyFileStat.java:394)
at RUBY._discover_file(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:140)
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1617)
at RUBY._discover_file(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:122)
at RUBY.watch(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/watch.rb:34)
at RUBY.tail(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/filewatch/tail.rb:58)
at RUBY.run(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/inputs/file.rb:125)
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1617)
at RUBY.run(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/inputs/file.rb:125)
at RUBY.inputworker(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/pipeline.rb:151)
at RUBY.start_input(file:/usr/local/logstash/logstash-1.2.1-flatjar.jar!/logstash/pipeline.rb:145)

It turns out that a long-standing issue (affecting Solaris rather than FreeBSD) has existed for quite a while: https://logstash.jira.com/browse/LOGSTASH-665. As far as I can see, the same problem now affects FreeBSD 10.0. I added a comment on that issue and opened another one: https://logstash.jira.com/browse/LOGSTASH-1819.

For the moment, Logstash should be run on FreeBSD 9.2.

Edit: I've managed to patch Logstash to work on FreeBSD 10 and I've sent a pull request upstream. Until the pull request is merged and Logstash is updated (which could take a long time), you can use a new FreeBSD port I've created to install Logstash on FreeBSD 10. This port is meant to replace the outdated Logstash port in the FreeBSD Ports Collection. I'm in talks with the port maintainer and hopefully it should not take long.

Update and Workaround

Since the stack trace and the nature of the error itself suggested this is a JRuby bug rather than a Logstash one, I opened an issue (Issue #1754) on JRuby's GitHub repository. Kevin Menard quickly replied and put me on the right track, pointing me to several existing issues regarding a JRuby dependency (jnr-ffi) choking on the FreeBSD 10.0 libc.so ld script (libc.so is a symbolic link in earlier FreeBSD releases).

I couldn't try it until today, when Michael (he gave no further details about himself) left a comment on this blog post (see below) pointing at the same cause Kevin gave and suggesting that the workaround be tried in a FreeBSD jail using ezjail.

I confirm that running Logstash in a FreeBSD 10 jail where the existing libc.so is replaced with a symbolic link to the corresponding binary in /lib solves the problem and provides an easy-to-implement workaround for installing the latest Logstash release in a FreeBSD 10 environment.
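Before touching anything, it may help to check whether a given libc.so is actually an ld script or already a symbolic link. The following is a minimal sketch, not something taken from Logstash, JRuby or FreeBSD: the function name classify_lib is hypothetical, and the check simply assumes that an ld script is a regular file containing a line that starts with a GROUP ( ... ) directive, as the FreeBSD 10.0 file quoted in the comments does.

```shell
#!/bin/sh
# Hypothetical helper: classify a library path as "symlink", "ldscript" or "other".
# Assumption: an ld script is recognised by a line starting with "GROUP (".
classify_lib() {
    lib="$1"
    if [ -L "$lib" ]; then
        # The path is a symbolic link (the layout of earlier FreeBSD releases).
        echo "symlink"
    elif grep -q '^[[:space:]]*GROUP[[:space:]]*(' "$lib" 2>/dev/null; then
        # The path is a regular file containing a GROUP directive.
        echo "ldscript"
    else
        # Missing file, real binary, or anything else.
        echo "other"
    fi
}

# Run the check only when a path is given on the command line.
if [ $# -gt 0 ]; then
    classify_lib "$1"
fi
```

Saved under a name of your choice and run with /usr/lib/libc.so as its argument inside the jail, the script prints one of the three words; the workaround is only relevant when the path is reported as an ld script.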

Current Status

I'm updating this post because I've been asked many times about the status of this issue. It turns out that the problem with running Logstash on FreeBSD 10 seems to lie in a jnr-ffi bug for which pull requests have been sent at least three times.

I hope the pull request is finally merged upstream. If you are waiting for this fix as well, please vote for it and make your voice heard.

6 comments:

Michael said...

This is due to jnr-ffi not recognizing libc, since it has been changed to a ldscript in FreeBSD 10 and jffi's parser, which in general supports this, chokes on it:

cat /usr/lib/libc.so
/* $FreeBSD: release/10.0.0/lib/libc/libc.ldscript 258398 2013-11-20 20:24:59Z peter $ */
GROUP ( /lib/libc.so.7 /usr/lib/libc_nonshared.a /usr/lib/libssp_nonshared.a )

A simple workaround until this has been fixed upstream is to remove libc.so and replace it with a symlink. I'd suggest doing this in a jail, e.g. if you're running ezjail:

cd /usr/jails/basejail/usr/lib
rm libc.so
ln -s ../../lib/libc.so.7 libc.so
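The three commands above can also be wrapped so that the original ld script is kept as a backup instead of being deleted. This is only a sketch under stated assumptions: the function name replace_libc_ldscript and the .ldscript.bak suffix are hypothetical, and the link target ../../lib/libc.so.7 is copied from the commands above rather than detected.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround: replace libc.so in the given
# directory with a symlink, keeping the original ld script as a backup.
# The body runs in a subshell so the caller's working directory is unchanged.
replace_libc_ldscript() (
    libdir="$1"
    cd "$libdir" || exit 1

    # Nothing to do if libc.so is already a symbolic link.
    if [ -L libc.so ]; then
        echo "libc.so is already a symlink; nothing to do"
        exit 0
    fi

    # Refuse to continue if there is no libc.so to replace.
    if [ ! -f libc.so ]; then
        echo "no libc.so found in $libdir" >&2
        exit 1
    fi

    # Keep the ld script (assumed backup name) and create the symlink.
    mv libc.so libc.so.ldscript.bak || exit 1
    ln -s ../../lib/libc.so.7 libc.so
)

# Run only when a directory is given, e.g. /usr/jails/basejail/usr/lib.
if [ $# -gt 0 ]; then
    replace_libc_ldscript "$1"
fi
```

To undo the change, remove the symlink and rename the backup file to libc.so again.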

Enrico Maria Crisostomo said...

Thank you very much Michael,

I opened this issue on JRuby's GitHub repository:

https://github.com/jruby/jruby/issues/1754

where Kevin Menard pointed me to the same underlying issue with jnr-ffi not recognising that ld script. I'll update the blog post to reflect that and will provide your suggestion as well. I haven't tried it yet, but it's reasonable to assume that it will solve the problem.

Cheers,
-- Enrico

Michael said...

Great to see you updated your post. Since my last comment I switched our logstash production instance to 1.4 on FreeBSD 10 and it's working great.

Your post on installing 1.4 was quite helpful to get things started, I had to do some minor adjustments (install a custom elasticsearch.yml and add LS_HEAP_SIZE to the logstash14 rc.d script).

In case there's no update of sysutils/logstash really soon I would suggest you take over maintainership of that port. It seems to me that the current maintainer is not really using logstash himself and it's the kind of software you want to keep current.

-- Michael

Enrico Maria Crisostomo said...

Thanks again Michael.

I'll certainly volunteer to maintain that port: the current maintainer hasn't upgraded the port yet.

Cheers,
-- Enrico

Abhinav said...

Hi,

I am facing this issue on a Solaris 10 server. I don't really understand this 'jail' software. Can you please explain how I can get this fix done on my Solaris box to start using Logstash 'file' input?

Cheers,
Abhinav

Enrico Maria Crisostomo said...

Hi Abhinav,

Jail is BSD jargon. In Solaris 10 you'd use a Zone or an LDom (or both).

Basically, the problem is a native library not being a binary file but a dynamic linker script.

You should do the same thing: remove the linker script and create a symbolic link.

The suggestion to do it in a jail (or zone, ldom, etc.) is to avoid breaking the system in case something goes wrong.

Cheers,
-- Enrico