Loggly, loggy, logs! Oh my!

First a brief overview of the main technologies we're going to be setting up and configuring.

  • ElasticSearch:
    Provides real-time insights on data that is made immediately available for search and analytics. This allows you to interactively search, discover, and analyze entries to gain insights into your infrastructure. You interact with ElasticSearch via its friendly RESTful API (see the quick curl sketch after this list). The system is distributed, which allows you to add nodes as you need them, and highly available, which means healthy nodes detect failed nodes and react to protect your data.

  • Logstash:
    Logstash is a data pipeline that helps you process logs and other event data. You can opt to parse the data received by Logstash and manipulate it as needed. Default log formats tend to work out of the box, but you can extend to custom formats fairly easily.

  • Kibana:
    Designed to be seamlessly integrated as the frontend of ElasticSearch, Kibana can give shape to any indexed data. It allows you to give context to your data with bar charts, line and scatter plots, histograms, pie charts, and maps.

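As a quick taste of that REST API - once ElasticSearch is up and running later in this guide - here's what indexing and retrieving a document looks like with nothing but curl (the logs index and the document fields are made-up examples):

curl -XPUT 'http://localhost:9200/logs/event/1' -d '{"message": "hello ELK"}'
curl 'http://localhost:9200/logs/_search?q=message:hello&pretty'

The first call creates the index on the fly and stores the document; the second searches it back out.
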
Let's get started. Since we're going to need Java, let's add the EPEL repository.

rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

Next, let's install the correct version of Java (OpenJDK 1.7) and Apache.

yum install java-1.7.0-openjdk httpd -y

With Java installed, we can now install ElasticSearch. I'll talk about clustering ElasticSearch and scaling below.

rpm -Uvh https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.6.0.noarch.rpm

Enable ElasticSearch at boot.

/sbin/chkconfig --add elasticsearch

And now let's start ElasticSearch.

service elasticsearch start
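
Before moving on, a quick sanity check that ElasticSearch is answering (assuming the default HTTP port of 9200):

curl 'http://localhost:9200/?pretty'
curl 'http://localhost:9200/_cluster/health?pretty'

The first call returns basic version info; the second reports cluster health, which should be green or yellow on a healthy single node.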

Next, let's get Logstash running. Install the Logstash RPM.

rpm -Uvh https://download.elastic.co/logstash/logstash/packages/centos/logstash-1.5.2-1.noarch.rpm

Create a basic config file for Logstash processing. This just creates a file watch on /var/log/messages and on Logstash's own log.

# /etc/logstash/conf.d/messages.conf
input {
  # Watch the system log, plus Logstash's own log, for new lines.
  file {
    path => ["/var/log/messages","/var/log/logstash/logstash.log"]
    type => "syslog"
  }
}
output {
  # Ship each event to the local ElasticSearch instance.
  elasticsearch { host => "localhost" }
  # Also echo events to stdout, which is handy for debugging.
  stdout { codec => rubydebug }
}
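
Before starting the service it's worth validating that config. A sketch, assuming the RPM's default install location of /opt/logstash:

/opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/messages.conf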

Enable Logstash at boot.

/sbin/chkconfig --add logstash

And now let's start Logstash.

service logstash start
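
To confirm the whole pipeline works, write a recognizable line into syslog and then search ElasticSearch for it a few seconds later (elktest12345 is just an arbitrary marker string):

logger elktest12345
curl 'http://localhost:9200/logstash-*/_search?q=elktest12345&pretty'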

Finally, let's get Kibana running. First create a directory to store the Kibana webroot.

mkdir /etc/kibana/

Download the Kibana files.

wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz

Expand the files.

tar zxfv kibana-4.1.1-linux-x64.tar.gz

Move the files into the appropriate directory.

mv kibana-4.1.1-linux-x64 /etc/kibana/www/
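
Kibana 4 expects to find ElasticSearch at http://localhost:9200 out of the box. If yours lives elsewhere, adjust elasticsearch_url in the bundled config file - a sketch, with the path based on where we just moved the files:

# /etc/kibana/www/config/kibana.yml
elasticsearch_url: "http://localhost:9200"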

Start up the Kibana Node.js web server:

/etc/kibana/www/bin/kibana > /var/log/kibana.log 2>&1 &

And now browse to your host on port 5601 and view Kibana!
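
If you'd like to verify from the shell first, a plain HTTP request against the Kibana port should return a 200:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601/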

Definitely refer to the official documentation when playing around with Kibana - Elastic provides a good number of videos and decent guides.

You can try this out yourself with my vagrant-elk GitHub repository.

ELK in Production

A few recommendations for running an ELK stack in production:

  1. Use ElasticSearch cluster features for maximum resilience and speed.
  2. Hide all your services behind at least one Elastic Load Balancer.
  3. For access to your ElasticSearch cluster, place it behind a proxy service that:
    1. Blocks / disallows delete requests.
    2. Only serves traffic from your explicit domain.
    3. Optionally requires authentication to get logs.
  4. Depending on your security requirements, also require authentication for Kibana access, either in addition to or instead of the ElasticSearch authentication.

Setting up an ElasticSearch cluster is extremely easy. You need to configure a cluster name and a discovery mechanism. Out of the box, ElasticSearch can use multicast to discover other hosts. Unfortunately that doesn't work on AWS, so we need to use the EC2 plugin or unicast.

To set a cluster name (in /etc/elasticsearch/elasticsearch.yml):

cluster:
  name: [CLUSTER NAME]

Pro tip: You can also have the node name set to the hostname!

node:
  name: ${HOSTNAME}

To add unicast hosts (these need to point at the transport port, 9300 by default, not the HTTP port 9200):

discovery.zen.ping.unicast:
  hosts: ["1.1.1.1:9300", "1.1.1.2:9300"]

For EC2 you can use the ElasticSearch EC2 discovery support from the cloud-aws plugin.
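
On an RPM install the plugin tool lives under /usr/share/elasticsearch. A sketch of the install - the 2.6.0 version number is an assumption; match the cloud-aws release to your ElasticSearch version:

/usr/share/elasticsearch/bin/plugin --install elasticsearch/elasticsearch-cloud-aws/2.6.0

Once the plugin is installed you can configure it like this (assuming IAM roles):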

cloud.node.auto_attributes: true
discovery:
  type: ec2
discovery.ec2.tag:
  clustername: "test"

To set up a basic NGINX proxy that prevents bad things from happening to your ElasticSearch cluster, here's a sample configuration:

upstream elasticsearch {
  server localhost:9200;
}

server {
  listen 7777;

  location / {
    # Reject write methods outright; proxy everything else to ElasticSearch.
    if ($request_method ~ ^(PUT|POST|DELETE)$) {
      return 403;
    }
    proxy_pass http://elasticsearch;
    proxy_redirect off;
  }

  # Block cluster administration and shutdown endpoints entirely.
  location ~* ^(/_cluster|/_nodes|/_shutdown) {
    return 403;
  }
}
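
To cover recommendation 3.3 above, you can bolt HTTP basic authentication onto this same proxy. A sketch - the username and the password file path are examples, and htpasswd comes from httpd-tools, which was pulled in alongside Apache earlier:

htpasswd -c /etc/nginx/.htpasswd logsuser

Then add these two lines inside the location / block:

auth_basic "ElasticSearch";
auth_basic_user_file /etc/nginx/.htpasswd;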

For more security advice you can read the Elastic blog post "Playing HTTP Tricks with Nginx": playing-http-tricks-nginx