[[Категория:Heka]]
[[Категория:LMA]]
[[Категория:MOS FUEL]]
[[Категория:Linux]]
=Heka=

Heka is an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as:
* Loading and parsing log files from a file system.
* Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as graphite or InfluxDB.
* Launching external processes to gather operational data from the local system.
* Performing real time analysis, graphing, and anomaly detection on any data flowing through the Heka pipeline.
* Shipping data from one location to another via the use of an external transport (such as AMQP) or directly (via TCP).
* Delivering processed data to one or more persistent data stores.
==Data flow and LMA HA overview==
In LMA HA clusters we need to aggregate data on the primary controller. The "primary" controller is the one that holds the VIP (virtual IP), so all nodes can send aggregated data to the VIP. This may not be obvious at first glance, so a more detailed explanation follows.
<BR>

* 3 controllers, all are up and running
<PRE>
     [VIP] <-----------------------------------------------------------------------------<--+
 Aggregation                ^                                ^                              |
 Input                      |                                |                              |
+--------------+            |    +--------------+            |  +--------------+            |
| Controller 1 |            |    | Controller 2 |            |  |Controller 3  |            |
| Hekad        |Aggregation-|    | Hekad        |Aggregation-|  |Hekad         |Aggregation-|
|              |Output           |              |Output         |              |Output
|              |                 |              |               |              |
|              |                 |              |               |              |
+--------------+                 +--------------+               +--------------+
      |                                |                              |
      +--------------------------------+---------->--->--->--->--->---+------->[ElasticSearch, InfluxDB, Nagios, etc]
</PRE>
* Controller 1 is down, the VIP has moved to Controller 3
<PRE>
                                                             ->-->--[VIP]----<-<---------<--+
                                                             ^   Aggregation                |
                                                             |   Input                      |
+--------------+                 +--------------+            |  +--------------+            |
| Controller 1 |                 | Controller 2 |            |  |Controller 3  |            |
| Hekad        |Aggregation-     | Hekad        |Aggregation-|  |Hekad         |Aggregation-|
|              |Output           |              |Output         |              |Output
| DOWN         |                 |              |               |              |
|              |                 |              |               |              |
+--------------+                 +--------------+               +--------------+
                                       |                              |
                                       +---------->--->--->--->--->---+------->[ElasticSearch, InfluxDB, Nagios, etc]
</PRE>
To avoid an endless loop, the "aggregator" field is used, e.g.:
<PRE>
message_matcher = "Fields[aggregator] == NIL"
</PRE>
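The loop-avoidance rule can be sketched as a toy model (illustrative Python only, not Heka code; the helper names `should_forward` and `forward_to_vip` are hypothetical — only the "aggregator" field and the matcher come from the configuration above):

```python
# Toy model of the loop-avoidance rule: a hekad instance forwards a message
# to the VIP only if the message has no "aggregator" field yet (the matcher
# "Fields[aggregator] == NIL"); forwarding stamps the field, so the
# aggregating node will not send the same message around the loop again.

def should_forward(message):
    # Equivalent of message_matcher = "Fields[aggregator] == NIL"
    return message.get("Fields", {}).get("aggregator") is None

def forward_to_vip(message):
    # Stamp the field before sending, so the receiving (primary) node
    # sees it as "already aggregated" and does not loop it back.
    message.setdefault("Fields", {})["aggregator"] = True
    return message

msg = {"Payload": "nova-api log line", "Fields": {}}
assert should_forward(msg)             # fresh message: forward it
forward_to_vip(msg)
assert not should_forward(msg)         # stamped message: the loop is broken
```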
 
==Configuration overview==
All LMA heka config files are located in the /etc/lma_collector folder, e.g. on a controller there are the following configuration files:
<PRE>
amqp-openstack_error.toml
amqp-openstack_info.toml
amqp-openstack_warn.toml
decoder-collectd.toml
decoder-http-check.toml
decoder-keystone_7_0.toml
decoder-keystone_wsgi.toml
decoder-mysql.toml
decoder-notification.toml
decoder-openstack.toml
decoder-ovs.toml
decoder-pacemaker.toml
decoder-rabbitmq.toml
decoder-swift.toml
decoder-system.toml
encoder-elasticsearch.toml
encoder-influxdb.toml
encoder-nagios_afd_nodes_debug.toml
encoder-nagios_afd_nodes.toml
encoder-nagios_gse_global_clusters.toml
encoder-nagios_gse_node_clusters.toml
filter-afd_api_backends.toml
filter-afd_api_endpoints.toml
filter-afd_node_controller_cpu.toml
filter-afd_node_controller_log-fs.toml
filter-afd_node_controller_root-fs.toml
filter-afd_node_mysql-nodes_mysql-fs.toml
filter-afd_service_apache_worker.toml
filter-afd_service_cinder-api_http_errors.toml
filter-afd_service_glance-api_http_errors.toml
filter-afd_service_heat-api_http_errors.toml
filter-afd_service_keystone-admin-api_http_errors.toml
filter-afd_service_keystone-public-api_http_errors.toml
filter-afd_service_mysql_node-status.toml
filter-afd_service_neutron-api_http_errors.toml
filter-afd_service_nova-api_http_errors.toml
filter-afd_service_rabbitmq_disk.toml
filter-afd_service_rabbitmq_memory.toml
filter-afd_service_rabbitmq_queue.toml
filter-afd_service_swift-api_http_errors.toml
filter-afd_workers.toml
filter-gse_global.toml
filter-gse_node.toml
filter-gse_service.toml
filter-heka_monitoring.toml
filter-http_metrics.toml
filter-influxdb_accumulator.toml
filter-influxdb_annotation.toml
filter-instance_state.toml
filter-resource_creation_time.toml
filter-service_heartbeat.toml
global.toml
httplisten-collectd.toml
httplisten-http-check.toml
input-aggregator.toml
logstreamer-keystone_7_0.toml
logstreamer-keystone_wsgi.toml
logstreamer-mysql.toml
logstreamer-openstack_7_0.toml
logstreamer-openstack_dashboard.toml
logstreamer-ovs.toml
logstreamer-pacemaker.toml
logstreamer-rabbitmq.toml
logstreamer-swift.toml
logstreamer-system.toml
multidecoder-aggregator.toml
output-aggregator.toml
output-dashboard.toml
output-elasticsearch.toml
output-influxdb.toml
output-nagios_afd_nodes.toml
output-nagios_gse_global_clusters.toml
output-nagios_gse_node_clusters.toml
scribbler-aggregator_flag.toml
splitter-openstack.toml
splitter-rabbitmq.toml
</PRE>
Heka's configuration files can be divided into the following groups:
* Inputs
* Splitters
* Decoders
* Filters
* Encoders
* Outputs

The data flow in heka is the following (splitters are omitted from the scheme to keep it simple):
<PRE>
                                                                                                 +-<----------------------+
                                                                                                 |                        |
                                             +-> | log reader decoder for service A log format | Routing |---->Filters (may inject new messages)
         - read logs  -------------------------> | log reader decoder for service B log format |         |----> Decoder (TXT/JSON/....)-+---------->Outputs to file
--Inputs - listen http/tcp/udp-----------------> | http decoder for input from service C       |         |                              |
                                             +-> | tcp/udp decoder for input from service D    |         |                              +---------->TCP
         - other inputs (not used in LMA now)--> | Other/custom decoders                       |         |                              +---------->Http PUT
</PRE>
* A good article to [https://www.librato.com/docs/kb/collect/collection_agents/heka.html go deeper with heka].

So, what does heka REALLY do? <BR>
* Reads data from multiple sources
* Splits the data into messages
* Decodes each message and saves it in the internal format
* Sends messages to filters if configured
** A filter can create new messages, e.g. aggregate data, and inject messages back into heka
* Encodes (transforms) the output messages
* Sends messages out from Heka to a file or another process

It sounds not too complicated (and similar to other log collectors, e.g. logstash) but heka's configuration may be very complex.
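The configuration groups above map onto Heka's processing stages; a toy pipeline can sketch how they fit together (purely illustrative Python, not Heka's actual Go implementation; all function names here are hypothetical):

```python
# Toy Heka-style pipeline: input stream -> splitter -> decoder -> filters
# -> encoder -> output. Each stage mirrors one group of the
# /etc/lma_collector configuration files.

def splitter(stream):
    # Slice the raw stream into individual records (cf. RegexSplitter).
    return stream.strip().split("\n")

def decoder(record):
    # Turn a raw record into Heka's internal message structure.
    return {"Payload": record, "Fields": {"length": len(record)}}

def filter_stage(messages):
    # Filters may inject new (e.g. aggregate) messages back into the stream.
    summary = {"Payload": "summary", "Fields": {"count": len(messages)}}
    return messages + [summary]

def encoder(message):
    # Serialize the internal message for an output plugin.
    return "{Payload}: {Fields}".format(**message)

def pipeline(stream):
    messages = [decoder(r) for r in splitter(stream)]
    return [encoder(m) for m in filter_stage(messages)]

out = pipeline("line one\nline two")
assert len(out) == 3   # two decoded log messages plus one injected summary
```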
===Inputs===
Input plugins acquire data from the outside world and inject it into the Heka pipeline. They can do this by reading files from a file system, actively making network connections to acquire data from remote servers, listening on a network socket for external actors to push data in, launching processes on the local system to gather arbitrary data, or any other mechanism. Input plugins must be written in Go.
<BR>
* A more detailed explanation of the inputs is in the [http://wiki.sirmax.noname.com.ua/index.php/Heka_Inputs LMA Inputs details] document.

On the controller there are the following input groups:

====AMQPInput====
AMQP input (https://hekad.readthedocs.org/en/v0.10.0/config/inputs/amqp.html)
<BR>
There are the following AMQP inputs:
 
* amqp-openstack_error.toml
 
* amqp-openstack_info.toml
 
* amqp-openstack_warn.toml
 
   
All AMQP inputs look like:
 
<PRE>
[openstack_error_amqp]
type = "AMQPInput"
url = "amqp://nova:nova_password@192.168.0.2:5673/"
exchange = "nova"
exchange_type = "topic"
exchange_durability = false
exchange_auto_delete = false
queue_auto_delete = false
queue = "lma_notifications.error"
routing_key = "lma_notifications.error"
decoder = "notification_decoder"
splitter = "NullSplitter"
can_exit = true
</PRE>
 
The only difference between the AMQP inputs is the queue and routing_key parameters:
<PRE>
queue = "lma_notifications.info"
routing_key = "lma_notifications.info"
</PRE>
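Since the three AMQP inputs differ only in those two parameters, they could in principle be generated from one template (an illustrative sketch; LMA itself simply ships three static .toml files, and the template below omits the connection parameters shown above):

```python
# Generate the bodies of the three amqp-openstack_*.toml inputs from one
# template; only the section name, queue, and routing_key vary per level.

TEMPLATE = """[openstack_{level}_amqp]
type = "AMQPInput"
queue = "lma_notifications.{level}"
routing_key = "lma_notifications.{level}"
decoder = "notification_decoder"
splitter = "NullSplitter"
"""

configs = {level: TEMPLATE.format(level=level)
           for level in ("error", "info", "warn")}

assert 'queue = "lma_notifications.warn"' in configs["warn"]
assert 'routing_key = "lma_notifications.info"' in configs["info"]
```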
 
   
All AMQP inputs use one decoder to decode AMQP messages: notification_decoder; its configuration can be found in the <B>decoder-notification.toml</B> file.
<BR>
The LMA plugin configures the openstack services to use 'lma_notifications' as notification_topics, e.g.:
<PRE>
# cat /etc/nova/nova.conf | grep lma
notification_topics=lma_notifications
</PRE>
so heka is able to get messages from the queue and decode them.
<BR>
Also, it is possible to see rabbitmq messages using the trace plugin; for details please see: http://wiki.sirmax.noname.com.ua/index.php/Rabbitmq_trace#RabbitMQ_log_messages
 
   
====HttpListenInput====
HttpListenInput plugins start a webserver listening on the specified address and port. For more detail: https://hekad.readthedocs.org/en/v0.10.0/config/inputs/httplisten.html
<BR>There are the following HttpListen inputs configured in LMA (controller):
* httplisten-collectd.toml
* httplisten-http-check.toml

=====httplisten-collectd=====
This input is used to get data only from the <B>local</B> collectd.
<PRE>
[collectd_httplisten]
type="HttpListenInput"
address = "127.0.0.1:8325"
decoder = "collectd_decoder"
splitter = "NullSplitter"
</PRE>
=====httplisten-http-check=====
<PRE>
[http-check_httplisten]
type="HttpListenInput"
address = "192.168.0.2:5566"
decoder = "http-check_decoder"
splitter = "NullSplitter"
</PRE>
This is an 'open port' used for the haproxy http check.
As you can see in the haproxy config, this port is used only to check whether heka is running, in order to expose port 5565 from input-aggregator.
<BR>
<B>/etc/haproxy/conf.d/999-lma.cfg</B>
 
<PRE>
listen lma
bind 192.168.0.7:5565
balance roundrobin
mode tcp
option httpchk
option tcplog
server node-6 192.168.0.2:5565 check port 5566
</PRE>
 
   
====TcpInput====
There is only one tcp input in the LMA configuration:
* input-aggregator.toml
<PRE>
[aggregator_tcpinput]
type="TcpInput"
address = "192.168.0.2:5565"
decoder = "aggregator_decoder"
splitter = "HekaFramingSplitter"
</PRE>
This input is used to aggregate data in the HA configuration, and this port is exposed on the Virtual IP using haproxy.
<BR> So in an HA multi-controller configuration this port will be exposed only on one controller.<BR>More details will be provided below.
====LogstreamerInput====
Logstream input tails a single log file, a sequential single log source, or multiple log sources of either a single logstream or multiple logstreams.
<BR>More details: https://hekad.readthedocs.org/en/v0.10.0/config/inputs/logstreamer.html
<BR>
There are the following inputs configured on the controller:
 
* logstreamer-keystone_7_0.toml
 
* logstreamer-keystone_wsgi.toml
 
* logstreamer-mysql.toml
 
* logstreamer-openstack_7_0.toml
 
* logstreamer-openstack_dashboard.toml
 
* logstreamer-ovs.toml
 
* logstreamer-pacemaker.toml
 
* logstreamer-rabbitmq.toml
 
* logstreamer-swift.toml
 
* logstreamer-system.toml
 
All logstream inputs are very similar to each other. E.g. logstreamer-openstack:
<PRE>
[openstack_7_0_logstreamer]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = '(?P<Service>nova|cinder|glance|heat|neutron|murano)-all\.log$'
differentiator = [ 'openstack.', 'Service' ]
decoder = "openstack_decoder"
splitter = "openstack_splitter"
</PRE>
 
   
This input does the following:
* reads files from /var/log/ that match the file_match expression
* differentiator is a set of strings used to build the logger name. E.g. records from /var/log/nova-all.log will be marked as <B>:Logger: openstack.nova</B>
 
<PRE>
:Timestamp: 2016-01-27 15:44:05.114000128 +0000 UTC
:Type: log
:Hostname: node-6
:Pid: 17814
:Uuid: c2a1db38-1f24-48b6-a96b-34be7b364eb3
:Logger: openstack.nova
:Payload: nova.osapi_compute.wsgi.server [-] 192.168.0.7 "OPTIONS / HTTP/1.0" status: 200 len: 317 time: 0.0005581
:EnvVersion:
:Severity: 6
:Fields:
| name:"syslogfacility" type:double value:22
| name:"environment_label" type:string value:"test2"
| name:"http_client_ip_address" type:string value:"192.168.0.7"
| name:"http_response_time" type:double value:0.0005581
| name:"http_method" type:string value:"OPTIONS"
| name:"http_version" type:string value:"1.0"
| name:"http_url" type:string value:"/"
| name:"openstack_release" type:string value:"2015.1.0-7.0"
| name:"http_response_size" type:double value:317
| name:"openstack_region" type:string value:"RegionOne"
| name:"http_status" type:string value:"200"
| name:"openstack_roles" type:string value:"primary-controller"
| name:"deployment_mode" type:string value:"ha_compact"
| name:"programname" type:string value:"nova-api"
| name:"deployment_id" type:string value:"3"
| name:"severity_label" type:string value:"INFO"
</PRE>
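The file_match / differentiator mechanics can be imitated with the same regex (a sketch only; Heka's LogstreamerInput does this in Go, and the `logger_name` helper is hypothetical):

```python
import re

# file_match from logstreamer-openstack_7_0.toml: the named group "Service"
# captures the service name from the log file name.
FILE_MATCH = r'(?P<Service>nova|cinder|glance|heat|neutron|murano)-all\.log$'

def logger_name(path, differentiator=('openstack.', 'Service')):
    # Each differentiator item is either a literal string or the name of a
    # match group from file_match; concatenating them yields the Logger value.
    m = re.search(FILE_MATCH, path)
    parts = []
    for item in differentiator:
        group = m.groupdict().get(item) if m else None
        parts.append(group if group is not None else item)
    return ''.join(parts)

assert logger_name('/var/log/nova-all.log') == 'openstack.nova'
assert logger_name('/var/log/neutron-all.log') == 'openstack.neutron'
```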
 
* "openstack_decoder" is a lua decoder, /usr/share/lma_collector/decoders/openstack_log.lua
* "openstack_splitter" is a regexp splitter:
<PRE>
[openstack_splitter]
type = "RegexSplitter"
delimiter = '(<[0-9]+>)'
delimiter_eol = false
</PRE>
 
This splitter is very simple: each openstack log line starts with a leading '<number>' part, e.g. we can check all unique fields:
<PRE>
# cat /var/log/*all.log | sort -u -t'>' -k1,1
<134>Jan 28 18:00:02 node-6 heat-api-cfn 2016-01-28 18:00:02.115 15557 INFO eventlet.wsgi.server [-] 192.168.0.7 - - [28/Jan/2016 18:00:02] "OPTIONS / HTTP/1.0" 300 275 0.000297
<14>Jan 21 15:00:02 node-6 glance-cache-pruner 2016-01-21 15:00:02.026 24376 INFO glance.image_cache [-] Image cache loaded driver 'sqlite'.
<147>Jan 21 15:08:19 node-6 glance-api 2016-01-21 15:08:19.576 3196 ERROR swiftclient [req-023ef8c5-9b09-40b1-9806-e685e205c16d 56aa47e7bf964ce4a13456f055739c29 7a65891a25f94a3bbda76b99e582ade6 - - -] Container HEAD failed: http://192.168.0.7:8080/v1/AUTH_7a65891a25f94a3bbda76b99e582ade6/glance 404 Not Found
<148>Jan 21 14:47:05 node-6 glance-registry 2016-01-21 14:47:05.943 3141 WARNING keystonemiddleware.auth_token [-] Configuring admin URI using auth fragments. This is deprecated, use 'identity_uri' instead.
<150>Jan 21 14:47:03 node-6 glance-manage 2016-01-21 14:47:03.198 3051 INFO migrate.versioning.api [-] 0 -> 1...
<155>Jan 28 13:18:06 node-6 cinder-scheduler 2016-01-28 13:18:06.088 18090 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.2:5673 closed the connection. Check login credentials: Socket closed
<158>Jan 25 18:00:04 node-6 cinder-api 2016-01-25 18:00:04.089 18212 INFO eventlet.wsgi.server [-] (18212) accepted ('192.168.0.7', 53352)
<166>Jan 28 15:00:09 node-6 neutron-server 2016-01-28 15:00:09.535 17707 INFO neutron.wsgi [-] (17707) accepted ('192.168.0.7', 49085)
<182>Jan 28 15:00:08 node-6 nova-api 2016-01-28 15:00:08.742 7567 INFO nova.osapi_compute.wsgi.server [-] 192.168.0.7 "OPTIONS / HTTP/1.0" status: 200 len: 317 time: 0.0006490
<44>Jan 21 14:49:24 node-6 swift-container-server: Configuration option internal_client_conf_path not defined. Using default configuration, See internal-client.conf-sample for options
<45>Jan 21 14:49:25 node-6 swift-container-server: Started child 26510
<46>Jan 21 14:42:44 node-6 keystone_wsgi_admin_access 192.168.0.2 - - [21/Jan/2016:14:42:42 +0000] "GET /v3/services HTTP/1.1" 200 113 532351 "-" "python-keystoneclient"
</PRE>
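The effect of this splitter can be reproduced with the same delimiter regex (illustration only; Heka's RegexSplitter is a Go plugin, and `split_stream` is a hypothetical helper):

```python
import re

# delimiter from [openstack_splitter]; because it is a capture group and
# delimiter_eol = false, the '<NNN>' prefix stays attached to the record
# that follows it.
DELIMITER = r'(<[0-9]+>)'

def split_stream(stream):
    parts = re.split(DELIMITER, stream)
    # re.split with a capture group yields: [before, delim, rest, delim, rest, ...]
    return [parts[i] + parts[i + 1] for i in range(1, len(parts) - 1, 2)]

stream = ("<134>Jan 28 heat-api-cfn INFO ..."
          "<14>Jan 21 glance-cache-pruner INFO ...")
records = split_stream(stream)
assert records == ["<134>Jan 28 heat-api-cfn INFO ...",
                   "<14>Jan 21 glance-cache-pruner INFO ..."]
```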
 
   
This number in the log is the rsyslog PRI field:
<BR>
{{quote|
The PRI value is a combination of so-called severity and facility. The facility indicates where the message originated from (e.g. kernel, mail subsystem) while the severity provides a glimpse of how important the message might be (e.g. error or informational).}}
and it is added via the message template:
<PRE>
$Template RemoteLog, "<%pri%>%timestamp% %hostname% %syslogtag%%msg:::sp-if-no-1st-sp%%msg%\n"
</PRE>
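The PRI numbers seen in the log samples decompose accordingly (PRI = facility * 8 + severity, per the syslog protocol; `split_pri` is an illustrative helper, not part of LMA):

```python
# PRI = facility * 8 + severity (syslog protocol).
# E.g. <134> on the heat-api-cfn line: facility 16 (local0), severity 6 (INFO).

def split_pri(pri):
    return divmod(pri, 8)  # -> (facility, severity)

assert split_pri(134) == (16, 6)  # local0.info
assert split_pri(182) == (22, 6)  # local6.info -- matches syslogfacility:22
                                  # and Severity: 6 in the nova-api message above
assert split_pri(147) == (18, 3)  # severity 3 = the glance-api ERROR line
```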
===Splitters===
Splitter plugins receive the data that is being acquired by an input plugin and slice it up into individual records. They must be written in Go.
<BR>
Splitter details: https://hekad.readthedocs.org/en/v0.10.0/config/splitters/index.html
<BR>
There is only one custom splitter:
<PRE>
[openstack_splitter]
type = "RegexSplitter"
delimiter = '(<[0-9]+>)'
delimiter_eol = false
</PRE>
* A short description of the [http://wiki.sirmax.noname.com.ua/index.php/Heka_Splitters Splitters in LMA].
 
===Decoders===
Decoder plugins convert the data that comes in through the Input plugins to Heka’s internal Message data structure: they parse the contents of the inputs, extract data from the text format, and map them onto a Heka message schema. Typically decoders are responsible for any parsing, deserializing, or extracting of structure from unstructured data that needs to happen. Decoder plugins can be written entirely in Go, or the core logic can be written in sandboxed Lua code.
<BR>
List of all available decoders: https://hekad.readthedocs.org/en/v0.10.0/config/decoders/index.html
<BR> On the controller we have the following decoders configured:
* decoder-collectd.toml
* decoder-http-check.toml
* decoder-keystone_7_0.toml
* decoder-keystone_wsgi.toml
* decoder-mysql.toml
* decoder-notification.toml
* decoder-openstack.toml
* decoder-ovs.toml
* decoder-pacemaker.toml
* decoder-rabbitmq.toml
* decoder-swift.toml
* decoder-system.toml
All decoders are SandboxDecoders.
* The decoders [http://wiki.sirmax.noname.com.ua/index.php/Heka_Decoders configured in the LMA] are described in a separate document.

====SandboxDecoder====
Sandbox documentation: https://hekad.readthedocs.org/en/v0.10.0/sandbox/index.html
 
   
===Filters===
Filter plugins are Heka’s processing engines. They are configured to receive messages matching certain specific characteristics (using Heka’s Message Matcher Syntax) and are able to perform arbitrary monitoring, aggregation, and/or processing of the data. Filters are also able to generate new messages that can be reinjected into the Heka pipeline, such as summary messages containing aggregate data, notification messages in cases where suspicious anomalies are detected, or circular buffer data messages that will show up as real time graphs in Heka’s dashboard.
<BR>
Filters can be written entirely in Go, or the core logic can be written in sandboxed Lua code. It is also possible to configure Heka to allow Lua filters to be dynamically injected into a running Heka instance without needing to reconfigure or restart the Heka process, nor even to have shell access to the server on which Heka is running.
* The filters configured in the LMA are described in the [http://wiki.sirmax.noname.com.ua/index.php/Heka_Filters Filter description] document.

===Encoders===
Encoder plugins are the inverse of Decoders. They generate arbitrary byte streams using data extracted from Heka Message structs. Encoders are embedded within Output plugins; Encoders handle the serialization, Outputs handle the details of interacting with the outside world. Encoder plugins can be written entirely in Go, or the core logic can be written in sandboxed Lua code.
<BR>
An example of a simple Nagios encoder can be found in the [http://wiki.sirmax.noname.com.ua/index.php/Heka_Filter_afd_example#Simple_Nagios_Encoder example of AFD filters] document.

===Outputs===
Output plugins send data that has been serialized by an Encoder to some external destination. They handle all of the details of interacting with the network, filesystem, or any other outside resource. They are, like Filters, configured using Heka’s Message Matcher Syntax so they will only receive and deliver messages matching certain characteristics.
<BR>
Output plugins must be written in Go.
 
==Heka Debugging==
* Heka debugging [http://wiki.sirmax.noname.com.ua/index.php/Heka_Debugging mini-HowTO]
An example debug output configuration:
<PRE>
[RstEncoder]

[output_file]
type = "FileOutput"
#message_matcher = "Fields[aggregator] == NIL && Type == 'heka.sandbox.afd_node_metric'"
message_matcher = "Fields[aggregator] == NIL"
path = "/var/log/heka-debug.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
#encoder = "nagios_afd_nodes_encoder_debug"
encoder = "RstEncoder"
</PRE>
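The RstEncoder writes the same ':Field: value' text format shown earlier, so /var/log/heka-debug.log is easy to slice with a few lines of scripting (the `loggers` helper is hypothetical, not part of LMA; the sample text mimics the RstEncoder output above):

```python
# Count messages per Logger in RstEncoder-style debug output.
from collections import Counter

SAMPLE = """\
:Timestamp: 2016-01-27 15:44:05 +0000 UTC
:Logger: openstack.nova
:Payload: OPTIONS / HTTP/1.0
:Timestamp: 2016-01-27 15:44:06 +0000 UTC
:Logger: openstack.nova
:Payload: GET /servers HTTP/1.1
:Timestamp: 2016-01-27 15:44:07 +0000 UTC
:Logger: openstack.cinder
:Payload: accepted connection
"""

def loggers(text):
    # Keep only the ':Logger: ...' lines and tally the values.
    return Counter(line.split(": ", 1)[1]
                   for line in text.splitlines()
                   if line.startswith(":Logger: "))

assert loggers(SAMPLE) == {"openstack.nova": 2, "openstack.cinder": 1}
```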
 

''Current version as of 19:19, 9 February 2016''