Revision as of 17:38, 29 January 2016

Heka

Heka is an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as:

Loading and parsing log files from a file system.
Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as graphite or InfluxDB.
Launching external processes to gather operational data from the local system.
Performing real time analysis, graphing, and anomaly detection on any data flowing through the Heka pipeline.
Shipping data from one location to another via the use of an external transport (such as AMQP) or directly (via TCP).
Delivering processed data to one or more persistent data stores.

Configuration overview

All LMA Heka configuration files are located in the /etc/lma_collector directory. For example, a controller has the following configuration files:

amqp-openstack_error.toml
amqp-openstack_info.toml
amqp-openstack_warn.toml
decoder-collectd.toml
decoder-http-check.toml
decoder-keystone_7_0.toml
decoder-keystone_wsgi.toml
decoder-mysql.toml
decoder-notification.toml
decoder-openstack.toml
decoder-ovs.toml
decoder-pacemaker.toml
decoder-rabbitmq.toml
decoder-swift.toml
decoder-system.toml
encoder-elasticsearch.toml
encoder-influxdb.toml
encoder-nagios_afd_nodes_debug.toml
encoder-nagios_afd_nodes.toml
encoder-nagios_gse_global_clusters.toml
encoder-nagios_gse_node_clusters.toml
filter-afd_api_backends.toml
filter-afd_api_endpoints.toml
filter-afd_node_controller_cpu.toml
filter-afd_node_controller_log-fs.toml
filter-afd_node_controller_root-fs.toml
filter-afd_node_mysql-nodes_mysql-fs.toml
filter-afd_service_apache_worker.toml
filter-afd_service_cinder-api_http_errors.toml
filter-afd_service_glance-api_http_errors.toml
filter-afd_service_heat-api_http_errors.toml
filter-afd_service_keystone-admin-api_http_errors.toml
filter-afd_service_keystone-public-api_http_errors.toml
filter-afd_service_mysql_node-status.toml
filter-afd_service_neutron-api_http_errors.toml
filter-afd_service_nova-api_http_errors.toml
filter-afd_service_rabbitmq_disk.toml
filter-afd_service_rabbitmq_memory.toml
filter-afd_service_rabbitmq_queue.toml
filter-afd_service_swift-api_http_errors.toml
filter-afd_workers.toml
filter-gse_global.toml
filter-gse_node.toml
filter-gse_service.toml
filter-heka_monitoring.toml
filter-http_metrics.toml
filter-influxdb_accumulator.toml
filter-influxdb_annotation.toml
filter-instance_state.toml
filter-resource_creation_time.toml
filter-service_heartbeat.toml
global.toml
httplisten-collectd.toml
httplisten-http-check.toml
input-aggregator.toml
logstreamer-keystone_7_0.toml
logstreamer-keystone_wsgi.toml
logstreamer-mysql.toml
logstreamer-openstack_7_0.toml
logstreamer-openstack_dashboard.toml
logstreamer-ovs.toml
logstreamer-pacemaker.toml
logstreamer-rabbitmq.toml
logstreamer-swift.toml
logstreamer-system.toml
multidecoder-aggregator.toml
output-aggregator.toml
output-dashboard.toml
output-elasticsearch.toml
output-influxdb.toml
output-nagios_afd_nodes.toml
output-nagios_gse_global_clusters.toml
output-nagios_gse_node_clusters.toml
scribbler-aggregator_flag.toml
splitter-openstack.toml
splitter-rabbitmq.toml

Heka's configuration files can be divided into the following groups:

Inputs
Splitters
Decoders
Filters
Encoders
Outputs

Inputs

On a controller there are the following groups of inputs:

AMQPInput

AMQP input documentation: https://hekad.readthedocs.org/en/v0.10.0/config/inputs/amqp.html
There are the following AMQP inputs:

amqp-openstack_error.toml
amqp-openstack_info.toml
amqp-openstack_warn.toml

All AMQP inputs look like this:

[openstack_error_amqp]
type = "AMQPInput"
url = "amqp://nova:nova_password@192.168.0.2:5673/"
exchange = "nova"
exchange_type = "topic"
exchange_durability = false
exchange_auto_delete = false
queue_auto_delete = false
queue = "lma_notifications.error"
routing_key = "lma_notifications.error"
decoder = "notification_decoder"
splitter = "NullSplitter"
can_exit = true

The AMQP inputs differ only in the queue and routing_key parameters, e.g. for the info input:

queue = "lma_notifications.info"
routing_key = "lma_notifications.info"
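
Since the AMQP inputs share everything except these two values, they can be rendered from one template. A minimal Python sketch, assuming the info and warn sections are named by analogy with openstack_error_amqp (those section names and the render_amqp_input helper are assumptions, not taken from the files):

```python
# Sketch only: renders the three AMQP input sections from one template.
# The section names for "info" and "warn" are assumed by analogy with
# [openstack_error_amqp]; check the real files in /etc/lma_collector.
AMQP_INPUT_TEMPLATE = """[openstack_{level}_amqp]
type = "AMQPInput"
url = "amqp://nova:nova_password@192.168.0.2:5673/"
exchange = "nova"
exchange_type = "topic"
exchange_durability = false
exchange_auto_delete = false
queue_auto_delete = false
queue = "lma_notifications.{level}"
routing_key = "lma_notifications.{level}"
decoder = "notification_decoder"
splitter = "NullSplitter"
can_exit = true
"""


def render_amqp_input(level):
    """Return the TOML text of the AMQP input for one notification level."""
    return AMQP_INPUT_TEMPLATE.format(level=level)


if __name__ == "__main__":
    for level in ("error", "info", "warn"):
        print(render_amqp_input(level))
```

Only the level suffix changes between the rendered sections, which matches the description above.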

All AMQP inputs use the same decoder for AMQP messages, notification_decoder; its configuration can be found in the decoder-notification.toml file.

The LMA plugin configures OpenStack services to use 'lma_notifications' as notification_topics, e.g.:

# cat /etc/nova/nova.conf | grep lma
notification_topics=lma_notifications

so Heka is able to get messages from the queue and decode them.
It is also possible to inspect RabbitMQ messages using the trace plugin; for details see: http://wiki.sirmax.noname.com.ua/index.php/Rabbitmq_trace#RabbitMQ_log_messages

HttpListenInput

An HttpListenInput plugin starts a web server listening on the specified address and port. For more details: https://hekad.readthedocs.org/en/v0.10.0/config/inputs/httplisten.html
The following HttpListen inputs are configured in LMA (controller):

httplisten-collectd.toml
httplisten-http-check.toml

httplisten-collectd

This input is used to receive data from the local collectd only.

[collectd_httplisten]
type="HttpListenInput"
address = "127.0.0.1:8325"
decoder = "collectd_decoder"
splitter = "NullSplitter"

httplisten-http-check

[http-check_httplisten]
type="HttpListenInput"
address = "192.168.0.2:5566"
decoder = "http-check_decoder"
splitter = "NullSplitter"

This is an open port used for the haproxy HTTP check. As the haproxy config shows, this port is used only to check whether Heka is running, so that haproxy can decide whether to forward traffic to port 5565 of input-aggregator.
/etc/haproxy/conf.d/999-lma.cfg:

listen lma
  bind 192.168.0.7:5565
  balance  roundrobin
  mode  tcp
  option  httpchk
  option  tcplog
  server node-6 192.168.0.2:5565  check port 5566
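
The check that haproxy performs against port 5566 can be approximated by hand. A minimal Python sketch, assuming haproxy's default httpchk request (OPTIONS /) and its rule that 2xx and 3xx responses count as healthy; heka_is_up is a hypothetical helper name, and which status Heka actually returns is not stated here:

```python
import http.client


def heka_is_up(host, port, timeout=2.0):
    """Return True if an OPTIONS / request gets a 2xx or 3xx answer.

    Note: http.client speaks HTTP/1.1, while haproxy's default check
    uses HTTP/1.0, so this only approximates the real health check.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("OPTIONS", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply: treat as down.
        return False
    finally:
        conn.close()
    return 200 <= status < 400
```

On the controller from this example it would be called as heka_is_up("192.168.0.2", 5566).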

TcpInput

There is only one TCP input in the LMA configuration:

input-aggregator.toml

[aggregator_tcpinput]
type="TcpInput"
address = "192.168.0.2:5565"
decoder = "aggregator_decoder"
splitter = "HekaFramingSplitter"

This input is used to aggregate data in an HA configuration; its port is exposed through haproxy on the virtual IP.
So in an HA multi-controller configuration this port is exposed on only one controller.
More details will be provided below.

LogstreamerInput

A Logstreamer input tails a single log file, a sequential single log source, or multiple log sources of either a single logstream or multiple logstreams.
More details: https://hekad.readthedocs.org/en/v0.10.0/config/inputs/logstreamer.html
The following inputs are configured on a controller:

logstreamer-keystone_7_0.toml
logstreamer-keystone_wsgi.toml
logstreamer-mysql.toml
logstreamer-openstack_7_0.toml
logstreamer-openstack_dashboard.toml
logstreamer-ovs.toml
logstreamer-pacemaker.toml
logstreamer-rabbitmq.toml
logstreamer-swift.toml
logstreamer-system.toml

All Logstreamer inputs are very similar to each other. E.g. logstreamer-openstack:

[openstack_7_0_logstreamer]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = '(?P<Service>nova|cinder|glance|heat|neutron|murano)-all\.log$'
differentiator = [ 'openstack.', 'Service' ]
decoder = "openstack_decoder"
splitter = "openstack_splitter"

This input does the following:

reads files from /var/log/ that match the file_match expression
the differentiator is a set of strings used to build the logger name. E.g. records from /var/log/nova-all.log are marked with :Logger: openstack.nova, as in this message:

:Timestamp: 2016-01-27 15:44:05.114000128 +0000 UTC
:Type: log
:Hostname: node-6
:Pid: 17814
:Uuid: c2a1db38-1f24-48b6-a96b-34be7b364eb3
:Logger: openstack.nova
:Payload: nova.osapi_compute.wsgi.server [-] 192.168.0.7 "OPTIONS / HTTP/1.0" status: 200 len: 317 time: 0.0005581
:EnvVersion:
:Severity: 6
:Fields:
    | name:"syslogfacility" type:double value:22
    | name:"environment_label" type:string value:"test2"
    | name:"http_client_ip_address" type:string value:"192.168.0.7"
    | name:"http_response_time" type:double value:0.0005581
    | name:"http_method" type:string value:"OPTIONS"
    | name:"http_version" type:string value:"1.0"
    | name:"http_url" type:string value:"/"
    | name:"openstack_release" type:string value:"2015.1.0-7.0"
    | name:"http_response_size" type:double value:317
    | name:"openstack_region" type:string value:"RegionOne"
    | name:"http_status" type:string value:"200"
    | name:"openstack_roles" type:string value:"primary-controller"
    | name:"deployment_mode" type:string value:"ha_compact"
    | name:"programname" type:string value:"nova-api"
    | name:"deployment_id" type:string value:"3"
    | name:"severity_label" type:string value:"INFO"
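
How file_match and differentiator combine into the logger name can be sketched in Python, whose regular expressions use the same (?P<Name>...) named-group syntax. This is an illustration, not Heka's code: logger_name is a hypothetical helper, and matching the pattern against the path relative to log_directory is an assumption.

```python
import re

LOG_DIRECTORY = "/var/log"
FILE_MATCH = r'(?P<Service>nova|cinder|glance|heat|neutron|murano)-all\.log$'
DIFFERENTIATOR = ['openstack.', 'Service']


def logger_name(path):
    """Return the logger name for a log file path, or None if it does not match."""
    prefix = LOG_DIRECTORY.rstrip('/') + '/'
    if not path.startswith(prefix):
        return None
    # Assumption: the pattern is applied to the path relative to log_directory.
    match = re.match(FILE_MATCH, path[len(prefix):])
    if match is None:
        return None
    groups = match.groupdict()
    # A differentiator item naming a capture group is replaced by the captured
    # text; any other item is used literally.
    return ''.join(groups.get(part, part) for part in DIFFERENTIATOR)


if __name__ == "__main__":
    print(logger_name("/var/log/nova-all.log"))
```

Files that do not match, such as /var/log/keystone-all.log, yield no logger name and are not read by this input.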

"openstack_decoder" is a Lua decoder, /usr/share/lma_collector/decoders/openstack_log.lua
"openstack_splitter" is a regexp splitter:

[openstack_splitter]
type = "RegexSplitter"
delimiter = '(<[0-9]+>)'
delimiter_eol = false

This splitter is very simple: each OpenStack log record starts with a '<number>' prefix. E.g. the following command lists one record per unique prefix:

# cat  /var/log/*all.log | sort -u  -t'>' -k1,1
<134>Jan 28 18:00:02 node-6 heat-api-cfn 2016-01-28 18:00:02.115 15557 INFO eventlet.wsgi.server [-] 192.168.0.7 - - [28/Jan/2016 18:00:02] "OPTIONS / HTTP/1.0" 300 275 0.000297
<14>Jan 21 15:00:02 node-6 glance-cache-pruner 2016-01-21 15:00:02.026 24376 INFO glance.image_cache [-] Image cache loaded driver 'sqlite'.
<147>Jan 21 15:08:19 node-6 glance-api 2016-01-21 15:08:19.576 3196 ERROR swiftclient [req-023ef8c5-9b09-40b1-9806-e685e205c16d 56aa47e7bf964ce4a13456f055739c29 7a65891a25f94a3bbda76b99e582ade6 - - -] Container HEAD failed: http://192.168.0.7:8080/v1/AUTH_7a65891a25f94a3bbda76b99e582ade6/glance 404 Not Found
<148>Jan 21 14:47:05 node-6 glance-registry 2016-01-21 14:47:05.943 3141 WARNING keystonemiddleware.auth_token [-] Configuring admin URI using auth fragments. This is deprecated, use 'identity_uri' instead.
<150>Jan 21 14:47:03 node-6 glance-manage 2016-01-21 14:47:03.198 3051 INFO migrate.versioning.api [-] 0 -> 1...
<155>Jan 28 13:18:06 node-6 cinder-scheduler 2016-01-28 13:18:06.088 18090 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.2:5673 closed the connection. Check login credentials: Socket closed
<158>Jan 25 18:00:04 node-6 cinder-api 2016-01-25 18:00:04.089 18212 INFO eventlet.wsgi.server [-] (18212) accepted ('192.168.0.7', 53352)
<166>Jan 28 15:00:09 node-6 neutron-server 2016-01-28 15:00:09.535 17707 INFO neutron.wsgi [-] (17707) accepted ('192.168.0.7', 49085)
<182>Jan 28 15:00:08 node-6 nova-api 2016-01-28 15:00:08.742 7567 INFO nova.osapi_compute.wsgi.server [-] 192.168.0.7 "OPTIONS / HTTP/1.0" status: 200 len: 317 time: 0.0006490
<44>Jan 21 14:49:24 node-6 swift-container-server: Configuration option internal_client_conf_path not defined. Using default configuration, See internal-client.conf-sample for options
<45>Jan 21 14:49:25 node-6 swift-container-server: Started child 26510
<46>Jan 21 14:42:44 node-6 keystone_wsgi_admin_access 192.168.0.2 - - [21/Jan/2016:14:42:42 +0000] "GET /v3/services HTTP/1.1" 200 113 532351 "-" "python-keystoneclient"

This number in the log is the rsyslog PRI field:
"The PRI value is a combination of so-called severity and facility. The facility indicates where the message originated from (e.g. kernel, mail subsystem) while the severity provides a glimpse of how important the message might be (e.g. error or informational)."
It is added by the message template:

$Template RemoteLog, "<%pri%>%timestamp% %hostname% %syslogtag%%msg:::sp-if-no-1st-sp%%msg%\n"
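
In syslog the PRI value is facility * 8 + severity, so the two parts can be recovered with integer division. A short Python sketch (decode_pri is a hypothetical helper name; the severity names are the standard syslog ones):

```python
import re

# Standard syslog severity names, indexed by severity code 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def decode_pri(line):
    """Return (facility, severity, severity_name) for a line starting with <PRI>."""
    match = re.match(r'<(\d+)>', line)
    if match is None:
        return None
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, SEVERITIES[severity]


if __name__ == "__main__":
    # nova-api record from the listing above: facility 22, severity 6 (info)
    print(decode_pri('<182>Jan 28 15:00:08 node-6 nova-api ...'))
    # glance-api ERROR record: facility 18, severity 3 (err)
    print(decode_pri('<147>Jan 21 15:08:19 node-6 glance-api ...'))
```

For <182> this gives facility 22 and severity 6, which agrees with the syslogfacility field and the :Severity: value in the nova message shown earlier.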

Splitters

Splitter details: https://hekad.readthedocs.org/en/v0.10.0/config/splitters/index.html
There is only one custom splitter:

[openstack_splitter]
type = "RegexSplitter"
delimiter = '(<[0-9]+>)'
delimiter_eol = false
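
With delimiter_eol = false the captured delimiter is kept at the beginning of the next record rather than at the end of the previous one. The effect can be sketched in Python on an in-memory string (Heka itself works on a stream; split_records is a hypothetical helper, not Heka's implementation):

```python
import re

DELIMITER = re.compile(r'(<[0-9]+>)')


def split_records(data):
    """Split data into records, each starting with its '<number>' delimiter."""
    # With one capture group, re.split returns
    # [text_before, delim1, body1, delim2, body2, ...].
    parts = DELIMITER.split(data)
    records = []
    if parts[0]:
        # Text before the first delimiter forms a record of its own.
        records.append(parts[0])
    for i in range(1, len(parts), 2):
        records.append(parts[i] + parts[i + 1])
    return records


if __name__ == "__main__":
    sample = ("<182>Jan 28 15:00:08 node-6 nova-api INFO first\n"
              "<147>Jan 21 15:08:19 node-6 glance-api ERROR second\n")
    for record in split_records(sample):
        print(repr(record))
```

Note that any '<number>' sequence inside a message body would also be treated as a record boundary by such a pattern.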

Decoders

decoder-collectd.toml
decoder-libvirt.toml
decoder-openstack.toml
decoder-ovs.toml
decoder-system.toml

Heka Debugging

[RstEncoder]

[output_file]
type = "FileOutput"
#message_matcher = "Fields[aggregator] == NIL && Type == 'heka.sandbox.afd_node_metric'"
message_matcher = "Fields[aggregator] == NIL"
path = "/var/log/heka-debug.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
#encoder = "nagios_afd_nodes_encoder_debug"
encoder = "RstEncoder"
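
To inspect the debug file it can help to turn a dumped message back into a dictionary. A rough Python sketch, assuming the RstEncoder output has the same layout as the message dump in the LogstreamerInput section (":Name: value" header lines and '| name:"..." type:... value:...' field lines); parse_rst_message is a hypothetical helper and handles single-valued fields only:

```python
import re

HEADER_RE = re.compile(r'^:(\w+):\s?(.*)$')
FIELD_RE = re.compile(r'^\s*\|\s*name:"([^"]*)"\s+type:(\w+)\s+value:(.*)$')


def _convert(ftype, raw):
    """Convert a raw field value according to its declared type."""
    if ftype == "double":
        return float(raw)
    if ftype == "integer":
        return int(raw)
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]  # strip the surrounding quotes of string values
    return raw


def parse_rst_message(text):
    """Parse one dumped message into a dict with a nested 'Fields' dict."""
    message = {"Fields": {}}
    for line in text.splitlines():
        field = FIELD_RE.match(line)
        if field:
            name, ftype, raw = field.groups()
            message["Fields"][name] = _convert(ftype, raw.strip())
            continue
        header = HEADER_RE.match(line)
        if header and header.group(1) != "Fields":
            message[header.group(1)] = header.group(2)
    return message
```

Header values are kept as strings; only the entries under Fields are converted by type.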
