Heka Debugging: различия между версиями

Материал из noname.com.ua
Перейти к навигацииПерейти к поиску
Строка 3: Строка 3:
 
[[Категория:MOS FUEL]]
 
[[Категория:MOS FUEL]]
 
[[Категория:Linux]]
 
[[Категория:Linux]]
  +
[[Категория:Monitoring]]
 
=Heka Debugging=
 
=Heka Debugging=
 
From time to time something went wrong with any software so there are a few words about debugging.
 
From time to time something went wrong with any software so there are a few words about debugging.

Версия 15:41, 9 февраля 2016

Heka Debugging

From time to time something went wrong with any software so there are a few words about debugging.

Save data to file

The easiest way to debug is to configure FileOutput (https://hekad.readthedocs.org/en/v0.10.0/config/outputs/file.html) and save all data in debug file.
To encode output is used RstEncoder (https://hekad.readthedocs.org/en/v0.10.0/config/encoders/rst.html)

To enable such debugging it is necessary to add config file, e.g /etc/lma_collector/00_debug.toml
Attention: this configuration is dangerous because it will create I/O load. Please be careful with debugging on production!

[RstEncoder]

[output_file]
type = "FileOutput"
message_matcher = "Fields[aggregator] == NIL"
path = "/var/log/heka-debug.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "RstEncoder"

Example of output:


tail -100 /var/log/heka-debug.log
... <SKIP> ...
:Timestamp: 2016-01-31 16:58:42.232999936 +0000 UTC
:Type: metric
:Hostname: node-6
:Pid: 5505
:Uuid: 2a73fa6f-21fc-40b0-acf9-7fe96bcb9b80
:Logger: collectd
:Payload: {"type":"gauge","values":[0],"type_instance":"q-agent-notifier-dvr-update_fanout_57e5b132e51840ebb6.messages","meta":{"0":true},"dsnames":["value"],"plugin":"rabbitmq_info","time":1454259522.233,"interval":10,"host":"node-6","dstypes":["gauge"],"plugin_instance":""}
:EnvVersion:
:Severity: 6
:Fields:
    | name:"type" type:string value:"gauge"
    | name:"source" type:string value:"rabbitmq_info"
    | name:"openstack_release" type:string value:"2015.1.0-7.0"
    | name:"deployment_mode" type:string value:"ha_compact"
    | name:"openstack_roles" type:string value:"primary-controller"
    | name:"queue" type:string value:"q-agent-notifier-dvr-update_fanout_57e5b132e51840ebb6"
    | name:"tag_fields" type:string value:"queue"
    | name:"openstack_region" type:string value:"RegionOne"
    | name:"name" type:string value:"rabbitmq_queue_messages"
    | name:"hostname" type:string value:"node-6"
    | name:"value" type:double value:0
    | name:"deployment_id" type:string value:"3"
    | name:"environment_label" type:string value:"test2"
    | name:"interval" type:double value:10


:Timestamp: 2016-01-31 16:58:42 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: bbc94442-67c8-4445-acf8-341b14c2d43f
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:1212
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 16:58:42 +0000 UTC
:Type: log
:Hostname: node-6
:Pid: 11905
:Uuid: 36cfae57-cb3e-4777-a886-60a351fb230c
:Logger: pacemaker
:Payload: cib_process_ping: 	Reporting our current digest to node-6.domain.tld: 70954c66c71f324b3851af4d92ef93dd for 0.90.152 (0x1c12960 0)

:EnvVersion:
:Severity: 7
:Fields:
    | name:"deployment_mode" type:string value:"ha_compact"
    | name:"environment_label" type:string value:"test2"
    | name:"openstack_region" type:string value:"RegionOne"
    | name:"deployment_id" type:string value:"3"
    | name:"severity_label" type:string value:"DEBUG"
    | name:"programname" type:string value:"cib"
    | name:"openstack_roles" type:string value:"primary-controller"
    | name:"openstack_release" type:string value:"2015.1.0-7.0"

Debug data filtering

To avoid I/O overload if is possible to filter messages. To filter data only from test_decoder and save it into the /var/log/heka-debug-test-decoder.log file is used the following configuration:

01_debug_test_decoder_data.toml

[output_file]
type = "FileOutput"
message_matcher = "Type == 'test_decoder'"
path = "/var/log/heka-debug-test-decoder.log"
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "RstEncoder"

Remark: depends on your current Heka configuration you may need to enable RstEncoder. In this example it is already enabled in 00_debug.toml file.

Example of output:

:Timestamp: 2016-01-31 17:07:16 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 38ffa77b-8f5c-4f73-aba5-f5de1f15f0b8
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:29
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 17:07:17 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 1545163c-cc97-46aa-aba8-431ab6ca3338
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:30
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08


:Timestamp: 2016-01-31 17:07:18 +0000 UTC
:Type: test_decoder
:Hostname: node-6
:Pid: 0
:Uuid: 5da75795-63fd-48f8-ac52-44db368f54ed
:Logger: test_input
:Payload:
:EnvVersion:
:Severity: 7
:Fields:
    | name:"test_decoderData" type:string value:"888999888"
    | name:"test_decoderCount" type:double value:31
    | name:"test_decoderDataToNumber" type:double value:8.88999888e+08

If Heka can't start

If Heka is not able to start it is possible to run it manually:

  • Disable resorce in CRM
    • Run crm resource command:
#crm resource crm(live)resource# list ... <SKIP> ... Clone Set: clone_lma_collector [lma_collector] Started: [ node-6.domain.tld ]
    • Stop lma_ollector resource:
#crm(live)resource# stop lma_collector crm(live)resource# stop clone_lma_collector
    • Check lma_collector resource status. It should be stopped.
#crm(live)resource# list Clone Set: clone_lma_collector [lma_collector] Stopped: [ node-6.domain.tld ]

Now it is safe to run Heka manually.

  • Create screen session:
#screen


  • In screeen section switch into heka user
#sudo -H -u heka /bin/bash
$hekad -config /etc/lma_collector/ 2016/01/31 17:23:39 Error reading config: open /etc/lma_collector/01_debug_test_decoder_data.toml: permission denied


  • As you can see in example above error is related to file permissions:
## ls -lsa /etc/lma_collector/01_debug_test_decoder_data.toml 4 -rwx------ 1 root root 199 Jan 31 17:06 /etc/lma_collector/01_debug_test_decoder_data.toml
  • File is owned by root and can't be read by heka. To fix it it is nessesary to fix owner or permittions
#chown heka:heka /etc/lma_collector/01_debug_test_decoder_data.toml
  • Start heka again:
$ hekad -config /etc/lma_collector/ 2016/01/31 17:28:13 Pre-loading: [RstEncoder] 2016/01/31 17:28:13 Pre-loading: [output_file] 2016/01/31 17:28:13 Pre-loading: [output_file] 2016/01/31 17:28:13 Pre-loading: [openstack_error_amqp] 2016/01/31 17:28:13 Pre-loading: [openstack_info_amqp] 2016/01/31 17:28:13 Pre-loading: [openstack_warn_amqp] 2016/01/31 17:28:13 Pre-loading: [collectd_decoder] 2016/01/31 17:28:13 Pre-loading: [http-check_decoder] 2016/01/31 17:28:13 Pre-loading: [keystone_7_0_decoder] 2016/01/31 17:28:13 Pre-loading: [keystone_wsgi_decoder] 2016/01/31 17:28:13 Pre-loading: [mm_decoder] 2016/01/31 17:28:13 Pre-loading: [mysql_decoder] 2016/01/31 17:28:13 Pre-loading: [notification_decoder] 2016/01/31 17:28:13 Pre-loading: [openstack_decoder] 2016/01/31 17:28:13 Pre-loading: [ovs_decoder] 2016/01/31 17:28:13 Pre-loading: [pacemaker_decoder] 2016/01/31 17:28:13 Pre-loading: [rabbitmq_decoder] 2016/01/31 17:28:13 Pre-loading: [swift_decoder] 2016/01/31 17:28:13 Pre-loading: [system_decoder] 2016/01/31 17:28:13 Pre-loading: [elasticsearch_encoder] 2016/01/31 17:28:13 Pre-loading: [influxdb_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_afd_nodes_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_afd_nodes_encoder_debug] 2016/01/31 17:28:13 Pre-loading: [nagios_gse_global_clusters_encoder] 2016/01/31 17:28:13 Pre-loading: [nagios_gse_node_clusters_encoder] 2016/01/31 17:28:13 Pre-loading: [afd_api_backends_filter] 2016/01/31 17:28:13 Pre-loading: [afd_api_endpoints_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_cpu_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_log-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_controller_root-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_node_mysql-nodes_mysql-fs_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_apache_worker_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_cinder-api_http_errors_filter] 2016/01/31 17:28:13 Pre-loading: [afd_service_glance-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_heat-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_keystone-admin-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_keystone-public-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_mysql_node-status_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_neutron-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_nova-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_disk_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_memory_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_rabbitmq_queue_filter] 2016/01/31 17:28:14 Pre-loading: [afd_service_swift-api_http_errors_filter] 2016/01/31 17:28:14 Pre-loading: [afd_workers_filter] 2016/01/31 17:28:14 Pre-loading: [gse_global_filter] 2016/01/31 17:28:14 Pre-loading: [gse_node_filter] 2016/01/31 17:28:14 Pre-loading: [gse_service_filter] 2016/01/31 17:28:14 Pre-loading: [heka_monitoring_filter] 2016/01/31 17:28:14 Pre-loading: [http_metrics_filter] 2016/01/31 17:28:14 Pre-loading: [influxdb_accumulator_filter] 2016/01/31 17:28:14 Pre-loading: [influxdb_annotation_filter] 2016/01/31 17:28:14 Pre-loading: [instance_state_filter] 2016/01/31 17:28:14 Pre-loading: [resource_creation_time_filter] 2016/01/31 17:28:14 Pre-loading: [service_heartbeat_filter] 2016/01/31 17:28:14 Pre-loading: [collectd_httplisten] 2016/01/31 17:28:14 Pre-loading: [http-check_httplisten] 2016/01/31 17:28:14 Pre-loading: [aggregator_tcpinput] 2016/01/31 17:28:14 Pre-loading: [keystone_7_0_logstreamer] 2016/01/31 17:28:14 Pre-loading: [keystone_wsgi_logstreamer] 2016/01/31 17:28:14 Pre-loading: [test_input] 2016/01/31 17:28:14 Pre-loading: [mysql_logstreamer] 2016/01/31 17:28:14 Pre-loading: [openstack_7_0_logstreamer] 2016/01/31 17:28:14 Pre-loading: [openstack_dashboard_logstreamer] 2016/01/31 17:28:14 Pre-loading: [ovs_logstreamer] 2016/01/31 17:28:14 Pre-loading: [pacemaker_logstreamer] 2016/01/31 17:28:14 Pre-loading: [rabbitmq_logstreamer] 2016/01/31 17:28:14 Pre-loading: [swift_logstreamer] 2016/01/31 17:28:14 Pre-loading: [system_logstreamer] 2016/01/31 17:28:14 Pre-loading: [aggregator_decoder] 2016/01/31 17:28:14 Pre-loading: [aggregator_tcpoutput] 2016/01/31 17:28:14 Pre-loading: [DashboardOutput] 2016/01/31 17:28:14 Pre-loading: [elasticsearch_output] 2016/01/31 17:28:14 Pre-loading: [influxdb_output] 2016/01/31 17:28:14 Pre-loading: [nagios_afd_nodes_output] 2016/01/31 17:28:14 Pre-loading: [nagios_afd_nodes_output_file] 2016/01/31 17:28:14 Pre-loading: [nagios_gse_global_clusters_output] 2016/01/31 17:28:14 Pre-loading: [nagios_gse_node_clusters_output] 2016/01/31 17:28:14 Pre-loading: [aggregator_flag_decoder] 2016/01/31 17:28:14 Pre-loading: [openstack_splitter] 2016/01/31 17:28:14 Pre-loading: [rabbitmq_splitter] 2016/01/31 17:28:14 Pre-loading: [ProtobufDecoder] 2016/01/31 17:28:14 Loading: [ProtobufDecoder] 2016/01/31 17:28:14 Pre-loading: [ProtobufEncoder] 2016/01/31 17:28:14 Loading: [ProtobufEncoder] 2016/01/31 17:28:14 Pre-loading: [TokenSplitter] 2016/01/31 17:28:14 Loading: [TokenSplitter] 2016/01/31 17:28:14 Pre-loading: [HekaFramingSplitter] 2016/01/31 17:28:14 Loading: [HekaFramingSplitter] 2016/01/31 17:28:14 Pre-loading: [NullSplitter] 2016/01/31 17:28:14 Loading: [NullSplitter] 2016/01/31 17:28:14 Loading: [collectd_decoder] 2016/01/31 17:28:14 Loading: [http-check_decoder] 2016/01/31 17:28:14 Loading: [keystone_7_0_decoder] 2016/01/31 17:28:14 Loading: [keystone_wsgi_decoder] 2016/01/31 17:28:14 Loading: [mm_decoder] 2016/01/31 17:28:14 Loading: [mysql_decoder] 2016/01/31 17:28:14 Loading: [notification_decoder] 2016/01/31 17:28:14 Loading: [openstack_decoder] 2016/01/31 17:28:14 Loading: [ovs_decoder] 2016/01/31 17:28:14 Loading: [pacemaker_decoder] 2016/01/31 17:28:14 Loading: [rabbitmq_decoder] 2016/01/31 17:28:14 Loading: [swift_decoder] 2016/01/31 17:28:14 Loading: [system_decoder] 2016/01/31 17:28:14 Loading: [aggregator_flag_decoder] 2016/01/31 17:28:14 Loading: [aggregator_decoder] 2016/01/31 17:28:14 Loading: [RstEncoder] 2016/01/31 17:28:14 Loading: [elasticsearch_encoder] 2016/01/31 17:28:14 Loading: [influxdb_encoder] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_encoder] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_encoder_debug] 2016/01/31 17:28:14 Loading: [nagios_gse_global_clusters_encoder] 2016/01/31 17:28:14 Loading: [nagios_gse_node_clusters_encoder] 2016/01/31 17:28:14 Loading: [openstack_splitter] 2016/01/31 17:28:14 Loading: [rabbitmq_splitter] 2016/01/31 17:28:14 Loading: [openstack_error_amqp] 2016/01/31 17:28:14 Loading: [openstack_info_amqp] 2016/01/31 17:28:14 Loading: [openstack_warn_amqp] 2016/01/31 17:28:14 Loading: [collectd_httplisten] 2016/01/31 17:28:14 Loading: [http-check_httplisten] 2016/01/31 17:28:14 Loading: [aggregator_tcpinput] 2016/01/31 17:28:14 Loading: [keystone_7_0_logstreamer] 2016/01/31 17:28:14 Loading: [keystone_wsgi_logstreamer] 2016/01/31 17:28:14 Loading: [test_input] 2016/01/31 17:28:14 Loading: [mysql_logstreamer] 2016/01/31 17:28:14 Loading: [openstack_7_0_logstreamer] 2016/01/31 17:28:14 Loading: [openstack_dashboard_logstreamer] 2016/01/31 17:28:14 Loading: [ovs_logstreamer] 2016/01/31 17:28:14 Loading: [pacemaker_logstreamer] 2016/01/31 17:28:14 Loading: [rabbitmq_logstreamer] 2016/01/31 17:28:14 Loading: [swift_logstreamer] 2016/01/31 17:28:14 Loading: [system_logstreamer] 2016/01/31 17:28:14 Loading: [afd_api_backends_filter] 2016/01/31 17:28:14 Loading: [afd_api_endpoints_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_cpu_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_log-fs_filter] 2016/01/31 17:28:14 Loading: [afd_node_controller_root-fs_filter] 2016/01/31 17:28:14 Loading: [afd_node_mysql-nodes_mysql-fs_filter] 2016/01/31 17:28:14 Loading: [afd_service_apache_worker_filter] 2016/01/31 17:28:14 Loading: [afd_service_cinder-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_glance-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_heat-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_keystone-admin-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_keystone-public-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_mysql_node-status_filter] 2016/01/31 17:28:14 Loading: [afd_service_neutron-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_nova-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_disk_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_memory_filter] 2016/01/31 17:28:14 Loading: [afd_service_rabbitmq_queue_filter] 2016/01/31 17:28:14 Loading: [afd_service_swift-api_http_errors_filter] 2016/01/31 17:28:14 Loading: [afd_workers_filter] 2016/01/31 17:28:14 Loading: [gse_global_filter] 2016/01/31 17:28:14 Loading: [gse_node_filter] 2016/01/31 17:28:14 Loading: [gse_service_filter] 2016/01/31 17:28:14 Loading: [heka_monitoring_filter] 2016/01/31 17:28:14 Loading: [http_metrics_filter] 2016/01/31 17:28:14 Loading: [influxdb_accumulator_filter] 2016/01/31 17:28:14 Loading: [influxdb_annotation_filter] 2016/01/31 17:28:14 Loading: [instance_state_filter] 2016/01/31 17:28:14 Loading: [resource_creation_time_filter] 2016/01/31 17:28:14 Loading: [service_heartbeat_filter] 2016/01/31 17:28:14 Loading: [output_file] 2016/01/31 17:28:14 Loading: [output_file] 2016/01/31 17:28:14 Loading: [aggregator_tcpoutput] 2016/01/31 17:28:14 Loading: [DashboardOutput] 2016/01/31 17:28:14 Loading: [elasticsearch_output] 2016/01/31 17:28:14 Loading: [influxdb_output] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_output] 2016/01/31 17:28:14 Loading: [nagios_afd_nodes_output_file] 2016/01/31 17:28:14 Loading: [nagios_gse_global_clusters_output] 2016/01/31 17:28:14 Loading: [nagios_gse_node_clusters_output] 2016/01/31 17:28:14 Starting hekad... 2016/01/31 17:28:14 Output started: output_file 2016/01/31 17:28:14 Output started: aggregator_tcpoutput 2016/01/31 17:28:14 Output started: DashboardOutput 2016/01/31 17:28:14 Output started: influxdb_output 2016/01/31 17:28:14 Output started: nagios_afd_nodes_output 2016/01/31 17:28:14 Output started: elasticsearch_output 2016/01/31 17:28:14 Output started: nagios_afd_nodes_output_file 2016/01/31 17:28:14 Output started: nagios_gse_global_clusters_output 2016/01/31 17:28:14 Output started: nagios_gse_node_clusters_output 2016/01/31 17:28:14 Filter started: afd_api_endpoints_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_root-fs_filter 2016/01/31 17:28:14 Filter started: afd_service_cinder-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_keystone-admin-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_mysql_node-status_filter 2016/01/31 17:28:14 Filter started: afd_service_nova-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_memory_filter 2016/01/31 17:28:14 Filter started: gse_service_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_disk_filter 2016/01/31 17:28:14 Filter started: influxdb_annotation_filter 2016/01/31 17:28:14 Filter started: influxdb_accumulator_filter 2016/01/31 17:28:14 Filter started: afd_api_backends_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_cpu_filter 2016/01/31 17:28:14 Filter started: afd_service_swift-api_http_errors_filter 2016/01/31 17:28:14 Filter started: service_heartbeat_filter 2016/01/31 17:28:14 Filter started: afd_service_apache_worker_filter 2016/01/31 17:28:14 Filter started: afd_service_heat-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_neutron-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_rabbitmq_queue_filter 2016/01/31 17:28:14 Filter started: http_metrics_filter 2016/01/31 17:28:14 Filter started: afd_workers_filter 2016/01/31 17:28:14 Filter started: heka_monitoring_filter 2016/01/31 17:28:14 Filter started: instance_state_filter 2016/01/31 17:28:14 Filter started: resource_creation_time_filter 2016/01/31 17:28:14 Filter started: afd_node_controller_log-fs_filter 2016/01/31 17:28:14 Filter started: afd_node_mysql-nodes_mysql-fs_filter 2016/01/31 17:28:14 Filter started: gse_node_filter 2016/01/31 17:28:14 Filter started: afd_service_glance-api_http_errors_filter 2016/01/31 17:28:14 Filter started: afd_service_keystone-public-api_http_errors_filter 2016/01/31 17:28:14 Filter started: gse_global_filter 2016/01/31 17:28:14 MessageRouter started. 2016/01/31 17:28:14 Input started: test_input 2016/01/31 17:28:14 Input started: openstack_7_0_logstreamer 2016/01/31 17:28:14 Input started: pacemaker_logstreamer 2016/01/31 17:28:14 Input started: rabbitmq_logstreamer ... <SKIP> ...

So heka was successfully started and you can see debug messages.
To detach from screen use ctr-a d shortcut key combination.