Telegraf reloads every 5 minutes, causing interruptions in sending data to Prometheus

Bug #1911815 reported by Paul Goins
Affects: Telegraf Charm
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

As of stable/20.08, charm-telegraf reloads telegraf every time the update-status hook fires. The timing of these service reloads appears to line up with the gaps we are seeing in the data collected by Prometheus.

Further details:
* The underlying bug is in update_status() in reactive/telegraf.py. It calls charms.reactive.helpers.is_data_changed("detect_changes", get_base_inputs(), ...), which only checks the data against the stored value and never updates that value. The data passed in is therefore compared to None every time, so the check always reports a change. This causes configure_telegraf() to be called on every invocation of update_status, which in turn reloads the service on every invocation.
* Changing "is_data_changed" to "data_changed" should fix this issue. The data_changed() helper provides the same return value, but also writes the passed-in value to .unit-state.db, so each subsequent call compares against the data from the previous call.
* The underlying issue has existed for years, but was more benign. In stable/20.08 a service_reload() call was added, which is why the problem first appears in that release. I believe adding that call was completely appropriate; however, because of the underlying bug mentioned above, the reload happens every 5 minutes instead of only when necessary.
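
To illustrate the difference between the two helpers, here is a minimal sketch. It does not use the real charms.reactive code: the in-memory dict `_store` stands in for .unit-state.db, the two functions are simplified re-implementations that only mirror the behaviour described above, and `count_reloads` and the sample inputs are hypothetical names made up for the illustration.

```python
import hashlib
import json

# Stand-in for .unit-state.db (assumption: the real helpers persist a hash
# of the data in the unit's key/value store).
_store = {}


def _digest(data):
    """Hash the data so that only a short fingerprint needs to be kept."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def is_data_changed(data_id, data):
    """Check only: compare against the stored hash, never update it."""
    return _store.get(data_id) != _digest(data)


def data_changed(data_id, data):
    """Check and record: same return value, but the new hash is stored."""
    changed = is_data_changed(data_id, data)
    _store[data_id] = _digest(data)
    return changed


def count_reloads(check, hook_runs=3):
    """Simulate update-status firing hook_runs times with unchanged inputs."""
    _store.clear()
    inputs = {"interval": "10s"}  # hypothetical, never changes between runs
    return sum(1 for _ in range(hook_runs) if check("detect_changes", inputs))


buggy_reloads = count_reloads(is_data_changed)  # reports a change on every run
fixed_reloads = count_reloads(data_changed)     # reports a change on the first run only
print(buggy_reloads, fixed_reloads)
```

With the check-only helper nothing is ever stored, so every simulated hook run sees a change. With the recording helper, only the first run does.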

Xav Paice (xavpaice)
Changed in charm-telegraf:
status: New → Fix Released