Telegraf reloads every 5 minutes, causing interruptions in sending data to Prometheus
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Telegraf Charm | Fix Released | Undecided | Unassigned |
Bug Description
As of stable/20.08, charm-telegraf is causing telegraf to be reloaded on every firing of the update-status hook. The timing of the service reloads appears to be correlated with where we're seeing gaps in data collected by Prometheus.
Further details:
* The underlying bug is in reactive/
* Changing "is_data_changed" to "data_changed" should fix this issue. Both helpers return the same value, but data_changed() also writes the passed-in value to .unit-state.db, so each subsequent call compares against the value recorded by the previous call.
* The underlying issue has existed for years, but was more benign. A service_reload() call was added in stable/20.08, which is why the issue first became visible in that release. I believe adding that call was completely appropriate; however, because of the underlying bug mentioned above, the reload is happening every 5 minutes instead of only when necessary.
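
To illustrate the difference between the two helpers, here is a minimal sketch. It does not use the real charm helpers: `is_data_changed` and `data_changed` below are simplified stand-ins written for this example, the `_store` dict is a stand-in for .unit-state.db, and the `telegraf.conf` key and the config dict are hypothetical. It only shows why a compare-only check reports "changed" on every hook run when nothing ever records the value.

```python
import hashlib
import json

# Stand-in for .unit-state.db (assumption: the real store is persistent).
_store = {}


def _digest(data):
    """Hash the data so only a fingerprint needs to be stored."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def is_data_changed(data_id, data):
    """Compare only: never records the new value."""
    return _store.get(data_id) != _digest(data)


def data_changed(data_id, data):
    """Compare, then record the new value for the next call."""
    new = _digest(data)
    changed = _store.get(data_id) != new
    _store[data_id] = new
    return changed


def count_reloads(check, hook_runs=3):
    """Count how many update-status firings would trigger a reload."""
    _store.clear()
    config = {"interval": "10s"}  # hypothetical config that never changes
    return sum(1 for _ in range(hook_runs) if check("telegraf.conf", config))


reloads_buggy = count_reloads(is_data_changed)  # reload on every hook run
reloads_fixed = count_reloads(data_changed)     # reload on the first run only
```

With an unchanged config, the compare-only check asks for a reload on all three simulated hook runs, while the recording variant asks for one on the first run only.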
Related branches
- Alvaro Uria (community): Approve
- Haw Loeung: Approve (+1)
- Edin S (community): Approve
- Diff: 13 lines (+1/-1), 1 file modified: src/reactive/telegraf.py (+1/-1)
Changed in charm-telegraf:
status: New → Fix Released