Two corner cases related to css lifecycles were hanging around [2][3].
Since they're loosely related, I'm sending them together.
The 2nd patch fixes problems that were encountered only in syzbot tests.
Alternative solutions could be:
- daisy-chain css_release_work_fn from the offending css_killed_work_fn call,
- rework kill_css() not to rely on multi-stage css_killed_work_fn() [1].
The simplest approach was chosen.
The other existing users of percpu_ref_kill_and_confirm are not affected by
similar issues.
[1] The rough idea is to synchronize only via a completion, as e.g.
nvmet_sq_destroy() does, and to move most of css_killed_work_fn() to the end
of kill_css(). kill_css() is only used in process context, when de-configuring
controllers or rmdiring a cgroup.
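To make the idea in [1] more concrete, here is a rough userspace model of the
completion-based variant. It is only a sketch and not kernel code: the
pthread-based struct completion merely stands in for the kernel's, a helper
thread plays the role of the asynchronous confirm callback passed to
percpu_ref_kill_and_confirm(), and struct css_model, confirm_kill() and
kill_css_model() are hypothetical names made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical, heavily reduced model of a css. */
struct css_model {
	struct completion killed;	/* signalled by the confirm callback */
	bool offlined;			/* set once offlining has run */
};

/* Models the confirm callback, which runs asynchronously to the killer. */
static void *confirm_kill(void *arg)
{
	struct css_model *css = arg;

	complete(&css->killed);
	return NULL;
}

/*
 * Models kill_css() in process context: start the kill, sleep until it is
 * confirmed, then do the offlining inline instead of in a second work item.
 */
static int kill_css_model(struct css_model *css)
{
	pthread_t confirmer;

	init_completion(&css->killed);
	css->offlined = false;

	if (pthread_create(&confirmer, NULL, confirm_kill, css))
		return -1;

	wait_for_completion(&css->killed);
	css->offlined = true;	/* stands in for most of css_killed_work_fn() */

	pthread_join(confirmer, NULL);
	return 0;
}
```

Since the caller may sleep in wait_for_completion(), this shape only works
because kill_css() is called from process context, as noted above.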
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/
Michal Koutný (2):
  cgroup: Wait for cgroup_subsys_state offlining on unmount
  cgroup: Use separate work structs on css release path

 include/linux/cgroup-defs.h |  5 +++--
 kernel/cgroup/cgroup.c      | 19 +++++++++++--------
 2 files changed, 14 insertions(+), 10 deletions(-)
--
2.35.3