From: Suren Baghdasaryan
Date: Mon, 17 May 2021 19:08:59 -0700
Subject: Re: [[RFC]PATCH] psi: fix race between psi_trigger_create and psimon
To: Zhaoyang Huang
Cc: Johannes Weiner, Zhaoyang Huang, Ziwei Dai, Ke Wang, LKML
References: <1621242249-8314-1-git-send-email-huangzhaoyang@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 17, 2021 at 6:47 PM Suren Baghdasaryan wrote:
>
> On Mon, May 17, 2021 at 5:41 PM Zhaoyang Huang wrote:
> >
> > On Tue, May 18, 2021 at 5:30 AM Suren Baghdasaryan wrote:
> > >
> > > On Mon, May 17, 2021 at 12:33 PM Suren Baghdasaryan wrote:
> > > >
> > > > On Mon, May 17, 2021 at 11:36 AM Johannes Weiner wrote:
> > > > >
> > > > > CC Suren
> > > >
> > > > Thanks! When resending the patch, please run scripts/get_maintainer.pl
> > > > against your patch and CC the reported recipients.
> > > >
> > > > > On Mon, May 17, 2021 at 05:04:09PM +0800, Huangzhaoyang wrote:
> > > > > > From: Zhaoyang Huang
> > > > > >
> > > > > > A race was detected between psimon_new and psimon_old, as shown below,
> > > > > > which causes a panic by accessing an invalid
> > > > > > psi_system->poll_wait->wait_queue_entry and
> > > > > > psi_system->poll_timer->entry->next.
> > > > > > It is not necessary to reinit
> > > > > > the resources of psi_system in psi_trigger_create.
> > > >
> > > > The resources of psi_system will not be reinitialized, because
> > > > init_waitqueue_head(&group->poll_wait) and friends are initialized
> > > > only during the creation of the first trigger for that group (see this
> > > > condition: https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c#L1119).
> > > >
> > > > > > psi_trigger_create           psimon_new        psimon_old
> > > > > > init_waitqueue_head                            finish_wait
> > > > > >                                                spin_lock(lock_old)
> > > > > > spin_lock_init(lock_new)
> > > > > > wake_up_process(psimon_new)
> > > > > >
> > > > > >                              finish_wait
> > > > > >                              spin_lock(lock_new)
> > > > > >                              list_del          list_del
> > > >
> > > > Could you please clarify this race a bit? I'm having trouble
> > > > deciphering this diagram. I'm guessing psimon_new/psimon_old refer to
> > > > a new trigger being created while an old one is being deleted, so it
> > > > seems like a race between psi_trigger_create/psi_trigger_destroy. The
> > > > combination of trigger_lock and RCU should be protecting us from that,
> > > > but maybe I missed something?
> > > > I'm excluding the possibility of a race between psi_trigger_create and
> > > > another existing trigger on the same group, because the codepath
> > > > calling init_waitqueue_head(&group->poll_wait) happens only when the
> > > > first trigger for that group is created. Therefore, if there is an
> > > > existing trigger in that group, that codepath will not be taken.
> > >
> > > Ok, looking at the current code, I think you can hit the following race
> > > when psi_trigger_destroy is destroying the last trigger in a psi group
> > > while racing with psi_trigger_create:
> > >
> > > psi_trigger_destroy                  psi_trigger_create
> > > mutex_lock(trigger_lock);
> > > rcu_assign_pointer(poll_task, NULL);
> > > mutex_unlock(trigger_lock);
> > >                                      mutex_lock(trigger_lock);
> > >                                      if (!rcu_access_pointer(group->poll_task)) {
> > >                                          timer_setup(poll_timer, poll_timer_fn, 0);
> > >                                          rcu_assign_pointer(poll_task, task);
> > >                                      }
> > >                                      mutex_unlock(trigger_lock);
> > > synchronize_rcu();
> > > del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by
> > >                                 psi_trigger_create
> > >
> > > So, trigger_lock/RCU correctly protects destruction of
> > > group->poll_task but misses this race affecting poll_timer and
> > > poll_wait.
> > > Let me think if we can handle this without moving initialization into
> > > group_init().
> >
> > Right, this is exactly what we hit during a monkey test on an Android
> > system, where psimon is destroyed/recreated by unreffing/recreating the
> > psi_trigger. IMHO, poll_timer and poll_wait should exist for the whole
> > period.
>
> Ok, understood. I think it should be ok to initialize poll_wait and
> poll_timer at group creation time. It looks like init_waitqueue_head()
> and timer_setup() initialize the fields, but I don't think they allocate
> any additional resources. Johannes pointed out some issues in your
> original patch, so I've made some small modifications (see below).
> del_timer_sync() was important back when we used kthread_worker; now,
> even if the timer fires unnecessarily, it should be harmless after we
> reset group->poll_task.
> So I think a del_timer() in psi_trigger_destroy() should be enough:
>
> @@ -181,6 +181,7 @@ struct psi_group psi_system = {
>  };
>
>  static void psi_avgs_work(struct work_struct *work);
> +static void poll_timer_fn(struct timer_list *t);
>
>  static void group_init(struct psi_group *group)
>  {
> @@ -202,6 +203,8 @@ static void group_init(struct psi_group *group)
>  	group->polling_next_update = ULLONG_MAX;
>  	group->polling_until = 0;
>  	rcu_assign_pointer(group->poll_task, NULL);
> +	init_waitqueue_head(&group->poll_wait);
> +	timer_setup(&group->poll_timer, poll_timer_fn, 0);
>  }
>
>  void __init psi_init(void)
> @@ -1157,9 +1160,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
>  		return ERR_CAST(task);
>  	}
>  	atomic_set(&group->poll_wakeup, 0);
> -	init_waitqueue_head(&group->poll_wait);
>  	wake_up_process(task);
> -	timer_setup(&group->poll_timer, poll_timer_fn, 0);
>  	rcu_assign_pointer(group->poll_task, task);
>  }
>
> @@ -1211,6 +1212,7 @@ static void psi_trigger_destroy(struct kref *ref)
>  			group->poll_task,
>  			lockdep_is_held(&group->trigger_lock));
>  		rcu_assign_pointer(group->poll_task, NULL);
> +		del_timer(&group->poll_timer);
>  	}
>  }
>
> @@ -1230,10 +1232,7 @@ static void psi_trigger_destroy(struct kref *ref)
>  	/*
>  	 * After the RCU grace period has expired, the worker
>  	 * can no longer be found through group->poll_task.
> -	 * But it might have been already scheduled before
> -	 * that - deschedule it cleanly before destroying it.
>  	 */
> -	del_timer_sync(&group->poll_timer);
>  	kthread_stop(task_to_destroy);
>  }
>  kfree(t);
>
> > > > > > Signed-off-by: ziwei.dai
> > > > > > Signed-off-by: ke.wang
> > > > > > Signed-off-by: Zhaoyang Huang
> > > > > > ---
> > > > > >  kernel/sched/psi.c | 6 ++++--
> > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> > > > > > index cc25a3c..d00e585 100644
> > > > > > --- a/kernel/sched/psi.c
> > > > > > +++ b/kernel/sched/psi.c
> > > > > > @@ -182,6 +182,8 @@ struct psi_group psi_system = {
> > > > > >
> > > > > >  static void psi_avgs_work(struct work_struct *work);
> > > > > >
> > > > > > +static void poll_timer_fn(struct timer_list *t);
> > > > > > +
> > > > > >  static void group_init(struct psi_group *group)
> > > > > >  {
> > > > > >  	int cpu;
> > > > > > @@ -201,6 +203,8 @@ static void group_init(struct psi_group *group)
> > > > > >  	memset(group->polling_total, 0, sizeof(group->polling_total));
> > > > > >  	group->polling_next_update = ULLONG_MAX;
> > > > > >  	group->polling_until = 0;
> > > > > > +	init_waitqueue_head(&group->poll_wait);
> > > > > > +	timer_setup(&group->poll_timer, poll_timer_fn, 0);
> > > > >
> > > > > This makes sense.
> > > >
> > > > Well, this means we initialize resources for triggers in each psi
> > > > group even if the user never creates any triggers. The current logic
> > > > initializes them when the first trigger in the group gets created.
> > > >
> > > > > >  	rcu_assign_pointer(group->poll_task, NULL);
> > > > > >  }
> > > > > >
> > > > > > @@ -1157,7 +1161,6 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> > > > > >  		return ERR_CAST(task);
> > > > > >  	}
> > > > > >  	atomic_set(&group->poll_wakeup, 0);
> > > > > > -	init_waitqueue_head(&group->poll_wait);
> > > > > >  	wake_up_process(task);
> > > > > >  	timer_setup(&group->poll_timer, poll_timer_fn, 0);
> > > > >
> > > > > This now looks unnecessary?
> > > > > >  	rcu_assign_pointer(group->poll_task, task);
> > > > > >
> > > > > > @@ -1233,7 +1236,6 @@ static void psi_trigger_destroy(struct kref *ref)
> > > > > >  	 * But it might have been already scheduled before
> > > > > >  	 * that - deschedule it cleanly before destroying it.
> > > > > >  	 */
> > > > > > -	del_timer_sync(&group->poll_timer);
> > > > >
> > > > > And this looks wrong. Did you mean to delete the timer_setup() line
> > > > > instead?
> > > >
> > > > I would like to get more details about this race before trying to fix
> > > > it. Please clarify.
> > > > Thanks!