Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp72193iob; Mon, 2 May 2022 13:23:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyaXCGRCAwztdANC6Q9BXlqmkRHbq681kOmEEIJjEaAWf21mvsrvrFKIpeRDbqt8yOArdGv X-Received: by 2002:a17:90b:4f91:b0:1cd:3a73:3a5d with SMTP id qe17-20020a17090b4f9100b001cd3a733a5dmr973063pjb.98.1651522986743; Mon, 02 May 2022 13:23:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651522986; cv=none; d=google.com; s=arc-20160816; b=TXDPAzDxtJBvFMY55xFO1QQa8XpYiQM+zYzD4VPsGy/pQ7QDQ+hiwWBck94HVhq18n 3BZJ4wYwMTPmug919Apg8frIiFSIYdg6D1KRSjY34IzXQX7r2o/USr3CA7wDiztoy/82 C1WgjZz3VxNf00Pjl3jeDu0xo78bexy9t0Vj4IO8hyyA/2jNBjmbOCm7fmD+iP8nnTpi YBu4oBkzaZvDij5bwDtM7baXckxv8T9XGBNELD4DhNrae5HonrleqIZ8E9IVHgj9Uv2m 9IUfpRTkFa/TjBzQ10ZjF97vuhA/MxL7yoinKbInn6OIYY/K/lew7O/oUeyHw85G3KgV MEhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=gOJ0rjADphs8szk5j8Hf+39lABKqOMQs2XLjvoBK8jY=; b=Sb3ZdDB5KfEOJGuDwOdD0MCkkuIKOkNUoPB4Rfh3CzIgJXK2a/R2UIw4ARt+nI27Pt iYPPAxL+xIkk8/0QCEUwhA5RpmNIdMiKYfBwp2ahqjgx95yP3c3bo+/GtdoJCOGKol4j 0eKgj4aQDB8I3SELuMAbjbGP24SqvjktqB7D/XP0tGZlvE+L/jtPoonWXPIqkMn8u8jJ 9iK06zAtBmHFGa3wKvUczSxMjmTueFjVUidFKkQDGMuSmKaNg2CPzC7uLSLDwSi0ZR+C h+jltZ7q2STvtaaQte7QcWguRW2C9w7VOmaFDXzEIcPNNh0px3DRPJ2LbNXTwXqpkBTH F9Bg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcdkim header.b=IpQjBPvk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k15-20020a170902c40f00b00153b2d16529si15951627plk.305.2022.05.02.13.22.50; Mon, 02 May 2022 13:23:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcdkim header.b=IpQjBPvk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239709AbiEBPjt (ORCPT + 99 others); Mon, 2 May 2022 11:39:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234055AbiEBPjr (ORCPT ); Mon, 2 May 2022 11:39:47 -0400 Received: from alexa-out-sd-02.qualcomm.com (alexa-out-sd-02.qualcomm.com [199.106.114.39]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43B98BD8 for ; Mon, 2 May 2022 08:36:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; i=@quicinc.com; q=dns/txt; s=qcdkim; t=1651505775; x=1683041775; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=gOJ0rjADphs8szk5j8Hf+39lABKqOMQs2XLjvoBK8jY=; b=IpQjBPvkWwSDU1OQ8dfngpR4O1nsL2Lr4oKI2QSYbvLsUDGWVHlJ/z09 kdNsCnVOG/cAa+bLN+Vhmi6KDCT1iJCaLq6/DcylJDSsWblnEUkyRWVl9 2PElueX8/vxRsXUNyektVrvPuAc60X+qv4MNGyFiU0cZwfKuApO0gQKTf w=; Received: from unknown (HELO ironmsg03-sd.qualcomm.com) ([10.53.140.143]) by alexa-out-sd-02.qualcomm.com with ESMTP; 02 May 2022 08:36:14 -0700 X-QCInternal: smtphost Received: from nasanex01c.na.qualcomm.com ([10.47.97.222]) by ironmsg03-sd.qualcomm.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 May 2022 08:36:14 -0700 Received: from [10.110.26.248] (10.80.80.8) by nasanex01c.na.qualcomm.com (10.47.97.222) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Mon, 2 May 2022 08:36:11 -0700 Message-ID: Date: Mon, 2 May 2022 21:06:08 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v4] devcoredump : Serialize devcd_del work Content-Language: en-US To: CC: , , , , , References: <1651505671-1506-1-git-send-email-quic_mojha@quicinc.com> From: Mukesh Ojha In-Reply-To: <1651505671-1506-1-git-send-email-quic_mojha@quicinc.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To nasanex01c.na.qualcomm.com (10.47.97.222) X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This should be v4, corrected in the subject -Mukesh On 5/2/2022 9:04 PM, Mukesh Ojha wrote: > In following scenario(diagram), when one thread X running dev_coredumpm() > adds devcd device to the framework which sends uevent notification to > userspace and another thread Y reads this uevent and call to > devcd_data_write() which eventually try to delete the queued timer that > is not initialized/queued yet. > > So, debug object reports some warning and in the meantime, timer is > initialized and queued from X path. and from Y path, it gets reinitialized > again and timer->entry.pprev=NULL and try_to_grab_pending() stucks. > > To fix this, introduce mutex and a boolean flag to serialize the behaviour. > > cpu0(X) cpu1(Y) > > dev_coredump() uevent sent to user space > device_add() ======================> user space process Y reads the > uevents writes to devcd fd > which results into writes to > > devcd_data_write() > mod_delayed_work() > try_to_grab_pending() > del_timer() > debug_assert_init() > INIT_DELAYED_WORK() > schedule_delayed_work() > debug_object_fixup() > timer_fixup_assert_init() > timer_setup() > do_init_timer() > /* > Above call reinitializes > the timer to > timer->entry.pprev=NULL > and this will be checked > later in timer_pending() call. > */ > timer_pending() > !hlist_unhashed_lockless(&timer->entry) > !h->pprev > /* > del_timer() checks h->pprev and finds > it to be NULL due to which > try_to_grab_pending() stucks. > */ > > Link: https://lore.kernel.org/lkml/2e1f81e2-428c-f11f-ce92-eb11048cb271@quicinc.com/ > Signed-off-by: Mukesh Ojha > --- > v3->v4: > - flg variable renamed to delete_work. > > v2->v3: > Addressed comments from gregkh > - Wrapped the commit text and corrected the alignment. > - Described the reason to introduce new variables. > - Restored the blank line. > - rename the del_wk_queued to flg. > Addressed comments from tglx > - Added a comment which explains the race which looks obvious however > would not occur between disabled_store and devcd_del work. > > > v1->v2: > - Added del_wk_queued flag to serialize the race between devcd_data_write() > and disabled_store() => devcd_free(). > drivers/base/devcoredump.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 81 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c > index f4d794d..1c06781 100644 > --- a/drivers/base/devcoredump.c > +++ b/drivers/base/devcoredump.c > @@ -25,6 +25,47 @@ struct devcd_entry { > struct device devcd_dev; > void *data; > size_t datalen; > + /* > + * Here, mutex is required to serialize the calls to del_wk work between > + * user/kernel space which happens when devcd is added with device_add() > + * and that sends uevent to user space. User space reads the uevents, > + * and calls to devcd_data_write() which try to modify the work which is > + * not even initialized/queued from devcoredump. > + * > + * > + * > + * cpu0(X) cpu1(Y) > + * > + * dev_coredump() uevent sent to user space > + * device_add() ======================> user space process Y reads the > + * uevents writes to devcd fd > + * which results into writes to > + * > + * devcd_data_write() > + * mod_delayed_work() > + * try_to_grab_pending() > + * del_timer() > + * debug_assert_init() > + * INIT_DELAYED_WORK() > + * schedule_delayed_work() > + * > + * > + * Also, mutex alone would not be enough to avoid scheduling of > + * del_wk work after it get flush from a call to devcd_free() > + * mentioned as below. > + * > + * disabled_store() > + * devcd_free() > + * mutex_lock() devcd_data_write() > + * flush_delayed_work() > + * mutex_unlock() > + * mutex_lock() > + * mod_delayed_work() > + * mutex_unlock() > + * So, delete_work flag is required. > + */ > + struct mutex mutex; > + bool delete_work; > struct module *owner; > ssize_t (*read)(char *buffer, loff_t offset, size_t count, > void *data, size_t datalen); > @@ -84,7 +125,12 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj, > struct device *dev = kobj_to_dev(kobj); > struct devcd_entry *devcd = dev_to_devcd(dev); > > - mod_delayed_work(system_wq, &devcd->del_wk, 0); > + mutex_lock(&devcd->mutex); > + if (!devcd->delete_work) { > + devcd->delete_work = true; > + mod_delayed_work(system_wq, &devcd->del_wk, 0); > + } > + mutex_unlock(&devcd->mutex); > > return count; > } > @@ -112,7 +158,12 @@ static int devcd_free(struct device *dev, void *data) > { > struct devcd_entry *devcd = dev_to_devcd(dev); > > + mutex_lock(&devcd->mutex); > + if (!devcd->delete_work) > + devcd->delete_work = true; > + > flush_delayed_work(&devcd->del_wk); > + mutex_unlock(&devcd->mutex); > return 0; > } > > @@ -122,6 +173,30 @@ static ssize_t disabled_show(struct class *class, struct class_attribute *attr, > return sysfs_emit(buf, "%d\n", devcd_disabled); > } > > +/* > + * > + * disabled_store() worker() > + * class_for_each_device(&devcd_class, > + * NULL, NULL, devcd_free) > + * ... > + * ... > + * while ((dev = class_dev_iter_next(&iter)) > + * devcd_del() > + * device_del() > + * put_device() <- last reference > + * error = fn(dev, data) devcd_dev_release() > + * devcd_free(dev, data) kfree(devcd) > + * mutex_lock(&devcd->mutex); > + * > + * > + * In the above diagram, It looks like disabled_store() would be racing with parallely > + * running devcd_del() and result in memory abort while acquiring devcd->mutex which > + * is called after kfree of devcd memory after dropping its last reference with > + * put_device(). However, this will not happens as fn(dev, data) runs > + * with its own reference to device via klist_node so it is not its last reference. > + * so, above situation would not occur. > + */ > + > static ssize_t disabled_store(struct class *class, struct class_attribute *attr, > const char *buf, size_t count) > { > @@ -278,13 +353,16 @@ void dev_coredumpm(struct device *dev, struct module *owner, > devcd->read = read; > devcd->free = free; > devcd->failing_dev = get_device(dev); > + devcd->delete_work = false; > > + mutex_init(&devcd->mutex); > device_initialize(&devcd->devcd_dev); > > dev_set_name(&devcd->devcd_dev, "devcd%d", > atomic_inc_return(&devcd_count)); > devcd->devcd_dev.class = &devcd_class; > > + mutex_lock(&devcd->mutex); > if (device_add(&devcd->devcd_dev)) > goto put_device; > > @@ -301,10 +379,11 @@ void dev_coredumpm(struct device *dev, struct module *owner, > > INIT_DELAYED_WORK(&devcd->del_wk, devcd_del); > schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT); > - > + mutex_unlock(&devcd->mutex); > return; > put_device: > put_device(&devcd->devcd_dev); > + mutex_unlock(&devcd->mutex); > put_module: > module_put(owner); > free: