Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp5799ybl; Thu, 12 Dec 2019 13:03:56 -0800 (PST) X-Google-Smtp-Source: APXvYqwjhz1n/SXnXNoQaDWszeXjkIxbsXiydxvHaVfFT/9bKxMaNryHDog5noEu2T/2puA6LZf4 X-Received: by 2002:a9d:4706:: with SMTP id a6mr9663570otf.331.1576184636105; Thu, 12 Dec 2019 13:03:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1576184636; cv=none; d=google.com; s=arc-20160816; b=D/nUEqFlU+U6Zcn6Jt1meOTOonlOEX+wFe5bel5ZweFFOji1TorAcpVwlVYEM+8Z4S gqrAg/mE15MeaH+RXNvgzJiEPXak0djYawHZ1ayLAEPQwOILg3K+fHVYbpyfURoDeI/E z3pvAWC4ojYh8BWAHZO5aG3/S2GZ6gg49egW2GqgHSneGa+dgIpaVXdbNjtS+/+/WANR 3OgpcWjoVkQ2X+XJT87W9pyjAdOQHCzXHg/mIztgDXyObEROn0OphnsM/niarBxEYTzp 0qgIDO7WpNGnb6+O5Qt8oAypoVYtQwPkueJ5jGMfBUJ/kZuaXV3SfJVLf4/FRYpwkhoV jWPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:to:subject:dkim-signature; bh=K/xmP+Vw2t1eL6SnJM6qLcIwF2tU6JDfLGMm9tBVuHw=; b=QMlM7r456jgJ28oRobLsc0300MsX5ACD76uVjFuwwHzaK4rG4Bp2LTTyqnMoydFfYr XfKuV+IqP6XyKUontXdZRtD+O/SWbickSebmrPDL5pG6r1XFHsOV9HDCFzgp481qYBSj Ko8/u7tXwcaWKH2c4Tw1ZCkNL97/HnqnxeqmCSXk+YOPOlvu6IQrs1ARxX6227/bsA54 e2m551eMYh02jd+2j+hcHx4/fF+mYi6tI24krNTLoF4ON4Ky6czRYXM6HUmw6DRDys7a lH76aRAqBWMfhDiHFzD555bnM0EEh3GvMOUMxhDExiRXKKr2keTUOVNegbhGFyfJ1Kq9 oymg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ov0pRfT0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q124si3644741oig.228.2019.12.12.13.03.29; Thu, 12 Dec 2019 13:03:56 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ov0pRfT0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731002AbfLLVBK (ORCPT + 99 others); Thu, 12 Dec 2019 16:01:10 -0500 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:21378 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730779AbfLLVBJ (ORCPT ); Thu, 12 Dec 2019 16:01:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1576184469; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=K/xmP+Vw2t1eL6SnJM6qLcIwF2tU6JDfLGMm9tBVuHw=; b=Ov0pRfT0QN4+Ypbp6RFbMBZdt+9CLXkDyLkckootafS17GnsigCmpvRe9MzPsc82Oso00c M5k0Q0eUhsgXdb6uRNaKVGbC79SKbbGLUFnQGtFyAGwi40i97A/bENrCRLdm570XgKqudC ua9gBkBgdXSZVaH8aAmpB0BUeVymdAs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-229-MzWPlqBOPM2DK_2Zu9ivkA-1; Thu, 12 Dec 2019 16:01:05 -0500 X-MC-Unique: MzWPlqBOPM2DK_2Zu9ivkA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 43A3618FE892; Thu, 12 Dec 2019 21:01:04 +0000 (UTC) Received: from llong.remote.csb (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7823B51154; Thu, 12 Dec 2019 21:01:03 +0000 (UTC) Subject: Re: [PATCH v2] mm/hugetlb: defer free_huge_page() to a workqueue To: Andrew Morton , Mike Kravetz , Matthew Wilcox , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko , aneesh.kumar@linux.ibm.com References: <20191211194615.18502-1-longman@redhat.com> <4fbc39a9-2c9c-4c2c-2b13-a548afe6083c@oracle.com> <32d2d4f2-83b9-2e40-05e2-71cd07e01b80@redhat.com> <0fcce71f-bc20-0ea3-b075-46592c8d533d@oracle.com> <20191212060650.ftqq27ftutxpc5hq@linux-p48b> <20191212063050.ufrpij6s6jkv7g7j@linux-p48b> <20191212190427.ouyohviijf5inhur@linux-p48b> From: Waiman Long Organization: Red Hat Message-ID: <295a82ae-a575-b6a0-ae89-3196fea45b9f@redhat.com> Date: Thu, 12 Dec 2019 16:01:03 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <20191212190427.ouyohviijf5inhur@linux-p48b> Content-Type: text/plain; charset=windows-1252 Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/12/19 2:04 PM, Davidlohr Bueso wrote: > There have been deadlock reports[1, 2] where put_page is called > from softirq context and this causes trouble with the hugetlb_lock, > as well as potentially the subpool lock. > > For such an unlikely scenario, lets not add irq dancing overhead > to the lock+unlock operations, which could incur in expensive > instruction dependencies, particularly when considering hard-irq > safety. For example PUSHF+POPF on x86. > > Instead, just use a workqueue and do the free_huge_page() in regular > task context. > > [1] > https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ > [2] > https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.= ibm.com/ > > Reported-by: Waiman Long > Reported-by: Aneesh Kumar K.V > Signed-off-by: Davidlohr Bueso > --- > > - Changes from v1: Only use wq when in_interrupt(), otherwise business > =A0=A0 as usual. Also include the proper header file. > > - While I have not reproduced this issue, the v1 using wq passes all > hugetlb > =A0=A0 related tests in ltp. > > mm/hugetlb.c | 45 ++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 44 insertions(+), 1 deletion(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index ac65bb5e38ac..f28cf601938d 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1136,7 +1137,13 @@ static inline void > ClearPageHugeTemporary(struct page *page) > =A0=A0=A0=A0page[2].mapping =3D NULL; > } > > -void free_huge_page(struct page *page) > +static struct workqueue_struct *hugetlb_free_page_wq; > +struct hugetlb_free_page_work { > +=A0=A0=A0 struct page *page; > +=A0=A0=A0 struct work_struct work; > +}; > + > +static void __free_huge_page(struct page *page) > { > =A0=A0=A0=A0/* > =A0=A0=A0=A0 * Can't pass hstate in here because it is called from the > @@ -1199,6 +1206,36 @@ void free_huge_page(struct page *page) > =A0=A0=A0=A0spin_unlock(&hugetlb_lock); > } > > +static void free_huge_page_workfn(struct work_struct *work) > +{ > +=A0=A0=A0 struct page *page; > + > +=A0=A0=A0 page =3D container_of(work, struct hugetlb_free_page_work, > work)->page; > +=A0=A0=A0 __free_huge_page(page); > +} > + > +void free_huge_page(struct page *page) > +{ > +=A0=A0=A0 if (unlikely(in_interrupt())) { in_interrupt() also include context where softIRQ is disabled. So maybe !in_task() is a better fit here. > +=A0=A0=A0=A0=A0=A0=A0 /* > +=A0=A0=A0=A0=A0=A0=A0=A0 * While uncommon, free_huge_page() can be at = least > +=A0=A0=A0=A0=A0=A0=A0=A0 * called from softirq context, defer freeing = such > +=A0=A0=A0=A0=A0=A0=A0=A0 * that the hugetlb_lock and spool->lock need = not have > +=A0=A0=A0=A0=A0=A0=A0=A0 * to deal with irq dances just for this. > +=A0=A0=A0=A0=A0=A0=A0=A0 */ > +=A0=A0=A0=A0=A0=A0=A0 struct hugetlb_free_page_work work; > + > +=A0=A0=A0=A0=A0=A0=A0 work.page =3D page; > +=A0=A0=A0=A0=A0=A0=A0 INIT_WORK_ONSTACK(&work.work, free_huge_page_wor= kfn); > +=A0=A0=A0=A0=A0=A0=A0 queue_work(hugetlb_free_page_wq, &work.work); > + > +=A0=A0=A0=A0=A0=A0=A0 /* wait until the huge page freeing is done */ > +=A0=A0=A0=A0=A0=A0=A0 flush_work(&work.work); > +=A0=A0=A0=A0=A0=A0=A0 destroy_work_on_stack(&work.work); The problem I see is that you don't want to wait too long while in the hardirq context. However, the latency for the work to finish is indeterminate. Cheers, Longman