Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp2722169ybt; Tue, 16 Jun 2020 13:21:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwggaXhBXVo7l9yNL2L55E8RtG3Yn3SOAElr60SVFx3vpLLuI+A9ZfZDssNFKlw/JXVdvn1 X-Received: by 2002:a17:906:f203:: with SMTP id gt3mr4663514ejb.77.1592338879407; Tue, 16 Jun 2020 13:21:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592338879; cv=none; d=google.com; s=arc-20160816; b=On6sc+I0fub7Uu7RI+C8vAhuiB2CHwMpnNBVwgu+qWJqfs8BJ0NpBUoWupzwNSJ2C7 Vt3b7kwgmubIYhvaxYhjyb7QE5tYSuJrt5vSmi9AFsnZsPHnoP+PU5hhCUjkPN/VtSa4 iD/5mE2snVWkMjRYJlp4WrGXwVZCz48RwCAhJRFpYlFOSQuPKSDmAjkbP3xjCHN7N2/1 pzh9LPMf2CgTV7b2po9f4uKS6wY0WeixyzVH19ZJcTT2/p7QoKFcne2MOn66c4eotvMB Ekx5AeLTVGxwEvXOTkKUDxMC1fO8P3juh/jOQi5ZNztkaB2bSwFd0QFCnjV8czvTaVFE CxVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=5OcMZK0pIGG+xHLzfApAVvfMiSRe3c77ecNoi1ZxYsc=; b=OMrnBRiPZW4EYFixJVpc1plI4hfFvhs6sDABalhCk0EKELC76OxpXVSeqlTl1FbTsH diHwmOyTqbTum3GHDxBAQpgi2q0ygcpWDYyLacMWH35PDQFmpOGcLdV29r+6hFrwZqK0 T7nPaSFHsv9cMsJignw0vzdzyLrmZ3glgI0cv0wj/12YmEp8sf+M0UZR3mZWm1SNy1qp NNlGSrfckHyPu1LWFUAilEKJ34cZMn4yAP3yYI5di4rhFgLeVNE9iQ2DexSslDv5Gywo 7GONLfuiXNvXSQSldPAnnaAI5+mCnI8kUau/tEEIDPG8GO/0LyGAAadVzqnBVsuPGt1I nJAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=acRvEnLA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a7si11445387ejx.184.2020.06.16.13.20.56; Tue, 16 Jun 2020 13:21:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=acRvEnLA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731463AbgFPUS6 (ORCPT + 99 others); Tue, 16 Jun 2020 16:18:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730725AbgFPUS5 (ORCPT ); Tue, 16 Jun 2020 16:18:57 -0400 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B6A2C061573 for ; Tue, 16 Jun 2020 13:18:57 -0700 (PDT) Received: by mail-ot1-x344.google.com with SMTP id g5so16990894otg.6 for ; Tue, 16 Jun 2020 13:18:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=5OcMZK0pIGG+xHLzfApAVvfMiSRe3c77ecNoi1ZxYsc=; b=acRvEnLAKUl45yhbSb5Acwnee3bW4VJQVY1BLJ2hsepfm6/dj+FoLtbkSd4s+dJnhB jpAs08a6XpEPDpdSekXGPRNJ0uNyM+wFVUwEasiIdzcG5hp01rqI/aq4Irorf2QqwI7u wemXPg7vMZvPip+snIpyPE/zinPNWeh18ch96GYLSaVxD4MzcM3oCnQVEy4Eil2mpqxQ WgZSNnS0kd0HzZnPTt5svSXHbOFyjW1M41wT+dRee2fI9dtx71rduUQB00Q0VRQ2udZG PeECt8Bw+dfaxAJmAXRclNQVSalnP5Lkb3H3DtBEDIezd8jOdg2T72b6uGYLdxSEAC07 FBiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=5OcMZK0pIGG+xHLzfApAVvfMiSRe3c77ecNoi1ZxYsc=; b=PTZdxN+BpONkvNGUewjUTNlTUrOkOZFiVpiss5ZwaEmiEMrWyAbotbEUOD53HF6Yoo 5ZlTxGVFHnnSN1TuWEyuRQJz80E5rAphGMvaYPk1aPTkF/7ChY1OLI/QanWWN0LJPQeQ 9G3YIdUN/jgyKKtu8aFauyXh7ZUPh38uBcnB9fRH0GrlymKpJVYh3vYylSvDyQHwAuqH W67TPRd6wKdZXZ6GwW8JF6+D8j3y4Q5kduvMXGpeohPo3+s/ScwW+KbONfX5Iia07jO0 zHvkAd2Uc8CCZHYM3VbpwsSJoSXluZ2C4w7wFAIKsW4MffXDGTcAR3urMWejfXqnSrYN W/zA== X-Gm-Message-State: AOAM531L9OQwu7ioDncvkvnr2D91+aFkvHD2Nk11KVnqq0udWQWSgSKk zDASZDI+ud8sq8Wi0ni+GW/e2A== X-Received: by 2002:a05:6830:4a2:: with SMTP id l2mr3812104otd.10.1592338735936; Tue, 16 Jun 2020 13:18:55 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id 94sm4268169otb.47.2020.06.16.13.18.54 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Tue, 16 Jun 2020 13:18:54 -0700 (PDT) Date: Tue, 16 Jun 2020 13:18:40 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Vlastimil Babka cc: akpm@linux-foundation.org, alex.shi@linux.alibaba.com, hughd@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, liwang@redhat.com, mgorman@techsingularity.net, stable@vger.kernel.org Subject: Re: [PATCH 1/2] mm, compaction: make capture control handling safe wrt interrupts In-Reply-To: <20200616082649.27173-1-vbabka@suse.cz> Message-ID: References: <20200616082649.27173-1-vbabka@suse.cz> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 16 Jun 2020, Vlastimil Babka wrote: > Hugh reports: > > ===== > While stressing compaction, one run oopsed on NULL capc->cc in > __free_one_page()'s task_capc(zone): compact_zone_order() had been > interrupted, and a page was being freed in the return from interrupt. > > Though you would not expect it from the source, both gccs I was using > (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with > the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before > callq compact_zone - long after the "current->capture_control = &capc". > An interrupt in between those finds capc->cc NULL (zeroed by an earlier > rep stos). > > This could presumably be fixed by a barrier() before setting > current->capture_control in compact_zone_order(); but would also need > more care on return from compact_zone(), in order not to risk leaking > a page captured by interrupt just before capture_control is reset. > > Maybe that is the preferable fix, but I felt safer for task_capc() to > exclude the rather surprising possibility of capture at interrupt time. > ===== > > I have checked that gcc10 also behaves the same. > > The advantage of fix in compact_zone_order() is that we don't add another > test in the page freeing hot path, and that it might prevent future problems > if we stop exposing pointers to unitialized structures in current task. > > So this patch implements the suggestion for compact_zone_order() with barrier() > (and WRITE_ONCE() to prevent store tearing) for setting > current->capture_control, and prevents page leaking with WRITE_ONCE/READ_ONCE > in the proper order. > > Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction") > Cc: stable@vger.kernel.org # 5.1+ > Reported-by: Hugh Dickins > Suggested-by: Hugh Dickins > Signed-off-by: Vlastimil Babka Acked-by: Hugh Dickins > --- > mm/compaction.c | 17 ++++++++++++++--- > 1 file changed, 14 insertions(+), 3 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index fd988b7e5f2b..86375605faa9 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -2316,15 +2316,26 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, > .page = NULL, > }; > > - current->capture_control = &capc; > + /* > + * Make sure the structs are really initialized before we expose the > + * capture control, in case we are interrupted and the interrupt handler > + * frees a page. > + */ > + barrier(); > + WRITE_ONCE(current->capture_control, &capc); > > ret = compact_zone(&cc, &capc); > > VM_BUG_ON(!list_empty(&cc.freepages)); > VM_BUG_ON(!list_empty(&cc.migratepages)); > > - *capture = capc.page; > - current->capture_control = NULL; > + /* > + * Make sure we hide capture control first before we read the captured > + * page pointer, otherwise an interrupt could free and capture a page > + * and we would leak it. > + */ > + WRITE_ONCE(current->capture_control, NULL); > + *capture = READ_ONCE(capc.page); > > return ret; > } > -- > 2.27.0