Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp5334673imw; Wed, 20 Jul 2022 03:50:03 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vSsDIXaRiDKLr4SMXpP/Ugmvp8gOB/4yhx6mj1bkHb7kOyVhliJAvKfUxASJYq7nh1Jii0 X-Received: by 2002:a05:6402:540c:b0:434:d965:f8a with SMTP id ev12-20020a056402540c00b00434d9650f8amr50873389edb.30.1658314203662; Wed, 20 Jul 2022 03:50:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658314203; cv=none; d=google.com; s=arc-20160816; b=XNNbrq2FW45RBQonN+xvK3JeKk+kXxkdfrXHiLvKFntli+Jqq6Vw/GNqZgvevH1sT/ 6Kcp8maLfmv5s1KWS7ToR4QQSK0BsSY0InqZ1u+GOflxnRmGDfpfLQDBo2fpBHec1BZJ rS29M3HgrnYFuSUdny4nJug1Z/SCQjdDYi/UcasT/hoH4javR2BALgf7G8Kikn5rgrW+ 0taKdAViqRQESnmNJsJqOCHZPtnltJUy6wu6oHsBYENM8MpMwGD19e2/LTzkSRPkKgUB ughlH6zwTZNpC2TfprQOyQsEQbLo+eG59b1f1O3ScV7y02Tl3DqUHkCGV7rnbHzVStE1 Xvzw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=bXOtkSkNPN/B9tmbWziX3PgrKXqM7+mLUw/3t2EA2Ug=; b=sGLxGDul/5hIfla9269KoYDkfhLFak8NvSZ+IffSE7+w+EusZI/QUZWEmO54zsb8Mp GxVZvhoKJANabICQlPqs5nt6rtWiRnzqurUL3Te7YjR1vUJW+lCKO1O2EoN6gfxbSA/I DTPlcRHa0G6wXKPRRIuxDDiuiNECQh8xTWwJxa4O0xXEVf0GnQ4uyr/2IFFBPW9+fCqE WC7GlTSTc38BWTIBBvTv8YMdBmyQg8g6VG2/qzca/ZSiCtAUnpDRFtkmmH7WUwfyZcCG 5bdyfR1LdmcXK+nDh1gbJlM1plOcKbvL6iSL/mStmP8WwDDao1lJ++tPqR1X9Kg2N3go rFtw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcdkim header.b=H7j0togz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hc38-20020a17090716a600b007269f867290si2411315ejc.1008.2022.07.20.03.49.38; Wed, 20 Jul 2022 03:50:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcdkim header.b=H7j0togz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231852AbiGTKnc (ORCPT + 99 others); Wed, 20 Jul 2022 06:43:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230032AbiGTKna (ORCPT ); Wed, 20 Jul 2022 06:43:30 -0400 Received: from alexa-out-sd-01.qualcomm.com (alexa-out-sd-01.qualcomm.com [199.106.114.38]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D891066AF1 for ; Wed, 20 Jul 2022 03:43:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; i=@quicinc.com; q=dns/txt; s=qcdkim; t=1658313808; x=1689849808; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=bXOtkSkNPN/B9tmbWziX3PgrKXqM7+mLUw/3t2EA2Ug=; b=H7j0togztSW7QGAvIqWRWS/BNQgBU9r6fomhL461kx8u75HyEQLAvqAp +T38oaiy+7DcwEYWXfwLIeXmF01+a+rg1Bm6HRtWtIJS2yECR+LADSZ/+ jkEF5yzaR/ME1Veaq6eixFyaB8hkpY5roE2Ku9SByDTnwRu16pUnK2OY0 U=; Received: from unknown (HELO ironmsg02-sd.qualcomm.com) ([10.53.140.142]) by alexa-out-sd-01.qualcomm.com with ESMTP; 20 Jul 2022 03:43:28 -0700 X-QCInternal: smtphost Received: from nasanex01c.na.qualcomm.com ([10.47.97.222]) by ironmsg02-sd.qualcomm.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2022 03:43:28 -0700 Received: from nalasex01a.na.qualcomm.com (10.47.209.196) by nasanex01c.na.qualcomm.com (10.47.97.222) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 20 Jul 2022 03:43:27 -0700 Received: from [10.216.34.46] (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 20 Jul 2022 03:43:22 -0700 Message-ID: Date: Wed, 20 Jul 2022 16:13:19 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH] mm: fix use-after free of page_ext after race with memory-offline Content-Language: en-US To: Michal Hocko , Pavan Kondeti CC: , , , , , , , , , , , , "iamjoonsoo.kim@lge.com" References: <1657810063-28938-1-git-send-email-quic_charante@quicinc.com> <20220720082112.GA14437@hu-pkondeti-hyd.qualcomm.com> From: Charan Teja Kalla In-Reply-To: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To nalasex01a.na.qualcomm.com (10.47.209.196) X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thanks Michal & Pavan, On 7/20/2022 2:40 PM, Michal Hocko wrote: >>>> Thanks! The most imporant part is how the exclusion is actual achieved >>>> because that is not really clear at first sight >>>> >>>> CPU1 CPU2 >>>> lookup_page_ext(PageA) offlining >>>> offline_page_ext >>>> __free_page_ext(addrA) >>>> get_entry(addrA) >>>> ms->page_ext = NULL >>>> synchronize_rcu() >>>> free_page_ext >>>> free_pages_exact (now addrA is unusable) >>>> >>>> rcu_read_lock() >>>> entryA = get_entry(addrA) >>>> base + page_ext_size * index # an address not invalidated by the freeing path >>>> do_something(entryA) >>>> rcu_read_unlock() >>>> >>>> CPU1 never checks ms->page_ext so it cannot bail out early when the >>>> thing is torn down. Or maybe I am missing something. I am not familiar >>>> with page_ext much. >>> >>> Thanks a lot for catching this Michal. You are correct that the proposed >>> code from me is still racy. I Will correct this along with the proper >>> commit message in the next version of this patch. >>> >> Trying to understand your discussion with Michal. What part is still racy? We >> do check for mem_section::page_ext and bail out early from lookup_page_ext(), >> no? >> >> Also to make this scheme explicit, we can annotate page_ext member with __rcu >> and use rcu_assign_pointer() on the writer side. Annotating with __rcu requires all the read and writes to ms->page_ext to be under rcu_[access|assign]_pointer which is a big patch. I think READ_ONCE and WRITE_ONCE, mentioned by Michal, below should does the job. >> >> struct page_ext *lookup_page_ext(const struct page *page) >> { >> unsigned long pfn = page_to_pfn(page); >> struct mem_section *section = __pfn_to_section(pfn); >> /* >> * The sanity checks the page allocator does upon freeing a >> * page can reach here before the page_ext arrays are >> * allocated when feeding a range of pages to the allocator >> * for the first time during bootup or memory hotplug. >> */ >> if (!section->page_ext) >> return NULL; >> return get_entry(section->page_ext, pfn); >> } > You are right. I was looking at the wrong implementation and misread > ifdef vs. ifndef CONFIG_SPARSEMEM. My bad. > There is still a small race window b/n ms->page_ext setting NULL and its access even under CONFIG_SPARSEMEM. In the above mentioned example: CPU1 CPU2 rcu_read_lock() lookup_page_ext(PageA): offlining offline_page_ext __free_page_ext(addrA) get_entry(addrA) if (!section->page_ext) turns to be false. ms->page_ext = NULL addrA = get_entry(base=section->page_ext): base + page_ext_size * index; **Since base is NULL here, caller can still do the dereference on the invalid pointer address.** synchronize_rcu() free_page_ext free_pages_exact (now ) > Memory hotplug is not supported outside of CONFIG_SPARSEMEM so the > scheme should really work. I would use READ_ONCE for ms->page_ext and > WRITE_ONCE on the initialization side. Yes, I should be using the READ_ONCE() and WRITE_ONCE() here. Thanks, Charan