Date: Tue, 27 Nov 2018 02:39:25 +0000
From: Wei Yang
To: Wengang Wang
Cc: Wei Yang, zhong jiang, Christopher Lameter, penberg@kernel.org,
	David Rientjes, iamjoonsoo.kim@lge.com, Andrew Morton,
	Linux-MM, Linux Kernel Mailing List
Subject: Re: [PATCH] mm: use this_cpu_cmpxchg_double in put_cpu_partial
Message-ID: <20181127023925.xdramcjenpsawfw6@master>
Reply-To: Wei Yang
References: <20181117013335.32220-1-wen.gang.wang@oracle.com>
	<5BF36EE9.9090808@huawei.com>
	<476b5d35-1894-680c-2bd9-b399a3f4d9ed@oracle.com>
	<20181127003638.2oyudcyene6hb6sb@master>
	<6e9efd77-b6af-40f4-56b0-c0572930b3e0@oracle.com>
In-Reply-To: <6e9efd77-b6af-40f4-56b0-c0572930b3e0@oracle.com>
User-Agent: NeoMutt/20170113 (1.7.2)
On Mon, Nov 26, 2018 at 05:42:28PM -0800, Wengang Wang wrote:
>
>
>On 2018/11/26 16:36, Wei Yang wrote:
>> On Mon, Nov 26, 2018 at 08:57:54AM -0800, Wengang Wang wrote:
>> >
>> > On 2018/11/25 17:59, Wei Yang wrote:
>> > > On Tue, Nov 20, 2018 at 10:58 AM zhong jiang wrote:
>> > > > On 2018/11/17 9:33, Wengang Wang wrote:
>> > > > > The this_cpu_cmpxchg makes the do-while loop pass as long as
>> > > > > s->cpu_slab->partial has the same value. It doesn't care what happened
>> > > > > to that slab. Interrupts are not disabled, and new alloc/free can happen
>> > > > > in the interrupt handlers. Theoretically, after we have a reference to
>> > > > > it, stored in _oldpage_, the first slab on the partial list on this CPU
>> > > > > can be moved to kmem_cache_node, then moved to a different
>> > > > > kmem_cache_cpu, and then somehow added back as the head of the partial
>> > > > > list of the current kmem_cache_cpu, though that is a very rare case. If
>> > > > > that rare case really happened, the reading of oldpage->pobjects may
>> > > > > unexpectedly get a 0xdead0000, stored in _pobjects_, if the reading
>> > > > > happens just after another CPU removed the slab from kmem_cache_node,
>> > > > > setting lru.prev to LIST_POISON2 (0xdead000000000200). The wrong
>> > > > > (negative) _pobjects_ then prevents slabs from being moved to
>> > > > > kmem_cache_node and finally freed.
>> > > > >
>> > > > > In a vmcore, we see 375210 slabs kept in the partial list of one
>> > > > > kmem_cache_cpu, but only 305 in-use objects in the same list for the
>> > > > > kmalloc-2048 cache. We see negative values for page.pobjects; the last
>> > > > > page with a negative _pobjects_ has the value 0xdead0004, and the next
>> > > > > page looks good (_pobjects_ is 1).
>> > > > >
>> > > > > For the fix, I wanted to call this_cpu_cmpxchg_double with
>> > > > > oldpage->pobjects, but failed due to the size difference between
>> > > > > oldpage->pobjects and cpu_slab->partial. So I changed to calling
>> > > > > this_cpu_cmpxchg_double with _tid_. I don't really want to forbid
>> > > > > alloc/free in between; I just want to catch the case where the first
>> > > > > slab experienced a remove and re-add. This patch is more a call for
>> > > > > ideas.
>> > > > Have you hit the real issue, or did you just review the code?
>> > > >
>> > > > I did hit the issue, and it was fixed upstream, perhaps incidentally,
>> > > > by the following patch:
>> > > > e5d9998f3e09 ("slub: make ->cpu_partial unsigned int")
>> > > >
>> > > Zhong,
>> > >
>> > > I took a look at your upstream patch, but I am confused about how it
>> > > fixes this issue.
>> > >
>> > > In put_cpu_partial(), the cmpxchg compares cpu_slab->partial (a page
>> > > struct) instead of cpu_partial (an unsigned integer). I don't get the
>> > > point of this fix.
>> > I think the patch can't prevent pobjects from being set to 0xdead0000 (the
>> > upper 4 bytes of LIST_POISON2).
>> > But if pobjects is treated as an unsigned integer,
>> >
>> > 2266                 pobjects = oldpage->pobjects;
>> > 2267                 pages = oldpage->pages;
>> > 2268                 if (drain && pobjects > s->cpu_partial) {
>> > 2269                         unsigned long flags;
>> >
>> Ehh..., you mean (0xdead0000 > 0x02)?
>Yes.
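To make that comparison concrete, here is a small standalone userspace
program (illustrative only, not kernel code; the variable names merely
mirror the kernel's) showing why the test on line 2268 flips once
s->cpu_partial becomes unsigned:

#include <stdio.h>

int main(void)
{
	/* pobjects picks up the upper 4 bytes of LIST_POISON2
	 * (0xdead000000000200) */
	int pobjects = (int)0xdead0000;  /* a large negative value as signed int */
	int cpu_partial_signed = 2;      /* s->cpu_partial before e5d9998f3e09 */
	unsigned int cpu_partial = 2;    /* s->cpu_partial after e5d9998f3e09 */

	/* signed comparison: a negative pobjects never exceeds the
	 * threshold, so unfreeze_partials() is never reached and empty
	 * slabs pile up on the per-cpu partial list */
	printf("signed:   %d\n", pobjects > cpu_partial_signed);  /* prints 0 */

	/* mixed comparison: the usual arithmetic conversions turn pobjects
	 * into unsigned int (3735879680), the test is true, and the
	 * per-cpu partial list gets drained */
	printf("unsigned: %d\n", pobjects > cpu_partial);         /* prints 1 */

	return 0;
}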
>
>> This is really a bad thing, if it works around the problem like this.
>It does.
>> I strongly don't agree this is a *fix*. This is too tricky.
>It's tricky. I don't know for what purpose the patch went in. The commit
>message was just saying _pobjects_ should be "unsigned"...

Well, it is reasonable for upstream to accept that commit, since pobjects
should never legitimately be negative. But that is a separate matter;
relating that commit to the problem here is not proper.

>
>thanks,
>wengang
>
>> > line 2268 will be true in put_cpu_partial(), thus the code goes to
>> > unfreeze_partials(). This way the slabs in the cpu partial list can be
>> > moved to kmem_cache_node and then finally freed. So it fixes (or rather
>> > works around) the problem I see here (a huge number of empty slabs
>> > staying in the cpu partial list).
>> >
>> > thanks
>> > wengang
>> >
>> > > > Thanks,
>> > > > zhong jiang

-- 
Wei Yang
Help you, Help me
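For readers following the thread, below is a condensed paraphrase of the
put_cpu_partial() loop under discussion, roughly as it looked around v4.19
(simplified for illustration, not a verbatim copy of mm/slub.c), with the
window Wengang describes annotated:

static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
{
	struct page *oldpage;
	int pages, pobjects;

	preempt_disable();
	do {
		pages = 0;
		pobjects = 0;
		oldpage = this_cpu_read(s->cpu_slab->partial);

		if (oldpage) {
			/*
			 * Window: interrupts are still enabled here. Between
			 * this read and the cmpxchg below, the first slab can
			 * be unfrozen on another CPU, have page->lru poisoned
			 * with LIST_POISON2, and be re-added as the list head,
			 * so this load can observe 0xdead0000.
			 */
			pobjects = oldpage->pobjects;
			pages = oldpage->pages;
			if (drain && pobjects > s->cpu_partial) {
				/* partial array is full: move the existing
				 * set to the per-node partial list */
				unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
				oldpage = NULL;
				pobjects = 0;
				pages = 0;
			}
		}

		pages++;
		pobjects += page->objects - page->inuse;

		page->pages = pages;
		page->pobjects = pobjects;
		page->next = oldpage;

		/*
		 * The cmpxchg only verifies that ->partial still points at
		 * the same first page; it cannot tell that the page was
		 * removed and re-added in between (an ABA window), which is
		 * what the proposed tid-based this_cpu_cmpxchg_double
		 * targets.
		 */
	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
		 != oldpage);
	preempt_enable();
}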