Date: Wed, 7 Jul 2021 21:06:15 +0000
From: Dennis Zhou
To: Linus Torvalds
Cc: Tejun Heo, Christoph Lameter, Linux-MM, Linux Kernel Mailing List
Subject: Re: [GIT PULL] percpu fixes for v5.14-rc1

On Wed, Jul 07, 2021 at 11:40:45AM -0700, Linus Torvalds wrote:
> On Wed, Jul 7, 2021 at 6:00 AM Dennis Zhou wrote:
> >
> > This is just a single change to fix percpu depopulation. The code relied
> > on depopulation code written specifically for the free path and relied
> > on vmalloc to do the tlb flush lazily. As we're modifying the backing
> > pages during the lifetime of a chunk, we need to also flush the tlb
> > accordingly.
>
> I pulled this, but I ended up unpulling after looking at the fix.
>
> The fix may be perfectly correct, but I'm looking at that
> pcpu_reclaim_populated() function, and I want somebody to explain to
> me why it's ok to drop and re-take the 'pcpu_lock' and just continue.
>
> Because whatever it was protecting is now not protected any more.
>
> It *looks* like it's intended to protect the pcpu_chunk_lists[]
> content, and some other functions that do this look ok. So for
> example, pcpu_balance_free() at least removes the 'chunk' from the
> pcpu_chunk_lists[] before it drops the lock and then works on the
> chunk contents.
>
> But pcpu_reclaim_populated() seems to *leave* chunk on the
> pcpu_chunk_lists[], drop the lock, and then continue to use 'chunk'.
>
> That odd "release lock and continue to use the data it's supposed to
> protect" seems to be pre-existing, but
>
>  (a) this is the code that caused problems to begin with
>
> and
>
>  (b) it seems to now happen even more.
>
> So maybe this code is right. But it looks very odd to me, and I'd like
> to get more explanations of _why_ it would be ok before I pull this
> fix, since there seems to be a deeper underlying problem in the code
> that this tries to fix.
>

Yeah, I agree. I think I've inadvertently made this more complex than it
needs to be and intend to clean it up.

Percpu has a mutex lock and a spinlock. The mutex lock is responsible
for the lifetime of a chunk and the correctness of the backing pages,
chunk->populated[]. The spinlock protects the other members of the chunk
as well as pcpu_chunk_lists[].

pcpu_balance_workfn() is used to serialize the freeing of chunks and the
maintenance of a floating number of pages to satisfy atomic allocations.
This is where depopulation is being bolted on. It relies on the
serialization of:

    mutex_lock(&pcpu_alloc_mutex);
    pcpu_balance_free()
    pcpu_reclaim_populated()
    pcpu_balance_populated()
    mutex_unlock(&pcpu_alloc_mutex);

and holds the mutex lock throughout, so we know the chunk we are
modifying will not be freed. It's this that makes it okay to just pick
the spinlock back up and continue.

Depopulation adds complexity to the lifecycle of a chunk, because others
can free percpu allocations back to it. So, unlike in
pcpu_balance_free(), taking the chunk off of pcpu_chunk_lists[] doesn't
guarantee that no one else is accessing the chunk, since they may be
freeing an object back to it. To handle this, a notion of isolated
chunks is introduced, and these chunks become effectively hidden from
the allocation path.
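
Roughly, the reclaim path has the shape of the sketch below. This is
simplified pseudo-code, not the actual pcpu_reclaim_populated();
everything other than the two locks and the chunk itself is a
hypothetical helper named only for illustration:

    /*
     * Illustrative sketch only -- not the real mm/percpu.c code.
     * pick_reclaim_candidate(), isolate_chunk(), depopulate_pages()
     * and reintegrate_chunk() are hypothetical helpers.
     */
    mutex_lock(&pcpu_alloc_mutex);      /* chunk can't be freed under us */

    spin_lock_irq(&pcpu_lock);
    while ((chunk = pick_reclaim_candidate())) {
            isolate_chunk(chunk);       /* hide it from the alloc path */

            /*
             * Dropping the spinlock here is fine: the mutex keeps the
             * chunk alive and isolation keeps the allocation path away.
             * Freers can still return objects to the chunk, which is
             * what the isolated state has to account for.
             */
            spin_unlock_irq(&pcpu_lock);
            depopulate_pages(chunk);    /* unmap and flush the tlb */
            spin_lock_irq(&pcpu_lock);

            reintegrate_chunk(chunk);
    }
    spin_unlock_irq(&pcpu_lock);

    mutex_unlock(&pcpu_alloc_mutex);

The important part for the question above is the lock nesting: the
spinlock is dropped and retaken, but always under the mutex.
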
We also hold the mutex lock throughout this process, which prevents the
allocation path from simultaneously adding backing pages to a chunk.

I hope this explanation makes sense.

Thanks,
Dennis