Date: Sat, 28 Nov 2020 15:27:23 +0100
From: Oleksandr Natalenko
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, Sebastian Andrzej Siewior,
 Steven Rostedt, Mike Galbraith, Thomas Gleixner,
 linux-rt-users@vger.kernel.org
Subject: Re: scheduling while atomic in z3fold
Message-ID: <20201128142723.zik6d5skvt3uwu5f@spock.localdomain>
References: <20201128140523.ovmqon5fjetvpby4@spock.localdomain>
 <20201128140924.iyqr2h52z2olt6zb@spock.localdomain>
In-Reply-To: <20201128140924.iyqr2h52z2olt6zb@spock.localdomain>

On Sat, Nov 28, 2020 at 03:09:24PM +0100, Oleksandr Natalenko wrote:
> > While running v5.10-rc5-rt11 I bumped into the following:
> >
> > ```
> > BUG: scheduling while atomic: git/18695/0x00000002
> > Preemption disabled at:
> > [] z3fold_zpool_malloc+0x463/0x6e0
> > …
> > Call Trace:
> >  dump_stack+0x6d/0x88
> >  __schedule_bug.cold+0x88/0x96
> >  __schedule+0x69e/0x8c0
> >  preempt_schedule_lock+0x51/0x150
> >  rt_spin_lock_slowlock_locked+0x117/0x2c0
> >  rt_spin_lock_slowlock+0x58/0x80
> >  rt_spin_lock+0x2a/0x40
> >  z3fold_zpool_malloc+0x4c1/0x6e0
> >  zswap_frontswap_store+0x39c/0x980
> >  __frontswap_store+0x6e/0xf0
> >  swap_writepage+0x39/0x70
> >  shmem_writepage+0x31b/0x490
> >  pageout+0xf4/0x350
> >  shrink_page_list+0xa28/0xcc0
> >  shrink_inactive_list+0x300/0x690
> >  shrink_lruvec+0x59a/0x770
> >  shrink_node+0x2d6/0x8d0
> >  do_try_to_free_pages+0xda/0x530
> >  try_to_free_pages+0xff/0x260
> >  __alloc_pages_slowpath.constprop.0+0x3d5/0x1230
> >  __alloc_pages_nodemask+0x2f6/0x350
> >  allocate_slab+0x3da/0x660
> >  ___slab_alloc+0x4ff/0x760
> >  __slab_alloc.constprop.0+0x7a/0x100
> >  kmem_cache_alloc+0x27b/0x2c0
> >  __d_alloc+0x22/0x230
> >  d_alloc_parallel+0x67/0x5e0
> >  __lookup_slow+0x5c/0x150
> >  path_lookupat+0x2ea/0x4d0
> >  filename_lookup+0xbf/0x210
> >  vfs_statx.constprop.0+0x4d/0x110
> >  __do_sys_newlstat+0x3d/0x80
> >  do_syscall_64+0x33/0x40
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > ```
> >
> > The preemption seems to be disabled here:
> >
> > ```
> > $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x463
> > z3fold_zpool_malloc+0x463/0x6e0:
> > add_to_unbuddied at mm/z3fold.c:645
> > (inlined by) z3fold_alloc at mm/z3fold.c:1195
> > (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> > ```
> >
> > The call to the rt_spin_lock() seems to be here:
> >
> > ```
> > $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x4c1
> > z3fold_zpool_malloc+0x4c1/0x6e0:
> > add_to_unbuddied at mm/z3fold.c:649
> > (inlined by) z3fold_alloc at mm/z3fold.c:1195
> > (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> > ```
> >
> > Or, in source code:
> >
> > ```
> > 639 /* Add to the appropriate unbuddied list */
> > 640 static inline void add_to_unbuddied(struct z3fold_pool *pool,
> > 641 				struct z3fold_header *zhdr)
> > 642 {
> > 643 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
> > 644 			zhdr->middle_chunks == 0) {
> > 645 		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
> > 646
> > 647 		int freechunks = num_free_chunks(zhdr);
> > 648 		spin_lock(&pool->lock);
> > 649 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
> > 650 		spin_unlock(&pool->lock);
> > 651 		zhdr->cpu = smp_processor_id();
> > 652 		put_cpu_ptr(pool->unbuddied);
> > 653 	}
> > 654 }
> > ```
> >
> > Shouldn't the list manipulation be protected with
> > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
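
To spell out the reasoning behind that question: get_cpu_ptr() disables preemption, and on PREEMPT_RT spin_lock() is a sleeping lock, so taking it inside a get_cpu_ptr() section can end in __schedule() with preemption off. The following is only a userspace toy model of that interaction, not kernel code. Every model_* name, old_pattern() and new_pattern() are made up for illustration, and the model flags every lock attempt made while atomic, whereas the real kernel only schedules when the lock is contended.

```c
/*
 * Userspace toy model, NOT kernel code. Every model_* name is hypothetical
 * and only mirrors the PREEMPT_RT behaviour described above.
 */
#include <assert.h>
#include <stdio.h>

static int preempt_count;	/* models the task's preemption-disable depth */
static int violations;		/* sleeping-lock attempts made while atomic */

/* get_cpu_ptr() is preempt_disable() followed by this_cpu_ptr(). */
static void model_get_cpu_ptr(void)
{
	preempt_count++;
}

/* put_cpu_ptr() re-enables preemption. */
static void model_put_cpu_ptr(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/*
 * On PREEMPT_RT local_lock() takes a per-CPU sleeping lock and leaves
 * preemption enabled, so the model does not touch preempt_count.
 */
static void model_local_lock(void) { }
static void model_local_unlock(void) { }

/*
 * On PREEMPT_RT spin_lock() may sleep. The model reports every attempt
 * made with preemption disabled (assumption: worst case, lock contended).
 */
static void model_rt_spin_lock(void)
{
	if (preempt_count > 0) {
		violations++;
		printf("BUG: scheduling while atomic (preempt_count=%d)\n",
		       preempt_count);
	}
}
static void model_rt_spin_unlock(void) { }

/* Shape of the current code: get_cpu_ptr() + spin_lock(). */
static int old_pattern(void)
{
	int before = violations;

	model_get_cpu_ptr();
	model_rt_spin_lock();	/* sleeping lock taken while atomic */
	model_rt_spin_unlock();
	model_put_cpu_ptr();
	return violations - before;
}

/* Shape of the proposed code: local_lock() + spin_lock(). */
static int new_pattern(void)
{
	int before = violations;

	model_local_lock();
	model_rt_spin_lock();	/* preemption still enabled, no report */
	model_rt_spin_unlock();
	model_local_unlock();
	return violations - before;
}
```

Called from a main(), old_pattern() reports one violation and new_pattern() reports none, which is the difference the question above is after.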

Totally untested:

```
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..53fcb80c6167 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -156,6 +157,7 @@ struct z3fold_pool {
 	const char *name;
 	spinlock_t lock;
 	spinlock_t stale_lock;
+	local_lock_t llock;
 	struct list_head *unbuddied;
 	struct list_head lru;
 	struct list_head stale;
@@ -642,14 +644,17 @@ static inline void add_to_unbuddied(struct z3fold_pool *pool,
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
+		struct list_head *unbuddied;
+		int freechunks;
+		local_lock(&pool->llock);
+		unbuddied = *this_cpu_ptr(&pool->unbuddied);
 
-		int freechunks = num_free_chunks(zhdr);
+		freechunks = num_free_chunks(zhdr);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		local_unlock(&pool->llock);
 	}
 }
 
@@ -887,7 +892,8 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 
 lookup:
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	local_lock(&pool->llock);
+	unbuddied = *this_cpu_ptr(&pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];
 
@@ -905,7 +911,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			local_unlock(&pool->llock);
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -919,7 +925,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			local_unlock(&pool->llock);
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -934,7 +940,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	local_unlock(&pool->llock);
 
 	if (!zhdr) {
 		int cpu;
@@ -1005,6 +1011,7 @@ static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp,
 		goto out_c;
 	spin_lock_init(&pool->lock);
 	spin_lock_init(&pool->stale_lock);
+	local_lock_init(&pool->llock);
 	pool->unbuddied = __alloc_percpu(sizeof(struct list_head)*NCHUNKS, 2);
 	if (!pool->unbuddied)
 		goto out_pool;
```

-- 
Oleksandr Natalenko (post-factum)