Received: by 2002:ac0:946b:0:0:0:0:0 with SMTP id j40csp2751445imj; Mon, 11 Feb 2019 07:58:17 -0800 (PST) X-Google-Smtp-Source: AHgI3Ibrdq6GSNyA0i06ZZGB2B98/2dBAbJ4yjKJeuN3AXSHhZVJ3S3LS+hD1DwKFoZI8g2YeYAT X-Received: by 2002:a17:902:6b49:: with SMTP id g9mr18611353plt.291.1549900697008; Mon, 11 Feb 2019 07:58:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1549900697; cv=none; d=google.com; s=arc-20160816; b=bFWaxnI1HKtRiS6VqVWxRAGQon54QLCVxjjEPo8tuM4hDe1XMZXrXkcNIM7HYdjSgf upz0Q68CTm0usQM3C8mhcS1go28pK832xdGXdCjFv5mi36ztw8CZNPAahsvu7mHN3WXN iLpEwwUP+fyVtlrimIp5gV/VV6LSBEBjFC8SgoJSg7XP3d8/ax9w/5Ib/ghQN7hG11gq txUNuG5gKEMNQGqGXidYSzNqavrOk3A3apoS08fw+MWaTQXieFRxq7qpkjPMpDeX0NPm YMlKx776L65Bm1F4ly8Gq/1dq1wFQnOqGiIquc13AJgBeI88WLDYGRcKaY3U2lgzhnYN v7cQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=EjyLmA+/wqdTnKDUdShqC48wWRA8HtRr24TSdXkNCTY=; b=UgOrovmIXKKqdsf9UQ5sNWhDm4kMaVTBXqeEDaflVjYfQEtPKUrSrBPM+fHrOpSuYm DWrM0RJ7Mu4ircL2JCrX/s9Gn7SBROyPnSeyS4ZLOJkIKkahKicJ9z7dnoEd7zg68igf gFYKfLZISINAYgNoKwRUTS0GOMvxus49mZLXUbvUxJk6Cz5fjp4v/MllyurXQsNOQDwn 3Z1uwt6UXyiPxfhSAqGK+9l8miWYCvVtGk5hr4ukbE1yfKFXXkke22izm04LspfXFlrE iq3wnccx0waRz063Tkic8vVviP/8uTpIcukPetvug3hOt3zLA5YG6nq3NxU9KBcINzOG GPzg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=VfOvPZg1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q8si10246537plr.382.2019.02.11.07.58.01; Mon, 11 Feb 2019 07:58:16 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=VfOvPZg1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729778AbfBKP5J (ORCPT + 99 others); Mon, 11 Feb 2019 10:57:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:41784 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730197AbfBKOdv (ORCPT ); Mon, 11 Feb 2019 09:33:51 -0500 Received: from localhost (5356596B.cm-6-7b.dynamic.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 3D691222A2; Mon, 11 Feb 2019 14:33:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1549895630; bh=Qv0FmXJ8lWyTsKRaSrbQwphcFTSzDSf5cD6daCFuqNQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VfOvPZg11/9qTXsW/82i8/oBk3c29QCPsrHZIJyjcw/UgOWF73++YvOFwIPduXxSh TAjBQEgrcGGOYqqtPUh5ri9d/F07/Abw+LSfPokAqYiLsk+ee1of7DYhLorNEcw76y +VDcGzD5Mzf1LN/SvlGKA5qokVex5ajSP9Lh6HY8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Minchan Kim , Sergey Senozhatsky , Joey Pabalinas , Andrew Morton , Linus Torvalds , Sasha Levin Subject: [PATCH 4.20 269/352] zram: fix lockdep warning of free block handling Date: Mon, 11 Feb 2019 15:18:16 +0100 Message-Id: <20190211141904.122452692@linuxfoundation.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190211141846.543045703@linuxfoundation.org> References: <20190211141846.543045703@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.20-stable review patch. If anyone has any objections, please let me know. ------------------ [ Upstream commit 3c9959e025472122a61faebb208525cf26b305d1 ] Patch series "zram idle page writeback", v3. Inherently, swap device has many idle pages which are rare touched since it was allocated. It is never problem if we use storage device as swap. However, it's just waste for zram-swap. This patchset supports zram idle page writeback feature. * Admin can define what is idle page "no access since X time ago" * Admin can define when zram should writeback them * Admin can define when zram should stop writeback to prevent wearout Details are in each patch's description. This patch (of 7): ================================ WARNING: inconsistent lock state 4.19.0+ #390 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50 {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x2c/0x40 zram_make_request+0x755/0xdc9 generic_make_request+0x373/0x6a0 submit_bio+0x6c/0x140 __swap_writepage+0x3a8/0x480 shrink_page_list+0x1102/0x1a60 shrink_inactive_list+0x21b/0x3f0 shrink_node_memcg.constprop.99+0x4f8/0x7e0 shrink_node+0x7d/0x2f0 do_try_to_free_pages+0xe0/0x300 try_to_free_pages+0x116/0x2b0 __alloc_pages_slowpath+0x3f4/0xf80 __alloc_pages_nodemask+0x2a2/0x2f0 __handle_mm_fault+0x42e/0xb50 handle_mm_fault+0x55/0xb0 __do_page_fault+0x235/0x4b0 page_fault+0x1e/0x30 irq event stamp: 228412 hardirqs last enabled at (228412): [] __slab_free+0x3e6/0x600 hardirqs last disabled at (228411): [] __slab_free+0x1c5/0x600 softirqs last enabled at (228396): [] __do_softirq+0x31e/0x427 softirqs last disabled at (228403): [] irq_exit+0xd1/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&zram->bitmap_lock)->rlock); lock(&(&zram->bitmap_lock)->rlock); *** DEADLOCK *** no locks held by zram_verify/2095. stack backtrace: CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x67/0x9b print_usage_bug+0x1bd/0x1d3 mark_lock+0x4aa/0x540 __lock_acquire+0x51d/0x1300 lock_acquire+0x90/0x180 _raw_spin_lock+0x2c/0x40 put_entry_bdev+0x1e/0x50 zram_free_page+0xf6/0x110 zram_slot_free_notify+0x42/0xa0 end_swap_bio_read+0x5b/0x170 blk_update_request+0x8f/0x340 scsi_end_request+0x2c/0x1e0 scsi_io_completion+0x98/0x650 blk_done_softirq+0x9e/0xd0 __do_softirq+0xcc/0x427 irq_exit+0xd1/0xe0 do_IRQ+0x93/0x120 common_interrupt+0xf/0xf With writeback feature, zram_slot_free_notify could be called in softirq context by end_swap_bio_read. However, bitmap_lock is not aware of that so lockdep yell out: get_entry_bdev spin_lock(bitmap->lock); irq softirq end_swap_bio_read zram_slot_free_notify zram_slot_lock <-- deadlock prone zram_free_page put_entry_bdev spin_lock(bitmap->lock); <-- deadlock prone With akpm's suggestion (i.e. bitmap operation is already atomic), we could remove bitmap lock. It might fail to find a empty slot if serious contention happens. However, it's not severe problem because huge page writeback has already possiblity to fail if there is severe memory pressure. Worst case is just keeping the incompressible in memory, not storage. The other problem is zram_slot_lock in zram_slot_slot_free_notify. To make it safe is this patch introduces zram_slot_trylock where zram_slot_free_notify uses it. Although it's rare to be contented, this patch adds new debug stat "miss_free" to keep monitoring how often it happens. Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org Signed-off-by: Minchan Kim Reviewed-by: Sergey Senozhatsky Reviewed-by: Joey Pabalinas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- drivers/block/zram/zram_drv.c | 38 +++++++++++++++++++---------------- drivers/block/zram/zram_drv.h | 2 +- 2 files changed, 22 insertions(+), 18 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 8e6a0db6555f..d1459cc1159f 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -53,6 +53,11 @@ static size_t huge_class_size; static void zram_free_page(struct zram *zram, size_t index); +static int zram_slot_trylock(struct zram *zram, u32 index) +{ + return bit_spin_trylock(ZRAM_LOCK, &zram->table[index].value); +} + static void zram_slot_lock(struct zram *zram, u32 index) { bit_spin_lock(ZRAM_LOCK, &zram->table[index].value); @@ -401,7 +406,6 @@ static ssize_t backing_dev_store(struct device *dev, goto out; reset_bdev(zram); - spin_lock_init(&zram->bitmap_lock); zram->old_block_size = old_block_size; zram->bdev = bdev; @@ -445,29 +449,24 @@ out: static unsigned long get_entry_bdev(struct zram *zram) { - unsigned long entry; - - spin_lock(&zram->bitmap_lock); + unsigned long blk_idx = 1; +retry: /* skip 0 bit to confuse zram.handle = 0 */ - entry = find_next_zero_bit(zram->bitmap, zram->nr_pages, 1); - if (entry == zram->nr_pages) { - spin_unlock(&zram->bitmap_lock); + blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, blk_idx); + if (blk_idx == zram->nr_pages) return 0; - } - set_bit(entry, zram->bitmap); - spin_unlock(&zram->bitmap_lock); + if (test_and_set_bit(blk_idx, zram->bitmap)) + goto retry; - return entry; + return blk_idx; } static void put_entry_bdev(struct zram *zram, unsigned long entry) { int was_set; - spin_lock(&zram->bitmap_lock); was_set = test_and_clear_bit(entry, zram->bitmap); - spin_unlock(&zram->bitmap_lock); WARN_ON_ONCE(!was_set); } @@ -888,9 +887,10 @@ static ssize_t debug_stat_show(struct device *dev, down_read(&zram->init_lock); ret = scnprintf(buf, PAGE_SIZE, - "version: %d\n%8llu\n", + "version: %d\n%8llu %8llu\n", version, - (u64)atomic64_read(&zram->stats.writestall)); + (u64)atomic64_read(&zram->stats.writestall), + (u64)atomic64_read(&zram->stats.miss_free)); up_read(&zram->init_lock); return ret; @@ -1402,10 +1402,14 @@ static void zram_slot_free_notify(struct block_device *bdev, zram = bdev->bd_disk->private_data; - zram_slot_lock(zram, index); + atomic64_inc(&zram->stats.notify_free); + if (!zram_slot_trylock(zram, index)) { + atomic64_inc(&zram->stats.miss_free); + return; + } + zram_free_page(zram, index); zram_slot_unlock(zram, index); - atomic64_inc(&zram->stats.notify_free); } static int zram_rw_page(struct block_device *bdev, sector_t sector, diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 72c8584b6dff..d1095dfdffa8 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -79,6 +79,7 @@ struct zram_stats { atomic64_t pages_stored; /* no. of pages currently stored */ atomic_long_t max_used_pages; /* no. of maximum pages stored */ atomic64_t writestall; /* no. of write slow paths */ + atomic64_t miss_free; /* no. of missed free */ }; struct zram { @@ -110,7 +111,6 @@ struct zram { unsigned int old_block_size; unsigned long *bitmap; unsigned long nr_pages; - spinlock_t bitmap_lock; #endif #ifdef CONFIG_ZRAM_MEMORY_TRACKING struct dentry *debugfs_dir; -- 2.19.1