From: David Herrmann
Date: Fri, 3 Nov 2017 10:52:24 +0100
Subject: Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
To: Michal Hocko
Cc: Hugh Dickins, linux-mm, Peter Zijlstra, Thomas Gleixner,
 Johannes Weiner, Mel Gorman, Tejun Heo, LKML
In-Reply-To: <20171103082417.7rwns74txzzoyzyv@dhcp22.suse.cz>
References: <20171102093613.3616-1-mhocko@kernel.org>
 <20171102093613.3616-2-mhocko@kernel.org>
 <20171103082417.7rwns74txzzoyzyv@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi

On Fri, Nov 3, 2017 at 9:24 AM, Michal Hocko wrote:
> On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
>> On Thu, 2 Nov 2017, Michal Hocko wrote:
>> > From: Michal Hocko
>> >
>> > syzkaller has reported the following lockdep splat
>> > ======================================================
>> > WARNING: possible circular locking dependency detected
>> > 4.13.0-next-20170911+ #19 Not tainted
>> > ------------------------------------------------------
>> > syz-executor5/6914 is trying to acquire lock:
>> >  (cpu_hotplug_lock.rw_sem){++++}, at: [] get_online_cpus include/linux/cpu.h:126 [inline]
>> >  (cpu_hotplug_lock.rw_sem){++++}, at: [] lru_add_drain_all+0xe/0x20 mm/swap.c:729
>> >
>> > but task is already holding lock:
>> >  (&sb->s_type->i_mutex_key#9){++++}, at: [] inode_lock include/linux/fs.h:712 [inline]
>> >  (&sb->s_type->i_mutex_key#9){++++}, at: [] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
>> >
>> > more details [1] and dependencies explained [2]. The problem seems to be
>> > the usage of lru_add_drain_all from shmem_wait_for_pins. While the lock
>> > dependency is subtle as hell, and we might want to make lru_add_drain_all
>> > less dependent on the hotplug locks, the usage of lru_add_drain_all seems
>> > dubious here. The whole function cares only about radix tree tags, page
>> > count and page mapcount. None of those are touched from the draining
>> > context. So it doesn't make much sense to drain pcp caches. Moreover
>> > this looks like a wrong thing to do because it basically induces
>> > unpredictable latency to the call because draining is not for free
>> > (especially on larger machines with many cpus).
>> >
>> > Let's simply drop the call to lru_add_drain_all to address both issues.
>> >
>> > [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
>> > [2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net
>> >
>> > Cc: David Herrmann
>> > Cc: Hugh Dickins
>> > Signed-off-by: Michal Hocko
>>
>> NAK. shmem_wait_for_pins() is waiting for temporary pins on the pages
>> to go away, and using lru_add_drain_all() in the usual way, to lower
>> the refcount of pages temporarily pinned in a pagevec somewhere. Page
>> count is touched by draining pagevecs: I'm surprised to see you say
>> that it isn't - or have pagevec page references been eliminated by
>> a recent commit that I missed?
>
> I must be missing something here. __pagevec_lru_add_fn is merely about
> moving the page into the appropriate LRU list, pagevec_move_tail only
> rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
> lru_lazyfree_fn moves from anon to file LRUs, and activate_page_drain
> just moves to the active list. None of those operations touch the page
> count AFAICS. So I would agree that some pages might be pinned outside
> of the LRU (lru_add_pvec) and thus be unreclaimable, but does this
> really matter? Or what else am I missing?

Yes, we need to make sure those page-pins are dropped.
shmem_wait_for_pins() literally just waits for all of them to be
cleared, since there is no way to tell whether a page is still in
flight for some pending async WRITE operation. Hence, if the pagevecs
keep pinning those pages, we must fail the shmem-seal operation, as we
cannot guarantee there are no further WRITEs to this file. The
refcount is our only way to tell.

I think the caller could just call lru_add_drain_all() between
mapping_deny_writable() and shmem_wait_for_pins(), releasing the
inode-lock in between. But that means we drain even if
shmem_tag_pins() does not find anything (presumably the common case).
It would also have weird interactions with parallel inode operations,
in case the seal operation fails and is reverted. Not sure I like
that.
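To spell out where that pin comes from (paraphrasing mm/swap.c from
memory, so please double-check against the actual tree): the reference
is taken when the page goes onto the per-cpu pagevec, and it is only
dropped when the pagevec is drained, roughly:

	static void __lru_cache_add(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

		get_page(page);		/* the pagevec itself holds a reference */
		if (!pagevec_add(pvec, page) || PageCompound(page))
			__pagevec_lru_add(pvec);	/* drain path, ends in release_pages() */
		put_cpu_var(lru_add_pvec);
	}

So the per-page move callbacks listed above indeed never touch the
count; the put happens once the pagevec is flushed, via release_pages()
in pagevec_lru_move_fn(), which is exactly what lru_add_drain_all()
forces.

And to make the caller-side idea concrete, this is roughly the ordering
I have in mind for the F_SEAL_WRITE path. This is a hypothetical,
untested sketch only: shmem_seal_write_sketch() is not a real function,
and the revalidation and error handling after retaking the lock are
hand-waved.

	/* Hypothetical helper, imagined to sit in mm/shmem.c next to shmem_add_seals(). */
	static int shmem_seal_write_sketch(struct inode *inode, struct file *file)
	{
		struct address_space *mapping = file->f_mapping;
		int error;

		error = mapping_deny_writable(mapping);
		if (error)
			return error;

		/* Drop i_mutex so lru_add_drain_all() no longer nests under it. */
		inode_unlock(inode);
		lru_add_drain_all();	/* flush per-cpu pagevecs, dropping their page refs */
		inode_lock(inode);

		/* Now only real, external pins should remain. */
		error = shmem_wait_for_pins(mapping);
		if (error)
			mapping_allow_writable(mapping);
		return error;
	}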
Thanks
David