Date: Mon, 11 Feb 2019 11:28:22 -0500
From: Martin Cracauer
To: linux-kernel@vger.kernel.org
Subject: Possibe NFS mm problem: client page-in errors with ZFS Linux server
Message-ID: <20190211162821.GA38018@cons.org>

Folks. I suspect that this isn't actually a ZFS bug but a general memory manager problem. I would appreciate input from mm folks.

I parked the detailed bug report here for now:
https://github.com/zfsonlinux/zfs/issues/8396

Short story: When running ZFS on a Linux 4.19/4.20 NFS server, the clients occasionally fail at perfectly normal page-ins on mapped files.

Example: executable on an NFS-mounted filesystem. Some executable-mapped page is referenced. The page is supposed to be retrieved on demand if not resident yet. This occasionally fails with recent Linux or ZoL code.

The semantics of the error are identical to what legitimately happens when you have a page fault in an NFS client mapped file after that file has been unlinked on the server side. Here it is happening without the unlinking.

I suspect this might be a general Linux mm problem because I cross-checked with a FreeBSD server with very similar ZFS code. Although I cannot pin down when the error started appearing, I know it was within roughly the last 6 months, and I read all commits to ZoL that manage pages and couldn't see anything suspicious off-hand.

On the other hand, the errors do not appear when moving the server-side file tree to ext4.

The errors get more frequent with uptime of the server and are not affected by drop_caches or by trying to evict ZFS' own caches with memory pressure.

Details, with a line of reasoning why I blame the server, and all other info I am collecting:
https://github.com/zfsonlinux/zfs/issues/8396

Thanks
Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer http://www.cons.org/cracauer/
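
For anyone who wants to check a client for the symptom described above, here is a minimal sketch of a page-in probe. It is not taken from the bug report. It assumes the failed page-in surfaces on the client as SIGBUS (the usual result of a fault on a mapped file whose backing has gone away), and the name touch_all_pages is made up for this illustration. The probe runs in a child process so that the signal kills the child rather than the caller.

```python
import signal
import subprocess
import sys

# Child program: map the file read-only and read one byte from every page.
# Each read forces that page to be faulted in if it is not resident yet.
_CHILD = r"""
import mmap, sys
with open(sys.argv[1], "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for off in range(0, len(m), mmap.PAGESIZE):
            m[off]
"""


def touch_all_pages(path):
    """Fault in every page of `path` once, in a separate process.

    Returns "ok" if all pages could be read, "sigbus" if the child was
    killed by SIGBUS (assumed here to be how the failure shows up), or
    "error:<code>" for any other non-zero exit status.
    """
    proc = subprocess.run([sys.executable, "-c", _CHILD, path])
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means "killed by that signal".
    if proc.returncode == -signal.SIGBUS:
        return "sigbus"
    return "error:%d" % proc.returncode
```

Calling touch_all_pages() on a file below the NFS mount and logging any "sigbus" results over time would give a rough failure count. Pages that are already resident on the client are served from its page cache and do not reach the server, so a clean pass does not prove the server is healthy.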