Received: by 2002:a05:7412:da14:b0:e2:908c:2ebd with SMTP id fe20csp296380rdb; Fri, 6 Oct 2023 04:02:04 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHlEysXlc7XUIzGBGMAg1HpDw+qLzfBM99offMYHEXtXSN0oX+n36xFHQ55psL09P0rgpLh X-Received: by 2002:a05:6a20:144e:b0:161:25f7:40ce with SMTP id a14-20020a056a20144e00b0016125f740cemr8457261pzi.27.1696590123675; Fri, 06 Oct 2023 04:02:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696590123; cv=none; d=google.com; s=arc-20160816; b=ueDxsd7CEdd+/y14XqAVJ79tlQ5qhi4IguGl9GMeRA3jhmvSTRhoOqb6KV96B39tfa C0Kibq9O9Z4Q6Edv2cqYvixVaERFDHDfBAYzdBT/qgt7owiIbpdvjvYUraEUuI3+jQsd riKGK8EL7TkvIxQwjQ+YKUZAjuWf88SnakZj9/KvYRW+xmPR0KX6GA2WxKk8jLc1dDpn Qr+/E6ii8ZyNMpZZ740QA4tp9eCVfmcHQopxsV2fi/9783SKt6dtzKPGmSx+Vhxl0OQh JPpkp0ZpEELp86nshnEPg1G0ouvkmzE5musurhFIg7uV+ABqml1l+cWTLnQyRD+GAGbL dJ9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature :dkim-signature; bh=piOdkBH7q3g/1BdTNsnZfQXf4D5rELoYVkzhPQ8djN8=; fh=OJ9YAqYY/Gr3aVsMin7eXoLnZmkHDmQF6dcOU4CB9JE=; b=Xf8M+NbLhoqk4P5VAmYPBAvHOPnDo1JQLsBm2NhjZ5ue3RQ8jk07zFj1BIOfmv6eqa sb10SMn7OGCVY/f8rwAMNz8dXNEeBlh3nEQ9Hn8Ww7eGDRUGCFvCfXWU5sm2gwrtBdrj TBx6YEo2vtR/ESYcPA2wXBr0JjpgvTSXLjowa31R8PYCRzEnM/14OrmvYccFZSXzwwtf i2/08LKSt/ub/BBJJvXE9cD7DPypN+lpayKQGaqh8jlUfDjz8Cok6HfTi2TXFB8Lc7uO bf1nbmD2MHZf9ZD17V3cO5zlsiAD9Oi/rrN3yjYRzrYgZ5DK8wc3MNzFkVJ8iCRttuU3 LkkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=nmSUnawL; dkim=neutral (no key) header.i=@suse.cz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id k15-20020a056a00134f00b0068e4704fd5bsi1279944pfu.346.2023.10.06.04.02.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Oct 2023 04:02:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=nmSUnawL; dkim=neutral (no key) header.i=@suse.cz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id E57D781489D0; Fri, 6 Oct 2023 04:02:00 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231898AbjJFLBq (ORCPT + 99 others); Fri, 6 Oct 2023 07:01:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231932AbjJFLBn (ORCPT ); Fri, 6 Oct 2023 07:01:43 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CF43C5; Fri, 6 Oct 2023 04:01:37 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 9ADD61F895; Fri, 6 Oct 2023 11:01:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1696590096; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=piOdkBH7q3g/1BdTNsnZfQXf4D5rELoYVkzhPQ8djN8=; b=nmSUnawLLGxHZx6O3UIeMOwzv12xv2RZvPnJ8stoCvCKdwMofAdiMkHBnrh5UPn21qX9o8 8Q74t+0pmMyfS3/KfP2UsdmJy6rFFQvDAp98BpFFW++/+Kn1eqKAHwCtFWit6clUi1S0yR fgz62rGbjmT0562KHtQFkzMe8VKYFP8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1696590096; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=piOdkBH7q3g/1BdTNsnZfQXf4D5rELoYVkzhPQ8djN8=; b=TbbiZWcj4OMYEB5UYrj9hYV67Ip7O8Kp80NroTbUjI3fpK0U77fZTUeMWiOsexUDWr6vKT qVUAtYSxM2vr4KCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 8A1FC13A2E; Fri, 6 Oct 2023 11:01:36 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Wf6yIRDpH2WWagAAMHmgww (envelope-from ); Fri, 06 Oct 2023 11:01:36 +0000 Received: by quack3.suse.cz (Postfix, from userid 1000) id 19CDFA07CC; Fri, 6 Oct 2023 13:01:36 +0200 (CEST) Date: Fri, 6 Oct 2023 13:01:36 +0200 From: Jan Kara To: Hugh Dickins Cc: Jan Kara , Andrew Morton , Christian Brauner , Carlos Maiolino , Chuck Lever , Matthew Wilcox , Johannes Weiner , Axel Rasmussen , Vlastmil Babka , Sasha Levin , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 3/8] shmem: factor shmem_falloc_wait() out of shmem_fault() Message-ID: <20231006110136.7xnfmjjrmmpumwyf@quack3> References: <6fe379a4-6176-9225-9263-fe60d2633c0@google.com> <20231003131853.ramdlfw5s6ne4iqx@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=2.7 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Fri, 06 Oct 2023 04:02:01 -0700 (PDT) X-Spam-Level: ** On Thu 05-10-23 20:48:00, Hugh Dickins wrote: > On Tue, 3 Oct 2023, Jan Kara wrote: > > On Fri 29-09-23 20:27:53, Hugh Dickins wrote: > > > That Trinity livelock shmem_falloc avoidance block is unlikely, and a > > > distraction from the proper business of shmem_fault(): separate it out. > > > (This used to help compilers save stack on the fault path too, but both > > > gcc and clang nowadays seem to make better choices anyway.) > > > > > > Signed-off-by: Hugh Dickins > > > > Looks good. Feel free to add: > > > > Reviewed-by: Jan Kara > > Thanks a lot for all these reviews, Jan. (And I particularly enjoyed > your "Autumn cleaning" remark: sweeping up the leaves, I've been glad > to have "Autumn Almanac" running through my head since reading that.) :-) You have widened my musical horizon. > > Looking at the code I'm just wondering whether the livelock with > > shmem_undo_range() couldn't be more easy to avoid by making > > shmem_undo_range() always advance the index by 1 after evicting a page and > > thus guaranteeing a forward progress... Because the forward progress within > > find_get_entries() is guaranteed these days, it should be enough. > > I'm not sure that I understand your "advance the index by 1" comment. > Since the "/* If all gone or hole-punch or unfalloc, we're done */" > change went in, I believe shmem_undo_range() does make guaranteed > forward progress; but its forward progress is not enough. Right, I have missed that retry when glancing over the code. And the "advance the index by 1" was also wrong because find_get_entries() already does it these days. > I would love to delete all that shmem_falloc_wait() strangeness; > and your comment excited me to look, hey, can we just delete all that > stuff now, instead of shifting it aside? That would be much nicer. Well, even if you decided to keep the synchronization what you could do these days is to use inode->i_mapping->invalidate_lock for synchronization instead of your home-grown solution (like all other filesystems do for these kind of races). If you don't want to pay the cost of rwsem acquisition in the fault fast path, you could do it in the hole-punch running case only like currently. > And if I'd replied to you yesterday, I'd have been saying yes we can. > But that's because I hadn't got far enough through re-reading the > various July 2014 3.16-rc mail threads. I had been excited to find > myself posting a revert of the patch; before reaching Sasha's later > livelock which ended up with "belt and braces" retaining the > shmem_falloc wait while adding the "If all gone or hole-punch" mod. > > https://marc.info/?l=linux-kernel&m=140487864819409&w=2 > though that thread did not resolve, and morphed into lockdep moans. > > So I've reverted to my usual position: that it's conceivable that > something has changed meanwhile, to make that Trinity livelock no > longer an issue (in particular, i_mmap_mutex changed to i_mmap_rwsem, > and much later unmap_mapping_range() changed to only take it for read: > but though that change gives hope, I suspect it would turn out to be > ineffectual against the livelock); but that would have to be proved. > > If there's someone who can re-demonstrate Sasha's Trinity livelock > on 3.16-with-shmem-falloc-wait-disabled, or re-demonstrate it on any > later release-with-shmem-falloc-wait-disabled, but demonstrate that > the livelock does not occur on 6.6-rc-with-shmem-falloc-wait-disabled > (or that plus some simple adjustment of hacker's choosing): that would > be great news, and we could delete all this - though probably not > without bisecting to satisfy ourselves on what was the crucial change. > > But I never got around to running up Trinity to work on it originally, > nor in the years since, nor do I expect to soon. (Vlastimil had a > good simple technique for demonstrating a part of the problem, but > fixing that part turned out not fix the whole issue with Trinity.) Fair enough. I agree that we should do some testing before we actually remove the serialization because the problem was not well understood even back then and likely had something to do with unmap_mapping_folio() inefficiency (e.g. unmapping one page at a time acquiring heavily contended i_mmap_mutex for each page) rather than page cache eviction itself. Honza -- Jan Kara SUSE Labs, CR