Date: Tue, 23 May 2023 12:35:35 -0400
From: Kent Overstreet
To: Christoph Hellwig
Cc: Jan Kara, cluster-devel@redhat.com, "Darrick J. Wong",
    linux-kernel@vger.kernel.org, dhowells@redhat.com,
    linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    Kent Overstreet
Subject: Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

On Tue, May 23, 2023 at 09:21:56AM -0700, Christoph Hellwig wrote:
> On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote:
> > I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> > way (by prefaulting pages from the iter before grabbing the problematic
> > lock and then disabling page faults for the iomap_dio_rw() call). I guess
> > we should somehow unify these schemes so that we don't have two mechanisms
> > for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
> >
> > Also good that you've written a fstest for this, that is definitely a useful
> > addition, although I suspect GFS2 guys added a test for this not so long
> > ago when testing their stuff. Maybe they have a pointer handy?
>
> generic/708 is the btrfs version of this.
>
> But I think all of the file systems that have this deadlock are actually
> fundamentally broken because they have a messed-up locking hierarchy
> where page faults take the same lock that is held over the direct I/O
> operation. And the right thing is to fix this. I have work in progress
> for btrfs, and something similar should apply to gfs2, with the added
> complication that it probably means a revision to their network
> protocol.

No, this is fundamentally because userspace controls the ordering of
locking: the buffer passed to dio can point into any address space. You
can't solve this by changing the locking hierarchy.

If you want to be able to have locking around adding things to the
pagecache so that things that bypass the pagecache can prevent
inconsistencies (and we do, the big one is fcollapse), and if you want
dio to be able to use that same locking (because otherwise dio will also
cause page cache inconsistency), this is the way to do it.
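
To make the ordering argument concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code. It assumes each dio holds the pagecache lock of its target file while it faults in the user buffer, and that the fault needs the (conflicting) pagecache lock of whatever file is mmap()ed under that buffer. The names `struct dio_op`, `visit`, `dio_ops_can_deadlock` and `MAX_FILES` are hypothetical and exist only for this illustration. Files are small integers, and every dio adds an edge "target -> file backing the buffer"; a cycle in that graph is a possible deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILES 8 /* arbitrary bound for this toy model */

/* Hypothetical model of one direct I/O request issued by userspace. */
struct dio_op {
    int target;      /* file whose pagecache lock is held during the I/O */
    int buf_backing; /* file mmap()ed under the user buffer, -1 if anonymous */
};

/* Depth-first search for a cycle in the "holds lock -> faults on" graph. */
static bool visit(int node, bool edge[MAX_FILES][MAX_FILES],
                  int state[MAX_FILES], int n_files)
{
    state[node] = 1; /* node is on the current DFS path */
    for (int next = 0; next < n_files; next++) {
        if (!edge[node][next])
            continue;
        if (state[next] == 1)
            return true; /* back edge (or self edge): lock cycle */
        if (state[next] == 0 && visit(next, edge, state, n_files))
            return true;
    }
    state[node] = 2; /* fully explored, no cycle through this node */
    return false;
}

/* Returns true if the given set of dio requests can form a lock cycle. */
static bool dio_ops_can_deadlock(const struct dio_op *ops, size_t n_ops,
                                 int n_files)
{
    bool edge[MAX_FILES][MAX_FILES] = {{false}};
    int state[MAX_FILES] = {0};

    assert(n_files > 0 && n_files <= MAX_FILES);
    for (size_t i = 0; i < n_ops; i++) {
        if (ops[i].buf_backing < 0)
            continue; /* anonymous memory: the fault takes no file lock */
        assert(ops[i].target >= 0 && ops[i].target < n_files);
        assert(ops[i].buf_backing < n_files);
        /* The edge direction is chosen by userspace, not by the kernel. */
        edge[ops[i].target][ops[i].buf_backing] = true;
    }
    for (int f = 0; f < n_files; f++)
        if (state[f] == 0 && visit(f, edge, state, n_files))
            return true;
    return false;
}
```

In this model, the pair of requests `{0, 1}` and `{1, 0}` (dio to file 0 from a buffer mapped from file 1, and the reverse) forms a cycle, and so does the single request `{0, 0}` (dio to a file from a mapping of that same file). Since userspace picks both fields of every request, no fixed ordering of the per-file locks rules these cases out.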