Subject: Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
From: Steven Whitehouse <swhiteho@redhat.com>
To: Andreas Gruenbacher
Cc: Linus Torvalds, Alexander Viro, Christoph Hellwig,
    "Darrick J. Wong", Jan Kara, LKML, Matthew Wilcox, cluster-devel,
    linux-fsdevel, ocfs2-devel@oss.oracle.com
Date: Fri, 20 Aug 2021 14:47:54 +0100
References: <20210819194102.1491495-1-agruenba@redhat.com>
    <20210819194102.1491495-11-agruenba@redhat.com>
    <5e8a20a8d45043e88013c6004636eae5dadc9be3.camel@redhat.com>
Organization: Red Hat UK Ltd

Hi,

On Fri, 2021-08-20 at 15:17 +0200, Andreas Gruenbacher wrote:
> On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse
> <swhiteho@redhat.com> wrote:
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> > > From: Bob Peterson
> > >
> > > This patch introduces a new HIF_MAY_DEMOTE flag and
> > > infrastructure that will allow glocks to be demoted automatically
> > > on locking conflicts. When a locking request comes in that isn't
> > > compatible with the locking state of a holder and that holder has
> > > the HIF_MAY_DEMOTE flag set, the holder will be demoted
> > > automatically before the incoming locking request is granted.
> >
> > I'm not sure I understand what is going on here. When there are
> > locking conflicts we generate callbacks and those result in glock
> > demotion. There is no need for a flag to indicate that, I think,
> > since it is the default behaviour anyway. Or perhaps the
> > explanation is just a bit confusing...
>
> When a glock has active holders (with the HIF_HOLDER flag set), the
> glock won't be demoted to a state incompatible with any of those
> holders.
>

Ok, that is a much clearer explanation of what the patch does. Active
holders have always prevented demotions previously.

> > > Processes that allow a glock holder to be taken away indicate
> > > this by calling gfs2_holder_allow_demote(). When they need the
> > > glock again, they call gfs2_holder_disallow_demote() and then
> > > they check if the holder is still queued: if it is, they're
> > > still holding the glock; if it isn't, they need to re-acquire
> > > the glock.
> > >
> > > This allows processes to hang on to locks that could become part
> > > of a cyclic locking dependency. The locks will be given up when
> > > a (rare) conflicting locking request occurs, and don't need to
> > > be given up prematurely.
> >
> > This seems backwards to me. We already have the glock layer cache
> > the locks until they are required by another node. We also have
> > the min hold time to make sure that we don't bounce locks too
> > much. So what is the problem that you are trying to solve here,
> > I wonder?
>
> This solves the problem of faulting in pages during read and write
> operations: on the one hand, we want to hold the inode glock across
> those operations. On the other hand, those operations may fault in
> pages, which may require taking the same or other inode glocks,
> directly or indirectly, which can deadlock.
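
To make that pattern concrete, a rough sketch of a buffered read --
illustrative only, not the actual patch code. It uses the helper names
from this series (gfs2_holder_queued() is assumed to come from it),
and fault_in_user_pages()/do_the_read() are stand-ins:

    static ssize_t read_with_demotable_holder(struct gfs2_inode *ip,
                                              struct iov_iter *to)
    {
            struct gfs2_holder gh;
            ssize_t ret;

            gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
            ret = gfs2_glock_nq(&gh);
            if (ret)
                    goto out_uninit;

            /* From here on, we can cope if the glock is taken away. */
            gfs2_holder_allow_demote(&gh);

            ret = fault_in_user_pages(to);  /* may take other glocks */

            /* We actually need the glock again now. */
            gfs2_holder_disallow_demote(&gh);

            if (!gfs2_holder_queued(&gh)) {
                    /*
                     * A conflicting request demoted the glock under
                     * us: buffered I/O returns a short result here;
                     * direct I/O would re-acquire the glock and
                     * resume instead.
                     */
                    goto out_uninit;
            }

            if (ret >= 0)
                    ret = do_the_read(ip, to);
            gfs2_glock_dq(&gh);
    out_uninit:
            gfs2_holder_uninit(&gh);
            return ret;
    }
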
>
> So before we fault in pages, we indicate with
> gfs2_holder_allow_demote(gh) that we can cope if the glock is taken
> away from us. After faulting in the pages, we indicate with
> gfs2_holder_disallow_demote(gh) that we now actually need the glock
> again. At that point, we either still have the glock (i.e., the
> holder is still queued and it has the HIF_HOLDER flag set), or we
> don't.
>
> The different kinds of read and write operations differ in how they
> handle the latter case:
>
> * When a buffered read or write loses the inode glock, it returns a
>   short result. This prevents torn writes and reading things that
>   have never existed on disk in that form.
>
> * When a direct read or write loses the inode glock, it re-acquires
>   it before resuming the operation. Direct I/O is not expected to
>   return partial results and doesn't provide any kind of
>   synchronization among processes.
>
> We could solve this kind of problem in other ways, for example, by
> keeping a glock generation number, dropping the glock before
> faulting in pages, re-acquiring it afterwards, and checking if the
> generation number has changed. This would still be an additional
> piece of glock infrastructure, but more heavyweight than the
> HIF_MAY_DEMOTE flag, which uses the existing glock holder
> infrastructure.

This is working towards the "why" but could probably be summarised a
bit more. We always used to manage to avoid holding fs locks when
copying to/from userspace to avoid these complications. If that is no
longer possible then it would be good to document what the new
expectations are somewhere suitable in Documentation/filesystems/...
just so we make sure it is clear what the new system is, and everyone
will be on the same page,

Steve.
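
P.S. For comparison, the generation-number alternative mentioned
above might look roughly like this. Purely hypothetical: struct
gfs2_glock has no gl_generation field, and fault_in_user_pages() and
resume_operation() are again stand-ins:

    static ssize_t read_with_generation_check(struct gfs2_inode *ip,
                                              struct gfs2_holder *gh,
                                              struct iov_iter *to)
    {
            u64 gen = READ_ONCE(ip->i_gl->gl_generation);   /* hypothetical */
            ssize_t ret;
            int err;

            gfs2_glock_dq(gh);              /* drop before faulting */
            ret = fault_in_user_pages(to);
            err = gfs2_glock_nq(gh);        /* re-acquire */
            if (ret < 0)
                    return ret;
            if (err)
                    return err;

            if (READ_ONCE(ip->i_gl->gl_generation) != gen) {
                    /*
                     * Someone else held the glock in between: return
                     * a short result (buffered) or restart (direct),
                     * as with HIF_MAY_DEMOTE.
                     */
                    return 0;
            }
            return resume_operation(ip, to);
    }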