From: Andreas Gruenbacher
Date: Fri, 2 Jul 2021 02:20:20 +0200
Subject: Re: [PATCH] gfs2: Fix mmap + page fault deadlocks
To: Linus Torvalds
Cc: Alexander Viro, cluster-devel, Linux Kernel Mailing List, Jan Kara, Matthew Wilcox
References: <20210701204246.2037142-1-agruenba@redhat.com>

On Thu, Jul 1, 2021 at 11:41 PM Linus Torvalds wrote:
> On Thu, Jul 1, 2021 at 1:43 PM Andreas Gruenbacher wrote:
> > here's another attempt at fixing the mmap + page fault deadlocks we're
> > seeing on gfs2. Still not ideal because get_user_pages_fast ignores the
> > current->pagefault_disabled flag
>
> Of course get_user_pages_fast() ignores the pagefault_disabled flag,
> because it doesn't do any page faults.
>
> If you don't want to fall back to the "maybe do IO" case, you should
> use the FOLL_FAST_ONLY flag - or get_user_pages_fast_only(), which
> does that itself.
>
> > For getting get_user_pages_fast changed to fix this properly, I'd need
> > help from the memory management folks.
>
> I really don't think you need anything at all from the mm people,
> because we already support that whole "fast only" case.

Yes, fair enough.

> Also, I have to say that I think the direct-IO code is fundamentally
> mis-designed. Why it is doing the page lookup _during_ the IO is a
> complete mystery to me. Why wasn't that done ahead of time before the
> filesystem took the locks it needed?

That would be inconvenient for reads, when the number of bytes read is
much smaller than the buffer size and we won't need to page in the
entire buffer.

> So what the direct-IO code _should_ do is to turn an ITER_IOVEC into a
> ITER_KVEC by doing the page lookup ahead of time, and none of these
> issues should even exist, and then the whole pagefault_disabled and/or
> FOLL_FAST_ONLY would be a complete non-issue.
>
> Is there any reason why that isn't what it does (other than historical baggage)?

It turns out that there's an even deeper issue with keeping references
to user-space pages. Those references will essentially pin the glock of
the associated inode to the node. Moving a glock off a node requires
truncating the inode's page cache, but the page references would prevent
that. So we'd only end up with different kinds of potential deadlocks.

If we could get iomap_dio_rw to use "fast only" mode when requested, we
could fault in the pages without keeping references, try the IO, and
repeat when necessary.

Thanks a lot,
Andreas
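
Editorial illustration: the retry scheme proposed in the last paragraph
of the message (fault in the pages without keeping references, try the
IO in "fast only" mode, repeat when necessary) can be sketched as a
small user-space simulation. This is only a sketch under stated
assumptions, not kernel code and not how iomap_dio_rw behaves: every
sim_* name, the struct, and the evict_once hook that stands in for
memory reclaim are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_MAX_PAGES 16u

/* Hypothetical stand-in for a user buffer: tracks which pages are resident. */
struct sim_user_buf {
    bool resident[SIM_MAX_PAGES];
    size_t len;      /* buffer length in bytes */
    int evict_once;  /* page to evict once after a fault-in, or -1 for none */
};

/* Hypothetical fault-in step: makes pages resident but keeps no references. */
static void sim_fault_in(struct sim_user_buf *buf, size_t off, size_t len)
{
    for (size_t p = off / SIM_PAGE_SIZE; p * SIM_PAGE_SIZE < off + len; p++)
        buf->resident[p] = true;
}

/*
 * Hypothetical "fast only" IO attempt. It never faults: it transfers up to
 * the first non-resident page and returns the number of bytes transferred,
 * or -EFAULT if not even the first page was resident.
 */
static long sim_dio_fast_only(const struct sim_user_buf *buf, size_t off,
                              size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t pos = off + done;
        size_t chunk = SIM_PAGE_SIZE - pos % SIM_PAGE_SIZE;

        if (!buf->resident[pos / SIM_PAGE_SIZE])
            break;
        if (chunk > len - done)
            chunk = len - done;
        done += chunk;
    }
    return done ? (long)done : -EFAULT;
}

/* Retry loop: fault in without holding the lock, try the IO, repeat. */
static long sim_dio_rw(struct sim_user_buf *buf, size_t len,
                       unsigned *attempts)
{
    size_t done = 0;

    assert(len <= buf->len && buf->len <= SIM_MAX_PAGES * SIM_PAGE_SIZE);
    *attempts = 0;
    while (done < len) {
        long ret;

        /* Fault in the remaining range; no page references are kept. */
        sim_fault_in(buf, done, len - done);

        /* Simulated reclaim: a page may vanish again before the IO runs. */
        if (buf->evict_once >= 0) {
            buf->resident[buf->evict_once] = false;
            buf->evict_once = -1;
        }

        /* A real filesystem would take its lock here and drop it after. */
        ret = sim_dio_fast_only(buf, done, len - done);
        (*attempts)++;

        if (ret == -EFAULT)
            continue;       /* nothing transferred: fault in and retry */
        if (ret < 0)
            return ret;     /* any other error is final */
        done += (size_t)ret; /* partial progress: retry only the rest */
    }
    return (long)done;
}
```

The point of the design is that a page evicted between the fault-in and
the IO attempt only costs one more pass through the loop for the
remaining range, and that no page reference is held while the simulated
lock is not taken. The sketch has no bound on the number of retries.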