From: Andreas Gruenbacher
Date: Thu, 21 Oct 2021 16:42:33 +0200
Subject: Re: [RFC][arm64] possible infinite loop in btrfs search_ioctl()
To: Catalin Marinas
Cc: Linus Torvalds, Al Viro, Christoph Hellwig, "Darrick J. Wong",
    Jan Kara, Matthew Wilcox, cluster-devel, linux-fsdevel,
    Linux Kernel Mailing List, "ocfs2-devel@oss.oracle.com",
    Josef Bacik, Will Deacon

On Thu, Oct 21, 2021 at 12:06 PM Catalin Marinas wrote:
> On Thu, Oct 21, 2021 at 02:46:10AM +0200, Andreas Gruenbacher wrote:
> > On Tue, Oct 12, 2021 at 1:59 AM Linus Torvalds wrote:
> > > On Mon, Oct 11, 2021 at 2:08 PM Catalin Marinas wrote:
> > > >
> > > > +#ifdef CONFIG_ARM64_MTE
> > > > +#define FAULT_GRANULE_SIZE	(16)
> > > > +#define FAULT_GRANULE_MASK	(~(FAULT_GRANULE_SIZE-1))
> > >
> > > [...]
> > >
> > > > If this looks in the right direction, I'll do some proper patches
> > > > tomorrow.
> > >
> > > Looks fine to me. It's going to be quite expensive and bad for
> > > caches, though.
> > >
> > > That said, fault_in_writable() is _supposed_ to all be for the slow
> > > path when things go south and the normal path didn't work out, so I
> > > think it's fine.
> >
> > Let me get back to this; I'm actually not convinced that we need to
> > worry about sub-page-size fault granules in fault_in_pages_readable
> > or fault_in_pages_writeable.
> >
> > From a filesystem point of view, we can get into trouble when a
> > user-space read or write triggers a page fault while we're holding
> > filesystem locks, and that page fault ends up calling back into the
> > filesystem. To deal with that, we're performing those user-space
> > accesses with page faults disabled.
>
> Yes, this makes sense.
>
> > When a page fault would occur, we get back an error instead, and
> > then we try to fault in the offending pages. If a page is resident
> > and we still get a fault trying to access it, trying to fault in the
> > same page again isn't going to help and we have a true error.
>
> You can't be sure the second fault is a true error. The unlocked
> fault_in_*() may race with some LRU scheme making the pte not
> accessible or a write-back making it clean/read-only. copy_to_user()
> with pagefault_disabled() fails again but that's a benign fault. The
> filesystem should re-attempt the fault-in (gup would correct the pte),
> disable page faults and copy_to_user(), potentially in an infinite
> loop. If you bail out on the second/third uaccess following a
> fault_in_*() call, you may get some unexpected errors (though very
> rare). Maybe the filesystems avoid this problem somehow but I couldn't
> figure it out.

Good point, we can indeed only bail out if both the user copy and the
fault-in fail.
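
To make that concrete, the retry pattern then has roughly the shape
below. This is only a sketch, not code from any particular filesystem:
the function and its bookkeeping are made up, the filesystem locking
that motivates disabling page faults is left out, and
fault_in_writeable() is assumed to return the number of bytes it could
not fault in.

#include <linux/pagemap.h>
#include <linux/uaccess.h>

static ssize_t copy_out_with_retry(char __user *ubuf, const char *kbuf,
				   size_t count)
{
	size_t copied = 0;

	while (copied < count) {
		size_t left;

		/* Copy with page faults disabled, as we do under fs locks. */
		pagefault_disable();
		left = copy_to_user(ubuf + copied, kbuf + copied,
				    count - copied);
		pagefault_enable();

		copied += count - copied - left;
		if (!left)
			break;

		/*
		 * The copy came up short. Only bail out if the fault-in
		 * fails as well; a short copy right after a successful
		 * fault-in can be a benign race (reclaim, writeback) and
		 * simply needs to be retried.
		 */
		if (fault_in_writeable(ubuf + copied, count - copied) ==
		    count - copied)
			return copied ? copied : -EFAULT;
	}
	return copied;
}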
But probing the entire memory range at sub-page fault-granule
granularity in the page fault-in functions still doesn't actually make
sense. Those functions really only need to guarantee that we'll be able
to make progress eventually. From that point of view, it should be
enough to probe the first byte of the requested memory range: when one
of those functions reports that the next N bytes should be accessible,
this really means that the first byte surely isn't permanently
inaccessible and that the rest is likely accessible. The
fault_in_readable and fault_in_writeable functions already work that
way, so this only leaves fault_in_safe_writeable to worry about.

> > We're clearly looking at memory at a page granularity; faults at a
> > sub-page level don't matter at this level of abstraction (but they
> > do show similar error behavior). To avoid getting stuck, when it
> > gets a short result or -EFAULT, the filesystem implements the
> > following backoff strategy: first, it tries to fault in a number of
> > pages. When the read or write still doesn't make progress, it scales
> > back and faults in a single page. Finally, when that still doesn't
> > help, it gives up. This strategy is needed for actual page faults,
> > but it also handles sub-page faults appropriately as long as the
> > user-space access functions give sensible results.
>
> As I said above, I think with this approach there's a small chance of
> incorrectly reporting an error when the fault is recoverable. If you
> change it to an infinite loop, you'd run into the sub-page fault
> problem.

Yes, I see now, thanks.

> There are some places with such infinite loops: futex_wake_op(),
> search_ioctl() in the btrfs code. I still have to get my head around
> generic_perform_write() but I think we get away here because it faults
> in the page with a get_user() rather than gup (and copy_from_user() is
> guaranteed to make progress if any bytes can still be accessed).

Thanks,
Andreas
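
P.S. In case it helps to see the search_ioctl() failure mode spelled
out: the loop in question has roughly the shape below. This is heavily
simplified and not the actual btrfs code (the function name and
arguments are made up). The point is just that the old
fault_in_pages_writeable() probes one byte per page, so with MTE it can
keep succeeding while a tag mismatch further into the same page makes
the nofault copy fail every time, and the loop never terminates.

#include <linux/pagemap.h>
#include <linux/uaccess.h>

static int copy_result_to_user(char __user *ubuf, const char *kbuf,
			       size_t size)
{
	for (;;) {
		/* Probes one byte per page; can succeed despite a tag fault. */
		if (fault_in_pages_writeable(ubuf, size))
			return -EFAULT;		/* genuine fault */
		/* Fails on a sub-page tag fault within the same page. */
		if (!copy_to_user_nofault(ubuf, kbuf, size))
			return 0;		/* copy succeeded */
		/* Copy failed although fault-in succeeded: try again. */
	}
}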