Date: Fri, 17 May 2024 14:30:48 -0300
From: Jason Gunthorpe
To: Haakon Bugge
Cc: OFED mailing list, open list, netdev, "rds-devel@oss.oracle.com",
 Leon Romanovsky, Saeed Mahameed, Tariq Toukan, "David S.
 Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Tejun Heo,
 Lai Jiangshan, Allison Henderson, Manjunath Patil, Mark Zhang,
 Chuck Lever III, Shiraz Saleem, Yang Li
Subject: Re: [PATCH 0/6] rds: rdma: Add ability to force GFP_NOIO
Message-ID: <20240517173048.GA69273@ziepe.ca>
References: <20240513125346.764076-1-haakon.bugge@oracle.com>
 <72BE64EC-3CB8-469C-85CB-F97671C0E867@oracle.com>
In-Reply-To: <72BE64EC-3CB8-469C-85CB-F97671C0E867@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 14, 2024 at 06:19:53PM +0000, Haakon Bugge wrote:
> Hi Jason,
>
>
> > On 14 May 2024, at 01:03, Jason Gunthorpe wrote:
> >
> > On Mon, May 13, 2024 at 02:53:40PM +0200, Håkon Bugge wrote:
> >> This series enables RDS and the RDMA stack to be used as a block I/O
> >> device. This to support a filesystem on top of a raw block device
> >> which uses RDS and the RDMA stack as the network transport layer.
> >>
> >> Under intense memory pressure, we get memory reclaims. Assume the
> >> filesystem reclaims memory, goes to the raw block device, which calls
> >> into RDS, which calls the RDMA stack. Now, if regular GFP_KERNEL
> >> allocations in RDS or the RDMA stack require reclaims to be fulfilled,
> >> we end up in a circular dependency.
> >>
> >> We break this circular dependency by:
> >>
> >> 1. Force all allocations in RDS and the relevant RDMA stack to use
> >>    GFP_NOIO, by means of a parenthetic use of
> >>    memalloc_noio_{save,restore} on all relevant entry points.
> >
> > I didn't see an obvious explanation why each of these changes was
> > necessary. I expected this:
> >
> >> 2. Make sure work-queues inherits current->flags
> >>    wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
> >>    work-queue inherits the same flag(s).
>
> When the modules initialize, it does not help to have 2., unless
> PF_MEMALLOC_NOIO is set in current->flags. That is most probably not
> set, e.g. considering modprobe. That is why we have these steps in
> all the five modules. During module initialization, work queues are
> allocated in all mentioned modules. Therefore, the module
> initialization functions need the paranthetic use of
> memalloc_noio_{save,restore}.

And why would I need these work queues to have noio? They are never
called under a filesystem.

You need to explain in every single case how something in a NOIO
context becomes entangled with the unrelated thing you are tagging NOIO.

Historically when we've tried to do this we gave up because the entire
subsystem ends up being NOIO.

> > And further, is there any validation of this? There is some lockdep
> > tracking of reclaim, I feel like it should be more robustly hooked up
> > in RDMA if we expect this to really work..
>
> Oracle is about to launch a product using this series, so the
> techniques used have been thoroughly validated, allthough on an
> older kernel version.

That doesn't really help keep it working. I want to see some kind of
lockdep scheme to enforce this that can validate without ever
triggering reclaim.

Jason
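
For readers following the thread, here is a rough userspace sketch of the
scoped ("parenthetic") save/restore pattern under discussion. It is only a
model: every model_* name and MODEL_* constant is hypothetical, the GFP bits
are simplified, and none of this is the kernel's actual implementation of
memalloc_noio_{save,restore}. It shows that the flag nests correctly and that
"could this allocation recurse into I/O?" can be answered as a pure check on
the flags, without any reclaim taking place.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's task flag and GFP bits. */
#define MODEL_PF_MEMALLOC_NOIO 0x1u
#define MODEL_GFP_IO           0x1u
#define MODEL_GFP_FS           0x2u
#define MODEL_GFP_KERNEL       (MODEL_GFP_IO | MODEL_GFP_FS)

/* Models current->flags for a single task (assumption: one thread). */
static unsigned int model_task_flags;

/* Enter a NOIO scope; returns the previous state of the NOIO bit. */
static unsigned int model_noio_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

    model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
    return old;
}

/* Leave a NOIO scope by putting back whatever the matching save returned. */
static void model_noio_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Inside a NOIO scope, the I/O and FS bits are masked off the request. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
    if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
        gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
    return gfp;
}

/* Pure check: may this allocation start I/O? Nothing is allocated. */
static bool model_may_recurse_into_io(unsigned int gfp)
{
    return (model_effective_gfp(gfp) & MODEL_GFP_IO) != 0;
}
```

An entry point would call model_noio_save() on the way in and
model_noio_restore() on the way out. Because restore only puts back the saved
bit, an inner scope cannot clear a flag that an outer scope set.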