From: Håkon Bugge
To: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, rds-devel@oss.oracle.com
Cc: Jason Gunthorpe, Leon Romanovsky, Saeed Mahameed, Tariq Toukan, "David S .
 Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Tejun Heo, Lai Jiangshan, Allison Henderson, Manjunath Patil, Mark Zhang, Håkon Bugge, Chuck Lever, Shiraz Saleem, Yang Li
Subject: [PATCH v3 6/6] workqueue: Inherit per-process allocation flags
Date: Wed, 22 May 2024 15:54:44 +0200
Message-Id: <20240522135444.1685642-13-haakon.bugge@oracle.com>
X-Mailer: git-send-email 2.39.3
In-Reply-To: <20240522135444.1685642-1-haakon.bugge@oracle.com>
References: <20240522135444.1685642-1-haakon.bugge@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For drivers/modules running inside a memalloc_flags_{save,restore}
region, if a work-queue is created, we make sure that work executed on
the work-queue inherits the same flag(s).

This is done in order to conditionally enable drivers to work aligned
with block I/O devices. This commit makes sure that any work queued
later on work-queues created during module initialization, when
current's flags have any of the PF_MEMALLOC* bits set, will inherit
the same flags.

We do this in order to enable drivers to be used as a network block
I/O device. This, in turn, is needed to support XFS or other
file-systems on top of a raw block device which uses said drivers as
the network transport layer.

Under intense memory pressure, we get memory reclaims. Assume the
file-system reclaims memory, goes to the raw block device, which calls
into said drivers. Now, if regular GFP_KERNEL allocations in the
drivers require reclaims to be fulfilled, we end up in a circular
dependency.

We break this circular dependency by:

1. Forcing all allocations in the drivers to use GFP_NOIO, by means of
   a parenthetic use of memalloc_flags_{save,restore} on all relevant
   entry points, setting/clearing the PF_MEMALLOC_NOIO bit.

2. Making sure work-queues inherit current->flags
   wrt. PF_MEMALLOC_NOIO, such that work executed on the work-queue
   inherits the same flag(s). That is what this commit contributes.

Signed-off-by: Håkon Bugge
---

v2 -> v3:
   * Add support for all PF_MEMALLOC* flags
   * Re-worded commit message

v1 -> v2:
   * Added missing hunk in alloc_workqueue()
---
 include/linux/workqueue.h |  9 ++++++
 kernel/workqueue.c        | 60 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index fb39938945365..f8c87f824272b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -406,9 +406,18 @@ enum wq_flags {
 	__WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
+	__WQ_MEMALLOC = 1 << 19, /* internal: execute work with MEMALLOC */
+	__WQ_MEMALLOC_NOFS = 1 << 20, /* internal: execute work with MEMALLOC_NOFS */
+	__WQ_MEMALLOC_NOIO = 1 << 21, /* internal: execute work with MEMALLOC_NOIO */
+	__WQ_MEMALLOC_NORECLAIM = 1 << 22, /* internal: execute work with MEMALLOC_NORECLAIM */
+	__WQ_MEMALLOC_NOWARN = 1 << 23, /* internal: execute work with MEMALLOC_NOWARN */
+	__WQ_MEMALLOC_PIN = 1 << 24, /* internal: execute work with MEMALLOC_PIN */
 
 	/* BH wq only allows the following flags */
 	__WQ_BH_ALLOWS = WQ_BH | WQ_HIGHPRI,
+
+	__WQ_PF_MEMALLOC_MASK = PF_MEMALLOC | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOIO |
+				PF_MEMALLOC_NORECLAIM | PF_MEMALLOC_NOWARN | PF_MEMALLOC_PIN,
 };
 
 enum wq_consts {
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 003474c9a77d0..28ed6b9556e91 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -51,6 +51,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3113,6 +3114,28 @@ static bool manage_workers(struct worker *worker)
 	return true;
 }
 
+static unsigned int wq_build_memalloc_flags(struct pool_workqueue *pwq)
+{
+	unsigned int pf_flags = 0;
+
+#define BUILD_PF_FLAGS_FROM_WQ(name) \
+	do { \
+		if (pwq->wq->flags & __WQ_ ## name) \
+			pf_flags |= PF_ ## name; \
+	} while (0)
+
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC);
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC_NOFS);
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC_NOIO);
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC_NORECLAIM);
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC_NOWARN);
+	BUILD_PF_FLAGS_FROM_WQ(MEMALLOC_PIN);
+
+#undef BUILD_PF_FLAGS_FROM_WQ
+
+	return pf_flags;
+}
+
 /**
  * process_one_work - process single work
  * @worker: self
@@ -3136,6 +3159,8 @@ __acquires(&pool->lock)
 	unsigned long work_data;
 	int lockdep_start_depth, rcu_start_depth;
 	bool bh_draining = pool->flags & POOL_BH_DRAINING;
+	unsigned int memalloc_flags = wq_build_memalloc_flags(pwq);
+	unsigned int memalloc_flags_old;
 #ifdef CONFIG_LOCKDEP
 	/*
 	 * It is permissible to free the struct work_struct from
@@ -3148,6 +3173,10 @@ __acquires(&pool->lock)
 
 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
 #endif
+	/* Set inherited alloc flags */
+	if (memalloc_flags)
+		memalloc_flags_old = memalloc_flags_save(memalloc_flags);
+
 	/* ensure we're on the correct CPU */
 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
 		     raw_smp_processor_id() != pool->cpu);
@@ -3284,6 +3313,10 @@ __acquires(&pool->lock)
 
 	/* must be the last step, see the function comment */
 	pwq_dec_nr_in_flight(pwq, work_data);
+
+	/* Restore alloc flags */
+	if (memalloc_flags)
+		memalloc_flags_restore(memalloc_flags_old);
 }
 
 /**
@@ -5637,6 +5670,30 @@ static void wq_adjust_max_active(struct workqueue_struct *wq)
 	} while (activated);
 }
 
+/**
+ * wq_set_memalloc_flags - Test current->flags for PF_MEMALLOC_FOO_BAR
+ * flag bits and set the corresponding __WQ_MEMALLOC_FOO_BAR in the
+ * WQ's flags variable.
+ * @flags_ptr: Pointer to wq->flags
+ */
+static void wq_set_memalloc_flags(unsigned int *flags_ptr)
+{
+#define TEST_PF_SET_WQ(name) \
+	do { \
+		if (current->flags & PF_ ## name) \
+			*flags_ptr |= __WQ_ ## name; \
+	} while (0)
+
+	TEST_PF_SET_WQ(MEMALLOC);
+	TEST_PF_SET_WQ(MEMALLOC_NOFS);
+	TEST_PF_SET_WQ(MEMALLOC_NOIO);
+	TEST_PF_SET_WQ(MEMALLOC_NORECLAIM);
+	TEST_PF_SET_WQ(MEMALLOC_NOWARN);
+	TEST_PF_SET_WQ(MEMALLOC_PIN);
+
+#undef TEST_PF_SET_WQ
+}
+
 __printf(1, 4)
 struct workqueue_struct *alloc_workqueue(const char *fmt,
 					 unsigned int flags,
@@ -5695,6 +5752,9 @@ struct workqueue_struct *alloc_workqueue(const char *fmt,
 
 	/* init wq */
 	wq->flags = flags;
+	if (current->flags & __WQ_PF_MEMALLOC_MASK)
+		wq_set_memalloc_flags(&wq->flags);
+
 	wq->max_active = max_active;
 	wq->min_active = min(max_active, WQ_DFL_MIN_ACTIVE);
 	wq->saved_max_active = wq->max_active;
-- 
2.31.1
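
The two halves of the mechanism described in the commit message
(recording the creator's PF_MEMALLOC* bits when the work-queue is
allocated, then applying them around each work item) can be sketched
in plain userspace C. This is only an illustration under stated
assumptions: every DEMO_/demo_ name and every bit value below is
hypothetical, only two of the six flags are modelled, and the
save/restore helpers merely imitate the way memalloc_flags_save() and
memalloc_flags_restore() are used in the patch; they are not the
kernel implementations.

```c
/* Userspace sketch; all names and bit values here are hypothetical. */

/* Stand-ins for two per-task PF_MEMALLOC* bits. */
#define DEMO_PF_NOIO (1u << 0)
#define DEMO_PF_NOFS (1u << 1)

/* Stand-ins for the matching internal work-queue bits. */
#define DEMO_WQ_NOIO (1u << 8)
#define DEMO_WQ_NOFS (1u << 9)

struct demo_task { unsigned int flags; };
struct demo_wq   { unsigned int flags; };

/* Set @flags on the task; return only the bits that were newly set. */
static unsigned int demo_flags_save(struct demo_task *t, unsigned int flags)
{
	unsigned int newly_set = ~t->flags & flags;

	t->flags |= flags;
	return newly_set;
}

/* Clear the bits that the matching demo_flags_save() call had set. */
static void demo_flags_restore(struct demo_task *t, unsigned int newly_set)
{
	t->flags &= ~newly_set;
}

/* Creation time: translate the creator's task bits into work-queue bits. */
static void demo_wq_inherit(struct demo_wq *wq, const struct demo_task *creator)
{
	if (creator->flags & DEMO_PF_NOIO)
		wq->flags |= DEMO_WQ_NOIO;
	if (creator->flags & DEMO_PF_NOFS)
		wq->flags |= DEMO_WQ_NOFS;
}

/* Execution time: translate the work-queue bits back into task bits. */
static unsigned int demo_wq_to_pf(const struct demo_wq *wq)
{
	unsigned int pf = 0;

	if (wq->flags & DEMO_WQ_NOIO)
		pf |= DEMO_PF_NOIO;
	if (wq->flags & DEMO_WQ_NOFS)
		pf |= DEMO_PF_NOFS;
	return pf;
}

/*
 * Run one work item on @worker and return the task flags the work
 * function would have observed while it ran.
 */
static unsigned int demo_run_work(const struct demo_wq *wq,
				  struct demo_task *worker)
{
	unsigned int pf = demo_wq_to_pf(wq);
	unsigned int saved = 0;
	unsigned int seen;

	if (pf)
		saved = demo_flags_save(worker, pf);

	seen = worker->flags;	/* the work function would execute here */

	if (pf)
		demo_flags_restore(worker, saved);

	return seen;
}
```

A caller would set DEMO_PF_NOIO on the creating task with
demo_flags_save(), call demo_wq_inherit(), and drop the bit again with
demo_flags_restore(); a later demo_run_work() on an unrelated worker
task then observes DEMO_PF_NOIO for the duration of the work item
only. Keeping separate work-queue bits instead of storing the task
bits directly mirrors the patch, where the __WQ_* values live in the
wq_flags enum.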