Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp1402572ybf; Thu, 27 Feb 2020 10:16:09 -0800 (PST) X-Google-Smtp-Source: APXvYqxMtF78jr8fLH6aOplTn2djDcxh23xtoMhRm4egfDcpSIaaKwJhjm8OHyNFnIt0RQrWiBfP X-Received: by 2002:aca:2803:: with SMTP id 3mr239258oix.162.1582827369418; Thu, 27 Feb 2020 10:16:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582827369; cv=none; d=google.com; s=arc-20160816; b=U0w23iTU0M+iY3J2cyMR+rCaYxZDnYsUrKCebpqFoner2PlrtVtDWW9IDPSU2IoFLn RNqWGLVzSJti9ncTo0bd8MT8rnM2aXld46L2hN5rO3px/z2nZ58/W6mgxYNoyRhf1jl5 +p5Bb9aLJkGWtXBPd/Euu/TOedASMutyAB12g7TQtakvvsofBBoPenvb2U8JRN/Yrin9 6OrZBorQvPpdqY5zcvvhYMKHxQjuEchFuzG204gc50yxcC8tnwoMxf8FeiJWKgWOB8U4 N6j65JSKaBLn+8wr/ZAt58MvajdNEX7grSupw7NQy9kFbmooGg+i/h3/UaxWlwMTSwql c9DA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=19sQdZH67v9yOy/BduEXSg0i7G3USqaAUKmseX8opy0=; b=boR4Yq7End5gortTYvKgArtjzQ0TV1EhKCh5nQvF+OIlovFPYwSNi7O05CRheps+TU 1RmdNI1Dea3jJ7ImYsTDKusXRlAEdEaA1VoU00E75W1pYVF+GRD4JKfwP+LA5GV7I2EZ XEogHopT2VxAK3C4/v9Y3+C0T0WfAcYmJCNFh+E0qBw565gI0NUWdsLhLIGlyBmgsUMV 9F7wBKyQxdNw+hu5SHQMv34stR/sFNrjZuxviIIMZ4FqOTcL7r4DwAyB/HuGB92Sc9Zv 143na7y+RikIO2cG2ed+//l8+fMilTIAP2KwNuC4xxUOFPx4G5q9kAiNL/XIxaoRtvNd 3MKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=0IG8plhA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p83si294749oih.198.2020.02.27.10.15.57; Thu, 27 Feb 2020 10:16:09 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=0IG8plhA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729610AbgB0SOf (ORCPT + 99 others); Thu, 27 Feb 2020 13:14:35 -0500 Received: from mail-qk1-f193.google.com ([209.85.222.193]:39603 "EHLO mail-qk1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729317AbgB0SOe (ORCPT ); Thu, 27 Feb 2020 13:14:34 -0500 Received: by mail-qk1-f193.google.com with SMTP id e16so283070qkl.6 for ; Thu, 27 Feb 2020 10:14:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=19sQdZH67v9yOy/BduEXSg0i7G3USqaAUKmseX8opy0=; b=0IG8plhA6YTGx0j6rHUrt9WyouEz5FGMjybnDuDbUR7axVZhu+rgZlDa76gtdh8YwP /BhvP4PrfAUfKbf4wWhSFJOqBrI7JQ+9WpS63f7uxvjPet4NPXoB3dmwuNdmFwjnk4FX 1jVKXf8c2T1gFBZuOnwIqt51u2dUPq1cgjvR7zFm0WU9lL6ANWW6XIFJIHX6vK0OhLS1 J7DzHM0N4FEAk+lKkdDVdPBUOvPR0KzwO8QnjgEOy/J9Ojgvd04vxXAf02MIPcEGOpxb EivvMkXIRJkPfKGgDKl8kcG3bTQe25P1HIgY1nQs6rbBZBfIFa/9YWSNslzmqZbSVSLN u4lw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=19sQdZH67v9yOy/BduEXSg0i7G3USqaAUKmseX8opy0=; b=AH7kbz/q7PTyGYMOXaeBTL9qGcVi25426kC7hvNjOU6PWZm08RE9ysCiYPaytIJrb+ /iU4gGiq1d0R5risGGaLWqtfuTRAJ3j5IwVKimHaHklcLCglWA83aKGkREdpM5sakcSR HrR3OVXe3S5Zknx64onTSTyYayWb5hIAVtIccGxiwjsrYRlwvjeRwRoETmdkPFbZMdiz 9fw41MSGEBm6OIF3LqB4XZqoNjPT2CGdo83MJnMtNIIz5QQAbUPDbTY8ZsrnIHXtfZh5 inuilWum60QKfMgQg1YxPSHmzxGC6HRrgih+J4RAwTIYJZ0Qcue9qQQLJ9SGrhgkD9cc TRIw== X-Gm-Message-State: APjAAAVkKVNs3uZ9CiceTSBmarEVxr58ahCVf0VW/03goCT7Wj7tySVy ZJCoIiQDCHF1wOQgTWNfsQezPQ== X-Received: by 2002:a05:620a:10ac:: with SMTP id h12mr512814qkk.487.1582827272500; Thu, 27 Feb 2020 10:14:32 -0800 (PST) Received: from localhost ([2620:10d:c091:500::3:2450]) by smtp.gmail.com with ESMTPSA id t29sm3712868qtt.20.2020.02.27.10.14.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Feb 2020 10:14:31 -0800 (PST) Date: Thu, 27 Feb 2020 13:14:30 -0500 From: Johannes Weiner To: Qian Cai Cc: Dan Schatzberg , Jens Axboe , Tejun Heo , Li Zefan , Michal Hocko , Vladimir Davydov , Andrew Morton , Hugh Dickins , Roman Gushchin , Shakeel Butt , Chris Down , Yang Shi , Thomas Gleixner , "open list:BLOCK LAYER" , open list , "open list:CONTROL GROUP (CGROUP)" , "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" Subject: Re: [PATCH v3 1/3] loop: Use worker per cgroup instead of kworker Message-ID: <20200227181430.GA44024@cmpxchg.org> References: <1582736558.7365.131.camel@lca.pw> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1582736558.7365.131.camel@lca.pw> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 26, 2020 at 12:02:38PM -0500, Qian Cai wrote: > On Mon, 2020-02-24 at 17:17 -0500, Dan Schatzberg wrote: > > Existing uses of loop device may have multiple cgroups reading/writing > > to the same device. Simply charging resources for I/O to the backing > > file could result in priority inversion where one cgroup gets > > synchronously blocked, holding up all other I/O to the loop device. > > > > In order to avoid this priority inversion, we use a single workqueue > > where each work item is a "struct loop_worker" which contains a queue of > > struct loop_cmds to issue. The loop device maintains a tree mapping blk > > css_id -> loop_worker. This allows each cgroup to independently make > > forward progress issuing I/O to the backing file. > > > > There is also a single queue for I/O associated with the rootcg which > > can be used in cases of extreme memory shortage where we cannot allocate > > a loop_worker. > > > > The locking for the tree and queues is fairly heavy handed - we acquire > > the per-loop-device spinlock any time either is accessed. The existing > > implementation serializes all I/O through a single thread anyways, so I > > don't believe this is any worse. > > > > Signed-off-by: Dan Schatzberg > > Acked-by: Johannes Weiner > > The locking in loop_free_idle_workers() will trigger this with sysfs reading, > > [ 7080.047167] LTP: starting read_all_sys (read_all -d /sys -q -r 10) > [ 7239.842276] cpufreq transition table exceeds PAGE_SIZE. Disabling > > [ 7247.054961] ===================================================== > [ 7247.054971] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected > [ 7247.054983] 5.6.0-rc3-next-20200226 #2 Tainted: G???????????O????? > [ 7247.054992] ----------------------------------------------------- > [ 7247.055002] read_all/8513 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 7247.055014] c0000006844864c8 (&fs->seq){+.+.}, at: file_path+0x24/0x40 > [ 7247.055041]? > ???????????????and this task is already holding: > [ 7247.055061] c0002006bab8b928 (&(&lo->lo_lock)->rlock){..-.}, at: > loop_attr_do_show_backing_file+0x3c/0x120 [loop] > [ 7247.055078] which would create a new lock dependency: > [ 7247.055105]??(&(&lo->lo_lock)->rlock){..-.} -> (&fs->seq){+.+.} > [ 7247.055125]? > ???????????????but this new dependency connects a SOFTIRQ-irq-safe lock: > [ 7247.055155]??(&(&lo->lo_lock)->rlock){..-.} > [ 7247.055156]? > ???????????????... which became SOFTIRQ-irq-safe at: > [ 7247.055196]???lock_acquire+0x130/0x360 > [ 7247.055221]???_raw_spin_lock_irq+0x68/0x90 > [ 7247.055230]???loop_free_idle_workers+0x44/0x3f0 [loop] > [ 7247.055242]???call_timer_fn+0x110/0x5f0 > [ 7247.055260]???run_timer_softirq+0x8f8/0x9f0 > [ 7247.055278]???__do_softirq+0x34c/0x8c8 > [ 7247.055288]???irq_exit+0x16c/0x1d0 > [ 7247.055298]???timer_interrupt+0x1f0/0x680 > [ 7247.055308]???decrementer_common+0x124/0x130 > [ 7247.055328]???arch_local_irq_restore.part.8+0x34/0x90 > [ 7247.055352]???cpuidle_enter_state+0x11c/0x8f0 > [ 7247.055361]???cpuidle_enter+0x50/0x70 > [ 7247.055389]???call_cpuidle+0x4c/0x90 > [ 7247.055398]???do_idle+0x378/0x470 > [ 7247.055414]???cpu_startup_entry+0x3c/0x40 > [ 7247.055442]???start_secondary+0x7a8/0xa80 > [ 7247.055461]???start_secondary_prolog+0x10/0x14 That's kind of hilarious. So even though it's a spin_lock_irq(), suggesting it's used from both process and irq context, Dan appears to be adding the first user that actually runs from irq context. It looks like it should have been a regular spinlock all along. Until now, anyway. Fixing it should be straight-forward. Use get_file() under lock to pin the file, drop the lock to do file_path(), release file with fput().