Date: Fri, 18 Nov 2022 13:27:41 -0800
From: Andrew Morton
To: Chen Wandun
Cc: Huang Ying
Subject: Re: [RFC PATCH] swapfile: fix soft lockup in scan_swap_map_slots
Message-Id: <20221118132741.aaf6f9081b5a1018cc9a5402@linux-foundation.org>
In-Reply-To: <20221118133850.3360369-1-chenwandun@huawei.com>
References: <20221118133850.3360369-1-chenwandun@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 18 Nov 2022 21:38:50 +0800 Chen Wandun wrote:

> A soft lockup occurs when scanning for free swap slots under
> huge memory pressure.
> The test scenario is: 64 CPU cores, 64GB memory, and 28
> zram devices; the disksize of each zram device is 50MB.
>
> LATENCY_LIMIT is used to prevent soft lockup in function
> scan_swap_map_slots, but the real loop count can be more
> than LATENCY_LIMIT because "goto checks" and "goto scan" can
> repeat without latency_ration being decremented.
>
> In order to fix it, move the latency_ration decrement earlier.
>
> There is also a suspicious place that may cause soft lockup in
> function get_swap_pages: in this function, the "goto start_over"
> may result in continuous scanning of the swap partition. If there
> is no cond_resched in scan_swap_map_slots, it would cause soft
> lockup (I am not sure about this).
>
> ...
>

Looks sensible.

> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -972,23 +972,23 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  scan:
>  	spin_unlock(&si->lock);
>  	while (++offset <= READ_ONCE(si->highest_bit)) {
> -		if (swap_offset_available_and_locked(si, offset))
> -			goto checks;
>  		if (unlikely(--latency_ration < 0)) {
>  			cond_resched();
>  			latency_ration = LATENCY_LIMIT;
>  			scanned_many = true;
>  		}
> +		if (swap_offset_available_and_locked(si, offset))
> +			goto checks;
>  	}
>  	offset = si->lowest_bit;
>  	while (offset < scan_base) {
> -		if (swap_offset_available_and_locked(si, offset))
> -			goto checks;
>  		if (unlikely(--latency_ration < 0)) {
>  			cond_resched();
>  			latency_ration = LATENCY_LIMIT;
>  			scanned_many = true;
>  		}
> +		if (swap_offset_available_and_locked(si, offset))
> +			goto checks;
>  		offset++;
>  	}
>  	spin_lock(&si->lock);

But this does somewhat alter the `scanned_many' logic.  We'll now set
`scanned_many' earlier.  What are the effects of this?

The ed43af10975eef7e changelog outlines tests which could be performed
to ensure we aren't regressing from this.
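
To make the ordering question concrete, here is a rough user-space toy
model, not kernel code.  Everything in it is an assumption made for
illustration: simulate(), struct scan_stats, the avail_every parameter
and the LATENCY_LIMIT value of 256 are hypothetical, and the "goto
checks, fail, goto scan" round trip is approximated by a plain
`continue'.  It only counts how often the cond_resched() stand-in fires
and whether scanned_many ends up set under each ordering.

```c
#include <assert.h>
#include <stdbool.h>

#define LATENCY_LIMIT 256	/* assumed value, for illustration only */

struct scan_stats {
	int probes;		/* slots examined */
	int resched_calls;	/* times the cond_resched() stand-in fired */
	bool scanned_many;	/* mirrors the flag set in the patched loop */
};

/*
 * Toy model of one scan loop.  A slot "looks available" when its offset
 * is a multiple of avail_every; the model assumes such a slot then fails
 * the later checks, so the scan simply resumes at the next offset.
 *
 * decrement_first == false: old order, availability test before the budget
 * decrement_first == true:  new order, budget before the availability test
 */
struct scan_stats simulate(int n_slots, int avail_every, bool decrement_first)
{
	struct scan_stats st = { 0, 0, false };
	int latency_ration = LATENCY_LIMIT;

	assert(avail_every > 0);

	for (int offset = 1; offset <= n_slots; offset++) {
		bool looks_available = (offset % avail_every) == 0;

		st.probes++;

		/* Old order: leave the loop body without touching the budget. */
		if (!decrement_first && looks_available)
			continue;

		if (--latency_ration < 0) {
			st.resched_calls++;	/* stand-in for cond_resched() */
			latency_ration = LATENCY_LIMIT;
			st.scanned_many = true;
		}

		/* New order: the budget was already charged for this probe. */
		if (decrement_first && looks_available)
			continue;
	}
	return st;
}
```

In this model, simulate(1000, 1, false) never charges the budget, so the
stand-in never fires and scanned_many stays false, while
simulate(1000, 1, true) fires it and sets scanned_many.  When no slot
ever looks available (for example avail_every = 1001 with 1000 slots)
the two orderings behave the same.  That is only the model's behaviour;
it says nothing about the effect on real workloads.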