Message-ID: <8d61200a-a739-4200-a8a3-5386a834d44f@I-love.SAKURA.ne.jp>
Date: Fri, 14 Jun 2024 09:58:48 +0900
X-Mailing-List: linux-kernel@vger.kernel.org
To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Network Development, LKML
From: Tetsuo Handa
Subject: [net/sched] Question: Locks for clearing ERR_PTR() value from idrinfo->action_idr ?

Hello.

syzbot is reporting hung task problems involving rtnl_mutex. A debug printk()
patch added to linux-next-20240611 suggested that many of them are caused by
an infinite busy loop inside tcf_idr_check_alloc().

----------
again:
	rcu_read_lock();
	p = idr_find(&idrinfo->action_idr, *index);

	if (IS_ERR(p)) {
		/* This means that another process allocated
		 * index but did not assign the pointer yet.
		 */
		rcu_read_unlock();
		goto again;
	}
----------

Since there is no sleep (e.g. cond_resched()/schedule_timeout_uninterruptible(1))
before "goto again;", once idr_find() returns an IS_ERR() value, all of that
CPU's computation resource is wasted forever with rtnl_mutex held (and anybody
else who tries to take rtnl_mutex at rtnl_lock() is reported as a hung task,
resulting in various hung task reports waiting for rtnl_mutex at rtnl_lock()).

Therefore, I tried to add a sleep before "goto again;", but I cannot tell
whether a sleep added to linux-next-20240612 solves the hung task problem,
because syzbot currently cannot test linux-next kernels due to an unrelated
problem.

Therefore, I'm posting a question here before syzbot can resume testing of
linux-next kernels.
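To make the difference between spinning and sleeping concrete, here is a minimal userspace sketch of the retry loop with a delay before each retry. It is not the kernel code and not the patch that was tested: slots[], err_ptr(), is_err() and lookup_with_backoff() are hypothetical stand-ins for idrinfo->action_idr, ERR_PTR(), IS_ERR() and the loop in tcf_idr_check_alloc(), and the bounded retry count is an assumption added only so that the example terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for idrinfo->action_idr: a tiny fixed table. */
#define SLOTS 8
static void *slots[SLOTS];

/*
 * Look up a slot. While the slot holds an error-pointer placeholder,
 * sleep for 1 ms and retry instead of spinning. Gives up after
 * max_tries attempts and returns an -EBUSY error pointer.
 */
static void *lookup_with_backoff(unsigned int index, int max_tries)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 };
	int i;

	assert(max_tries > 0);
	if (index >= SLOTS)
		return err_ptr(-EINVAL);

	for (i = 0; i < max_tries; i++) {
		void *p = slots[index];

		if (!is_err(p))
			return p;	/* real object, or NULL if unused */
		nanosleep(&delay, NULL);	/* yield the CPU before retrying */
	}
	return err_ptr(-EBUSY);
}
```

The sleep only helps if some other context can make progress and replace the placeholder while the caller waits. If nothing will ever replace it, the loop still never succeeds; it merely stops burning a CPU while failing.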
As far as I can see, the ERR_PTR(-EBUSY) assigned at

	mutex_lock(&idrinfo->lock);
	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
			    GFP_KERNEL);
	mutex_unlock(&idrinfo->lock);

in tcf_idr_check_alloc() is cleared by either

	mutex_lock(&idrinfo->lock);
	/* Remove ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
	WARN_ON(!IS_ERR(idr_remove(&idrinfo->action_idr, index)));
	mutex_unlock(&idrinfo->lock);

in tcf_idr_cleanup() or

	mutex_lock(&idrinfo->lock);
	/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
	mutex_unlock(&idrinfo->lock);

in tcf_idr_insert_many().

But is there a possibility that rtnl_mutex is released between
tcf_idr_check_alloc() and tcf_idr_{cleanup,insert_many}() ? If yes, adding a
sleep before "goto again;" won't be sufficient. But if no, how can

	/* This means that another process allocated
	 * index but did not assign the pointer yet.
	 */

happen (because both setting ERR_PTR(-EBUSY) and replacing with an !IS_ERR()
value are done without temporarily releasing rtnl_mutex) ?

Is there a possibility that tcf_idr_check_alloc() is called without holding
rtnl_mutex? If yes, adding a sleep before "goto again;" would help. But if no,
is this a sign that some path forgot to call tcf_idr_{cleanup,insert_many}() ?
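The last question can be illustrated with a small userspace model of the placeholder lifecycle described above. This is only a sketch under stated assumptions, not the act_api code: table[], reserve(), commit(), cleanup() and count_stale() are hypothetical names modelling idr_alloc_u32() with ERR_PTR(-EBUSY), idr_replace() in tcf_idr_insert_many(), idr_remove() in tcf_idr_cleanup(), and a debug check, respectively. Locking is left out on purpose, since the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095
#define TABLE_SIZE 8

/* Hypothetical stand-in for idrinfo->action_idr. */
static void *table[TABLE_SIZE];

/* Models ERR_PTR(-EBUSY), the "reserved but not yet assigned" marker. */
static void *placeholder(void)
{
	return (void *)(intptr_t)-EBUSY;
}

/* Models IS_ERR(): true for any error-pointer value. */
static int is_placeholder(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Models the reservation step: claim a free index with a placeholder. */
static int reserve(unsigned int index)
{
	if (index >= TABLE_SIZE)
		return -EINVAL;
	if (table[index])
		return -ENOSPC;
	table[index] = placeholder();
	return 0;
}

/* Models the insert step: replace the placeholder with the real object. */
static int commit(unsigned int index, void *obj)
{
	if (index >= TABLE_SIZE || !is_placeholder(table[index]))
		return -EINVAL;
	table[index] = obj;
	return 0;
}

/* Models the cleanup step: drop the placeholder after a failure. */
static int cleanup(unsigned int index)
{
	if (index >= TABLE_SIZE || !is_placeholder(table[index]))
		return -EINVAL;
	table[index] = NULL;
	return 0;
}

/* Count reservations that were neither committed nor cleaned up. */
static int count_stale(void)
{
	int i, n = 0;

	for (i = 0; i < TABLE_SIZE; i++)
		if (is_placeholder(table[i]))
			n++;
	return n;
}
```

In this model, a path that calls reserve() and then returns without calling commit() or cleanup() leaves count_stale() non-zero forever, and any later lookup of that index would see the placeholder on every retry. That is the "some path forgot to call tcf_idr_{cleanup,insert_many}()" case, in which no amount of sleeping before "goto again;" would help.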