Subject: Re: [PATCH] bpf, sockmap: defer sk_psock_free_link() using RCU
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 22 May 2024 19:30:58 +0900
To: Jakub Sitnicki, Alexei Starovoitov
Cc: Eric Dumazet, Linus Torvalds, bpf, LKML, Hillf Danton, "Paul E. McKenney"
References: <838e7959-a360-4ac1-b36a-a3469236129b@I-love.SAKURA.ne.jp>
 <20240521225918.2147-1-hdanton@sina.com>
 <877cfmxjie.fsf@cloudflare.com>
In-Reply-To: <877cfmxjie.fsf@cloudflare.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2024/05/22 18:50, Jakub Sitnicki wrote:
> On Wed, May 22, 2024 at 06:59 AM +08, Hillf Danton wrote:
>> On Tue, 21 May 2024 08:38:52 -0700 Alexei Starovoitov
>>> On Sun, May 12, 2024 at 12:22 AM Tetsuo Handa wrote:
>>>> --- a/net/core/sock_map.c
>>>> +++ b/net/core/sock_map.c
>>>> @@ -142,6 +142,7 @@ static void sock_map_del_link(struct sock *sk,
>>>> 	bool strp_stop = false, verdict_stop = false;
>>>> 	struct sk_psock_link *link, *tmp;
>>>>
>>>> +	rcu_read_lock();
>>>> 	spin_lock_bh(&psock->link_lock);
>>>
>>> I think this is incorrect.
>>> spin_lock_bh may sleep in RT and it won't be safe to do in rcu cs.
>>
>> Could you specify why it won't be safe in rcu cs if you are right?
>> What does rcu look like in RT if not nothing?
>
> RCU readers can't block, while spinlock RT doesn't disable preemption.
>
> https://docs.kernel.org/RCU/rcu.html
> https://docs.kernel.org/locking/locktypes.html#spinlock-t-and-preempt-rt

I didn't catch what you mean.
https://elixir.bootlin.com/linux/latest/source/include/linux/spinlock_rt.h#L43
defines spin_lock() for RT as

  static __always_inline void spin_lock(spinlock_t *lock)
  {
          rt_spin_lock(lock);
  }

and https://elixir.bootlin.com/linux/v6.9/source/include/linux/spinlock_rt.h#L85
defines spin_lock_bh() for RT as

  static __always_inline void spin_lock_bh(spinlock_t *lock)
  {
          /* Investigate: Drop bh when blocking ? */
          local_bh_disable();
          rt_spin_lock(lock);
  }

and https://elixir.bootlin.com/linux/latest/source/kernel/locking/spinlock_rt.c#L54
defines rt_spin_lock() for RT as

  void __sched rt_spin_lock(spinlock_t *lock)
  {
          spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
          __rt_spin_lock(lock);
  }

and https://elixir.bootlin.com/linux/v6.9/source/kernel/locking/spinlock_rt.c#L46
defines __rt_spin_lock() for RT as

  static __always_inline void __rt_spin_lock(spinlock_t *lock)
  {
          rtlock_might_resched();
          rtlock_lock(&lock->lock);
          rcu_read_lock();
          migrate_disable();
  }

You can see that calling spin_lock() or spin_lock_bh() automatically starts an
RCU critical section, can't you?

If spin_lock_bh() for RT might sleep, and calling spin_lock_bh() under an RCU
critical section is not safe, how can

  spin_lock(&lock1);
  spin_lock(&lock2);
  // do something
  spin_unlock(&lock2);
  spin_unlock(&lock1);

or

  spin_lock_bh(&lock1);
  spin_lock(&lock2);
  // do something
  spin_unlock(&lock2);
  spin_unlock_bh(&lock1);

be possible? Unless rcu_read_lock() is implemented in a way that makes it safe
to do

  rcu_read_lock();
  spin_lock(&lock2);
  // do something
  spin_unlock(&lock2);
  rcu_read_unlock();

and

  rcu_read_lock();
  spin_lock_bh(&lock2);
  // do something
  spin_unlock_bh(&lock2);
  rcu_read_unlock();

, I think RT kernels can't run safely.

The ordering rules for locking primitives are too complicated and too scattered
across the tree. We need documentation with safe/unsafe ordering examples.