From: Yun Levi
Date: Sat, 16 Jan 2021 16:31:48 +0900
Subject: Re: Question about qspinlock
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

Sorry, I see now that the spin_lock() slowpath runs only after preempt_disable(),
so a waiter can never be scheduled out and the scenario below cannot happen.
Sorry for the noise.
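For reference, a minimal sketch of the call path (abridged from
include/linux/spinlock_api_smp.h, around v5.10; lockdep details and the
many spin_lock variants are elided here):

	/* spin_lock() -> raw_spin_lock() -> _raw_spin_lock() -> ... */
	static inline void __raw_spin_lock(raw_spinlock_t *lock)
	{
		preempt_disable();	/* preemption is off before any queueing */
		spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
		/* on contention this ends up in queued_spin_lock_slowpath() */
		LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
	}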
On Fri, Jan 15, 2021 at 11:03 AM Yun Levi wrote:
>
> Hi Peter, Ingo, Will and linux-kernel,
>
> While reading the code of the queued_spin_lock_slowpath() function, I have
> some questions about nesting on an unrelated lock while a qspinlock waiter
> is queued.
>
> Suppose there are CPU1 through CPU8, and two locks, lock1 and lock2, that
> are unrelated to each other.
>
> First, Thread 1 takes lock1, while Thread 2 and Thread 3 contend for lock1,
> waiting on CPU2 and CPU3 respectively.
>
> Next, Thread 5 takes lock2, while Thread 6 and Thread 7 contend for lock2,
> waiting on CPU6 and CPU7 respectively.
>
> Now suppose Thread 2 uses up its quantum and CPU2 switches to a new thread,
> "NTHREAD". NTHREAD tries to take lock2, sees that someone is already waiting
> in the queue, and tries to add itself to the queue as well.
>
> My questions are:
>
> 1. In this situation, the qnode idx on CPU2 would not be 0, as if this were
>    a nested acquisition. Can this actually happen, or is it just my lack of
>    understanding of the code?
>
> 2. If the situation in (1) cannot happen, why not?
>
> 3. If the situation in (1) can happen, the lock could be corrupted when more
>    than three threads end up waiting on the same CPU: one is released and
>    another starts waiting again on that CPU. So how about making a waiter
>    spin as in the idx >= MAX_NODES case whenever it tries to queue for a
>    lock different from the one its CPU is already queued on?
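For reference, the node allocation logic the questions above concern,
abridged from kernel/locking/qspinlock.c (around v5.10; the comments here
are summaries, not the original ones):

	node = this_cpu_ptr(&qnodes[0].mcs);
	idx = node->count++;			/* one node per nesting level */
	tail = encode_tail(smp_processor_id(), idx);

	if (unlikely(idx >= MAX_NODES)) {	/* all 4 nodes busy: don't queue, */
		lockevent_inc(lock_no_node);	/* spin on trylock instead */
		while (!queued_spin_trylock(lock))
			cpu_relax();
		goto release;
	}

	node = grab_mcs_node(node, idx);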
> My idea is:
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index b9515fcc9b29..fbb6e2effb59 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -106,6 +106,7 @@ struct qnode {
>   * PV doubles the storage and uses the second cacheline for PV state.
>   */
>  static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);
> +static DEFINE_PER_CPU(struct qspinlock *, curr_lock);
>
>  /*
>   * We must be able to distinguish between no-tail and the tail at 0:0,
> @@ -315,6 +316,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
>  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  {
>  	struct mcs_spinlock *prev, *next, *node;
> +	struct qspinlock *saved_lock = NULL;
>  	u32 old, tail;
>  	int idx;
>
> @@ -401,6 +403,13 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	idx = node->count++;
>  	tail = encode_tail(smp_processor_id(), idx);
>
> +	if (likely(!idx)) {
> +		*this_cpu_ptr(&curr_lock) = lock;
> +		saved_lock = lock;
> +	} else {
> +		saved_lock = *this_cpu_ptr(&curr_lock);
> +	}
> +
>  	/*
>  	 * 4 nodes are allocated based on the assumption that there will
>  	 * not be nested NMIs taking spinlocks. That may not be true in
> @@ -410,7 +419,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	 * any MCS node. This is not the most elegant solution, but is
>  	 * simple enough.
>  	 */
> -	if (unlikely(idx >= MAX_NODES)) {
> +	if (unlikely((idx >= MAX_NODES) || lock != saved_lock)) {
>  		lockevent_inc(lock_no_node);
>  		while (!queued_spin_trylock(lock))
>  			cpu_relax();
> @@ -557,7 +566,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	/*
>  	 * release the node
>  	 */
> -	__this_cpu_dec(qnodes[0].mcs.count);
> +	if (unlikely(saved_lock == lock))
> +		__this_cpu_dec(qnodes[0].mcs.count);
>  }
>  EXPORT_SYMBOL(queued_spin_lock_slowpath);
>
> Thank you.
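For completeness, the tail encoding that the nesting index feeds into,
abridged from kernel/locking/qspinlock.c: idx must fit in the two
_Q_TAIL_IDX bits of the lock word, which is why there are only
MAX_NODES == 4 per-CPU nodes (one each for task, softirq, hardirq and
NMI context):

	static inline __pure u32 encode_tail(int cpu, int idx)
	{
		u32 tail;

		tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;	/* 0 means "no tail" */
		tail |= idx << _Q_TAIL_IDX_OFFSET;		/* assumes idx < 4 */

		return tail;
	}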