MIME-Version: 1.0
References: <20230118150703.4024-1-ubizjak@gmail.com> <20230118131825.c6daea81ea1e2dc6aa014f38@linux-foundation.org>
 <CAFULd4ZQGG+N3f7xDuoiNG1jY128pqaH0F4eLKO+fhvSNAbKfA@mail.gmail.com>
 <CAFULd4b5szcTHTVbGJ9WiciG_+8kANiPZYP_pkEZUhnz_HHy-g@mail.gmail.com> <913c01d41f824fa8b3400384437fa0d8@AcuMS.aculab.com>
In-Reply-To: <913c01d41f824fa8b3400384437fa0d8@AcuMS.aculab.com>
From:   Uros Bizjak <ubizjak@gmail.com>
Date:   Mon, 23 Jan 2023 16:04:43 +0100
Message-ID: <CAFULd4aDORSrq7zf_LcAZRP8HOHcrq2-rGMaroKyG2zQDHNpOA@mail.gmail.com>
Subject: Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll
To:     David Laight <David.Laight@aculab.com>
Cc:     Andrew Morton <akpm@linux-foundation.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Mateusz Guzik <mjguzik@gmail.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

On Thu, Jan 19, 2023 at 1:47 PM David Laight <David.Laight@aculab.com> wrote:
>
> > BTW: Recently, it was determined [1] that the usage of cpu_relax()
> > inside the cmpxchg loop can be harmful for performance. We actually
> > have the same situation here, so perhaps cpu_relax() should be removed
> > in the same way it was removed from the lockref.
>
> I'm not sure you can ever want a cpu_relax() in a loop that
> is implementing an atomic operation.
> Even the ia64 (die...) issue was with a loop that was waiting
> for another cpu to change the location (eg a spinlock).
>
> For an atomic operation an immediate retry is likely to succeed.
> Any kind of deferral to another cpu can only make it worse.
>
> Clearly if you have 100s of cpus looping doing atomic operations
> on the same cache line it is likely that some get starved.
> But to fix that you need to increase the time between successful
> operations, not delay on failure.
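
For reference, the immediate-retry loop under discussion can be
sketched in user-space C11 roughly as below. This is only an
illustration: set_bits_sketch() is a hypothetical name,
atomic_compare_exchange_weak() stands in for the kernel's
try_cmpxchg(), and the -1 return stands in for -EBUSY. It is not the
genalloc code itself.

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space analog of set_bits_ll(): atomically set the
 * bits in mask, failing if any of them is already set.
 * Returns 0 on success, -1 if a requested bit was already set.
 */
static int set_bits_sketch(_Atomic unsigned long *addr, unsigned long mask)
{
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if (val & mask)
			return -1;
		/*
		 * No cpu_relax()-style pause here: on failure the CAS
		 * stores the current contents of *addr into val, and
		 * the loop retries immediately with that fresh value.
		 */
	} while (!atomic_compare_exchange_weak(addr, &val, val | mask));

	return 0;
}
```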

I would like to point out that the Wikipedia article on
compare-and-swap claims [1] that:

Instead of immediately retrying after a CAS operation fails,
researchers have found that total system performance can be improved
in multiprocessor systems -- where many threads constantly update some
particular shared variable -- if threads that see their CAS fail use
exponential backoff -- in other words, wait a little before retrying
the CAS [2].

[1] https://en.wikipedia.org/wiki/Compare-and-swap#Overview
[2] https://arxiv.org/pdf/1305.5800.pdf
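
To make the quoted idea concrete, a CAS loop with exponential backoff
could look something like the user-space C11 sketch below. All names
(add_with_backoff(), spin_delay(), BACKOFF_MAX) and the cap value are
assumptions made for illustration. This is not the algorithm from [2]
verbatim, and it says nothing about whether backoff would actually
help in genalloc.

```c
#include <stdatomic.h>

#define BACKOFF_MAX 1024u	/* arbitrary cap, chosen for illustration */

/* Burn roughly n loop iterations without touching shared memory. */
static void spin_delay(unsigned int n)
{
	for (volatile unsigned int i = 0; i < n; i++)
		;
}

/* Atomically add delta to *ctr and return the new value. */
static unsigned long add_with_backoff(_Atomic unsigned long *ctr,
				      unsigned long delta)
{
	unsigned long old = atomic_load_explicit(ctr, memory_order_relaxed);
	unsigned int delay = 1;

	while (!atomic_compare_exchange_weak(ctr, &old, old + delta)) {
		/* CAS failed: wait, doubling the wait up to the cap. */
		spin_delay(delay);
		if (delay < BACKOFF_MAX)
			delay *= 2;
		/* The value seen by the failed CAS is likely stale by now. */
		old = atomic_load_explicit(ctr, memory_order_relaxed);
	}

	return old + delta;
}
```

In the uncontended case the first CAS succeeds and the delay path is
never taken, so the cost of backoff is only paid under contention.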

Uros.