2012-10-08 23:33:46

by Peter Fordham

Subject: spinlocks in ext4

Hi,

Can someone give me a quick outline of why spinlocks are required in
the EXT4 code? Don't all file-system requests originate from user
context, meaning all locking could be done with mutexes or semaphores?

I'm doing some profiling on an ARM device, and it's showing spin
unlock taking a lot of time. I'd like to migrate to using mutexes
instead, since they don't incur penalties from synchronization
instructions like DMB. I'm guessing there's some underlying reason why
this isn't safe, and I'd like to understand it.

thanks,

-Pete Fordham


2012-10-09 00:54:29

by Theodore Ts'o

Subject: Re: spinlocks in ext4

On Mon, Oct 08, 2012 at 04:33:45PM -0700, Peter Fordham wrote:
>
> Can someone give me a quick outline of why spinlocks are required in
> the EXT4 code? Don't all file-system requests originate from user
> context, meaning all locking could be done with mutexes or semaphores?

Mutexes are incredibly expensive in the contended case, since you
basically have to take a trip through the scheduler. If the other CPU
is only going to be holding the lock for a few dozen cycles, a
spinlock is far preferable to a mutex.
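
For reference, a minimal userspace sketch of a test-and-set spinlock
written with C11 atomics. This is not the kernel's ARM spinlock
implementation, and the toy_spin_* names are hypothetical. It only
shows where the ordering requirements sit: the lock needs acquire
ordering and the unlock needs release ordering, and on ARMv7 a
compiler typically implements those with DMB instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock for illustration; not arch_spinlock_t. */
typedef struct {
    atomic_bool locked;
} toy_spinlock;

static inline void toy_spin_init(toy_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* One attempt to take the lock; returns true on success. */
static inline bool toy_spin_trylock(toy_spinlock *l)
{
    /* Acquire ordering: accesses inside the critical section cannot be
     * reordered before this exchange. */
    return !atomic_exchange_explicit(&l->locked, true,
                                     memory_order_acquire);
}

static inline void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l)) {
        /* Spin on plain loads until the lock looks free, then retry
         * the exchange. No trip through the scheduler. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static inline void toy_spin_unlock(toy_spinlock *l)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    /* Release ordering: accesses inside the critical section cannot be
     * reordered after this store. This is where the barrier cost on
     * the unlock path comes from. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A waiter in this sketch never sleeps, so it only makes sense when the
holder releases the lock after a very short critical section.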

> I'm doing some profiling on an ARM device, and it's showing spin
> unlock taking a lot of time. I'd like to migrate to using mutexes
> instead, since they don't incur penalties from synchronization
> instructions like DMB. I'm guessing there's some underlying reason why
> this isn't safe, and I'd like to understand it.

Why in the world does ARM have expensive spinlocks? ARM64 is *doomed*
if this is a fundamental property of the ARM processor design...

- Ted

2012-10-09 01:10:39

by Theodore Ts'o

Subject: Re: spinlocks in ext4

How expensive are memory barriers on ARM, anyway?

- Ted

2012-10-11 18:37:46

by Peter Fordham

Subject: Re: spinlocks in ext4

On 8 October 2012 18:10, Theodore Ts'o wrote:
> How expensive are memory barriers on ARM, anyway?

The performance monitors seem to be telling me that a DMB just after a
store that misses in both L1 and L2 (causing an eviction of a clean
line and a line fill, I assume) takes over 100 cycles.

I'm seeing a 20% slowdown in ext4 performance when enabling SMP on my
device. I'm starting to think there might be issues with the memory
system.

-Pete

2012-10-12 00:17:37

by Theodore Ts'o

Subject: Re: spinlocks in ext4

On Thu, Oct 11, 2012 at 11:37:46AM -0700, Peter Fordham wrote:
> On 8 October 2012 18:10, Theodore Ts'o wrote:
> > How expensive are memory barriers on ARM, anyway?
>
> The performance monitors seem to be telling me that a DMB just after a
> store that misses in both L1 and L2 (causing an eviction of a clean
> line and a line fill, I assume) takes over 100 cycles.

If we assume a 1GHz clock, 100 cycles is 0.1 microseconds (100ns). A
4k read on an eMMC device (what I assume you are using) is about 5ms.
A super-expensive PCIe-attached flash device has a read latency of
around 20-50 microseconds. Read latency for an SSD is around 1ms.
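
The comparison above can be reproduced with a few lines of C. This is
only a sketch of the arithmetic; the helper names are hypothetical, and
the inputs are the assumed 1GHz clock and the rough latencies quoted
above, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a clock given in MHz. */
static inline double cycles_to_ns(uint64_t cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    /* cycles / (MHz * 1e6) seconds, expressed in nanoseconds */
    return (double)cycles * 1000.0 / clock_mhz;
}

/* How many barriers of barrier_ns fit into one I/O taking io_ns. */
static inline double barriers_per_io(double io_ns, double barrier_ns)
{
    assert(barrier_ns > 0.0);
    return io_ns / barrier_ns;
}
```

With these inputs, cycles_to_ns(100, 1000.0) is 100ns, so a 5ms eMMC
read costs as much as 50,000 such barriers, and even a 20 microsecond
PCIe flash read costs as much as 200 of them.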

> I'm seeing a 20% slowdown in ext4 performance when enabling SMP on my
> device. I'm starting to think there might be issues with the memory
> system.

So what are you measuring and how are you measuring it? If 0.1
microseconds is significant, it must be something where everything is
in cache, and you're never hitting the storage device.

More to the point, as Arjan pointed out to me on Google+, using a
mutex is going to add at least one, and probably more, memory barriers
(due to the locks needed by the scheduler) *plus* the scheduling
overhead. So your claim that using a mutex is superior to using
spinlocks makes absolutely no sense.

- Ted