2001-02-20 00:51:33

by Marcelo Tosatti

[permalink] [raw]
Subject: __lock_page calls run_task_queue(&tq_disk) unecessarily?


Hi Linus,

Take a look at __lock_page:

static void __lock_page(struct page *page)
{
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);

add_wait_queue_exclusive(&page->wait, &wait);
for (;;) {
sync_page(page);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
if (PageLocked(page)) {
run_task_queue(&tq_disk);
schedule();
continue;
}
if (!TryLockPage(page))
break;
}
tsk->state = TASK_RUNNING;
remove_wait_queue(&page->wait, &wait);
}


Af a process sleeps in __lock_page, sync_page() will be called even if the
page is already unlocked. (block_sync_page(), the sync_page routine for
generic block based filesystem calls run_task_queue(&tq_disk)).

I don't see any problem if we remove the run_task_queue(&tq_disk) and put
sync_page(page) there instead, removing the other sync_page(page) at the
beginning of the loop.

Comments?


2001-02-20 02:54:00

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: __lock_page calls run_task_queue(&tq_disk) unecessarily?


Btw ___wait_on_page() does something similar.

Here goes the patch for both __lock_page() and ___wait_on_page().


--- linux/mm/filemap.c.orig Mon Feb 19 23:51:02 2001
+++ linux/mm/filemap.c Mon Feb 19 23:51:33 2001
@@ -611,11 +611,11 @@

add_wait_queue(&page->wait, &wait);
do {
- sync_page(page);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
if (!PageLocked(page))
break;
- run_task_queue(&tq_disk);
+
+ sync_page(page);
schedule();
} while (PageLocked(page));
tsk->state = TASK_RUNNING;
@@ -633,10 +633,9 @@

add_wait_queue_exclusive(&page->wait, &wait);
for (;;) {
- sync_page(page);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
if (PageLocked(page)) {
- run_task_queue(&tq_disk);
+ sync_page(page);
schedule();
continue;
}


On Mon, 19 Feb 2001, Marcelo Tosatti wrote:

>
> Hi Linus,
>
> Take a look at __lock_page:
>
> static void __lock_page(struct page *page)
> {
> struct task_struct *tsk = current;
> DECLARE_WAITQUEUE(wait, tsk);
>
> add_wait_queue_exclusive(&page->wait, &wait);
> for (;;) {
> sync_page(page);
> set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> if (PageLocked(page)) {
> run_task_queue(&tq_disk);
> schedule();
> continue;
> }
> if (!TryLockPage(page))
> break;
> }
> tsk->state = TASK_RUNNING;
> remove_wait_queue(&page->wait, &wait);
> }
>
>
> Af a process sleeps in __lock_page, sync_page() will be called even if the
> page is already unlocked. (block_sync_page(), the sync_page routine for
> generic block based filesystem calls run_task_queue(&tq_disk)).
>
> I don't see any problem if we remove the run_task_queue(&tq_disk) and put
> sync_page(page) there instead, removing the other sync_page(page) at the
> beginning of the loop.
>
> Comments?
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-02-20 15:59:40

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: __lock_page calls run_task_queue(&tq_disk) unecessarily?

On Mon, Feb 19, 2001 at 11:05:23PM -0200, Marcelo Tosatti wrote:
> --- linux/mm/filemap.c.orig Mon Feb 19 23:51:02 2001
> +++ linux/mm/filemap.c Mon Feb 19 23:51:33 2001
> @@ -611,11 +611,11 @@
>
> add_wait_queue(&page->wait, &wait);
> do {
> - sync_page(page);
> set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> if (!PageLocked(page))
> break;
> - run_task_queue(&tq_disk);
> +
> + sync_page(page);
> schedule();
> } while (PageLocked(page));
> tsk->state = TASK_RUNNING;
> @@ -633,10 +633,9 @@
>
> add_wait_queue_exclusive(&page->wait, &wait);
> for (;;) {
> - sync_page(page);
> set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> if (PageLocked(page)) {
> - run_task_queue(&tq_disk);
> + sync_page(page);
> schedule();
> continue;
^^^^^^^^
> }

Looks perfect. I'd also remove the `continue' from __lock_page, it's wake-one
so it should get the wakeup only when it's time to lock the page down.

Andrea

2001-02-20 17:11:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: __lock_page calls run_task_queue(&tq_disk) unecessarily?



On Tue, 20 Feb 2001, Andrea Arcangeli wrote:
>
> Looks perfect. I'd also remove the `continue' from __lock_page, it's wake-one
> so it should get the wakeup only when it's time to lock the page down.

NO!

Even if it is wake-one, others may have claimed it before. There can be
new users coming in and doing a "trylock()" etc.

NEVER *EVER* think that "exclusive wait-queue" implies some sort of
critical region protection. An exlcusive wait-queue is _not_ a lock. It's
only an optimization heuristic.

Linus

2001-02-20 18:09:45

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: __lock_page calls run_task_queue(&tq_disk) unecessarily?

On Tue, Feb 20, 2001 at 09:11:04AM -0800, Linus Torvalds wrote:
> Even if it is wake-one, others may have claimed it before. There can be
> new users coming in and doing a "trylock()" etc.
>
> NEVER *EVER* think that "exclusive wait-queue" implies some sort of
> critical region protection. An exlcusive wait-queue is _not_ a lock. It's
> only an optimization heuristic.

The reason of not executing the trylock in the slow path of the lock_page is to
avoid writing to the shared ram on all the CPUs at the same time for no good
reason.

We can write just now to the same cacheline from all the CPUs at the same time
if all the cpus runs lock_page at the same time on the same page, but there's
not an high probability for such thing to happen and we would be slower to try
to read first.

The reason of the `continue' is only one: if the wakeup would be wake-all we
would end executing NR_sleppers TryLockPage() and we would get an high probability
that only one of those trylocks will succeed. So we could assume that all the
other trylocks was going to be wasted and so we used `continue' to try not to
bang the cacheline too much for probably no good reason.

But since it's a wake-one, only one task will try to acquire the cacheline after
the wakeup as it just did once with the TryLockPage in lock_page(). It will
just try again like restarting from lock_page().

I don't see how the probability of TryLockPage to succeed after the wakeup
could decrease compared to the first TryLockPage. Since the probability doesn't
decrease I don't see any point for the `continue'. Why should the probability
of succeeding in TryLockPage drecrease after a wakeup compared to the
TryLockPage in lock_page()? If you have an explanation I will certainly agree
to left the `continue' there.

Andrea