2012-05-03 08:12:56

by Takuya Yoshikawa

Subject: [RFC] sched: make callers check lock contention for cond_resched_lock()

This patch is meant to show what I am thinking and is only compile
tested on linux-next, hence an RFC.

Although I might be misreading something, I am not sure whether every
user of this API really wants the contention check to be skipped when
CONFIG_PREEMPT is not set.

Any comments will be appreciated.
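
For reference, the open-coded alternative mentioned in the patch
description below (spin_is_contended() plus cond_resched()) would look
roughly like this at a call site; a minimal sketch only, using
kvm->mmu_lock as the example lock:

        if (spin_is_contended(&kvm->mmu_lock) || need_resched()) {
                spin_unlock(&kvm->mmu_lock);
                /* voluntary preemption point; works even without CONFIG_PREEMPT */
                cond_resched();
                spin_lock(&kvm->mmu_lock);
        }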

Thanks,
Takuya

===
From: Takuya Yoshikawa <[email protected]>

While doing kvm development, we found a case in which we wanted to break
a critical section on lock contention even without CONFIG_PREEMPT.

Although we can do that using spin_is_contended() and cond_resched(),
changing cond_resched_lock() to satisfy such a need is another option.

Signed-off-by: Takuya Yoshikawa <[email protected]>
---
arch/x86/kvm/mmu.c | 3 ++-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 ++-
fs/btrfs/ordered-data.c | 3 ++-
fs/btrfs/relocation.c | 3 ++-
fs/dcache.c | 3 ++-
fs/fscache/object.c | 3 ++-
fs/jbd/commit.c | 6 ++++--
fs/jbd2/commit.c | 3 ++-
fs/nfs/nfs4filelayout.c | 2 +-
fs/nfs/write.c | 2 +-
fs/ocfs2/dlm/dlmdomain.c | 5 +++--
fs/ocfs2/dlm/dlmthread.c | 3 ++-
fs/reiserfs/journal.c | 4 ++--
include/linux/sched.h | 6 +++---
kernel/sched/core.c | 4 ++--
16 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..3361ee3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1704,7 +1704,8 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
mmu_pages_clear_parents(&parents);
}
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
- cond_resched_lock(&vcpu->kvm->mmu_lock);
+ cond_resched_lock(&vcpu->kvm->mmu_lock,
+ spin_is_contended(&vcpu->kvm->mmu_lock));
kvm_mmu_pages_init(parent, &parents, &pages);
}
}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 198c2ba..cfcc233 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -675,7 +675,7 @@ again:
if (start > end)
break;

- cond_resched_lock(&tree->lock);
+ cond_resched_lock(&tree->lock, spin_needbreak(&tree->lock));
}
out:
spin_unlock(&tree->lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61b16c6..16a6173 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3985,7 +3985,8 @@ again:
goto again;
}

- if (cond_resched_lock(&root->inode_lock))
+ if (cond_resched_lock(&root->inode_lock,
+ spin_needbreak(&root->inode_lock)))
goto again;

node = rb_next(node);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index bbf6d0d..1dfcd6d 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -485,7 +485,8 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root,
!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags)) {
list_move(&ordered->root_extent_list,
&root->fs_info->ordered_extents);
- cond_resched_lock(&root->fs_info->ordered_extent_lock);
+ cond_resched_lock(&root->fs_info->ordered_extent_lock,
+ spin_needbreak(&root->fs_info->ordered_extent_lock));
continue;
}

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 646ee21..6102a62 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1471,7 +1471,8 @@ again:
}

objectid = btrfs_ino(&entry->vfs_inode) + 1;
- if (cond_resched_lock(&root->inode_lock))
+ if (cond_resched_lock(&root->inode_lock,
+ spin_needbreak(&root->inode_lock)))
goto again;

node = rb_next(node);
diff --git a/fs/dcache.c b/fs/dcache.c
index 58a6ecf..dccfa62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -855,7 +855,8 @@ relock:
if (!--count)
break;
}
- cond_resched_lock(&dcache_lru_lock);
+ cond_resched_lock(&dcache_lru_lock,
+ spin_needbreak(&dcache_lru_lock));
}
if (!list_empty(&referenced))
list_splice(&referenced, &sb->s_dentry_lru);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index b6b897c..9db99c6 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -824,7 +824,8 @@ static void fscache_enqueue_dependents(struct fscache_object *object)
fscache_put_object(dep);

if (!list_empty(&object->dependents))
- cond_resched_lock(&object->lock);
+ cond_resched_lock(&object->lock,
+ spin_needbreak(&object->lock));
}

spin_unlock(&object->lock);
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 52c15c7..59c60bf 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -474,7 +474,8 @@ void journal_commit_transaction(journal_t *journal)
__journal_unfile_buffer(jh);
jbd_unlock_bh_state(bh);
release_data_buffer(bh);
- cond_resched_lock(&journal->j_list_lock);
+ cond_resched_lock(&journal->j_list_lock,
+ spin_needbreak(&journal->j_list_lock));
}
spin_unlock(&journal->j_list_lock);

@@ -905,7 +906,8 @@ restart_loop:
release_buffer_page(bh);
else
__brelse(bh);
- cond_resched_lock(&journal->j_list_lock);
+ cond_resched_lock(&journal->j_list_lock,
+ spin_needbreak(&journal->j_list_lock));
}
spin_unlock(&journal->j_list_lock);
/*
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 840f70f..5e71afa 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -989,7 +989,8 @@ restart_loop:
release_buffer_page(bh); /* Drops bh reference */
else
__brelse(bh);
- cond_resched_lock(&journal->j_list_lock);
+ cond_resched_lock(&journal->j_list_lock,
+ spin_needbreak(&journal->j_list_lock));
}
spin_unlock(&journal->j_list_lock);
/*
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 5acfd9e..0536aab 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -946,7 +946,7 @@ filelayout_scan_ds_commit_list(struct nfs4_fl_commit_bucket *bucket, int max,
list_for_each_entry_safe(req, tmp, src, wb_list) {
if (!nfs_lock_request(req))
continue;
- if (cond_resched_lock(lock))
+ if (cond_resched_lock(lock, spin_needbreak(lock)))
list_safe_reset_next(req, tmp, wb_list);
nfs_request_remove_commit_list(req);
clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c074623..0d83257 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -570,7 +570,7 @@ nfs_scan_commit_list(struct list_head *src, struct list_head *dst, int max,
list_for_each_entry_safe(req, tmp, src, wb_list) {
if (!nfs_lock_request(req))
continue;
- if (cond_resched_lock(lock))
+ if (cond_resched_lock(lock, spin_needbreak(lock)))
list_safe_reset_next(req, tmp, wb_list);
nfs_request_remove_commit_list(req);
nfs_list_add_request(req, dst);
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 9e89d70..5602e4c 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -465,11 +465,12 @@ redo_bucket:
dlm_lockres_put(res);

if (dropped) {
- cond_resched_lock(&dlm->spinlock);
+ cond_resched_lock(&dlm->spinlock,
+ spin_needbreak(&dlm->spinlock));
goto redo_bucket;
}
}
- cond_resched_lock(&dlm->spinlock);
+ cond_resched_lock(&dlm->spinlock, spin_needbreak(&dlm->spinlock));
num += n;
}
spin_unlock(&dlm->spinlock);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index e73c833..ee86242 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -276,7 +276,8 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
dlm_lockres_put(lockres);

/* Avoid adding any scheduling latencies */
- cond_resched_lock(&dlm->spinlock);
+ cond_resched_lock(&dlm->spinlock,
+ spin_needbreak(&dlm->spinlock));
}

spin_unlock(&dlm->spinlock);
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index b1a0857..d58f596 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -835,7 +835,7 @@ static int write_ordered_buffers(spinlock_t * lock,
}
loop_next:
put_bh(bh);
- cond_resched_lock(lock);
+ cond_resched_lock(lock, spin_needbreak(lock));
}
if (chunk.nr) {
spin_unlock(lock);
@@ -870,7 +870,7 @@ static int write_ordered_buffers(spinlock_t * lock,
spin_lock(lock);
}
put_bh(bh);
- cond_resched_lock(lock);
+ cond_resched_lock(lock, spin_needbreak(lock));
}
spin_unlock(lock);
return ret;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d2acbd..61f4396 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2723,7 +2723,7 @@ extern int _cond_resched(void);
_cond_resched(); \
})

-extern int __cond_resched_lock(spinlock_t *lock);
+extern int __cond_resched_lock(spinlock_t *lock, int need_break);

#ifdef CONFIG_PREEMPT_COUNT
#define PREEMPT_LOCK_OFFSET PREEMPT_OFFSET
@@ -2731,9 +2731,9 @@ extern int __cond_resched_lock(spinlock_t *lock);
#define PREEMPT_LOCK_OFFSET 0
#endif

-#define cond_resched_lock(lock) ({ \
+#define cond_resched_lock(lock, need_break) ({ \
__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \
- __cond_resched_lock(lock); \
+ __cond_resched_lock(lock, need_break); \
})

extern int __cond_resched_softirq(void);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 477b998..470113f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4570,14 +4570,14 @@ EXPORT_SYMBOL(_cond_resched);
* operations here to prevent schedule() from being called twice (once via
* spin_unlock(), once by hand).
*/
-int __cond_resched_lock(spinlock_t *lock)
+int __cond_resched_lock(spinlock_t *lock, int need_break)
{
int resched = should_resched();
int ret = 0;

lockdep_assert_held(lock);

- if (spin_needbreak(lock) || resched) {
+ if (need_break || resched) {
spin_unlock(lock);
if (resched)
__cond_resched();
--
1.7.5.4


2012-05-03 08:35:45

by Peter Zijlstra

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote:
>
> Although we can do that using spin_is_contended() and cond_resched(),
> changing cond_resched_lock() to satisfy such a need is another option.
>
Yeah, not a pretty patch. Changing all cond_resched_lock() sites just to
change one and in such an ugly way too.

So what's the impact of making spin_needbreak() work for !PREEMPT?
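
For reference, spin_needbreak() is defined in include/linux/spinlock.h
roughly as follows; making it work for !PREEMPT would mean dropping the
#ifdef and always checking contention:

static inline int spin_needbreak(spinlock_t *lock)
{
#ifdef CONFIG_PREEMPT
        return spin_is_contended(lock);
#else
        return 0;
#endif
}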

2012-05-03 12:22:51

by Takuya Yoshikawa

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 03 May 2012 10:35:27 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote:
> >
> > Although we can do that using spin_is_contended() and cond_resched(),
> > changing cond_resched_lock() to satisfy such a need is another option.
> >
> Yeah, not a pretty patch. Changing all cond_resched_lock() sites just to
> change one and in such an ugly way too.
>
> So what's the impact of making spin_needbreak() work for !PREEMPT?

Although the real use case is out of this RFC patch, we are now discussing
a case in which we may hold a spin_lock for long time, ms order, depending
on workload; and in that case, other threads -- VCPU threads -- should be
given higher priority for that problematic lock.

I wanted to hear whether other users also have similar needs. If so, it
may be worth making the API a bit more generic.

But I could not find a clean solution for that. Do you think that using
spin_is_contended() directly is the way to go?

Thanks,
Takuya

2012-05-03 12:29:30

by Peter Zijlstra

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> Although the real use case is out of this RFC patch, we are now discussing
> a case in which we may hold a spin_lock for long time, ms order, depending
> on workload; and in that case, other threads -- VCPU threads -- should be
> given higher priority for that problematic lock.

Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
secondly, why isn't TIF_RESCHED being set if it's running that long? That
should still make cond_resched_lock() break.

2012-05-03 12:47:54

by Avi Kivity

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > Although the real use case is out of this RFC patch, we are now discussing
> > a case in which we may hold a spin_lock for long time, ms order, depending
> > on workload; and in that case, other threads -- VCPU threads -- should be
> > given higher priority for that problematic lock.
>
> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,

In fact with your mm preemptibility work it can be made into a mutex, if
the entire mmu notifier path can be done in task context. However it
ends up a strange mutex - you can sleep while holding it but you may not
allocate, because you might recurse into an mmu notifier again.

Most uses of the lock only involve tweaking some bits though.

> secondly why isn't TIF_RESCHED being set if its running that long? That
> should still make cond_resched_lock() break.


--
error compiling committee.c: too many arguments to function

2012-05-03 13:00:56

by Takuya Yoshikawa

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 03 May 2012 14:29:10 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > Although the real use case is out of this RFC patch, we are now discussing
> > a case in which we may hold a spin_lock for long time, ms order, depending
> > on workload; and in that case, other threads -- VCPU threads -- should be
> > given higher priority for that problematic lock.
>
> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,

I agree with you in principle, but isn't cond_resched_lock() there for that?

> secondly why isn't TIF_RESCHED being set if its running that long? That
> should still make cond_resched_lock() break.

I see.

I did some tests using spin_is_contended() and need_resched() and saw
that need_resched() was called as often as spin_is_contended(), so
experimentally I understand your point.

But as I could not see why spin_needbreak() was differently implemented
depending on CONFIG_PREEMPT, I wanted to understand the meaning.

Thanks,
Takuya

2012-05-03 14:11:15

by Takuya Yoshikawa

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 03 May 2012 15:47:26 +0300
Avi Kivity <[email protected]> wrote:

> On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > Although the real use case is out of this RFC patch, we are now discussing
> > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > on workload; and in that case, other threads -- VCPU threads -- should be
> > > given higher priority for that problematic lock.
> >
> > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
>
> In fact with your mm preemptibility work it can be made into a mutex, if
> the entire mmu notifier path can be done in task context. However it
> ends up a strange mutex - you can sleep while holding it but you may not
> allocate, because you might recurse into an mmu notifier again.
>
> Most uses of the lock only involve tweaking some bits though.

I might find a real way to go.

After your "mmu_lock -- TLB-flush" decoupling, we can change the current
get_dirty work flow like this:

for ... {
    take mmu_lock
    for 4K*8 gfns { // with 4KB dirty_bitmap_buffer
        xchg dirty bits // 64/32 gfns at once
        write protect them
    }
    release mmu_lock
    copy_to_user
}
TLB flush

This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
so long.
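
In C-like form, with placeholder helpers (grab_dirty_bits() and
write_protect_dirty_gfns() are not existing KVM functions, just names for
the two steps above), the loop would be roughly:

        for (offset = 0; offset < bitmap_bytes; offset += BUF_BYTES) {
                chunk = min_t(unsigned long, bitmap_bytes - offset, BUF_BYTES);

                spin_lock(&kvm->mmu_lock);
                /* xchg the dirty bits into the 4KB buffer, 64/32 gfns at once */
                grab_dirty_bits(memslot, offset, chunk, buffer);
                /* write protect exactly the gfns whose bits were set */
                write_protect_dirty_gfns(kvm, memslot, buffer, chunk);
                spin_unlock(&kvm->mmu_lock);

                if (copy_to_user(user_bitmap + offset, buffer, chunk))
                        return -EFAULT;
        }
        kvm_flush_remote_tlbs(kvm);     /* single TLB flush after the walk */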

I should have thought of a way not to hold the spin_lock so long, as Peter
said. My lack of thinking might be the real problem.

Thanks,
Takuya

2012-05-03 14:27:32

by Avi Kivity

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On 05/03/2012 05:11 PM, Takuya Yoshikawa wrote:
> On Thu, 03 May 2012 15:47:26 +0300
> Avi Kivity <[email protected]> wrote:
>
> > On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > > Although the real use case is out of this RFC patch, we are now discussing
> > > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > > on workload; and in that case, other threads -- VCPU threads -- should be
> > > > given higher priority for that problematic lock.
> > >
> > > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> >
> > In fact with your mm preemptibility work it can be made into a mutex, if
> > the entire mmu notifier path can be done in task context. However it
> > ends up a strange mutex - you can sleep while holding it but you may not
> > allocate, because you might recurse into an mmu notifier again.
> >
> > Most uses of the lock only involve tweaking some bits though.
>
> I might find a real way to go.
>
> After your "mmu_lock -- TLB-flush" decoupling, we can change the current
> get_dirty work flow like this:
>
> for ... {
>     take mmu_lock
>     for 4K*8 gfns { // with 4KB dirty_bitmap_buffer
>         xchg dirty bits // 64/32 gfns at once
>         write protect them
>     }
>     release mmu_lock
>     copy_to_user
> }
> TLB flush
>
> This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
> so long.

Good idea. Hopefully the lock acquisition costs are low enough - we're
adding two atomic operations per iteration here.


--
error compiling committee.c: too many arguments to function

2012-05-03 14:38:40

by Avi Kivity

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On 05/03/2012 05:27 PM, Avi Kivity wrote:
> On 05/03/2012 05:11 PM, Takuya Yoshikawa wrote:
> > On Thu, 03 May 2012 15:47:26 +0300
> > Avi Kivity <[email protected]> wrote:
> >
> > > On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > > > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > > > Although the real use case is out of this RFC patch, we are now discussing
> > > > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > > > on workload; and in that case, other threads -- VCPU threads -- should be
> > > > > given higher priority for that problematic lock.
> > > >
> > > > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> > >
> > > In fact with your mm preemptibility work it can be made into a mutex, if
> > > the entire mmu notifier path can be done in task context. However it
> > > ends up a strange mutex - you can sleep while holding it but you may not
> > > allocate, because you might recurse into an mmu notifier again.
> > >
> > > Most uses of the lock only involve tweaking some bits though.
> >
> > I might find a real way to go.
> >
> > After your "mmu_lock -- TLB-flush" decoupling, we can change the current
> > get_dirty work flow like this:
> >
> > for ... {
> >     take mmu_lock
> >     for 4K*8 gfns { // with 4KB dirty_bitmap_buffer
> >         xchg dirty bits // 64/32 gfns at once
> >         write protect them
> >     }
> >     release mmu_lock
> >     copy_to_user
> > }
> > TLB flush
> >
> > This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
> > so long.
>
> Good idea. Hopefully the lock acquisition costs are low enough - we're
> adding two atomic operations per iteration here.
>

btw, this requires my kvm_cond_flush_remote_tlbs(). Otherwise another
thread can acquire the lock, see a pagetable marked read-only by this
code, and proceed to shadow it, while the guest still has a writeable
tlb entry pointing at it.
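
A minimal sketch of that idea, assuming a "flush pending" marker such as
kvm->tlbs_dirty (the exact form of kvm_cond_flush_remote_tlbs() was still
under discussion at this point):

static void kvm_cond_flush_remote_tlbs(struct kvm *kvm)
{
        /* flush only if an earlier path deferred a TLB flush */
        if (kvm->tlbs_dirty)
                kvm_flush_remote_tlbs(kvm);
}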

--
error compiling committee.c: too many arguments to function

2012-05-03 15:47:49

by Peter Zijlstra

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> But as I could not see why spin_needbreak() was differently
> implemented
> depending on CONFIG_PREEMPT, I wanted to understand the meaning.

It's been that way since before voluntary preemption was introduced, so
it's possible Ingo simply missed that spot and nobody noticed until now.

Ingo, do you have any recollections from back when?

2012-05-04 02:43:41

by Michael wang

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On 05/03/2012 09:00 PM, Takuya Yoshikawa wrote:

> On Thu, 03 May 2012 14:29:10 +0200
> Peter Zijlstra <[email protected]> wrote:
>
>> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
>>> Although the real use case is out of this RFC patch, we are now discussing
>>> a case in which we may hold a spin_lock for long time, ms order, depending
>>> on workload; and in that case, other threads -- VCPU threads -- should be
>>> given higher priority for that problematic lock.
>>
>> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
>
> I agree with you in principle, but isn't cond_resched_lock() there for that?
>
>> secondly why isn't TIF_RESCHED being set if its running that long? That
>> should still make cond_resched_lock() break.
>
> I see.
>
> I did some tests using spin_is_contended() and need_resched() and saw
> that need_resched() was called as often as spin_is_contended(), so
> experimentally I understand your point.
>
> But as I could not see why spin_needbreak() was differently implemented
> depending on CONFIG_PREEMPT, I wanted to understand the meaning.


I think enabling CONFIG_PREEMPT means allowing preemption in the kernel,
so if it is disabled, we can't reschedule a task while it is running in
the kernel rather than in user space.

Since cond_resched_lock() is invoked in the kernel, and cpu_relax() looks
like it gives up the cpu (I'm not sure whether it will invoke schedule on
some arch, just because of that name...), we can't break out of the
critical section when CONFIG_PREEMPT is disabled, because that would mean
kernel preemption where it is not allowed.

Maybe that's the reason why we need to consider CONFIG_PREEMPT in
spin_needbreak().

Regards,
Michael Wang

>
> Thanks,
> Takuya

2012-05-10 22:04:06

by Takuya Yoshikawa

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

Replaced Ingo's address with kernel.org one,

On Thu, 03 May 2012 17:47:30 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> > But as I could not see why spin_needbreak() was differently
> > implemented
> > depending on CONFIG_PREEMPT, I wanted to understand the meaning.
>
> Its been that way since before voluntary preemption was introduced, so
> its possible Ingo simply missed that spot and nobody noticed until now.
>
> Ingo, do you have any recollections from back when?

ping

Takuya

2012-05-18 07:26:14

by Ingo Molnar

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()


* Takuya Yoshikawa <[email protected]> wrote:

> Replaced Ingo's address with kernel.org one,
>
> On Thu, 03 May 2012 17:47:30 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> > > But as I could not see why spin_needbreak() was differently
> > > implemented
> > > depending on CONFIG_PREEMPT, I wanted to understand the meaning.
> >
> > Its been that way since before voluntary preemption was introduced, so
> > its possible Ingo simply missed that spot and nobody noticed until now.
> >
> > Ingo, do you have any recollections from back when?
>
> ping

I'm not sure we had a usable spin_is_contended() back then, nor
was the !PREEMPT case in my mind really.

( The patch looks ugly though, in 99% of the lines it just does
something that cond_resched_lock() itself could do. )

Thanks,

Ingo

2012-05-18 16:10:11

by Takuya Yoshikawa

Subject: Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

On Fri, 18 May 2012 09:26:05 +0200
Ingo Molnar <[email protected]> wrote:

> I'm not sure we had a usable spin_is_contended() back then, nor
> was the !PREEMPT case in my mind really.

The fact that both spin_needbreak() and spin_is_contended() can be
used outside of sched is a bit confusing.

For example, mm/compaction.c uses spin_is_contended(), while mm/memory.c
uses spin_needbreak().
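
For illustration, the two usage patterns look roughly like this
(paraphrased, not verbatim kernel code):

        /* mm/memory.c style: spin_needbreak() on the pte locks */
        if (progress >= 32) {
                progress = 0;
                if (need_resched() ||
                    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
                        break;
        }

        /* mm/compaction.c style: spin_is_contended() on zone->lru_lock */
        if (need_resched() || spin_is_contended(&zone->lru_lock)) {
                spin_unlock_irq(&zone->lru_lock);
                cond_resched();
                spin_lock_irq(&zone->lru_lock);
        }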

BTW, the actual users of spin_is_contended() look to be only:
mm/compaction.c
security/keys/gc.c

> ( The patch looks ugly though, in 99% of the lines it just does
> something that cond_resched_lock() itself could do. )

Please ignore the patch. I have already found a way to solve my
problem without cond_resched().

Thanks,
Takuya