LinuxLists.cc - [PATCH 1/2] freezer: add unsafe versions of freezable helpers

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, 6 May 2013 10:50:25 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, May 03, 2013 at 02:04:09PM -0700, Colin Cross wrote:
> > NFS calls the freezable helpers with locks held, which is unsafe
> > and caused lockdep warnings when 6aa9707 "lockdep: check that no
> > locks held at freeze time" was applied (reverted in dbf520a).
> > Add new *_unsafe versions of the helpers that will not run the
> > lockdep test when 6aa9707 is reapplied, and call them from NFS.
>
> Am I the only one that would like a bit more information about why NFS does
> this and why we need to work around it?
>

NFS does this because this is how we initially "fixed" it to not block
the freezer. It sucks and is prone to deadlocking if you selectively
freeze processes via the cgroup freezer. It does basically "work" in
most cases though if the freezer is running to suspend the whole system.

I'm looking at fixing this by instead setting points within the NFS/RPC
layers that cause these calls to return something like ERESTARTSYS
instead when a freeze event comes in, depending on whether a call has
already been transmitted to the server.

That's likely to take a while though, and in the meantime we don't want
lockdep complaining every time someone fires off a RPC call with locks
held. So for now, we'd like to "opt out" of these lockdep warnings.

--
Jeff Layton <[email protected]>

2013-05-06 22:14:59

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

Hello, Linus.

On Mon, May 06, 2013 at 02:58:31PM -0700, Linus Torvalds wrote:
> Quite frankly, is it worth resurrecting these patches at all?
>
> The only things it actually complained about are not worth the pain
> fixing and are getting explicitly not warned about - is there any
> reason to believe the patches are worth maintaining and the extra
> complexity is worth it?

It might not be too useful for nfs but Colin has a patchset spreading
the use of the freezable schedules in a number of places in an attempt
to reduce freeze latency for android, which includes introducing a lot
of schedule_*_freezable() constructs, so I think we need these
warnings one way or the other so that we don't end up with more
problems like nfs currently has.

Thakns.

--
tejun

2013-05-06 19:57:55

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, May 6, 2013 at 3:56 AM, Jeff Layton <[email protected]> wrote:
> On Fri, 3 May 2013 14:04:09 -0700
> Colin Cross <[email protected]> wrote:
>
>> NFS calls the freezable helpers with locks held, which is unsafe
>> and caused lockdep warnings when 6aa9707 "lockdep: check that no
>> locks held at freeze time" was applied (reverted in dbf520a).
>> Add new *_unsafe versions of the helpers that will not run the
>> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>>
>> Signed-off-by: Colin Cross <[email protected]>
>> ---
>> fs/nfs/inode.c | 2 +-
>> fs/nfs/nfs3proc.c | 2 +-
>> fs/nfs/nfs4proc.c | 4 ++--
>> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
>> net/sunrpc/sched.c | 2 +-
>> 5 files changed, 46 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index 1f94167..53cbee5 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
>> {
>> if (fatal_signal_pending(current))
>> return -ERESTARTSYS;
>> - freezable_schedule();
>> + freezable_schedule_unsafe();
>> return 0;
>> }
>> EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
>> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
>> index 43ea96c..ce90eb4 100644
>> --- a/fs/nfs/nfs3proc.c
>> +++ b/fs/nfs/nfs3proc.c
>> @@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
>> res = rpc_call_sync(clnt, msg, flags);
>> if (res != -EJUKEBOX)
>> break;
>> - freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
>> + freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
>> res = -ERESTARTSYS;
>> } while (!fatal_signal_pending(current));
>> return res;
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 0ad025e..a236077 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
>> *timeout = NFS4_POLL_RETRY_MIN;
>> if (*timeout > NFS4_POLL_RETRY_MAX)
>> *timeout = NFS4_POLL_RETRY_MAX;
>> - freezable_schedule_timeout_killable(*timeout);
>> + freezable_schedule_timeout_killable_unsafe(*timeout);
>> if (fatal_signal_pending(current))
>> res = -ERESTARTSYS;
>> *timeout <<= 1;
>> @@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
>> static unsigned long
>> nfs4_set_lock_task_retry(unsigned long timeout)
>> {
>> - freezable_schedule_timeout_killable(timeout);
>> + freezable_schedule_timeout_killable_unsafe(timeout);
>> timeout <<= 1;
>> if (timeout > NFS4_LOCK_MAXTIMEOUT)
>> return NFS4_LOCK_MAXTIMEOUT;
>> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
>> index e70df40..5b31e21c 100644
>> --- a/include/linux/freezer.h
>> +++ b/include/linux/freezer.h
>> @@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
>> extern void thaw_processes(void);
>> extern void thaw_kernel_threads(void);
>>
>> -static inline bool try_to_freeze(void)
>> +/*
>> + * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
>> + * If try_to_freeze causes a lockdep warning it means the caller may deadlock
>> + */
>> +static inline bool try_to_freeze_unsafe(void)
>> {
>> might_sleep();
>> if (likely(!freezing(current)))
>> @@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
>> return __refrigerator(false);
>> }
>>
>> +static inline bool try_to_freeze(void)
>> +{
>> + return try_to_freeze_unsafe();
>> +}
>> +
>> extern bool freeze_task(struct task_struct *p);
>> extern bool set_freezable(void);
>>
>> @@ -115,6 +124,14 @@ static inline void freezer_count(void)
>> try_to_freeze();
>> }
>>
>> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> +static inline void freezer_count_unsafe(void)
>> +{
>> + current->flags &= ~PF_FREEZER_SKIP;
>> + smp_mb();
>> + try_to_freeze_unsafe();
>> +}
>> +
>> /**
>> * freezer_should_skip - whether to skip a task when determining frozen
>> * state is reached
>> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
>> freezer_count(); \
>> })
>>
>> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> +#define freezable_schedule_unsafe() \
>> +({ \
>> + freezer_do_not_count(); \
>> + schedule(); \
>> + freezer_count_unsafe(); \
>> +})
>> +
>> /* Like schedule_timeout_killable(), but should not block the freezer. */
>> #define freezable_schedule_timeout_killable(timeout) \
>> ({ \
>> @@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
>> __retval; \
>> })
>>
>> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
>> +({ \
>> + long __retval; \
>> + freezer_do_not_count(); \
>> + __retval = schedule_timeout_killable(timeout); \
>> + freezer_count_unsafe(); \
>> + __retval; \
>> +})
>> +
>> /*
>> * Freezer-friendly wrappers around wait_event_interruptible(),
>> * wait_event_killable() and wait_event_interruptible_timeout(), originally
>> @@ -225,9 +260,14 @@ static inline void set_freezable(void) {}
>>
>> #define freezable_schedule() schedule()
>>
>> +#define freezable_schedule_unsafe() schedule()
>> +
>> #define freezable_schedule_timeout_killable(timeout) \
>> schedule_timeout_killable(timeout)
>>
>> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
>> + schedule_timeout_killable(timeout)
>> +
>> #define wait_event_freezable(wq, condition) \
>> wait_event_interruptible(wq, condition)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index f8529fc..8dcfadc 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
>> {
>> if (fatal_signal_pending(current))
>> return -ERESTARTSYS;
>> - freezable_schedule();
>> + freezable_schedule_unsafe();
>> return 0;
>> }
>>
>
> Looks reasonable, but note that CIFS uses wait_event_freezekillable
> with locks held too, which will likely have the same problem and will
> need the same workaround for now.

I didn't see any lockdep warnings reported on the mailing list when
the lockdep patch was previously applied, can you point me to the lock
that is held when wait_event_freezkillable is called? I don't want to
add an _unsafe call where its not needed and cause more confusion.

2013-05-05 00:23:02

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sat, May 4, 2013 at 5:05 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> >> >> --- a/kernel/exit.c
>> >> >> +++ b/kernel/exit.c
>> >> >> @@ -835,7 +835,7 @@ void do_exit(long code)
>> >> >> /*
>> >> >> * Make sure we are holding no locks:
>> >> >> */
>> >> >> - debug_check_no_locks_held(tsk);
>> >> >> + debug_check_no_locks_held();
>> >> >
>> >> > Is task guaranteed == current?
>> >>
>> >> Yes, the first line of do_exit is:
>> >> struct task_struct *tsk = current;
>> >
>> > Aha, I understand it now.
>> >
>> > Accessing current is slower than local variable. So your "new" code
>> > will work but will be slower. Please revert this part.
>>
>> Using current instead of passing in tsk was done at Andrew Morton's
>> suggestion, and makes no difference from the freezer's perspective
>> since it would have to use current to get the task to pass in, so I'm
>> going to leave it as is.
>
> Well, current is:
>
> static inline struct thread_info *current_thread_info(void)
> {
> register unsigned long sp asm ("sp");
> return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
> }
>
> #define get_current() (current_thread_info()->task)
>
> #define current get_current()
>
> Instead of passing computed value to debug_check_no_locks_held(), you
> force it to be computed again. do_exit() performance matters for
> configure scripts, etc.
>
> I'd say it makes sense to keep the optimalization. akpm can correct
> me.

That translates to 3 instructions, with no memory accesses:
c0008350: e1a0300d mov r3, sp
c0008354: e3c32d7f bic r2, r3, #8128 ; 0x1fc0
c0008358: e3c2203f bic r2, r2, #63 ; 0x3f

2013-05-04 23:49:57

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sat, May 4, 2013 at 3:57 PM, Pavel Machek <[email protected]> wrote:
> On Sat 2013-05-04 13:27:23, Colin Cross wrote:
>> On Sat, May 4, 2013 at 6:04 AM, Pavel Machek <[email protected]> wrote:
>> > On Fri 2013-05-03 14:04:10, Colin Cross wrote:
>> >> From: Mandeep Singh Baines <[email protected]>
>> >>
>> >> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>> >> deadlock if the lock is later acquired in the suspend or hibernate path
>> >> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>> >> cgroup_freezer if a lock is held inside a frozen cgroup that is later
>> >> acquired by a process outside that group.
>> >
>> > Ok, but this does not explain why
>> >
>> >> --- a/include/linux/debug_locks.h
>> >> +++ b/include/linux/debug_locks.h
>> >> @@ -51,7 +51,7 @@ struct task_struct;
>> >> extern void debug_show_all_locks(void);
>> >> extern void debug_show_held_locks(struct task_struct *task);
>> >> extern void debug_check_no_locks_freed(const void *from, unsigned long len);
>> >> -extern void debug_check_no_locks_held(struct task_struct *task);
>> >> +extern void debug_check_no_locks_held(void);
>> >> #else
>> >> static inline void debug_show_all_locks(void)
>> >> {
>> >
>> > Removing task_struct argument from those functions is good idea?
>>
>> This is an existing patch that was merged in 3.9 and then reverted
>> again, so it has already been reviewed and accepted once.
>
> Well, it was also reverted once :-).

It was reverted because of problems in NFS, not because of any problem
with this patch.

>> >> --- a/kernel/exit.c
>> >> +++ b/kernel/exit.c
>> >> @@ -835,7 +835,7 @@ void do_exit(long code)
>> >> /*
>> >> * Make sure we are holding no locks:
>> >> */
>> >> - debug_check_no_locks_held(tsk);
>> >> + debug_check_no_locks_held();
>> >
>> > Is task guaranteed == current?
>>
>> Yes, the first line of do_exit is:
>> struct task_struct *tsk = current;
>
> Aha, I understand it now.
>
> Accessing current is slower than local variable. So your "new" code
> will work but will be slower. Please revert this part.

Using current instead of passing in tsk was done at Andrew Morton's
suggestion, and makes no difference from the freezer's perspective
since it would have to use current to get the task to pass in, so I'm
going to leave it as is.

2013-05-04 13:00:35

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

Hi!

> NFS calls the freezable helpers with locks held, which is unsafe
> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> locks held at freeze time" was applied (reverted in dbf520a).
> Add new *_unsafe versions of the helpers that will not run the
> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>
> Signed-off-by: Colin Cross <[email protected]>

Looks mostly good.

> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
> freezer_count(); \
> })
>
> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> +#define freezable_schedule_unsafe() \
> +({ \
> + freezer_do_not_count(); \
> + schedule(); \
> + freezer_count_unsafe(); \
> +})
> +

Make it inline function? :-). Add short explanation why it is good
idea?

> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout_killable(timeout); \
> + freezer_count_unsafe(); \
> + __retval; \
> +})

Function too?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 19:33:14

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

Hello,

On Mon, May 06, 2013 at 12:30:19PM -0700, Colin Cross wrote:
> > I don't care about %current change, especially given that it's a debug
> > interface but that really should be a separate patch, so please split
> > it out if you want it (and I think we want it).
>
> The current change was requested by akpm and was part of the original
> patch. Is it really worth confusing the history of this patch even
> more, applying it the first time, reverting it, and then applying it
> again in two parts?

I don't know. The patch seems confusing to me. It really is about
adding single lockdep annotation but comes with other changes. I
don't think it's a big deal either way but at least we wouldn't be
having this %current vs. @tsk conversation which is mostly irrelevant
to the actual proposed change, right? It really should have been a
separate patch from the beginning. Just refer to the original commit
and explain what happened?

Thanks.

--
tejun

2013-05-05 09:23:23

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

* Colin Cross <[email protected]> wrote:

> NFS calls the freezable helpers with locks held, which is unsafe
> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> locks held at freeze time" was applied (reverted in dbf520a).
> Add new *_unsafe versions of the helpers that will not run the
> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>
> Signed-off-by: Colin Cross <[email protected]>
> ---
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs3proc.c | 2 +-
> fs/nfs/nfs4proc.c | 4 ++--
> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
> net/sunrpc/sched.c | 2 +-
> 5 files changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 1f94167..53cbee5 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
> {
> if (fatal_signal_pending(current))
> return -ERESTARTSYS;
> - freezable_schedule();
> + freezable_schedule_unsafe();

I'd suggest naming such variants _unkillable() instead of _unsafe().

There's nothing inherently 'unsafe' about it: the user asked for a hard
NFS mount and is getting it: with the side effect that it exposes the
machine to network delays in a 'hard' way as well. Which means suspend may
block indefinitely as well on network failure.

So it's two conflicting user requirements: 'hard NFS mount' and 'suspend
now'. We pick the lesser evil, the requirement that is considered higher
prio: the hard NFS mount in this case.

Thanks,

Ingo

2013-05-06 08:52:11

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Fri, May 03, 2013 at 02:04:09PM -0700, Colin Cross wrote:
> NFS calls the freezable helpers with locks held, which is unsafe
> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> locks held at freeze time" was applied (reverted in dbf520a).
> Add new *_unsafe versions of the helpers that will not run the
> lockdep test when 6aa9707 is reapplied, and call them from NFS.

Am I the only one that would like a bit more information about why NFS does
this and why we need to work around it?

>From replies in this thread I surmise its got something to do with hard NFS
mounts. And I suppose I could go apply google to lkml and try and find the
previous discussion, but really this should be in the Changelog.

2013-05-06 10:56:52

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Fri, 3 May 2013 14:04:09 -0700
Colin Cross <[email protected]> wrote:

> NFS calls the freezable helpers with locks held, which is unsafe
> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> locks held at freeze time" was applied (reverted in dbf520a).
> Add new *_unsafe versions of the helpers that will not run the
> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>
> Signed-off-by: Colin Cross <[email protected]>
> ---
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs3proc.c | 2 +-
> fs/nfs/nfs4proc.c | 4 ++--
> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
> net/sunrpc/sched.c | 2 +-
> 5 files changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 1f94167..53cbee5 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
> {
> if (fatal_signal_pending(current))
> return -ERESTARTSYS;
> - freezable_schedule();
> + freezable_schedule_unsafe();
> return 0;
> }
> EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index 43ea96c..ce90eb4 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
> res = rpc_call_sync(clnt, msg, flags);
> if (res != -EJUKEBOX)
> break;
> - freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
> + freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
> res = -ERESTARTSYS;
> } while (!fatal_signal_pending(current));
> return res;
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 0ad025e..a236077 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
> *timeout = NFS4_POLL_RETRY_MIN;
> if (*timeout > NFS4_POLL_RETRY_MAX)
> *timeout = NFS4_POLL_RETRY_MAX;
> - freezable_schedule_timeout_killable(*timeout);
> + freezable_schedule_timeout_killable_unsafe(*timeout);
> if (fatal_signal_pending(current))
> res = -ERESTARTSYS;
> *timeout <<= 1;
> @@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
> static unsigned long
> nfs4_set_lock_task_retry(unsigned long timeout)
> {
> - freezable_schedule_timeout_killable(timeout);
> + freezable_schedule_timeout_killable_unsafe(timeout);
> timeout <<= 1;
> if (timeout > NFS4_LOCK_MAXTIMEOUT)
> return NFS4_LOCK_MAXTIMEOUT;
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index e70df40..5b31e21c 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
> extern void thaw_processes(void);
> extern void thaw_kernel_threads(void);
>
> -static inline bool try_to_freeze(void)
> +/*
> + * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
> + * If try_to_freeze causes a lockdep warning it means the caller may deadlock
> + */
> +static inline bool try_to_freeze_unsafe(void)
> {
> might_sleep();
> if (likely(!freezing(current)))
> @@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
> return __refrigerator(false);
> }
>
> +static inline bool try_to_freeze(void)
> +{
> + return try_to_freeze_unsafe();
> +}
> +
> extern bool freeze_task(struct task_struct *p);
> extern bool set_freezable(void);
>
> @@ -115,6 +124,14 @@ static inline void freezer_count(void)
> try_to_freeze();
> }
>
> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> +static inline void freezer_count_unsafe(void)
> +{
> + current->flags &= ~PF_FREEZER_SKIP;
> + smp_mb();
> + try_to_freeze_unsafe();
> +}
> +
> /**
> * freezer_should_skip - whether to skip a task when determining frozen
> * state is reached
> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
> freezer_count(); \
> })
>
> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> +#define freezable_schedule_unsafe() \
> +({ \
> + freezer_do_not_count(); \
> + schedule(); \
> + freezer_count_unsafe(); \
> +})
> +
> /* Like schedule_timeout_killable(), but should not block the freezer. */
> #define freezable_schedule_timeout_killable(timeout) \
> ({ \
> @@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
> __retval; \
> })
>
> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> +({ \
> + long __retval; \
> + freezer_do_not_count(); \
> + __retval = schedule_timeout_killable(timeout); \
> + freezer_count_unsafe(); \
> + __retval; \
> +})
> +
> /*
> * Freezer-friendly wrappers around wait_event_interruptible(),
> * wait_event_killable() and wait_event_interruptible_timeout(), originally
> @@ -225,9 +260,14 @@ static inline void set_freezable(void) {}
>
> #define freezable_schedule() schedule()
>
> +#define freezable_schedule_unsafe() schedule()
> +
> #define freezable_schedule_timeout_killable(timeout) \
> schedule_timeout_killable(timeout)
>
> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> + schedule_timeout_killable(timeout)
> +
> #define wait_event_freezable(wq, condition) \
> wait_event_interruptible(wq, condition)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index f8529fc..8dcfadc 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
> {
> if (fatal_signal_pending(current))
> return -ERESTARTSYS;
> - freezable_schedule();
> + freezable_schedule_unsafe();
> return 0;
> }
>

Looks reasonable, but note that CIFS uses wait_event_freezekillable
with locks held too, which will likely have the same problem and will
need the same workaround for now.

--
Jeff Layton <[email protected]>

2013-05-05 01:13:36

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sat 2013-05-04 17:23:01, Colin Cross wrote:
> On Sat, May 4, 2013 at 5:05 PM, Pavel Machek <[email protected]> wrote:
> > Hi!
> >
> >> >> >> --- a/kernel/exit.c
> >> >> >> +++ b/kernel/exit.c
> >> >> >> @@ -835,7 +835,7 @@ void do_exit(long code)
> >> >> >> /*
> >> >> >> * Make sure we are holding no locks:
> >> >> >> */
> >> >> >> - debug_check_no_locks_held(tsk);
> >> >> >> + debug_check_no_locks_held();
> >> >> >
> >> >> > Is task guaranteed == current?
> >> >>
> >> >> Yes, the first line of do_exit is:
> >> >> struct task_struct *tsk = current;
> >> >
> >> > Aha, I understand it now.
> >> >
> >> > Accessing current is slower than local variable. So your "new" code
> >> > will work but will be slower. Please revert this part.
> >>
> >> Using current instead of passing in tsk was done at Andrew Morton's
> >> suggestion, and makes no difference from the freezer's perspective
> >> since it would have to use current to get the task to pass in, so I'm
> >> going to leave it as is.
> >
> > Well, current is:
> >
> > static inline struct thread_info *current_thread_info(void)
> > {
> > register unsigned long sp asm ("sp");
> > return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
> > }
> >
> > #define get_current() (current_thread_info()->task)
> >
> > #define current get_current()
> >
> > Instead of passing computed value to debug_check_no_locks_held(), you
> > force it to be computed again. do_exit() performance matters for
> > configure scripts, etc.
> >
> > I'd say it makes sense to keep the optimalization. akpm can correct
> > me.
>
> That translates to 3 instructions, with no memory accesses:
> c0008350: e1a0300d mov r3, sp
> c0008354: e3c32d7f bic r2, r3, #8128 ; 0x1fc0
> c0008358: e3c2203f bic r2, r2, #63 ; 0x3f

On ARM, you are right. It seems to have memory access on s390 and
x86-64. Ok, it is probably going to be in cache, but...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 08:57:37

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sun, May 05, 2013 at 11:18:44AM +0200, Ingo Molnar wrote:
> That's the old 32-bit x86 trick to compute 'current' from the kernel stack
> pointer.
>
> It can be done better - for example on platforms with optimized percpu
> variables (x86-64) it looks like this:

Doesn't i386 have all the funny per-cpu stuff too? So the only reason it still
does the fugly stack based thing is because nobody could be arsed to do the
work of converting it.

2013-05-06 14:33:31

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Mon, May 6, 2013 at 1:55 AM, Peter Zijlstra <[email protected]> wrote:
>
> Doesn't i386 have all the funny per-cpu stuff too? So the only reason it still
> does the fugly stack based thing is because nobody could be arsed to do the
> work of converting it.

Umm. That "fugly stack-based" thing is better than the per-cpu crap.

The percpu stuff implies a memory load. The stack based thing gets
thread_info with pure register accesses. Much better.

For "current()" the per-cpu thing may be better, but if you actually
need the thread-info (not the case here, but in other places), the
stack masking is superior when it works (ie when you don't have
multi-stack issues due to irq's etc)

Linus

2013-05-06 12:11:08

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

1;2403;0cOn Mon 2013-05-06 10:55:47, Peter Zijlstra wrote:
> On Sun, May 05, 2013 at 11:18:44AM +0200, Ingo Molnar wrote:
> > That's the old 32-bit x86 trick to compute 'current' from the kernel stack
> > pointer.
> >
> > It can be done better - for example on platforms with optimized percpu
> > variables (x86-64) it looks like this:
>
> Doesn't i386 have all the funny per-cpu stuff too? So the only reason it still
> does the fugly stack based thing is because nobody could be arsed to do the
> work of converting it.

(Note that the quoted example was from ARM. But also note that the
percpu stuff requires memory access, so...)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-05 09:18:49

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

* Pavel Machek <[email protected]> wrote:

> Hi!
>
> > >> >> --- a/kernel/exit.c
> > >> >> +++ b/kernel/exit.c
> > >> >> @@ -835,7 +835,7 @@ void do_exit(long code)
> > >> >> /*
> > >> >> * Make sure we are holding no locks:
> > >> >> */
> > >> >> - debug_check_no_locks_held(tsk);
> > >> >> + debug_check_no_locks_held();
> > >> >
> > >> > Is task guaranteed == current?
> > >>
> > >> Yes, the first line of do_exit is:
> > >> struct task_struct *tsk = current;
> > >
> > > Aha, I understand it now.
> > >
> > > Accessing current is slower than local variable. So your "new" code
> > > will work but will be slower. Please revert this part.
> >
> > Using current instead of passing in tsk was done at Andrew Morton's
> > suggestion, and makes no difference from the freezer's perspective
> > since it would have to use current to get the task to pass in, so I'm
> > going to leave it as is.
>
> Well, current is:
>
> static inline struct thread_info *current_thread_info(void)
> {
> register unsigned long sp asm ("sp");
> return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
> }

That's the old 32-bit x86 trick to compute 'current' from the kernel stack
pointer.

It can be done better - for example on platforms with optimized percpu
variables (x86-64) it looks like this:

static inline struct thread_info *current_thread_info(void)
{
struct thread_info *ti;
ti = (void *)(this_cpu_read_stable(kernel_stack) +
KERNEL_STACK_OFFSET - THREAD_SIZE);
return ti;
}

Which turns the computation of 'current' into a single instruction. For
example, to access current->pid [which fields is at offset 0x2a4], we get:

3ad0: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax
3ad7: 00 00
3ad9: 8b 80 a4 02 00 00 mov 0x2a4(%rax),%eax

I also agree with the removal of the 'tsk' parameter because the function
itself internally assumes that tsk == current.

[ We could perhaps rename the function to
debug_check_no_locks_held_curr(), to make it clear it operates on the
current task. ]

Thanks,

Ingo

2013-05-06 19:57:46

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Monday, May 06, 2013 12:33:07 PM Tejun Heo wrote:
> Hello,
>
> On Mon, May 06, 2013 at 12:30:19PM -0700, Colin Cross wrote:
> > > I don't care about %current change, especially given that it's a debug
> > > interface but that really should be a separate patch, so please split
> > > it out if you want it (and I think we want it).
> >
> > The current change was requested by akpm and was part of the original
> > patch. Is it really worth confusing the history of this patch even
> > more, applying it the first time, reverting it, and then applying it
> > again in two parts?
>
> I don't know. The patch seems confusing to me. It really is about
> adding single lockdep annotation but comes with other changes. I
> don't think it's a big deal either way but at least we wouldn't be
> having this %current vs. @tsk conversation which is mostly irrelevant
> to the actual proposed change, right? It really should have been a
> separate patch from the beginning. Just refer to the original commit
> and explain what happened?

Yeah. I'd prefer that very much.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-05-06 21:54:54

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, May 6, 2013 at 2:43 PM, Jeff Layton <[email protected]> wrote:
> On Mon, 6 May 2013 12:57:54 -0700
> Colin Cross <[email protected]> wrote:
>
>> On Mon, May 6, 2013 at 3:56 AM, Jeff Layton <[email protected]> wrote:
>> > On Fri, 3 May 2013 14:04:09 -0700
>> > Colin Cross <[email protected]> wrote:
>> >
>> >> NFS calls the freezable helpers with locks held, which is unsafe
>> >> and caused lockdep warnings when 6aa9707 "lockdep: check that no
>> >> locks held at freeze time" was applied (reverted in dbf520a).
>> >> Add new *_unsafe versions of the helpers that will not run the
>> >> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>> >>
>> >> Signed-off-by: Colin Cross <[email protected]>
>> >> ---
>> >> fs/nfs/inode.c | 2 +-
>> >> fs/nfs/nfs3proc.c | 2 +-
>> >> fs/nfs/nfs4proc.c | 4 ++--
>> >> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
>> >> net/sunrpc/sched.c | 2 +-
>> >> 5 files changed, 46 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> >> index 1f94167..53cbee5 100644
>> >> --- a/fs/nfs/inode.c
>> >> +++ b/fs/nfs/inode.c
>> >> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
>> >> {
>> >> if (fatal_signal_pending(current))
>> >> return -ERESTARTSYS;
>> >> - freezable_schedule();
>> >> + freezable_schedule_unsafe();
>> >> return 0;
>> >> }
>> >> EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
>> >> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
>> >> index 43ea96c..ce90eb4 100644
>> >> --- a/fs/nfs/nfs3proc.c
>> >> +++ b/fs/nfs/nfs3proc.c
>> >> @@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
>> >> res = rpc_call_sync(clnt, msg, flags);
>> >> if (res != -EJUKEBOX)
>> >> break;
>> >> - freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
>> >> + freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
>> >> res = -ERESTARTSYS;
>> >> } while (!fatal_signal_pending(current));
>> >> return res;
>> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> >> index 0ad025e..a236077 100644
>> >> --- a/fs/nfs/nfs4proc.c
>> >> +++ b/fs/nfs/nfs4proc.c
>> >> @@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
>> >> *timeout = NFS4_POLL_RETRY_MIN;
>> >> if (*timeout > NFS4_POLL_RETRY_MAX)
>> >> *timeout = NFS4_POLL_RETRY_MAX;
>> >> - freezable_schedule_timeout_killable(*timeout);
>> >> + freezable_schedule_timeout_killable_unsafe(*timeout);
>> >> if (fatal_signal_pending(current))
>> >> res = -ERESTARTSYS;
>> >> *timeout <<= 1;
>> >> @@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
>> >> static unsigned long
>> >> nfs4_set_lock_task_retry(unsigned long timeout)
>> >> {
>> >> - freezable_schedule_timeout_killable(timeout);
>> >> + freezable_schedule_timeout_killable_unsafe(timeout);
>> >> timeout <<= 1;
>> >> if (timeout > NFS4_LOCK_MAXTIMEOUT)
>> >> return NFS4_LOCK_MAXTIMEOUT;
>> >> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
>> >> index e70df40..5b31e21c 100644
>> >> --- a/include/linux/freezer.h
>> >> +++ b/include/linux/freezer.h
>> >> @@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
>> >> extern void thaw_processes(void);
>> >> extern void thaw_kernel_threads(void);
>> >>
>> >> -static inline bool try_to_freeze(void)
>> >> +/*
>> >> + * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
>> >> + * If try_to_freeze causes a lockdep warning it means the caller may deadlock
>> >> + */
>> >> +static inline bool try_to_freeze_unsafe(void)
>> >> {
>> >> might_sleep();
>> >> if (likely(!freezing(current)))
>> >> @@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
>> >> return __refrigerator(false);
>> >> }
>> >>
>> >> +static inline bool try_to_freeze(void)
>> >> +{
>> >> + return try_to_freeze_unsafe();
>> >> +}
>> >> +
>> >> extern bool freeze_task(struct task_struct *p);
>> >> extern bool set_freezable(void);
>> >>
>> >> @@ -115,6 +124,14 @@ static inline void freezer_count(void)
>> >> try_to_freeze();
>> >> }
>> >>
>> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> >> +static inline void freezer_count_unsafe(void)
>> >> +{
>> >> + current->flags &= ~PF_FREEZER_SKIP;
>> >> + smp_mb();
>> >> + try_to_freeze_unsafe();
>> >> +}
>> >> +
>> >> /**
>> >> * freezer_should_skip - whether to skip a task when determining frozen
>> >> * state is reached
>> >> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
>> >> freezer_count(); \
>> >> })
>> >>
>> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> >> +#define freezable_schedule_unsafe() \
>> >> +({ \
>> >> + freezer_do_not_count(); \
>> >> + schedule(); \
>> >> + freezer_count_unsafe(); \
>> >> +})
>> >> +
>> >> /* Like schedule_timeout_killable(), but should not block the freezer. */
>> >> #define freezable_schedule_timeout_killable(timeout) \
>> >> ({ \
>> >> @@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
>> >> __retval; \
>> >> })
>> >>
>> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
>> >> +({ \
>> >> + long __retval; \
>> >> + freezer_do_not_count(); \
>> >> + __retval = schedule_timeout_killable(timeout); \
>> >> + freezer_count_unsafe(); \
>> >> + __retval; \
>> >> +})
>> >> +
>> >> /*
>> >> * Freezer-friendly wrappers around wait_event_interruptible(),
>> >> * wait_event_killable() and wait_event_interruptible_timeout(), originally
>> >> @@ -225,9 +260,14 @@ static inline void set_freezable(void) {}
>> >>
>> >> #define freezable_schedule() schedule()
>> >>
>> >> +#define freezable_schedule_unsafe() schedule()
>> >> +
>> >> #define freezable_schedule_timeout_killable(timeout) \
>> >> schedule_timeout_killable(timeout)
>> >>
>> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
>> >> + schedule_timeout_killable(timeout)
>> >> +
>> >> #define wait_event_freezable(wq, condition) \
>> >> wait_event_interruptible(wq, condition)
>> >>
>> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> >> index f8529fc..8dcfadc 100644
>> >> --- a/net/sunrpc/sched.c
>> >> +++ b/net/sunrpc/sched.c
>> >> @@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
>> >> {
>> >> if (fatal_signal_pending(current))
>> >> return -ERESTARTSYS;
>> >> - freezable_schedule();
>> >> + freezable_schedule_unsafe();
>> >> return 0;
>> >> }
>> >>
>> >
>> > Looks reasonable, but note that CIFS uses wait_event_freezekillable
>> > with locks held too, which will likely have the same problem and will
>> > need the same workaround for now.
>>
>> I didn't see any lockdep warnings reported on the mailing list when
>> the lockdep patch was previously applied, can you point me to the lock
>> that is held when wait_event_freezkillable is called? I don't want to
>> add an _unsafe call where its not needed and cause more confusion.
>
> It's pretty much all of the same VFS-level locks...
>
> Basically, when a process wants to send a synchronous SMB to a CIFS
> server, it'll send off the request and then call wait_for_response() to
> wait on the reply. If you need a particular call stack, then you can
> look in the rmdir() codepath as an example:
>
> vfs_rmdir takes the i_mutex
> cifs_rmdir (via the inode ops)
> CIFSSMBRmDir (via the smb version ops)
> SendReceive
> wait_for_response
>
> ...at that point a freeze can occur while you're still holding the
> i_mutex.
>
> There are many other possibilities for other codepaths that end up in
> wait_for_response(). Once we get a solution in place for NFS, we'll
> need to do something very similar for CIFS.

Makes sense, I will add CIFS to the patch. Would you prefer it in the
same or separate patches.

2013-05-05 22:12:38

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

Hi!

> > NFS calls the freezable helpers with locks held, which is unsafe
> > and caused lockdep warnings when 6aa9707 "lockdep: check that no
> > locks held at freeze time" was applied (reverted in dbf520a).
> > Add new *_unsafe versions of the helpers that will not run the
> > lockdep test when 6aa9707 is reapplied, and call them from NFS.
> >
> > Signed-off-by: Colin Cross <[email protected]>
> > ---
> > fs/nfs/inode.c | 2 +-
> > fs/nfs/nfs3proc.c | 2 +-
> > fs/nfs/nfs4proc.c | 4 ++--
> > include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
> > net/sunrpc/sched.c | 2 +-
> > 5 files changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index 1f94167..53cbee5 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
> > {
> > if (fatal_signal_pending(current))
> > return -ERESTARTSYS;
> > - freezable_schedule();
> > + freezable_schedule_unsafe();
>
> I'd suggest naming such variants _unkillable() instead of _unsafe().
>
> There's nothing inherently 'unsafe' about it: the user asked for a hard
> NFS mount and is getting it: with the side effect that it exposes the
> machine to network delays in a 'hard' way as well. Which means suspend may
> block indefinitely as well on network failure.

You only want to use _unsafe() variants when you enter refrigerator
with locks held.

And entering refrigerator with locks is tricky... and unsafe :-). It
is not directly related to killability.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 21:44:28

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, 6 May 2013 12:57:54 -0700
Colin Cross <[email protected]> wrote:

> On Mon, May 6, 2013 at 3:56 AM, Jeff Layton <[email protected]> wrote:
> > On Fri, 3 May 2013 14:04:09 -0700
> > Colin Cross <[email protected]> wrote:
> >
> >> NFS calls the freezable helpers with locks held, which is unsafe
> >> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> >> locks held at freeze time" was applied (reverted in dbf520a).
> >> Add new *_unsafe versions of the helpers that will not run the
> >> lockdep test when 6aa9707 is reapplied, and call them from NFS.
> >>
> >> Signed-off-by: Colin Cross <[email protected]>
> >> ---
> >> fs/nfs/inode.c | 2 +-
> >> fs/nfs/nfs3proc.c | 2 +-
> >> fs/nfs/nfs4proc.c | 4 ++--
> >> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
> >> net/sunrpc/sched.c | 2 +-
> >> 5 files changed, 46 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> >> index 1f94167..53cbee5 100644
> >> --- a/fs/nfs/inode.c
> >> +++ b/fs/nfs/inode.c
> >> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
> >> {
> >> if (fatal_signal_pending(current))
> >> return -ERESTARTSYS;
> >> - freezable_schedule();
> >> + freezable_schedule_unsafe();
> >> return 0;
> >> }
> >> EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
> >> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> >> index 43ea96c..ce90eb4 100644
> >> --- a/fs/nfs/nfs3proc.c
> >> +++ b/fs/nfs/nfs3proc.c
> >> @@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
> >> res = rpc_call_sync(clnt, msg, flags);
> >> if (res != -EJUKEBOX)
> >> break;
> >> - freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
> >> + freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
> >> res = -ERESTARTSYS;
> >> } while (!fatal_signal_pending(current));
> >> return res;
> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >> index 0ad025e..a236077 100644
> >> --- a/fs/nfs/nfs4proc.c
> >> +++ b/fs/nfs/nfs4proc.c
> >> @@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
> >> *timeout = NFS4_POLL_RETRY_MIN;
> >> if (*timeout > NFS4_POLL_RETRY_MAX)
> >> *timeout = NFS4_POLL_RETRY_MAX;
> >> - freezable_schedule_timeout_killable(*timeout);
> >> + freezable_schedule_timeout_killable_unsafe(*timeout);
> >> if (fatal_signal_pending(current))
> >> res = -ERESTARTSYS;
> >> *timeout <<= 1;
> >> @@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
> >> static unsigned long
> >> nfs4_set_lock_task_retry(unsigned long timeout)
> >> {
> >> - freezable_schedule_timeout_killable(timeout);
> >> + freezable_schedule_timeout_killable_unsafe(timeout);
> >> timeout <<= 1;
> >> if (timeout > NFS4_LOCK_MAXTIMEOUT)
> >> return NFS4_LOCK_MAXTIMEOUT;
> >> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> >> index e70df40..5b31e21c 100644
> >> --- a/include/linux/freezer.h
> >> +++ b/include/linux/freezer.h
> >> @@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
> >> extern void thaw_processes(void);
> >> extern void thaw_kernel_threads(void);
> >>
> >> -static inline bool try_to_freeze(void)
> >> +/*
> >> + * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
> >> + * If try_to_freeze causes a lockdep warning it means the caller may deadlock
> >> + */
> >> +static inline bool try_to_freeze_unsafe(void)
> >> {
> >> might_sleep();
> >> if (likely(!freezing(current)))
> >> @@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
> >> return __refrigerator(false);
> >> }
> >>
> >> +static inline bool try_to_freeze(void)
> >> +{
> >> + return try_to_freeze_unsafe();
> >> +}
> >> +
> >> extern bool freeze_task(struct task_struct *p);
> >> extern bool set_freezable(void);
> >>
> >> @@ -115,6 +124,14 @@ static inline void freezer_count(void)
> >> try_to_freeze();
> >> }
> >>
> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> +static inline void freezer_count_unsafe(void)
> >> +{
> >> + current->flags &= ~PF_FREEZER_SKIP;
> >> + smp_mb();
> >> + try_to_freeze_unsafe();
> >> +}
> >> +
> >> /**
> >> * freezer_should_skip - whether to skip a task when determining frozen
> >> * state is reached
> >> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
> >> freezer_count(); \
> >> })
> >>
> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> +#define freezable_schedule_unsafe() \
> >> +({ \
> >> + freezer_do_not_count(); \
> >> + schedule(); \
> >> + freezer_count_unsafe(); \
> >> +})
> >> +
> >> /* Like schedule_timeout_killable(), but should not block the freezer. */
> >> #define freezable_schedule_timeout_killable(timeout) \
> >> ({ \
> >> @@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
> >> __retval; \
> >> })
> >>
> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> >> +({ \
> >> + long __retval; \
> >> + freezer_do_not_count(); \
> >> + __retval = schedule_timeout_killable(timeout); \
> >> + freezer_count_unsafe(); \
> >> + __retval; \
> >> +})
> >> +
> >> /*
> >> * Freezer-friendly wrappers around wait_event_interruptible(),
> >> * wait_event_killable() and wait_event_interruptible_timeout(), originally
> >> @@ -225,9 +260,14 @@ static inline void set_freezable(void) {}
> >>
> >> #define freezable_schedule() schedule()
> >>
> >> +#define freezable_schedule_unsafe() schedule()
> >> +
> >> #define freezable_schedule_timeout_killable(timeout) \
> >> schedule_timeout_killable(timeout)
> >>
> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> >> + schedule_timeout_killable(timeout)
> >> +
> >> #define wait_event_freezable(wq, condition) \
> >> wait_event_interruptible(wq, condition)
> >>
> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> >> index f8529fc..8dcfadc 100644
> >> --- a/net/sunrpc/sched.c
> >> +++ b/net/sunrpc/sched.c
> >> @@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
> >> {
> >> if (fatal_signal_pending(current))
> >> return -ERESTARTSYS;
> >> - freezable_schedule();
> >> + freezable_schedule_unsafe();
> >> return 0;
> >> }
> >>
> >
> > Looks reasonable, but note that CIFS uses wait_event_freezekillable
> > with locks held too, which will likely have the same problem and will
> > need the same workaround for now.
>
> I didn't see any lockdep warnings reported on the mailing list when
> the lockdep patch was previously applied, can you point me to the lock
> that is held when wait_event_freezkillable is called? I don't want to
> add an _unsafe call where its not needed and cause more confusion.

It's pretty much all of the same VFS-level locks...

Basically, when a process wants to send a synchronous SMB to a CIFS
server, it'll send off the request and then call wait_for_response() to
wait on the reply. If you need a particular call stack, then you can
look in the rmdir() codepath as an example:

vfs_rmdir takes the i_mutex
cifs_rmdir (via the inode ops)
CIFSSMBRmDir (via the smb version ops)
SendReceive
wait_for_response

...at that point a freeze can occur while you're still holding the
i_mutex.

There are many other possibilities for other codepaths that end up in
wait_for_response(). Once we get a solution in place for NFS, we'll
need to do something very similar for CIFS.

--
Jeff Layton <[email protected]>

2013-05-06 22:11:04

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, May 6, 2013 at 2:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Mon, May 6, 2013 at 2:54 PM, Colin Cross <[email protected]> wrote:
>>>
>>> There are many other possibilities for other codepaths that end up in
>>> wait_for_response(). Once we get a solution in place for NFS, we'll
>>> need to do something very similar for CIFS.
>>
>> Makes sense, I will add CIFS to the patch. Would you prefer it in the
>> same or separate patches.
>
> Quite frankly, is it worth resurrecting these patches at all?
>
> The only things it actually complained about are not worth the pain
> fixing and are getting explicitly not warned about - is there any
> reason to believe the patches are worth maintaining and the extra
> complexity is worth it?

There was at least one real other case caught when this patch was
applied: https://lkml.org/lkml/2013/3/4/390. Tejun asked that I
resurrect it because I'm adding some additional APIs similar to
freezable_schedule() and he wanted to make sure they didn't get used
improperly in the future.

2013-05-04 20:27:24

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sat, May 4, 2013 at 6:04 AM, Pavel Machek <[email protected]> wrote:
> On Fri 2013-05-03 14:04:10, Colin Cross wrote:
>> From: Mandeep Singh Baines <[email protected]>
>>
>> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>> deadlock if the lock is later acquired in the suspend or hibernate path
>> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>> cgroup_freezer if a lock is held inside a frozen cgroup that is later
>> acquired by a process outside that group.
>
> Ok, but this does not explain why
>
>> --- a/include/linux/debug_locks.h
>> +++ b/include/linux/debug_locks.h
>> @@ -51,7 +51,7 @@ struct task_struct;
>> extern void debug_show_all_locks(void);
>> extern void debug_show_held_locks(struct task_struct *task);
>> extern void debug_check_no_locks_freed(const void *from, unsigned long len);
>> -extern void debug_check_no_locks_held(struct task_struct *task);
>> +extern void debug_check_no_locks_held(void);
>> #else
>> static inline void debug_show_all_locks(void)
>> {
>
> Removing task_struct argument from those functions is good idea?

This is an existing patch that was merged in 3.9 and then reverted
again, so it has already been reviewed and accepted once.

>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -835,7 +835,7 @@ void do_exit(long code)
>> /*
>> * Make sure we are holding no locks:
>> */
>> - debug_check_no_locks_held(tsk);
>> + debug_check_no_locks_held();
>
> Is task guaranteed == current?

Yes, the first line of do_exit is:
struct task_struct *tsk = current;

2013-05-04 13:04:42

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Fri 2013-05-03 14:04:10, Colin Cross wrote:
> From: Mandeep Singh Baines <[email protected]>
>
> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
> deadlock if the lock is later acquired in the suspend or hibernate path
> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
> cgroup_freezer if a lock is held inside a frozen cgroup that is later
> acquired by a process outside that group.

Ok, but this does not explain why

> --- a/include/linux/debug_locks.h
> +++ b/include/linux/debug_locks.h
> @@ -51,7 +51,7 @@ struct task_struct;
> extern void debug_show_all_locks(void);
> extern void debug_show_held_locks(struct task_struct *task);
> extern void debug_check_no_locks_freed(const void *from, unsigned long len);
> -extern void debug_check_no_locks_held(struct task_struct *task);
> +extern void debug_check_no_locks_held(void);
> #else
> static inline void debug_show_all_locks(void)
> {

Removing task_struct argument from those functions is good idea?

> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -835,7 +835,7 @@ void do_exit(long code)
> /*
> * Make sure we are holding no locks:
> */
> - debug_check_no_locks_held(tsk);
> + debug_check_no_locks_held();

Is task guaranteed == current?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 21:58:32

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, May 6, 2013 at 2:54 PM, Colin Cross <[email protected]> wrote:
>>
>> There are many other possibilities for other codepaths that end up in
>> wait_for_response(). Once we get a solution in place for NFS, we'll
>> need to do something very similar for CIFS.
>
> Makes sense, I will add CIFS to the patch. Would you prefer it in the
> same or separate patches.

Quite frankly, is it worth resurrecting these patches at all?

The only things it actually complained about are not worth the pain
fixing and are getting explicitly not warned about - is there any
reason to believe the patches are worth maintaining and the extra
complexity is worth it?

Linus

2013-05-04 20:23:56

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Sat, May 4, 2013 at 6:00 AM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> NFS calls the freezable helpers with locks held, which is unsafe
>> and caused lockdep warnings when 6aa9707 "lockdep: check that no
>> locks held at freeze time" was applied (reverted in dbf520a).
>> Add new *_unsafe versions of the helpers that will not run the
>> lockdep test when 6aa9707 is reapplied, and call them from NFS.
>>
>> Signed-off-by: Colin Cross <[email protected]>
>
> Looks mostly good.
>
>> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
>> freezer_count(); \
>> })
>>
>> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> +#define freezable_schedule_unsafe() \
>> +({ \
>> + freezer_do_not_count(); \
>> + schedule(); \
>> + freezer_count_unsafe(); \
>> +})
>> +
>
> Make it inline function? :-). Add short explanation why it is good
> idea?

These are exact copies of the existing non-unsafe versions, except
they call freezer_count_unsafe() instead of freezer_count(). The next
version of my other patch stack that goes on top of this has a patch
to convert the macros in this file to static inline functions (at
least the ones that can be). I'd rather not mix it in with this patch
for ease of comparison with the existing calls.

>> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
>> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
>> +({ \
>> + long __retval; \
>> + freezer_do_not_count(); \
>> + __retval = schedule_timeout_killable(timeout); \
>> + freezer_count_unsafe(); \
>> + __retval; \
>> +})
>
> Function too?
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-04 22:55:53

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

Hi!

> >> NFS calls the freezable helpers with locks held, which is unsafe
> >> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> >> locks held at freeze time" was applied (reverted in dbf520a).
> >> Add new *_unsafe versions of the helpers that will not run the
> >> lockdep test when 6aa9707 is reapplied, and call them from NFS.
> >>
> >> Signed-off-by: Colin Cross <[email protected]>
> >
> > Looks mostly good.
> >
> >> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
> >> freezer_count(); \
> >> })
> >>
> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> +#define freezable_schedule_unsafe() \
> >> +({ \
> >> + freezer_do_not_count(); \
> >> + schedule(); \
> >> + freezer_count_unsafe(); \
> >> +})
> >> +
> >
> > Make it inline function? :-). Add short explanation why it is good
> > idea?
>
> These are exact copies of the existing non-unsafe versions, except
> they call freezer_count_unsafe() instead of freezer_count(). The next
> version of my other patch stack that goes on top of this has a patch
> to convert the macros in this file to static inline functions (at
> least the ones that can be). I'd rather not mix it in with this patch
> for ease of comparison with the existing calls.

Ok.

Acked-by: Pavel Machek <[email protected]>

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 19:03:27

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

2013-05-06 14:44:37

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Mon, May 06, 2013 at 07:33:28AM -0700, Linus Torvalds wrote:
> On Mon, May 6, 2013 at 1:55 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > Doesn't i386 have all the funny per-cpu stuff too? So the only reason it still
> > does the fugly stack based thing is because nobody could be arsed to do the
> > work of converting it.
>
> Umm. That "fugly stack-based" thing is better than the per-cpu crap.
>
> The percpu stuff implies a memory load. The stack based thing gets
> thread_info with pure register accesses. Much better.
>
> For "current()" the per-cpu thing may be better, but if you actually
> need the thread-info (not the case here, but in other places), the
> stack masking is superior when it works (ie when you don't have
> multi-stack issues due to irq's etc)

But you can do both right? Use per-cpu for current and stack frobbery for
current_thread_info().

That said, ISTR some risky bits where the stack frobbery went awry due to
irq-stacks which is the source for my feelings towards the stack frobbery.

That and of course that i386 and x86-64 behave differently for no apparent
reason.

2013-05-06 19:01:22

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Fri, May 03, 2013 at 02:04:10PM -0700, Colin Cross wrote:
> From: Mandeep Singh Baines <[email protected]>
>
> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
> deadlock if the lock is later acquired in the suspend or hibernate path
> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
> cgroup_freezer if a lock is held inside a frozen cgroup that is later
> acquired by a process outside that group.
>
> History:
> This patch was originally applied as 6aa9707099c and reverted in
> dbf520a9d7d4 because NFS was freezing with locks held. It was
> deemed better to keep the bad freeze point in NFS to allow laptops
> to suspend consistently. The previous patch in this series converts
> NFS to call _unsafe versions of the freezable helpers so that
> lockdep doesn't complain about them until a more correct fix
> can be applied.

I don't care about %current change, especially given that it's a debug
interface but that really should be a separate patch, so please split
it out if you want it (and I think we want it).

Thanks.

--
tejun

2013-05-06 22:00:30

[permalink] [raw]

Subject: Re: [PATCH 1/2] freezer: add unsafe versions of freezable helpers

On Mon, 6 May 2013 14:54:53 -0700
Colin Cross <[email protected]> wrote:

> On Mon, May 6, 2013 at 2:43 PM, Jeff Layton <[email protected]> wrote:
> > On Mon, 6 May 2013 12:57:54 -0700
> > Colin Cross <[email protected]> wrote:
> >
> >> On Mon, May 6, 2013 at 3:56 AM, Jeff Layton <[email protected]> wrote:
> >> > On Fri, 3 May 2013 14:04:09 -0700
> >> > Colin Cross <[email protected]> wrote:
> >> >
> >> >> NFS calls the freezable helpers with locks held, which is unsafe
> >> >> and caused lockdep warnings when 6aa9707 "lockdep: check that no
> >> >> locks held at freeze time" was applied (reverted in dbf520a).
> >> >> Add new *_unsafe versions of the helpers that will not run the
> >> >> lockdep test when 6aa9707 is reapplied, and call them from NFS.
> >> >>
> >> >> Signed-off-by: Colin Cross <[email protected]>
> >> >> ---
> >> >> fs/nfs/inode.c | 2 +-
> >> >> fs/nfs/nfs3proc.c | 2 +-
> >> >> fs/nfs/nfs4proc.c | 4 ++--
> >> >> include/linux/freezer.h | 42 +++++++++++++++++++++++++++++++++++++++++-
> >> >> net/sunrpc/sched.c | 2 +-
> >> >> 5 files changed, 46 insertions(+), 6 deletions(-)
> >> >>
> >> >> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> >> >> index 1f94167..53cbee5 100644
> >> >> --- a/fs/nfs/inode.c
> >> >> +++ b/fs/nfs/inode.c
> >> >> @@ -79,7 +79,7 @@ int nfs_wait_bit_killable(void *word)
> >> >> {
> >> >> if (fatal_signal_pending(current))
> >> >> return -ERESTARTSYS;
> >> >> - freezable_schedule();
> >> >> + freezable_schedule_unsafe();
> >> >> return 0;
> >> >> }
> >> >> EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
> >> >> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> >> >> index 43ea96c..ce90eb4 100644
> >> >> --- a/fs/nfs/nfs3proc.c
> >> >> +++ b/fs/nfs/nfs3proc.c
> >> >> @@ -33,7 +33,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
> >> >> res = rpc_call_sync(clnt, msg, flags);
> >> >> if (res != -EJUKEBOX)
> >> >> break;
> >> >> - freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
> >> >> + freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
> >> >> res = -ERESTARTSYS;
> >> >> } while (!fatal_signal_pending(current));
> >> >> return res;
> >> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >> >> index 0ad025e..a236077 100644
> >> >> --- a/fs/nfs/nfs4proc.c
> >> >> +++ b/fs/nfs/nfs4proc.c
> >> >> @@ -266,7 +266,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
> >> >> *timeout = NFS4_POLL_RETRY_MIN;
> >> >> if (*timeout > NFS4_POLL_RETRY_MAX)
> >> >> *timeout = NFS4_POLL_RETRY_MAX;
> >> >> - freezable_schedule_timeout_killable(*timeout);
> >> >> + freezable_schedule_timeout_killable_unsafe(*timeout);
> >> >> if (fatal_signal_pending(current))
> >> >> res = -ERESTARTSYS;
> >> >> *timeout <<= 1;
> >> >> @@ -4309,7 +4309,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
> >> >> static unsigned long
> >> >> nfs4_set_lock_task_retry(unsigned long timeout)
> >> >> {
> >> >> - freezable_schedule_timeout_killable(timeout);
> >> >> + freezable_schedule_timeout_killable_unsafe(timeout);
> >> >> timeout <<= 1;
> >> >> if (timeout > NFS4_LOCK_MAXTIMEOUT)
> >> >> return NFS4_LOCK_MAXTIMEOUT;
> >> >> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> >> >> index e70df40..5b31e21c 100644
> >> >> --- a/include/linux/freezer.h
> >> >> +++ b/include/linux/freezer.h
> >> >> @@ -46,7 +46,11 @@ extern int freeze_kernel_threads(void);
> >> >> extern void thaw_processes(void);
> >> >> extern void thaw_kernel_threads(void);
> >> >>
> >> >> -static inline bool try_to_freeze(void)
> >> >> +/*
> >> >> + * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
> >> >> + * If try_to_freeze causes a lockdep warning it means the caller may deadlock
> >> >> + */
> >> >> +static inline bool try_to_freeze_unsafe(void)
> >> >> {
> >> >> might_sleep();
> >> >> if (likely(!freezing(current)))
> >> >> @@ -54,6 +58,11 @@ static inline bool try_to_freeze(void)
> >> >> return __refrigerator(false);
> >> >> }
> >> >>
> >> >> +static inline bool try_to_freeze(void)
> >> >> +{
> >> >> + return try_to_freeze_unsafe();
> >> >> +}
> >> >> +
> >> >> extern bool freeze_task(struct task_struct *p);
> >> >> extern bool set_freezable(void);
> >> >>
> >> >> @@ -115,6 +124,14 @@ static inline void freezer_count(void)
> >> >> try_to_freeze();
> >> >> }
> >> >>
> >> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> >> +static inline void freezer_count_unsafe(void)
> >> >> +{
> >> >> + current->flags &= ~PF_FREEZER_SKIP;
> >> >> + smp_mb();
> >> >> + try_to_freeze_unsafe();
> >> >> +}
> >> >> +
> >> >> /**
> >> >> * freezer_should_skip - whether to skip a task when determining frozen
> >> >> * state is reached
> >> >> @@ -152,6 +169,14 @@ static inline bool freezer_should_skip(struct task_struct *p)
> >> >> freezer_count(); \
> >> >> })
> >> >>
> >> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> >> +#define freezable_schedule_unsafe() \
> >> >> +({ \
> >> >> + freezer_do_not_count(); \
> >> >> + schedule(); \
> >> >> + freezer_count_unsafe(); \
> >> >> +})
> >> >> +
> >> >> /* Like schedule_timeout_killable(), but should not block the freezer. */
> >> >> #define freezable_schedule_timeout_killable(timeout) \
> >> >> ({ \
> >> >> @@ -162,6 +187,16 @@ static inline bool freezer_should_skip(struct task_struct *p)
> >> >> __retval; \
> >> >> })
> >> >>
> >> >> +/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
> >> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> >> >> +({ \
> >> >> + long __retval; \
> >> >> + freezer_do_not_count(); \
> >> >> + __retval = schedule_timeout_killable(timeout); \
> >> >> + freezer_count_unsafe(); \
> >> >> + __retval; \
> >> >> +})
> >> >> +
> >> >> /*
> >> >> * Freezer-friendly wrappers around wait_event_interruptible(),
> >> >> * wait_event_killable() and wait_event_interruptible_timeout(), originally
> >> >> @@ -225,9 +260,14 @@ static inline void set_freezable(void) {}
> >> >>
> >> >> #define freezable_schedule() schedule()
> >> >>
> >> >> +#define freezable_schedule_unsafe() schedule()
> >> >> +
> >> >> #define freezable_schedule_timeout_killable(timeout) \
> >> >> schedule_timeout_killable(timeout)
> >> >>
> >> >> +#define freezable_schedule_timeout_killable_unsafe(timeout) \
> >> >> + schedule_timeout_killable(timeout)
> >> >> +
> >> >> #define wait_event_freezable(wq, condition) \
> >> >> wait_event_interruptible(wq, condition)
> >> >>
> >> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> >> >> index f8529fc..8dcfadc 100644
> >> >> --- a/net/sunrpc/sched.c
> >> >> +++ b/net/sunrpc/sched.c
> >> >> @@ -254,7 +254,7 @@ static int rpc_wait_bit_killable(void *word)
> >> >> {
> >> >> if (fatal_signal_pending(current))
> >> >> return -ERESTARTSYS;
> >> >> - freezable_schedule();
> >> >> + freezable_schedule_unsafe();
> >> >> return 0;
> >> >> }
> >> >>
> >> >
> >> > Looks reasonable, but note that CIFS uses wait_event_freezekillable
> >> > with locks held too, which will likely have the same problem and will
> >> > need the same workaround for now.
> >>
> >> I didn't see any lockdep warnings reported on the mailing list when
> >> the lockdep patch was previously applied, can you point me to the lock
> >> that is held when wait_event_freezkillable is called? I don't want to
> >> add an _unsafe call where its not needed and cause more confusion.
> >
> > It's pretty much all of the same VFS-level locks...
> >
> > Basically, when a process wants to send a synchronous SMB to a CIFS
> > server, it'll send off the request and then call wait_for_response() to
> > wait on the reply. If you need a particular call stack, then you can
> > look in the rmdir() codepath as an example:
> >
> > vfs_rmdir takes the i_mutex
> > cifs_rmdir (via the inode ops)
> > CIFSSMBRmDir (via the smb version ops)
> > SendReceive
> > wait_for_response
> >
> > ...at that point a freeze can occur while you're still holding the
> > i_mutex.
> >
> > There are many other possibilities for other codepaths that end up in
> > wait_for_response(). Once we get a solution in place for NFS, we'll
> > need to do something very similar for CIFS.
>
> Makes sense, I will add CIFS to the patch. Would you prefer it in the
> same or separate patches.

A new patch on top of this one would be fine with me, but I don't feel
strongly either way.

Thanks,
--
Jeff Layton <[email protected]>

2013-05-06 19:30:20

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Mon, May 6, 2013 at 12:01 PM, Tejun Heo <[email protected]> wrote:
> On Fri, May 03, 2013 at 02:04:10PM -0700, Colin Cross wrote:
>> From: Mandeep Singh Baines <[email protected]>
>>
>> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>> deadlock if the lock is later acquired in the suspend or hibernate path
>> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>> cgroup_freezer if a lock is held inside a frozen cgroup that is later
>> acquired by a process outside that group.
>>
>> History:
>> This patch was originally applied as 6aa9707099c and reverted in
>> dbf520a9d7d4 because NFS was freezing with locks held. It was
>> deemed better to keep the bad freeze point in NFS to allow laptops
>> to suspend consistently. The previous patch in this series converts
>> NFS to call _unsafe versions of the freezable helpers so that
>> lockdep doesn't complain about them until a more correct fix
>> can be applied.
>
> I don't care about %current change, especially given that it's a debug
> interface but that really should be a separate patch, so please split
> it out if you want it (and I think we want it).

The current change was requested by akpm and was part of the original
patch. Is it really worth confusing the history of this patch even
more, applying it the first time, reverting it, and then applying it
again in two parts?

2013-05-03 21:12:02

[permalink] [raw]

Subject: [PATCH 2/2] lockdep: check that no locks held at freeze time

From: Mandeep Singh Baines <[email protected]>

We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.

History:
This patch was originally applied as 6aa9707099c and reverted in
dbf520a9d7d4 because NFS was freezing with locks held. It was
deemed better to keep the bad freeze point in NFS to allow laptops
to suspend consistently. The previous patch in this series converts
NFS to call _unsafe versions of the freezable helpers so that
lockdep doesn't complain about them until a more correct fix
can be applied.

[[email protected]: export debug_check_no_locks_held]
Signed-off-by: Mandeep Singh Baines <[email protected]>
Cc: Ben Chan <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[[email protected]: don't warn if try_to_freeze_unsafe is called]
Signed-off-by: Colin Cross <[email protected]>
---
include/linux/debug_locks.h | 4 ++--
include/linux/freezer.h | 3 +++
kernel/exit.c | 2 +-
kernel/lockdep.c | 17 ++++++++---------
4 files changed, 14 insertions(+), 12 deletions(-)

As requested by Tejun, this is to prevent incorrect use when
I add new freezable helpers in a subsequent patch series.

I'm not sure what the protocol is for Signed-off-by when
reapplying a reverted patch, but I got the patch from Linus'
tree so I left all of them and added mine at the bottom.

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
extern void debug_show_all_locks(void);
extern void debug_show_held_locks(struct task_struct *task);
extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
#else
static inline void debug_show_all_locks(void)
{
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
}

static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
{
}
#endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 5b31e21c..efb80dd 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
#ifndef FREEZER_H_INCLUDED
#define FREEZER_H_INCLUDED

+#include <linux/debug_locks.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/atomic.h>
@@ -60,6 +61,8 @@ static inline bool try_to_freeze_unsafe(void)

static inline bool try_to_freeze(void)
{
+ if (!(current->flags & PF_NOFREEZE))
+ debug_check_no_locks_held();
return try_to_freeze_unsafe();
}

diff --git a/kernel/exit.c b/kernel/exit.c
index 60bc027..51e485c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -835,7 +835,7 @@ void do_exit(long code)
/*
* Make sure we are holding no locks:
*/
- debug_check_no_locks_held(tsk);
+ debug_check_no_locks_held();
/*
* We can do this unlocked here. The futex code uses this flag
* just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 8a0efac..259db20 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4088,7 +4088,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
}
EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);

-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
{
if (!debug_locks_off())
return;
@@ -4097,22 +4097,21 @@ static void print_held_locks_bug(struct task_struct *curr)

printk("\n");
printk("=====================================\n");
- printk("[ BUG: lock held at task exit time! ]\n");
+ printk("[ BUG: %s/%d still has locks held! ]\n",
+ current->comm, task_pid_nr(current));
print_kernel_ident();
printk("-------------------------------------\n");
- printk("%s/%d is exiting with locks still held!\n",
- curr->comm, task_pid_nr(curr));
- lockdep_print_held_locks(curr);
-
+ lockdep_print_held_locks(current);
printk("\nstack backtrace:\n");
dump_stack();
}

-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
{
- if (unlikely(task->lockdep_depth > 0))
- print_held_locks_bug(task);
+ if (unlikely(current->lockdep_depth > 0))
+ print_held_locks_bug();
}
+EXPORT_SYMBOL_GPL(debug_check_no_locks_held);

void debug_show_all_locks(void)
{
--
1.8.2.1

2013-05-05 00:05:31

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

Hi!

> >> >> --- a/kernel/exit.c
> >> >> +++ b/kernel/exit.c
> >> >> @@ -835,7 +835,7 @@ void do_exit(long code)
> >> >> /*
> >> >> * Make sure we are holding no locks:
> >> >> */
> >> >> - debug_check_no_locks_held(tsk);
> >> >> + debug_check_no_locks_held();
> >> >
> >> > Is task guaranteed == current?
> >>
> >> Yes, the first line of do_exit is:
> >> struct task_struct *tsk = current;
> >
> > Aha, I understand it now.
> >
> > Accessing current is slower than local variable. So your "new" code
> > will work but will be slower. Please revert this part.
>
> Using current instead of passing in tsk was done at Andrew Morton's
> suggestion, and makes no difference from the freezer's perspective
> since it would have to use current to get the task to pass in, so I'm
> going to leave it as is.

Well, current is:

static inline struct thread_info *current_thread_info(void)
{
register unsigned long sp asm ("sp");
return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

#define get_current() (current_thread_info()->task)

#define current get_current()

Instead of passing computed value to debug_check_no_locks_held(), you
force it to be computed again. do_exit() performance matters for
configure scripts, etc.

I'd say it makes sense to keep the optimalization. akpm can correct
me.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-04 22:57:18

[permalink] [raw]

Subject: Re: [PATCH 2/2] lockdep: check that no locks held at freeze time

On Sat 2013-05-04 13:27:23, Colin Cross wrote:
> On Sat, May 4, 2013 at 6:04 AM, Pavel Machek <[email protected]> wrote:
> > On Fri 2013-05-03 14:04:10, Colin Cross wrote:
> >> From: Mandeep Singh Baines <[email protected]>
> >>
> >> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
> >> deadlock if the lock is later acquired in the suspend or hibernate path
> >> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
> >> cgroup_freezer if a lock is held inside a frozen cgroup that is later
> >> acquired by a process outside that group.
> >
> > Ok, but this does not explain why
> >
> >> --- a/include/linux/debug_locks.h
> >> +++ b/include/linux/debug_locks.h
> >> @@ -51,7 +51,7 @@ struct task_struct;
> >> extern void debug_show_all_locks(void);
> >> extern void debug_show_held_locks(struct task_struct *task);
> >> extern void debug_check_no_locks_freed(const void *from, unsigned long len);
> >> -extern void debug_check_no_locks_held(struct task_struct *task);
> >> +extern void debug_check_no_locks_held(void);
> >> #else
> >> static inline void debug_show_all_locks(void)
> >> {
> >
> > Removing task_struct argument from those functions is good idea?
>
> This is an existing patch that was merged in 3.9 and then reverted
> again, so it has already been reviewed and accepted once.

Well, it was also reverted once :-).

> >> --- a/kernel/exit.c
> >> +++ b/kernel/exit.c
> >> @@ -835,7 +835,7 @@ void do_exit(long code)
> >> /*
> >> * Make sure we are holding no locks:
> >> */
> >> - debug_check_no_locks_held(tsk);
> >> + debug_check_no_locks_held();
> >
> > Is task guaranteed == current?
>
> Yes, the first line of do_exit is:
> struct task_struct *tsk = current;

Aha, I understand it now.

Accessing current is slower than local variable. So your "new" code
will work but will be slower. Please revert this part.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-06 22:05:58