Here is a stack trace of recursion:
free_pid_ns(parent)
put_pid_ns(parent)
kref_put(&ns->kref, free_pid_ns);
free_pid_ns
This patch turns recursion into loops.
pidns can be nested many times, so in case of recursion
a simple user space program can provoke a kernel panic
due to exceed of a kernel stack.
v2: * don't check parent on NULL
* use atomic_dec_and_test(&kref->refcount)
v3: Fix coding style issue
Acked-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Vagin <[email protected]>
---
include/linux/kref.h | 12 ++++++++++++
kernel/pid_namespace.c | 16 ++++++++++++----
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 65af688..2844262 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
return kref_sub(kref, 1, release);
}
+/**
+ * kref_put - decrement refcount for object.
+ * @kref: object.
+ *
+ * Decrement the refcount.
+ * Return 1 if refcount is zero.
+ */
+static inline int __kref_put(struct kref *kref)
+{
+ return atomic_dec_and_test(&kref->refcount);
+}
+
static inline int kref_put_mutex(struct kref *kref,
void (*release)(struct kref *kref),
struct mutex *lock)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 6144bab..b051fa6 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -138,11 +138,19 @@ void free_pid_ns(struct kref *kref)
ns = container_of(kref, struct pid_namespace, kref);
- parent = ns->parent;
- destroy_pid_namespace(ns);
+ while (1) {
+ parent = ns->parent;
+ destroy_pid_namespace(ns);
- if (parent != NULL)
- put_pid_ns(parent);
+ if (parent == &init_pid_ns)
+ break;
+
+ /* kref_put cannot be used for avoiding recursion */
+ if (__kref_put(&parent->kref) == 0)
+ break;
+
+ ns = parent;
+ }
}
void zap_pid_ns_processes(struct pid_namespace *pid_ns)
--
1.7.1
On Sat, 6 Oct 2012 23:56:33 +0400
Andrew Vagin <[email protected]> wrote:
> Here is a stack trace of recursion:
> free_pid_ns(parent)
> put_pid_ns(parent)
> kref_put(&ns->kref, free_pid_ns);
> free_pid_ns
>
> This patch turns recursion into loops.
>
> pidns can be nested many times, so in case of recursion
> a simple user space program can provoke a kernel panic
> due to exceed of a kernel stack.
So we should backport this into earlier kernels.
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> return kref_sub(kref, 1, release);
> }
>
> +/**
> + * kref_put - decrement refcount for object.
> + * @kref: object.
> + *
> + * Decrement the refcount.
> + * Return 1 if refcount is zero.
> + */
> +static inline int __kref_put(struct kref *kref)
> +{
> + return atomic_dec_and_test(&kref->refcount);
> +}
Greg might be interested in this.
It's a pretty specialised thing and perhaps it needs some stern words
in the description explaining when and why it should and shouldn't be
used.
I wonder if people might (ab)use this to avoid the "doesn't
have a release function" warning.
> static inline int kref_put_mutex(struct kref *kref,
> void (*release)(struct kref *kref),
> struct mutex *lock)
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6144bab..b051fa6 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -138,11 +138,19 @@ void free_pid_ns(struct kref *kref)
>
> ns = container_of(kref, struct pid_namespace, kref);
>
> - parent = ns->parent;
> - destroy_pid_namespace(ns);
> + while (1) {
> + parent = ns->parent;
> + destroy_pid_namespace(ns);
>
> - if (parent != NULL)
> - put_pid_ns(parent);
> + if (parent == &init_pid_ns)
> + break;
> +
> + /* kref_put cannot be used for avoiding recursion */
> + if (__kref_put(&parent->kref) == 0)
> + break;
> +
> + ns = parent;
> + }
> }
>
> void zap_pid_ns_processes(struct pid_namespace *pid_ns)
On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
> On Sat, 6 Oct 2012 23:56:33 +0400
> Andrew Vagin <[email protected]> wrote:
>
> > Here is a stack trace of recursion:
> > free_pid_ns(parent)
> > put_pid_ns(parent)
> > kref_put(&ns->kref, free_pid_ns);
> > free_pid_ns
> >
> > This patch turns recursion into loops.
> >
> > pidns can be nested many times, so in case of recursion
> > a simple user space program can provoke a kernel panic
> > due to exceed of a kernel stack.
>
> So we should backport this into earlier kernels.
>
> > --- a/include/linux/kref.h
> > +++ b/include/linux/kref.h
> > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> > return kref_sub(kref, 1, release);
> > }
> >
> > +/**
> > + * kref_put - decrement refcount for object.
> > + * @kref: object.
> > + *
> > + * Decrement the refcount.
> > + * Return 1 if refcount is zero.
> > + */
> > +static inline int __kref_put(struct kref *kref)
> > +{
> > + return atomic_dec_and_test(&kref->refcount);
> > +}
>
> Greg might be interested in this.
>
> It's a pretty specialised thing and perhaps it needs some stern words
> in the description explaining when and why it should and shouldn't be
> used.
>
> I wonder if people might (ab)use this to avoid the "doesn't
> have a release function" warning.
Yes they would, please don't do this at all.
In fact, why is it needed? It doesn't solve anything (if it does,
something in the way the kref is being used is wrong.)
Please no, I don't want this patch.
greg k-h
On Tue, 9 Oct 2012 12:03:00 -0700
Greg KH <[email protected]> wrote:
> On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
> > On Sat, 6 Oct 2012 23:56:33 +0400
> > Andrew Vagin <[email protected]> wrote:
> >
> > > Here is a stack trace of recursion:
> > > free_pid_ns(parent)
> > > put_pid_ns(parent)
> > > kref_put(&ns->kref, free_pid_ns);
> > > free_pid_ns
> > >
> > > This patch turns recursion into loops.
> > >
> > > pidns can be nested many times, so in case of recursion
> > > a simple user space program can provoke a kernel panic
> > > due to exceed of a kernel stack.
> >
> > So we should backport this into earlier kernels.
> >
> > > --- a/include/linux/kref.h
> > > +++ b/include/linux/kref.h
> > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> > > return kref_sub(kref, 1, release);
> > > }
> > >
> > > +/**
> > > + * kref_put - decrement refcount for object.
> > > + * @kref: object.
> > > + *
> > > + * Decrement the refcount.
> > > + * Return 1 if refcount is zero.
> > > + */
> > > +static inline int __kref_put(struct kref *kref)
> > > +{
> > > + return atomic_dec_and_test(&kref->refcount);
> > > +}
> >
> > Greg might be interested in this.
> >
> > It's a pretty specialised thing and perhaps it needs some stern words
> > in the description explaining when and why it should and shouldn't be
> > used.
> >
> > I wonder if people might (ab)use this to avoid the "doesn't
> > have a release function" warning.
>
> Yes they would, please don't do this at all.
>
> In fact, why is it needed? It doesn't solve anything (if it does,
> something in the way the kref is being used is wrong.)
>
It's right there in the changelog. The patch fixes deep
kref_put->release->kref_put recursion by turning the operation for
pidns into a loop.
On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote:
> On Tue, 9 Oct 2012 12:03:00 -0700
> Greg KH <[email protected]> wrote:
>
> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
> > > On Sat, 6 Oct 2012 23:56:33 +0400
> > > Andrew Vagin <[email protected]> wrote:
> > >
> > > > Here is a stack trace of recursion:
> > > > free_pid_ns(parent)
> > > > put_pid_ns(parent)
> > > > kref_put(&ns->kref, free_pid_ns);
> > > > free_pid_ns
> > > >
> > > > This patch turns recursion into loops.
> > > >
> > > > pidns can be nested many times, so in case of recursion
> > > > a simple user space program can provoke a kernel panic
> > > > due to exceed of a kernel stack.
> > >
> > > So we should backport this into earlier kernels.
> > >
> > > > --- a/include/linux/kref.h
> > > > +++ b/include/linux/kref.h
> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> > > > return kref_sub(kref, 1, release);
> > > > }
> > > >
> > > > +/**
> > > > + * kref_put - decrement refcount for object.
> > > > + * @kref: object.
> > > > + *
> > > > + * Decrement the refcount.
> > > > + * Return 1 if refcount is zero.
> > > > + */
> > > > +static inline int __kref_put(struct kref *kref)
> > > > +{
> > > > + return atomic_dec_and_test(&kref->refcount);
> > > > +}
> > >
> > > Greg might be interested in this.
> > >
> > > It's a pretty specialised thing and perhaps it needs some stern words
> > > in the description explaining when and why it should and shouldn't be
> > > used.
> > >
> > > I wonder if people might (ab)use this to avoid the "doesn't
> > > have a release function" warning.
> >
> > Yes they would, please don't do this at all.
> >
> > In fact, why is it needed? It doesn't solve anything (if it does,
> > something in the way the kref is being used is wrong.)
> >
>
> It's right there in the changelog. The patch fixes deep
> kref_put->release->kref_put recursion by turning the operation for
> pidns into a loop.
But why would a kref release function ever decrement the same kref
again causing a loop in the first place?
That's what I was referring to. This strongly sounds like a problem in
how the kref is being used, not in the kref code itself.
Is a kref even the correct thing here?
greg k-h
On Wed, Oct 10, 2012 at 3:49 PM, Greg KH <[email protected]> wrote:
> On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote:
>> On Tue, 9 Oct 2012 12:03:00 -0700
>> Greg KH <[email protected]> wrote:
>>
>> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
>> > > On Sat, 6 Oct 2012 23:56:33 +0400
>> > > Andrew Vagin <[email protected]> wrote:
>> > >
>> > > > Here is a stack trace of recursion:
>> > > > free_pid_ns(parent)
>> > > > put_pid_ns(parent)
>> > > > kref_put(&ns->kref, free_pid_ns);
>> > > > free_pid_ns
>> > > >
>> > > > This patch turns recursion into loops.
>> > > >
>> > > > pidns can be nested many times, so in case of recursion
>> > > > a simple user space program can provoke a kernel panic
>> > > > due to exceed of a kernel stack.
>> > >
>> > > So we should backport this into earlier kernels.
>> > >
>> > > > --- a/include/linux/kref.h
>> > > > +++ b/include/linux/kref.h
>> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
>> > > > return kref_sub(kref, 1, release);
>> > > > }
>> > > >
>> > > > +/**
>> > > > + * kref_put - decrement refcount for object.
>> > > > + * @kref: object.
>> > > > + *
>> > > > + * Decrement the refcount.
>> > > > + * Return 1 if refcount is zero.
>> > > > + */
>> > > > +static inline int __kref_put(struct kref *kref)
>> > > > +{
>> > > > + return atomic_dec_and_test(&kref->refcount);
>> > > > +}
>> > >
>> > > Greg might be interested in this.
>> > >
>> > > It's a pretty specialised thing and perhaps it needs some stern words
>> > > in the description explaining when and why it should and shouldn't be
>> > > used.
>> > >
>> > > I wonder if people might (ab)use this to avoid the "doesn't
>> > > have a release function" warning.
>> >
>> > Yes they would, please don't do this at all.
>> >
>> > In fact, why is it needed? It doesn't solve anything (if it does,
>> > something in the way the kref is being used is wrong.)
>> >
>>
>> It's right there in the changelog. The patch fixes deep
>> kref_put->release->kref_put recursion by turning the operation for
>> pidns into a loop.
>
> But why would a kref release function ever decrement the same kref
> again causing a loop in the first place?
>
This should be kref_put ns->parent recursionly, put_pid_ns(ns) ->
free_pid_ns(ns) -> put_pid_ns(ns->parent)->.......
If users create many nested namespaces, the recursion put_pid_ns()
will exausted the kernel stack.
> That's what I was referring to. This strongly sounds like a problem in
> how the kref is being used, not in the kref code itself.
>
> Is a kref even the correct thing here?
Can we fix this by this way? free_pid_ns just release ns itself, we check
the return value of kref_put, if kref_put returns 1, means ns->kref is removed,
then we kref_put(ns->parent).
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 00474b0..2168535 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace
*pid_ns, int cmd);
static inline void put_pid_ns(struct pid_namespace *ns)
{
- if (ns != &init_pid_ns)
- kref_put(&ns->kref, free_pid_ns);
+ int ret;
+ struct pid_namespace *parent
+
+ while (ns != &init_pid_ns) {
+ parent = ns->parent;
+ ret = kref_put(&ns->kref, free_pid_ns);
+ if (!ret)
+ break;
+ else ns = parent;
+ }
}
#else /* !CONFIG_PID_NS */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 478bad2..85ca23a 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long
flags, struct pid_namespace *old
void free_pid_ns(struct kref *kref)
{
- struct pid_namespace *ns, *parent;
+ struct pid_namespace *ns;
ns = container_of(kref, struct pid_namespace, kref);
- parent = ns->parent;
destroy_pid_namespace(ns);
-
- if (parent != NULL)
- put_pid_ns(parent);
}
EXPORT_SYMBOL_GPL(free_pid_ns);
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Wed, Oct 10, 2012 at 05:12:21PM +0800, Xiaotian Feng wrote:
> >
> > Is a kref even the correct thing here?
>
> Can we fix this by this way? free_pid_ns just release ns itself, we check
> the return value of kref_put, if kref_put returns 1, means ns->kref is removed,
> then we kref_put(ns->parent).
>
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 00474b0..2168535 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace
> *pid_ns, int cmd);
>
I've sent the similar patch yesterday, thanks.
http://www.mail-archive.com/[email protected]/msg19471.html
On Wed, Oct 10, 2012 at 5:12 PM, Xiaotian Feng <[email protected]> wrote:
> On Wed, Oct 10, 2012 at 3:49 PM, Greg KH <[email protected]> wrote:
>> On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote:
>>> On Tue, 9 Oct 2012 12:03:00 -0700
>>> Greg KH <[email protected]> wrote:
>>>
>>> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
>>> > > On Sat, 6 Oct 2012 23:56:33 +0400
>>> > > Andrew Vagin <[email protected]> wrote:
>>> > >
>>> > > > Here is a stack trace of recursion:
>>> > > > free_pid_ns(parent)
>>> > > > put_pid_ns(parent)
>>> > > > kref_put(&ns->kref, free_pid_ns);
>>> > > > free_pid_ns
>>> > > >
>>> > > > This patch turns recursion into loops.
>>> > > >
>>> > > > pidns can be nested many times, so in case of recursion
>>> > > > a simple user space program can provoke a kernel panic
>>> > > > due to exceed of a kernel stack.
>>> > >
>>> > > So we should backport this into earlier kernels.
>>> > >
>>> > > > --- a/include/linux/kref.h
>>> > > > +++ b/include/linux/kref.h
>>> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
>>> > > > return kref_sub(kref, 1, release);
>>> > > > }
>>> > > >
>>> > > > +/**
>>> > > > + * kref_put - decrement refcount for object.
>>> > > > + * @kref: object.
>>> > > > + *
>>> > > > + * Decrement the refcount.
>>> > > > + * Return 1 if refcount is zero.
>>> > > > + */
>>> > > > +static inline int __kref_put(struct kref *kref)
>>> > > > +{
>>> > > > + return atomic_dec_and_test(&kref->refcount);
>>> > > > +}
>>> > >
>>> > > Greg might be interested in this.
>>> > >
>>> > > It's a pretty specialised thing and perhaps it needs some stern words
>>> > > in the description explaining when and why it should and shouldn't be
>>> > > used.
>>> > >
>>> > > I wonder if people might (ab)use this to avoid the "doesn't
>>> > > have a release function" warning.
>>> >
>>> > Yes they would, please don't do this at all.
>>> >
>>> > In fact, why is it needed? It doesn't solve anything (if it does,
>>> > something in the way the kref is being used is wrong.)
>>> >
>>>
>>> It's right there in the changelog. The patch fixes deep
>>> kref_put->release->kref_put recursion by turning the operation for
>>> pidns into a loop.
>>
>> But why would a kref release function ever decrement the same kref
>> again causing a loop in the first place?
>>
>
> This should be kref_put ns->parent recursionly, put_pid_ns(ns) ->
> free_pid_ns(ns) -> put_pid_ns(ns->parent)->.......
> If users create many nested namespaces, the recursion put_pid_ns()
> will exausted the kernel stack.
>
>
>> That's what I was referring to. This strongly sounds like a problem in
>> how the kref is being used, not in the kref code itself.
>>
>> Is a kref even the correct thing here?
>
> Can we fix this by this way? free_pid_ns just release ns itself, we check
> the return value of kref_put, if kref_put returns 1, means ns->kref is removed,
> then we kref_put(ns->parent).
>
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 00474b0..2168535 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace
> *pid_ns, int cmd);
>
> static inline void put_pid_ns(struct pid_namespace *ns)
> {
> - if (ns != &init_pid_ns)
> - kref_put(&ns->kref, free_pid_ns);
> + int ret;
> + struct pid_namespace *parent
miss a ";" here
> +
> + while (ns != &init_pid_ns) {
> + parent = ns->parent;
> + ret = kref_put(&ns->kref, free_pid_ns);
> + if (!ret)
This line should be "if (!ret | !parent)"
> + break;
> + else ns = parent;
> + }
> }
>
> #else /* !CONFIG_PID_NS */
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 478bad2..85ca23a 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long
> flags, struct pid_namespace *old
>
> void free_pid_ns(struct kref *kref)
> {
> - struct pid_namespace *ns, *parent;
> + struct pid_namespace *ns;
>
> ns = container_of(kref, struct pid_namespace, kref);
>
> - parent = ns->parent;
> destroy_pid_namespace(ns);
> -
> - if (parent != NULL)
> - put_pid_ns(parent);
> }
> EXPORT_SYMBOL_GPL(free_pid_ns);
>
>
>>
>> greg k-h
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/