2019-04-30 09:12:39

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

WARN_ON_ONCE() could not be called safely under rq lock because
of console deadlock issues. Fortunately, there is another check
for the reliable stacktrace support in klp_enable_patch().

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/livepatch/transition.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 9c89ae8b337a..8e0274075e75 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
trace.nr_entries = 0;
trace.max_entries = MAX_STACK_ENTRIES;
trace.entries = entries;
+
ret = save_stack_trace_tsk_reliable(task, &trace);
- WARN_ON_ONCE(ret == -ENOSYS);
+ /*
+ * pr_warn() under task rq lock might cause a deadlock.
+ * Fortunately, missing reliable stacktrace support has
+ * already been handled when the livepatch was enabled.
+ */
+ if (ret == -ENOSYS)
+ return ret;
if (ret) {
snprintf(err_buf, STACK_ERR_BUF_SIZE,
"%s: %s:%d has an unreliable stack\n",
--
2.16.4


2019-04-30 11:31:54

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue, 30 Apr 2019, Petr Mladek wrote:

> WARN_ON_ONCE() could not be called safely under rq lock because
> of console deadlock issues. Fortunately, there is another check
> for the reliable stacktrace support in klp_enable_patch().
>
> Signed-off-by: Petr Mladek <[email protected]>

Acked-by: Miroslav Benes <[email protected]> with a nit below

> ---
> kernel/livepatch/transition.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 9c89ae8b337a..8e0274075e75 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> trace.nr_entries = 0;
> trace.max_entries = MAX_STACK_ENTRIES;
> trace.entries = entries;
> +

Unnecessary new line?

> ret = save_stack_trace_tsk_reliable(task, &trace);
> - WARN_ON_ONCE(ret == -ENOSYS);
> + /*
> + * pr_warn() under task rq lock might cause a deadlock.
> + * Fortunately, missing reliable stacktrace support has
> + * already been handled when the livepatch was enabled.
> + */
> + if (ret == -ENOSYS)
> + return ret;
> if (ret) {
> snprintf(err_buf, STACK_ERR_BUF_SIZE,
> "%s: %s:%d has an unreliable stack\n",

Miroslav

2019-04-30 12:00:14

by Kamalesh Babulal

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> WARN_ON_ONCE() could not be called safely under rq lock because
> of console deadlock issues. Fortunately, there is another check
> for the reliable stacktrace support in klp_enable_patch().
>
> Signed-off-by: Petr Mladek <[email protected]>

Reviewed-by: Kamalesh Babulal <[email protected]>

2019-05-07 00:43:10

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> WARN_ON_ONCE() could not be called safely under rq lock because
> of console deadlock issues. Fortunately, there is another check
> for the reliable stacktrace support in klp_enable_patch().
>
> Signed-off-by: Petr Mladek <[email protected]>
> ---
> kernel/livepatch/transition.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 9c89ae8b337a..8e0274075e75 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> trace.nr_entries = 0;
> trace.max_entries = MAX_STACK_ENTRIES;
> trace.entries = entries;
> +
> ret = save_stack_trace_tsk_reliable(task, &trace);
> - WARN_ON_ONCE(ret == -ENOSYS);
> + /*
> + * pr_warn() under task rq lock might cause a deadlock.
> + * Fortunately, missing reliable stacktrace support has
> + * already been handled when the livepatch was enabled.
> + */
> + if (ret == -ENOSYS)
> + return ret;

I find the comment to be a bit wordy and confusing (and vague).

Also this check is effectively the same as the klp_have_reliable_stack()
check which is done in kernel/livepatch/core.c. So I think it would be
clearer and more consistent if the same check is done here:

if (!klp_have_reliable_stack())
return -ENOSYS;

ret = save_stack_trace_tsk_reliable(task, &trace);

[ no need to check ret for ENOSYS here ]

Then, IMO, no comment is needed.

--
Josh

2019-05-07 01:45:22

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Mon, May 06, 2019 at 07:40:32PM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> > WARN_ON_ONCE() could not be called safely under rq lock because
> > of console deadlock issues. Fortunately, there is another check
> > for the reliable stacktrace support in klp_enable_patch().
> >
> > Signed-off-by: Petr Mladek <[email protected]>
> > ---
> > kernel/livepatch/transition.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index 9c89ae8b337a..8e0274075e75 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> > trace.nr_entries = 0;
> > trace.max_entries = MAX_STACK_ENTRIES;
> > trace.entries = entries;
> > +
> > ret = save_stack_trace_tsk_reliable(task, &trace);
> > - WARN_ON_ONCE(ret == -ENOSYS);
> > + /*
> > + * pr_warn() under task rq lock might cause a deadlock.
> > + * Fortunately, missing reliable stacktrace support has
> > + * already been handled when the livepatch was enabled.
> > + */
> > + if (ret == -ENOSYS)
> > + return ret;
>
> I find the comment to be a bit wordy and confusing (and vague).
>
> Also this check is effectively the same as the klp_have_reliable_stack()
> check which is done in kernel/livepatch/core.c. So I think it would be
> clearer and more consistent if the same check is done here:
>
> if (!klp_have_reliable_stack())
> return -ENOSYS;
>
> ret = save_stack_trace_tsk_reliable(task, &trace);
>
> [ no need to check ret for ENOSYS here ]
>
> Then, IMO, no comment is needed.

BTW, if you agree with this approach then we can leave the
WARN_ON_ONCE() in save_stack_trace_tsk_reliable() after all.

--
Josh

2019-05-07 09:01:27

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Mon, 6 May 2019, Josh Poimboeuf wrote:

> On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> > WARN_ON_ONCE() could not be called safely under rq lock because
> > of console deadlock issues. Fortunately, there is another check
> > for the reliable stacktrace support in klp_enable_patch().
> >
> > Signed-off-by: Petr Mladek <[email protected]>
> > ---
> > kernel/livepatch/transition.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index 9c89ae8b337a..8e0274075e75 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> > trace.nr_entries = 0;
> > trace.max_entries = MAX_STACK_ENTRIES;
> > trace.entries = entries;
> > +
> > ret = save_stack_trace_tsk_reliable(task, &trace);
> > - WARN_ON_ONCE(ret == -ENOSYS);
> > + /*
> > + * pr_warn() under task rq lock might cause a deadlock.
> > + * Fortunately, missing reliable stacktrace support has
> > + * already been handled when the livepatch was enabled.
> > + */
> > + if (ret == -ENOSYS)
> > + return ret;
>
> I find the comment to be a bit wordy and confusing (and vague).
>
> Also this check is effectively the same as the klp_have_reliable_stack()
> check which is done in kernel/livepatch/core.c. So I think it would be
> clearer and more consistent if the same check is done here:
>
> if (!klp_have_reliable_stack())
> return -ENOSYS;

We removed it in 1d98a69e5cef ("livepatch: Remove reliable stacktrace
check in klp_try_switch_task()") and I do think it does not belong here.
We can check for the facility right at the beginning in klp_enable_patch()
and it is not necessary to do it every time klp_check_stack() is called.

But it is nothing I could not live with, so if you decide it is better, I
will not object.

Miroslav

2019-05-07 11:31:02

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Mon 2019-05-06 20:43:32, Josh Poimboeuf wrote:
> On Mon, May 06, 2019 at 07:40:32PM -0500, Josh Poimboeuf wrote:
> > On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> > > WARN_ON_ONCE() could not be called safely under rq lock because
> > > of console deadlock issues. Fortunately, there is another check
> > > for the reliable stacktrace support in klp_enable_patch().
> > >
> > > Signed-off-by: Petr Mladek <[email protected]>
> > > ---
> > > kernel/livepatch/transition.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > index 9c89ae8b337a..8e0274075e75 100644
> > > --- a/kernel/livepatch/transition.c
> > > +++ b/kernel/livepatch/transition.c
> > > @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> > > trace.nr_entries = 0;
> > > trace.max_entries = MAX_STACK_ENTRIES;
> > > trace.entries = entries;
> > > +
> > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > > - WARN_ON_ONCE(ret == -ENOSYS);
> > > + /*
> > > + * pr_warn() under task rq lock might cause a deadlock.
> > > + * Fortunately, missing reliable stacktrace support has
> > > + * already been handled when the livepatch was enabled.
> > > + */
> > > + if (ret == -ENOSYS)
> > > + return ret;
> >
> > I find the comment to be a bit wordy and confusing (and vague).

Then please provide a better one. I have no idea what might make
you happy and am not interested into an endless disputing.

> > Also this check is effectively the same as the klp_have_reliable_stack()
> > check which is done in kernel/livepatch/core.c. So I think it would be
> > clearer and more consistent if the same check is done here:
> >
> > if (!klp_have_reliable_stack())
> > return -ENOSYS;

Huh, it smells with over engineering to me.

> > ret = save_stack_trace_tsk_reliable(task, &trace);
> >
> > [ no need to check ret for ENOSYS here ]
> >
> > Then, IMO, no comment is needed.
>
> BTW, if you agree with this approach then we can leave the
> WARN_ON_ONCE() in save_stack_trace_tsk_reliable() after all.

I really like the removal of the WARN_ON_ONCE(). I consider
it an old fashioned way used when people are lazy to handle
errors. It might make sense when the backtrace helps to locate
the context but the context is well known here. Finally,
WARN() should be used with care. It might cause reboot
with panic_on_warn.

Best Regards,
Petr

2019-05-07 12:05:29

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue, May 07, 2019 at 01:29:50PM +0200, Petr Mladek wrote:
> On Mon 2019-05-06 20:43:32, Josh Poimboeuf wrote:
> > On Mon, May 06, 2019 at 07:40:32PM -0500, Josh Poimboeuf wrote:
> > > On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> > > > WARN_ON_ONCE() could not be called safely under rq lock because
> > > > of console deadlock issues. Fortunately, there is another check
> > > > for the reliable stacktrace support in klp_enable_patch().
> > > >
> > > > Signed-off-by: Petr Mladek <[email protected]>
> > > > ---
> > > > kernel/livepatch/transition.c | 9 ++++++++-
> > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > > index 9c89ae8b337a..8e0274075e75 100644
> > > > --- a/kernel/livepatch/transition.c
> > > > +++ b/kernel/livepatch/transition.c
> > > > @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> > > > trace.nr_entries = 0;
> > > > trace.max_entries = MAX_STACK_ENTRIES;
> > > > trace.entries = entries;
> > > > +
> > > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > > > - WARN_ON_ONCE(ret == -ENOSYS);
> > > > + /*
> > > > + * pr_warn() under task rq lock might cause a deadlock.
> > > > + * Fortunately, missing reliable stacktrace support has
> > > > + * already been handled when the livepatch was enabled.
> > > > + */
> > > > + if (ret == -ENOSYS)
> > > > + return ret;
> > >
> > > I find the comment to be a bit wordy and confusing (and vague).
>
> Then please provide a better one. I have no idea what might make
> you happy and am not interested into an endless disputing.

Something like this would be clearer:

if (ret == -ENOSYS) {
/*
* This arch doesn't support reliable stack tracing. No
* need to print a warning; that has already been done
* by klp_enable_patch().
*/
return ret;
}

But my next point was that changing the code would be even better than
fixing the comment.

> > > Also this check is effectively the same as the klp_have_reliable_stack()
> > > check which is done in kernel/livepatch/core.c. So I think it would be
> > > clearer and more consistent if the same check is done here:
> > >
> > > if (!klp_have_reliable_stack())
> > > return -ENOSYS;
>
> Huh, it smells with over engineering to me.

How so? It makes the code more readable and the generated code should
be much better because it becomes a build-time check.

But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
better.

> > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > >
> > > [ no need to check ret for ENOSYS here ]
> > >
> > > Then, IMO, no comment is needed.
> >
> > BTW, if you agree with this approach then we can leave the
> > WARN_ON_ONCE() in save_stack_trace_tsk_reliable() after all.
>
> I really like the removal of the WARN_ON_ONCE(). I consider
> it an old fashioned way used when people are lazy to handle
> errors. It might make sense when the backtrace helps to locate
> the context but the context is well known here. Finally,
> WARN() should be used with care. It might cause reboot
> with panic_on_warn.

The warning makes the function consistent with the other weak functions
in stacktrace.c and clarifies that it should never be called unless an
arch has misconfigured something. And if we aren't even checking the
specific ENOSYS error as I proposed then this warning would make the
error more obvious.

--
Josh

2019-05-07 14:31:26

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue 2019-05-07 07:03:50, Josh Poimboeuf wrote:
> On Tue, May 07, 2019 at 01:29:50PM +0200, Petr Mladek wrote:
> > On Mon 2019-05-06 20:43:32, Josh Poimboeuf wrote:
> > > On Mon, May 06, 2019 at 07:40:32PM -0500, Josh Poimboeuf wrote:
> > > > On Tue, Apr 30, 2019 at 11:10:48AM +0200, Petr Mladek wrote:
> > > > > --- a/kernel/livepatch/transition.c
> > > > > +++ b/kernel/livepatch/transition.c
> > > > > @@ -263,8 +263,15 @@ static int klp_check_stack(struct task_struct *task, char *err_buf)
> > > > > trace.nr_entries = 0;
> > > > > trace.max_entries = MAX_STACK_ENTRIES;
> > > > > trace.entries = entries;
> > > > > +
> > > > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > > > > - WARN_ON_ONCE(ret == -ENOSYS);
> > > > > + /*
> > > > > + * pr_warn() under task rq lock might cause a deadlock.
> > > > > + * Fortunately, missing reliable stacktrace support has
> > > > > + * already been handled when the livepatch was enabled.
> > > > > + */
> > > > > + if (ret == -ENOSYS)
> > > > > + return ret;
> > > >
> > > > I find the comment to be a bit wordy and confusing (and vague).
> >
> > Then please provide a better one. I have no idea what might make
> > you happy and am not interested into an endless disputing.
>
> Something like this would be clearer:
>
> if (ret == -ENOSYS) {
> /*
> * This arch doesn't support reliable stack tracing. No
> * need to print a warning; that has already been done
> * by klp_enable_patch().
> */
> return ret;
> }

I do not mind.

> > > > Also this check is effectively the same as the klp_have_reliable_stack()
> > > > check which is done in kernel/livepatch/core.c. So I think it would be
> > > > clearer and more consistent if the same check is done here:
> > > >
> > > > if (!klp_have_reliable_stack())
> > > > return -ENOSYS;
> >
> > Huh, it smells with over engineering to me.
>
> How so? It makes the code more readable and the generated code should
> be much better because it becomes a build-time check.

save_stack_trace_tsk_reliable() returns various error codes.
We catch a specific one because otherwise the message below
might be misleading.

I do not see why we should prevent this error by calling
a custom hack: klp_have_reliable_stack()?

Regarding reliability. If anyone changes semantic of
save_stack_trace_tsk_reliable() error codes, they would likely
check if all users (one at the moment) handle it correctly.

On the other hand, the dependency between the -ENOSYS
return value and klp_have_reliable_stack() is far from
obvious.

If we want to discuss and fix this to the death. We should change
the return value from -ENOSYS to -EOPNOTSUPP. The reason
is the same as in the commit 375bfca3459db1c5596
("livepatch: core: Return EOPNOTSUPP instead of ENOSYS").

Note that EOPNOTSUPP is the same errno as ENOTSUP, see
man 3 errno.


> But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
> better.

AFAIK, Miroslav wanted to point out that your opinion was inconsistent.


> > > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > > >
> > > > [ no need to check ret for ENOSYS here ]
> > > >
> > > > Then, IMO, no comment is needed.
> > >
> > > BTW, if you agree with this approach then we can leave the
> > > WARN_ON_ONCE() in save_stack_trace_tsk_reliable() after all.
> >
> > I really like the removal of the WARN_ON_ONCE(). I consider
> > it an old fashioned way used when people are lazy to handle
> > errors. It might make sense when the backtrace helps to locate
> > the context but the context is well known here. Finally,
> > WARN() should be used with care. It might cause reboot
> > with panic_on_warn.
>
> The warning makes the function consistent with the other weak functions
> in stacktrace.c and clarifies that it should never be called unless an
> arch has misconfigured something. And if we aren't even checking the
> specific ENOSYS error as I proposed then this warning would make the
> error more obvious.

I consider both WARN() and error value as superfluous. I like the
error value because it allows users to handle the situation as
they need it.

Best Regards,
Petr

PS: This is my last mail in the thread this week. I will eventually
return to it with a clear head next week. It is all nitpicking from
my POV and I have better things to do.

2019-05-07 21:26:57

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue, May 07, 2019 at 04:28:47PM +0200, Petr Mladek wrote:
> > > > > Also this check is effectively the same as the klp_have_reliable_stack()
> > > > > check which is done in kernel/livepatch/core.c. So I think it would be
> > > > > clearer and more consistent if the same check is done here:
> > > > >
> > > > > if (!klp_have_reliable_stack())
> > > > > return -ENOSYS;
> > >
> > > Huh, it smells with over engineering to me.
> >
> > How so? It makes the code more readable and the generated code should
> > be much better because it becomes a build-time check.
>
> save_stack_trace_tsk_reliable() returns various error codes.
> We catch a specific one because otherwise the message below
> might be misleading.
>
> I do not see why we should prevent this error by calling
> a custom hack: klp_have_reliable_stack()?

I wouldn't call it a hack. It's a simple build-time check.

> Regarding reliability. If anyone changes semantic of
> save_stack_trace_tsk_reliable() error codes, they would likely
> check if all users (one at the moment) handle it correctly.
>
> On the other hand, the dependency between the -ENOSYS
> return value and klp_have_reliable_stack() is far from
> obvious.

I don't follow your point.

> If we want to discuss and fix this to the death. We should change
> the return value from -ENOSYS to -EOPNOTSUPP. The reason
> is the same as in the commit 375bfca3459db1c5596
> ("livepatch: core: Return EOPNOTSUPP instead of ENOSYS").
>
> Note that EOPNOTSUPP is the same errno as ENOTSUP, see
> man 3 errno.

Sure, but that's a separate issue.

> > But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
> > better.
>
> AFAIK, Miroslav wanted to point out that your opinion was inconsistent.

How is my opinion inconsistent?

Is there something wrong with the original approach, which was reverted
with

1d98a69e5cef ("livepatch: Remove reliable stacktrace check in klp_try_switch_task()")

?

As I mentioned, that has some advantages:

- The generated code is simpler/faster because it uses a build-time
check.

- The code is more readable IMO. Especially if the check is done higher
up the call stack by reverting 1d98a69e5cef. That way the arch
support short-circuiting is done in the logical place, before doing
any more unnecessary work. It's faster, but also, more importantly,
it's less surprising.

- It's also more consistent with other code, since the intent of this
check is the same as the klp_have_reliable_stack() use in
klp_enabled_patch().

If you disagree with those, please explain why.

> > > > > ret = save_stack_trace_tsk_reliable(task, &trace);
> > > > >
> > > > > [ no need to check ret for ENOSYS here ]
> > > > >
> > > > > Then, IMO, no comment is needed.
> > > >
> > > > BTW, if you agree with this approach then we can leave the
> > > > WARN_ON_ONCE() in save_stack_trace_tsk_reliable() after all.
> > >
> > > I really like the removal of the WARN_ON_ONCE(). I consider
> > > it an old fashioned way used when people are lazy to handle
> > > errors. It might make sense when the backtrace helps to locate
> > > the context but the context is well known here. Finally,
> > > WARN() should be used with care. It might cause reboot
> > > with panic_on_warn.
> >
> > The warning makes the function consistent with the other weak functions
> > in stacktrace.c and clarifies that it should never be called unless an
> > arch has misconfigured something. And if we aren't even checking the
> > specific ENOSYS error as I proposed then this warning would make the
> > error more obvious.
>
> I consider both WARN() and error value as superfluous. I like the
> error value because it allows users to handle the situation as
> they need it.

... but if there are no users of the weak function then the point is
moot.

> Best Regards,
> Petr
>
> PS: This is my last mail in the thread this week. I will eventually
> return to it with a clear head next week. It is all nitpicking from
> my POV and I have better things to do.

I don't think it's helpful to characterize it as nitpicking. The
details of the code are important.

--
Josh

2019-05-09 07:26:01

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support


> > > But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
> > > better.
> >
> > AFAIK, Miroslav wanted to point out that your opinion was inconsistent.
>
> How is my opinion inconsistent?
>
> Is there something wrong with the original approach, which was reverted
> with
>
> 1d98a69e5cef ("livepatch: Remove reliable stacktrace check in klp_try_switch_task()")
>
> ?
>
> As I mentioned, that has some advantages:
>
> - The generated code is simpler/faster because it uses a build-time
> check.
>
> - The code is more readable IMO. Especially if the check is done higher
> up the call stack by reverting 1d98a69e5cef. That way the arch
> support short-circuiting is done in the logical place, before doing
> any more unnecessary work. It's faster, but also, more importantly,
> it's less surprising.

Correct. I forgot we removed return from klp_enable_patch() if
klp_have_reliable_stack() errors out and we only warn now. So reverting
1d98a69e5cef definitely makes sense.

My...

"We removed it in 1d98a69e5cef ("livepatch: Remove reliable stacktrace
check in klp_try_switch_task()") and I do think it does not belong here.
We can check for the facility right at the beginning in klp_enable_patch()
and it is not necessary to do it every time klp_check_stack() is called."

...from the other email is rubbish then.

Miroslav

2019-05-13 12:55:25

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Tue 2019-05-07 16:24:25, Josh Poimboeuf wrote:
> On Tue, May 07, 2019 at 04:28:47PM +0200, Petr Mladek wrote:
> > > > > > Also this check is effectively the same as the klp_have_reliable_stack()
> > > > > > check which is done in kernel/livepatch/core.c. So I think it would be
> > > > > > clearer and more consistent if the same check is done here:
> > > > > >
> > > > > > if (!klp_have_reliable_stack())
> > > > > > return -ENOSYS;
> > > >
> > > > Huh, it smells with over engineering to me.
> > >
> > > How so? It makes the code more readable and the generated code should
> > > be much better because it becomes a build-time check.
> >
> > save_stack_trace_tsk_reliable() returns various error codes.
> > We catch a specific one because otherwise the message below
> > might be misleading.
> >
> > I do not see why we should prevent this error by calling
> > a custom hack: klp_have_reliable_stack()?
>
> I wouldn't call it a hack. It's a simple build-time check.
>
> > Regarding reliability. If anyone changes semantic of
> > save_stack_trace_tsk_reliable() error codes, they would likely
> > check if all users (one at the moment) handle it correctly.
> >
> > On the other hand, the dependency between the -ENOSYS
> > return value and klp_have_reliable_stack() is far from
> > obvious.
>
> I don't follow your point.

We implement klp_have_reliable_stack() in livepatch subsystem.
It checks config options that defines behavior of the
stacktrace subsystem.

We use the above livepatch-specific function to warn about
that a function from stacktrace subsustem will not work.
You even suggest to use it to ignore result from
the stacktrace subsystem.

OK, using klp_have_reliable_stack() on both locations
would keep it consistent.

My point is that the check itself is not reliable because
it is "hard" to maintain.

Instead, I suggest to remove klp_have_reliable_stack() and use
the following in klp_enable_patch().


if (stack_trace_save_tsk_reliable(current, NULL, 0) == -ENOSYS) {
pr_warn("This architecture doesn't have support for the livepatch consistency model.\n");
pr_warn("The livepatch transition may never complete.\n");
}

Also I suggest to remove the check in klp_check_stack() completely.
We will always print that the stack is not reliable but only when
the debug message is enabled. It is slightly misleading
message for -ENOSYS. But I doubt that it could cause much
troubles in reality. This situation should be really rare
and easy to debug.


> > If we want to discuss and fix this to the death. We should change
> > the return value from -ENOSYS to -EOPNOTSUPP. The reason
> > is the same as in the commit 375bfca3459db1c5596
> > ("livepatch: core: Return EOPNOTSUPP instead of ENOSYS").
> >
> > Note that EOPNOTSUPP is the same errno as ENOTSUP, see
> > man 3 errno.
>
> Sure, but that's a separate issue.

I just wanted to show that we might spend even more time with
making this code briliant.


> > > But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
> > > better.
> >
> > AFAIK, Miroslav wanted to point out that your opinion was inconsistent.
>
> How is my opinion inconsistent?

1. You have already acked the removal of WARN_ON() in
klp_have_reliable_stack(),
see https://lkml.kernel.org/r/20190424155154.h62wc3nt7ib2phdy@treble

You even suggested it, see
https://lkml.kernel.org/r/20190418135858.n3lzq2oxkj52m6bi@treble

But you suggest to put it back now.


2. You suggested to remove the warning when klp_check_stack() because
it was superfluous. There was a discussion about keeping
the check for -ENOSYS there and you did not react, see
https://lkml.kernel.org/r/20190424155532.3uyxyxwm4c5dqsf5@treble

You even acked the commit 1d98a69e5cef3aeb68bcef ("livepatch:
Remove reliable stacktrace check in klp_try_switch_task()").

And you suddenly want to revert it.


> Is there something wrong with the original approach, which was reverted
> with
>
> 1d98a69e5cef ("livepatch: Remove reliable stacktrace check in klp_try_switch_task()")
>
> ?
>
> As I mentioned, that has some advantages:
>
> - The generated code is simpler/faster because it uses a build-time
> check.
>
> - The code is more readable IMO. Especially if the check is done higher
> up the call stack by reverting 1d98a69e5cef. That way the arch
> support short-circuiting is done in the logical place, before doing
> any more unnecessary work. It's faster, but also, more importantly,
> it's less surprising.
>
> - It's also more consistent with other code, since the intent of this
> check is the same as the klp_have_reliable_stack() use in
> klp_enabled_patch().
>
> If you disagree with those, please explain why.

As I said. I think that it is less reliable because we check config
options of an unrelated subsystem.

Also I think that it is overengineered.
save_stack_trace_tsk_reliable() is able to tell when it failed.
This particular failure is superfast. IMHO, it is not worth such
an optimization.

In fact, it is a compile time check as well because the inline
from include/linux/stacktrace.h


> > PS: This is my last mail in the thread this week. I will eventually
> > return to it with a clear head next week. It is all nitpicking from
> > my POV and I have better things to do.
>
> I don't think it's helpful to characterize it as nitpicking. The
> details of the code are important.

1. You had issues with almost all my printk() messages, comments,
and commit messages. I know that my English is not perfect.
And that you might want to highlight another information.
But was is it really that bad?

2. This entire patchset is about adding and removing messages
and checks. We have 3rd version and you are still not happy.
Not to say that you suggest something else each time.


Frankly, I do mind too much about this code path, which and how
many warnings are printed. I am not aware about any complains.
IMHO, only people adding support for new architecture
might go this way.

I just wanted to switch the error to warning because Thomas
Gleixner wondered about unused code on s390 when refactoring
the stacktrace code.

I really did not expect that I would spend hours/days on this.
I do not think that it is worth it.

I consider most of the requests as nitpicking because
they requests extra work just because it looks like
a good idea.

My take is that we should accept changes when they improve
the situation and go in the right direction. Further clean up
might be done later by anyone who does not have anything
more important on the plate or gets annoyed enough.

Yes, you have many great ideas. But I am not a trained
monkey. And I do not know how to stop this when it is
looking endless.

Best Regards,
Petr

2019-05-13 19:31:03

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] livepatch: Remove duplicate warning about missing reliable stacktrace support

On Mon, May 13, 2019 at 12:43:18PM +0200, Petr Mladek wrote:
> On Tue 2019-05-07 16:24:25, Josh Poimboeuf wrote:
> > On Tue, May 07, 2019 at 04:28:47PM +0200, Petr Mladek wrote:
> > > > > > > Also this check is effectively the same as the klp_have_reliable_stack()
> > > > > > > check which is done in kernel/livepatch/core.c. So I think it would be
> > > > > > > clearer and more consistent if the same check is done here:
> > > > > > >
> > > > > > > if (!klp_have_reliable_stack())
> > > > > > > return -ENOSYS;
> > > > >
> > > > > Huh, it smells with over engineering to me.
> > > >
> > > > How so? It makes the code more readable and the generated code should
> > > > be much better because it becomes a build-time check.
> > >
> > > save_stack_trace_tsk_reliable() returns various error codes.
> > > We catch a specific one because otherwise the message below
> > > might be misleading.
> > >
> > > I do not see why we should prevent this error by calling
> > > a custom hack: klp_have_reliable_stack()?
> >
> > I wouldn't call it a hack. It's a simple build-time check.
> >
> > > Regarding reliability. If anyone changes semantic of
> > > save_stack_trace_tsk_reliable() error codes, they would likely
> > > check if all users (one at the moment) handle it correctly.
> > >
> > > On the other hand, the dependency between the -ENOSYS
> > > return value and klp_have_reliable_stack() is far from
> > > obvious.
> >
> > I don't follow your point.
>
> We implement klp_have_reliable_stack() in livepatch subsystem.
> It checks config options that defines behavior of the
> stacktrace subsystem.
>
> We use the above livepatch-specific function to warn about
> that a function from stacktrace subsustem will not work.
> You even suggest to use it to ignore result from
> the stacktrace subsystem.
>
> OK, using klp_have_reliable_stack() on both locations
> would keep it consistent.
>
> My point is that the check itself is not reliable because
> it is "hard" to maintain.

I don't see how it would be "hard" to maintain. If an arch implements
reliable stack tracing, but forgets to set HAVE_RELIABLE_STACKTRACE then
they will notice the warning immediately when they try to livepatch.

> Instead, I suggest to remove klp_have_reliable_stack() and use
> the following in klp_enable_patch().
>
>
> if (stack_trace_save_tsk_reliable(current, NULL, 0) == -ENOSYS) {
> pr_warn("This architecture doesn't have support for the livepatch consistency model.\n");
> pr_warn("The livepatch transition may never complete.\n");
> }

It seems confusing to call that function where it isn't needed, since
klp_enable_patch() isn't doing stack tracing yet at that location.

Calling klp_have_reliable_stack() is more readable, and is exactly the
check we want to make.

> Also I suggest to remove the check in klp_check_stack() completely.
> We will always print that the stack is not reliable but only when
> the debug message is enabled. It is slightly misleading
> message for -ENOSYS. But I doubt that it could cause much
> troubles in reality. This situation should be really rare
> and easy to debug.

If you want to remove the specific check in klp_check_stack(), that's
fine with me. I slightly prefer just reverting Kamalesh's commit, but I
don't have a strong feeling either way.

> > > If we want to discuss and fix this to the death. We should change
> > > the return value from -ENOSYS to -EOPNOTSUPP. The reason
> > > is the same as in the commit 375bfca3459db1c5596
> > > ("livepatch: core: Return EOPNOTSUPP instead of ENOSYS").
> > >
> > > Note that EOPNOTSUPP is the same errno as ENOTSUP, see
> > > man 3 errno.
> >
> > Sure, but that's a separate issue.
>
> I just wanted to show that we might spend even more time with
> making this code briliant.
>
>
> > > > But I think Miroslav's suggestion to revert 1d98a69e5cef would be even
> > > > better.
> > >
> > > AFAIK, Miroslav wanted to point out that your opinion was inconsistent.
> >
> > How is my opinion inconsistent?

I don't guarantee that I will always be consistent with my past self.
My thinking may evolve (or devolve?) over time. And there's nothing
wrong with that. But none of the below is actually inconsistent.

> 1. You have already acked the removal of WARN_ON() in
> klp_have_reliable_stack(),
> see https://lkml.kernel.org/r/20190424155154.h62wc3nt7ib2phdy@treble
>
> You even suggested it, see
> https://lkml.kernel.org/r/20190418135858.n3lzq2oxkj52m6bi@treble
>
> But you suggest to put it back now.

Maybe I wasn't clear. If we're not planning on calling the weak version
of the function, then the warning makes sense.

> 2. You suggested to remove the warning when klp_check_stack() because
> it was superfluous. There was a discussion about keeping
> the check for -ENOSYS there and you did not react, see
> https://lkml.kernel.org/r/20190424155532.3uyxyxwm4c5dqsf5@treble

But then (I thought) Miroslav suggested reverting 1d98a69e5cef, which is
a better idea.

> You even acked the commit 1d98a69e5cef3aeb68bcef ("livepatch:
> Remove reliable stacktrace check in klp_try_switch_task()").
>
> And you suddenly want to revert it.

The circumstances have changed since that commit: now we are going back
to allowing non-reliable arches to load livepatches.

> > Is there something wrong with the original approach, which was reverted
> > with
> >
> > 1d98a69e5cef ("livepatch: Remove reliable stacktrace check in klp_try_switch_task()")
> >
> > ?
> >
> > As I mentioned, that has some advantages:
> >
> > - The generated code is simpler/faster because it uses a build-time
> > check.
> >
> > - The code is more readable IMO. Especially if the check is done higher
> > up the call stack by reverting 1d98a69e5cef. That way the arch
> > support short-circuiting is done in the logical place, before doing
> > any more unnecessary work. It's faster, but also, more importantly,
> > it's less surprising.
> >
> > - It's also more consistent with other code, since the intent of this
> > check is the same as the klp_have_reliable_stack() use in
> > klp_enabled_patch().
> >
> > If you disagree with those, please explain why.
>
> As I said. I think that it is less reliable because we check config
> options of an unrelated subsystem.

There's no harm in checking the config option of another subsystem.
Livepatch very much relies on stacktrace.

> Also I think that it is overengineered.
> save_stack_trace_tsk_reliable() is able to tell when it failed.
> This particular failure is superfast. IMHO, it is not worth such
> an optimization.

Sure, performance isn't a concern, but code simplicity, readability, and
consistency certainly are.

> In fact, it is a compile time check as well because the inline
> from include/linux/stacktrace.h
>
>
> > > PS: This is my last mail in the thread this week. I will eventually
> > > return to it with a clear head next week. It is all nitpicking from
> > > my POV and I have better things to do.
> >
> > I don't think it's helpful to characterize it as nitpicking. The
> > details of the code are important.
>
> 1. You had issues with almost all my printk() messages, comments,
> and commit messages. I know that my English is not perfect.
> And that you might want to highlight another information.
> But was is it really that bad?

It's just code review.

> 2. This entire patchset is about adding and removing messages
> and checks. We have 3rd version and you are still not happy.
> Not to say that you suggest something else each time.

That's how review works. If the code improves with each iteration then
how is that a problem?

> Frankly, I do mind too much about this code path, which and how
> many warnings are printed. I am not aware about any complains.
> IMHO, only people adding support for new architecture
> might go this way.
>
> I just wanted to switch the error to warning because Thomas
> Gleixner wondered about unused code on s390 when refactoring
> the stacktrace code.
>
> I really did not expect that I would spend hours/days on this.
> I do not think that it is worth it.
>
> I consider most of the requests as nitpicking because
> they requests extra work just because it looks like
> a good idea.
>
> My take is that we should accept changes when they improve
> the situation and go in the right direction. Further clean up
> might be done later by anyone who does not have anything
> more important on the plate or gets annoyed enough.
>
> Yes, you have many great ideas. But I am not a trained
> monkey. And I do not know how to stop this when it is
> looking endless.

I'm just trying to help improve the code. That's how open source works.
And it's my responsibility as a maintainer. If you're not open to
review, don't bother posting the code in the first place.

--
Josh