2017-06-05 20:24:16

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Fri, May 26, 2017 at 02:55:18PM -0700, Dmitry Torokhov wrote:
> On Fri, May 26, 2017 at 02:32:31PM -0700, Luis R. Rodriguez wrote:
> > On Fri, May 26, 2017 at 2:26 PM, Dmitry Torokhov
> > <[email protected]> wrote:
> > > On Fri, May 26, 2017 at 12:46 PM, Luis R. Rodriguez <[email protected]> wrote:
> > >> On Fri, May 26, 2017 at 06:09:29AM -0500, Eric W. Biederman wrote:
> > >>> "Fuzzey, Martin" <[email protected]> writes:
> > >>> >>>> Maybe SIGCHLD shouldn't interrupt firmware loading?
> > >>> >
> > >>> > I don't think there's a way of doing that without disabling all
> > >>> > signals (ie using the non interruptible wait variants).
> > >>> > It used to be that way (which is why I only ran into this after
> > >>> > updating from an ancient 3.16 kernel to a slightly less ancient 4.4)
> > >>> > But there are valid reasons for wanting to be able to interrupt
> > >>> > firmware loading (like being able to kill the userspace helper)
> > >>>
> > >>> Perhaps simply using a killable wait and not a fully interruptible
> > >>> wait would be better?
> > >>
> > >> What do you mean by a killable wait BTW?
> > >
> > > https://lwn.net/Articles/288056/

Read it thanks ! As per this it states, "Kernel code which uses interruptible
sleeps must always check to see whether it woke up as a result of a signal,
and, if so, clean up whatever it was doing and return -EINTR back to user
space." -- but also on the same article it quotes Alan Cox as having noted
"Unix tradition (and thus almost all applications) believe file store writes to
be non signal interruptible. It would not be safe or practical to change that
guarantee."

For these two reasons then it would seem best we do two things actually:

1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
got interrupted by a signal (it returns -ERESTARTSYS)
2) Do as you note below and add wait_event_killable_timeout()

> > > I think only interrupting firmware loading with fatal signals would
> > > make a lot of sense.
> > >
> > >>
> > >> ret = swait_event_interruptible_timeout() is being used right now.
> > >
> > > It looks like we are missing swait_event_killable*(), but I do not
> > > think it would be hard to add.
> >
> > What should we do for stable ? Is this a *stable* issue ?
>
> I think it is, as you have users complaining about behavior. I do not
> think we need to make their lives harder than needed by requiring
> handling signals.

Makes sense, specially given the long tradition, it would breaking a long
tradition. Even though in this case we are dealing with sysfs files it
should be no different.

> I do not see why we could not introduce wait_event_killable_timeout()
> and swait_event_killable_timeout() into -stables.

After seeing how simple it is to do so I tend to agree. Greg, Peter,
what are your thoughts ?

Martin Fuzzey can you test this patch as an alternative to your issue ?

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index b9f907eedbf7..70fc42e5e0da 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
{
long ret;

- ret = swait_event_interruptible_timeout(fw_st->wq,
+ ret = swait_event_killable_timeout(fw_st->wq,
__fw_state_is_done(READ_ONCE(fw_st->status)),
timeout);
if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
diff --git a/include/linux/swait.h b/include/linux/swait.h
index c1f9c62a8a50..9c5ca2898b2f 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -169,4 +169,29 @@ do { \
__ret; \
})

+#define __swait_event_killable(wq, condition) \
+ (void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
+
+#define swait_event_killable(wq, condition) \
+({ \
+ int __ret = 0; \
+ if (!(condition)) \
+ __ret = __swait_event_killable(wq, condition); \
+ __ret; \
+})
+
+#define __swait_event_killable_timeout(wq, condition, timeout) \
+ ___swait_event(wq, ___wait_cond_timeout(condition), \
+ TASK_INTERRUPTIBLE, timeout, \
+ __ret = schedule_timeout(__ret))
+
+#define swait_event_killable_timeout(wq, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __swait_event_killable_timeout(wq, \
+ condition, timeout); \
+ __ret; \
+})
+
#endif /* _LINUX_SWAIT_H */

Luis


2017-06-06 09:04:46

by Martin Fuzzey

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On 05/06/17 22:24, Luis R. Rodriguez wrote:
>
>
> For these two reasons then it would seem best we do two things actually:
>
> 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> got interrupted by a signal (it returns -ERESTARTSYS)


I disagree. That would force userspace to handle the signal rather than
having the kernel retry.

From Documentation/DocBook/kernel-hacking.tmpl:

After you slept you should check if a signal occurred: the
Unix/Linux way of handling signals is to temporarily exit the
system call with the <constant>-ERESTARTSYS</constant> error. The
system call entry code will switch back to user context, process
the signal handler and then your system call will be restarted
(unless the user disabled that). So you should be prepared to
process the restart, e.g. if you're in the middle of manipulating
some data structure.



> 2) Do as you note below and add wait_event_killable_timeout()

Hum,
I do think that would be better but, (please correct me if I'm wrong)
the _killable_ variants only allow
SIGKILL (and not SIGINT).

0cb64249ca "firmware_loader: abort request if wait_for_completion is
interrupted"

specifically mentrions ctrl-c (SIGINT) in the commit message so that
would no longer work.

Myself I think having to use kill -9 to interrupt firmware loading by a
usespace helper is OK but others may disagree.

> I do not see why we could not introduce wait_event_killable_timeout()
> and swait_event_killable_timeout() into -stables.
> After seeing how simple it is to do so I tend to agree. Greg, Peter,
> what are your thoughts ?
>
> Martin Fuzzey can you test this patch as an alternative to your issue ?
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index b9f907eedbf7..70fc42e5e0da 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> {
> long ret;
>
> - ret = swait_event_interruptible_timeout(fw_st->wq,
> + ret = swait_event_killable_timeout(fw_st->wq,
> __fw_state_is_done(READ_ONCE(fw_st->status)),
> timeout);
> if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index c1f9c62a8a50..9c5ca2898b2f 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -169,4 +169,29 @@ do { \
> __ret; \
> })
>
> +#define __swait_event_killable(wq, condition) \
> + (void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> +
> +#define swait_event_killable(wq, condition) \
> +({ \
> + int __ret = 0; \
> + if (!(condition)) \
> + __ret = __swait_event_killable(wq, condition); \
> + __ret; \
> +})
> +
> +#define __swait_event_killable_timeout(wq, condition, timeout) \
> + ___swait_event(wq, ___wait_cond_timeout(condition), \
> + TASK_INTERRUPTIBLE, timeout, \
> + __ret = schedule_timeout(__ret))
> +

Should be TASK_KILLABLE above

> +#define swait_event_killable_timeout(wq, condition, timeout) \
> +({ \
> + long __ret = timeout; \
> + if (!___wait_cond_timeout(condition)) \
> + __ret = __swait_event_killable_timeout(wq, \
> + condition, timeout); \
> + __ret; \
> +})
> +
> #endif /* _LINUX_SWAIT_H */
>
> Luis

After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.


Regards,

Martin

2017-06-06 14:53:28

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

> "Unix tradition (and thus almost all applications) believe file store
> writes to
> be non signal interruptible. It would not be safe or practical to
> change that
> guarantee."

Yep everyone codes

write(disk_file, "foo", 3);

not while(..) blah around it.

> For these two reasons then it would seem best we do two things
> actually:
>
> 1) return -EINTR instead of -EAGAIN when we detect
> swait_event_interruptible_timeout()
> got interrupted by a signal (it returns -ERESTARTSYS)
> 2) Do as you note below and add wait_event_killable_timeout()

Pedantic detail that I don't think affects you

If you have completed a part of the I/O then you should return the byte
processed count not EINTR, but -1,EINTR if no progress was made.

Alan

2017-06-06 16:34:09

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

Adding fsdevel for review on the correct semantics of handling signals on
write(), in this case a sysfs write which triggered a sync request firmware
call and what the firmware API should return in such case of a signal (I gather
this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should
be followed or if only allowing SIGKILL is fine (fine by me, but it would
change old behaviour).

Hoping between fsdevel and linux-api folks we can hash this out.

On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote:
> On 05/06/17 22:24, Luis R. Rodriguez wrote:
> >
> >
> > For these two reasons then it would seem best we do two things actually:
> >
> > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> > got interrupted by a signal (it returns -ERESTARTSYS)
>
>
> I disagree. That would force userspace to handle the signal rather than
> having the kernel retry.
>
> From Documentation/DocBook/kernel-hacking.tmpl:
>
> After you slept you should check if a signal occurred: the
> Unix/Linux way of handling signals is to temporarily exit the
> system call with the <constant>-ERESTARTSYS</constant> error. The
> system call entry code will switch back to user context, process
> the signal handler and then your system call will be restarted
> (unless the user disabled that). So you should be prepared to
> process the restart, e.g. if you're in the middle of manipulating
> some data structure.

This applies but you are missing my point that the LWN article [0] I referred
to also stated "Kernel code which uses interruptible sleeps must always check
to see whether it woke up as a result of a signal, and, if so, clean up
whatever it was doing and return -EINTR back to user space." -- I realize there
may be contradiction with above documentation -- this perhaps can be clarified
with fsdevel folks *but* regardless of that the same article notes Alan Cox
explains that "Unix tradition (and thus almost all applications) believe file
store writes to be non signal interruptible. It would not be safe or practical
to change that guarantee." So for this reason alone there does seem to be an
exemption to the above documentation worth noting for file store writes, and
the patch which you tested below *moves* the sysfs write op for firmware in
that direction by adding a new killable swait.

[0] https://lwn.net/Articles/288056/

> > 2) Do as you note below and add wait_event_killable_timeout()
>
> Hum, I do think that would be better but, (please correct me if I'm wrong)
> the _killable_ variants only allow SIGKILL (and not SIGINT).

That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE.

> 0cb64249ca "firmware_loader: abort request if wait_for_completion is
> interrupted"
>
> specifically mentrions ctrl-c (SIGINT) in the commit message so that would
> no longer work.

Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to
allow it to be killable. I'm afraid that patch probably did not get proper
review from sufficient folks and its worth now asking ourselves what we'd like
to do. I'm fine with letting go of SIGINT for firmware sysfs calls for the
sake of keeping with the long standing unix tradition on write, given we *still
have SIGKILL*.

> Myself I think having to use kill -9 to interrupt firmware loading by a
> usespace helper is OK but others may disagree.

Its why I added fsdevel as well. This is really a semantics and uapi question.
Between fsdevel and linux-api folks I would hope we can come to a sensible
resolution.

> > I do not see why we could not introduce wait_event_killable_timeout()
> > and swait_event_killable_timeout() into -stables.
> > After seeing how simple it is to do so I tend to agree. Greg, Peter,
> > what are your thoughts ?
> >
> > Martin Fuzzey can you test this patch as an alternative to your issue ?
> >
> > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> > index b9f907eedbf7..70fc42e5e0da 100644
> > --- a/drivers/base/firmware_class.c
> > +++ b/drivers/base/firmware_class.c
> > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> > {
> > long ret;
> > - ret = swait_event_interruptible_timeout(fw_st->wq,
> > + ret = swait_event_killable_timeout(fw_st->wq,
> > __fw_state_is_done(READ_ONCE(fw_st->status)),
> > timeout);
> > if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > index c1f9c62a8a50..9c5ca2898b2f 100644
> > --- a/include/linux/swait.h
> > +++ b/include/linux/swait.h
> > @@ -169,4 +169,29 @@ do { \
> > __ret; \
> > })
> > +#define __swait_event_killable(wq, condition) \
> > + (void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> > +
> > +#define swait_event_killable(wq, condition) \
> > +({ \
> > + int __ret = 0; \
> > + if (!(condition)) \
> > + __ret = __swait_event_killable(wq, condition); \
> > + __ret; \
> > +})
> > +
> > +#define __swait_event_killable_timeout(wq, condition, timeout) \
> > + ___swait_event(wq, ___wait_cond_timeout(condition), \
> > + TASK_INTERRUPTIBLE, timeout, \
> > + __ret = schedule_timeout(__ret))
> > +
>
> Should be TASK_KILLABLE above

Oops yes sorry.

> > +#define swait_event_killable_timeout(wq, condition, timeout) \
> > +({ \
> > + long __ret = timeout; \
> > + if (!___wait_cond_timeout(condition)) \
> > + __ret = __swait_event_killable_timeout(wq, \
> > + condition, timeout); \
> > + __ret; \
> > +})
> > +
> > #endif /* _LINUX_SWAIT_H */
> >
> > Luis
>
> After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.

Great, thanks for testing.

Luis

2017-06-06 16:47:44

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

Adding fsdevel folks.

On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > "Unix tradition (and thus almost all applications) believe file store
> > writes to
> > be non signal interruptible. It would not be safe or practical to
> > change that
> > guarantee."
>
> Yep everyone codes
>
> write(disk_file, "foo", 3);
>
> not while(..) blah around it.

Thanks for the confirmation! That's a simple enough explanation.

> > For these two reasons then it would seem best we do two things
> > actually:
> >
> > 1) return -EINTR instead of -EAGAIN when we detect
> > swait_event_interruptible_timeout()
> > got interrupted by a signal (it returns -ERESTARTSYS)
> > 2) Do as you note below and add wait_event_killable_timeout()
>
> Pedantic detail that I don't think affects you
>
> If you have completed a part of the I/O then you should return the byte
> processed count not EINTR, but -1,EINTR if no progress was made.

You are right with some new exceptions and with regards to the future:

The syfs loading interface for firmware currently goes through the
data file exposed on syfs, the respective write op firmware_data_write()
only checks for signals at the beginning. After that its a full one
swoop try to write if you are following the old tradition and are using
a buffer allocated by the firmware API.

If you are using the relatively new request_firmware_into_buf() added
by Stephen Boyd which lets the driver provide the allocated buffer then
we have a loop in firmware_rw() which should be fixed to:

1) Check for signals
2) Do what you noted above.

Furthermore Yi Li over at Intel is adding some new API calls which would
re-use some of this for FPGA firmwares which are also very large, that
work should consider the above and fix appropriately as well.

Luis

2017-06-06 17:53:12

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

Used wrong alias for fsdevel now, its linux-fsdevel ...

Luis

On Tue, Jun 06, 2017 at 06:34:01PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel for review on the correct semantics of handling signals on
> write(), in this case a sysfs write which triggered a sync request firmware
> call and what the firmware API should return in such case of a signal (I gather
> this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should
> be followed or if only allowing SIGKILL is fine (fine by me, but it would
> change old behaviour).
>
> Hoping between fsdevel and linux-api folks we can hash this out.
>
> On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote:
> > On 05/06/17 22:24, Luis R. Rodriguez wrote:
> > >
> > >
> > > For these two reasons then it would seem best we do two things actually:
> > >
> > > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> >
> >
> > I disagree. That would force userspace to handle the signal rather than
> > having the kernel retry.
> >
> > From Documentation/DocBook/kernel-hacking.tmpl:
> >
> > After you slept you should check if a signal occurred: the
> > Unix/Linux way of handling signals is to temporarily exit the
> > system call with the <constant>-ERESTARTSYS</constant> error. The
> > system call entry code will switch back to user context, process
> > the signal handler and then your system call will be restarted
> > (unless the user disabled that). So you should be prepared to
> > process the restart, e.g. if you're in the middle of manipulating
> > some data structure.
>
> This applies but you are missing my point that the LWN article [0] I referred
> to also stated "Kernel code which uses interruptible sleeps must always check
> to see whether it woke up as a result of a signal, and, if so, clean up
> whatever it was doing and return -EINTR back to user space." -- I realize there
> may be contradiction with above documentation -- this perhaps can be clarified
> with fsdevel folks *but* regardless of that the same article notes Alan Cox
> explains that "Unix tradition (and thus almost all applications) believe file
> store writes to be non signal interruptible. It would not be safe or practical
> to change that guarantee." So for this reason alone there does seem to be an
> exemption to the above documentation worth noting for file store writes, and
> the patch which you tested below *moves* the sysfs write op for firmware in
> that direction by adding a new killable swait.
>
> [0] https://lwn.net/Articles/288056/
>
> > > 2) Do as you note below and add wait_event_killable_timeout()
> >
> > Hum, I do think that would be better but, (please correct me if I'm wrong)
> > the _killable_ variants only allow SIGKILL (and not SIGINT).
>
> That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE.
>
> > 0cb64249ca "firmware_loader: abort request if wait_for_completion is
> > interrupted"
> >
> > specifically mentrions ctrl-c (SIGINT) in the commit message so that would
> > no longer work.
>
> Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to
> allow it to be killable. I'm afraid that patch probably did not get proper
> review from sufficient folks and its worth now asking ourselves what we'd like
> to do. I'm fine with letting go of SIGINT for firmware sysfs calls for the
> sake of keeping with the long standing unix tradition on write, given we *still
> have SIGKILL*.
>
> > Myself I think having to use kill -9 to interrupt firmware loading by a
> > usespace helper is OK but others may disagree.
>
> Its why I added fsdevel as well. This is really a semantics and uapi question.
> Between fsdevel and linux-api folks I would hope we can come to a sensible
> resolution.
>
> > > I do not see why we could not introduce wait_event_killable_timeout()
> > > and swait_event_killable_timeout() into -stables.
> > > After seeing how simple it is to do so I tend to agree. Greg, Peter,
> > > what are your thoughts ?
> > >
> > > Martin Fuzzey can you test this patch as an alternative to your issue ?
> > >
> > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> > > index b9f907eedbf7..70fc42e5e0da 100644
> > > --- a/drivers/base/firmware_class.c
> > > +++ b/drivers/base/firmware_class.c
> > > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> > > {
> > > long ret;
> > > - ret = swait_event_interruptible_timeout(fw_st->wq,
> > > + ret = swait_event_killable_timeout(fw_st->wq,
> > > __fw_state_is_done(READ_ONCE(fw_st->status)),
> > > timeout);
> > > if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> > > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > > index c1f9c62a8a50..9c5ca2898b2f 100644
> > > --- a/include/linux/swait.h
> > > +++ b/include/linux/swait.h
> > > @@ -169,4 +169,29 @@ do { \
> > > __ret; \
> > > })
> > > +#define __swait_event_killable(wq, condition) \
> > > + (void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> > > +
> > > +#define swait_event_killable(wq, condition) \
> > > +({ \
> > > + int __ret = 0; \
> > > + if (!(condition)) \
> > > + __ret = __swait_event_killable(wq, condition); \
> > > + __ret; \
> > > +})
> > > +
> > > +#define __swait_event_killable_timeout(wq, condition, timeout) \
> > > + ___swait_event(wq, ___wait_cond_timeout(condition), \
> > > + TASK_INTERRUPTIBLE, timeout, \
> > > + __ret = schedule_timeout(__ret))
> > > +
> >
> > Should be TASK_KILLABLE above
>
> Oops yes sorry.
>
> > > +#define swait_event_killable_timeout(wq, condition, timeout) \
> > > +({ \
> > > + long __ret = timeout; \
> > > + if (!___wait_cond_timeout(condition)) \
> > > + __ret = __swait_event_killable_timeout(wq, \
> > > + condition, timeout); \
> > > + __ret; \
> > > +})
> > > +
> > > #endif /* _LINUX_SWAIT_H */
> > >
> > > Luis
> >
> > After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.
>
> Great, thanks for testing.
>
> Luis
>

--
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

2017-06-06 17:54:19

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

Using the right linux-fsdevel this time also, this was the second reply.

Luis

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel folks.
>
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > "Unix tradition (and thus almost all applications) believe file store
> > > writes to
> > > be non signal interruptible. It would not be safe or practical to
> > > change that
> > > guarantee."
> >
> > Yep everyone codes
> >
> > write(disk_file, "foo", 3);
> >
> > not while(..) blah around it.
>
> Thanks for the confirmation! That's a simple enough explanation.
>
> > > For these two reasons then it would seem best we do two things
> > > actually:
> > >
> > > 1) return -EINTR instead of -EAGAIN when we detect
> > > swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> > > 2) Do as you note below and add wait_event_killable_timeout()
> >
> > Pedantic detail that I don't think affects you
> >
> > If you have completed a part of the I/O then you should return the byte
> > processed count not EINTR, but -1,EINTR if no progress was made.
>
> You are right with some new exceptions and with regards to the future:
>
> The syfs loading interface for firmware currently goes through the
> data file exposed on syfs, the respective write op firmware_data_write()
> only checks for signals at the beginning. After that its a full one
> swoop try to write if you are following the old tradition and are using
> a buffer allocated by the firmware API.
>
> If you are using the relatively new request_firmware_into_buf() added
> by Stephen Boyd which lets the driver provide the allocated buffer then
> we have a loop in firmware_rw() which should be fixed to:
>
> 1) Check for signals
> 2) Do what you noted above.
>
> Furthermore Yi Li over at Intel is adding some new API calls which would
> re-use some of this for FPGA firmwares which are also very large, that
> work should consider the above and fix appropriately as well.
>
> Luis
>

--
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

2017-06-06 22:15:07

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > Yep everyone codes
> >
> > write(disk_file, "foo", 3);
> >
> > not while(..) blah around it.

In general I/O to tty devices and other character mode devices was
where you definitely needed to check for EINTR/EAGAIN because that was
the place where historically Unix systems would interrupt system calls
--- e.g., a user typing control-Z, for example.

And in general writes to file systems and block devices in *general*
were never interrupted by signals, although that was always a
non-portable assumption.

So I've always subscribed to the "be liberal in what you accept,
conservative in what you send" rule of thumb. Which is to say, any
programs *I* write I'll in general always check for EINTR/EAGAIN and
check for partial writes, but in general, as a kernel program I try to
adhere to the long-standing Unix trandition for disk based files.

This does beg the question about whether firmware devices are more
like tty devices or block devices or files, though. If before signals
never caused them to return EINTR/EAGAIN, then it's probably best to
not break backwards compatbility.

That being said, not that you also have the option of using
-ERESTARTNOINTR (always restart the system call, regardless of how the
sighandle flags were set), and -ERESTARTNOHAND (restart the system
call always if there was no signal handler and the process was not
killed), in addition to -ERESTARTSYS. So that might be another option
that's fairly easy to implement or experiment with.

- Ted

2017-06-07 00:22:45

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > Yep everyone codes
> > >
> > > write(disk_file, "foo", 3);
> > >
> > > not while(..) blah around it.
>
> In general I/O to tty devices and other character mode devices was
> where you definitely needed to check for EINTR/EAGAIN because that was
> the place where historically Unix systems would interrupt system calls
> --- e.g., a user typing control-Z, for example.
>
> And in general writes to file systems and block devices in *general*
> were never interrupted by signals, although that was always a
> non-portable assumption.
>
> So I've always subscribed to the "be liberal in what you accept,
> conservative in what you send" rule of thumb. Which is to say, any
> programs *I* write I'll in general always check for EINTR/EAGAIN and
> check for partial writes, but in general, as a kernel program I try to
> adhere to the long-standing Unix trandition for disk based files.

OK so userspace should consider checking for EINTR/EAGAIN. On the kernel front
though we we do *strive* to go by the old unix tradition -- there are
exceptions though, of course.

> This does beg the question about whether firmware devices are more
> like tty devices or block devices or files, though.

And here you raise the analogy to see if its *worthy* to break the old unix
tradition, the tty example is a case that we seem to accept as reasonable
for an exception, clearly.

This help, thanks!

So, "firmware devices" is a misnomer, the firmware API does direct FS lookup
and only if that fails and the distribution enabled the fallback mechanism will
it be used if the file was not found on /lib/firmware/ paths. So the syfs
interface we are evaluating here is this fallback mechanism. The "firmware
device" then is really more a user of the firmware API, and these do vary
widely. In fact it recent developments with the "driver data API" generalize
the interface given things outside our typical device drivers for what we know
as old "firmware" are looking to use the firmware API for looking for files
from userspace. This in fact already has happened, it was just subtle: we look
for default EEPROMs, configururation files, microcode updates, and soon we'll
want to replace the userspace wireless CRDA tool for the regulatory database
with a simple file lookup once we get firmware signing upstream. So -- more and
more the API is being used for general file lookups.

Which type of drivers request these files ? It all varies across the board,
and we have really early things like CPU microcode files (so we enable built-in
files), regular old firmware files for things like networking drivers,
subsystem configuration stuff (for the wireless regulatory database), and it
seems now *huge* (100s of MiBs I'm told) for remote-proc and FPGAs. The FPGA
use case is one use case where having the uploader use a while loop makes more
sense. Drivers may also want to have more fine control to the buffer where the
"driver data" gets stashed into, so the API request_firmware_into_buf() was
added for this purpose. The FPGA devices seem to want to use this breakdown
mechanism as well. Although the remote-proc folks seem to be relying on the
sysfs interface as a default mechanism.

So it would seem to make sense to support the loop thing since some
files these days *can* be really big. I think we can easily address
this by relying on the killable signal (SIGKILL), rather than capturing
SIGINT (CTRL-C).

One problem is we had supported SIGINT before, it was reported however we
never gave back to userspace the fact that a signal was captured, we always
returned EAGAIN, so usersace did not know WTF was going on. The reporter wanted
to detect this and wanted to get us to return the same value the wait call
passed to the firmware API, -ERESTARTSYS. Hence this patch, however since
-ERESTARTSYS is special and folks may just restart the syscall always when this
happens, one question is if we should return EINTR instead of today's EAGAIN.

> If before signals
> never caused them to return EINTR/EAGAIN, then it's probably best to
> not break backwards compatbility.

There was a time where did not have signal, and always just used to return
-ENOMEM on failure. After signals support was added we still were
returning -ENOMEM for a while. It was only later that we started capturing
the actual signal, however this was masked as EAGAIN, for no precise good
reason, I suppose just lack of review.

The patch in question wants us to return -ERESTARTSYS so that userspace
can restart the upload. But the issue was that the sysfs firmware loader
was killed by a signal as well, so this is why Dmitry noted that we can
just fix this issue by just making the wait killable with SIGKILL only.
The swait killable patch I supplied upon Dmitry's suggestion as a replacement
would do away with capturing SIGINT but allow SIGKILL then.

Since EAGAIN was always returned even if a signal was captured I'm
inclined to take the position we really never gave userspace a proper
clue about the signal. Additionally I cannot see why userspace would
rely on SIGINT working *only* but not *SIGKILL*.

The risk here I think is if userspace ever *did* rely on SIGINT for the sysfs
interface as userspace functional API. If this is not worth breaking
then I suppose we are stuck with the current wait. If we do that I'd
say we return EINTR, based on what I have read so far.

If doing away with SIGINT but keeping SIGKILL is rather sane then I'd go
with the approach of the new swait killable + returning -EINTR.

> That being said, note that you also have the option of using
> -ERESTARTNOINTR (always restart the system call, regardless of how the
> sighandle flags were set), and -ERESTARTNOHAND (restart the system
> call always if there was no signal handler and the process was not
> killed), in addition to -ERESTARTSYS.

We rely on swait, and swait right now only uses -ERESTARTSYS. Are
you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
or -ERESTARTNOHAND if we see fit for some future functionality / need ?

> So that might be another option
> that's fairly easy to implement or experiment with.

Thanks!

Luis

2017-06-07 04:57:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>
> We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> or -ERESTARTNOHAND if we see fit for some future functionality / need ?

I think that has essentially nothing to do with swait. User code does
some syscall. That syscall triggers a firmware load. The caller gets
a signal. If you're going to let firmware load get interrupted, you
need to consider what the syscall is.

2017-06-07 06:25:21

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <[email protected]> wrote:
> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> >
> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>
> I think that has essentially nothing to do with swait. User code does
> some syscall. That syscall triggers a firmware load. The caller gets
> a signal. If you're going to let firmware load get interrupted, you
> need to consider what the syscall is.

I think it is way too complicated and I do not think driver writers will
stand a chance of implementing this correctly, given that often firmware
load might be triggered indirectly and by multitude of syscalls.

What's wrong with saying that the only way to interrupt firmware loading
is to kill the process? So ctrl-c will no longer interrupt it, but I do
not think that ease of aborting firmware update is primary goal here. I
consider simple is good here.

Thanks.

--
Dmitry

2017-06-07 12:27:23

by Alan

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

> What's wrong with saying that the only way to interrupt firmware
> loading is to kill the process? So ctrl-c will no longer interrupt
> it, but I do not think that ease of aborting firmware update is
> primary goal here. I consider simple is good here.

Agreed 100%. The user process did not ask for firmware load, it asked
for an I/O operation. Semantically it should appear as if someone else
did the firmware load and it just had to wait for it to happen.

Alan

2017-06-07 17:15:12

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Wed, Jun 07, 2017 at 01:25:51PM +0100, Alan Cox wrote:
> > What's wrong with saying that the only way to interrupt firmware
> > loading is to kill the process? So ctrl-c will no longer interrupt
> > it, but I do not think that ease of aborting firmware update is
> > primary goal here. I consider simple is good here.
>
> Agreed 100%. The user process did not ask for firmware load, it asked
> for an I/O operation. Semantically it should appear as if someone else
> did the firmware load and it just had to wait for it to happen.

Fine by me ! Will wrap up the patch for the new killable swait then.
I suppose noting it as a stable fix is worth it given the known issues
with for example Android killing loaders unexpectedly.

Unless I hear otherwise I'll also provide a follow up to return -EINTR instead
of -EAGAIN if swait returned -ERESTARTSYS, this way at least userspace could
tell a signal was definitely received. I *don't* think that follow up is
required for stable though.

Luis

2017-06-09 01:14:51

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
<[email protected]> wrote:
> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <[email protected]> wrote:
>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>> >
>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>
>> I think that has essentially nothing to do with swait. User code does
>> some syscall. That syscall triggers a firmware load. The caller gets
>> a signal. If you're going to let firmware load get interrupted, you
>> need to consider what the syscall is.
>
> I think it is way too complicated and I do not think driver writers will
> stand a chance of implementing this correctly, given that often firmware
> load might be triggered indirectly and by multitude of syscalls.
>

That's what I meant, but I said it unclearly. I meant that, if we're
going to start allowing interruption, we would need to audit all the
callers. Ugh.

I suppose we could have request_firmware_interruptable(), but that
seems like it's barely worth it.

--Andy

2017-06-09 01:34:02

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
> <[email protected]> wrote:
>> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <[email protected]> wrote:
>>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>>> >
>>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>>
>>> I think that has essentially nothing to do with swait. User code does
>>> some syscall. That syscall triggers a firmware load. The caller gets
>>> a signal. If you're going to let firmware load get interrupted, you
>>> need to consider what the syscall is.
>>
>> I think it is way too complicated and I do not think driver writers will
>> stand a chance of implementing this correctly, given that often firmware
>> load might be triggered indirectly and by multitude of syscalls.
>>
>
> That's what I meant, but I said it unclearly. I meant that, if we're
> going to start allowing interruption, we would need to audit all the
> callers. Ugh.

There are actually two audits worth evaluating if what we've concluded
is fair game:

a) firmware sync calls on interruptible paths
b) use of swait / old interruptible waits on sysfs paths

As for a) we drove tons of code away from using sync, request
firmware, and on async firmware the return value is lost, we only can
currently know if a failure of some sort occurred. If that push to
async failed we also scared away tons of drivers to use
request_firmware() calls on init and later incorrectly on probe due to
serialized init + probe delay on boot. From what I recall my last
Coccinelle evil-monster-hunt, I did indeed find quite a bit of drivers
still still relying on sync firmware which ultimately revealed use on
a probe path. The signal however was only effective as of commit
0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
is interrupted") merged as of v4.0. Creating awareness of the issue
seems fair, but I don't think its worth a huge fly swatter.

I have no information to contribute for b) other than I was reluctant
to even consider it.

> I suppose we could have request_firmware_interruptable(), but that
> seems like it's barely worth it.

I don't think that's worth it, given the signal was effective only as
of v4.0, we already had a big push away from sync requests, and also
had the "don't use request firmware" on init scare which also
propagated to probe later.

Luis

2017-06-09 21:29:34

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Thu, Jun 8, 2017 at 6:33 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <[email protected]> wrote:
>> That's what I meant, but I said it unclearly. I meant that, if we're
>> going to start allowing interruption, we would need to audit all the
>> callers. Ugh.
>
> There are actually two audits worth evaluating if what we've concluded
> is fair game:
>
> a) firmware sync calls on interruptible paths
> b) use of swait / old interruptible waits on sysfs paths

And as I noted in the other thread -- another possible issue could be
any swait / interruptable wait on init or probe. Provided any child
completes and the kernel code for wait handler does abort, that
request would be terminated. This could for instance happen at bootup
as modules load and any child from the loader terminates.

We already have Coccinelle grammar to hunt for "though shall not
request firmware on init or probe", such SmPL grammar could be in turn
be repruposed to hunt for these types of conditions.

Luis