2007-06-10 23:00:06

by Eduard-Gabriel Munteanu

[permalink] [raw]
Subject: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

*This message was transferred with a trial version of CommuniGate(r) Pro*
DEBUG_SHIRQ generates spurious interrupts, triggering handlers such as
mconsole_interrupt() or line_interrupt(). They expect data to be
available to be read from their sockets/pipes, but in the case of
spurious interrupts, the host didn't actually send anything, so UML
hangs in read() and friends. Setting those fd's as O_NONBLOCK makes
DEBUG_SHIRQ-enabled UML kernels boot and run correctly.

The patch was written and tested on Linux 2.6.22-rc2-mm1.

I have also included a couple of short/minor fixes for potential
problems.

Signed-off-by: Eduard-Gabriel Munteanu <[email protected]>

---
arch/um/drivers/chan_user.c | 6 ++++++
arch/um/drivers/mconsole_kern.c | 11 +++++------
arch/um/drivers/mconsole_user.c | 5 +++--
arch/um/drivers/ubd_user.c | 6 ++++++
arch/um/drivers/xterm.c | 7 +++++++
5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/um/drivers/chan_user.c b/arch/um/drivers/chan_user.c
index 13f0bf8..0a8b785 100644
--- a/arch/um/drivers/chan_user.c
+++ b/arch/um/drivers/chan_user.c
@@ -170,6 +170,12 @@ static int winch_tramp(int fd, struct tty_struct *tty, int *fd_out)
err = -EINVAL;
goto out_close;
}
+
+ if (os_set_fd_block(*fd_out, 0)) {
+ printk("winch_tramp: failed to set thread_fd non-blocking.\n");
+ goto out_close;
+ }
+
return err ;

out_close:
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 542c9ef..02d132e 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -74,12 +74,11 @@ static DECLARE_WORK(mconsole_work, mc_work_proc);

static irqreturn_t mconsole_interrupt(int irq, void *dev_id)
{
- /* long to avoid size mismatch warnings from gcc */
- long fd;
+ int fd;
struct mconsole_entry *new;
static struct mc_request req; /* that's OK */

- fd = (long) dev_id;
+ fd = *((int *) dev_id);
while (mconsole_get_request(fd, &req)){
if(req.cmd->context == MCONSOLE_INTR)
(*req.cmd->handler)(&req);
@@ -798,10 +797,10 @@ void mconsole_stack(struct mc_request *req)
*/
static char *notify_socket = NULL;

+static int sock;
+
static int mconsole_init(void)
{
- /* long to avoid size mismatch warnings from gcc */
- long sock;
int err;
char file[256];

@@ -818,7 +817,7 @@ static int mconsole_init(void)

err = um_request_irq(MCONSOLE_IRQ, sock, IRQ_READ, mconsole_interrupt,
IRQF_DISABLED | IRQF_SHARED | IRQF_SAMPLE_RANDOM,
- "mconsole", (void *)sock);
+ "mconsole", &sock);
if (err){
printk("Failed to get IRQ for management console\n");
return(1);
diff --git a/arch/um/drivers/mconsole_user.c b/arch/um/drivers/mconsole_user.c
index 62e5ad6..f31e715 100644
--- a/arch/um/drivers/mconsole_user.c
+++ b/arch/um/drivers/mconsole_user.c
@@ -86,8 +86,9 @@ int mconsole_get_request(int fd, struct mc_request *req)
int len;

req->originlen = sizeof(req->origin);
- req->len = recvfrom(fd, &req->request, sizeof(req->request), 0,
- (struct sockaddr *) req->origin, &req->originlen);
+ req->len = recvfrom(fd, &req->request, sizeof(req->request),
+ MSG_DONTWAIT, (struct sockaddr *) req->origin,
+ &req->originlen);
if (req->len < 0)
return 0;

diff --git a/arch/um/drivers/ubd_user.c b/arch/um/drivers/ubd_user.c
index 4707b3f..56f2c23 100644
--- a/arch/um/drivers/ubd_user.c
+++ b/arch/um/drivers/ubd_user.c
@@ -43,6 +43,12 @@ int start_io_thread(unsigned long sp, int *fd_out)
kernel_fd = fds[0];
*fd_out = fds[1];

+ err = os_set_fd_block(*fd_out, 0);
+ if (err) {
+ printk("start_io_thread - failed to set nonblocking I/O.\n");
+ goto out_close;
+ }
+
pid = clone(io_thread, (void *) sp, CLONE_FILES | CLONE_VM | SIGCHLD,
NULL);
if(pid < 0){
diff --git a/arch/um/drivers/xterm.c b/arch/um/drivers/xterm.c
index 571c2b3..2b65b57 100644
--- a/arch/um/drivers/xterm.c
+++ b/arch/um/drivers/xterm.c
@@ -151,6 +151,13 @@ int xterm_open(int input, int output, int primary, void *d,
goto out;
}

+ err = os_set_fd_block(new, 0);
+ if (err) {
+ printk("xterm_open : failed to set xterm descriptor "
+ "non-blocking, err = %d\n", -err);
+ return(err);
+ }
+
CATCH_EINTR(err = tcgetattr(new, &data->tt));
if(err){
new = err;



2007-06-11 17:32:18

by Jeff Dike

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

On Mon, Jun 11, 2007 at 01:01:57AM +0300, Eduard-Gabriel Munteanu wrote:
> DEBUG_SHIRQ generates spurious interrupts, triggering handlers such as
> mconsole_interrupt() or line_interrupt(). They expect data to be
> available to be read from their sockets/pipes, but in the case of
> spurious interrupts, the host didn't actually send anything, so UML
> hangs in read() and friends. Setting those fd's as O_NONBLOCK makes
> DEBUG_SHIRQ-enabled UML kernels boot and run correctly.

Nice.

I don't really like this section though. The casting I have now isn't
pleasant, but I don't like adding a new global to get rid of it.

> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
> index 542c9ef..02d132e 100644
> --- a/arch/um/drivers/mconsole_kern.c
> +++ b/arch/um/drivers/mconsole_kern.c
> @@ -74,12 +74,11 @@ static DECLARE_WORK(mconsole_work, mc_work_proc);
>
> static irqreturn_t mconsole_interrupt(int irq, void *dev_id)
> {
> - /* long to avoid size mismatch warnings from gcc */
> - long fd;
> + int fd;
> struct mconsole_entry *new;
> static struct mc_request req; /* that's OK */
>
> - fd = (long) dev_id;
> + fd = *((int *) dev_id);
> while (mconsole_get_request(fd, &req)){
> if(req.cmd->context == MCONSOLE_INTR)
> (*req.cmd->handler)(&req);
> @@ -798,10 +797,10 @@ void mconsole_stack(struct mc_request *req)
> */
> static char *notify_socket = NULL;
>
> +static int sock;
> +
> static int mconsole_init(void)
> {
> - /* long to avoid size mismatch warnings from gcc */
> - long sock;
> int err;
> char file[256];
>
> @@ -818,7 +817,7 @@ static int mconsole_init(void)
>
> err = um_request_irq(MCONSOLE_IRQ, sock, IRQ_READ, mconsole_interrupt,
> IRQF_DISABLED | IRQF_SHARED | IRQF_SAMPLE_RANDOM,
> - "mconsole", (void *)sock);
> + "mconsole", &sock);
> if (err){
> printk("Failed to get IRQ for management console\n");
> return(1);

Jeff

--
Work email - jdike at linux dot intel dot com

2007-06-11 19:40:11

by Eduard-Gabriel Munteanu

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

*This message was transferred with a trial version of CommuniGate(r) Pro*
Jeff Dike wrote:
> I don't really like this section though. The casting I have now isn't
> pleasant, but I don't like adding a new global to get rid of it.
>
>> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
>> index 542c9ef..02d132e 100644
>> --- a/arch/um/drivers/mconsole_kern.c
>> +++ b/arch/um/drivers/mconsole_kern.c
>> @@ -74,12 +74,11 @@ static DECLARE_WORK(mconsole_work, mc_work_proc);
>>
>> static irqreturn_t mconsole_interrupt(int irq, void *dev_id)
>> {
>> - /* long to avoid size mismatch warnings from gcc */
>> - long fd;
>> + int fd;
>> struct mconsole_entry *new;
>> static struct mc_request req; /* that's OK */
>>
>> - fd = (long) dev_id;
>> + fd = *((int *) dev_id);
>> while (mconsole_get_request(fd, &req)){
>> if(req.cmd->context == MCONSOLE_INTR)
>> (*req.cmd->handler)(&req);
>> @@ -798,10 +797,10 @@ void mconsole_stack(struct mc_request *req)
>> */
>> static char *notify_socket = NULL;
>>
>> +static int sock;
>> +
>> static int mconsole_init(void)
>> {
>> - /* long to avoid size mismatch warnings from gcc */
>> - long sock;
>> int err;
>> char file[256];
>>
>> @@ -818,7 +817,7 @@ static int mconsole_init(void)
>>
>> err = um_request_irq(MCONSOLE_IRQ, sock, IRQ_READ, mconsole_interrupt,
>> IRQF_DISABLED | IRQF_SHARED | IRQF_SAMPLE_RANDOM,
>> - "mconsole", (void *)sock);
>> + "mconsole", &sock);
>> if (err){
>> printk("Failed to get IRQ for management console\n");
>> return(1);
>
> Jeff
>

No offense, but this is an ugly hack. What if sizeof(int) !=
sizeof(long)? You're calling glibc functions with that fd as a
parameter. On some arches, compiling will issue warnings or simply fail.

An alternative would be to use kmalloc instead of a global static
variable. Do you like this one more?

2007-06-11 21:16:22

by Jeff Dike

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

On Mon, Jun 11, 2007 at 10:39:56PM +0300, Eduard-Gabriel Munteanu wrote:
> No offense, but this is an ugly hack.

I'm not going to defend it too much, but the alternatives don't seem any
better to me.

> What if sizeof(int) != sizeof(long)?

Doesn't matter - the casting will preserve the value. Of greater
concern is the relationship between sizeof(void *) and sizeof(long).
That's not guaranteed by the standard, but is by gcc.

> You're calling glibc functions
> with that fd as a parameter. On some arches, compiling will issue
> warnings or simply fail.

Which ones?

> An alternative would be to use kmalloc instead of a global static
> variable. Do you like this one more?

No, that trades the global variable for a new point of failure.

The main reason I like the current casting better than your global
(not by a lot) is that globals have to be audited for SMP safety. So,
I don't want any globals which don't need to be. This implies a
local, and to minimize the machinery associated with that (kmalloc, or
passing a pointer and synchronizing to avoid returning too soon), just
passing the descriptor in the pointer and accepting the casting needed
to get it through the compiler.

Jeff

--
Work email - jdike at linux dot intel dot com

2007-06-11 22:39:37

by Eduard-Gabriel Munteanu

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

*This message was transferred with a trial version of CommuniGate(r) Pro*
Jeff Dike wrote:
> *This message was transferred with a trial version of CommuniGate(r) Pro*
> *This message was transferred with a trial version of CommuniGate(r) Pro*
> On Mon, Jun 11, 2007 at 10:39:56PM +0300, Eduard-Gabriel Munteanu wrote:
>> No offense, but this is an ugly hack.
>
> I'm not going to defend it too much, but the alternatives don't seem any
> better to me.
>
>> What if sizeof(int) != sizeof(long)?
>
> Doesn't matter - the casting will preserve the value. Of greater
> concern is the relationship between sizeof(void *) and sizeof(long).
> That's not guaranteed by the standard, but is by gcc.
>

The cast isn't done right. Doing "fd = (long) dev_id;" doesn't help,
since you pass fd to mconsole_get_request() as is. And
mconsole_get_request() expects an integer:
int mconsole_get_request(int fd, struct mc_request *req)

This will generate at least a warning on arches where sizeof(int) !=
sizeof(long). And as you say, then there is the problem with void
pointers and longs.

And by the way, AFAIK, GCC has the habit of breaking compatibility with
some userspace apps when a new GCC major version is released. And,
AFAIK, they're moving towards standards, so relying on GCC's
"guarantees" may backfire.

>> You're calling glibc functions
>> with that fd as a parameter. On some arches, compiling will issue
>> warnings or simply fail.
>
> Which ones?
>

An example is sparc64:
quote from http://lxr.linux.no/source/include/asm-sparc64/types.h#L49

38 #define BITS_PER_LONG 64
39
40 #ifndef __ASSEMBLY__
41
42 typedef __signed__ char s8;
43 typedef unsigned char u8;
44
45 typedef __signed__ short s16;
46 typedef unsigned short u16;
47
48 typedef __signed__ int s32;
49 typedef unsigned int u32;
50
51 typedef __signed__ long s64;
52 typedef unsigned long u64

>> An alternative would be to use kmalloc instead of a global static
>> variable. Do you like this one more?
>
> No, that trades the global variable for a new point of failure.
>
> The main reason I like the current casting better than your global
> (not by a lot) is that globals have to be audited for SMP safety. So,
> I don't want any globals which don't need to be. This implies a
> local, and to minimize the machinery associated with that (kmalloc, or
> passing a pointer and synchronizing to avoid returning too soon), just
> passing the descriptor in the pointer and accepting the casting needed
> to get it through the compiler.
>
> Jeff
>

Really, a kmalloc() isn't such a big deal, it only happens once and
we're not in interrupt context. One the other hand, ensuring safety and
portability on other arches is something that needs to be taken care of.

Of course, you could also cast to int when calling
mconsole_get_request(), but IMO that doesn't look nicer.

2007-06-12 00:48:20

by Jeff Dike

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

On Tue, Jun 12, 2007 at 01:39:25AM +0300, Eduard-Gabriel Munteanu wrote:
> The cast isn't done right. Doing "fd = (long) dev_id;" doesn't help,
> since you pass fd to mconsole_get_request() as is. And
> mconsole_get_request() expects an integer:
> int mconsole_get_request(int fd, struct mc_request *req)

gcc will trim a long to an integer correctly. You can pass a long to
an integer without casting.

> This will generate at least a warning on arches where sizeof(int) !=
> sizeof(long).

No it won't. UML builds without warnings here on x86_64.

> And by the way, AFAIK, GCC has the habit of breaking compatibility with
> some userspace apps when a new GCC major version is released. And,
> AFAIK, they're moving towards standards, so relying on GCC's
> "guarantees" may backfire.

The GCC guarantee I'm talking about it LP64 - I'm highly confident
that's not changing any time soon.

> >>You're calling glibc functions
> >>with that fd as a parameter. On some arches, compiling will issue
> >>warnings or simply fail.
> >
> >Which ones?
> >
>
> An example is sparc64:
> quote from
> >>http://lxr.linux.no/source/include/asm-sparc64/types.h#L49

What warnings does this produce? These are the same sizes as x86_64
(and every other 64-bit arch that Linux runs on, I bet), where this
code compiles without warning.

> Really, a kmalloc() isn't such a big deal, it only happens once and
> we're not in interrupt context.

It's not the runtime cost - it's the extra code.

> One the other hand, ensuring safety and
> portability on other arches is something that needs to be taken care
> of.

You haven't demonstrated any safety or portability problems yet.

Jeff

--
Work email - jdike at linux dot intel dot com

2007-06-12 05:44:47

by Eduard-Gabriel Munteanu

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

*This message was transferred with a trial version of CommuniGate(r) Pro*
Jeff Dike wrote:
> No it won't. UML builds without warnings here on x86_64.

Okay, I don't have an x86_64, sparc64 or something similar, as my
computer is an x86, so I can't contradict this. If everything is fine on
such arches, no fix is needed when nothing's broken... though I still
think it's kind of ugly (though it's somehow ingenious as a hack) :)

Do you want me to resubmit the patch without these changes? I'll be back
in a few hours (got to go now) and trim this out from the patch if
that's necessary.

2007-06-12 13:27:36

by Jeff Dike

[permalink] [raw]
Subject: Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

On Tue, Jun 12, 2007 at 08:44:37AM +0300, Eduard-Gabriel Munteanu wrote:
> Okay, I don't have an x86_64, sparc64 or something similar, as my
> computer is an x86, so I can't contradict this. If everything is fine on
> such arches, no fix is needed when nothing's broken... though I still
> think it's kind of ugly (though it's somehow ingenious as a hack) :)

Yeah, I would agree with that.

> Do you want me to resubmit the patch without these changes? I'll be back
> in a few hours (got to go now) and trim this out from the patch if
> that's necessary.

I have it in my tree now, without that set of changes. If you want to
make the change yourself, send me a new patch, and I'll use it.
Otherwise, I'll stick with what I have now.

Jeff

--
Work email - jdike at linux dot intel dot com