LinuxLists.cc - [PATCH 0/10] "Bigger" UML fixes for 2.6.14

[permalink] [raw]

Subject: [PATCH 05/10] uml: fix condition in tlb flush

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Avoid setting w = 0 twice. Spotted this (trivial) thing which is needed for
another patch.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/kernel/tlb.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
@@ -193,12 +193,12 @@ void fix_range_common(struct mm_struct *
r = pte_read(*npte);
w = pte_write(*npte);
x = pte_exec(*npte);
- if(!pte_dirty(*npte))
- w = 0;
- if(!pte_young(*npte)){
- r = 0;
- w = 0;
- }
+ if (!pte_young(*npte)) {
+ r = 0;
+ w = 0;
+ } else if (!pte_dirty(*npte)) {
+ w = 0;
+ }
if(force || pte_newpage(*npte)){
if(pte_present(*npte))
ret = add_mmap(addr,

2005-09-21 17:49:25

[permalink] [raw]

Subject: [PATCH 10/10] uml: replace printk with "stack-friendly" printf - to report console failure

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

User get *a lot* confused when consoles don't work but we don't report anything.
And, as reported in the comment, using printk to report "your console doesn't
work" isn't likely to go that far.

Fix the problem on the base of this: stack consumption by host printf(). Use
kernel sprintf() and os_write_file, using a wild guess that one page will be
enough for the message, to preallocate the buffer with kmalloc().

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/drivers/chan_kern.c | 58 +++++++++++++++++++++++++++++++------------
1 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c
--- a/arch/um/drivers/chan_kern.c
+++ b/arch/um/drivers/chan_kern.c
@@ -19,18 +19,44 @@
#include "line.h"
#include "os.h"

-#ifdef CONFIG_NOCONFIG_CHAN
+/* XXX: could well be moved to somewhere else, if needed. */
+static int my_printf(const char * fmt, ...)
+ __attribute__ ((format (printf, 1, 2)));
+
+static int my_printf(const char * fmt, ...)
+{
+ /* Yes, can be called on atomic context.*/
+ char *buf = kmalloc(4096, GFP_ATOMIC);
+ va_list args;
+ int r;
+
+ if (!buf) {
+ /* We print directly fmt.
+ * Yes, yes, yes, feel free to complain. */
+ r = strlen(fmt);
+ } else {
+ va_start(args, fmt);
+ r = vsprintf(buf, fmt, args);
+ va_end(args);
+ fmt = buf;
+ }
+
+ if (r)
+ r = os_write_file(1, fmt, r);
+ return r;

-/* The printk's here are wrong because we are complaining that there is no
- * output device, but printk is printing to that output device. The user will
- * never see the error. printf would be better, except it can't run on a
- * kernel stack because it will overflow it.
- * Use printk for now since that will avoid crashing.
- */
+}
+
+#ifdef CONFIG_NOCONFIG_CHAN
+/* Despite its name, there's no added trailing newline. */
+static int my_puts(const char * buf)
+{
+ return os_write_file(1, buf, strlen(buf));
+}

static void *not_configged_init(char *str, int device, struct chan_opts *opts)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(NULL);
}
@@ -38,27 +64,27 @@ static void *not_configged_init(char *st
static int not_configged_open(int input, int output, int primary, void *data,
char **dev_out)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(-ENODEV);
}

static void not_configged_close(int fd, void *data)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
}

static int not_configged_read(int fd, char *c_out, void *data)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(-EIO);
}

static int not_configged_write(int fd, const char *buf, int len, void *data)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(-EIO);
}
@@ -66,7 +92,7 @@ static int not_configged_write(int fd, c
static int not_configged_console_write(int fd, const char *buf, int len,
void *data)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(-EIO);
}
@@ -74,14 +100,14 @@ static int not_configged_console_write(i
static int not_configged_window_size(int fd, void *data, unsigned short *rows,
unsigned short *cols)
{
- printk(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
return(-ENODEV);
}

static void not_configged_free(void *data)
{
- printf(KERN_ERR "Using a channel type which is configured out of "
+ my_puts("Using a channel type which is configured out of "
"UML\n");
}

@@ -457,7 +483,7 @@ static struct chan *parse_chan(char *str
}
}
if(ops == NULL){
- printk(KERN_ERR "parse_chan couldn't parse \"%s\"\n",
+ my_printf("parse_chan couldn't parse \"%s\"\n",
str);
return(NULL);
}

2005-09-21 17:49:25

[permalink] [raw]

Subject: [PATCH 01/10] uml: don't remove umid files in conflict case

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Only remove the UML pidfile and management socket if we created them.
Currently in case two UMLs are started with the same umid, the second will
remove the first's ones.

Probably we should also panic() at that point, not sure however.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/kernel/umid.c | 30 +++++++++++++++++++-----------
1 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/um/kernel/umid.c b/arch/um/kernel/umid.c
--- a/arch/um/kernel/umid.c
+++ b/arch/um/kernel/umid.c
@@ -31,6 +31,8 @@ static char *uml_dir = UML_DIR;
/* Changed by set_umid */
static int umid_is_random = 1;
static int umid_inited = 0;
+/* Have we created the files? Should we remove them? */
+static int umid_owned = 0;

static int make_umid(int (*printer)(const char *fmt, ...));

@@ -82,20 +84,21 @@ int __init umid_file_name(char *name, ch

extern int tracing_pid;

-static int __init create_pid_file(void)
+static void __init create_pid_file(void)
{
char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")];
char pid[sizeof("nnnnn\0")];
int fd, n;

- if(umid_file_name("pid", file, sizeof(file))) return 0;
+ if(umid_file_name("pid", file, sizeof(file)))
+ return;

fd = os_open_file(file, of_create(of_excl(of_rdwr(OPENFLAGS()))),
0644);
if(fd < 0){
printf("Open of machine pid file \"%s\" failed: %s\n",
file, strerror(-fd));
- return 0;
+ return;
}

sprintf(pid, "%d\n", os_getpid());
@@ -103,7 +106,6 @@ static int __init create_pid_file(void)
if(n != strlen(pid))
printf("Write of pid file failed - err = %d\n", -n);
os_close_file(fd);
- return 0;
}

static int actually_do_remove(char *dir)
@@ -147,7 +149,8 @@ static int actually_do_remove(char *dir)
void remove_umid_dir(void)
{
char dir[strlen(uml_dir) + UMID_LEN + 1];
- if(!umid_inited) return;
+ if (!umid_owned)
+ return;

sprintf(dir, "%s%s", uml_dir, umid);
actually_do_remove(dir);
@@ -155,11 +158,12 @@ void remove_umid_dir(void)

char *get_umid(int only_if_set)
{
- if(only_if_set && umid_is_random) return(NULL);
- return(umid);
+ if(only_if_set && umid_is_random)
+ return NULL;
+ return umid;
}

-int not_dead_yet(char *dir)
+static int not_dead_yet(char *dir)
{
char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")];
char pid[sizeof("nnnnn\0")], *end;
@@ -193,7 +197,8 @@ int not_dead_yet(char *dir)
(p == CHOOSE_MODE(tracing_pid, os_getpid())))
dead = 1;
}
- if(!dead) return(1);
+ if(!dead)
+ return(1);
return(actually_do_remove(dir));
}

@@ -286,6 +291,7 @@ static int __init make_umid(int (*printe
if(errno == EEXIST){
if(not_dead_yet(tmp)){
(*printer)("umid '%s' is in use\n", umid);
+ umid_owned = 0;
return(-1);
}
err = mkdir(tmp, 0777);
@@ -296,7 +302,8 @@ static int __init make_umid(int (*printe
return(-1);
}

- return(0);
+ umid_owned = 1;
+ return 0;
}

__uml_setup("uml_dir=", set_uml_dir,
@@ -309,7 +316,8 @@ static int __init make_umid_setup(void)
/* one function with the ordering we need ... */
make_uml_dir();
make_umid(printf);
- return create_pid_file();
+ create_pid_file();
+ return 0;
}
__uml_postsetup(make_umid_setup);

2005-09-21 17:48:55

[permalink] [raw]

Subject: [PATCH 08/10] uml: Fix GFP_ flags usage

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

GFP_ATOMIC | GFP_KERNEL is meaningless and won't work. Actually it never worked,
even in 2.4.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/kernel/process_kern.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/um/kernel/process_kern.c b/arch/um/kernel/process_kern.c
--- a/arch/um/kernel/process_kern.c
+++ b/arch/um/kernel/process_kern.c
@@ -82,7 +82,8 @@ unsigned long alloc_stack(int order, int
unsigned long page;
int flags = GFP_KERNEL;

- if(atomic) flags |= GFP_ATOMIC;
+ if (atomic)
+ flags = GFP_ATOMIC;
page = __get_free_pages(flags, order);
if(page == 0)
return(0);

2005-09-21 17:50:53

[permalink] [raw]

Subject: [PATCH 06/10] uml: run mconsole "sysrq" in process context

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Things are breaking horribly with sysrq called in interrupt context. I want to
try to fix it, but probably this is simpler. To tell the truth, sysrq is
normally run in interrupt context, so there shouldn't be any problem.

There's also a warning from the fault handler because it's run in atomic
context (I have a patch for that, only I deferred it). This is why I'm doing
this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/drivers/mconsole_user.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/um/drivers/mconsole_user.c b/arch/um/drivers/mconsole_user.c
--- a/arch/um/drivers/mconsole_user.c
+++ b/arch/um/drivers/mconsole_user.c
@@ -23,7 +23,7 @@ static struct mconsole_command commands[
{ "reboot", mconsole_reboot, MCONSOLE_PROC },
{ "config", mconsole_config, MCONSOLE_PROC },
{ "remove", mconsole_remove, MCONSOLE_PROC },
- { "sysrq", mconsole_sysrq, MCONSOLE_INTR },
+ { "sysrq", mconsole_sysrq, MCONSOLE_PROC },
{ "help", mconsole_help, MCONSOLE_INTR },
{ "cad", mconsole_cad, MCONSOLE_INTR },
{ "stop", mconsole_stop, MCONSOLE_PROC },

2005-09-21 17:49:58

[permalink] [raw]

Subject: [PATCH 03/10] uml: don't redundantly mark pte as newpage in pte_modify

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

pte_modify marks a page as needing flush, which is redundant because the
resulting PTE is still set with set_pte, which already handles that.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

include/asm-um/pgtable.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/asm-um/pgtable.h b/include/asm-um/pgtable.h
--- a/include/asm-um/pgtable.h
+++ b/include/asm-um/pgtable.h
@@ -346,7 +346,6 @@ static inline void set_pte(pte_t *pteptr
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
pte_set_val(pte, (pte_val(pte) & _PAGE_CHG_MASK), newprot);
- if(pte_present(pte)) pte = pte_mknewpage(pte_mknewprot(pte));
return pte;
}

2005-09-21 17:49:59

[permalink] [raw]

Subject: [PATCH 02/10] strlcat: use for uml umid.c

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Simplify the code by using strlcat() instead of strncat() and manual
appending.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/include/user.h | 4 +++-
arch/um/kernel/umid.c | 11 ++++-------
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/um/include/user.h b/arch/um/include/user.h
--- a/arch/um/include/user.h
+++ b/arch/um/include/user.h
@@ -14,7 +14,9 @@ extern void *um_kmalloc_atomic(int size)
extern void kfree(void *ptr);
extern int in_aton(char *str);
extern int open_gdb_chan(void);
-extern int strlcpy(char *, const char *, int);
+/* These use size_t, however unsigned long is correct on both i386 and x86_64. */
+extern unsigned long strlcpy(char *, const char *, unsigned long);
+extern unsigned long strlcat(char *, const char *, unsigned long);
extern void *um_vmalloc(int size);
extern void vfree(void *ptr);

diff --git a/arch/um/kernel/umid.c b/arch/um/kernel/umid.c
--- a/arch/um/kernel/umid.c
+++ b/arch/um/kernel/umid.c
@@ -237,16 +237,13 @@ static int __init make_uml_dir(void)
strlcpy(dir, home, sizeof(dir));
uml_dir++;
}
+ strlcat(dir, uml_dir, sizeof(dir));
len = strlen(dir);
- strncat(dir, uml_dir, sizeof(dir) - len);
- len = strlen(dir);
- if((len > 0) && (len < sizeof(dir) - 1) && (dir[len - 1] != '/')){
- dir[len] = '/';
- dir[len + 1] = '\0';
- }
+ if (len > 0 && dir[len - 1] != '/')
+ strlcat(dir, "/", sizeof(dir));

uml_dir = malloc(strlen(dir) + 1);
- if(uml_dir == NULL){
+ if (uml_dir == NULL) {
printf("make_uml_dir : malloc failed, errno = %d\n", errno);
exit(1);
}

2005-09-21 17:50:53

[permalink] [raw]

Subject: [PATCH 09/10] Uml: use GFP_ATOMIC for allocations under spinlocks.

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

setup_initial_poll is only called with sigio_lock() held, so use appropriate
allocation.

Also, parse_chan() can also be called when holding a spinlock (see line_open()
-> parse_chan_pair()).

I have sporadical problems (spinlock taken twice, with spinlock debugging on
UP) which could be caused by a sequence like "take spinlock, alloc and go to
sleep, take again the spinlock in the other thread".

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/drivers/chan_kern.c | 2 +-
arch/um/kernel/sigio_user.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c
--- a/arch/um/drivers/chan_kern.c
+++ b/arch/um/drivers/chan_kern.c
@@ -465,7 +465,7 @@ static struct chan *parse_chan(char *str
data = (*ops->init)(str, device, opts);
if(data == NULL) return(NULL);

- chan = kmalloc(sizeof(*chan), GFP_KERNEL);
+ chan = kmalloc(sizeof(*chan), GFP_ATOMIC);
if(chan == NULL) return(NULL);
*chan = ((struct chan) { .list = LIST_HEAD_INIT(chan->list),
.primary = 1,
diff --git a/arch/um/kernel/sigio_user.c b/arch/um/kernel/sigio_user.c
--- a/arch/um/kernel/sigio_user.c
+++ b/arch/um/kernel/sigio_user.c
@@ -340,7 +340,7 @@ static int setup_initial_poll(int fd)
{
struct pollfd *p;

- p = um_kmalloc(sizeof(struct pollfd));
+ p = um_kmalloc_atomic(sizeof(struct pollfd));
if(p == NULL){
printk("setup_initial_poll : failed to allocate poll\n");
return(-1);

2005-09-21 17:50:53

[permalink] [raw]

Subject: [PATCH 04/10] uml: fix hang in TT mode on fault

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

The current code doesn't handle well general protection faults on the host - it
thinks that cr2 is always the address of a page fault. While actually, on
general protection faults, that address is not accessible, so we'd better assume
we couldn't satisfy the fault. Currently instead we think we've fixed it, so we
go back, retry the instruction and fault again endlessly.

This leads to the kernel hanging when doing copy_from_user(dest, -1, ...) in TT
mode, since reading *(-1) causes a GFP, and we don't support kernel preemption.

Thanks to Luo Xin for testing UML with LTP and reporting the failures he got.

Cc: Luo Xin <[email protected]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/kernel/trap_kern.c | 11 ++++++++++-
arch/um/kernel/tt/uaccess_user.c | 11 +++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c
--- a/arch/um/kernel/trap_kern.c
+++ b/arch/um/kernel/trap_kern.c
@@ -18,6 +18,7 @@
#include "asm/a.out.h"
#include "asm/current.h"
#include "asm/irq.h"
+#include "sysdep/sigcontext.h"
#include "user_util.h"
#include "kern_util.h"
#include "kern.h"
@@ -125,7 +126,15 @@ unsigned long segv(struct faultinfo fi,
}
else if(current->mm == NULL)
panic("Segfault with no mm");
- err = handle_page_fault(address, ip, is_write, is_user, &si.si_code);
+
+ if (SEGV_IS_FIXABLE(&fi))
+ err = handle_page_fault(address, ip, is_write, is_user, &si.si_code);
+ else {
+ err = -EFAULT;
+ /* A thread accessed NULL, we get a fault, but CR2 is invalid.
+ * This code is used in __do_copy_from_user() of TT mode. */
+ address = 0;
+ }

catcher = current->thread.fault_catcher;
if(!err)
diff --git a/arch/um/kernel/tt/uaccess_user.c b/arch/um/kernel/tt/uaccess_user.c
--- a/arch/um/kernel/tt/uaccess_user.c
+++ b/arch/um/kernel/tt/uaccess_user.c
@@ -22,8 +22,15 @@ int __do_copy_from_user(void *to, const
__do_copy, &faulted);
TASK_REGS(get_current())->tt = save;

- if(!faulted) return(0);
- else return(n - (fault - (unsigned long) from));
+ if(!faulted)
+ return 0;
+ else if (fault)
+ return n - (fault - (unsigned long) from);
+ else
+ /* In case of a general protection fault, we don't have the
+ * fault address, so NULL is used instead. Pretend we didn't
+ * copy anything. */
+ return n;
}

static void __do_strncpy(void *dst, const void *src, int count)

2005-09-21 17:50:47

[permalink] [raw]

Subject: [PATCH 07/10] uml: avoid fixing faults while atomic

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Following i386, we should maybe refuse trying to fault in pages when we're
doing atomic operations, because to handle the fault we could need to take
already taken spinlocks.

Also, if we're doing an atomic operation (in the sense of in_atomic()) we're
surely in kernel mode and we're surely going to handle adequately the failed
fault, so it's safe to behave this way.

Currently, on UML SMP is rarely used, and we don't support PREEMPT, so this is
unlikely to create problems right now, but it might in the future.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

arch/um/kernel/trap_kern.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c
--- a/arch/um/kernel/trap_kern.c
+++ b/arch/um/kernel/trap_kern.c
@@ -40,6 +40,12 @@ int handle_page_fault(unsigned long addr
int err = -EFAULT;

*code_out = SEGV_MAPERR;
+
+ /* If the fault was during atomic operation, don't take the fault, just
+ * fail. */
+ if (in_atomic())
+ goto out_nosemaphore;
+
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
if(!vma)
@@ -90,6 +96,7 @@ survive:
flush_tlb_page(vma, address);
out:
up_read(&mm->mmap_sem);
+out_nosemaphore:
return(err);

/*

2005-09-21 19:20:58

by Bill Davidsen

[permalink] [raw]

Subject: Re: [PATCH 08/10] uml: Fix GFP_ flags usage

Paolo 'Blaisorblade' Giarrusso wrote:
> From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> GFP_ATOMIC | GFP_KERNEL is meaningless and won't work. Actually it never worked,
> even in 2.4.
>
> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
> ---
>
> arch/um/kernel/process_kern.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/um/kernel/process_kern.c b/arch/um/kernel/process_kern.c
> --- a/arch/um/kernel/process_kern.c
> +++ b/arch/um/kernel/process_kern.c
> @@ -82,7 +82,8 @@ unsigned long alloc_stack(int order, int
> unsigned long page;
> int flags = GFP_KERNEL;
>
> - if(atomic) flags |= GFP_ATOMIC;
> + if (atomic)
> + flags = GFP_ATOMIC;
> page = __get_free_pages(flags, order);
> if(page == 0)
> return(0);
>

int flags = (atomic ? GFP_ATOMIC : GFP_KERNEL);

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-09-21 19:50:47

[permalink] [raw]

Subject: Re: [PATCH 07/10] uml: avoid fixing faults while atomic

"Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
>
> From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> Following i386, we should maybe refuse trying to fault in pages when we're
> doing atomic operations, because to handle the fault we could need to take
> already taken spinlocks.
>
> Also, if we're doing an atomic operation (in the sense of in_atomic()) we're
> surely in kernel mode and we're surely going to handle adequately the failed
> fault, so it's safe to behave this way.
>
> Currently, on UML SMP is rarely used, and we don't support PREEMPT, so this is
> unlikely to create problems right now, but it might in the future.
>

That's not really an accurate explanation/understanding of what's going on
in there.

There's an extremely special-case in the pagefault handlers where we fail
the fault if in_atomic(). It's unrelated to spinlocks (spinlocks don't
even cause in_atomic() to become true if !CONFIG_PREEMPT).

It has to do with kmap_atomic(). There's tricksy code in mm/filemap.c
which will fault the target page in by hand and will then take an atomic
kmap and will then raise current->preempt_count by hand, so in_atomic()
becomes true even if !COFNIG_PREEMPT.

So at this stage we expect to be able to do a copy_to/from_user to/from
pagecache without taking a fault, because we just faulted the page in by
hand. And we're not allowed to take a fault, because we're holding an
atomic kmap. But if we _do_ take a fault (extreme memory pressure, racing
munmap, etc) then we want to fail the pagefault immediately.

The in_atomic() test in x86's do_page_fault() is in fact a message passed
into it from filemap.c's kmap_atomic(). It has accidental side-effects,
such as making copy_to_user() fail if inside spinlocks when
CONFIG_PREEMPT=y.

So I think this change is only needed if UML implements kmap_atomic, as in
arch/i386/mm/highmem.c, which it surely does not do?

>
> diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c
> --- a/arch/um/kernel/trap_kern.c
> +++ b/arch/um/kernel/trap_kern.c
> @@ -40,6 +40,12 @@ int handle_page_fault(unsigned long addr
> int err = -EFAULT;
>
> *code_out = SEGV_MAPERR;
> +
> + /* If the fault was during atomic operation, don't take the fault, just
> + * fail. */
> + if (in_atomic())
> + goto out_nosemaphore;
> +
> down_read(&mm->mmap_sem);
> vma = find_vma(mm, address);
> if(!vma)
> @@ -90,6 +96,7 @@ survive:
> flush_tlb_page(vma, address);
> out:
> up_read(&mm->mmap_sem);
> +out_nosemaphore:
> return(err);
>
> /*

2005-09-21 20:24:07

[permalink] [raw]

Subject: Re: [uml-devel] Re: [PATCH 07/10] uml: avoid fixing faults while atomic

On Wednesday 21 September 2005 21:49, Andrew Morton wrote:
> "Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

> The in_atomic() test in x86's do_page_fault() is in fact a message passed
> into it from filemap.c's kmap_atomic().
Ok, this can be ok, but:
> It has accidental side-effects,
> such as making copy_to_user() fail if inside spinlocks when
> CONFIG_PREEMPT=y.
Sorry, but should it ever succeed inside spinlocks? I mean, should it ever
call down() inside spinlocks? (We never do down_trylock, and ever if we did
the x86 trick, that wouldn't make the whole thing safe at all - they still
take the spinlock and potentially sleep. And it's legal only if no spinlock
is held).

Even if spinlocks don't always trigger in_atomic() - which means that we'd
need to have a better fix for this.

(Btw, I took the above reasoning from something said, as an aside, on LWN.net
kernel page, about the FUTEX deadlock on mm->mmap_sem of ~ 2.6.8 - yes, it
wasn't the full truth, but not totally dumb).

> So I think this change is only needed if UML implements kmap_atomic, as in
> arch/i386/mm/highmem.c, which it surely does not do?
NACK, see above.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

___________________________________
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it

2005-09-21 20:30:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 07/10] uml: avoid fixing faults while atomic

On Wed, 21 Sep 2005, Andrew Morton wrote:
>
> There's an extremely special-case in the pagefault handlers where we fail
> the fault if in_atomic(). It's unrelated to spinlocks (spinlocks don't
> even cause in_atomic() to become true if !CONFIG_PREEMPT).

There's a few other places where we use those semantics, though.

Like "get_futex_value_locked()".

> So I think this change is only needed if UML implements kmap_atomic, as in
> arch/i386/mm/highmem.c, which it surely does not do?

No. Every architecture needs to honor it, or they are screwed.

Linus

2005-09-21 20:48:13

[permalink] [raw]

Subject: Re: [uml-devel] Re: [PATCH 07/10] uml: avoid fixing faults while atomic

Blaisorblade <[email protected]> wrote:
>
> On Wednesday 21 September 2005 21:49, Andrew Morton wrote:
> > "Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
> > > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> > The in_atomic() test in x86's do_page_fault() is in fact a message passed
> > into it from filemap.c's kmap_atomic().
> Ok, this can be ok, but:
> > It has accidental side-effects,
> > such as making copy_to_user() fail if inside spinlocks when
> > CONFIG_PREEMPT=y.
> Sorry, but should it ever succeed inside spinlocks? I mean, should it ever
> call down() inside spinlocks? (We never do down_trylock, and ever if we did
> the x86 trick, that wouldn't make the whole thing safe at all - they still
> take the spinlock and potentially sleep. And it's legal only if no spinlock
> is held).

Not sure what you're asking here.

copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=y and if the
copy happens to cause a fault. Otherwise it will succeed inside spinlock,
and it won't spew a sleeping-while-atomic warning, because that uses
in_atomic() too. It might deadlock if we schedule away and try to retake
the same lock.

> Even if spinlocks don't always trigger in_atomic() - which means that we'd
> need to have a better fix for this.

The patch you have will correctly cause copy_*_user()->pagefault to fail
the copy if the caller has run inc_preempt_count(). It will not cause
copy_*_user()->pagefault to fail inside spinlocks unless UML does
inc_preempt_count() in its spinlock implementation.

> (Btw, I took the above reasoning from something said, as an aside, on LWN.net
> kernel page, about the FUTEX deadlock on mm->mmap_sem of ~ 2.6.8 - yes, it
> wasn't the full truth, but not totally dumb).
>
> > So I think this change is only needed if UML implements kmap_atomic, as in
> > arch/i386/mm/highmem.c, which it surely does not do?
> NACK, see above.

Yup, the patch is needed for the futex code, and for general correct
implementation of inc_preempt_count()'s intended effect.

2005-09-21 20:57:38

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 06/10] uml: run mconsole "sysrq" in process context

On Wed, Sep 21, 2005 at 07:28:57PM +0200, Paolo 'Blaisorblade' Giarrusso wrote:
> From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> Things are breaking horribly with sysrq called in interrupt context. I want to
> try to fix it, but probably this is simpler. To tell the truth, sysrq is
> normally run in interrupt context, so there shouldn't be any problem.

How are they breaking?

Jeff

2005-09-21 21:28:48

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 08/10] uml: Fix GFP_ flags usage

On Wed, Sep 21, 2005 at 07:29:21PM +0200, Paolo 'Blaisorblade' Giarrusso wrote:
> - if(atomic) flags |= GFP_ATOMIC;
> + if (atomic)
> + flags = GFP_ATOMIC;

We should take a careful look at where this is called from, and see if we
can get rid of the atomic business altogether.

Jeff

2005-09-22 19:44:25

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 06/10] uml: run mconsole "sysrq" in process context

On Wednesday 21 September 2005 22:50, Jeff Dike wrote:
> On Wed, Sep 21, 2005 at 07:28:57PM +0200, Paolo 'Blaisorblade' Giarrusso
wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
> >
> > Things are breaking horribly with sysrq called in interrupt context. I
> > want to try to fix it, but probably this is simpler. To tell the truth,
> > sysrq is normally run in interrupt context, so there shouldn't be any
> > problem.
>
> How are they breaking?
sysrq t is broken (and stays), but additionally there are some warnings from
some commands (enable sleep inside spinlock checking and spinlock debugging),
which go to the down_read inside handle_page_fault IIRC. So try to run in
process context.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

___________________________________
Yahoo! Messenger: chiamate gratuite in tutto il mondo
http://it.messenger.yahoo.com

2005-09-22 19:46:51

[permalink] [raw]

Subject: Re: [uml-devel] Re: [PATCH 07/10] uml: avoid fixing faults while atomic

On Wednesday 21 September 2005 22:47, Andrew Morton wrote:
> Blaisorblade <[email protected]> wrote:
> > On Wednesday 21 September 2005 21:49, Andrew Morton wrote:
> > > "Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
> > > > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

> > > It has accidental side-effects,
> > > such as making copy_to_user() fail if inside spinlocks when
> > > CONFIG_PREEMPT=y.

> > Sorry, but should it ever succeed inside spinlocks? I mean, should it
> > ever call down() inside spinlocks? (We never do down_trylock, and ever if
> > we did the x86 trick, that wouldn't make the whole thing safe at all -
> > they still take the spinlock and potentially sleep. And it's legal only
> > if no spinlock is held).

> Not sure what you're asking here.

> copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=y and if the
> copy happens to cause a fault.

> Otherwise it will succeed inside spinlock,
> and it won't spew a sleeping-while-atomic warning, because that uses
> in_atomic() too.

> It might deadlock if we schedule away and try to retake
> the same lock.
Exactly - the point is: is it legal to call copy_from_user() while holding a
spinlock (which is my original question)? Or should copy_from_user try to
satisfy the fault, instead of seeing in_atomic() or something similar and
fail?

>From what you say, copy_*_user is called in such a way, but it's
deadlock-prone, when CONFIG_PREEMPT is disabled.
> > Even if spinlocks don't always trigger in_atomic() - which means that
> > we'd need to have a better fix for this.

> The patch you have will correctly cause copy_*_user()->pagefault to fail
> the copy if the caller has run inc_preempt_count(). It will not cause
> copy_*_user()->pagefault to fail inside spinlocks unless UML does
> inc_preempt_count() in its spinlock implementation.
No, it doesn't. But for that case, we're in the same situation as i386.

Consider even that while UML doesn't implement PREEMPT (but we'll fix that,
sooner or later) it does implement HIGHMEM, and answering to your original
question:

> So I think this change is only needed if UML implements kmap_atomic, as in
> arch/i386/mm/highmem.c, which it surely does not do?
Apart that anybody would make sure that atomic kmaps are indeed atomic, UML
doesn't use a "similar implementation" - it verbatim symlinks the i386 file
and builds it (see arch/um/sys-i386/Makefile and
arch/um/scripts/Makefile.rules if you want).

Hmm, that's something worth considering... at least when debugging is enabled,
for debugging purposes.
> > NACK, see above.

> Yup, the patch is needed for the futex code,
> and for general correct
> implementation of inc_preempt_count()'s intended effect.
Ok, that's enough I guess (and I just read what Linus said).
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

___________________________________
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it

2005-09-22 19:58:57

[permalink] [raw]

Subject: Re: [uml-devel] Re: [PATCH 07/10] uml: avoid fixing faults while atomic

Blaisorblade <[email protected]> wrote:
>
> On Wednesday 21 September 2005 22:47, Andrew Morton wrote:
> > Blaisorblade <[email protected]> wrote:
> > > On Wednesday 21 September 2005 21:49, Andrew Morton wrote:
> > > > "Paolo 'Blaisorblade' Giarrusso" <[email protected]> wrote:
> > > > > From: Paolo 'Blaisorblade' Giarrusso <[email protected]>
>
> > > > It has accidental side-effects,
> > > > such as making copy_to_user() fail if inside spinlocks when
> > > > CONFIG_PREEMPT=y.
>
> > > Sorry, but should it ever succeed inside spinlocks? I mean, should it
> > > ever call down() inside spinlocks? (We never do down_trylock, and ever if
> > > we did the x86 trick, that wouldn't make the whole thing safe at all -
> > > they still take the spinlock and potentially sleep. And it's legal only
> > > if no spinlock is held).
>
> > Not sure what you're asking here.
>
> > copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=y and if the
> > copy happens to cause a fault.
>
> > Otherwise it will succeed inside spinlock,
> > and it won't spew a sleeping-while-atomic warning, because that uses
> > in_atomic() too.
>
> > It might deadlock if we schedule away and try to retake
> > the same lock.
> Exactly - the point is: is it legal to call copy_from_user() while holding a
> spinlock (which is my original question)? Or should copy_from_user try to
> satisfy the fault, instead of seeing in_atomic() or something similar and
> fail?

No, it is not legal to call copy_*_user() while holding a spinlock.

If CONFIG_PREEMPT=n, do_page_fault() has no way of knowing that the caller
holds a spinlock.

2005-09-22 20:44:59

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 06/10] uml: run mconsole "sysrq" in process context

On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote:
> sysrq t is broken (and stays),

There's a fix on the way for that. This has nothing to do with interrupt
context anyway.

> but additionally there are some warnings from
> some commands (enable sleep inside spinlock checking and spinlock debugging),
> which go to the down_read inside handle_page_fault IIRC. So try to run in
> process context.

Which ones? They should be fixed.

It is fairly fundamental to sysrq that it work from interrupt context. You
may be diagnosing a system which can't context switch any more.

This patch should be dropped, and the real problems fixed.

Jeff

2005-09-22 20:54:53

by Linus Torvalds

[permalink] [raw]

Subject: Re: [uml-devel] Re: [PATCH 07/10] uml: avoid fixing faults while atomic

On Thu, 22 Sep 2005, Blaisorblade wrote:
>
> Exactly - the point is: is it legal to call copy_from_user() while holding a
> spinlock (which is my original question)? Or should copy_from_user try to
> satisfy the fault, instead of seeing in_atomic() or something similar and
> fail?

It is not legal to call copy_to/from_user() under a spinlock in general.

But what _is_ legal to do is something slightly more complex, ie

spin_lock(...)
inc_preempt_count();
ret = __copy_from_user_inatomic(..)
dec_preempt_count();
spin_unlock();

but you have to realize that the copy-from-user will fail if the target is
swapped out (so the code that does things like this has to look at the
return value, and if the copy didn't copy anything it needs to release the
locks and do it all over again without the atomic thing).

We don't do it very much, because it gets more complicated and it hasn't
historically been legal, but it could in theory be very useful if you hold
a lock and want to avoid releasing and re-taking it just to do a user
access.

Right now the only case where that happens is the futex code, and maybe
that's a very special case. But maybe it isn't.

Linus

2005-09-22 20:56:19

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 06/10] uml: run mconsole "sysrq" in process context

On Thursday 22 September 2005 22:37, Jeff Dike wrote:
> On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote:
> > sysrq t is broken (and stays),

> There's a fix on the way for that.
Already found? Nice.
> This has nothing to do with interrupt
> context anyway.

> > but additionally there are some warnings from
> > some commands (enable sleep inside spinlock checking and spinlock
> > debugging), which go to the down_read inside handle_page_fault IIRC. So
> > try to run in process context.

> Which ones? They should be fixed.
Ok, will drop this patch and retry again to look at the messages.
> It is fairly fundamental to sysrq that it work from interrupt context. You
> may be diagnosing a system which can't context switch any more.

Ok, I felt a bit uncertain about this, but didn't realize this very problem.

> This patch should be dropped, and the real problems fixed.
Ok, that's nice for me. I'll look into this. Possibly, the

if (in_atomic()) don't take semaphore in handle_page_fault was the right fix
for this. I'll have a look soon.
> Jeff

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

___________________________________
Yahoo! Messenger: chiamate gratuite in tutto il mondo
http://it.messenger.yahoo.com

2005-09-23 07:41:19

[permalink] [raw]

Subject: Re: [uml-devel] [PATCH 06/10] uml: run mconsole "sysrq" in process context

Jeff Dike <[email protected]> wrote:
>
> On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote:
> > sysrq t is broken (and stays),
>
> There's a fix on the way for that. This has nothing to do with interrupt
> context anyway.
>
> > but additionally there are some warnings from
> > some commands (enable sleep inside spinlock checking and spinlock debugging),
> > which go to the down_read inside handle_page_fault IIRC. So try to run in
> > process context.
>
> Which ones? They should be fixed.
>
> It is fairly fundamental to sysrq that it work from interrupt context. You
> may be diagnosing a system which can't context switch any more.
>
> This patch should be dropped, and the real problems fixed.
>

Well Linus has just merged this patch. If you hadn't removed me from the
cc yesterday this would not have happened.

2005-09-23 14:32:17