LinuxLists.cc - [00/17] 2.6.32.33-longterm review

2011-03-11 20:42:10

Subject: [00/17] 2.6.32.33-longterm review

This is the start of the longterm review cycle for the 2.6.32.33 release.
There are 17 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let us know. If anyone is a maintainer of the proper subsystem, and
wants to add a Signed-off-by: line to the patch, please respond with it.

Responses should be made by Sunday, March 13, 2011, 20:00:00 UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.33-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

Makefile | 2 +-
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/kernel/crash.c | 11 +++++-
arch/powerpc/kernel/machine_kexec_64.c | 25 +++++++++++++++
arch/powerpc/kernel/setup_64.c | 17 ++++++++--
arch/powerpc/platforms/pseries/hvCall.S | 38 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/lpar.c | 35 ++++++++++++--------
arch/powerpc/platforms/pseries/plpar_wrappers.h | 18 +++++++++++
drivers/net/ixgbe/ixgbe_main.c | 4 ++
drivers/net/r8169.c | 6 ++-
drivers/s390/char/keyboard.c | 3 +-
drivers/staging/comedi/drivers/jr3_pci.c | 7 +++-
fs/nfsd/nfs4xdr.c | 4 +-
include/keys/rxrpc-type.h | 1 -
include/linux/netdevice.h | 4 ++
kernel/cpuset.c | 7 +++-
mm/mremap.c | 4 +--
net/core/dev.c | 12 ++++++-
net/ipv4/ip_gre.c | 1 +
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
net/netfilter/nf_log.c | 4 ++
22 files changed, 170 insertions(+), 37 deletions(-)

2011-03-11 20:43:01

by Greg KH

[permalink] [raw]

Subject: [01/17] cpuset: add a missing unlock in cpuset_write_resmask()

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Li Zefan <[email protected]>

commit b75f38d659e6fc747eda64cb72f3920e29dd44a4 upstream.

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

[[email protected]: avoid multiple return points]
Signed-off-by: Li Zefan <[email protected]>
Cc: Paul Menage <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Miao Xie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/cpuset.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1514,8 +1514,10 @@ static int cpuset_write_resmask(struct c
return -ENODEV;

trialcs = alloc_trial_cpuset(cs);
- if (!trialcs)
- return -ENOMEM;
+ if (!trialcs) {
+ retval = -ENOMEM;
+ goto out;
+ }

switch (cft->private) {
case FILE_CPULIST:
@@ -1530,6 +1532,7 @@ static int cpuset_write_resmask(struct c
}

free_trial_cpuset(trialcs);
+out:
cgroup_unlock();
return retval;
}

2011-03-11 20:43:06

by Greg KH

[permalink] [raw]

Subject: [03/17] RxRPC: Fix v1 keys

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <[email protected]>

commit f009918a1c1bbf8607b8aab3959876913a30193a upstream.

commit 339412841d7 (RxRPC: Allow key payloads to be passed in XDR form)
broke klog for me. I notice the v1 key struct had a kif_version field
added:

-struct rxkad_key {
- u16 security_index; /* RxRPC header security index */
- u16 ticket_len; /* length of ticket[] */
- u32 expiry; /* time at which expires */
- u32 kvno; /* key version number */
- u8 session_key[8]; /* DES session key */
- u8 ticket[0]; /* the encrypted ticket */
-};

+struct rxrpc_key_data_v1 {
+ u32 kif_version; /* 1 */
+ u16 security_index;
+ u16 ticket_length;
+ u32 expiry; /* time_t */
+ u32 kvno;
+ u8 session_key[8];
+ u8 ticket[0];
+};

However the code in rxrpc_instantiate strips it away:

data += sizeof(kver);
datalen -= sizeof(kver);

Removing kif_version fixes my problem.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/keys/rxrpc-type.h | 1 -
1 file changed, 1 deletion(-)

--- a/include/keys/rxrpc-type.h
+++ b/include/keys/rxrpc-type.h
@@ -99,7 +99,6 @@ struct rxrpc_key_token {
* structure of raw payloads passed to add_key() or instantiate key
*/
struct rxrpc_key_data_v1 {
- u32 kif_version; /* 1 */
u16 security_index;
u16 ticket_length;
u32 expiry; /* time_t */

2011-03-11 20:43:12

by Greg KH

[permalink] [raw]

Subject: [08/17] powerpc: Use more accurate limit for first segment memory allocations

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <[email protected]>

commit 095c7965f4dc870ed2b65143b1e2610de653416c upstream.

Author: Milton Miller <[email protected]>

On large machines we are running out of room below 256MB. In some cases we
only need to ensure the allocation is in the first segment, which may be
256MB or 1TB.

Add slb0_limit and use it to specify the upper limit for the irqstack and
emergency stacks.

On a large ppc64 box, this fixes a panic at boot when the crashkernel=
option is specified (previously we would run out of memory below 256MB).

Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Cc: Kamalesh Babulal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/setup_64.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -432,9 +432,18 @@ void __init setup_system(void)
DBG(" <- setup_system()\n");
}

+static u64 slb0_limit(void)
+{
+ if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+ return 1UL << SID_SHIFT_1T;
+ }
+ return 1UL << SID_SHIFT;
+}
+
#ifdef CONFIG_IRQSTACKS
static void __init irqstack_early_init(void)
{
+ u64 limit = slb0_limit();
unsigned int i;

/*
@@ -444,10 +453,10 @@ static void __init irqstack_early_init(v
for_each_possible_cpu(i) {
softirq_ctx[i] = (struct thread_info *)
__va(lmb_alloc_base(THREAD_SIZE,
- THREAD_SIZE, 0x10000000));
+ THREAD_SIZE, limit));
hardirq_ctx[i] = (struct thread_info *)
__va(lmb_alloc_base(THREAD_SIZE,
- THREAD_SIZE, 0x10000000));
+ THREAD_SIZE, limit));
}
}
#else
@@ -478,7 +487,7 @@ static void __init exc_lvl_early_init(vo
*/
static void __init emergency_stack_init(void)
{
- unsigned long limit;
+ u64 limit;
unsigned int i;

/*
@@ -490,7 +499,7 @@ static void __init emergency_stack_init(
* bringup, we need to get at them in real mode. This means they
* must also be within the RMO region.
*/
- limit = min(0x10000000ULL, lmb.rmo_size);
+ limit = min(slb0_limit(), lmb.rmo_size);

for_each_possible_cpu(i) {
unsigned long sp;

2011-03-11 20:43:26

by Greg KH

[permalink] [raw]

Subject: [13/17] netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Jan Engelhardt <[email protected]>

commit 9ef0298a8e5730d9a46d640014c727f3b4152870 upstream.

Like many other places, we have to check that the array index is
within allowed limits, or otherwise, a kernel oops and other nastiness
can ensue when we access memory beyond the end of the array.

[ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000
[ 5954.120014] IP: __find_logger+0x6f/0xa0
[ 5954.123979] nf_log_bind_pf+0x2b/0x70
[ 5954.123979] nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log]
[ 5954.123979] nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink]
...

The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
was decoupled from nf_log_register.

Reported-by: Miguel Di Ciurcio Filho <[email protected]>,
via irc.freenode.net/#netfilter
Signed-off-by: Jan Engelhardt <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/netfilter/nf_log.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -83,6 +83,8 @@ EXPORT_SYMBOL(nf_log_unregister);

int nf_log_bind_pf(u_int8_t pf, const struct nf_logger *logger)
{
+ if (pf >= ARRAY_SIZE(nf_loggers))
+ return -EINVAL;
mutex_lock(&nf_log_mutex);
if (__find_logger(pf, logger->name) == NULL) {
mutex_unlock(&nf_log_mutex);
@@ -96,6 +98,8 @@ EXPORT_SYMBOL(nf_log_bind_pf);

void nf_log_unbind_pf(u_int8_t pf)
{
+ if (pf >= ARRAY_SIZE(nf_loggers))
+ return;
mutex_lock(&nf_log_mutex);
rcu_assign_pointer(nf_loggers[pf], NULL);
mutex_unlock(&nf_log_mutex);

2011-03-11 20:43:25

by Greg KH

[permalink] [raw]

Subject: [17/17] net: dont allow CAP_NET_ADMIN to load non-netdev kernel modules

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Vasiliy Kulikov <[email protected]>

commit 8909c9ad8ff03611c9c96c9a92656213e4bb495b upstream.

Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.

This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases. This fixes CVE-2011-1019.

Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: fffffff800001000
CapEff: fffffff800001000
CapBnd: fffffff800001000
root@albatros:~# modprobe xfs
FATAL: Error inserting xfs
(/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
root@albatros:~# lsmod | grep xfs
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit
sit: error fetching interface information: Device not found
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit0
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1

root@albatros:~# lsmod | grep sit
sit 10457 0
tunnel4 2957 1 sit

For CAP_SYS_MODULE module loading is still relaxed:

root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
xfs 745319 0

Reference: https://lkml.org/lkml/2011/2/24/203

Signed-off-by: Vasiliy Kulikov <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: James Morris <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/netdevice.h | 4 ++++
net/core/dev.c | 12 ++++++++++--
net/ipv4/ip_gre.c | 1 +
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
5 files changed, 17 insertions(+), 3 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2015,6 +2015,10 @@ static inline u32 dev_ethtool_get_flags(
return 0;
return dev->ethtool_ops->get_flags(dev);
}
+
+#define MODULE_ALIAS_NETDEV(device) \
+ MODULE_ALIAS("netdev-" device)
+
#endif /* __KERNEL__ */

#endif /* _LINUX_NETDEVICE_H */
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1037,13 +1037,21 @@ EXPORT_SYMBOL(netdev_bonding_change);
void dev_load(struct net *net, const char *name)
{
struct net_device *dev;
+ int no_module;

read_lock(&dev_base_lock);
dev = __dev_get_by_name(net, name);
read_unlock(&dev_base_lock);

- if (!dev && capable(CAP_NET_ADMIN))
- request_module("%s", name);
+ no_module = !dev;
+ if (no_module && capable(CAP_NET_ADMIN))
+ no_module = request_module("netdev-%s", name);
+ if (no_module && capable(CAP_SYS_MODULE)) {
+ if (!request_module("%s", name))
+ pr_err("Loading kernel module for a network device "
+"with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-%s "
+"instead\n", name);
+ }
}
EXPORT_SYMBOL(dev_load);

--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1708,3 +1708,4 @@ module_exit(ipgre_fini);
MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("gre");
MODULE_ALIAS_RTNL_LINK("gretap");
+MODULE_ALIAS_NETDEV("gre0");
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -853,3 +853,4 @@ static void __exit ipip_fini(void)
module_init(ipip_init);
module_exit(ipip_fini);
MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETDEV("tunl0");
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1101,4 +1101,4 @@ static int __init sit_init(void)
module_init(sit_init);
module_exit(sit_cleanup);
MODULE_LICENSE("GPL");
-MODULE_ALIAS("sit0");
+MODULE_ALIAS_NETDEV("sit0");

2011-03-11 20:43:23

by Greg KH

[permalink] [raw]

Subject: [14/17] nfsd: wrong index used in inner loop

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: roel <[email protected]>

commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream.

Index i was already used in the outer loop

Signed-off-by: Roel Kluin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfsd/nfs4xdr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1114,7 +1114,7 @@ nfsd4_decode_create_session(struct nfsd4

u32 dummy;
char *machine_name;
- int i;
+ int i, j;
int nr_secflavs;

READ_BUF(16);
@@ -1187,7 +1187,7 @@ nfsd4_decode_create_session(struct nfsd4
READ_BUF(4);
READ32(dummy);
READ_BUF(dummy * 4);
- for (i = 0; i < dummy; ++i)
+ for (j = 0; j < dummy; ++j)
READ32(dummy);
break;
case RPC_AUTH_GSS:

2011-03-11 20:43:20

by Greg KH

[permalink] [raw]

Subject: [15/17] r8169: use RxFIFO overflow workaround for 8168c chipset.

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Ivan Vecera <[email protected]>

commit b5ba6d12bdac21bc0620a5089e0f24e362645efd upstream.

I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.

Signed-off-by: Ivan Vecera <[email protected]>
Acked-by: Francois Romieu <[email protected]>
Cc: Hayes <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/r8169.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3741,7 +3741,8 @@ static void rtl_hw_start_8168(struct net
RTL_W16(IntrMitigate, 0x5151);

/* Work around for RxFIFO overflow. */
- if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+ if (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+ tp->mac_version == RTL_GIGA_MAC_VER_22) {
tp->intr_event |= RxFIFOOver | PCSTimeout;
tp->intr_event &= ~RxOverflow;
}
@@ -4633,7 +4634,8 @@ static irqreturn_t rtl8169_interrupt(int

/* Work around for rx fifo overflow */
if (unlikely(status & RxFIFOOver) &&
- (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+ (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+ tp->mac_version == RTL_GIGA_MAC_VER_22)) {
netif_stop_queue(dev);
rtl8169_tx_timeout(dev);
break;

2011-03-11 20:43:10

by Greg KH

[permalink] [raw]

Subject: [06/17] powerpc/kdump: CPUs assume the context of the oopsing CPU

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <[email protected]>

commit 0644079410065567e3bb31fcb8e6441f2b7685a9 upstream.

We wrap the crash_shutdown_handles[] calls with longjmp/setjmp, so if any
of them fault we can recover. The problem is we add a hook to the debugger
fault handler hook which calls longjmp unconditionally.

This first part of kdump is run before we marshall the other CPUs, so there
is a very good chance some CPU on the box is going to page fault. And when
it does it hits the longjmp code and assumes the context of the oopsing CPU.
The machine gets very confused when it has 10 CPUs all with the same stack,
all thinking they have the same CPU id. I get even more confused trying
to debug it.

The patch below adds crash_shutdown_cpu and uses it to specify which cpu is
in the protected region. Since it can only be -1 or the oopsing CPU, we don't
need to use memory barriers since it is only valid on the local CPU - no other
CPU will ever see a value that matches it's local CPU id.

Eventually we should switch the order and marshall all CPUs before doing the
crash_shutdown_handles[] calls, but that is a bigger fix.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Cc: Kamalesh babulal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/crash.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -347,10 +347,12 @@ int crash_shutdown_unregister(crash_shut
EXPORT_SYMBOL(crash_shutdown_unregister);

static unsigned long crash_shutdown_buf[JMP_BUF_LEN];
+static int crash_shutdown_cpu = -1;

static int handle_fault(struct pt_regs *regs)
{
- longjmp(crash_shutdown_buf, 1);
+ if (crash_shutdown_cpu == smp_processor_id())
+ longjmp(crash_shutdown_buf, 1);
return 0;
}

@@ -388,6 +390,7 @@ void default_machine_crash_shutdown(stru
*/
old_handler = __debugger_fault_handler;
__debugger_fault_handler = handle_fault;
+ crash_shutdown_cpu = smp_processor_id();
for (i = 0; crash_shutdown_handles[i]; i++) {
if (setjmp(crash_shutdown_buf) == 0) {
/*
@@ -401,6 +404,7 @@ void default_machine_crash_shutdown(stru
asm volatile("sync; isync");
}
}
+ crash_shutdown_cpu = -1;
__debugger_fault_handler = old_handler;

/*

2011-03-11 20:43:18

by Greg KH

[permalink] [raw]

Subject: [16/17] Staging: comedi: jr3_pci: Dont ioremap too much space. Check result.

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Ian Abbott <[email protected]>

commit fa5c5f4ce0c9ba03a670c640cad17e14cb35678b upstream.

For the JR3/PCI cards, the size of the PCIBAR0 region depends on the
number of channels. Don't try and ioremap space for 4 channels if the
card has fewer channels. Also check for ioremap failure.

Thanks to Anders Blomdell for input and Sami Hussein for testing.

Signed-off-by: Ian Abbott <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/comedi/drivers/jr3_pci.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/staging/comedi/drivers/jr3_pci.c
+++ b/drivers/staging/comedi/drivers/jr3_pci.c
@@ -856,8 +856,11 @@ static int jr3_pci_attach(struct comedi_
}

devpriv->pci_enabled = 1;
- devpriv->iobase =
- ioremap(pci_resource_start(card, 0), sizeof(struct jr3_t));
+ devpriv->iobase = ioremap(pci_resource_start(card, 0),
+ offsetof(struct jr3_t, channel[devpriv->n_channels]));
+ if (!devpriv->iobase)
+ return -ENOMEM;
+
result = alloc_subdevices(dev, devpriv->n_channels);
if (result < 0)
goto out;

2011-03-11 20:44:36

by Greg KH

[permalink] [raw]

Subject: [12/17] powerpc/kexec: Fix orphaned offline CPUs across kexec

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Matt Evans <[email protected]>

Commit: e8e5c2155b0035b6e04f29be67f6444bc914005b upstream

When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed. The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck. On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are. The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.

This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.

Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec. This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).

Signed-off-by: Matt Evans <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Kamalesh babulal <[email protected]>
cc: Anton Blanchard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/powerpc/kernel/machine_kexec_64.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,7 @@
#include <linux/thread_info.h>
#include <linux/init_task.h>
#include <linux/errno.h>
+#include <linux/cpu.h>

#include <asm/page.h>
#include <asm/current.h>
@@ -169,10 +170,34 @@ static void kexec_smp_down(void *arg)
/* NOTREACHED */
}

+/*
+ * We need to make sure each present CPU is online. The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required. If we don't online primary
+ * threads, they will be stuck. However, we also online secondary threads as we
+ * may be using 'cede offline'. In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
+{
+ int cpu = 0;
+
+ for_each_present_cpu(cpu) {
+ if (!cpu_online(cpu)) {
+ printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+ cpu);
+ cpu_up(cpu);
+ }
+ }
+}
+
static void kexec_prepare_cpus(void)
{
int my_cpu, i, notified=-1;

+ wake_offline_cpus();
smp_call_function(kexec_smp_down, NULL, /* wait */0);
my_cpu = get_cpu();

2011-03-11 20:44:38

by Greg KH

[permalink] [raw]

Subject: [11/17] powerpc/crashdump: Do not fail on NULL pointer dereferencing

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Maxim Uvarov <[email protected]>

commit 426b6cb478e60352a463a0d1ec75c1c9fab30b13 upstream.

Signed-off-by: Maxim Uvarov <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Kamalesh babulal <[email protected]>
cc: Anton Blanchard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/crash.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -377,6 +377,9 @@ void default_machine_crash_shutdown(stru
for_each_irq(i) {
struct irq_desc *desc = irq_desc + i;

+ if (!desc || !desc->chip || !desc->chip->eoi)
+ continue;
+
if (desc->status & IRQ_INPROGRESS)
desc->chip->eoi(i);

2011-03-11 20:45:01

by Greg KH

[permalink] [raw]

Subject: [10/17] powerpc/kexec: Speedup kexec hash PTE tear down

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Michael Neuling <[email protected]>

commit d504bed676caad29a3dba3d3727298c560628f5c upstream.

Currently for kexec the PTE tear down on 1TB segment systems normally
requires 3 hcalls for each PTE removal. On a machine with 32GB of
memory it can take around a minute to remove all the PTEs.

This optimises the path so that we only remove PTEs that are valid.
It also uses the read 4 PTEs at once HCALL. For the common case where
a PTEs is invalid in a 1TB segment, this turns the 3 HCALLs per PTE
down to 1 HCALL per 4 PTEs.

This gives an > 10x speedup in kexec times on PHYP, taking a 32GB
machine from around 1 minute down to a few seconds.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Kamalesh babulal <[email protected]>
cc: Anton Blanchard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/platforms/pseries/lpar.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)

--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -366,21 +366,28 @@ static void pSeries_lpar_hptab_clear(voi
{
unsigned long size_bytes = 1UL << ppc64_pft_size;
unsigned long hpte_count = size_bytes >> 4;
- unsigned long dummy1, dummy2, dword0;
+ struct {
+ unsigned long pteh;
+ unsigned long ptel;
+ } ptes[4];
long lpar_rc;
- int i;
+ int i, j;

- /* TODO: Use bulk call */
- for (i = 0; i < hpte_count; i++) {
- /* dont remove HPTEs with VRMA mappings */
- lpar_rc = plpar_pte_remove_raw(H_ANDCOND, i, HPTE_V_1TB_SEG,
- &dummy1, &dummy2);
- if (lpar_rc == H_NOT_FOUND) {
- lpar_rc = plpar_pte_read_raw(0, i, &dword0, &dummy1);
- if (!lpar_rc && ((dword0 & HPTE_V_VRMA_MASK)
- != HPTE_V_VRMA_MASK))
- /* Can be hpte for 1TB Seg. So remove it */
- plpar_pte_remove_raw(0, i, 0, &dummy1, &dummy2);
+ /* Read in batches of 4,
+ * invalidate only valid entries not in the VRMA
+ * hpte_count will be a multiple of 4
+ */
+ for (i = 0; i < hpte_count; i += 4) {
+ lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
+ if (lpar_rc != H_SUCCESS)
+ continue;
+ for (j = 0; j < 4; j++){
+ if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
+ HPTE_V_VRMA_MASK)
+ continue;
+ if (ptes[j].pteh & HPTE_V_VALID)
+ plpar_pte_remove_raw(0, i + j, 0,
+ &(ptes[j].pteh), &(ptes[j].ptel));
}
}
}

2011-03-11 20:45:19

by Greg KH

[permalink] [raw]

Subject: [09/17] powerpc/pseries: Add hcall to read 4 ptes at a time in real mode

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Michael Neuling <[email protected]>

commit f90ece28c1f5b3ec13fe481406857fe92f4bc7d1 upstream.

This adds plpar_pte_read_4_raw() which can be used read 4 PTEs from
PHYP at a time, while in real mode.

It also creates a new hcall9 which can be used in real mode. It's the
same as plpar_hcall9 but minus the tracing hcall statistics which may
require variables outside the RMO.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Kamalesh babulal <[email protected]>
Cc: Anton Blanchard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/hvcall.h | 1
arch/powerpc/platforms/pseries/hvCall.S | 38 ++++++++++++++++++++++++
arch/powerpc/platforms/pseries/plpar_wrappers.h | 18 +++++++++++
3 files changed, 57 insertions(+)

--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -268,6 +268,7 @@ long plpar_hcall_raw(unsigned long opcod
*/
#define PLPAR_HCALL9_BUFSIZE 9
long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...);

/* For hcall instrumentation. One structure per-hcall, per-CPU */
struct hcall_stats {
--- a/arch/powerpc/platforms/pseries/hvCall.S
+++ b/arch/powerpc/platforms/pseries/hvCall.S
@@ -202,3 +202,41 @@ _GLOBAL(plpar_hcall9)
mtcrf 0xff,r0

blr /* return r3 = status */
+
+/* See plpar_hcall_raw to see why this is needed */
+_GLOBAL(plpar_hcall9_raw)
+ HMT_MEDIUM
+
+ mfcr r0
+ stw r0,8(r1)
+
+ std r4,STK_PARM(r4)(r1) /* Save ret buffer */
+
+ mr r4,r5
+ mr r5,r6
+ mr r6,r7
+ mr r7,r8
+ mr r8,r9
+ mr r9,r10
+ ld r10,STK_PARM(r11)(r1) /* put arg7 in R10 */
+ ld r11,STK_PARM(r12)(r1) /* put arg8 in R11 */
+ ld r12,STK_PARM(r13)(r1) /* put arg9 in R12 */
+
+ HVSC /* invoke the hypervisor */
+
+ mr r0,r12
+ ld r12,STK_PARM(r4)(r1)
+ std r4, 0(r12)
+ std r5, 8(r12)
+ std r6, 16(r12)
+ std r7, 24(r12)
+ std r8, 32(r12)
+ std r9, 40(r12)
+ std r10,48(r12)
+ std r11,56(r12)
+ std r0, 64(r12)
+
+ lwz r0,8(r1)
+ mtcrf 0xff,r0
+
+ blr /* return r3 = status */
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -169,6 +169,24 @@ static inline long plpar_pte_read_raw(un
return rc;
}

+/*
+ * plpar_pte_read_4_raw can be called in real mode.
+ * ptes must be 8*sizeof(unsigned long)
+ */
+static inline long plpar_pte_read_4_raw(unsigned long flags, unsigned long ptex,
+ unsigned long *ptes)
+
+{
+ long rc;
+ unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+ rc = plpar_hcall9_raw(H_READ, retbuf, flags | H_READ_4, ptex);
+
+ memcpy(ptes, retbuf, 8*sizeof(unsigned long));
+
+ return rc;
+}
+
static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
unsigned long avpn)
{

2011-03-11 20:45:38

by Greg KH

[permalink] [raw]

Subject: [07/17] powerpc/kdump: Use chip->shutdown to disable IRQs

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <[email protected]>

commit 5d7a87217de48b234b3c8ff8a73059947d822e07 upstream.

I saw this in a kdump kernel:

IOMMU table initialized, virtual merging enabled
Interrupt 155954 (real) is invalid, disabling it.
Interrupt 155953 (real) is invalid, disabling it.

ie we took some spurious interrupts. default_machine_crash_shutdown tries
to disable all interrupt sources but uses chip->disable which maps to
the default action of:

static void default_disable(unsigned int irq)
{
}

If we use chip->shutdown, then we actually mask the IRQ:

static void default_shutdown(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);

desc->chip->mask(irq);
desc->status |= IRQ_MASKED;
}

Not sure why we don't implement a ->disable action for xics.c, or why
default_disable doesn't mask the interrupt.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Kamalesh babulal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/crash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -381,7 +381,7 @@ void default_machine_crash_shutdown(stru
desc->chip->eoi(i);

if (!(desc->status & IRQ_DISABLED))
- desc->chip->disable(i);
+ desc->chip->shutdown(i);
}

/*

2011-03-11 20:43:04

by Greg KH

[permalink] [raw]

Subject: [02/17] [S390] keyboard: integer underflow bug

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Dan Carpenter <[email protected]>

commit b652277b09d3d030cb074cc6a98ba80b34244c03 upstream.

The "ct" variable should be an unsigned int. Both struct kbdiacrs
->kb_cnt and struct kbd_data ->accent_table_size are unsigned ints.

Making it signed causes a problem in KBDIACRUC because the user could
set the signed bit and cause a buffer overflow.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/s390/char/keyboard.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/s390/char/keyboard.c
+++ b/drivers/s390/char/keyboard.c
@@ -462,7 +462,8 @@ kbd_ioctl(struct kbd_data *kbd, struct f
unsigned int cmd, unsigned long arg)
{
void __user *argp;
- int ct, perm;
+ unsigned int ct;
+ int perm;

argp = (void __user *)arg;

2011-03-11 20:46:15

by Greg KH

[permalink] [raw]

Subject: [05/17] mm: fix possible cause of a page_mapped BUG

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Hugh Dickins <[email protected]>

commit a3e8cc643d22d2c8ed36b9be7d9c9ca21efcf7f7 upstream.

Robert Swiecki reported a BUG_ON(page_mapped) from a fuzzer, punching
a hole with madvise(,, MADV_REMOVE). That path is under mutex, and
cannot be explained by lack of serialization in unmap_mapping_range().

Reviewing the code, I found one place where vm_truncate_count handling
should have been updated, when I switched at the last minute from one
way of managing the restart_addr to another: mremap move changes the
virtual addresses, so it ought to adjust the restart_addr.

But rather than exporting the notion of restart_addr from memory.c, or
converting to restart_pgoff throughout, simply reset vm_truncate_count
to 0 to force a rescan if mremap move races with preempted truncation.

We have no confirmation that this fixes Robert's BUG,
but it is a fix that's worth making anyway.

Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Cc: Kerin Millar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/mremap.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -92,9 +92,7 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
spin_lock(&mapping->i_mmap_lock);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
+ new_vma->vm_truncate_count = 0;
}

/*

2011-03-11 20:46:35

by Greg KH

[permalink] [raw]

Subject: [04/17] ixgbe: fix for 82599 erratum on Header Splitting

2.6.32-longterm review patch. If anyone has any objections, please let us know.

------------------

From: Don Skidmore <[email protected]>

commit a124339ad28389093ed15eca990d39c51c5736cc upstream.

We have found a hardware erratum on 82599 hardware that can lead to
unpredictable behavior when Header Splitting mode is enabled. So
we are no longer enabling this feature on affected hardware.

Please see the 82599 Specification Update for more information.

Signed-off-by: Don Skidmore <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ixgbe/ixgbe_main.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2134,6 +2134,10 @@ static void ixgbe_configure_rx(struct ix
/* Decide whether to use packet split mode or not */
adapter->flags |= IXGBE_FLAG_RX_PS_ENABLED;

+ /* Disable packet split due to 82599 erratum #45 */
+ if (hw->mac.type == ixgbe_mac_82599EB)
+ adapter->flags &= ~IXGBE_FLAG_RX_PS_ENABLED;
+
/* Set the RX buffer length according to the mode */
if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) {
rx_buf_len = IXGBE_RX_HDR_SIZE;

2011-03-11 22:22:20

by Tim Gardner

[permalink] [raw]

Subject: Re: [14/17] nfsd: wrong index used in inner loop

On 03/11/2011 08:40 PM, Greg KH wrote:
> 2.6.32-longterm review patch. If anyone has any objections, please let us know.
>
> ------------------
>
> From: roel<[email protected]>
>
> commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream.
>
> Index i was already used in the outer loop
>
> Signed-off-by: Roel Kluin<[email protected]>
> Signed-off-by: J. Bruce Fields<[email protected]>
> Signed-off-by: Greg Kroah-Hartman<[email protected]>
>
> ---
> fs/nfsd/nfs4xdr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1114,7 +1114,7 @@ nfsd4_decode_create_session(struct nfsd4
>
> u32 dummy;
> char *machine_name;
> - int i;
> + int i, j;
> int nr_secflavs;
>
> READ_BUF(16);
> @@ -1187,7 +1187,7 @@ nfsd4_decode_create_session(struct nfsd4
> READ_BUF(4);
> READ32(dummy);
> READ_BUF(dummy * 4);
> - for (i = 0; i< dummy; ++i)
> + for (j = 0; j< dummy; ++j)
> READ32(dummy);
> break;
> case RPC_AUTH_GSS:
>
>
> --

I agree that fixing the index in this loop is a good thing, but its
caused me to look at the result:

for (j = 0; j< dummy; ++j)
READ32(dummy);

It seems to me that this loop might never terminate if the original
buffer is maliciously constructed, e.g., 0, 1, 2, 3, ... Is the data in
this buffer really that well vetted?

rtg
--
Tim Gardner [email protected]

2011-03-17 23:00:55

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [14/17] nfsd: wrong index used in inner loop

On Fri, Mar 11, 2011 at 10:21:58PM +0000, Tim Gardner wrote:
> On 03/11/2011 08:40 PM, Greg KH wrote:
> >2.6.32-longterm review patch. If anyone has any objections, please let us know.
> >
> >------------------
> >
> >From: roel<[email protected]>
> >
> >commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream.
> >
> >Index i was already used in the outer loop
> >
> >Signed-off-by: Roel Kluin<[email protected]>
> >Signed-off-by: J. Bruce Fields<[email protected]>
> >Signed-off-by: Greg Kroah-Hartman<[email protected]>
> >
> >---
> > fs/nfsd/nfs4xdr.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >--- a/fs/nfsd/nfs4xdr.c
> >+++ b/fs/nfsd/nfs4xdr.c
> >@@ -1114,7 +1114,7 @@ nfsd4_decode_create_session(struct nfsd4
> >
> > u32 dummy;
> > char *machine_name;
> >- int i;
> >+ int i, j;
> > int nr_secflavs;
> >
> > READ_BUF(16);
> >@@ -1187,7 +1187,7 @@ nfsd4_decode_create_session(struct nfsd4
> > READ_BUF(4);
> > READ32(dummy);
> > READ_BUF(dummy * 4);
> >- for (i = 0; i< dummy; ++i)
> >+ for (j = 0; j< dummy; ++j)
> > READ32(dummy);
> > break;
> > case RPC_AUTH_GSS:
> >
> >
> >--
>
> I agree that fixing the index in this loop is a good thing, but its
> caused me to look at the result:
>
> for (j = 0; j< dummy; ++j)
> READ32(dummy);
>
> It seems to me that this loop might never terminate if the original
> buffer is maliciously constructed, e.g., 0, 1, 2, 3, ... Is the data
> in this buffer really that well vetted?

Agreed, the code's still clearly bogus. In fact, we can just delete
that loop entirely; I have a patch queued up to send to Linus soon.

(But go ahead and apply this anyway, and then you'll get the followup
patch when it lands.)

--b.

2011-03-18 00:55:54

by Tim Gardner

[permalink] [raw]

Subject: Re: [14/17] nfsd: wrong index used in inner loop

On 03/17/2011 05:00 PM, J. Bruce Fields wrote:
> On Fri, Mar 11, 2011 at 10:21:58PM +0000, Tim Gardner wrote:
>> On 03/11/2011 08:40 PM, Greg KH wrote:
>>> 2.6.32-longterm review patch. If anyone has any objections, please let us know.
>>>
>>> ------------------
>>>
>>> From: roel<[email protected]>
>>>
>>> commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream.
>>>
>>> Index i was already used in the outer loop
>>>
>>> Signed-off-by: Roel Kluin<[email protected]>
>>> Signed-off-by: J. Bruce Fields<[email protected]>
>>> Signed-off-by: Greg Kroah-Hartman<[email protected]>
>>>
>>> ---
>>> fs/nfsd/nfs4xdr.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> --- a/fs/nfsd/nfs4xdr.c
>>> +++ b/fs/nfsd/nfs4xdr.c
>>> @@ -1114,7 +1114,7 @@ nfsd4_decode_create_session(struct nfsd4
>>>
>>> u32 dummy;
>>> char *machine_name;
>>> - int i;
>>> + int i, j;
>>> int nr_secflavs;
>>>
>>> READ_BUF(16);
>>> @@ -1187,7 +1187,7 @@ nfsd4_decode_create_session(struct nfsd4
>>> READ_BUF(4);
>>> READ32(dummy);
>>> READ_BUF(dummy * 4);
>>> - for (i = 0; i< dummy; ++i)
>>> + for (j = 0; j< dummy; ++j)
>>> READ32(dummy);
>>> break;
>>> case RPC_AUTH_GSS:
>>>
>>>
>>> --
>>
>> I agree that fixing the index in this loop is a good thing, but its
>> caused me to look at the result:
>>
>> for (j = 0; j< dummy; ++j)
>> READ32(dummy);
>>
>> It seems to me that this loop might never terminate if the original
>> buffer is maliciously constructed, e.g., 0, 1, 2, 3, ... Is the data
>> in this buffer really that well vetted?
>
> Agreed, the code's still clearly bogus. In fact, we can just delete
> that loop entirely; I have a patch queued up to send to Linus soon.
>
> (But go ahead and apply this anyway, and then you'll get the followup
> patch when it lands.)
>
> --b.
>

Will do. Thanks for the update.

rtg
--
Tim Gardner [email protected]