This is the start of the stable review cycle for the 4.19.148 release.
There are 37 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 27 Sep 2020 12:47:02 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.148-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <[email protected]>
Linux 4.19.148-rc1
Lukas Wunner <[email protected]>
serial: 8250: Avoid error message on reprobe
Priyaranjan Jha <[email protected]>
tcp_bbr: adapt cwnd based on ack aggregation estimation
Priyaranjan Jha <[email protected]>
tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning
Xunlei Pang <[email protected]>
mm: memcg: fix memcg reclaim soft lockup
Masahiro Yamada <[email protected]>
kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
Masahiro Yamada <[email protected]>
kbuild: replace AS=clang with LLVM_IAS=1
Masahiro Yamada <[email protected]>
kbuild: remove AS variable
Dmitry Golovin <[email protected]>
x86/boot: kbuild: allow readelf executable to be specified
Masahiro Yamada <[email protected]>
net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
Masahiro Yamada <[email protected]>
net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
Fangrui Song <[email protected]>
Documentation/llvm: fix the name of llvm-size
Nick Desaulniers <[email protected]>
Documentation/llvm: add documentation on building w/ Clang/LLVM
Vasily Gorbik <[email protected]>
kbuild: add OBJSIZE variable for the size tool
Nick Desaulniers <[email protected]>
MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info
David Ahern <[email protected]>
ipv4: Update exception handling for multipath routes via same device
Eric Dumazet <[email protected]>
net: add __must_check to skb_put_padto()
Eric Dumazet <[email protected]>
net: qrtr: check skb_put_padto() return value
Florian Fainelli <[email protected]>
net: phy: Avoid NPD upon phy_detach() when driver is unbound
Michael Chan <[email protected]>
bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex.
Edwin Peer <[email protected]>
bnxt_en: return proper error codes in bnxt_show_temp
Xin Long <[email protected]>
tipc: use skb_unshare() instead in tipc_buf_append()
Tetsuo Handa <[email protected]>
tipc: fix shutdown() of connection oriented socket
Peilin Ye <[email protected]>
tipc: Fix memory leak in tipc_group_create_member()
Jakub Kicinski <[email protected]>
nfp: use correct define to return NONE fec
Yunsheng Lin <[email protected]>
net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
Necip Fazil Yildiran <[email protected]>
net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMAC
Linus Walleij <[email protected]>
net: dsa: rtl8366: Properly clear member config
Petr Machata <[email protected]>
net: DCB: Validate DCB_ATTR_DCB_BUFFER argument
Eric Dumazet <[email protected]>
ipv6: avoid lockdep issue in fib6_del()
Wei Wang <[email protected]>
ip: fix tos reflection in ack and reset packets
Dan Carpenter <[email protected]>
hdlc_ppp: add range checks in ppp_cp_parse_cr()
Mark Gray <[email protected]>
geneve: add transport ports in route lookup for geneve
Ganji Aravind <[email protected]>
cxgb4: Fix offset when clearing filter byte counters
Ralph Campbell <[email protected]>
mm/thp: fix __split_huge_pmd_locked() for migration PMD
Muchun Song <[email protected]>
kprobes: fix kill kprobe which has been marked as gone
Rustam Kovhaev <[email protected]>
KVM: fix memory leak in kvm_io_bus_unregister_dev()
Mark Salyzyn <[email protected]>
af_key: pfkey_dump needs parameter validation
-------------
Diffstat:
Documentation/kbuild/llvm.rst | 87 ++++++++++
MAINTAINERS | 9 ++
Makefile | 40 +++--
arch/x86/boot/compressed/Makefile | 2 +-
drivers/net/dsa/rtl8366.c | 20 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 31 ++--
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 9 +-
.../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 4 +-
drivers/net/geneve.c | 37 +++--
drivers/net/phy/phy_device.c | 3 +-
drivers/net/wan/Kconfig | 2 +-
drivers/net/wan/Makefile | 12 +-
drivers/net/wan/hdlc_ppp.c | 16 +-
drivers/tty/serial/8250/8250_core.c | 11 +-
include/linux/skbuff.h | 7 +-
include/net/inet_connection_sock.h | 4 +-
kernel/kprobes.c | 9 +-
mm/huge_memory.c | 40 +++--
mm/vmscan.c | 8 +
net/dcb/dcbnl.c | 8 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/route.c | 11 +-
net/ipv4/tcp_bbr.c | 180 ++++++++++++++++++---
net/ipv6/Kconfig | 1 +
net/ipv6/ip6_fib.c | 13 +-
net/key/af_key.c | 7 +
net/qrtr/qrtr.c | 20 +--
net/sched/sch_generic.c | 49 ++++--
net/tipc/group.c | 14 +-
net/tipc/msg.c | 3 +-
net/tipc/socket.c | 5 +-
tools/objtool/Makefile | 6 +
virt/kvm/kvm_main.c | 21 +--
34 files changed, 550 insertions(+), 161 deletions(-)
From: Wei Wang <[email protected]>
[ Upstream commit ba9e04a7ddf4f22a10e05bf9403db6b97743c7bf ]
Currently, in tcp_v4_reqsk_send_ack() and tcp_v4_send_reset(), we
echo the TOS value of the received packets in the response.
However, we do not want to echo the lower 2 ECN bits in accordance
with RFC 3168 6.1.5 robustness principles.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -73,6 +73,7 @@
#include <net/icmp.h>
#include <net/checksum.h>
#include <net/inetpeer.h>
+#include <net/inet_ecn.h>
#include <net/lwtunnel.h>
#include <linux/bpf-cgroup.h>
#include <linux/igmp.h>
@@ -1582,7 +1583,7 @@ void ip_send_unicast_reply(struct sock *
if (IS_ERR(rt))
return;
- inet_sk(sk)->tos = arg->tos;
+ inet_sk(sk)->tos = arg->tos & ~INET_ECN_MASK;
sk->sk_priority = skb->priority;
sk->sk_protocol = ip_hdr(skb)->protocol;
From: Michael Chan <[email protected]>
[ Upstream commit a53906908148d64423398a62c4435efb0d09652c ]
All changes related to bp->link_info require the protection of the
link_lock mutex. It's not sufficient to rely just on RTNL.
Fixes: 163e9ef63641 ("bnxt_en: Fix race when modifying pause settings.")
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 31 ++++++++++++++--------
1 file changed, 20 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1369,9 +1369,12 @@ static int bnxt_set_pauseparam(struct ne
if (!BNXT_SINGLE_PF(bp))
return -EOPNOTSUPP;
+ mutex_lock(&bp->link_lock);
if (epause->autoneg) {
- if (!(link_info->autoneg & BNXT_AUTONEG_SPEED))
- return -EINVAL;
+ if (!(link_info->autoneg & BNXT_AUTONEG_SPEED)) {
+ rc = -EINVAL;
+ goto pause_exit;
+ }
link_info->autoneg |= BNXT_AUTONEG_FLOW_CTRL;
if (bp->hwrm_spec_code >= 0x10201)
@@ -1392,11 +1395,11 @@ static int bnxt_set_pauseparam(struct ne
if (epause->tx_pause)
link_info->req_flow_ctrl |= BNXT_LINK_PAUSE_TX;
- if (netif_running(dev)) {
- mutex_lock(&bp->link_lock);
+ if (netif_running(dev))
rc = bnxt_hwrm_set_pause(bp);
- mutex_unlock(&bp->link_lock);
- }
+
+pause_exit:
+ mutex_unlock(&bp->link_lock);
return rc;
}
@@ -2113,8 +2116,7 @@ static int bnxt_set_eee(struct net_devic
struct bnxt *bp = netdev_priv(dev);
struct ethtool_eee *eee = &bp->eee;
struct bnxt_link_info *link_info = &bp->link_info;
- u32 advertising =
- _bnxt_fw_to_ethtool_adv_spds(link_info->advertising, 0);
+ u32 advertising;
int rc = 0;
if (!BNXT_SINGLE_PF(bp))
@@ -2123,19 +2125,23 @@ static int bnxt_set_eee(struct net_devic
if (!(bp->flags & BNXT_FLAG_EEE_CAP))
return -EOPNOTSUPP;
+ mutex_lock(&bp->link_lock);
+ advertising = _bnxt_fw_to_ethtool_adv_spds(link_info->advertising, 0);
if (!edata->eee_enabled)
goto eee_ok;
if (!(link_info->autoneg & BNXT_AUTONEG_SPEED)) {
netdev_warn(dev, "EEE requires autoneg\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto eee_exit;
}
if (edata->tx_lpi_enabled) {
if (bp->lpi_tmr_hi && (edata->tx_lpi_timer > bp->lpi_tmr_hi ||
edata->tx_lpi_timer < bp->lpi_tmr_lo)) {
netdev_warn(dev, "Valid LPI timer range is %d and %d microsecs\n",
bp->lpi_tmr_lo, bp->lpi_tmr_hi);
- return -EINVAL;
+ rc = -EINVAL;
+ goto eee_exit;
} else if (!bp->lpi_tmr_hi) {
edata->tx_lpi_timer = eee->tx_lpi_timer;
}
@@ -2145,7 +2151,8 @@ static int bnxt_set_eee(struct net_devic
} else if (edata->advertised & ~advertising) {
netdev_warn(dev, "EEE advertised %x must be a subset of autoneg advertised speeds %x\n",
edata->advertised, advertising);
- return -EINVAL;
+ rc = -EINVAL;
+ goto eee_exit;
}
eee->advertised = edata->advertised;
@@ -2157,6 +2164,8 @@ eee_ok:
if (netif_running(dev))
rc = bnxt_hwrm_set_link_setting(bp, false, true);
+eee_exit:
+ mutex_unlock(&bp->link_lock);
return rc;
}
From: Florian Fainelli <[email protected]>
[ Upstream commit c2b727df7caa33876e7066bde090f40001b6d643 ]
If we have unbound the PHY driver prior to calling phy_detach() (often
via phy_disconnect()) then we can cause a NULL pointer de-reference
accessing the driver owner member. The steps to reproduce are:
echo unimac-mdio-0:01 > /sys/class/net/eth0/phydev/driver/unbind
ip link set eth0 down
Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1154,7 +1154,8 @@ void phy_detach(struct phy_device *phyde
phy_led_triggers_unregister(phydev);
- module_put(phydev->mdio.dev.driver->owner);
+ if (phydev->mdio.dev.driver)
+ module_put(phydev->mdio.dev.driver->owner);
/* If the device had no specific driver before (i.e. - it
* was using the generic driver), we unbind the device
From: Masahiro Yamada <[email protected]>
commit 63b903dfebdea92aa92ad337d8451a6fbfeabf9d upstream.
As far as I understood from the Kconfig help text, this build rule is
used to rebuild the driver firmware, which runs on an old m68k-based
chip. So, you need m68k tools for the firmware rebuild.
wanxl.c is a PCI driver, but CONFIG_M68K does not select CONFIG_HAVE_PCI.
So, you cannot enable CONFIG_WANXL_BUILD_FIRMWARE for ARCH=m68k. In other
words, ifeq ($(ARCH),m68k) is false here.
I am keeping the dead code for now, but rebuilding the firmware requires
'as68k' and 'ld68k', which I do not have in hand.
Instead, the kernel.org m68k GCC [1] successfully built it.
Allowing a user to pass in CROSS_COMPILE_M68K= is handier.
[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/9.2.0/x86_64-gcc-9.2.0-nolibc-m68k-linux.tar.xz
Suggested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wan/Kconfig | 2 +-
drivers/net/wan/Makefile | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
--- a/drivers/net/wan/Kconfig
+++ b/drivers/net/wan/Kconfig
@@ -199,7 +199,7 @@ config WANXL_BUILD_FIRMWARE
depends on WANXL && !PREVENT_FIRMWARE_BUILD
help
Allows you to rebuild firmware run by the QUICC processor.
- It requires as68k, ld68k and hexdump programs.
+ It requires m68k toolchains and hexdump programs.
You should never need this option, say N.
--- a/drivers/net/wan/Makefile
+++ b/drivers/net/wan/Makefile
@@ -41,17 +41,17 @@ $(obj)/wanxl.o: $(obj)/wanxlfw.inc
ifeq ($(CONFIG_WANXL_BUILD_FIRMWARE),y)
ifeq ($(ARCH),m68k)
- AS68K = $(AS)
- LD68K = $(LD)
+ M68KAS = $(AS)
+ M68KLD = $(LD)
else
- AS68K = as68k
- LD68K = ld68k
+ M68KAS = $(CROSS_COMPILE_M68K)as
+ M68KLD = $(CROSS_COMPILE_M68K)ld
endif
quiet_cmd_build_wanxlfw = BLD FW $@
cmd_build_wanxlfw = \
- $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(AS68K) -m68360 -o $(obj)/wanxlfw.o; \
- $(LD68K) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \
+ $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(M68KAS) -m68360 -o $(obj)/wanxlfw.o; \
+ $(M68KLD) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \
hexdump -ve '"\n" 16/1 "0x%02X,"' $(obj)/wanxlfw.bin | sed 's/0x ,//g;1s/^/static const u8 firmware[]={/;$$s/,$$/\n};\n/' >$(obj)/wanxlfw.inc; \
rm -f $(obj)/wanxlfw.bin $(obj)/wanxlfw.o
From: Fangrui Song <[email protected]>
commit 0f44fbc162b737ff6251ae248184390ae2279fee upstream.
The tool is called llvm-size, not llvm-objsize.
Fixes: fcf1b6a35c16 ("Documentation/llvm: add documentation on building w/ Clang/LLVM")
Signed-off-by: Fangrui Song <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/kbuild/llvm.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/kbuild/llvm.rst
+++ b/Documentation/kbuild/llvm.rst
@@ -51,7 +51,7 @@ LLVM has substitutes for GNU binutils ut
additional parameters to `make`.
make CC=clang AS=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip \\
- OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump OBJSIZE=llvm-objsize \\
+ OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump OBJSIZE=llvm-size \\
READELF=llvm-readelf HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar \\
HOSTLD=ld.lld
From: Dmitry Golovin <[email protected]>
commit eefb8c124fd969e9a174ff2bedff86aa305a7438 upstream.
Introduce a new READELF variable to top-level Makefile, so the name of
readelf binary can be specified.
Before this change the name of the binary was hardcoded to
"$(CROSS_COMPILE)readelf" which might not be present for every
toolchain.
This allows to build with LLVM Object Reader by using make parameter
READELF=llvm-readelf.
Link: https://github.com/ClangBuiltLinux/linux/issues/771
Signed-off-by: Dmitry Golovin <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
[nd: conflict in exported vars list from not backporting commit
e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux")]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Makefile | 3 ++-
arch/x86/boot/compressed/Makefile | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/Makefile
+++ b/Makefile
@@ -378,6 +378,7 @@ STRIP = $(CROSS_COMPILE)strip
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump
OBJSIZE = $(CROSS_COMPILE)size
+READELF = $(CROSS_COMPILE)readelf
LEX = flex
YACC = bison
AWK = awk
@@ -434,7 +435,7 @@ GCC_PLUGINS_CFLAGS :=
CLANG_FLAGS :=
export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP OBJSIZE KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
+export CPP AR NM STRIP OBJCOPY OBJDUMP OBJSIZE READELF KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
export MAKE LEX YACC AWK GENKSYMS INSTALLKERNEL PERL PYTHON PYTHON2 PYTHON3 UTS_MACHINE
export HOSTCXX KBUILD_HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,7 +102,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(ob
quiet_cmd_check_data_rel = DATAREL $@
define cmd_check_data_rel
for obj in $(filter %.o,$^); do \
- ${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
+ $(READELF) -S $$obj | grep -qF .rel.local && { \
echo "error: $$obj has data relocations!" >&2; \
exit 1; \
} || true; \
From: Edwin Peer <[email protected]>
[ Upstream commit d69753fa1ecb3218b56b022722f7a5822735b876 ]
Returning "unknown" as a temperature value violates the hwmon interface
rules. Appropriate error codes should be returned via device_attribute
show instead. These will ultimately be propagated to the user via the
file system interface.
In addition to the corrected error handling, it is an even better idea to
not present the sensor in sysfs at all if it is known that the read will
definitely fail. Given that temp1_input is currently the only sensor
reported, ensure no hwmon registration if TEMP_MONITOR_QUERY is not
supported or if it will fail due to access permissions. Something smarter
may be needed if and when other sensors are added.
Fixes: 12cce90b934b ("bnxt_en: fix HWRM error when querying VF temperature")
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6837,18 +6837,16 @@ static ssize_t bnxt_show_temp(struct dev
struct hwrm_temp_monitor_query_output *resp;
struct bnxt *bp = dev_get_drvdata(dev);
u32 len = 0;
+ int rc;
resp = bp->hwrm_cmd_resp_addr;
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_TEMP_MONITOR_QUERY, -1, -1);
mutex_lock(&bp->hwrm_cmd_lock);
- if (!_hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT))
+ rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+ if (!rc)
len = sprintf(buf, "%u\n", resp->temp * 1000); /* display millidegree */
mutex_unlock(&bp->hwrm_cmd_lock);
-
- if (len)
- return len;
-
- return sprintf(buf, "unknown\n");
+ return rc ?: len;
}
static SENSOR_DEVICE_ATTR(temp1_input, 0444, bnxt_show_temp, NULL, 0);
@@ -6868,7 +6866,16 @@ static void bnxt_hwmon_close(struct bnxt
static void bnxt_hwmon_open(struct bnxt *bp)
{
+ struct hwrm_temp_monitor_query_input req = {0};
struct pci_dev *pdev = bp->pdev;
+ int rc;
+
+ bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_TEMP_MONITOR_QUERY, -1, -1);
+ rc = hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+ if (rc == -EACCES || rc == -EOPNOTSUPP) {
+ bnxt_hwmon_close(bp);
+ return;
+ }
bp->hwmon_dev = hwmon_device_register_with_groups(&pdev->dev,
DRV_MODULE_NAME, bp,
From: Masahiro Yamada <[email protected]>
commit a0d1c951ef08ed24f35129267e3595d86f57f5d3 upstream.
As Documentation/kbuild/llvm.rst implies, building the kernel with a
full set of LLVM tools gets very verbose and unwieldy.
Provide a single switch LLVM=1 to use Clang and LLVM tools instead
of GCC and Binutils. You can pass it from the command line or as an
environment variable.
Please note LLVM=1 does not turn on the integrated assembler. You need
to pass LLVM_IAS=1 to use it. When the upstream kernel is ready for the
integrated assembler, I think we can make it default.
We discussed what we need, and we agreed to go with a simple boolean
flag that switches both target and host tools:
https://lkml.org/lkml/2020/3/28/494
https://lkml.org/lkml/2020/4/3/43
Some items discussed, but not adopted:
- LLVM_DIR
When multiple versions of LLVM are installed, I just thought supporting
LLVM_DIR=/path/to/my/llvm/bin/ might be useful.
CC = $(LLVM_DIR)clang
LD = $(LLVM_DIR)ld.lld
...
However, we can handle this by modifying PATH. So, we decided to not do
this.
- LLVM_SUFFIX
Some distributions (e.g. Debian) package specific versions of LLVM with
naming conventions that use the version as a suffix.
CC = clang$(LLVM_SUFFIX)
LD = ld.lld(LLVM_SUFFIX)
...
will allow a user to pass LLVM_SUFFIX=-11 to use clang-11 etc.,
but the suffixed versions in /usr/bin/ are symlinks to binaries in
/usr/lib/llvm-#/bin/, so this can also be handled by PATH.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]> # build
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
[nd: conflict in exported vars list from not backporting commit
e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux")]
[nd: hunk against Documentation/kbuild/kbuild.rst dropped due to not backporting
commit cd238effefa2 ("docs: kbuild: convert docs to ReST and rename to *.rst")]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/kbuild/llvm.rst | 8 ++++++--
Makefile | 29 +++++++++++++++++++++++------
tools/objtool/Makefile | 6 ++++++
3 files changed, 35 insertions(+), 8 deletions(-)
--- a/Documentation/kbuild/llvm.rst
+++ b/Documentation/kbuild/llvm.rst
@@ -47,8 +47,12 @@ example:
LLVM Utilities
--------------
-LLVM has substitutes for GNU binutils utilities. These can be invoked as
-additional parameters to `make`.
+LLVM has substitutes for GNU binutils utilities. Kbuild supports `LLVM=1`
+to enable them.
+
+ make LLVM=1
+
+They can be enabled individually. The full list of the parameters:
make CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip \\
OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump OBJSIZE=llvm-size \\
--- a/Makefile
+++ b/Makefile
@@ -358,8 +358,13 @@ HOST_LFS_CFLAGS := $(shell getconf LFS_C
HOST_LFS_LDFLAGS := $(shell getconf LFS_LDFLAGS 2>/dev/null)
HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
-HOSTCC = gcc
-HOSTCXX = g++
+ifneq ($(LLVM),)
+HOSTCC = clang
+HOSTCXX = clang++
+else
+HOSTCC = gcc
+HOSTCXX = g++
+endif
KBUILD_HOSTCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 \
-fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS) \
$(HOSTCFLAGS)
@@ -368,16 +373,28 @@ KBUILD_HOSTLDFLAGS := $(HOST_LFS_LDFLAG
KBUILD_HOSTLDLIBS := $(HOST_LFS_LIBS) $(HOSTLDLIBS)
# Make variables (CC, etc...)
-LD = $(CROSS_COMPILE)ld
-CC = $(CROSS_COMPILE)gcc
CPP = $(CC) -E
+ifneq ($(LLVM),)
+CC = clang
+LD = ld.lld
+AR = llvm-ar
+NM = llvm-nm
+OBJCOPY = llvm-objcopy
+OBJDUMP = llvm-objdump
+READELF = llvm-readelf
+OBJSIZE = llvm-size
+STRIP = llvm-strip
+else
+CC = $(CROSS_COMPILE)gcc
+LD = $(CROSS_COMPILE)ld
AR = $(CROSS_COMPILE)ar
NM = $(CROSS_COMPILE)nm
-STRIP = $(CROSS_COMPILE)strip
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump
-OBJSIZE = $(CROSS_COMPILE)size
READELF = $(CROSS_COMPILE)readelf
+OBJSIZE = $(CROSS_COMPILE)size
+STRIP = $(CROSS_COMPILE)strip
+endif
LEX = flex
YACC = bison
AWK = awk
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -7,9 +7,15 @@ ARCH := x86
endif
# always use the host compiler
+ifneq ($(LLVM),)
+HOSTAR ?= llvm-ar
+HOSTCC ?= clang
+HOSTLD ?= ld.lld
+else
HOSTAR ?= ar
HOSTCC ?= gcc
HOSTLD ?= ld
+endif
AR = $(HOSTAR)
CC = $(HOSTCC)
LD = $(HOSTLD)
From: Priyaranjan Jha <[email protected]>
commit 232aa8ec3ed979d4716891540c03a806ecab0c37 upstream.
Because bbr_target_cwnd() is really a general-purpose BBR helper for
computing some volume of inflight data as a function of the estimated
BDP, refactor it into following helper functions:
- bbr_bdp()
- bbr_quantization_budget()
- bbr_inflight()
Signed-off-by: Priyaranjan Jha <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_bbr.c | 60 ++++++++++++++++++++++++++++++++++-------------------
1 file changed, 39 insertions(+), 21 deletions(-)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -315,30 +315,19 @@ static void bbr_cwnd_event(struct sock *
}
}
-/* Find target cwnd. Right-size the cwnd based on min RTT and the
- * estimated bottleneck bandwidth:
+/* Calculate bdp based on min RTT and the estimated bottleneck bandwidth:
*
- * cwnd = bw * min_rtt * gain = BDP * gain
+ * bdp = bw * min_rtt * gain
*
* The key factor, gain, controls the amount of queue. While a small gain
* builds a smaller queue, it becomes more vulnerable to noise in RTT
* measurements (e.g., delayed ACKs or other ACK compression effects). This
* noise may cause BBR to under-estimate the rate.
- *
- * To achieve full performance in high-speed paths, we budget enough cwnd to
- * fit full-sized skbs in-flight on both end hosts to fully utilize the path:
- * - one skb in sending host Qdisc,
- * - one skb in sending host TSO/GSO engine
- * - one skb being received by receiver host LRO/GRO/delayed-ACK engine
- * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because
- * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets,
- * which allows 2 outstanding 2-packet sequences, to try to keep pipe
- * full even with ACK-every-other-packet delayed ACKs.
*/
-static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain)
+static u32 bbr_bdp(struct sock *sk, u32 bw, int gain)
{
struct bbr *bbr = inet_csk_ca(sk);
- u32 cwnd;
+ u32 bdp;
u64 w;
/* If we've never had a valid RTT sample, cap cwnd at the initial
@@ -353,7 +342,24 @@ static u32 bbr_target_cwnd(struct sock *
w = (u64)bw * bbr->min_rtt_us;
/* Apply a gain to the given value, then remove the BW_SCALE shift. */
- cwnd = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT;
+ bdp = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT;
+
+ return bdp;
+}
+
+/* To achieve full performance in high-speed paths, we budget enough cwnd to
+ * fit full-sized skbs in-flight on both end hosts to fully utilize the path:
+ * - one skb in sending host Qdisc,
+ * - one skb in sending host TSO/GSO engine
+ * - one skb being received by receiver host LRO/GRO/delayed-ACK engine
+ * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because
+ * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets,
+ * which allows 2 outstanding 2-packet sequences, to try to keep pipe
+ * full even with ACK-every-other-packet delayed ACKs.
+ */
+static u32 bbr_quantization_budget(struct sock *sk, u32 cwnd, int gain)
+{
+ struct bbr *bbr = inet_csk_ca(sk);
/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr_tso_segs_goal(sk);
@@ -368,6 +374,17 @@ static u32 bbr_target_cwnd(struct sock *
return cwnd;
}
+/* Find inflight based on min RTT and the estimated bottleneck bandwidth. */
+static u32 bbr_inflight(struct sock *sk, u32 bw, int gain)
+{
+ u32 inflight;
+
+ inflight = bbr_bdp(sk, bw, gain);
+ inflight = bbr_quantization_budget(sk, inflight, gain);
+
+ return inflight;
+}
+
/* An optimization in BBR to reduce losses: On the first round of recovery, we
* follow the packet conservation principle: send P packets per P packets acked.
* After that, we slow-start and send at most 2*P packets per P packets acked.
@@ -429,7 +446,8 @@ static void bbr_set_cwnd(struct sock *sk
goto done;
/* If we're below target cwnd, slow start cwnd toward target cwnd. */
- target_cwnd = bbr_target_cwnd(sk, bw, gain);
+ target_cwnd = bbr_bdp(sk, bw, gain);
+ target_cwnd = bbr_quantization_budget(sk, target_cwnd, gain);
if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */
cwnd = min(cwnd + acked, target_cwnd);
else if (cwnd < target_cwnd || tp->delivered < TCP_INIT_CWND)
@@ -470,14 +488,14 @@ static bool bbr_is_next_cycle_phase(stru
if (bbr->pacing_gain > BBR_UNIT)
return is_full_length &&
(rs->losses || /* perhaps pacing_gain*BDP won't fit */
- inflight >= bbr_target_cwnd(sk, bw, bbr->pacing_gain));
+ inflight >= bbr_inflight(sk, bw, bbr->pacing_gain));
/* A pacing_gain < 1.0 tries to drain extra queue we added if bw
* probing didn't find more bw. If inflight falls to match BDP then we
* estimate queue is drained; persisting would underutilize the pipe.
*/
return is_full_length ||
- inflight <= bbr_target_cwnd(sk, bw, BBR_UNIT);
+ inflight <= bbr_inflight(sk, bw, BBR_UNIT);
}
static void bbr_advance_cycle_phase(struct sock *sk)
@@ -736,11 +754,11 @@ static void bbr_check_drain(struct sock
bbr->pacing_gain = bbr_drain_gain; /* pace slow to drain */
bbr->cwnd_gain = bbr_high_gain; /* maintain cwnd */
tcp_sk(sk)->snd_ssthresh =
- bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT);
+ bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT);
} /* fall through to check if in-flight is already small: */
if (bbr->mode == BBR_DRAIN &&
tcp_packets_in_flight(tcp_sk(sk)) <=
- bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT))
+ bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT))
bbr_reset_probe_bw_mode(sk); /* we estimate queue is drained */
}
From: Lukas Wunner <[email protected]>
commit e0a851fe6b9b619527bd928aa93caaddd003f70c upstream.
If the call to uart_add_one_port() in serial8250_register_8250_port()
fails, a half-initialized entry in the serial_8250ports[] array is left
behind.
A subsequent reprobe of the same serial port causes that entry to be
reused. Because uart->port.dev is set, uart_remove_one_port() is called
for the half-initialized entry and bails out with an error message:
bcm2835-aux-uart 3f215040.serial: Removing wrong port: (null) != (ptrval)
The same happens on failure of mctrl_gpio_init() since commit
4a96895f74c9 ("tty/serial/8250: use mctrl_gpio helpers").
Fix by zeroing the uart->port.dev pointer in the probe error path.
The bug was introduced in v2.6.10 by historical commit befff6f5bf5f
("[SERIAL] Add new port registration/unregistration functions."):
https://git.kernel.org/tglx/history/c/befff6f5bf5f
The commit added an unconditional call to uart_remove_one_port() in
serial8250_register_port(). In v3.7, commit 835d844d1a28 ("8250_pnp:
do pnp probe before legacy probe") made that call conditional on
uart->port.dev which allows me to fix the issue by zeroing that pointer
in the error path. Thus, the present commit will fix the problem as far
back as v3.7 whereas still older versions need to also cherry-pick
835d844d1a28.
Fixes: 835d844d1a28 ("8250_pnp: do pnp probe before legacy probe")
Signed-off-by: Lukas Wunner <[email protected]>
Cc: [email protected] # v2.6.10
Cc: [email protected] # v2.6.10: 835d844d1a28: 8250_pnp: do pnp probe before legacy
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/b4a072013ee1a1d13ee06b4325afb19bda57ca1b.1589285873.git.lukas@wunner.de
[iwamatsu: Backported to 4.14, 4.19: adjust context]
Signed-off-by: Nobuhiro Iwamatsu (CIP) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/tty/serial/8250/8250_core.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -1062,8 +1062,10 @@ int serial8250_register_8250_port(struct
serial8250_apply_quirks(uart);
ret = uart_add_one_port(&serial8250_reg,
&uart->port);
- if (ret == 0)
- ret = uart->port.line;
+ if (ret)
+ goto err;
+
+ ret = uart->port.line;
} else {
dev_info(uart->port.dev,
"skipping CIR port at 0x%lx / 0x%llx, IRQ %d\n",
@@ -1088,6 +1090,11 @@ int serial8250_register_8250_port(struct
mutex_unlock(&serial_mutex);
return ret;
+
+err:
+ uart->port.dev = NULL;
+ mutex_unlock(&serial_mutex);
+ return ret;
}
EXPORT_SYMBOL(serial8250_register_8250_port);
From: Yunsheng Lin <[email protected]>
[ Upstream commit 2fb541c862c987d02dfdf28f1545016deecfa0d5 ]
Currently there is concurrent reset and enqueue operation for the
same lockless qdisc when there is no lock to synchronize the
q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
qdisc_deactivate() called by dev_deactivate_queue(), which may cause
out-of-bounds access for priv->ring[] in hns3 driver if user has
requested a smaller queue num when __dev_xmit_skb() still enqueue a
skb with a larger queue_mapping after the corresponding qdisc is
reset, and call hns3_nic_net_xmit() with that skb later.
Reused the existing synchronize_net() in dev_deactivate_many() to
make sure skb with larger queue_mapping enqueued to old qdisc(which
is saved in dev_queue->qdisc_sleeping) will always be reset when
dev_reset_queue() is called.
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_generic.c | 49 ++++++++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 16 deletions(-)
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1115,27 +1115,36 @@ static void dev_deactivate_queue(struct
struct netdev_queue *dev_queue,
void *_qdisc_default)
{
- struct Qdisc *qdisc_default = _qdisc_default;
- struct Qdisc *qdisc;
+ struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
- qdisc = rtnl_dereference(dev_queue->qdisc);
if (qdisc) {
- bool nolock = qdisc->flags & TCQ_F_NOLOCK;
-
- if (nolock)
- spin_lock_bh(&qdisc->seqlock);
- spin_lock_bh(qdisc_lock(qdisc));
-
if (!(qdisc->flags & TCQ_F_BUILTIN))
set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
+ }
+}
- rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
- qdisc_reset(qdisc);
+static void dev_reset_queue(struct net_device *dev,
+ struct netdev_queue *dev_queue,
+ void *_unused)
+{
+ struct Qdisc *qdisc;
+ bool nolock;
- spin_unlock_bh(qdisc_lock(qdisc));
- if (nolock)
- spin_unlock_bh(&qdisc->seqlock);
- }
+ qdisc = dev_queue->qdisc_sleeping;
+ if (!qdisc)
+ return;
+
+ nolock = qdisc->flags & TCQ_F_NOLOCK;
+
+ if (nolock)
+ spin_lock_bh(&qdisc->seqlock);
+ spin_lock_bh(qdisc_lock(qdisc));
+
+ qdisc_reset(qdisc);
+
+ spin_unlock_bh(qdisc_lock(qdisc));
+ if (nolock)
+ spin_unlock_bh(&qdisc->seqlock);
}
static bool some_qdisc_is_busy(struct net_device *dev)
@@ -1196,12 +1205,20 @@ void dev_deactivate_many(struct list_hea
dev_watchdog_down(dev);
}
- /* Wait for outstanding qdisc-less dev_queue_xmit calls.
+ /* Wait for outstanding qdisc-less dev_queue_xmit calls or
+ * outstanding qdisc enqueuing calls.
* This is avoided if all devices are in dismantle phase :
* Caller will call synchronize_net() for us
*/
synchronize_net();
+ list_for_each_entry(dev, head, close_list) {
+ netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
+
+ if (dev_ingress_queue(dev))
+ dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
+ }
+
/* Wait for outstanding qdisc_run calls. */
list_for_each_entry(dev, head, close_list) {
while (some_qdisc_is_busy(dev))
From: Jakub Kicinski <[email protected]>
[ Upstream commit 5f6857e808a8bd078296575b417c4b9d160b9779 ]
struct ethtool_fecparam carries bitmasks not bit numbers.
We want to return 1 (NONE), not 0.
Fixes: 0d0870938337 ("nfp: implement ethtool FEC mode settings")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -744,8 +744,8 @@ nfp_port_get_fecparam(struct net_device
struct nfp_eth_table_port *eth_port;
struct nfp_port *port;
- param->active_fec = ETHTOOL_FEC_NONE_BIT;
- param->fec = ETHTOOL_FEC_NONE_BIT;
+ param->active_fec = ETHTOOL_FEC_NONE;
+ param->fec = ETHTOOL_FEC_NONE;
port = nfp_port_from_netdev(netdev);
eth_port = nfp_port_get_eth_port(port);
From: Ganji Aravind <[email protected]>
[ Upstream commit 94cc242a067a869c29800aa789d38b7676136e50 ]
Pass the correct offset to clear the stale filter hit
bytes counter. Otherwise, the counter starts incrementing
from the stale information, instead of 0.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Ganji Aravind <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1591,13 +1591,16 @@ out:
static int configure_filter_tcb(struct adapter *adap, unsigned int tid,
struct filter_entry *f)
{
- if (f->fs.hitcnts)
+ if (f->fs.hitcnts) {
set_tcb_field(adap, f, tid, TCB_TIMESTAMP_W,
- TCB_TIMESTAMP_V(TCB_TIMESTAMP_M) |
+ TCB_TIMESTAMP_V(TCB_TIMESTAMP_M),
+ TCB_TIMESTAMP_V(0ULL),
+ 1);
+ set_tcb_field(adap, f, tid, TCB_RTT_TS_RECENT_AGE_W,
TCB_RTT_TS_RECENT_AGE_V(TCB_RTT_TS_RECENT_AGE_M),
- TCB_TIMESTAMP_V(0ULL) |
TCB_RTT_TS_RECENT_AGE_V(0ULL),
1);
+ }
if (f->fs.newdmac)
set_tcb_tflag(adap, f, tid, TF_CCTRL_ECE_S, 1,
From: Dan Carpenter <[email protected]>
[ Upstream commit 66d42ed8b25b64eb63111a2b8582c5afc8bf1105 ]
There are a couple bugs here:
1) If opt[1] is zero then this results in a forever loop. If the value
is less than 2 then it is invalid.
2) It assumes that "len" is more than sizeof(valid_accm) or 6 which can
result in memory corruption.
In the case of LCP_OPTION_ACCM, then we should check "opt[1]" instead
of "len" because, if "opt[1]" is less than sizeof(valid_accm) then
"nak_len" gets out of sync and it can lead to memory corruption in the
next iterations through the loop. In case of LCP_OPTION_MAGIC, the
only valid value for opt[1] is 6, but the code is trying to log invalid
data so we should only discard the data when "len" is less than 6
because that leads to a read overflow.
Reported-by: ChenNan Of Chaitin Security Research Lab <[email protected]>
Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wan/hdlc_ppp.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -386,11 +386,8 @@ static void ppp_cp_parse_cr(struct net_d
}
for (opt = data; len; len -= opt[1], opt += opt[1]) {
- if (len < 2 || len < opt[1]) {
- dev->stats.rx_errors++;
- kfree(out);
- return; /* bad packet, drop silently */
- }
+ if (len < 2 || opt[1] < 2 || len < opt[1])
+ goto err_out;
if (pid == PID_LCP)
switch (opt[0]) {
@@ -398,6 +395,8 @@ static void ppp_cp_parse_cr(struct net_d
continue; /* MRU always OK and > 1500 bytes? */
case LCP_OPTION_ACCM: /* async control character map */
+ if (opt[1] < sizeof(valid_accm))
+ goto err_out;
if (!memcmp(opt, valid_accm,
sizeof(valid_accm)))
continue;
@@ -409,6 +408,8 @@ static void ppp_cp_parse_cr(struct net_d
}
break;
case LCP_OPTION_MAGIC:
+ if (len < 6)
+ goto err_out;
if (opt[1] != 6 || (!opt[2] && !opt[3] &&
!opt[4] && !opt[5]))
break; /* reject invalid magic number */
@@ -427,6 +428,11 @@ static void ppp_cp_parse_cr(struct net_d
ppp_cp_event(dev, pid, RCR_GOOD, CP_CONF_ACK, id, req_len, data);
kfree(out);
+ return;
+
+err_out:
+ dev->stats.rx_errors++;
+ kfree(out);
}
static int ppp_rx(struct sk_buff *skb)
From: Eric Dumazet <[email protected]>
[ Upstream commit 843d926b003ea692468c8cc5bea1f9f58dfa8c75 ]
syzbot reported twice a lockdep issue in fib6_del() [1]
which I think is caused by net->ipv6.fib6_null_entry
having a NULL fib6_table pointer.
fib6_del() already checks for fib6_null_entry special
case, we only need to return earlier.
Bug seems to occur very rarely, I have thus chosen
a 'bug origin' that makes backports not too complex.
[1]
WARNING: suspicious RCU usage
5.9.0-rc4-syzkaller #0 Not tainted
-----------------------------
net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
4 locks held by syz-executor.5/8095:
#0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401
#1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline]
#1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312
#2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613
#3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
#3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245
stack backtrace:
CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996
fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180
fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102
fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150
fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230
__fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246
fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline]
fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320
ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033
call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
call_netdevice_notifiers net/core/dev.c:2059 [inline]
dev_close_many+0x30b/0x650 net/core/dev.c:1634
rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261
rollback_registered net/core/dev.c:9329 [inline]
unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410
unregister_netdevice include/linux/netdevice.h:2774 [inline]
ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403
__fput+0x285/0x920 fs/file_table.c:281
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 421842edeaf6 ("net/ipv6: Add fib6_null_entry")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: David Ahern <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_fib.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1811,14 +1811,19 @@ static void fib6_del_route(struct fib6_t
/* Need to own table->tb6_lock */
int fib6_del(struct fib6_info *rt, struct nl_info *info)
{
- struct fib6_node *fn = rcu_dereference_protected(rt->fib6_node,
- lockdep_is_held(&rt->fib6_table->tb6_lock));
- struct fib6_table *table = rt->fib6_table;
struct net *net = info->nl_net;
struct fib6_info __rcu **rtp;
struct fib6_info __rcu **rtp_next;
+ struct fib6_table *table;
+ struct fib6_node *fn;
- if (!fn || rt == net->ipv6.fib6_null_entry)
+ if (rt == net->ipv6.fib6_null_entry)
+ return -ENOENT;
+
+ table = rt->fib6_table;
+ fn = rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&table->tb6_lock));
+ if (!fn)
return -ENOENT;
WARN_ON(!(fn->fn_flags & RTN_RTINFO));
From: Linus Walleij <[email protected]>
[ Upstream commit 4ddcaf1ebb5e4e99240f29d531ee69d4244fe416 ]
When removing a port from a VLAN we are just erasing the
member config for the VLAN, which is wrong: other ports
can be using it.
Just mask off the port and only zero out the rest of the
member config once ports using of the VLAN are removed
from it.
Reported-by: Florian Fainelli <[email protected]>
Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/dsa/rtl8366.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--- a/drivers/net/dsa/rtl8366.c
+++ b/drivers/net/dsa/rtl8366.c
@@ -452,13 +452,19 @@ int rtl8366_vlan_del(struct dsa_switch *
return ret;
if (vid == vlanmc.vid) {
- /* clear VLAN member configurations */
- vlanmc.vid = 0;
- vlanmc.priority = 0;
- vlanmc.member = 0;
- vlanmc.untag = 0;
- vlanmc.fid = 0;
-
+ /* Remove this port from the VLAN */
+ vlanmc.member &= ~BIT(port);
+ vlanmc.untag &= ~BIT(port);
+ /*
+ * If no ports are members of this VLAN
+ * anymore then clear the whole member
+ * config so it can be reused.
+ */
+ if (!vlanmc.member && vlanmc.untag) {
+ vlanmc.vid = 0;
+ vlanmc.priority = 0;
+ vlanmc.fid = 0;
+ }
ret = smi->ops->set_vlan_mc(smi, i, &vlanmc);
if (ret) {
dev_err(smi->dev,
From: Masahiro Yamada <[email protected]>
commit 7e20e47c70f810d678d02941fa3c671209c4ca97 upstream.
The 'AS' variable is unused for building the kernel. Only the remaining
usage is to turn on the integrated assembler. A boolean flag is a better
fit for this purpose.
AS=clang was added for experts. So, I replaced it with LLVM_IAS=1,
breaking the backward compatibility.
Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/kbuild/llvm.rst | 5 ++++-
Makefile | 2 ++
2 files changed, 6 insertions(+), 1 deletion(-)
--- a/Documentation/kbuild/llvm.rst
+++ b/Documentation/kbuild/llvm.rst
@@ -50,11 +50,14 @@ LLVM Utilities
LLVM has substitutes for GNU binutils utilities. These can be invoked as
additional parameters to `make`.
- make CC=clang AS=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip \\
+ make CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip \\
OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump OBJSIZE=llvm-size \\
READELF=llvm-readelf HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar \\
HOSTLD=ld.lld
+Currently, the integrated assembler is disabled by default. You can pass
+`LLVM_IAS=1` to enable it.
+
Getting Help
------------
--- a/Makefile
+++ b/Makefile
@@ -492,7 +492,9 @@ endif
ifneq ($(GCC_TOOLCHAIN),)
CLANG_FLAGS += --gcc-toolchain=$(GCC_TOOLCHAIN)
endif
+ifneq ($(LLVM_IAS),1)
CLANG_FLAGS += -no-integrated-as
+endif
CLANG_FLAGS += -Werror=unknown-warning-option
KBUILD_CFLAGS += $(CLANG_FLAGS)
KBUILD_AFLAGS += $(CLANG_FLAGS)
From: Necip Fazil Yildiran <[email protected]>
[ Upstream commit db7cd91a4be15e1485d6b58c6afc8761c59c4efb ]
When IPV6_SEG6_HMAC is enabled and CRYPTO is disabled, it results in the
following Kbuild warning:
WARNING: unmet direct dependencies detected for CRYPTO_HMAC
Depends on [n]: CRYPTO [=n]
Selected by [y]:
- IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y]
WARNING: unmet direct dependencies detected for CRYPTO_SHA1
Depends on [n]: CRYPTO [=n]
Selected by [y]:
- IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y]
WARNING: unmet direct dependencies detected for CRYPTO_SHA256
Depends on [n]: CRYPTO [=n]
Selected by [y]:
- IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y]
The reason is that IPV6_SEG6_HMAC selects CRYPTO_HMAC, CRYPTO_SHA1, and
CRYPTO_SHA256 without depending on or selecting CRYPTO while those configs
are subordinate to CRYPTO.
Honor the kconfig menu hierarchy to remove kconfig dependency warnings.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Necip Fazil Yildiran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/Kconfig | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -321,6 +321,7 @@ config IPV6_SEG6_LWTUNNEL
config IPV6_SEG6_HMAC
bool "IPv6: Segment Routing HMAC support"
depends on IPV6
+ select CRYPTO
select CRYPTO_HMAC
select CRYPTO_SHA1
select CRYPTO_SHA256
From: Masahiro Yamada <[email protected]>
commit aa824e0c962b532d5073cbb41b2efcd6f5e72bae upstream.
As commit 5ef872636ca7 ("kbuild: get rid of misleading $(AS) from
documents") noted, we rarely use $(AS) directly in the kernel build.
Now that the only/last user of $(AS) in drivers/net/wan/Makefile was
converted to $(CC), $(AS) is no longer used in the build process.
You can still pass in AS=clang, which is just a switch to turn on
the LLVM integrated assembler.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
[nd: conflict in exported vars list from not backporting commit
e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux")]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/Makefile
+++ b/Makefile
@@ -368,7 +368,6 @@ KBUILD_HOSTLDFLAGS := $(HOST_LFS_LDFLAG
KBUILD_HOSTLDLIBS := $(HOST_LFS_LIBS) $(HOSTLDLIBS)
# Make variables (CC, etc...)
-AS = $(CROSS_COMPILE)as
LD = $(CROSS_COMPILE)ld
CC = $(CROSS_COMPILE)gcc
CPP = $(CC) -E
@@ -434,7 +433,7 @@ KBUILD_LDFLAGS :=
GCC_PLUGINS_CFLAGS :=
CLANG_FLAGS :=
-export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
+export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC
export CPP AR NM STRIP OBJCOPY OBJDUMP OBJSIZE READELF KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
export MAKE LEX YACC AWK GENKSYMS INSTALLKERNEL PERL PYTHON PYTHON2 PYTHON3 UTS_MACHINE
export HOSTCXX KBUILD_HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
From: Vasily Gorbik <[email protected]>
commit 7bac98707f65b93bf994ef4e99b1eb9e7dbb9c32 upstream.
Define and export OBJSIZE variable for "size" tool from binutils to be
used in architecture specific Makefiles (naming the variable just "SIZE"
would be too risky). In particular this tool is useful to perform checks
that early boot code is not using bss section (which might have not been
zeroed yet or intersects with initrd or other files boot loader might
have put right after the linux kernel).
Link: http://lkml.kernel.org/r/patch-1.thread-2257a1.git-188f5a3d81d5.your-ad-here.call-01565088755-ext-5120@work.hours
Acked-by: Masahiro Yamada <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
[nd: conflict in exported vars list from not backporting commit
e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux")]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/Makefile
+++ b/Makefile
@@ -377,6 +377,7 @@ NM = $(CROSS_COMPILE)nm
STRIP = $(CROSS_COMPILE)strip
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump
+OBJSIZE = $(CROSS_COMPILE)size
LEX = flex
YACC = bison
AWK = awk
@@ -433,7 +434,7 @@ GCC_PLUGINS_CFLAGS :=
CLANG_FLAGS :=
export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
+export CPP AR NM STRIP OBJCOPY OBJDUMP OBJSIZE KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
export MAKE LEX YACC AWK GENKSYMS INSTALLKERNEL PERL PYTHON PYTHON2 PYTHON3 UTS_MACHINE
export HOSTCXX KBUILD_HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
From: Eric Dumazet <[email protected]>
[ Upstream commit 4a009cb04aeca0de60b73f37b102573354214b52 ]
skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3014,8 +3014,9 @@ static inline int skb_padto(struct sk_bu
* is untouched. Otherwise it is extended. Returns zero on
* success. The skb is freed on error if @free_on_error is true.
*/
-static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
- bool free_on_error)
+static inline int __must_check __skb_put_padto(struct sk_buff *skb,
+ unsigned int len,
+ bool free_on_error)
{
unsigned int size = skb->len;
@@ -3038,7 +3039,7 @@ static inline int __skb_put_padto(struct
* is untouched. Otherwise it is extended. Returns zero on
* success. The skb is freed on error.
*/
-static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
+static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int len)
{
return __skb_put_padto(skb, len, true);
}
From: David Ahern <[email protected]>
[ Upstream commit 2fbc6e89b2f1403189e624cabaf73e189c5e50c6 ]
Kfir reported that pmtu exceptions are not created properly for
deployments where multipath routes use the same device.
After some digging I see 2 compounding problems:
1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
the route lookup. This is the second use case where this has
been a problem (the first is related to use of vti devices with
VRF). I can not find any reason for the oif to be changed after the
lookup; the code goes back to the start of git. It does not seem
logical so remove it.
2. fib_lookups for exceptions do not call fib_select_path to handle
multipath route selection based on the hash.
The end result is that the fib_lookup used to add the exception
always creates it based using the first leg of the route.
An example topology showing the problem:
| host1
+------+
| eth0 | .209
+------+
|
+------+
switch | br0 |
+------+
|
+---------+---------+
| host2 | host3
+------+ +------+
| eth0 | .250 | eth0 | 192.168.252.252
+------+ +------+
+-----+ +-----+
| vti | .2 | vti | 192.168.247.3
+-----+ +-----+
\ /
=================================
tunnels
192.168.247.1/24
for h in host1 host2 host3; do
ip netns add ${h}
ip -netns ${h} link set lo up
ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
done
ip netns add switch
ip -netns switch li set lo up
ip -netns switch link add br0 type bridge stp 0
ip -netns switch link set br0 up
for n in 1 2 3; do
ip -netns switch link add eth-sw type veth peer name eth-h${n}
ip -netns switch li set eth-h${n} master br0 up
ip -netns switch li set eth-sw netns host${n} name eth0
done
ip -netns host1 addr add 192.168.252.209/24 dev eth0
ip -netns host1 link set dev eth0 up
ip -netns host1 route add 192.168.247.0/24 \
nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0
ip -netns host2 addr add 192.168.252.250/24 dev eth0
ip -netns host2 link set dev eth0 up
ip -netns host2 addr add 192.168.252.252/24 dev eth0
ip -netns host3 link set dev eth0 up
ip netns add tunnel
ip -netns tunnel li set lo up
ip -netns tunnel li add br0 type bridge
ip -netns tunnel li set br0 up
for n in $(seq 11 20); do
ip -netns tunnel addr add dev br0 192.168.247.${n}/24
done
for n in 2 3
do
ip -netns tunnel link add vti${n} type veth peer name eth${n}
ip -netns tunnel link set eth${n} mtu 1360 master br0 up
ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
done
ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
ip -netns host1 ro ls cache
Before this patch the cache always shows exceptions against the first
leg in the multipath route; 192.168.252.250 per this example. Since the
hash has an initial random seed, you may need to vary the final octet
more than what is listed. In my tests, using addresses between 11 and 19
usually found 1 that used both legs.
With this patch, the cache will have exceptions for both legs.
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions")
Reported-by: Kfir Itzhak <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/route.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -779,6 +779,8 @@ static void __ip_do_redirect(struct rtab
if (fib_lookup(net, fl4, &res, 0) == 0) {
struct fib_nh *nh = &FIB_RES_NH(res);
+ fib_select_path(net, &res, fl4, skb);
+ nh = &FIB_RES_NH(res);
update_or_create_fnhe(nh, fl4->daddr, new_gw,
0, false,
jiffies + ip_rt_gc_timeout);
@@ -1004,6 +1006,7 @@ out: kfree_skb(skb);
static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
{
struct dst_entry *dst = &rt->dst;
+ struct net *net = dev_net(dst->dev);
u32 old_mtu = ipv4_mtu(dst);
struct fib_result res;
bool lock = false;
@@ -1024,9 +1027,11 @@ static void __ip_rt_update_pmtu(struct r
return;
rcu_read_lock();
- if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) {
- struct fib_nh *nh = &FIB_RES_NH(res);
+ if (fib_lookup(net, fl4, &res, 0) == 0) {
+ struct fib_nh *nh;
+ fib_select_path(net, &res, fl4, NULL);
+ nh = &FIB_RES_NH(res);
update_or_create_fnhe(nh, fl4->daddr, 0, mtu, lock,
jiffies + ip_rt_mtu_expires);
}
@@ -2536,8 +2541,6 @@ struct rtable *ip_route_output_key_hash_
fib_select_path(net, res, fl4, skb);
dev_out = FIB_RES_DEV(*res);
- fl4->flowi4_oif = dev_out->ifindex;
-
make_route:
rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags);
From: Mark Salyzyn <[email protected]>
commit 37bd22420f856fcd976989f1d4f1f7ad28e1fcac upstream.
In pfkey_dump() dplen and splen can both be specified to access the
xfrm_address_t structure out of bounds in__xfrm_state_filter_match()
when it calls addr_match() with the indexes. Return EINVAL if either
are out of range.
Signed-off-by: Mark Salyzyn <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Steffen Klassert <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/key/af_key.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1855,6 +1855,13 @@ static int pfkey_dump(struct sock *sk, s
if (ext_hdrs[SADB_X_EXT_FILTER - 1]) {
struct sadb_x_filter *xfilter = ext_hdrs[SADB_X_EXT_FILTER - 1];
+ if ((xfilter->sadb_x_filter_splen >=
+ (sizeof(xfrm_address_t) << 3)) ||
+ (xfilter->sadb_x_filter_dplen >=
+ (sizeof(xfrm_address_t) << 3))) {
+ mutex_unlock(&pfk->dump_lock);
+ return -EINVAL;
+ }
filter = kmalloc(sizeof(*filter), GFP_KERNEL);
if (filter == NULL) {
mutex_unlock(&pfk->dump_lock);
From: Priyaranjan Jha <[email protected]>
commit 78dc70ebaa38aa303274e333be6c98eef87619e2 upstream.
Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.
Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.
To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.
A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00
In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps
Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.
This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.
Signed-off-by: Priyaranjan Jha <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_connection_sock.h | 4 -
net/ipv4/tcp_bbr.c | 122 ++++++++++++++++++++++++++++++++++++-
2 files changed, 123 insertions(+), 3 deletions(-)
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -139,8 +139,8 @@ struct inet_connection_sock {
} icsk_mtup;
u32 icsk_user_timeout;
- u64 icsk_ca_priv[88 / sizeof(u64)];
-#define ICSK_CA_PRIV_SIZE (11 * sizeof(u64))
+ u64 icsk_ca_priv[104 / sizeof(u64)];
+#define ICSK_CA_PRIV_SIZE (13 * sizeof(u64))
};
#define ICSK_TIME_RETRANS 1 /* Retransmit timer */
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -115,6 +115,14 @@ struct bbr {
unused_b:5;
u32 prior_cwnd; /* prior cwnd upon entering loss recovery */
u32 full_bw; /* recent bw, to estimate if pipe is full */
+
+ /* For tracking ACK aggregation: */
+ u64 ack_epoch_mstamp; /* start of ACK sampling epoch */
+ u16 extra_acked[2]; /* max excess data ACKed in epoch */
+ u32 ack_epoch_acked:20, /* packets (S)ACKed in sampling epoch */
+ extra_acked_win_rtts:5, /* age of extra_acked, in round trips */
+ extra_acked_win_idx:1, /* current index in extra_acked array */
+ unused_c:6;
};
#define CYCLE_LEN 8 /* number of phases in a pacing gain cycle */
@@ -174,6 +182,15 @@ static const u32 bbr_lt_bw_diff = 4000 /
/* If we estimate we're policed, use lt_bw for this many round trips: */
static const u32 bbr_lt_bw_max_rtts = 48;
+/* Gain factor for adding extra_acked to target cwnd: */
+static const int bbr_extra_acked_gain = BBR_UNIT;
+/* Window length of extra_acked window. */
+static const u32 bbr_extra_acked_win_rtts = 5;
+/* Max allowed val for ack_epoch_acked, after which sampling epoch is reset */
+static const u32 bbr_ack_epoch_acked_reset_thresh = 1U << 20;
+/* Time period for clamping cwnd increment due to ack aggregation */
+static const u32 bbr_extra_acked_max_us = 100 * 1000;
+
static void bbr_check_probe_rtt_done(struct sock *sk);
/* Do we estimate that STARTUP filled the pipe? */
@@ -200,6 +217,16 @@ static u32 bbr_bw(const struct sock *sk)
return bbr->lt_use_bw ? bbr->lt_bw : bbr_max_bw(sk);
}
+/* Return maximum extra acked in past k-2k round trips,
+ * where k = bbr_extra_acked_win_rtts.
+ */
+static u16 bbr_extra_acked(const struct sock *sk)
+{
+ struct bbr *bbr = inet_csk_ca(sk);
+
+ return max(bbr->extra_acked[0], bbr->extra_acked[1]);
+}
+
/* Return rate in bytes per second, optionally with a gain.
* The order here is chosen carefully to avoid overflow of u64. This should
* work for input rates of up to 2.9Tbit/sec and gain of 2.89x.
@@ -305,6 +332,8 @@ static void bbr_cwnd_event(struct sock *
if (event == CA_EVENT_TX_START && tp->app_limited) {
bbr->idle_restart = 1;
+ bbr->ack_epoch_mstamp = tp->tcp_mstamp;
+ bbr->ack_epoch_acked = 0;
/* Avoid pointless buffer overflows: pace at est. bw if we don't
* need more speed (we're restarting from idle and app-limited).
*/
@@ -385,6 +414,22 @@ static u32 bbr_inflight(struct sock *sk,
return inflight;
}
+/* Find the cwnd increment based on estimate of ack aggregation */
+static u32 bbr_ack_aggregation_cwnd(struct sock *sk)
+{
+ u32 max_aggr_cwnd, aggr_cwnd = 0;
+
+ if (bbr_extra_acked_gain && bbr_full_bw_reached(sk)) {
+ max_aggr_cwnd = ((u64)bbr_bw(sk) * bbr_extra_acked_max_us)
+ / BW_UNIT;
+ aggr_cwnd = (bbr_extra_acked_gain * bbr_extra_acked(sk))
+ >> BBR_SCALE;
+ aggr_cwnd = min(aggr_cwnd, max_aggr_cwnd);
+ }
+
+ return aggr_cwnd;
+}
+
/* An optimization in BBR to reduce losses: On the first round of recovery, we
* follow the packet conservation principle: send P packets per P packets acked.
* After that, we slow-start and send at most 2*P packets per P packets acked.
@@ -445,9 +490,15 @@ static void bbr_set_cwnd(struct sock *sk
if (bbr_set_cwnd_to_recover_or_restore(sk, rs, acked, &cwnd))
goto done;
- /* If we're below target cwnd, slow start cwnd toward target cwnd. */
target_cwnd = bbr_bdp(sk, bw, gain);
+
+ /* Increment the cwnd to account for excess ACKed data that seems
+ * due to aggregation (of data and/or ACKs) visible in the ACK stream.
+ */
+ target_cwnd += bbr_ack_aggregation_cwnd(sk);
target_cwnd = bbr_quantization_budget(sk, target_cwnd, gain);
+
+ /* If we're below target cwnd, slow start cwnd toward target cwnd. */
if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */
cwnd = min(cwnd + acked, target_cwnd);
else if (cwnd < target_cwnd || tp->delivered < TCP_INIT_CWND)
@@ -717,6 +768,67 @@ static void bbr_update_bw(struct sock *s
}
}
+/* Estimates the windowed max degree of ack aggregation.
+ * This is used to provision extra in-flight data to keep sending during
+ * inter-ACK silences.
+ *
+ * Degree of ack aggregation is estimated as extra data acked beyond expected.
+ *
+ * max_extra_acked = "maximum recent excess data ACKed beyond max_bw * interval"
+ * cwnd += max_extra_acked
+ *
+ * Max extra_acked is clamped by cwnd and bw * bbr_extra_acked_max_us (100 ms).
+ * Max filter is an approximate sliding window of 5-10 (packet timed) round
+ * trips.
+ */
+static void bbr_update_ack_aggregation(struct sock *sk,
+ const struct rate_sample *rs)
+{
+ u32 epoch_us, expected_acked, extra_acked;
+ struct bbr *bbr = inet_csk_ca(sk);
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (!bbr_extra_acked_gain || rs->acked_sacked <= 0 ||
+ rs->delivered < 0 || rs->interval_us <= 0)
+ return;
+
+ if (bbr->round_start) {
+ bbr->extra_acked_win_rtts = min(0x1F,
+ bbr->extra_acked_win_rtts + 1);
+ if (bbr->extra_acked_win_rtts >= bbr_extra_acked_win_rtts) {
+ bbr->extra_acked_win_rtts = 0;
+ bbr->extra_acked_win_idx = bbr->extra_acked_win_idx ?
+ 0 : 1;
+ bbr->extra_acked[bbr->extra_acked_win_idx] = 0;
+ }
+ }
+
+ /* Compute how many packets we expected to be delivered over epoch. */
+ epoch_us = tcp_stamp_us_delta(tp->delivered_mstamp,
+ bbr->ack_epoch_mstamp);
+ expected_acked = ((u64)bbr_bw(sk) * epoch_us) / BW_UNIT;
+
+ /* Reset the aggregation epoch if ACK rate is below expected rate or
+ * significantly large no. of ack received since epoch (potentially
+ * quite old epoch).
+ */
+ if (bbr->ack_epoch_acked <= expected_acked ||
+ (bbr->ack_epoch_acked + rs->acked_sacked >=
+ bbr_ack_epoch_acked_reset_thresh)) {
+ bbr->ack_epoch_acked = 0;
+ bbr->ack_epoch_mstamp = tp->delivered_mstamp;
+ expected_acked = 0;
+ }
+
+ /* Compute excess data delivered, beyond what was expected. */
+ bbr->ack_epoch_acked = min_t(u32, 0xFFFFF,
+ bbr->ack_epoch_acked + rs->acked_sacked);
+ extra_acked = bbr->ack_epoch_acked - expected_acked;
+ extra_acked = min(extra_acked, tp->snd_cwnd);
+ if (extra_acked > bbr->extra_acked[bbr->extra_acked_win_idx])
+ bbr->extra_acked[bbr->extra_acked_win_idx] = extra_acked;
+}
+
/* Estimate when the pipe is full, using the change in delivery rate: BBR
* estimates that STARTUP filled the pipe if the estimated bw hasn't changed by
* at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited
@@ -846,6 +958,7 @@ static void bbr_update_min_rtt(struct so
static void bbr_update_model(struct sock *sk, const struct rate_sample *rs)
{
bbr_update_bw(sk, rs);
+ bbr_update_ack_aggregation(sk, rs);
bbr_update_cycle_phase(sk, rs);
bbr_check_full_bw_reached(sk, rs);
bbr_check_drain(sk, rs);
@@ -896,6 +1009,13 @@ static void bbr_init(struct sock *sk)
bbr_reset_lt_bw_sampling(sk);
bbr_reset_startup_mode(sk);
+ bbr->ack_epoch_mstamp = tp->tcp_mstamp;
+ bbr->ack_epoch_acked = 0;
+ bbr->extra_acked_win_rtts = 0;
+ bbr->extra_acked_win_idx = 0;
+ bbr->extra_acked[0] = 0;
+ bbr->extra_acked[1] = 0;
+
cmpxchg(&sk->sk_pacing_status, SK_PACING_NONE, SK_PACING_NEEDED);
}
From: Mark Gray <[email protected]>
[ Upstream commit 34beb21594519ce64a55a498c2fe7d567bc1ca20 ]
This patch adds transport ports information for route lookup so that
IPsec can select Geneve tunnel traffic to do encryption. This is
needed for OVS/OVN IPsec with encrypted Geneve tunnels.
This can be tested by configuring a host-host VPN using an IKE
daemon and specifying port numbers. For example, for an
Openswan-type configuration, the following parameters should be
configured on both hosts and IPsec set up as-per normal:
$ cat /etc/ipsec.conf
conn in
...
left=$IP1
right=$IP2
...
leftprotoport=udp/6081
rightprotoport=udp
...
conn out
...
left=$IP1
right=$IP2
...
leftprotoport=udp
rightprotoport=udp/6081
...
The tunnel can then be setup using "ip" on both hosts (but
changing the relevant IP addresses):
$ ip link add tun type geneve id 1000 remote $IP2
$ ip addr add 192.168.0.1/24 dev tun
$ ip link set tun up
This can then be tested by pinging from $IP1:
$ ping 192.168.0.2
Without this patch the traffic is unencrypted on the wire.
Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Qiuyu Xiao <[email protected]>
Signed-off-by: Mark Gray <[email protected]>
Reviewed-by: Greg Rose <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/geneve.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -721,7 +721,8 @@ static struct rtable *geneve_get_v4_rt(s
struct net_device *dev,
struct geneve_sock *gs4,
struct flowi4 *fl4,
- const struct ip_tunnel_info *info)
+ const struct ip_tunnel_info *info,
+ __be16 dport, __be16 sport)
{
bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
struct geneve_dev *geneve = netdev_priv(dev);
@@ -737,6 +738,8 @@ static struct rtable *geneve_get_v4_rt(s
fl4->flowi4_proto = IPPROTO_UDP;
fl4->daddr = info->key.u.ipv4.dst;
fl4->saddr = info->key.u.ipv4.src;
+ fl4->fl4_dport = dport;
+ fl4->fl4_sport = sport;
tos = info->key.tos;
if ((tos == 1) && !geneve->collect_md) {
@@ -771,7 +774,8 @@ static struct dst_entry *geneve_get_v6_d
struct net_device *dev,
struct geneve_sock *gs6,
struct flowi6 *fl6,
- const struct ip_tunnel_info *info)
+ const struct ip_tunnel_info *info,
+ __be16 dport, __be16 sport)
{
bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
struct geneve_dev *geneve = netdev_priv(dev);
@@ -787,6 +791,9 @@ static struct dst_entry *geneve_get_v6_d
fl6->flowi6_proto = IPPROTO_UDP;
fl6->daddr = info->key.u.ipv6.dst;
fl6->saddr = info->key.u.ipv6.src;
+ fl6->fl6_dport = dport;
+ fl6->fl6_sport = sport;
+
prio = info->key.tos;
if ((prio == 1) && !geneve->collect_md) {
prio = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
@@ -833,14 +840,15 @@ static int geneve_xmit_skb(struct sk_buf
__be16 df;
int err;
- rt = geneve_get_v4_rt(skb, dev, gs4, &fl4, info);
+ sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
+ rt = geneve_get_v4_rt(skb, dev, gs4, &fl4, info,
+ geneve->info.key.tp_dst, sport);
if (IS_ERR(rt))
return PTR_ERR(rt);
skb_tunnel_check_pmtu(skb, &rt->dst,
GENEVE_IPV4_HLEN + info->options_len);
- sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
if (geneve->collect_md) {
tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
ttl = key->ttl;
@@ -875,13 +883,14 @@ static int geneve6_xmit_skb(struct sk_bu
__be16 sport;
int err;
- dst = geneve_get_v6_dst(skb, dev, gs6, &fl6, info);
+ sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
+ dst = geneve_get_v6_dst(skb, dev, gs6, &fl6, info,
+ geneve->info.key.tp_dst, sport);
if (IS_ERR(dst))
return PTR_ERR(dst);
skb_tunnel_check_pmtu(skb, dst, GENEVE_IPV6_HLEN + info->options_len);
- sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
if (geneve->collect_md) {
prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
ttl = key->ttl;
@@ -958,13 +967,18 @@ static int geneve_fill_metadata_dst(stru
{
struct ip_tunnel_info *info = skb_tunnel_info(skb);
struct geneve_dev *geneve = netdev_priv(dev);
+ __be16 sport;
if (ip_tunnel_info_af(info) == AF_INET) {
struct rtable *rt;
struct flowi4 fl4;
+
struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
+ sport = udp_flow_src_port(geneve->net, skb,
+ 1, USHRT_MAX, true);
- rt = geneve_get_v4_rt(skb, dev, gs4, &fl4, info);
+ rt = geneve_get_v4_rt(skb, dev, gs4, &fl4, info,
+ geneve->info.key.tp_dst, sport);
if (IS_ERR(rt))
return PTR_ERR(rt);
@@ -974,9 +988,13 @@ static int geneve_fill_metadata_dst(stru
} else if (ip_tunnel_info_af(info) == AF_INET6) {
struct dst_entry *dst;
struct flowi6 fl6;
+
struct geneve_sock *gs6 = rcu_dereference(geneve->sock6);
+ sport = udp_flow_src_port(geneve->net, skb,
+ 1, USHRT_MAX, true);
- dst = geneve_get_v6_dst(skb, dev, gs6, &fl6, info);
+ dst = geneve_get_v6_dst(skb, dev, gs6, &fl6, info,
+ geneve->info.key.tp_dst, sport);
if (IS_ERR(dst))
return PTR_ERR(dst);
@@ -987,8 +1005,7 @@ static int geneve_fill_metadata_dst(stru
return -EINVAL;
}
- info->key.tp_src = udp_flow_src_port(geneve->net, skb,
- 1, USHRT_MAX, true);
+ info->key.tp_src = sport;
info->key.tp_dst = geneve->info.key.tp_dst;
return 0;
}
From: Muchun Song <[email protected]>
[ Upstream commit b0399092ccebd9feef68d4ceb8d6219a8c0caa05 ]
If a kprobe is marked as gone, we should not kill it again. Otherwise, we
can disarm the kprobe more than once. In that case, the statistics of
kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not
work.
Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: "Naveen N . Rao" <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/kprobes.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index eb4bffe6d764d..230d9d599b5aa 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2061,6 +2061,9 @@ static void kill_kprobe(struct kprobe *p)
{
struct kprobe *kp;
+ if (WARN_ON_ONCE(kprobe_gone(p)))
+ return;
+
p->flags |= KPROBE_FLAG_GONE;
if (kprobe_aggrprobe(p)) {
/*
@@ -2243,7 +2246,10 @@ static int kprobes_module_callback(struct notifier_block *nb,
mutex_lock(&kprobe_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
- hlist_for_each_entry_rcu(p, head, hlist)
+ hlist_for_each_entry_rcu(p, head, hlist) {
+ if (kprobe_gone(p))
+ continue;
+
if (within_module_init((unsigned long)p->addr, mod) ||
(checkcore &&
within_module_core((unsigned long)p->addr, mod))) {
@@ -2260,6 +2266,7 @@ static int kprobes_module_callback(struct notifier_block *nb,
*/
kill_kprobe(p);
}
+ }
}
mutex_unlock(&kprobe_mutex);
return NOTIFY_DONE;
--
2.25.1
From: Ralph Campbell <[email protected]>
[ Upstream commit ec0abae6dcdf7ef88607c869bf35a4b63ce1b370 ]
A migrating transparent huge page has to already be unmapped. Otherwise,
the page could be modified while it is being copied to a new page and data
could be lost. The function __split_huge_pmd() checks for a PMD migration
entry before calling __split_huge_pmd_locked() leading one to think that
__split_huge_pmd_locked() can handle splitting a migrating PMD.
However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.
Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)). Fix these problems by
checking for a PMD migration entry.
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]> [4.14+]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
mm/huge_memory.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2145,7 +2145,7 @@ static void __split_huge_pmd_locked(stru
put_page(page);
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
- } else if (is_huge_zero_pmd(*pmd)) {
+ } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
@@ -2233,27 +2233,33 @@ static void __split_huge_pmd_locked(stru
pte = pte_offset_map(&_pmd, addr);
BUG_ON(!pte_none(*pte));
set_pte_at(mm, addr, pte, entry);
- atomic_inc(&page[i]._mapcount);
- pte_unmap(pte);
- }
-
- /*
- * Set PG_double_map before dropping compound_mapcount to avoid
- * false-negative page_mapped().
- */
- if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
- for (i = 0; i < HPAGE_PMD_NR; i++)
+ if (!pmd_migration)
atomic_inc(&page[i]._mapcount);
+ pte_unmap(pte);
}
- if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
- /* Last compound_mapcount is gone. */
- __dec_node_page_state(page, NR_ANON_THPS);
- if (TestClearPageDoubleMap(page)) {
- /* No need in mapcount reference anymore */
+ if (!pmd_migration) {
+ /*
+ * Set PG_double_map before dropping compound_mapcount to avoid
+ * false-negative page_mapped().
+ */
+ if (compound_mapcount(page) > 1 &&
+ !TestSetPageDoubleMap(page)) {
for (i = 0; i < HPAGE_PMD_NR; i++)
- atomic_dec(&page[i]._mapcount);
+ atomic_inc(&page[i]._mapcount);
+ }
+
+ lock_page_memcg(page);
+ if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
+ /* Last compound_mapcount is gone. */
+ __dec_lruvec_page_state(page, NR_ANON_THPS);
+ if (TestClearPageDoubleMap(page)) {
+ /* No need in mapcount reference anymore */
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ atomic_dec(&page[i]._mapcount);
+ }
}
+ unlock_page_memcg(page);
}
smp_wmb(); /* make pte visible before pmd */
From: Nick Desaulniers <[email protected]>
commit 8708e13c6a0600625eea3aebd027c0715a5d2bb2 upstream.
Add keyword support so that our mailing list gets cc'ed for clang/llvm
patches. We're pretty active on our mailing list so far as code review.
There are numerous Googlers like myself that are paid to support
building the Linux kernel with Clang and LLVM.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
MAINTAINERS | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3613,6 +3613,14 @@ M: Miguel Ojeda <miguel.ojeda.sandonis@g
S: Maintained
F: .clang-format
+CLANG/LLVM BUILD SUPPORT
+L: [email protected]
+W: https://clangbuiltlinux.github.io/
+B: https://github.com/ClangBuiltLinux/linux/issues
+C: irc://chat.freenode.net/clangbuiltlinux
+S: Supported
+K: \b(?i:clang|llvm)\b
+
CLEANCACHE API
M: Konrad Rzeszutek Wilk <[email protected]>
L: [email protected]
From: Nick Desaulniers <[email protected]>
commit fcf1b6a35c16ac500fa908a4022238e5d666eabf upstream.
added to kbuild documentation. Provides more official info on building
kernels with Clang and LLVM than our wiki.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Sedat Dilek <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
[nd: hunk against Documentation/kbuild/index.rst dropped due to not backporting
commit cd238effefa2 ("docs: kbuild: convert docs to ReST and rename to *.rst")]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/kbuild/llvm.rst | 80 ++++++++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1
2 files changed, 81 insertions(+)
create mode 100644 Documentation/kbuild/llvm.rst
--- /dev/null
+++ b/Documentation/kbuild/llvm.rst
@@ -0,0 +1,80 @@
+==============================
+Building Linux with Clang/LLVM
+==============================
+
+This document covers how to build the Linux kernel with Clang and LLVM
+utilities.
+
+About
+-----
+
+The Linux kernel has always traditionally been compiled with GNU toolchains
+such as GCC and binutils. Ongoing work has allowed for `Clang
+<https://clang.llvm.org/>`_ and `LLVM <https://llvm.org/>`_ utilities to be
+used as viable substitutes. Distributions such as `Android
+<https://www.android.com/>`_, `ChromeOS
+<https://www.chromium.org/chromium-os>`_, and `OpenMandriva
+<https://www.openmandriva.org/>`_ use Clang built kernels. `LLVM is a
+collection of toolchain components implemented in terms of C++ objects
+<https://www.aosabook.org/en/llvm.html>`_. Clang is a front-end to LLVM that
+supports C and the GNU C extensions required by the kernel, and is pronounced
+"klang," not "see-lang."
+
+Clang
+-----
+
+The compiler used can be swapped out via `CC=` command line argument to `make`.
+`CC=` should be set when selecting a config and during a build.
+
+ make CC=clang defconfig
+
+ make CC=clang
+
+Cross Compiling
+---------------
+
+A single Clang compiler binary will typically contain all supported backends,
+which can help simplify cross compiling.
+
+ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make CC=clang
+
+`CROSS_COMPILE` is not used to prefix the Clang compiler binary, instead
+`CROSS_COMPILE` is used to set a command line flag: `--target <triple>`. For
+example:
+
+ clang --target aarch64-linux-gnu foo.c
+
+LLVM Utilities
+--------------
+
+LLVM has substitutes for GNU binutils utilities. These can be invoked as
+additional parameters to `make`.
+
+ make CC=clang AS=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip \\
+ OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump OBJSIZE=llvm-objsize \\
+ READELF=llvm-readelf HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar \\
+ HOSTLD=ld.lld
+
+Getting Help
+------------
+
+- `Website <https://clangbuiltlinux.github.io/>`_
+- `Mailing List <https://groups.google.com/forum/#!forum/clang-built-linux>`_: <[email protected]>
+- `Issue Tracker <https://github.com/ClangBuiltLinux/linux/issues>`_
+- IRC: #clangbuiltlinux on chat.freenode.net
+- `Telegram <https://t.me/ClangBuiltLinux>`_: @ClangBuiltLinux
+- `Wiki <https://github.com/ClangBuiltLinux/linux/wiki>`_
+- `Beginner Bugs <https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_
+
+Getting LLVM
+-------------
+
+- http://releases.llvm.org/download.html
+- https://github.com/llvm/llvm-project
+- https://llvm.org/docs/GettingStarted.html
+- https://llvm.org/docs/CMake.html
+- https://apt.llvm.org/
+- https://www.archlinux.org/packages/extra/x86_64/llvm/
+- https://github.com/ClangBuiltLinux/tc-build
+- https://github.com/ClangBuiltLinux/linux/wiki/Building-Clang-from-source
+- https://android.googlesource.com/platform/prebuilts/clang/host/linux-x86/
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3620,6 +3620,7 @@ B: https://github.com/ClangBuiltLinux/li
C: irc://chat.freenode.net/clangbuiltlinux
S: Supported
K: \b(?i:clang|llvm)\b
+F: Documentation/kbuild/llvm.rst
CLEANCACHE API
M: Konrad Rzeszutek Wilk <[email protected]>
From: Masahiro Yamada <[email protected]>
commit 734f3719d3438f9cc181d674c33ca9762e9148a1 upstream.
The firmware source, wanxlfw.S, is currently compiled by the combo of
$(CPP) and $(M68KAS). This is not what we usually do for compiling *.S
files. In fact, this Makefile is the only user of $(AS) in the kernel
build.
Instead of combining $(CPP) and (AS) from different tool sets, using
$(M68KCC) as an assembler driver is simpler, and saner.
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wan/Makefile | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/wan/Makefile
+++ b/drivers/net/wan/Makefile
@@ -41,16 +41,16 @@ $(obj)/wanxl.o: $(obj)/wanxlfw.inc
ifeq ($(CONFIG_WANXL_BUILD_FIRMWARE),y)
ifeq ($(ARCH),m68k)
- M68KAS = $(AS)
+ M68KCC = $(CC)
M68KLD = $(LD)
else
- M68KAS = $(CROSS_COMPILE_M68K)as
+ M68KCC = $(CROSS_COMPILE_M68K)gcc
M68KLD = $(CROSS_COMPILE_M68K)ld
endif
quiet_cmd_build_wanxlfw = BLD FW $@
cmd_build_wanxlfw = \
- $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(M68KAS) -m68360 -o $(obj)/wanxlfw.o; \
+ $(M68KCC) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi -c -o $(obj)/wanxlfw.o $<; \
$(M68KLD) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \
hexdump -ve '"\n" 16/1 "0x%02X,"' $(obj)/wanxlfw.bin | sed 's/0x ,//g;1s/^/static const u8 firmware[]={/;$$s/,$$/\n};\n/' >$(obj)/wanxlfw.inc; \
rm -f $(obj)/wanxlfw.bin $(obj)/wanxlfw.o
From: Xunlei Pang <[email protected]>
commit e3336cab2579012b1e72b5265adf98e2d6e244ad upstream.
We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
doesn't have any reclaimable memory.
It can be easily reproduced as below:
watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
Call Trace:
shrink_lruvec+0x49f/0x640
shrink_node+0x2a6/0x6f0
do_try_to_free_pages+0xe9/0x3e0
try_to_free_mem_cgroup_pages+0xef/0x1f0
try_charge+0x2c1/0x750
mem_cgroup_charge+0xd7/0x240
__add_to_page_cache_locked+0x2fd/0x370
add_to_page_cache_lru+0x4a/0xc0
pagecache_get_page+0x10b/0x2f0
filemap_fault+0x661/0xad0
ext4_filemap_fault+0x2c/0x40
__do_fault+0x4d/0xf9
handle_mm_fault+0x1080/0x1790
It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.
Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Chris Down <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Julius Hemanth Pitti <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/vmscan.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2708,6 +2708,14 @@ static bool shrink_node(pg_data_t *pgdat
unsigned long reclaimed;
unsigned long scanned;
+ /*
+ * This loop can become CPU-bound when target memcgs
+ * aren't eligible for reclaim - either because they
+ * don't have any reclaimable pages, or because their
+ * memory is explicitly protected. Avoid soft lockups.
+ */
+ cond_resched();
+
switch (mem_cgroup_protected(root, memcg)) {
case MEMCG_PROT_MIN:
/*
From: Petr Machata <[email protected]>
[ Upstream commit 297e77e53eadb332d5062913447b104a772dc33b ]
The parameter passed via DCB_ATTR_DCB_BUFFER is a struct dcbnl_buffer. The
field prio2buffer is an array of IEEE_8021Q_MAX_PRIORITIES bytes, where
each value is a number of a buffer to direct that priority's traffic to.
That value is however never validated to lie within the bounds set by
DCBX_MAX_BUFFERS. The only driver that currently implements the callback is
mlx5 (maintainers CCd), and that does not do any validation either, in
particual allowing incorrect configuration if the prio2buffer value does
not fit into 4 bits.
Instead of offloading the need to validate the buffer index to drivers, do
it right there in core, and bounce the request if the value is too large.
CC: Parav Pandit <[email protected]>
CC: Saeed Mahameed <[email protected]>
Fixes: e549f6f9c098 ("net/dcb: Add dcbnl buffer attribute")
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/dcb/dcbnl.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1421,6 +1421,7 @@ static int dcbnl_ieee_set(struct net_dev
{
const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
struct nlattr *ieee[DCB_ATTR_IEEE_MAX + 1];
+ int prio;
int err;
if (!ops)
@@ -1469,6 +1470,13 @@ static int dcbnl_ieee_set(struct net_dev
struct dcbnl_buffer *buffer =
nla_data(ieee[DCB_ATTR_DCB_BUFFER]);
+ for (prio = 0; prio < ARRAY_SIZE(buffer->prio2buffer); prio++) {
+ if (buffer->prio2buffer[prio] >= DCBX_MAX_BUFFERS) {
+ err = -EINVAL;
+ goto err;
+ }
+ }
+
err = ops->dcbnl_setbuffer(netdev, buffer);
if (err)
goto err;
From: Peilin Ye <[email protected]>
[ Upstream commit bb3a420d47ab00d7e1e5083286cab15235a96680 ]
tipc_group_add_to_tree() returns silently if `key` matches `nkey` of an
existing node, causing tipc_group_create_member() to leak memory. Let
tipc_group_add_to_tree() return an error in such a case, so that
tipc_group_create_member() can handle it properly.
Fixes: 75da2163dbb6 ("tipc: introduce communication groups")
Reported-and-tested-by: [email protected]
Cc: Hillf Danton <[email protected]>
Link: https://syzkaller.appspot.com/bug?id=048390604fe1b60df34150265479202f10e13aff
Signed-off-by: Peilin Ye <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/group.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -273,8 +273,8 @@ static struct tipc_member *tipc_group_fi
return NULL;
}
-static void tipc_group_add_to_tree(struct tipc_group *grp,
- struct tipc_member *m)
+static int tipc_group_add_to_tree(struct tipc_group *grp,
+ struct tipc_member *m)
{
u64 nkey, key = (u64)m->node << 32 | m->port;
struct rb_node **n, *parent = NULL;
@@ -291,10 +291,11 @@ static void tipc_group_add_to_tree(struc
else if (key > nkey)
n = &(*n)->rb_right;
else
- return;
+ return -EEXIST;
}
rb_link_node(&m->tree_node, parent, n);
rb_insert_color(&m->tree_node, &grp->members);
+ return 0;
}
static struct tipc_member *tipc_group_create_member(struct tipc_group *grp,
@@ -302,6 +303,7 @@ static struct tipc_member *tipc_group_cr
u32 instance, int state)
{
struct tipc_member *m;
+ int ret;
m = kzalloc(sizeof(*m), GFP_ATOMIC);
if (!m)
@@ -314,8 +316,12 @@ static struct tipc_member *tipc_group_cr
m->port = port;
m->instance = instance;
m->bc_acked = grp->bc_snd_nxt - 1;
+ ret = tipc_group_add_to_tree(grp, m);
+ if (ret < 0) {
+ kfree(m);
+ return NULL;
+ }
grp->member_cnt++;
- tipc_group_add_to_tree(grp, m);
tipc_nlist_add(&grp->dests, m->node);
m->state = state;
return m;
From: Tetsuo Handa <[email protected]>
[ Upstream commit a4b5cc9e10803ecba64a7d54c0f47e4564b4a980 ]
I confirmed that the problem fixed by commit 2a63866c8b51a3f7 ("tipc: fix
shutdown() of connectionless socket") also applies to stream socket.
----------
#include <sys/socket.h>
#include <unistd.h>
#include <sys/wait.h>
int main(int argc, char *argv[])
{
int fds[2] = { -1, -1 };
socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
if (fork() == 0)
_exit(read(fds[0], NULL, 1));
shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
wait(NULL); /* To be woken up by _exit(). */
return 0;
}
----------
Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right
behavior.
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/socket.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2565,10 +2565,7 @@ static int tipc_shutdown(struct socket *
lock_sock(sk);
__tipc_shutdown(sock, TIPC_CONN_SHUTDOWN);
- if (tipc_sk_type_connectionless(sk))
- sk->sk_shutdown = SHUTDOWN_MASK;
- else
- sk->sk_shutdown = SEND_SHUTDOWN;
+ sk->sk_shutdown = SHUTDOWN_MASK;
if (sk->sk_state == TIPC_DISCONNECTING) {
/* Discard any unreceived messages */
From: Rustam Kovhaev <[email protected]>
[ Upstream commit f65886606c2d3b562716de030706dfe1bea4ed5e ]
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them
Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: [email protected]
Reported-and-tested-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
Signed-off-by: Rustam Kovhaev <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
virt/kvm/kvm_main.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2155b52b17eca..6bd01d12df2ec 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3844,7 +3844,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
struct kvm_io_device *dev)
{
- int i;
+ int i, j;
struct kvm_io_bus *new_bus, *bus;
bus = kvm_get_bus(kvm, bus_idx);
@@ -3861,17 +3861,20 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
sizeof(struct kvm_io_range)), GFP_KERNEL);
- if (!new_bus) {
+ if (new_bus) {
+ memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
+ new_bus->dev_count--;
+ memcpy(new_bus->range + i, bus->range + i + 1,
+ (new_bus->dev_count - i) * sizeof(struct kvm_io_range));
+ } else {
pr_err("kvm: failed to shrink bus, removing it completely\n");
- goto broken;
+ for (j = 0; j < bus->dev_count; j++) {
+ if (j == i)
+ continue;
+ kvm_iodevice_destructor(bus->range[j].dev);
+ }
}
- memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
- new_bus->dev_count--;
- memcpy(new_bus->range + i, bus->range + i + 1,
- (new_bus->dev_count - i) * sizeof(struct kvm_io_range));
-
-broken:
rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
--
2.25.1
From: Eric Dumazet <[email protected]>
[ Upstream commit 3ca1a42a52ca4b4f02061683851692ad65fefac8 ]
If skb_put_padto() returns an error, skb has been freed.
Better not touch it anymore, as reported by syzbot [1]
Note to qrtr maintainers : this suggests qrtr_sendmsg()
should adjust sock_alloc_send_skb() second parameter
to account for the potential added alignment to avoid
reallocation.
[1]
BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
__skb_insert include/linux/skbuff.h:1907 [inline]
__skb_queue_before include/linux/skbuff.h:2016 [inline]
__skb_queue_tail include/linux/skbuff.h:2049 [inline]
skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23
qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5b9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9
RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003
RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f
R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c
Allocated by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518
slab_alloc mm/slab.c:3312 [inline]
kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482
skb_clone+0x1b2/0x370 net/core/skbuff.c:1449
qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kmem_cache_free+0x82/0xf0 mm/slab.c:3693
__skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823
__skb_put_padto include/linux/skbuff.h:3233 [inline]
skb_put_padto include/linux/skbuff.h:3252 [inline]
qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88804d8ab3c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 0 bytes inside of
224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0)
The buggy address belongs to the page:
page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800
raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000
page dumped because: kasan: bad access detected
Fixes: ce57785bf91b ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Carl Huang <[email protected]>
Cc: Wen Gong <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Manivannan Sadhasivam <[email protected]>
Acked-by: Manivannan Sadhasivam <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/qrtr/qrtr.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -185,7 +185,7 @@ static int qrtr_node_enqueue(struct qrtr
{
struct qrtr_hdr_v1 *hdr;
size_t len = skb->len;
- int rc = -ENODEV;
+ int rc;
hdr = skb_push(skb, sizeof(*hdr));
hdr->version = cpu_to_le32(QRTR_PROTO_VER_1);
@@ -203,15 +203,17 @@ static int qrtr_node_enqueue(struct qrtr
hdr->size = cpu_to_le32(len);
hdr->confirm_rx = 0;
- skb_put_padto(skb, ALIGN(len, 4) + sizeof(*hdr));
-
- mutex_lock(&node->ep_lock);
- if (node->ep)
- rc = node->ep->xmit(node->ep, skb);
- else
- kfree_skb(skb);
- mutex_unlock(&node->ep_lock);
+ rc = skb_put_padto(skb, ALIGN(len, 4) + sizeof(*hdr));
+ if (!rc) {
+ mutex_lock(&node->ep_lock);
+ rc = -ENODEV;
+ if (node->ep)
+ rc = node->ep->xmit(node->ep, skb);
+ else
+ kfree_skb(skb);
+ mutex_unlock(&node->ep_lock);
+ }
return rc;
}
From: Xin Long <[email protected]>
[ Upstream commit ff48b6222e65ebdba5a403ef1deba6214e749193 ]
In tipc_buf_append() it may change skb's frag_list, and it causes
problems when this skb is cloned. skb_unclone() doesn't really
make this skb's flag_list available to change.
Shuang Li has reported an use-after-free issue because of this
when creating quite a few macvlan dev over the same dev, where
the broadcast packets will be cloned and go up to the stack:
[ ] BUG: KASAN: use-after-free in pskb_expand_head+0x86d/0xea0
[ ] Call Trace:
[ ] dump_stack+0x7c/0xb0
[ ] print_address_description.constprop.7+0x1a/0x220
[ ] kasan_report.cold.10+0x37/0x7c
[ ] check_memory_region+0x183/0x1e0
[ ] pskb_expand_head+0x86d/0xea0
[ ] process_backlog+0x1df/0x660
[ ] net_rx_action+0x3b4/0xc90
[ ]
[ ] Allocated by task 1786:
[ ] kmem_cache_alloc+0xbf/0x220
[ ] skb_clone+0x10a/0x300
[ ] macvlan_broadcast+0x2f6/0x590 [macvlan]
[ ] macvlan_process_broadcast+0x37c/0x516 [macvlan]
[ ] process_one_work+0x66a/0x1060
[ ] worker_thread+0x87/0xb10
[ ]
[ ] Freed by task 3253:
[ ] kmem_cache_free+0x82/0x2a0
[ ] skb_release_data+0x2c3/0x6e0
[ ] kfree_skb+0x78/0x1d0
[ ] tipc_recvmsg+0x3be/0xa40 [tipc]
So fix it by using skb_unshare() instead, which would create a new
skb for the cloned frag and it'll be safe to change its frag_list.
The similar things were also done in sctp_make_reassembled_event(),
which is using skb_copy().
Reported-by: Shuang Li <[email protected]>
Fixes: 37e22164a8a3 ("tipc: rename and move message reassembly function")
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/msg.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -140,7 +140,8 @@ int tipc_buf_append(struct sk_buff **hea
if (fragid == FIRST_FRAGMENT) {
if (unlikely(head))
goto err;
- if (unlikely(skb_unclone(frag, GFP_ATOMIC)))
+ frag = skb_unshare(frag, GFP_ATOMIC);
+ if (unlikely(!frag))
goto err;
head = *headbuf = frag;
*buf = NULL;
Hi!
> [ Upstream commit 2fbc6e89b2f1403189e624cabaf73e189c5e50c6 ]
>
> Kfir reported that pmtu exceptions are not created properly for
> deployments where multipath routes use the same device.
This is mismerged (in a way that does not affect functionality):
> @@ -779,6 +779,8 @@ static void __ip_do_redirect(struct rtab
> if (fib_lookup(net, fl4, &res, 0) == 0) {
> struct fib_nh *nh = &FIB_RES_NH(res);
>
> + fib_select_path(net, &res, fl4, skb);
> + nh = &FIB_RES_NH(res);
> update_or_create_fnhe(nh, fl4->daddr, new_gw,
> 0, false,
nh is assigned value that is never used. Mainline patch removes the
assignment (but variable has different type).
4.19 should delete the assignment, too.
Best regards,
Pavel
Signed-off-by: Pavel Machek (CIP) <[email protected]>
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f60e28418ece..84de87b7eedc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -777,7 +777,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
neigh_event_send(n, NULL);
} else {
if (fib_lookup(net, fl4, &res, 0) == 0) {
- struct fib_nh *nh = &FIB_RES_NH(res);
+ struct fib_nh *nh;
fib_select_path(net, &res, fl4, skb);
nh = &FIB_RES_NH(res);
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> This is the start of the stable review cycle for the 4.19.148 release.
> There are 37 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
CIP testing did not detect any problems.
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.19.y
I see significant part of this is LLVM support... I'm quite surprised
to see it in -stable. Few words if LLVM is now officially supported in
4.19 or what is going on here would be welcome.
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
On 9/25/20 6:48 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.148 release.
> There are 37 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 27 Sep 2020 12:47:02 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.148-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan <[email protected]>
thanks,
-- Shuah
On Fri, 25 Sep 2020 at 18:23, Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 4.19.148 release.
> There are 37 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 27 Sep 2020 12:47:02 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.148-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing <[email protected]>
Summary
------------------------------------------------------------------------
kernel: 4.19.148-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: 1e68f3302e6a25ff4310dbf4f2de747180d01146
git describe: v4.19.147-38-g1e68f3302e6a
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y-sanity/build/v4.19.147-38-g1e68f3302e6a
No regressions (compared to build v4.19.147)
No fixes (compared to build v4.19.147)
Ran 28038 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86
- x86-kasan
Test Suites
-----------
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-syscalls-tests
* libhugetlbfs
* ltp-fs-tests
* ltp-hugetlb-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-securebits-tests
* ltp-tracing-tests
* v4l2-compliance
* ltp-controllers-tests
* ltp-cve-tests
* ltp-open-posix-tests
* ltp-sched-tests
* network-basic-tests
--
Linaro LKFT
https://lkft.linaro.org
On Fri, Sep 25, 2020 at 02:48:28PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.148 release.
> There are 37 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 27 Sep 2020 12:47:02 +0000.
> Anything received after that time might be too late.
>
Build results:
total: 155 pass: 155 fail: 0
Qemu test results:
total: 421 pass: 421 fail: 0
Tested-by: Guenter Roeck <[email protected]>
Guenter
On Fri, Sep 25, 2020 at 06:51:34PM +0200, Pavel Machek wrote:
> Hi!
>
> > [ Upstream commit 2fbc6e89b2f1403189e624cabaf73e189c5e50c6 ]
> >
> > Kfir reported that pmtu exceptions are not created properly for
> > deployments where multipath routes use the same device.
>
> This is mismerged (in a way that does not affect functionality):
>
>
> > @@ -779,6 +779,8 @@ static void __ip_do_redirect(struct rtab
> > if (fib_lookup(net, fl4, &res, 0) == 0) {
> > struct fib_nh *nh = &FIB_RES_NH(res);
> >
> > + fib_select_path(net, &res, fl4, skb);
> > + nh = &FIB_RES_NH(res);
> > update_or_create_fnhe(nh, fl4->daddr, new_gw,
> > 0, false,
>
> nh is assigned value that is never used. Mainline patch removes the
> assignment (but variable has different type).
>
> 4.19 should delete the assignment, too.
Ah, good catch, I'll merge this in, thanks.
greg k-h
On Fri, Sep 25, 2020 at 07:39:46PM +0200, Pavel Machek wrote:
> Hi!
>
> > This is the start of the stable review cycle for the 4.19.148 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
>
> CIP testing did not detect any problems.
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.19.y
Great!
> I see significant part of this is LLVM support... I'm quite surprised
> to see it in -stable. Few words if LLVM is now officially supported in
> 4.19 or what is going on here would be welcome.
See the stable list archives for the submission of these patches if you
are curious about it.
And yes, many systems run 4.19.y with llvm, many many millions of
devices or so :)
thanks,
greg k-h
On Sat 2020-09-26 17:46:35, Greg Kroah-Hartman wrote:
> On Fri, Sep 25, 2020 at 06:51:34PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > [ Upstream commit 2fbc6e89b2f1403189e624cabaf73e189c5e50c6 ]
> > >
> > > Kfir reported that pmtu exceptions are not created properly for
> > > deployments where multipath routes use the same device.
> >
> > This is mismerged (in a way that does not affect functionality):
> >
> >
> > > @@ -779,6 +779,8 @@ static void __ip_do_redirect(struct rtab
> > > if (fib_lookup(net, fl4, &res, 0) == 0) {
> > > struct fib_nh *nh = &FIB_RES_NH(res);
> > >
> > > + fib_select_path(net, &res, fl4, skb);
> > > + nh = &FIB_RES_NH(res);
> > > update_or_create_fnhe(nh, fl4->daddr, new_gw,
> > > 0, false,
> >
> > nh is assigned value that is never used. Mainline patch removes the
> > assignment (but variable has different type).
> >
> > 4.19 should delete the assignment, too.
>
> Ah, good catch, I'll merge this in, thanks.
Thank you!
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany