From: Zhaoyang Huang <[email protected]>
The parameter allocation here is used for indicating if the memory
allocation can stall or not. Since we have got the skb buffer, it
doesn't make sense to check if we can yield on the net's congested
via gfp_flags. Remove it now.
Signed-off-by: Zhaoyang Huang <[email protected]>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4c57532..af5b6af 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1526,7 +1526,7 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 portid,
consume_skb(info.skb2);
if (info.delivered) {
- if (info.congested && gfpflags_allow_blocking(allocation))
+ if (info.congested)
yield();
return 0;
}
--
1.9.1
On Thu, 23 Dec 2021 09:56:07 +0800 Huangzhaoyang wrote:
> From: Zhaoyang Huang <[email protected]>
>
> The parameter allocation here is used for indicating if the memory
> allocation can stall or not. Since we have got the skb buffer, it
> doesn't make sense to check if we can yield on the net's congested
> via gfp_flags. Remove it now.
This is checking if we can sleep AFAICT. What are you trying to fix?
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 4c57532..af5b6af 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1526,7 +1526,7 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 portid,
> consume_skb(info.skb2);
>
> if (info.delivered) {
> - if (info.congested && gfpflags_allow_blocking(allocation))
> + if (info.congested)
> yield();
> return 0;
> }
On Fri, Dec 24, 2021 at 1:11 AM Jakub Kicinski <[email protected]> wrote:
>
> On Thu, 23 Dec 2021 09:56:07 +0800 Huangzhaoyang wrote:
> > From: Zhaoyang Huang <[email protected]>
> >
> > The parameter allocation here is used for indicating if the memory
> > allocation can stall or not. Since we have got the skb buffer, it
> > doesn't make sense to check if we can yield on the net's congested
> > via gfp_flags. Remove it now.
>
> This is checking if we can sleep AFAICT. What are you trying to fix?
Yes and NO. gfp means *get free pages* which indicate if the embedded
memory allocation among the process can sleep or not, but without any
other meanings. The driver which invokes this function could have to
use GFP_KERNEL for allocating memory as the critical resources but
don't want to sleep on the netlink's congestion.
>
> > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > index 4c57532..af5b6af 100644
> > --- a/net/netlink/af_netlink.c
> > +++ b/net/netlink/af_netlink.c
> > @@ -1526,7 +1526,7 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 portid,
> > consume_skb(info.skb2);
> >
> > if (info.delivered) {
> > - if (info.congested && gfpflags_allow_blocking(allocation))
> > + if (info.congested)
> > yield();
> > return 0;
> > }
>
On Mon, Dec 27, 2021 at 2:14 PM Zhaoyang Huang <[email protected]> wrote:
>
> On Fri, Dec 24, 2021 at 1:11 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Thu, 23 Dec 2021 09:56:07 +0800 Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <[email protected]>
> > >
> > > The parameter allocation here is used for indicating if the memory
> > > allocation can stall or not. Since we have got the skb buffer, it
> > > doesn't make sense to check if we can yield on the net's congested
> > > via gfp_flags. Remove it now.
> >
> > This is checking if we can sleep AFAICT. What are you trying to fix?
> Yes and NO. gfp means *get free pages* which indicate if the embedded
> memory allocation among the process can sleep or not, but without any
> other meanings. The driver which invokes this function could have to
> use GFP_KERNEL for allocating memory as the critical resources but
> don't want to sleep on the netlink's congestion.
Since unique block flags(msg_flags & MSG_DONTWAIT) work as parameters
for unicast, could we introduce it to broadcast, instead of abusing
gfp_flag.
static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
...
if (dst_group) {
refcount_inc(&skb->users);
netlink_broadcast(sk, skb, dst_portid, dst_group, GFP_KERNEL);
}
err = netlink_unicast(sk, skb, dst_portid, msg->msg_flags &
MSG_DONTWAIT);
>
> >
> > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > > index 4c57532..af5b6af 100644
> > > --- a/net/netlink/af_netlink.c
> > > +++ b/net/netlink/af_netlink.c
> > > @@ -1526,7 +1526,7 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 portid,
> > > consume_skb(info.skb2);
> > >
> > > if (info.delivered) {
> > > - if (info.congested && gfpflags_allow_blocking(allocation))
> > > + if (info.congested)
> > > yield();
> > > return 0;
> > > }
> >
On Mon, 27 Dec 2021 15:38:31 +0800 Zhaoyang Huang wrote:
> On Mon, Dec 27, 2021 at 2:14 PM Zhaoyang Huang <[email protected]> wrote:
> > On Fri, Dec 24, 2021 at 1:11 AM Jakub Kicinski <[email protected]> wrote:
> > > This is checking if we can sleep AFAICT. What are you trying to fix?
> > Yes and NO. gfp means *get free pages* which indicate if the embedded
> > memory allocation among the process can sleep or not, but without any
> > other meanings. The driver which invokes this function could have to
> > use GFP_KERNEL for allocating memory as the critical resources but
> > don't want to sleep on the netlink's congestion.
Let's focus on explaining the problem you're trying to solve.
What's the driver you're talking about? Why does it have to
pay attention to the innards of netlink? Details, please.
> Since unique block flags(msg_flags & MSG_DONTWAIT) work as parameters
> for unicast, could we introduce it to broadcast, instead of abusing
> gfp_flag.
>
> static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> {
> ...
> if (dst_group) {
> refcount_inc(&skb->users);
> netlink_broadcast(sk, skb, dst_portid, dst_group, GFP_KERNEL);
> }
> err = netlink_unicast(sk, skb, dst_portid, msg->msg_flags & MSG_DONTWAIT);
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: 6f87debdea50bc76bc30174b96e5a0264b0eead4 ("[PATCH] net: remove judgement based on gfp_flags")
url: https://github.com/0day-ci/linux/commits/Huangzhaoyang/net-remove-judgement-based-on-gfp_flags/20211223-095643
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 62a3106697f3c6f9af64a2cd0f9ff58552010dc8
patch link: https://lore.kernel.org/netdev/[email protected]
in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20211228
with following parameters:
nr_threads: 100%
testtime: 60s
sc_pid_max: 4194304
class: scheduler
test: netlink-proc
cpufreq_governor: performance
ucode: 0x5003102
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 42.703679][ T5762] BUG: scheduling while atomic: stress-ng-2/5762/0x00000002
[ 42.703695][ T5755] BUG: scheduling while atomic: stress-ng-spawn/5755/0x00000002
[ 42.703697][ T5756] BUG: scheduling while atomic: stress-ng-spawn/5756/0x00000002
[ 42.703698][ T5754] BUG: scheduling while atomic: stress-ng-spawn/5754/0x00000002
[ 42.703700][ T5756] Modules linked in:
[ 42.703701][ T5754] Modules linked in:
[ 42.703700][ T5718] BUG: scheduling while atomic: stress-ng/5718/0x00000002
[ 42.703702][ T5756] binfmt_misc
[ 42.703702][ T5754] binfmt_misc
[ 42.703703][ T5718] Modules linked in:
[ 42.703703][ T5756] btrfs
[ 42.703704][ T5718] binfmt_misc
[ 42.703703][ T5754] btrfs
[ 42.703705][ T5756] blake2b_generic
[ 42.703705][ T5763] BUG: scheduling while atomic: stress-ng-3/5763/0x00000002
[ 42.703706][ T5718] btrfs
[ 42.703706][ T5754] blake2b_generic
[ 42.703707][ T5756] xor
[ 42.703708][ T5718] blake2b_generic
[ 42.703708][ T5754] xor
[ 42.703708][ T5763] Modules linked in:
[ 42.703708][ T5756] raid6_pq
[ 42.703710][ T5754] raid6_pq
[ 42.703710][ T5718] xor
[ 42.703710][ T5763] binfmt_misc
[ 42.703710][ T5756] zstd_compress
[ 42.703711][ T5718] raid6_pq
[ 42.703711][ T5754] zstd_compress
[ 42.703712][ T5763] btrfs
[ 42.703713][ T5756] libcrc32c
[ 42.703713][ T5718] zstd_compress
[ 42.703713][ T5754] libcrc32c
[ 42.703712][ T5758] BUG: scheduling while atomic: stress-ng-spawn/5758/0x00000002
[ 42.703714][ T5763] blake2b_generic
[ 42.703715][ T5756] sd_mod
[ 42.703715][ T5718] libcrc32c
[ 42.703715][ T5754] sd_mod
[ 42.703715][ T5763] xor
[ 42.703716][ T5758] Modules linked in:
[ 42.703715][ T5749] BUG: scheduling while atomic: stress-ng/5749/0x00000002
[ 42.703717][ T5756] t10_pi
[ 42.703717][ T5718] sd_mod
[ 42.703717][ T5763] raid6_pq
[ 42.703718][ T5754] t10_pi
[ 42.703718][ T5758] binfmt_misc
[ 42.703719][ T5756] sg
[ 42.703719][ T5749] Modules linked in:
[ 42.703719][ T5718] t10_pi
[ 42.703719][ T5763] zstd_compress
[ 42.703720][ T5754] sg
[ 42.703721][ T5756] intel_rapl_msr
[ 42.703721][ T5758] btrfs
[ 42.703722][ T5749] binfmt_misc
[ 42.703722][ T5763] libcrc32c
[ 42.703722][ T5754] intel_rapl_msr
[ 42.703722][ T5718] sg
[ 42.703722][ T5764] BUG: scheduling while atomic: stress-ng-4/5764/0x00000002
[ 42.703723][ T5756] intel_rapl_common
[ 42.703724][ T5749] btrfs
[ 42.703725][ T5763] sd_mod
[ 42.703725][ T5718] intel_rapl_msr
[ 42.703725][ T5758] blake2b_generic
[ 42.703725][ T5754] intel_rapl_common
[ 42.703726][ T5756] skx_edac
[ 42.703726][ T5764] Modules linked in:
[ 42.703727][ T5763] t10_pi
[ 42.703727][ T5749] blake2b_generic
[ 42.703728][ T5718] intel_rapl_common
[ 42.703728][ T5754] skx_edac
[ 42.703728][ T5758] xor
[ 42.703728][ T5756] nfit
[ 42.703729][ T5764] binfmt_misc
[ 42.703729][ T5763] sg
[ 42.703729][ T5749] xor
[ 42.703731][ T5754] nfit
[ 42.703731][ T5758] raid6_pq
[ 42.703731][ T5718] skx_edac
[ 42.703732][ T5756] libnvdimm
[ 42.703732][ T5764] btrfs
[ 42.703732][ T5749] raid6_pq
[ 42.703732][ T5763] intel_rapl_msr
[ 42.703733][ T5754] libnvdimm
[ 42.703734][ T5718] nfit
[ 42.703733][ T5758] zstd_compress
[ 42.703734][ T5764] blake2b_generic
[ 42.703734][ T5756] x86_pkg_temp_thermal
[ 42.703735][ T5749] zstd_compress
[ 42.703735][ T5754] x86_pkg_temp_thermal
[ 42.703735][ T5763] intel_rapl_common
[ 42.703736][ T5758] libcrc32c
[ 42.703736][ T5764] xor
[ 42.703737][ T5749] libcrc32c
[ 42.703737][ T5756] intel_powerclamp
[ 42.703737][ T5718] libnvdimm
[ 42.703737][ T5763] skx_edac
[ 42.703738][ T5758] sd_mod
[ 42.703738][ T5754] intel_powerclamp
[ 42.703740][ T5764] raid6_pq
[ 42.703740][ T5749] sd_mod
[ 42.703740][ T5756] coretemp
[ 42.703740][ T5718] x86_pkg_temp_thermal
[ 42.703740][ T5758] t10_pi
[ 42.703741][ T5754] coretemp
[ 42.703741][ T5763] nfit
[ 42.703742][ T5749] t10_pi
[ 42.703742][ T5756] ipmi_ssif
[ 42.703743][ T5718] intel_powerclamp
[ 42.703743][ T5764] zstd_compress
[ 42.703743][ T5758] sg
[ 42.703744][ T5754] ipmi_ssif
[ 42.703745][ T5763] libnvdimm
[ 42.703745][ T5749] sg
[ 42.703745][ T5718] coretemp
[ 42.703746][ T5764] libcrc32c
[ 42.703746][ T5756] kvm_intel
[ 42.703747][ T5758] intel_rapl_msr
[ 42.703747][ T5763] x86_pkg_temp_thermal
[ 42.703748][ T5749] intel_rapl_msr
[ 42.703747][ T5757] BUG: scheduling while atomic: stress-ng-dead/5757/0x00000002
[ 42.703748][ T5754] kvm_intel
[ 42.703749][ T5756] kvm
[ 42.703749][ T5764] sd_mod
[ 42.703750][ T5718] ipmi_ssif
[ 42.703750][ T5758] intel_rapl_common
[ 42.703750][ T5749] intel_rapl_common
[ 42.703751][ T5763] intel_powerclamp
[ 42.703751][ T5757] Modules linked in:
[ 42.703752][ T5756] ast
[ 42.703751][ T5764] t10_pi
[ 42.703753][ T5718] kvm_intel
[ 42.703753][ T5754] kvm
[ 42.703754][ T5763] coretemp
[ 42.703754][ T5758] skx_edac
[ 42.703755][ T5756] drm_vram_helper
[ 42.703755][ T5749] skx_edac
[ 42.703755][ T5757] binfmt_misc
[ 42.703756][ T5764] sg
[ 42.703756][ T5718] kvm
[ 42.703756][ T5754] ast
[ 42.703756][ T5763] ipmi_ssif
[ 42.703757][ T5756] drm_ttm_helper
[ 42.703758][ T5749] nfit
[ 42.703758][ T5758] nfit
[ 42.703759][ T5757] btrfs
[ 42.703760][ T5754] drm_vram_helper
[ 42.703760][ T5764] intel_rapl_msr
[ 42.703760][ T5763] kvm_intel
[ 42.703761][ T5756] ttm
[ 42.703762][ T5749] libnvdimm
[ 42.703762][ T5758] libnvdimm
[ 42.703763][ T5718] ast
[ 42.703763][ T5757] blake2b_generic
[ 42.703764][ T5754] drm_ttm_helper
[ 42.703764][ T5764] intel_rapl_common
[ 42.703764][ T5756] drm_kms_helper
[ 42.703764][ T5749] x86_pkg_temp_thermal
[ 42.703765][ T5763] kvm
[ 42.703765][ T5718] drm_vram_helper
[ 42.703765][ T5758] x86_pkg_temp_thermal
[ 42.703766][ T5764] skx_edac
[ 42.703767][ T5757] xor
[ 42.703767][ T5718] drm_ttm_helper
[ 42.703767][ T5756] irqbypass
[ 42.703768][ T5758] intel_powerclamp
[ 42.703768][ T5763] ast
[ 42.703769][ T5749] intel_powerclamp
[ 42.703769][ T5757] raid6_pq
[ 42.703770][ T5764] nfit
[ 42.703769][ T5754] ttm
[ 42.703770][ T5756] crct10dif_pclmul
[ 42.703771][ T5718] ttm
[ 42.703772][ T5749] coretemp
[ 42.703772][ T5763] drm_vram_helper
[ 42.703773][ T5754] drm_kms_helper
[ 42.703772][ T5758] coretemp
[ 42.703773][ T5756] crc32_pclmul
[ 42.703773][ T5764] libnvdimm
[ 42.703774][ T5749] ipmi_ssif
[ 42.703774][ T5718] drm_kms_helper
[ 42.703773][ T5757] zstd_compress
[ 42.703775][ T5763] drm_ttm_helper
[ 42.703775][ T5754] irqbypass
[ 42.703776][ T5756] crc32c_intel
[ 42.703776][ T5764] x86_pkg_temp_thermal
[ 42.703777][ T5758] ipmi_ssif
[ 42.703777][ T5757] libcrc32c
[ 42.703777][ T5763] ttm
[ 42.703777][ T5749] kvm_intel
[ 42.703778][ T5718] irqbypass
[ 42.703779][ T5754] crct10dif_pclmul
[ 42.703780][ T5749] kvm
[ 42.703780][ T5764] intel_powerclamp
[ 42.703780][ T5756] ghash_clmulni_intel
[ 42.703781][ T5757] sd_mod
[ 42.703781][ T5718] crct10dif_pclmul
[ 42.703782][ T5754] crc32_pclmul
[ 42.703781][ T5763] drm_kms_helper
[ 42.703783][ T5756] syscopyarea
[ 42.703783][ T5764] coretemp
[ 42.703784][ T5757] t10_pi
[ 42.703784][ T5758] kvm_intel
[ 42.703785][ T5718] crc32_pclmul
[ 42.703786][ T5749] ast
[ 42.703786][ T5763] irqbypass
[ 42.703786][ T5754] crc32c_intel
[ 42.703786][ T5756] rapl
[ 42.703786][ T5764] ipmi_ssif
[ 42.703787][ T5757] sg
[ 42.703788][ T5763] crct10dif_pclmul
[ 42.703788][ T5758] kvm
[ 42.703789][ T5718] crc32c_intel
[ 42.703789][ T5749] drm_vram_helper
[ 42.703789][ T5756] sysfillrect
[ 42.703790][ T5763] crc32_pclmul
[ 42.703790][ T5718] ghash_clmulni_intel
[ 42.703790][ T5757] intel_rapl_msr
[ 42.703792][ T5764] kvm_intel
[ 42.703792][ T5754] ghash_clmulni_intel
[ 42.703792][ T5756] sysimgblt
[ 42.703793][ T5757] intel_rapl_common
[ 42.703794][ T5764] kvm
[ 42.703793][ T5749] drm_ttm_helper
[ 42.703794][ T5763] crc32c_intel
[ 42.703795][ T5718] syscopyarea
[ 42.703795][ T5754] syscopyarea
[ 42.703795][ T5758] ast
[ 42.703797][ T5757] skx_edac
[ 42.703796][ T5756] intel_cstate
[ 42.703797][ T5763] ghash_clmulni_intel
[ 42.703797][ T5754] rapl
[ 42.703798][ T5764] ast
[ 42.703797][ T5749] ttm
[ 42.703798][ T5718] rapl
[ 42.703799][ T5756] acpi_ipmi
[ 42.703800][ T5758] drm_vram_helper
[ 42.703800][ T5754] sysfillrect
[ 42.703801][ T5757] nfit
[ 42.703800][ T5749] drm_kms_helper
[ 42.703800][ T5763] syscopyarea
[ 42.703802][ T5764] drm_vram_helper
[ 42.703802][ T5756] ahci
[ 42.703802][ T5718] sysfillrect
[ 42.703803][ T5754] sysimgblt
[ 42.703803][ T5758] drm_ttm_helper
[ 42.703803][ T5749] irqbypass
[ 42.703804][ T5763] rapl
[ 42.703804][ T5757] libnvdimm
[ 42.703804][ T5764] drm_ttm_helper
[ 42.703805][ T5718] sysimgblt
[ 42.703806][ T5764] ttm
[ 42.703807][ T5754] intel_cstate
[ 42.703806][ T5758] ttm
[ 42.703805][ T5756] fb_sys_fops
[ 42.703807][ T5749] crct10dif_pclmul
[ 42.703807][ T5757] x86_pkg_temp_thermal
[ 42.703809][ T5763] sysfillrect
[ 42.703809][ T5764] drm_kms_helper
[ 42.703808][ T5754] acpi_ipmi
[ 42.703810][ T5749] crc32_pclmul
[ 42.703810][ T5718] intel_cstate
[ 42.703811][ T5758] drm_kms_helper
[ 42.703811][ T5763] sysimgblt
[ 42.703811][ T5764] irqbypass
[ 42.703812][ T5757] intel_powerclamp
[ 42.703811][ T5756] libahci
[ 42.703813][ T5749] crc32c_intel
[ 42.703813][ T5718] acpi_ipmi
[ 42.703813][ T5754] ahci
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang