Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754807AbcLNERI (ORCPT ); Tue, 13 Dec 2016 23:17:08 -0500
Received: from mx1.redhat.com ([209.132.183.28]:55776 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754592AbcLNERG (ORCPT ); Tue, 13 Dec 2016 23:17:06 -0500
Date: Tue, 13 Dec 2016 23:17:01 -0500
From: Richard Guy Briggs
To: Cong Wang
Cc: Herbert Xu, Johannes Berg, netdev, Florian Westphal, LKML,
        Eric Dumazet, linux-audit@redhat.com, syzkaller, David Miller,
        Dmitry Vyukov
Subject: Re: netlink: GPF in sock_sndtimeo
Message-ID: <20161214041701.GN22660@madcap2.tricolour.ca>
References: <20161130045207.GE26673@madcap2.tricolour.ca>
        <20161209060248.GT22655@madcap2.tricolour.ca>
        <20161209110155.GW22655@madcap2.tricolour.ca>
        <20161212100215.GA1305@madcap2.tricolour.ca>
        <20161213105233.GG1305@madcap2.tricolour.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.27]); Wed, 14 Dec 2016 04:17:05 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1954
Lines: 51

On 2016-12-13 16:17, Cong Wang wrote:
> On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs wrote:
> > It is actually the audit_pid and audit_nlk_portid that I care about
> > more. The audit daemon could vanish or close the socket while the
> > kernel sock to which it was attached is still quite valid. Accessing
> > the set of three atomically is the urge. I wonder if it makes more
> > sense to test for the presence of auditd using audit_sock rather than
> > audit_pid, but still keep audit_pid for our reporting and replacement
> > strategy. Another idea would be to put the three in one struct.
>
> Note, the process has audit_pid should hold a refcnt to the netns too,
> so the netns can't be gone until that process is gone.

I noted that. I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns. This is alluded-to in 6f285b19d09f ("audit: Send
replies in the proper network namespace.").

> > Can someone explain how they think the original test was able to trigger
> > this GPF? Network namespace shutdown while something pretended to set
> > up a new auditd? That's impressive for a fuzzer if that's the case...
> > Is there an strace? I guess it is all in test().
>
> I am surprised you still don't get the race condition even when you
> are now working on v2...
>
> The race happens in this scenarios :
>
> 1) Create a new netns
>
> 2) In the new netns, communicate with kauditd to set audit_sock
>
> 3) Generate some audit messages, so kauditd will keep sending them
> via audit_sock
>
> 4) exit the netns
>
> 5) the previous audit_sock is now going away, but kaudit_sock could still
> access it in this small window.

Ah ok that fits...

- RGB

--
Richard Guy Briggs
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635
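
For illustration only, the "put the three in one struct" idea quoted
above might look roughly like the userspace sketch below. This is not
kernel code and not the fix that was proposed on the list. The names
struct auditd_conn, struct fake_sock, auditd_conn_set(),
auditd_conn_reset_if() and auditd_conn_snapshot() are all hypothetical,
and a pthread mutex stands in for whatever locking the kernel would
actually use. The sketch only shows the three values being read and
written as a set, with presence tested via the sock pointer rather than
the pid. It does not address socket lifetime: a reader that goes on to
use the snapshotted sock after the lock is dropped would presumably
still need to hold a reference on it.

```c
/* Userspace sketch only: every name here is hypothetical, not kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct sock; only its identity matters here. */
struct fake_sock { int id; };

/* The three values bundled so they are always read and written as a set. */
struct auditd_conn {
    int pid;                 /* plays the role of audit_pid */
    uint32_t portid;         /* plays the role of audit_nlk_portid */
    struct fake_sock *sock;  /* plays the role of audit_sock */
};

static struct auditd_conn current_conn;  /* zero-initialised: no daemon */
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replace all three fields inside one critical section. */
static void auditd_conn_set(int pid, uint32_t portid, struct fake_sock *sock)
{
    pthread_mutex_lock(&conn_lock);
    current_conn.pid = pid;
    current_conn.portid = portid;
    current_conn.sock = sock;
    pthread_mutex_unlock(&conn_lock);
}

/* Clear the set, but only if it still refers to the socket going away. */
static bool auditd_conn_reset_if(const struct fake_sock *dying)
{
    bool cleared = false;

    pthread_mutex_lock(&conn_lock);
    if (current_conn.sock == dying) {
        current_conn = (struct auditd_conn){0};
        cleared = true;
    }
    pthread_mutex_unlock(&conn_lock);
    return cleared;
}

/* Copy out a consistent snapshot; returns true if a daemon is attached. */
static bool auditd_conn_snapshot(struct auditd_conn *out)
{
    assert(out != NULL);
    pthread_mutex_lock(&conn_lock);
    *out = current_conn;
    pthread_mutex_unlock(&conn_lock);
    return out->sock != NULL;
}
```

In this sketch, the netns exit path from step 4 would call
auditd_conn_reset_if() with the socket being torn down, so a later
auditd_conn_snapshot() reports "no daemon" with pid, portid and sock all
cleared together instead of leaving a stale pointer next to a valid pid.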