Return-Path: MIME-Version: 1.0 In-Reply-To: References: <1340115637-2595-1-git-send-email-mikel.astiz.oss@gmail.com> <20120629132435.GB27007@x220> Date: Tue, 3 Jul 2012 17:46:48 +0200 Message-ID: Subject: Re: [RFC v0] Bluetooth: Fix lost socket error code From: Mikel Astiz To: Mike Cc: johan.hedberg@gmail.com, linux-bluetooth@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-bluetooth-owner@vger.kernel.org List-ID: Hi Mike, On Tue, Jul 3, 2012 at 3:42 PM, Mike wrote: > Hi Mikel, > > On Tue, Jul 3, 2012 at 2:44 AM, Mikel Astiz wrote: >> Hi Johan, >> >> On Fri, Jun 29, 2012 at 3:24 PM, Johan Hedberg wrote: >>> Hi Mikel, >>> >>> On Tue, Jun 19, 2012, Mikel Astiz wrote: >>>> From: Mikel Astiz >>>> >>>> Using sock_error() here to check the status of the socket is wrong >>>> because it resets sk->sk_err to zero. For RFCOMM sockets, this means the >>>> disconnect reason is not exposed to userland in the socket options. >>>> >>>> Signed-off-by: Mikel Astiz >>>> --- >>>> This a second attempt to expose ACL disconnect reason to userland, as proposed first in "[RFC v0] Bluetooth: mgmt: Add device disconnect reason". >>>> >>>> The motivation behind was explained in the userspace patchset "[RFC BlueZ v0 0/5] ACL disconnect reason". >>>> >>>> This second approach focuses on RFCOMM sockets given that L2CAP is already exposing such information. >>>> >>>> net/bluetooth/af_bluetooth.c | 2 +- >>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c >>>> index f7db579..4799338 100644 >>>> --- a/net/bluetooth/af_bluetooth.c >>>> +++ b/net/bluetooth/af_bluetooth.c >>>> @@ -313,7 +313,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock, >>>> if (copied >= target) >>>> break; >>>> >>>> - err = sock_error(sk); >>>> + err = sk->sk_err; >>>> if (err) >>>> break; >>>> if (sk->sk_shutdown & RCV_SHUTDOWN) >>> >>> Looking at the implementation of sock_error (include/net/sock.h) This >>> should be: >>> >>> err = -sk->sk_err; >> >> Right. >> >>> However, is this really needed if the error is returned for the write() >>> or read() that triggers it (through errno). And in the case of >>> POLLHUP/POLLERR there wont be any write/read that could clear the error >>> so you should be able to read it with getsockopt just fine. Am I missing >>> something? >> >> I don't actually get why rfcomm_sock_recvmsg() is being called, but >> that's what I experimentally see: >> >> [18985.546436] [73] rfcomm_l2data_ready:197: ffff880205613800 bytes 4 >> [18985.546535] [21592] rfcomm_process_rx:1853: session >> ffff880230ccc9c0 state 1 qlen 1 >> [18985.546546] [21592] rfcomm_recv_disc:1205: session ffff880230ccc9c0 >> state 1 dlci 26 >> [18985.546552] [21592] rfcomm_send_ua:773: ffff880230ccc9c0 dlci 26 >> [18985.546558] [21592] rfcomm_send_cmd:750: ffff880230ccc9c0 cmd 115 >> [18985.546563] [21592] rfcomm_send_frame:741: session ffff880230ccc9c0 len 4 >> [18985.546591] [21592] __rfcomm_dlc_close:447: dlc ffff88016fe16500 >> state 9 dlci 26 err 104 session ffff880230ccc9c0 >> [18985.546597] [21592] rfcomm_dlc_clear_timer:286: dlc ffff88016fe16500 state 9 >> [18985.546603] [21592] rfcomm_sk_state_change:71: dlc ffff88016fe16500 >> state 9 err 104 >> >> Here the error has been set. >> >> [18985.546628] [21592] rfcomm_dlc_unlink:351: dlc ffff88016fe16500 >> refcnt 2 session ffff880230ccc9c0 >> [18985.546635] [21592] rfcomm_session_set_timer:250: session >> ffff880230ccc9c0 state 1 timeout 2000 >> [18985.546646] [21592] rfcomm_process_dlcs:1792: session >> ffff880230ccc9c0 state 1 >> [18985.546701] [32724] bt_sock_stream_recvmsg:300: sk ffff880205615800 size 7768 >> >> And here the error has been cleared. >> >> [18985.546780] [32724] rfcomm_sock_getsockopt:806: sk ffff880205615800 >> [18985.546788] [32724] rfcomm_sock_getsockopt_old:742: sk ffff880205615800 >> [18985.546791] [21576] rfcomm_sock_shutdown:882: sock >> ffff880136965180, sk ffff880205615800 >> [18985.546799] [21576] __rfcomm_sock_close:209: sk ffff880205615800 >> state 9 socket ffff880136965180 >> >> My best guess would be that we don't shutdown the socket immediately >> after the error, and that's the reason why packets are still flowing >> around, but I can't really tell. >> >> In any case, using sock_error() to exit from bt_sock_stream_recvmsg() >> seems a bad idea. I would say it's buggy due to race conditions, since >> the socket option is temporarily set and then later cleared, don't you >> think? > > Are you seeing this in reference to ofono? I noticed that > gatchat/gatio.c, function received_data would try to do a read before > checking the condition that caused it to be called. In this case, the > socket error would be correct at the beginning of the function, but > would no longer be correct by the time the function reaches the > condition check for G_IO_HUP as a read attempt occurs. Yes, you're right. The socket option does exist at the beginning of received_data(), and it's cleared during the body. However, I'm still not sure what the general Kernel policy is regarding socket error reporting, so my question would be: is it a desired behaviour that the SO_ERROR would be cleared after a read attempt? If so, the Kernel would need no patch, and it would become a userland problem to propagate the disconnect reason as appropriate. Cheers, Mikel