Date: Thu, 13 Dec 2018 20:40:45 +0100
From: Uwe Kleine-König
To: Jeremy Cline
Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet,
    Joe Perches, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
Message-ID: <20181213194045.om6gixij6a63jvfg@pengutronix.de>

On Thu, Dec 13, 2018 at 10:10:52AM -0500, Jeremy Cline wrote:
> On Thu, Dec 13, 2018 at 08:37:08AM +0100, Uwe Kleine-König wrote:
> > It didn't break for me. Can you provide details about how and when it
> > broke for you?
>
> I was wrong about it being Python 2 that broke, sorry about that.
> 6f4d29df66ac broke Python 3 when you run it against a sub-tree because
> scan_git_tree() opens the files in binary mode, but then find is run
> with a text string:
>
> $ python3 scripts/spdxcheck.py net/
> FAIL: argument should be integer or bytes-like object, not 'str'
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 259, in <module>
>     scan_git_subtree(repo.head.reference.commit.tree, p)
>   File "scripts/spdxcheck.py", line 211, in scan_git_subtree
>     scan_git_tree(tree)
>   File "scripts/spdxcheck.py", line 206, in scan_git_tree
>     parser.parse_lines(fd, args.maxlines, el.path)
>   File "scripts/spdxcheck.py", line 175, in parse_lines
>     if line.find("SPDX-License-Identifier:") < 0:
> TypeError: argument should be integer or bytes-like object, not 'str'
>
> The reason I opened things in binary mode when I started adding Python 3
> support was because not all files were valid UTF-8 (and some were
> binary) so I decoded the text line-by-line and ignored any decoding
> errors for simplicity's sake.

OK I understand. The problem is that there are inconsistencies in
handling files as binaries or not that already existed before
6f4d29df66ac. Different code paths result in a different type for line
depending on how fd was opened.
I fixed the cases where fd was opened as a text file and broke the cases
where it was opened as binary. So changing this to consistently use
binary mode (as the patch by Thierry does) seems to be the right thing
to do.

Thanks
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
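
For illustration, the type mismatch discussed above can be reproduced
outside the kernel tree. The following is only a minimal sketch and not
the actual spdxcheck.py code: the names find_spdx_line() and
mixed_type_find_fails(), and the default of 15 lines, are hypothetical.
It assumes the file object is always opened in binary mode and that each
line is decoded individually with decoding errors ignored, as described
in the quoted mail.

```python
import io

SPDX_TAG = "SPDX-License-Identifier:"


def find_spdx_line(fd, maxlines=15):
    """Return the first line containing the SPDX tag, or None.

    Hypothetical helper: fd is assumed to be opened in binary mode,
    so iterating over it yields bytes objects.
    """
    for idx, raw in enumerate(fd):
        if idx >= maxlines:
            break
        # Decode per line and drop undecodable bytes, so files that are
        # not valid UTF-8 (or are binary) do not raise an exception.
        line = raw.decode("utf-8", errors="ignore")
        if line.find(SPDX_TAG) >= 0:
            return line.strip()
    return None


def mixed_type_find_fails():
    """Show that searching bytes for a str raises TypeError on Python 3."""
    try:
        b"// SPDX-License-Identifier: GPL-2.0\n".find(SPDX_TAG)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    data = b"\xff\xfe binary junk\n// SPDX-License-Identifier: GPL-2.0\n"
    print(find_spdx_line(io.BytesIO(data)))
    print(mixed_type_find_fails())
```

Opening every file in binary mode and decoding in one place means line
has the same type on every code path, so the find() call no longer
depends on how fd was opened.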