From: Andrew Oakley
To: git@vger.kernel.org
Cc: Luke Diamand, Feiyang Xue, Tzadik Vanderhoof
Subject: [PATCH 0/2] git-p4: encoding of data from perforce
Date: Mon, 12 Apr 2021 09:52:49 +0100
Message-Id: <20210412085251.51475-1-andrew@adoakley.name>

When using python3, git-p4 fails to handle data from perforce which is
not valid UTF-8. In large repositories it's very likely that such data
will exist - perforce itself does no validation of the data by default.

Historically git-p4 has just passed whatever bytes it got from perforce
into git. This seems like a sensible approach - git-p4 has no idea what
encoding may have been used and it seems likely that different encodings
are used within a repository.

I was trying to do a more thorough job, moving more of git-p4 over to
using bytes. Unfortunately the changes end up being large and hard to
review. In most cases it's probably sufficient to just avoid decoding
the commit messages.

There have been a couple of previous proposals around trying to decode
this data using a user-configured encoding:

http://public-inbox.org/git/CAE5ih7-F9efsiV5AQmw3ocjiy+BT6ZAT5fA0Lx0OSkVTO8Kqjg@mail.gmail.com/T/
http://public-inbox.org/git/20210409153815.7joohvmlnh6itczc@tb-raspi4/T/
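
To illustrate the failure mode and the pass-through approach described
above, here is a minimal sketch. It is not code from git-p4 or from this
series: the sample description and both helper names are hypothetical,
and it only assumes that perforce hands back a description containing a
Latin-1 byte.

```python
# Hypothetical commit description as perforce might return it: the author
# typed "café" in a Latin-1 (ISO-8859-1) client, so "é" is the single
# byte 0xE9, which is not valid UTF-8 in this position.
raw_desc = b"Fix caf\xe9 menu rendering\n"


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8, as a str-based Python 3 code path would (hypothetical helper)."""
    return data.decode("utf-8")


def pass_through(data: bytes) -> bytes:
    """Keep the bytes untouched and leave interpretation to the reader (hypothetical helper)."""
    return data


# Strict decoding raises on the stray 0xE9 byte.
try:
    decode_strict(raw_desc)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Passing the bytes through preserves exactly what perforce stored.
preserved = pass_through(raw_desc)

print("strict UTF-8 decode failed:", strict_failed)
print("bytes preserved unchanged:", preserved == raw_desc)
```

Keeping the description as bytes avoids having to guess an encoding at
all, which matters if different encodings are mixed within one
repository.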