Date: Wed, 19 May 2021 18:55:16 -0700
From: Jonathan Nieder
To: Junio C Hamano
Cc: git@vger.kernel.org, Josh Steadmon, Jeff King, Jeff Hostetler
Subject: Re: RFC: error codes on exit

Hi,

Junio C Hamano wrote:
> Jonathan Nieder writes:

>> In order to do this, I would like to annotate "exit" events with a
>> classification of the error.
>
> We already have BUG() for e. and die() for everything else, and
> "everything else" may be overly broad for your purpose.
>
> I am sympathetic to the cause and I agree that introducing a
> finer-grained classification might be a solution.
> I however am not sure how we can enforce developers to apply such a
> manually assigned "error code" consistently.

I think two things you're hinting at are "what about maintainability?"
and "what is the migration path?"

I suspect that a number of error paths will remain unclassified for a
long time, possibly indefinitely.  The way I've seen this treated in
other tools is that it's okay for something to show up as an INTERNAL
error if it doesn't happen frequently: sure, that can cause us a bit
of unnecessary worry when it starts occurring more often, but at that
point we're in a good place to replace it with something more
appropriate.

That means that we would still want to keep die() or some equivalent.
That in turn might suggest that the API I suggested is overly verbose;
it might make sense to have a different die()-style helper for each
type of error, matching what we do with die() and BUG().

Side note: you might wonder why keeping die() would even be a
question.  For example, there are all the outstanding patches that
still use die(); changing such a fundamental API would seem to be a
nonstarter.  Fortunately, though, the tools described in
contrib/coccinelle/README allow changing an API in three steps:

 1. Introduce the new API.  Keep the old API around for backward
    compatibility.

 2. Add a "pending" coccinelle semantic patch to automatically update
    callers to the new API.  Update existing callers using
    'make coccicheck-pending'.

 3. Remove the old API and mark the semantic patch as no longer
    pending.  Patches using the old API can be fixed using
    'make coccicheck'.

So we can make this decision based on whether the resulting API is one
we like more; in this example, I suspect that keeping die() is
preferable _even though_ it would be possible to remove it by staying
in step 2 for a while without too much fuss.
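
To make the "different die()-style helper for each type of error" idea
a bit more concrete, here is a rough sketch.  Everything in it is
hypothetical: the exit_class enum, describe_exit(), and the
die_usage_error() / die_internal() names are made up for illustration
and do not exist in Git, and writing the annotation to stderr is only
a stand-in for attaching it to the trace2 "exit" event.  The one detail
borrowed from Git is that die() exits with status 128.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification; none of these names exist in Git. */
enum exit_class {
	EXIT_CLASS_INTERNAL, /* catch-all for call sites nobody has classified */
	EXIT_CLASS_USAGE,
	EXIT_CLASS_NETWORK,
	EXIT_CLASS_NR
};

const char *exit_class_name(enum exit_class c)
{
	static const char *names[EXIT_CLASS_NR] = {
		"INTERNAL", "USAGE", "NETWORK"
	};

	assert((unsigned)c < EXIT_CLASS_NR);
	return names[c];
}

/*
 * Render the annotation an "exit" event could carry into buf.
 * Returns 0 on success and -1 on error; output is truncated if buf
 * is too small.
 */
int vdescribe_exit(char *buf, size_t len, enum exit_class c,
		   const char *fmt, va_list ap)
{
	size_t used;

	if (!len)
		return -1;
	if (snprintf(buf, len, "class=%s msg=", exit_class_name(c)) < 0)
		return -1;
	used = strlen(buf); /* always < len, so at least one byte is left */
	return vsnprintf(buf + used, len - used, fmt, ap) < 0 ? -1 : 0;
}

int describe_exit(char *buf, size_t len, enum exit_class c,
		  const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vdescribe_exit(buf, len, c, fmt, ap);
	va_end(ap);
	return ret;
}

/* Shared tail of every helper: report the classified error and exit. */
static void die_with_class(enum exit_class c, const char *fmt, va_list ap)
{
	char buf[1024];

	if (vdescribe_exit(buf, sizeof(buf), c, fmt, ap) < 0)
		snprintf(buf, sizeof(buf), "class=%s msg=(unformattable)",
			 exit_class_name(c));
	fprintf(stderr, "fatal: %s\n", buf); /* stand-in for the trace2 event */
	exit(128);
}

/* One thin helper per class, so call sites stay as terse as die(). */
void die_usage_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	die_with_class(EXIT_CLASS_USAGE, fmt, ap);
}

/* What an unclassified die() would effectively be under this scheme. */
void die_internal(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	die_with_class(EXIT_CLASS_INTERNAL, fmt, ap);
}
```

With something along these lines, a call site would read
die_usage_error("unknown option '%s'", arg) instead of passing a class
argument explicitly, and anything still going through the catch-all
would simply show up as INTERNAL until someone classifies it.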

> Just to throw in a totally different alternative to see if it works
> better, I wonder if you can teach die() to report to the trace2
> stream where in the code it was called from and which vintage of Git
> it is running.
>
> The stat collection side that cares about certain class of failures
> can have function that maps "die() at <file>:<line>@<version>"
> to "what kind of die() it is".
>
> E.g. "blame.c:50@v2.32.0-rc0-184-gbbde7e6616" may be BUG(), while
> blame.c:2740@v2.32.0-rc0-184-gbbde7e6616 may be a user-error.

For ad hoc queries, this is a rather nice tool.  Traces already record
filename, version, and line number, though I believe in the die() case
it currently just points to the implementation of die(). ;-)

However, for analysis in aggregate (for example, to define an SLO[1])
that would require us to maintain a database that maps each such call
site to an error code.  That database would be essentially the same as
patches to record the error codes, so what it would really amount to
is having these deployments use a permanent fork of Git.  It would
also get rid of the chance to discuss and improve common error paths
on-list.

If we expect the error codes to not be useful to anyone else, then
that is the right choice to make (or rather, we'd have to use other
heuristics, such as having the traces record a collection of offsets
in the binary and a build-id so we can key off of stack trace
signatures).  Part of the reason I started this thread is to get a
sense of whether these can be useful to others.

Thanks,
Jonathan

[1] https://sre.google/sre-book/service-level-objectives/