Subject: Re: [PATCH v3 8/9] commit-graph: always load commit-graph information
To: Jakub Narebski, Derrick Stolee
Cc: "git@vger.kernel.org", "peff@peff.net", "avarab@gmail.com", "sbeller@google.com", "larsxschneider@gmail.com", "bmwill@google.com", "gitster@pobox.com", "sunshine@sunshineco.com", "jonathantanmy@google.com"
References: <20180409164131.37312-1-dstolee@microsoft.com> <20180417170001.138464-1-dstolee@microsoft.com> <20180417170001.138464-9-dstolee@microsoft.com> <8636zsgj1g.fsf@gmail.com>
From: Derrick Stolee
Message-ID: <88b0fb38-57d5-f654-27c8-dd1807436d93@gmail.com>
Date: Mon, 23 Apr 2018 10:49:48 -0400
In-Reply-To: <8636zsgj1g.fsf@gmail.com>
X-Mailing-List: git@vger.kernel.org

On 4/18/2018 8:02 PM, Jakub Narebski wrote:
> Derrick Stolee writes:
>
>> Most code paths load commits using lookup_commit() and then
>> parse_commit(). In some cases, including some branch lookups, the commit
>> is parsed using parse_object_buffer() which side-steps parse_commit() in
>> favor of parse_commit_buffer().
>>
>> With generation numbers in the commit-graph, we need to ensure that any
>> commit that exists in the commit-graph file has its generation number
>> loaded.
> All right, that is nice explanation of the why behind this change.
>
>> Create new load_commit_graph_info() method to fill in the information
>> for a commit that exists only in the commit-graph file. Call it from
>> parse_commit_buffer() after loading the other commit information from
>> the given buffer. Only fill this information when specified by the
>> 'check_graph' parameter. This avoids duplicate work when we already
>> checked the graph in parse_commit_gently() or when simply checking the
>> buffer contents in check_commit().
> Couldn't this 'check_graph' parameter be a global variable similar to
> the 'commit_graph' variable? Maybe I am not understanding it.

See the two callers at the bottom of the patch. They have different
purposes: one needs to fill in a valid commit struct, the other only
needs to check that the commit buffer is valid (and then throws away
the struct). They pass different values for 'check_graph'.

Also, parse_commit_gently() calls parse_commit_in_graph() before it
calls parse_commit_buffer(), so we do not want to repeat work there:
with a valid commit-graph file that does not contain the commit, we
would repeat the binary search for the same commit.
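
To make the difference between the callers concrete, here is a rough
standalone sketch. It is not git's code: the toy_* names, the lookup
counter and the placeholder generation value are all made up for
illustration. It only shows why the flag belongs to the call site rather
than to global state: each caller knows whether a graph lookup would be
useful or wasted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for git's struct commit; only the fields this sketch needs. */
struct toy_commit {
	int parsed;
	uint32_t generation;
};

/* Counts graph lookups so the cost of each call is visible. */
static int graph_lookups;

/* Hypothetical stand-in for load_commit_graph_info(): one lookup per call. */
static void toy_load_graph_info(struct toy_commit *c)
{
	graph_lookups++;
	c->generation = 1; /* placeholder, not a real generation number */
}

/* Same parameter shape as the new parse_commit_buffer(): the caller decides. */
static int toy_parse_buffer(struct toy_commit *c, const char *buf, size_t size,
			    int check_graph)
{
	assert(c != NULL);
	if (!buf || !size)
		return -1; /* nothing to parse */
	c->parsed = 1;
	if (check_graph)
		toy_load_graph_info(c);
	return 0;
}
```

A caller that already consulted the graph, or that only validates a
buffer, would pass 0 and pay for no lookup; a caller that needs a fully
populated struct would pass 1.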
>
>> Signed-off-by: Derrick Stolee
>> ---
>>  commit-graph.c | 51 ++++++++++++++++++++++++++++++++------------------
>>  commit-graph.h |  8 ++++++++
>>  commit.c       |  7 +++++--
>>  commit.h       |  2 +-
>>  object.c       |  2 +-
>>  sha1_file.c    |  2 +-
>>  6 files changed, 49 insertions(+), 23 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 688d5b1801..21e853c21a 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -245,13 +245,19 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
>>  	return &commit_list_insert(c, pptr)->next;
>>  }
>>
>> +static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
>> +{
>> +	const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos;
>> +	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>> +}
>> +
>>  static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
>>  {
>>  	uint32_t edge_value;
>>  	uint32_t *parent_data_ptr;
>>  	uint64_t date_low, date_high;
>>  	struct commit_list **pptr;
>> -	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
>> +	const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos;
> I'm probably wrong, but isn't it unrelated change?

You're right. I saw this while I was in here, and there was a similar
comment on this change in a different patch. Probably best to keep these
cleanup things in a separate commit.

>>  	item->object.parsed = 1;
>>  	item->graph_pos = pos;
>> @@ -292,31 +298,40 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin
>>  	return 1;
>>  }
>>
>> +static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
>> +{
>> +	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
>> +		*pos = item->graph_pos;
>> +		return 1;
>> +	} else {
>> +		return bsearch_graph(commit_graph, &(item->object.oid), pos);
>> +	}
>> +}
> All right (after the fix).
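
For the offset arithmetic in fill_commit_graph_info() above, a rough
standalone sketch may help. Assumptions that are not in the patch: hashes
are 20 bytes (so a row is 20 + 16 bytes, matching the old
"g->hash_len + 16" expression), and every toy_* name is made up. My
understanding of the format is that the 8 bytes after the hash hold the
two parent positions and that the low two bits of the following 32-bit
word belong to the commit date, which would explain the shift by 2; please
treat that as an assumption as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: 20-byte hashes, so each row is 20 + 16 bytes. */
#define TOY_HASH_LEN 20
#define TOY_DATA_WIDTH (TOY_HASH_LEN + 16)

/* Read a 32-bit big-endian value; same idea as git's get_be32(). */
static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Generation number of row 'pos' in a commit-data chunk holding 'nr' rows. */
static uint32_t toy_generation_at(const unsigned char *chunk, size_t nr, size_t pos)
{
	const unsigned char *row;

	assert(chunk != NULL && pos < nr);
	row = chunk + (size_t)TOY_DATA_WIDTH * pos;
	/* Skip the hash and 8 more bytes, then drop the low 2 bits. */
	return toy_get_be32(row + TOY_HASH_LEN + 8) >> 2;
}
```

The bounds assert is only there for the sketch; the patch relies on 'pos'
coming from a successful lookup.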
>
>> +
>>  int parse_commit_in_graph(struct commit *item)
>>  {
>> +	uint32_t pos;
>> +
>> +	if (item->object.parsed)
>> +		return 0;
>>  	if (!core_commit_graph)
>>  		return 0;
>> -	if (item->object.parsed)
>> -		return 1;
> Hmmm... previously the function returned 1 if item->object.parsed, now
> it returns 0 for this situation. I don't understand this change.

The good news is that this change is unimportant (the only caller is
parse_commit_gently() which checks item->object.parsed before calling
parse_commit_in_graph()). I wonder why I reordered those things, anyway.
I'll revert to simplify the patch.

>
>> -
>>  	prepare_commit_graph();
>> -	if (commit_graph) {
>> -		uint32_t pos;
>> -		int found;
>> -		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
>> -			pos = item->graph_pos;
>> -			found = 1;
>> -		} else {
>> -			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
>> -		}
>> -
>> -		if (found)
>> -			return fill_commit_in_graph(item, commit_graph, pos);
>> -	}
>> -
>> +	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
>> +		return fill_commit_in_graph(item, commit_graph, pos);
> Nice refactoring.
>
>>  	return 0;
>>  }
>>
>> +void load_commit_graph_info(struct commit *item)
>> +{
>> +	uint32_t pos;
>> +	if (!core_commit_graph)
>> +		return;
>> +	prepare_commit_graph();
>> +	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
>> +		fill_commit_graph_info(item, commit_graph, pos);
>> +}
> And the reason for the refactoring.
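
The lookup that both functions now share can be sketched on its own. This
is a toy model rather than git's code: plain integers stand in for object
ids, the sentinel value and all toy_* names are hypothetical, and the
search counter exists only to show when a binary search is actually
performed. The sketch searches the graph it is handed as 'g'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NOT_FROM_GRAPH 0xFFFFFFFFu /* stand-in for COMMIT_NOT_FROM_GRAPH */

struct toy_graph {
	const uint32_t *ids; /* sorted ids standing in for object ids */
	uint32_t nr;
};

struct toy_commit {
	uint32_t id;
	uint32_t graph_pos;
};

static int toy_searches; /* counts binary searches actually performed */

/* Binary search over the sorted ids; returns 1 and sets *pos on a hit. */
static int toy_bsearch(const struct toy_graph *g, uint32_t id, uint32_t *pos)
{
	uint32_t lo = 0, hi = g->nr;

	toy_searches++;
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		if (g->ids[mid] == id) {
			*pos = mid;
			return 1;
		}
		if (g->ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Same shape as find_commit_in_graph(): trust a cached position first. */
static int toy_find(const struct toy_commit *c, const struct toy_graph *g,
		    uint32_t *pos)
{
	assert(c && g && pos);
	if (c->graph_pos != TOY_NOT_FROM_GRAPH) {
		*pos = c->graph_pos;
		return 1;
	}
	return toy_bsearch(g, c->id, pos);
}
```

A commit with a cached position costs no search, while a commit that is
missing from the graph costs a full search on every call. That repeated
miss is the work the 'check_graph' parameter lets callers skip.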
>
>> +
>>  static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *c)
>>  {
>>  	struct object_id oid;
>> diff --git a/commit-graph.h b/commit-graph.h
>> index 260a468e73..96cccb10f3 100644
>> --- a/commit-graph.h
>> +++ b/commit-graph.h
>> @@ -17,6 +17,14 @@ char *get_commit_graph_filename(const char *obj_dir);
>>   */
>>  int parse_commit_in_graph(struct commit *item);
>>
>> +/*
>> + * It is possible that we loaded commit contents from the commit buffer,
>> + * but we also want to ensure the commit-graph content is correctly
>> + * checked and filled. Fill the graph_pos and generation members of
>> + * the given commit.
>> + */
>> +void load_commit_graph_info(struct commit *item);
>> +
>>  struct tree *get_commit_tree_in_graph(const struct commit *c);
>>
>>  struct commit_graph {
>> diff --git a/commit.c b/commit.c
>> index a70f120878..9ef6f699bd 100644
>> --- a/commit.c
>> +++ b/commit.c
>> @@ -331,7 +331,7 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
>>  	return ret;
>>  }
>>
>> -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
>> +int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph)
>>  {
>>  	const char *tail = buffer;
>>  	const char *bufptr = buffer;
>> @@ -386,6 +386,9 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>>  	}
>>  	item->date = parse_commit_date(bufptr, tail);
>>
>> +	if (check_graph)
>> +		load_commit_graph_info(item);
>> +
>>  	return 0;
>>  }
>>
>> @@ -412,7 +415,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
>>  		return error("Object %s not a commit",
>>  			     oid_to_hex(&item->object.oid));
>>  	}
>> -	ret = parse_commit_buffer(item, buffer, size);
>> +	ret = parse_commit_buffer(item, buffer, size, 0);
>>  	if (save_commit_buffer && !ret) {
>>  		set_commit_buffer(item, buffer, size);
>>  		return 0;
>> diff --git a/commit.h b/commit.h
>> index 64436ff44e..b5afde1ae9 100644
>> --- a/commit.h
>> +++ b/commit.h
>> @@ -72,7 +72,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
>>   */
>>  struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
>>
>> -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size);
>> +int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
>>  int parse_commit_gently(struct commit *item, int quiet_on_missing);
>>  static inline int parse_commit(struct commit *item)
>>  {
>> diff --git a/object.c b/object.c
>> index e6ad3f61f0..efe4871325 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -207,7 +207,7 @@ struct object *parse_object_buffer(const struct object_id *oid, enum object_type
>>  	} else if (type == OBJ_COMMIT) {
>>  		struct commit *commit = lookup_commit(oid);
>>  		if (commit) {
>> -			if (parse_commit_buffer(commit, buffer, size))
>> +			if (parse_commit_buffer(commit, buffer, size, 1))
>>  				return NULL;
>>  			if (!get_cached_commit_buffer(commit, NULL)) {
>>  				set_commit_buffer(commit, buffer, size);
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4c..0fd4f0b8b6 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1755,7 +1755,7 @@ static void check_commit(const void *buf, size_t size)
>>  {
>>  	struct commit c;
>>  	memset(&c, 0, sizeof(c));
>> -	if (parse_commit_buffer(&c, buf, size))
>> +	if (parse_commit_buffer(&c, buf, size, 0))
>>  		die("corrupt commit");
>>  }