git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Samuel Lijin <sxlijin@gmail.com>
Cc: Jeff Hostetler <git@jeffhostetler.com>,
	Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS
Date: Wed, 3 May 2017 10:14:55 -0700
Message-ID: <CAGZ79kZs8q2cn7wJr-2ZFGZEwSsy8K2RZ_xBXfUuZBiGM0Gjog@mail.gmail.com>
In-Reply-To: <CAJZjrdWF4NLPty81wmPsgUoVz1FddYhTGdjrNuB1LcME5qrwDQ@mail.gmail.com>

On Wed, May 3, 2017 at 4:31 AM, Samuel Lijin <sxlijin@gmail.com> wrote:
> On Tue, May 2, 2017 at 9:05 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>>
>> On 5/2/2017 12:17 AM, Stefan Beller wrote:
>>>
>>> On Mon, May 1, 2017 at 6:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>>
>>>> Stefan Beller <sbeller@google.com> writes:
>>>>
>>>>> This applies to origin/master.
>>>>>
>>>>> For better readability and understandability for newcomers, it is a good
>>>>> idea not to offer two APIs doing the same thing, with one being the
>>>>> #define of the other.
>>>>>
>>>>> In the long run we may want to drop the macros guarded by
>>>>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>>
>>
>> Thank you for bringing this up and making this proposal.
>> I started a similar effort internally last fall, but
>> stopped because of the footprint size.
>>
>>>>
>>>> Why?  Why should we keep typing &the_index, when most of the time we
>>>> are given _the_ index and working on it?
>>>
>>>
>>> As someone knowledgeable about the code base, you know that the cache_*
>>> and index_* functions differ only by an index argument. A newcomer may not
>>> know this, so they wonder why we have (A) so many functions [and which is
>>> the right function to use]; it is an issue of ease of use of the code base.
>>>
>>> Anything you do in submodule land today needs to spawn new processes in
>>> the submodule. This is cumbersome and not performant. So in the far future
>>> we may want to have an abstraction of a repo (B), i.e. all repository
>>> state in one struct/class. That way we can open a submodule in-process and
>>> perform the required actions without spawning a process.
>>>
>>> The road to (B) is a long road, but we have to start somewhere. And this
>>> seemed like a good place, by introducing a dedicated argument for the
>>> repository. In a follow-up in the future we may want to replace &the_index
>>> by "the_main_repo.its_index" and then could also run the commands on other
>>> (submodule) indexes. But more importantly, all these commands would
>>> operate on a repository object.
>>>
>>> In such a far future we would have functions like the cmd_* functions
>>> that would take a repository object instead of doing their setup
>>> discovery on their own.
>>>
>>> Another reason may be the current velocity (or absence of it) w.r.t.
>>> these functions, such that fewer merge conflicts may arise.
>>
>>
>> In addition to (eventually) allowing multiple repos to be open at
>> the same time for submodules, it would also help with various
>> multi-threading efforts.  For example, we have loops that do a
>> "for (k = 0; k < active_nr; k++) {...}"  There is no visual clue
>> in that code that it references "the_index" and therefore should
>> be subject to the same locking.  Granted, this is a trivial example,
>> but it goes to the argument that the code has lots of subtle global
>> variables and macros that make it difficult to reason about the
>> code.
>
> Just to throw out an example, I'm relatively new to the codebase (I've
> been lurking on the mailing list for a few months now) and for a
> recent project (I'm an undergrad wrapping up my senior year, and one
> of my classes' final projects was to do something that involved
> concurrency) I took a shot at parallelizing the estimate_similarity()
> calls in diffcore_rename(). The only way I was able to get it to work
> was by dropping global mutexes in one or two files (the code for those
> mutexes still makes me cringe), because of concurrent writes to global
> data structures.

That sounds like a challenge. As we have many globals, we need to be
very careful about threading.
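
To make the hidden-global point from Jeff's loop example concrete, here
is a minimal sketch in C. It is not the actual cache.h: struct
index_state is reduced to a single field, and count_entries_implicit()
and count_entries_explicit() are made-up names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real struct; only the entry count is modelled. */
struct index_state {
	unsigned int cache_nr;
};

/* The single process-wide index that the compatibility macros hide. */
static struct index_state the_index;

/* Compatibility-macro style: the global is invisible at the call site. */
#define active_nr (the_index.cache_nr)

/* Hypothetical helper: the loop bound silently reads the_index. */
static unsigned int count_entries_implicit(void)
{
	unsigned int k, n = 0;

	for (k = 0; k < active_nr; k++)
		n++;
	return n;
}

/* Hypothetical helper: the caller names the index, so any index works. */
static unsigned int count_entries_explicit(const struct index_state *istate)
{
	unsigned int k, n = 0;

	assert(istate != NULL);
	for (k = 0; k < istate->cache_nr; k++)
		n++;
	return n;
}
```

With the explicit form, a reader can see which index a loop touches, and
a second index (say, one belonging to a submodule) can be passed to the
same function without going through the_index at all.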

Also an interesting discussion about threading:
https://public-inbox.org/git/9e4733910708111412t48c1beaahfbaa2c68a02f64f1@mail.gmail.com/

Are the patches available for discussion?

Thanks,
Stefan
