git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Shourya Shukla <shouryashukla.oo@gmail.com>
Cc: git@vger.kernel.org, chriscool@tuxfamily.org, peff@peff.net,
	t.gummerer@gmail.com
Subject: Re: Conversion of 'git submodule' to C: need some help
Date: Thu, 13 Feb 2020 14:33:58 +0100 (CET)
Message-ID: <nycvar.QRO.7.76.6.2002131412300.46@tvgsbejvaqbjf.bet>
In-Reply-To: <20200209111349.20472-1-shouryashukla.oo@gmail.com>

Hi Shourya,

Just adding a little to what Abhishek said (which was pretty sound
advice!) below.

On Sun, 9 Feb 2020, Shourya Shukla wrote:

> I am facing some problems and would love some insight on them:
>
> 	1. What exactly are we aiming in [3]? To replace the function completely
> 	   or to just add some 'repo_submodule_init' functionality?

If you follow the "Git blame" link in the breadcrumb menu, you will get to
the commit that added the TODO:
https://github.com/periperidip/git/commit/18cfc0886617e28fb6d29d579bec0ffcdb439196

Unfortunately, it does not necessarily help me understand what that TODO
is about. So let's analyze the code:

int add_submodule_odb(const char *path)
{
	struct strbuf objects_directory = STRBUF_INIT;
	int ret = 0;
	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
	if (ret)
		goto done;
	if (!is_directory(objects_directory.buf)) {
		ret = -1;
		goto done;
	}
	add_to_alternates_memory(objects_directory.buf);
done:
	strbuf_release(&objects_directory);
	return ret;
}

Okay, so this just adds the object database of the submodule as an
in-memory alternate, provided that its `objects/` directory exists. If it
does not exist, the function returns -1; in that case the submodule is
probably _already_ using the superproject's database.
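To illustrate the control flow in isolation, here is a minimal sketch. It
is not Git's code: `alternates`, `is_dir()` and `add_odb_sketch()` are
made-up names, and a fixed-size array stands in for whatever
`add_to_alternates_memory()` maintains internally.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_ALTERNATES 8
#define MAX_PATH_LEN 4096

/* Hypothetical stand-in for an in-memory list of extra object stores. */
static char alternates[MAX_ALTERNATES][MAX_PATH_LEN];
static int nr_alternates;

/* Return 1 if `path` names an existing directory, 0 otherwise. */
static int is_dir(const char *path)
{
	struct stat st;

	return !stat(path, &st) && S_ISDIR(st.st_mode);
}

/*
 * Sketch of the control flow only: build "<gitdir>/objects/", bail out
 * with -1 if that is not a directory, otherwise remember it as one more
 * place to look up objects. Nothing is written to disk.
 */
static int add_odb_sketch(const char *submodule_gitdir)
{
	char objects_directory[MAX_PATH_LEN];
	int len = snprintf(objects_directory, sizeof(objects_directory),
			   "%s/objects/", submodule_gitdir);

	if (len < 0 || (size_t)len >= sizeof(objects_directory))
		return -1;	/* path did not fit into the buffer */
	if (!is_dir(objects_directory))
		return -1;	/* no object database of its own */
	if (nr_alternates >= MAX_ALTERNATES)
		return -1;	/* the sketch's fixed-size list is full */
	strcpy(alternates[nr_alternates++], objects_directory);
	return 0;
}
```

Calling `add_odb_sketch()` with a directory that has no `objects/`
subdirectory returns -1 and leaves the list untouched, which mirrors the
`is_directory()` check in the function quoted above.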

To understand what I am talking about, have a look at this document:
https://git-scm.com/docs/gitrepository-layout#Documentation/gitrepository-layout.txt-objects

So what does the function do that was suggested as a better alternative?

int repo_submodule_init(struct repository *subrepo,
			struct repository *superproject,
			const struct submodule *sub)
{
	struct strbuf gitdir = STRBUF_INIT;
	struct strbuf worktree = STRBUF_INIT;
	int ret = 0;

	if (!sub) {
		ret = -1;
		goto out;
	}

	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", sub->path);
	strbuf_repo_worktree_path(&worktree, superproject, "%s", sub->path);

	if (repo_init(subrepo, gitdir.buf, worktree.buf)) {
		/*
		 * If initialization fails then it may be due to the submodule
		 * not being populated in the superproject's worktree.  Instead
		 * we can try to initialize the submodule by finding its gitdir
		 * in the superproject's 'modules' directory.  In this case the
		 * submodule would not have a worktree.
		 */
		strbuf_reset(&gitdir);
		strbuf_repo_git_path(&gitdir, superproject,
				     "modules/%s", sub->name);

		if (repo_init(subrepo, gitdir.buf, NULL)) {
			ret = -1;
			goto out;
		}
	}

	subrepo->submodule_prefix = xstrfmt("%s%s/",
					    superproject->submodule_prefix ?
					    superproject->submodule_prefix :
					    "", sub->path);

out:
	strbuf_release(&gitdir);
	strbuf_release(&worktree);
	return ret;
}

Ah, that populates a complete `struct repository`! I fear, however, that
our object lookup is currently not tied to such a `struct repository`
instance. So I think that this TODO can only be addressed once many more
patch series like
https://lore.kernel.org/git/f1e4da02-9411-8a93-ca62-6d7ae7bf4ae8@gmail.com/
have made it not only to the Git mailing list, but into `master`.
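For what it is worth, the two path-related steps of
`repo_submodule_init()` can be sketched in plain C. This is only an
illustration: `gitdir_candidates()` and `make_submodule_prefix()` are
hypothetical helpers, they use `snprintf()` instead of Git's strbuf API,
and they skip any path adjustment the real functions may perform.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Fill in the two gitdir candidates in the order they are tried:
 * first "<worktree>/<path>/.git", then "<gitdir>/modules/<name>".
 * Both output buffers must hold at least `n` bytes.
 */
static void gitdir_candidates(const char *super_worktree,
			      const char *super_gitdir,
			      const char *sub_path, const char *sub_name,
			      char *first, char *second, size_t n)
{
	snprintf(first, n, "%s/%s/.git", super_worktree, sub_path);
	snprintf(second, n, "%s/modules/%s", super_gitdir, sub_name);
}

/*
 * Compose the prefix the same way the quoted xstrfmt() call does: the
 * superproject's prefix (or "" if it has none), then the submodule
 * path, then a trailing slash. The caller frees the result.
 */
static char *make_submodule_prefix(const char *super_prefix,
				   const char *sub_path)
{
	const char *base = super_prefix ? super_prefix : "";
	size_t len = strlen(base) + strlen(sub_path) + 2; /* '/' and NUL */
	char *out = malloc(len);

	if (out)
		snprintf(out, len, "%s%s/", base, sub_path);
	return out;
}
```

Note that the first candidate is derived from the submodule's path and
the fallback from its name; the two can differ, e.g. after a submodule
has been moved within the superproject.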

> 	2. Something I inferred was that functions with names of the pattern 'strbuf_git_*'
> 	   are trying to 'create a path'(are they physically creating the path or just
> 	   instructing git about them?) while functions of the pattern 'git_*' are trying
> 	   to check some conditions denoted by their function names(for instance
> 	   'git_config_rename_section_in_file')? Is this inference correct to some extent?

All `strbuf_*()` functions work on our "string class" (I forgot who said
it, but it is true that any sufficiently advanced C project sooner or
later develops its own string data type).
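To give an idea of what such a "string class" looks like, here is a
deliberately tiny sketch. It is not Git's `struct strbuf`: `struct sbuf`,
`sbuf_grow()`, `sbuf_addf()` and `sbuf_release()` are hypothetical names,
and the real API is richer and handles allocation failure differently.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature string buffer. */
struct sbuf {
	char *buf;	/* NUL-terminated after every append */
	size_t len;	/* bytes in use, excluding the NUL */
	size_t alloc;	/* bytes allocated */
};

#define SBUF_INIT { NULL, 0, 0 }

/* Ensure room for `extra` more bytes plus the terminating NUL. */
static int sbuf_grow(struct sbuf *sb, size_t extra)
{
	size_t need = sb->len + extra + 1;
	char *p;

	if (need <= sb->alloc)
		return 0;
	if (need < 2 * sb->alloc)
		need = 2 * sb->alloc;	/* grow geometrically */
	p = realloc(sb->buf, need);
	if (!p)
		return -1;
	sb->buf = p;
	sb->alloc = need;
	return 0;
}

/* Append printf-style formatted text; 0 on success, -1 on error. */
static int sbuf_addf(struct sbuf *sb, const char *fmt, ...)
{
	va_list ap;
	int n;

	/* First pass: measure how many bytes the result needs. */
	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (n < 0 || sbuf_grow(sb, (size_t)n))
		return -1;

	/* Second pass: format directly behind the existing contents. */
	va_start(ap, fmt);
	vsnprintf(sb->buf + sb->len, (size_t)n + 1, fmt, ap);
	va_end(ap);
	sb->len += (size_t)n;
	return 0;
}

/* Free the storage and return to the initial empty state. */
static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}
```

Appending `"%s/"` with `"modules"` and then `"%s"` with `"foo"` leaves
`modules/foo` in the buffer. The buffer only ever holds text; building a
path in it this way does not touch the file system.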

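To know whether the functions in question create a path or not, you will
have to find their documentation in the appropriate header file (usually
`strbuf.h`, although path helpers such as `strbuf_git_path_submodule()`
are declared in `path.h`), or absent that, find and understand their
implementation (usually in `strbuf.c`, or `path.c` for the path helpers).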

> 	3. How does one check which all parts of a command have been completed? Is it checked
> 	   by looking at the file history or by comparing with the shell script of the command
> 	   or are there any other means?

You mean whether a scripted command has been completely converted to C?
There is no universal way to do that.

In the case of `git submodule`, I would say that a subcommand is converted
successfully when all parts except for the command-line option parsing
have been moved into the `submodule--helper`. Eventually,
`git-submodule.sh` will only have functions that parse command-line
options and then pass the result on to the helper. At that point, the
command-line option parsing can _also_ be moved into the helper. Or maybe
the entire script could even be converted in one go; I am not sure how big
of a patch that would be.
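On the C side, a helper that receives the already-parsed subcommand can be
pictured as a small dispatch table. The following is a sketch under my own
assumptions, not a copy of `builtin/submodule--helper.c`: the struct
layout, the subcommand functions and `run_helper()` are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subcommand implementations; real ones would do the work. */
static int cmd_status(int argc, const char **argv)
{
	(void)argv;
	printf("status: %d argument(s)\n", argc);
	return 0;
}

static int cmd_sync(int argc, const char **argv)
{
	(void)argv;
	printf("sync: %d argument(s)\n", argc);
	return 0;
}

struct subcommand {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

/* One entry per subcommand that has already been moved into C. */
static const struct subcommand subcommands[] = {
	{ "status", cmd_status },
	{ "sync", cmd_sync },
};

/*
 * Look up argv[0] in the table and run the matching function with the
 * remaining arguments. Returns -1 if no subcommand was given or if the
 * given one is unknown, otherwise whatever the subcommand returns.
 */
static int run_helper(int argc, const char **argv)
{
	size_t i;

	if (argc < 1)
		return -1;
	for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(argv[0], subcommands[i].name))
			return subcommands[i].fn(argc - 1, argv + 1);
	fprintf(stderr, "'%s' is not a known subcommand\n", argv[0]);
	return -1;
}
```

With a layout like this, a subcommand counts as converted once the shell
script merely forwards its parsed options to an entry in such a table.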

> 	4. Is it fine if I am not able to understand the purpose of certain functions right now(such as
> 	   'add_submodule_odb')? I am able to get a rough idea of what the functions are doing but I am
> 	   not able to decode certain functions line-by-line.

It is okay not to understand all the details, but if you want to work on
the code, you will need to understand at least the purpose, and if you
want to come up with a project plan (e.g. for GSoC), it will be _really_
helpful to form an understanding of the implementation details, too.

> Currently, I am studying 'git objects' and the submodule command in depth in the Git documentation.
> What else would you advise me to do to strengthen my understanding of the code and git in general?

I don't know what in particular you want to strengthen. Typically, a good
way to learn enough about the code base in preparation for Google Summer
of Code or Outreachy is to read the code, and whenever anything is
unclear, try to learn about the data structures and/or the underlying
design by studying the files in `Documentation/` (in particular in the
`technical/` subdirectory) whose names seem relevant.

Ciao,
Johannes

>
> Regards,
> Shourya Shukla
>
> [1]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c
> [2]: https://lore.kernel.org/git/20200201173841.13760-1-shouryashukla.oo@gmail.com/
> [3]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c#L168
>
>
