git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC] Git Blog 5
@ 2021-06-20 16:53 ZheNing Hu
  2021-06-21 12:07 ` Christian Couder
  0 siblings, 1 reply; 3+ messages in thread
From: ZheNing Hu @ 2021-06-20 16:53 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Hariom verma,
	Ævar Arnfjörð Bjarmason, Christian Couder

My fifth week blog finished:
The web version is here:
https://adlternative.github.io/GSOC-Git-Blog-5/

## Week5: Tempting apple

This week, I spent a lot of time working on the digital circuit
course design of the school. So this week's patches for git
was completed in a hurry. This week `Ævar Arnfjörð Bjarmason`
gave a lot of useful suggestions for the patch I wrote earlier.
Some are related to code style improvements, and some are
better design ideas.

Before, I wanted to use a `<s, s_size>` style string in `atom_value`
to help copy and compare data containing '\0'.

Like this:

```c
struct atom_value {
const char *s;
size_t s_size;
...
};
```

But `Ævar Arnfjörð Bjarmason` thinks it is more reasonable to
use `strbuf` instead of `<s, s_size>`.

Like this:

```c
struct atom_value {
struct strbuf s;
...
};
```

Since the `strbuf` API has a natural `<s, s_size>`, we can add data
that may contain '\0' through `strbuf_add()`, `strbuf_addbuf()`, and
`strbuf_addf()` can also be used to fill strbuf with format string. Use
 `strbuf_addstr()` to replace `xstrdup()`, `strbuf_add()` to replace
 `xmemdupz()`, this is indeed a very tempting choice.

But in my actual refactoring process, this is not very easy to achieve.

For example, the original interface is like this,

```c
v->s = copy_email(wholine, &used_atom[i]);
```

`v->s` will be filled with the data dynamically allocated by `copy_email()`.

```c
static const char *copy_email(const char *buf, struct used_atom *atom)
{
...
if (!eoemail)
return xstrdup("");
return xmemdupz(email, eoemail - email);
}
```

Then if we want use `strbuf` type `v->s`, we should change the
 `copy_email()` interface parameters and return value.

```c
static void copy_email(struct strbuf *str, const char *buf, struct
used_atom *atom)
{
...
if (!eoemail)
return;
return strbuf_add(str, email, eoemail - email);
}
```

Then the caller can do:

```c
copy_email(&v->s, wholine, &used_atom[i]);
```

This is in line with our expectations.

But something like `fill_remote_ref_details()`, things gradually
become complicated and difficult. `fill_remote_ref_details()`.
Just consider the `show_ref()` called in `fill_remote_ref_details()`,
`show_ref()` may call `shorten_unambiguous_ref()` internally, and
then another function is called internally in `shorten_unambiguous_ref()`...
This makes us fail the method of passing `v->s` in parameters like
`copy_email()` does. Another way of thinking: what if I can "attach"
the data directly? Using `strbuf_attach()` may be a viable option, but...

```c
size_t len;
void *data = shorten_unambiguous_ref(refname, warn_ambiguous_refs);
len = strlen(data);
strbuf_attach(&str, data, len, len);
```

...we need to get the length of the data, but this is not easy to do,
`strlen()` can only be used on data that does not contain '\0',
and we are not sure whether a function like `shorten_unambiguous_ref()`
will return a `NULL`.

Well, this is one of the reasons why I cannot move on.

On the other hand, look at the following piece of code, it appears
in `populate_value()`.

```c
for (i = 0; i < used_atom_cnt; i++) {
struct atom_value *v = &ref->value[i];
if (v->s == NULL && used_atom[i].source == SOURCE_NONE)
return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
oid_to_hex(&ref->objectname), ref->refname);
}
```

We need to determine whether `v->s` equals to `NULL`. We can use
c-style strings to easily distinguish between empty strings(`xstrdup("")`)
and `NULL`, but if we use strbuf, it is not easy to distinguish, because
an empty strbuf has the following characteristics: `s.buf == strbuf_slopbuf`
or `sb->buf[0] == '\0'`. It can be said that we shouldn’t even use `NULL`
to assign to `s.buf`.

So in the end, I rejected this seemingly very attractive solution, and use the
previous strategy: `<s, s_size>`, unless someone can think of a better
solution later `;-)`.

* FootNote:
1. Why is there no `strbuf_empty()` in `strbuf` API? I think this may be
a very important thing.

```c
#define strbuf_empty(sb) \
(sb->buf == strbuf_slopbuf) ? \
(!strbuf_slopbuf[0]) : \
(sb->buf[0] == '\0')
```

2. Another thing worth mentioning is: I will have the school final exam
between July 7th and July 14th, I may be busy during this period.

Thanks for Git community, reviewers and mentors.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-21 13:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-20 16:53 [GSoC] Git Blog 5 ZheNing Hu
2021-06-21 12:07 ` Christian Couder
2021-06-21 13:17   ` ZheNing Hu

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).