Re: [PATCH v3 18/24] read-cache: write index-v5

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Duy Nguyen <pclouds@gmail.com>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Subject: Re: [PATCH v3 18/24] read-cache: write index-v5
Date: Sat, 24 Aug 2013 10:58:44 +0700	[thread overview]
Message-ID: <CACsJy8BocjJqgyJwa8DzwiNdmpu-yJH9vVuEPunxjtRyr-rxYA@mail.gmail.com> (raw)
In-Reply-To: <1376854933-31241-19-git-send-email-t.gummerer@gmail.com>

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +char *super_directory(const char *filename)
> +{
> +       char *slash;
> +
> +       slash = strrchr(filename, '/');
> +       if (slash)
> +               return xmemdupz(filename, slash-filename);
> +       return NULL;
> +}
> +

Why is this function not static? There are a few other in this patch
(I did not notice in others, but I wasn't looking for them..)

But isn't this what dirname() is?

Another point about this function is it returns a new allocated
string, but I see no free() anywhere in this patch. Leak alert!

> +struct directory_entry *init_directory_entry(char *pathname, int len)
> +{
> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
> +
> +       memcpy(de->pathname, pathname, len);
> +       de->pathname[len] = '\0';
> +       de->de_flags      = 0;
> +       de->de_foffset    = 0;
> +       de->de_cr         = 0;
> +       de->de_ncr        = 0;
> +       de->de_nsubtrees  = 0;
> +       de->de_nfiles     = 0;
> +       de->de_nentries   = 0;
> +       memset(de->sha1, 0, 20);
> +       de->de_pathlen    = len;
> +       de->next          = NULL;
> +       de->next_hash     = NULL;
> +       de->ce            = NULL;
> +       de->ce_last       = NULL;
> +       de->conflict      = NULL;
> +       de->conflict_last = NULL;
> +       de->conflict_size = 0;
> +       return de;
> +}

I think this function could be shortened to

struct directory_entry *de = xcalloc(1, directory_entry_size(len));
memcpy(de->pathname, pathname, len);
de->de_pathlen = len;
return de;

> +static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
> +                                            struct hash_table *table,
> +                                            unsigned int *total_dir_len,
> +                                            unsigned int *ndir,
> +                                            struct directory_entry **current)
> +{
> +       struct directory_entry *tmp = NULL, *search, *new, *ret;
> +       uint32_t crc;
> +
> +       search = find_directory(dir, dir_len, &crc, table);
> +       if (search)
> +               return search;
> +       while (!search) {
> +               new = init_directory_entry(dir, dir_len);
> +               insert_directory_entry(new, table, total_dir_len, ndir, crc);
> +               if (!tmp)
> +                       ret = new;
> +               else
> +                       new->de_nsubtrees = 1;
> +               new->next = tmp;
> +               tmp = new;
> +               dir = super_directory(dir);

It feels more natural to create directory_entry(s) from parent to
subdir. If you do so you could reset dir to the remaining of directory
and perform strchr() and do not need to allocate new memory everytime
you call super_directory (because it relies on NUL at the end of the
string).

> +               dir_len = dir ? strlen(dir) : 0;
> +               search = find_directory(dir, dir_len, &crc, table);
> +       }
> +       search->de_nsubtrees++;
> +       (*current)->next = tmp;
> +       while ((*current)->next)
> +               *current = (*current)->next;
> +
> +       return ret;
> +}

> +static struct directory_entry *compile_directory_data(struct index_state *istate,
> +                                                     int nfile,
> +                                                     unsigned int *ndir,
> +                                                     unsigned int *total_dir_len,
> +                                                     unsigned int *total_file_len)
> +{
> +       int i, dir_len = -1;
> +       char *dir;
> +       struct directory_entry *de, *current, *search;
> +       struct cache_entry **cache = istate->cache;
> +       struct conflict_entry *conflict_entry;
> +       struct hash_table table;
> +       uint32_t crc;
> +
> +       init_hash(&table);
> +       de = init_directory_entry("", 0);
> +       current = de;
> +       *ndir = 1;
> +       *total_dir_len = 1;
> +       crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
> +       insert_hash(crc, de, &table);
> +       conflict_entry = NULL;
> +       for (i = 0; i < nfile; i++) {
> +               if (cache[i]->ce_flags & CE_REMOVE)
> +                       continue;
> +
> +               if (dir_len < 0
> +                   || cache[i]->name[dir_len] != '/'

Need a check to make sure name[dir_len] is not out of bound

> +                   || strchr(cache[i]->name + dir_len + 1, '/')
> +                   || cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
> +                                         dir, dir_len)) {

In my opinon, "if (dir_len < 0 || !(must && be && a && subdirectory))"
is easier to read..

> +                       dir = super_directory(cache[i]->name);
> +                       dir_len = dir ? strlen(dir) : 0;
> +                       search = get_directory(dir, dir_len, &table,
> +                                              total_dir_len, ndir,
> +                                              &current);
> +               }
> +               search->de_nfiles++;
> +               *total_file_len += ce_namelen(cache[i]) + 1;
> +               if (search->de_pathlen)
> +                       *total_file_len -= search->de_pathlen + 1;
> +               ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
> +
> +               if (ce_stage(cache[i]) > 0) {
> +                       struct conflict_part *conflict_part;
> +                       if (!conflict_entry ||
> +                           cache_name_compare(conflict_entry->name, conflict_entry->namelen,
> +                                              cache[i]->name, ce_namelen(cache[i]))) {
> +                               conflict_entry = create_conflict_entry_from_ce(cache[i], search->de_pathlen);
> +                               add_conflict_to_directory_entry(search, conflict_entry);
> +                       }
> +                       conflict_part = conflict_part_from_inmemory(cache[i]);
> +                       add_part_to_conflict_entry(search, conflict_entry, conflict_part);
> +               }
> +       }
> +       return de;
> +}
> +

> +static int write_directories(struct directory_entry *de, int fd, int conflict_offset)
> +{
> +       struct directory_entry *current;
> +       struct ondisk_directory_entry ondisk;
> +       int current_offset, offset_write, ondisk_size, foffset;
> +       uint32_t crc;
> +
> +       /*
> +        * This is needed because the compiler aligns structs to sizes multiple
> +        * of 4
> +        */
> +       ondisk_size = sizeof(ondisk.flags)
> +               + sizeof(ondisk.foffset)
> +               + sizeof(ondisk.cr)
> +               + sizeof(ondisk.ncr)
> +               + sizeof(ondisk.nsubtrees)
> +               + sizeof(ondisk.nfiles)
> +               + sizeof(ondisk.nentries)
> +               + sizeof(ondisk.sha1);
> +       current = de;
> +       current_offset = 0;
> +       foffset = 0;
> +       while (current) {
> +               int pathlen;
> +
> +               offset_write = htonl(current_offset);
> +               if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +                       return -1;
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               current_offset += pathlen + 1 + ondisk_size + 4;
> +               current = current->next;
> +       }
> +       /*
> +        * Write one more offset, which points to the end of the entries,
> +        * because we use it for calculating the dir length, instead of
> +        * using strlen.
> +        */
> +       offset_write = htonl(current_offset);
> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +               return -1;
> +       current = de;
> +       while (current) {
> +               crc = 0;
> +               if (current->de_pathlen == 0) {
> +                       if (ce_write(&crc, fd, current->pathname, 1) < 0)
> +                               return -1;
> +               } else {
> +                       char *path;
> +                       path = xmalloc(sizeof(char) * (current->de_pathlen + 2));
> +                       memcpy(path, current->pathname, current->de_pathlen);
> +                       memcpy(path + current->de_pathlen, "/\0", 2);
> +                       if (ce_write(&crc, fd, path, current->de_pathlen + 2) < 0)
> +                               return -1;

xmalloc without free

> +               }
> +               current->de_foffset = foffset;
> +               current->de_cr = conflict_offset;
> +               ondisk_from_directory_entry(current, &ondisk);
> +               if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
> +                       return -1;
> +               crc = htonl(crc);
> +               if (ce_write(NULL, fd, &crc, 4) < 0)
> +                       return -1;
> +               conflict_offset += current->conflict_size;
> +               foffset += current->de_nfiles * 4;
> +               current = current->next;
> +       }
> +       return 0;
> +}
> +
> +static int write_entries(struct index_state *istate,
> +                           struct directory_entry *de,
> +                           int entries,
> +                           int fd,
> +                           int offset_to_offset)
> +{
> +       int offset, offset_write, ondisk_size;
> +       struct directory_entry *current;
> +
> +       offset = 0;
> +       ondisk_size = sizeof(struct ondisk_cache_entry);
> +       current = de;
> +       while (current) {

A short comment a the beginning of this block saying this writes
fileoffsets table would be nice.

> +               int pathlen;
> +               struct cache_entry *ce = current->ce;
> +
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               while (ce) {
> +                       if (ce->ce_flags & CE_REMOVE)
> +                               continue;

How come CE_REMOVE'd entries get here? I thought they were all ignored
at compile_directory_data()

> +                       if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
> +                               ce_smudge_racily_clean_entry(ce);
> +                       if (is_null_sha1(ce->sha1))
> +                               return error("cache entry has null sha1: %s", ce->name);
> +
> +                       offset_write = htonl(offset);
> +                       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +                               return -1;
> +                       offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
> +                       ce = ce->next_ce;
> +               }
> +               current = current->next;
> +       }
> +       /*
> +        * Write one more offset, which points to the end of the entries,
> +        * because we use it for calculating the file length, instead of
> +        * using strlen.
> +        */
> +       offset_write = htonl(offset);
> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +               return -1;
> +
> +       offset = offset_to_offset;
> +       current = de;
> +       while (current) {
> +               int pathlen;
> +               struct cache_entry *ce = current->ce;
> +
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               while (ce) {
> +                       struct ondisk_cache_entry ondisk;
> +                       uint32_t crc, calc_crc;
> +
> +                       if (ce->ce_flags & CE_REMOVE)
> +                               continue;
> +                       calc_crc = htonl(offset);
> +                       crc = crc32(0, (Bytef*)&calc_crc, 4);
> +                       if (ce_write(&crc, fd, ce->name + pathlen,
> +                                       ce_namelen(ce) - pathlen + 1) < 0)
> +                               return -1;
> +                       ondisk_from_cache_entry(ce, &ondisk);
> +                       if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
> +                               return -1;
> +                       crc = htonl(crc);
> +                       if (ce_write(NULL, fd, &crc, 4) < 0)
> +                               return -1;
> +                       offset += 4;
> +                       ce = ce->next_ce;
> +               }
> +               current = current->next;
> +       }
> +       return 0;
> +}

> +static int write_index_v5(struct index_state *istate, int newfd)
> +{
> +       struct cache_header hdr;
> +       struct cache_header_v5 hdr_v5;
> +       struct cache_entry **cache = istate->cache;
> +       struct directory_entry *de;
> +       struct ondisk_directory_entry *ondisk;
> +       unsigned int entries = istate->cache_nr;
> +       unsigned int i, removed, total_dir_len, ondisk_directory_size;
> +       unsigned int total_file_len, conflict_offset, foffsetblock;
> +       unsigned int ndir;
> +       uint32_t crc;
> +
> +       if (istate->filter_opts)
> +               die("BUG: index: cannot write a partially read index");
> +
> +       for (i = removed = 0; i < entries; i++) {
> +               if (cache[i]->ce_flags & CE_REMOVE)
> +                       removed++;
> +       }
> +       hdr.hdr_signature = htonl(CACHE_SIGNATURE);
> +       hdr.hdr_version = htonl(istate->version);
> +       hdr.hdr_entries = htonl(entries - removed);
> +       hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
> +
> +       total_dir_len = 0;
> +       total_file_len = 0;
> +       de = compile_directory_data(istate, entries, &ndir,
> +                                   &total_dir_len, &total_file_len);
> +       hdr_v5.hdr_ndir = htonl(ndir);
> +
> +       /*
> +        * This is needed because the compiler aligns structs to sizes multipe
> +        * of 4
> +        */
> +       ondisk_directory_size = sizeof(ondisk->flags)
> +               + sizeof(ondisk->foffset)
> +               + sizeof(ondisk->cr)
> +               + sizeof(ondisk->ncr)
> +               + sizeof(ondisk->nsubtrees)
> +               + sizeof(ondisk->nfiles)
> +               + sizeof(ondisk->nentries)
> +               + sizeof(ondisk->sha1);

There is a similar statement in read code. This calls for a macro to
share this sum.

> +       foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
> +               + (ndir + 1) * 4
> +               + total_dir_len
> +               + ndir * (ondisk_directory_size + 4);
> +       hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
> +       crc = 0;
> +       if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
> +               return -1;
> +       if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
> +               return -1;
> +       crc = htonl(crc);
> +       if (ce_write(NULL, newfd, &crc, 4) < 0)
> +               return -1;
> +
> +       conflict_offset = foffsetblock +
> +               + (entries - removed + 1) * 4
> +               + total_file_len
> +               + (entries - removed) * (sizeof(struct ondisk_cache_entry) + 4);
> +       if (write_directories(de, newfd, conflict_offset) < 0)
> +               return -1;
> +       if (write_entries(istate, de, entries, newfd, foffsetblock) < 0)
> +               return -1;
> +       if (write_conflicts(istate, de, newfd) < 0)
> +               return -1;
> +       return ce_flush(newfd);
> +}
-- 
Duy

next prev parent reply	other threads:[~2013-08-24  3:59 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
2013-08-18 20:21   ` Eric Sunshine
2013-08-20 19:30   ` Junio C Hamano
2013-08-21  3:05     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
2013-08-20 19:34   ` Junio C Hamano
2013-08-21  3:06     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
2013-08-18 22:34   ` Ramsay Jones
2013-08-20  8:36     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
2013-08-18 20:50   ` Eric Sunshine
2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
2013-08-18 21:06   ` Eric Sunshine
2013-08-20  8:46     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
2013-08-19  1:57   ` Eric Sunshine
2013-08-20 14:01   ` Duy Nguyen
2013-08-20 20:59     ` Thomas Gummerer
2013-08-21  0:44       ` Duy Nguyen
2013-08-20 14:16   ` Duy Nguyen
2013-08-20 21:13     ` Thomas Gummerer
2013-08-23 23:52   ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
2013-08-19  1:59   ` Eric Sunshine
2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-08-24  0:09   ` Duy Nguyen
2013-11-25 15:41     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
2013-08-24  3:58   ` Duy Nguyen [this message]
2013-11-25 15:37     ` Thomas Gummerer
2013-08-24  4:07   ` Duy Nguyen
2013-08-24  9:56     ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-08-21  0:57   ` Duy Nguyen
2013-08-21  4:01     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
2013-08-25  3:07   ` Junio C Hamano
2013-08-25  4:40     ` Duy Nguyen
2013-08-31  5:23     ` Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8BocjJqgyJwa8DzwiNdmpu-yJH9vVuEPunxjtRyr-rxYA@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=ramsay@ramsay1.demon.co.uk \
    --cc=robin.rosenberg@dewire.com \
    --cc=sunshine@sunshineco.com \
    --cc=t.gummerer@gmail.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).