git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [RFC PATCH 000/194] Moving global state into the repository object
@ 2018-02-05 23:51 Stefan Beller
  2018-02-06  0:54 ` Stefan Beller
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Stefan Beller @ 2018-02-05 23:51 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This series moves a lot of global state into the repository struct.
It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
It can be found at https://github.com/stefanbeller/git/tree/object-store

Motivation for this series:
* Easier to reason about code when all state is stored in one place,
  for example for multithreading
* Easier to move to processing submodules in-process instead of
  calling a processes for the submodule handling.
  The last patch demonstrates this.

Usually when approaching large refactoring series, a lot of tedious review
work is involved and yet it is hard to be convinced the series did everything
right.

When reviewing this series in particular it is hard to be convinced if
any function that is converted to take the repository as an argument
is fully converted or if there are any hidden dependencies that are not
obvious from code review.

This is why this series tries to be reviewer friendly by utilizing
machine assistance as much as possible. There are 3 types of patches
in this series:

(A) <area>: add repository argument to <function_foo>
  This sort of patch is just adding the repository as an argument to that
  function. It assumes the given repository is `the_repository` and has
  a compile time check for that assertion using a preprocessor trick.
  As did a compile check after each commit of the series, I don't expect
  the review burden on these patches to be high. Review on these patches
  is mostly checking for formatting errors or if I sneak in malicious
  code -- if you're inclined to believe that.

(B) <area>: allow <function_foo> to handle arbitrary repositories
  This sort of patch is touching code inside the given function only,
  usually replacing all occurrences of `the_repository` by the argument
  `r`. Here the review is critical: Did I miss any function that relies
  on state of `the_repository` ?
  This series was developed by converting all functions of
  packfile/sha1-file/commit/etc using (A); after the demo patch
  was possible, all patches that did (A), but not (B) were deleted.
  Therefore I was confident at time of writing that patch that
  the conversion of a function which doesn't take a repository argument
  is okay.

(C) other patches, such as moving code around,
    demoing this series in the last patch

This approach was chosen as I think it is review friendly, despite the
huge number of patches.

A weakness of this approach is the buildup of a large series, which ignores
the ongoing development. Rebasing that series turned out to be ok, but merging
it with confidence is an issue.

This series was developed partially as a pair programming exercise with
Jonathan Nieder and Brandon Williams.

The idea with the #define trick to separate the two very separate issues
of touching a lot of code and reasoning about its correctness (hidden deps)
is mostly from Jonathan Tan.

Any suggestions welcome!

Thanks,
Stefan

Brandon Williams (4):
  sha1_file: allow add_to_alternates_file to handle arbitrary
    repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories

Jonathan Nieder (50):
  object-store: move packed_git and packed_git_mru to object store
  pack: move prepare_packed_git_run_once to object store
  pack: move approximate object count to object store
  pack: add repository argument to install_packed_git
  pack: add repository argument to prepare_packed_git_one
  pack: add repository argument to rearrange_packed_git
  pack: add repository argument to prepare_packed_git_mru
  pack: add repository argument to prepare_packed_git
  pack: add repository argument to reprepare_packed_git
  pack: add repository argument to sha1_file_name
  pack: add repository argument to map_sha1_file
  sha1_file: allow alt_odb_usable to handle arbitrary repositories
  pack: allow install_packed_git to handle arbitrary repositories
  pack: allow rearrange_packed_git to handle arbitrary repositories
  pack: allow prepare_packed_git_mru to handle arbitrary repositories
  pack: allow prepare_packed_git_one to handle arbitrary repositories
  pack: allow prepare_packed_git to handle arbitrary repositories
  pack: allow reprepare_packed_git to handle arbitrary repositories
  pack: allow sha1_file_name to handle arbitrary repositories
  pack: allow stat_sha1_file to handle arbitrary repositories
  pack: allow open_sha1_file to handle arbitrary repositories
  pack: allow map_sha1_file_1 to handle arbitrary repositories
  pack: allow map_sha1_file to handle arbitrary repositories
  pack: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: add repository argument to sha1_object_info_extended
  object-store: move alternates API to new alternates.h
  object-store: move loose object functions to new loose-object.h
  pack: move struct pack_window and pack_entry to packfile.h
  object-store: move replace_objects back to object-store
  object-store: move lookup_replace_object to replace-object.h
  pack: add repository argument to packed_object_info
  pack: add repository argument to find_pack_entry
  object: add repository argument to lookup_object
  object: add repository argument to grow_object_hash
  object: add repository argument to lookup_commit_reference_gently
  object: add repository argument to lookup_commit
  object: move grafts to object parser
  object: add repository argument to commit_graft_pos
  object: add repository argument to register_commit_graft
  object: add repository argument to read_graft_file
  object: add repository argument to prepare_commit_graft
  object: add repository argument to lookup_commit_graft
  object: add repository argument to lookup_unknown_object
  Rename sha1_object_info.cocci to object_store.cocci
  alternates: add repository argument to add_to_alternates_memory
  object-store: move check_sha1_signature to object-store.h
  object-store: add repository argument to check_sha1_signature
  object-store: add repository argument to read_object
  object-store: add repository argument to read_sha1_file_extended
  object: move read_object_with_reference to object.h

Stefan Beller (140):
  repository: introduce object store field
  object-store: move alt_odb_list and alt_odb_tail to object store
  sha1_file: add repository argument to alt_odb_usable
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to sha1_loose_object_info
  object-store: add repository argument to prepare_alt_odb
  object-store: add repository argument to foreach_alt_odb
  object-store: allow prepare_alt_odb to handle arbitrary repositories
  object-store: allow foreach_alt_odb to handle arbitrary repositories
  replace_object.c: rename to use dash in file name
  replace-object: move replace_object to object store
  object-store: move object access functions to object-store.h
  replace-object: add repository argument to do_lookup_replace_object
  replace-object: move replace objects prepared flag to object store
  replace-object: check_replace_refs is safe in multi repo environment
  refs: add repository argument to for_each_replace_ref
  refs: add repository argument to get_main_ref_store
  replace-object: add repository argument to register_replace_object
  replace-object: add repository argument to register_replace_ref
  replace-object: add repository argument to replace_object_pos
  replace-object: allow replace_object_pos to handle arbitrary
    repositories
  replace-object: allow register_replace_object to handle arbitrary
    repositories
  replace-object: add repository argument to prepare_replace_object
  refs: store the main ref store inside the repository struct
  refs: allow for_each_replace_ref to handle arbitrary repositories
  replace-object: allow prepare_replace_object to handle arbitrary
    repositories
  replace_object: allow do_lookup_replace_object to handle arbitrary
    repositories
  replace-object: add repository argument to lookup_replace_object
  repository: allow lookup_replace_object to handle arbitrary
    repositories
  object-store: add repository argument to sha1_object_info
  pack: add repository argument to retry_bad_packed_offset
  pack: add repository argument to packed_to_object_type
  packfile: add repository argument to read_object
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to cache_or_unpack_entry
  pack: allow find_pack_entry to handle arbitrary repositories
  object-store: allow sha1_object_info to handle arbitrary repositories
  fetch, push: do not use submodule as alternate in has_commits check
  push: add test showing bad interaction of replace refs and submodules
  replace_object: allow register_replace_ref to handle arbitrary
    repositories
  cache.h: migrate the definition of object_id to object.h
  repository: introduce object parser field
  object: add repository argument to parse_object
  object: add repository argument to create_object
  blob: add repository argument to lookup_blob
  tree: add repository argument to lookup_tree
  tag: add repository argument to lookup_tag
  tag: add repository argument to parse_tag_buffer
  tag: add repository argument to deref_tag
  object: add repository argument to lookup_commit_reference
  commit: add repository argument to parse_commit_buffer
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  object: allow lookup_object to handle arbitrary repositories
  object: allow lookup_unknown_object to handle arbitrary repositories
  object: add repository argument to parse_object_buffer
  repository: keep track of all open repositories
  alternates: add repository argument to add_to_alternates_file
  object-store: add repository argument to read_sha1_file
  packfile: add repository argument to has_packed_and_bad
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  streaming: add repository argument to open_istream_fn
  streaming: add repository argument to open_istream
  streaming: add repository argument to istream_source
  streaming: allow istream_source to handle arbitrary repositories
  sha1_file: allow read_object to handle arbitrary repositories
  object-store.h: allow read_sha1_file{_extended} to handle arbitrary
    repositories
  streaming: allow open_istream_incore to handle arbitrary repositories
  streaming: allow open_istream_pack_non_delta to handle arbitrary
    repositories
  streaming: allow open_istream_loose to handle arbitrary repositories
  streaming: allow open_istream to handle arbitrary repositories
  alternates: convert add_to_alternates_memory to handle arbitrary repos
  object: add repository argument to object_as_type
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  alloc: allow arbitrary repositories for alloc functions
  object: allow object_as_type to handle arbitrary repositories
  commit: allow lookup_commit to handle arbitrary repositories
  object: add repository argument to parse_commit_gently
  commit: add repository argument to parse_commit
  commit: add repository argument to set_commit_buffer
  commit: add repository argument to get_cached_commit_buffer
  commit: add repository argument to unuse_commit_buffer
  commit: add repository argument to free_commit_buffer
  commit: allow commit buffer functions to handle arbitrary repositories
  tree: allow lookup_tree to handle arbitrary repositories
  cache: convert get_graft_file to handle arbitrary repositories
  shallow: add repository argument to set_alternate_shallow_file
  shallow: add repository argument to register_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to is_repository_shallow
  migrate cached path to use the_repository
  shallow: migrate shallow information into the object parser
  commit: allow prepare_commit_graft to handle arbitrary repositories
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow arbitrary repository argument for parse_commit_buffer
  commit: allow parse_commit_gently to handle arbitrary repository
  commit: add repository argument to get_merge_bases_many_0
  commit: add repository argument to merge_bases_many
  commit: add repository argument to paint_down_to_common
  commit: allow parse_commit to handle arbitrary repositories
  commit: allow paint_down_to_common to handle arbitrary repositories
  commit: allow merge_bases_many to handle arbitrary repositories
  commit: add repository argument to remove_redundant
  commit: allow remove_redundant to handle arbitrary repositories
  commit: allow get_merge_bases_many_0 to handle arbitrary repositories
  commit: add repository argument to get_merge_bases
  commit: allow get_merge_bases to handle arbitrary repositories
  blob: allow lookup_blob to handle arbitrary repositories
  tag: allow lookup_tag to handle arbitrary repositories
  tag: allow parse_tag_buffer to handle arbitrary repositories
  object: allow parse_object_buffer to handle arbitrary repositories
  objects: allow check_sha1_signature to handle arbitrary repositories
  object: allow parse_object to handle arbitrary repositories
  tag: allow deref_tag to handle arbitrary repositories
  commit: allow lookup_commit_reference_gently to handle arbitrary
    repositories
  commit: allow lookup_commit_reference to handle arbitrary repositories
  commit: add repository argument to get_commit_buffer
  commit: add repository argument to logmsg_reencode
  pretty: add repository argument to format_commit_message
  commit: allow get_commit_buffer to handle arbitrary repositories
  pretty: allow logmsg_reencode to handle arbitrary repositories
  commit: add repository argument to in_merge_bases_many
  commit: add repository argument to in_merge_bases
  commit: allow in_merge_bases_many to handle arbitrary repositories
  commit: allow in_merge_bases to handle arbitrary repositories
  pretty: allow format_commit_message to handle arbitrary repositories
  submodule: add repository argument to print_submodule_summary
  submodule: Reorder open_submodule function
  submodule: use submodule repos for object lookup

 Makefile                                   |   2 +-
 alloc.c                                    |  48 +++--
 alternates.h                               |  68 +++++++
 apply.c                                    |   8 +-
 archive-tar.c                              |   6 +-
 archive-zip.c                              |   8 +-
 archive.c                                  |   9 +-
 bisect.c                                   |   6 +-
 blame.c                                    |  40 ++--
 blob.c                                     |  10 +-
 blob.h                                     |   2 +-
 branch.c                                   |  16 +-
 builtin/am.c                               |  14 +-
 builtin/blame.c                            |   9 +-
 builtin/branch.c                           |  11 +-
 builtin/cat-file.c                         |  28 ++-
 builtin/checkout.c                         |  12 +-
 builtin/clone.c                            |  14 +-
 builtin/commit-tree.c                      |   5 +-
 builtin/commit.c                           |  63 +++---
 builtin/count-objects.c                    |  12 +-
 builtin/describe.c                         |  19 +-
 builtin/diff-tree.c                        |   9 +-
 builtin/diff.c                             |   7 +-
 builtin/difftool.c                         |   5 +-
 builtin/fast-export.c                      |  30 +--
 builtin/fetch.c                            |  21 +-
 builtin/fmt-merge-msg.c                    |  19 +-
 builtin/fsck.c                             |  44 ++--
 builtin/gc.c                               |   8 +-
 builtin/grep.c                             |  10 +-
 builtin/hash-object.c                      |   1 +
 builtin/index-pack.c                       |  26 ++-
 builtin/log.c                              |  26 ++-
 builtin/ls-files.c                         |   2 +-
 builtin/ls-tree.c                          |   4 +-
 builtin/merge-base.c                       |  11 +-
 builtin/merge-tree.c                       |  12 +-
 builtin/merge.c                            |  41 ++--
 builtin/mktag.c                            |  10 +-
 builtin/mktree.c                           |   4 +-
 builtin/name-rev.c                         |  15 +-
 builtin/notes.c                            |  16 +-
 builtin/pack-objects.c                     |  66 +++---
 builtin/pack-redundant.c                   |   8 +-
 builtin/pack-refs.c                        |   3 +-
 builtin/prune-packed.c                     |   1 +
 builtin/prune.c                            |   8 +-
 builtin/pull.c                             |  19 +-
 builtin/receive-pack.c                     |  14 +-
 builtin/reflog.c                           |  21 +-
 builtin/remote.c                           |   1 +
 builtin/replace.c                          |  24 ++-
 builtin/reset.c                            |  11 +-
 builtin/rev-list.c                         |   7 +-
 builtin/rev-parse.c                        |  11 +-
 builtin/shortlog.c                         |   5 +-
 builtin/show-branch.c                      |   9 +-
 builtin/show-ref.c                         |   1 +
 builtin/submodule--helper.c                |   5 +-
 builtin/tag.c                              |  12 +-
 builtin/unpack-file.c                      |   4 +-
 builtin/unpack-objects.c                   |  15 +-
 builtin/verify-commit.c                    |   7 +-
 bulk-checkin.c                             |   3 +-
 bundle.c                                   |   9 +-
 cache-tree.c                               |   4 +-
 cache.h                                    | 315 ++---------------------------
 combine-diff.c                             |   5 +-
 commit.c                                   | 241 ++++++++++++----------
 commit.h                                   |  49 +++--
 config.c                                   |   2 +-
 contrib/coccinelle/cached_path.cocci       |  44 ++++
 contrib/coccinelle/object_parser.cocci     | 125 ++++++++++++
 contrib/coccinelle/object_store.cocci      |  74 +++++++
 contrib/coccinelle/packed_git.cocci        |  15 ++
 contrib/coccinelle/submodule_reading.cocci |  34 ++++
 convert.c                                  |   1 +
 diff.c                                     |   8 +-
 diffcore-rename.c                          |   1 +
 dir.c                                      |   4 +-
 entry.c                                    |   5 +-
 environment.c                              |  10 +-
 fast-import.c                              |  45 +++--
 fetch-pack.c                               |  47 +++--
 fsck.c                                     |  20 +-
 git.c                                      |   2 +-
 grep.c                                     |   5 +-
 http-backend.c                             |  12 +-
 http-push.c                                |  24 ++-
 http-walker.c                              |   5 +-
 http.c                                     |  10 +-
 line-log.c                                 |   2 +-
 list-objects-filter.c                      |   2 +-
 list-objects.c                             |   4 +-
 log-tree.c                                 |  18 +-
 loose-object.h                             |  91 +++++++++
 mailmap.c                                  |   4 +-
 match-trees.c                              |   6 +-
 merge-blobs.c                              |   8 +-
 merge-recursive.c                          |  30 +--
 mru.h                                      |   1 +
 notes-cache.c                              |   9 +-
 notes-merge.c                              |  15 +-
 notes-utils.c                              |   6 +-
 notes.c                                    |  12 +-
 object-store.h                             | 183 +++++++++++++++++
 object.c                                   | 106 +++++-----
 object.h                                   |  82 +++++++-
 pack-bitmap-write.c                        |   7 +-
 pack-bitmap.c                              |   6 +-
 pack-check.c                               |   6 +-
 pack-revindex.c                            |   1 +
 pack.h                                     |   1 +
 packfile.c                                 | 149 ++++++++------
 packfile.h                                 |  41 +++-
 parse-options-cb.c                         |   2 +-
 path.c                                     |  18 +-
 path.h                                     |  40 +++-
 pretty.c                                   |  24 ++-
 pretty.h                                   |   2 +-
 reachable.c                                |  12 +-
 read-cache.c                               |   7 +-
 ref-filter.c                               |  15 +-
 reflog-walk.c                              |   3 +-
 refs.c                                     |  87 ++++----
 refs.h                                     |   4 +-
 refs/files-backend.c                       |   2 +-
 remote-testsvn.c                           |   6 +-
 remote.c                                   |  27 ++-
 replace_object.c => replace-object.c       |  70 +++----
 replace-object.h                           |  34 ++++
 repository.c                               |  84 ++++++--
 repository.h                               |  33 ++-
 rerere.c                                   |  12 +-
 revision.c                                 |  48 ++---
 send-pack.c                                |   7 +-
 sequencer.c                                |  93 +++++----
 server-info.c                              |  12 +-
 sha1_file.c                                | 236 +++++++++++----------
 sha1_name.c                                |  75 ++++---
 shallow.c                                  | 113 ++++++-----
 streaming.c                                |  33 +--
 streaming.h                                |   2 +-
 submodule-config.c                         |   3 +-
 submodule.c                                | 100 +++++----
 t/helper/test-example-decorate.c           |   6 +-
 t/helper/test-ref-store.c                  |   6 +-
 t/helper/test-revision-walking.c           |   2 +-
 t/t5531-deep-submodule-push.sh             |  16 ++
 tag.c                                      |  36 ++--
 tag.h                                      |   7 +-
 tmp-objdir.c                               |   4 +-
 transport.c                                |   5 +-
 tree-walk.c                                |   5 +-
 tree.c                                     |  24 ++-
 tree.h                                     |   2 +-
 upload-pack.c                              |  25 ++-
 walker.c                                   |  17 +-
 wt-status.c                                |  10 +-
 xdiff-interface.c                          |   4 +-
 161 files changed, 2669 insertions(+), 1596 deletions(-)
 create mode 100644 alternates.h
 create mode 100644 contrib/coccinelle/cached_path.cocci
 create mode 100644 contrib/coccinelle/object_parser.cocci
 create mode 100644 contrib/coccinelle/object_store.cocci
 create mode 100644 contrib/coccinelle/packed_git.cocci
 create mode 100644 contrib/coccinelle/submodule_reading.cocci
 create mode 100644 loose-object.h
 create mode 100644 object-store.h
 rename replace_object.c => replace-object.c (55%)
 create mode 100644 replace-object.h

-- 
2.15.1.433.g936d1b9894.dirty


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
@ 2018-02-06  0:54 ` Stefan Beller
  2018-02-06  3:32 ` brian m. carlson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2018-02-06  0:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

On Mon, Feb 5, 2018 at 3:51 PM, Stefan Beller <sbeller@google.com> wrote:

>
> Any suggestions welcome!
>

I wouldn't say sending out this many patches is a smooth experience:

* After trying it out internally, it stopped at patch ~80 and the rate
limiter kicked in.
* Okay fine, I'll rate limit myself to be able to send out as many patches!
* Apparently I did not understand the docs correctly, and after reading the code
  a patch was made:
  https://public-inbox.org/git/20180206000743.217503-1-sbeller@google.com/
  I think the sensible thing would be to error out instead of ignoring the
  relogin parameter.
  This is why patches 21/22 ff. have a weird in-reply-to setup, as I hit CTRL-C
  to stop the unlimited sending and picked up from there using the rate limited
  --relogin-delay=25 --batch-size=3
* You'll notice that at patches 99/100 ff the same weird in-reply-to appeared.
  One patch has had a missing '>' in the signoff, such that the server refused
  to cc appropriately ("Stefan" not a valid recipient)
  It would be nice if we could check for that before even sending out emails,
  but I could live with this blamed on the user. (It was my mistake)
* Patch 118 just bounced. "Message rejected."
  It is found at
https://github.com/stefanbeller/git/commit/141ba1f82c223636728a476f9acc1229f353a381

Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
  2018-02-06  0:54 ` Stefan Beller
@ 2018-02-06  3:32 ` brian m. carlson
  2018-02-06 20:25 ` Stefan Beller
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2018-02-06  3:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3390 bytes --]

On Mon, Feb 05, 2018 at 03:51:54PM -0800, Stefan Beller wrote:
> This series moves a lot of global state into the repository struct.
> It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
> It can be found at https://github.com/stefanbeller/git/tree/object-store
> 
> Motivation for this series:
> * Easier to reason about code when all state is stored in one place,
>   for example for multithreading
> * Easier to move to processing submodules in-process instead of
>   calling a processes for the submodule handling.
>   The last patch demonstrates this.

Of course, I like performance improvements (who doesn't?), but I really
like the way this series gets rid of various global variables.  I much
prefer to have fewer globals when possible.

> This is why this series tries to be reviewer friendly by utilizing
> machine assistance as much as possible. There are 3 types of patches
> in this series:
> 
> (A) <area>: add repository argument to <function_foo>
>   This sort of patch is just adding the repository as an argument to that
>   function. It assumes the given repository is `the_repository` and has
>   a compile time check for that assertion using a preprocessor trick.
>   As did a compile check after each commit of the series, I don't expect
>   the review burden on these patches to be high. Review on these patches
>   is mostly checking for formatting errors or if I sneak in malicious
>   code -- if you're inclined to believe that.
> 
> (B) <area>: allow <function_foo> to handle arbitrary repositories
>   This sort of patch is touching code inside the given function only,
>   usually replacing all occurrences of `the_repository` by the argument
>   `r`. Here the review is critical: Did I miss any function that relies
>   on state of `the_repository` ?
>   This series was developed by converting all functions of
>   packfile/sha1-file/commit/etc using (A); after the demo patch
>   was possible, all patches that did (A), but not (B) were deleted.
>   Therefore I was confident at time of writing that patch that
>   the conversion of a function which doesn't take a repository argument
>   is okay.
> 
> (C) other patches, such as moving code around,
>     demoing this series in the last patch
> 
> This approach was chosen as I think it is review friendly, despite the
> huge number of patches.

I very much appreciate this approach.  It made reviewing things much
nicer.

> A weakness of this approach is the buildup of a large series, which ignores
> the ongoing development. Rebasing that series turned out to be ok, but merging
> it with confidence is an issue.

That's my concern as well.  This necessarily conflicts with some of the
object_id and the_hash_algo work, but rebasing work on top of it
shouldn't be too awful once it's ready to be picked up.  In fact, long
term, it makes that work easier, so I'm all for it.

I'm sure Junio will have thoughts on the size of the series and
potential conflicts in flight, but I'll let him articulate those.  The
series could in theory be split into pieces the way you've done it: it
would just be less convenient to work with some of the functions in the
mean time.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
  2018-02-06  0:54 ` Stefan Beller
  2018-02-06  3:32 ` brian m. carlson
@ 2018-02-06 20:25 ` Stefan Beller
  2018-02-06 23:46   ` Stefan Beller
  2018-02-07  9:54   ` Eric Sunshine
  2018-02-07 11:48 ` Duy Nguyen
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 11+ messages in thread
From: Stefan Beller @ 2018-02-06 20:25 UTC (permalink / raw)
  To: git, Eric Sunshine; +Cc: Stefan Beller

> Any suggestions welcome!

Eric repeatedly points out leaking memory.

As of today we do not care about memory leaking as it is cleaned
up at the end of the program anyway, for example the objects
hash table is never cleared.

In a resend I will put the infrastructure in place to free the memory via
adding

  raw_object_store_clear(struct raw_object_store *)
  object_parser_clear(struct object_parser *)
  ref_store_clear(struct ref_store *)

and calling these functions on repo destruction. The functions
itself will be empty code-wise and contain TODO comments listing
all variables that need care.

Follow up patches can figure out what is best to do, such as closing
the memleaks. As repo_clear is not called for the_repository
we'd even keep the property of relying on fast cleanup by the OS.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-06 20:25 ` Stefan Beller
@ 2018-02-06 23:46   ` Stefan Beller
  2018-02-07  9:54   ` Eric Sunshine
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2018-02-06 23:46 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

It took some time to do fixes in such a long series, mostl of my time spent in
rebasing and being extra careful about selecting the right commit to edit.
Based on feedback so far I have queued the changes below.
The changes are also available at https://github.com/stefanbeller/git/tree/object-store-v2
I do not plan on sending out this series today, as I would want
to wait a couple of days between resending for such a large series.

I think I addressed all issues raised, except finishing the memleak
issues, which I'll continue working on.

Stefan

diff --git c/object-store.h w/object-store.h
index 62763b8412..a6051ccb73 100644
--- c/object-store.h
+++ w/object-store.h
@@ -60,6 +60,8 @@ struct raw_object_store {
 #define RAW_OBJECT_STORE_INIT \
 	{ NULL, MRU_INIT, ALTERNATES_INIT, { NULL, 0, 0, 0 }, 0, 0, 0 }
 
+extern void raw_object_store_clear(struct raw_object_store *o);
+
 struct packed_git {
 	struct packed_git *next;
 	struct pack_window *windows;
diff --git c/object.c w/object.c
index 2fe5fac3ce..f13f9d97f4 100644
--- c/object.c
+++ w/object.c
@@ -440,3 +440,20 @@ void clear_object_flags(unsigned flags)
 			obj->flags &= ~flags;
 	}
 }
+
+
+void raw_object_store_clear(struct raw_object_store *o)
+{
+	/* TODO: free alt_odb_list/tail */
+	/* TODO: clear packed_git, packed_git_mru */
+}
+
+void object_parser_clear(struct object_parser *o)
+{
+	/*
+	 * TOOD free objects in o->obj_hash.
+	 *
+	 * As objects are allocated in slabs (see alloc.c), we do
+	 * not need to free each object, but each slab instead.
+	 */
+}
diff --git c/object.h w/object.h
index 900f1b6611..0b42b09322 100644
--- c/object.h
+++ w/object.h
@@ -43,6 +43,8 @@ extern struct alloc_state the_repository_object_state;
 	&the_repository_object_state, \
 	0 }
 
+extern void object_parser_clear(struct object_parser *o);
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
diff --git c/repository.c w/repository.c
index 64fb6d8b34..d5ea158b26 100644
--- c/repository.c
+++ w/repository.c
@@ -141,6 +141,9 @@ static void repo_clear(struct repository *repo)
 	FREE_AND_NULL(repo->worktree);
 	FREE_AND_NULL(repo->submodule_prefix);
 
+	raw_object_store_clear(&repo->objects);
+	object_parser_clear(&repo->parsed_objects);
+
 	if (repo->config) {
 		git_configset_clear(repo->config);
 		FREE_AND_NULL(repo->config);
diff --git c/submodule.c w/submodule.c
index 9bd337ce99..c9634f84ef 100644
--- c/submodule.c
+++ w/submodule.c
@@ -494,10 +494,7 @@ static int open_submodule(struct repository *out, const char *path)
 {
 	struct strbuf sb = STRBUF_INIT;
 
-	if (submodule_to_gitdir(&sb, path))
-		return -1;
-
-	if (repo_init(out, sb.buf, NULL)) {
+	if (submodule_to_gitdir(&sb, path) || repo_init(out, sb.buf, NULL)) {
 		strbuf_release(&sb);
 		return -1;
 	}
diff --git c/t/t5531-deep-submodule-push.sh w/t/t5531-deep-submodule-push.sh
index 8b2aa5a0f4..39cb2c1c34 100755
--- c/t/t5531-deep-submodule-push.sh
+++ w/t/t5531-deep-submodule-push.sh
@@ -308,22 +308,6 @@ test_expect_success 'submodule entry pointing at a tag is error' '
 	test_i18ngrep "is a tag, not a commit" err
 '
 
-test_expect_success 'replace ref does not interfere with submodule access' '
-	test_commit -C work/gar/bage one &&
-	test_commit -C work/gar/bage two &&
-	git -C work/gar/bage reset HEAD^^ &&
-	git -C work/gar/bage replace two one &&
-	test_when_finished "git -C work/gar/bage replace -d two" &&
-
-	test_commit -C work/gar/bage three &&
-	git -C work add gar/bage &&
-	git -C work commit -m "advance submodule" &&
-
-	git -C work push --recurse-submodules=on-demand ../pub.git master 2>err &&
-	! grep error err &&
-	! grep fatal err
-'
-
 test_expect_success 'push fails if recurse submodules option passed as yes' '
 	(
 		cd work/gar/bage &&

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-06 20:25 ` Stefan Beller
  2018-02-06 23:46   ` Stefan Beller
@ 2018-02-07  9:54   ` Eric Sunshine
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Sunshine @ 2018-02-07  9:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Tue, Feb 6, 2018 at 3:25 PM, Stefan Beller <sbeller@google.com> wrote:
>> Any suggestions welcome!
>
> Eric repeatedly points out leaking memory.
>
> As of today we do not care about memory leaking as it is cleaned
> up at the end of the program anyway, for example the objects
> hash table is never cleared.

Is this still true/desirable when multiple 'repos' are involved?

> In a resend I will put the infrastructure in place to free the memory via
> adding
>
>   raw_object_store_clear(struct raw_object_store *)
>   object_parser_clear(struct object_parser *)
>   ref_store_clear(struct ref_store *)
>
> and calling these functions on repo destruction. The functions
> itself will be empty code-wise and contain TODO comments listing
> all variables that need care.

I'm confused. If you go to the effort of inserting TODO's why not go
all the way and instead insert the actual code?

> Follow up patches can figure out what is best to do, such as closing
> the memleaks. As repo_clear is not called for the_repository
> we'd even keep the property of relying on fast cleanup by the OS.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
                   ` (2 preceding siblings ...)
  2018-02-06 20:25 ` Stefan Beller
@ 2018-02-07 11:48 ` Duy Nguyen
  2018-02-07 18:06   ` Stefan Beller
  2018-02-07 16:39 ` Jeff Hostetler
  2018-02-08 15:42 ` Elijah Newren
  5 siblings, 1 reply; 11+ messages in thread
From: Duy Nguyen @ 2018-02-07 11:48 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Tue, Feb 6, 2018 at 6:51 AM, Stefan Beller <sbeller@google.com> wrote:
> This series moves a lot of global state into the repository struct.
> It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
> It can be found at https://github.com/stefanbeller/git/tree/object-store
>
> Motivation for this series:
> * Easier to reason about code when all state is stored in one place,
>   for example for multithreading
> * Easier to move to processing submodules in-process instead of
>   calling a processes for the submodule handling.
>   The last patch demonstrates this.

I have a mixed feeling about this. The end game is to keep
"the_repository" references as minimum as possible, right? Like we
only need to mention in in cmd_xxx() and not all over the place. If
so, yay!!

Something else.. maybe "struct repository *" should not be a universal
function argument to avoid global states. Like sha1_file.c is mostly about the
object store, and I see that you added object store struct (nice!).
These sha1 related function should take the object_store* (which
probably is a combination of both packed and loose stores since they
depend on each other), not repository*. This way if a function needs
both access to object store and ref store, we can see the two
dependencies from function arguments.

The alternate object store, if modeled right, could share the same
object store interface. But I don't think we should do anything that
big right now, just put alternate object store inside object_store.

Similarly those functions in blob.c, commit.c, tree.c... should take
object_parser* as argument instead of repository*. Maybe there's a
better name for this than object_parser. parsed_object_store I guess.
Or maybe just object_pool.
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
                   ` (3 preceding siblings ...)
  2018-02-07 11:48 ` Duy Nguyen
@ 2018-02-07 16:39 ` Jeff Hostetler
  2018-02-08 15:42 ` Elijah Newren
  5 siblings, 0 replies; 11+ messages in thread
From: Jeff Hostetler @ 2018-02-07 16:39 UTC (permalink / raw)
  To: Stefan Beller, git



On 2/5/2018 6:51 PM, Stefan Beller wrote:
> This series moves a lot of global state into the repository struct.
> It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
> It can be found at https://github.com/stefanbeller/git/tree/object-store
> 
> Motivation for this series:
> * Easier to reason about code when all state is stored in one place,
>    for example for multithreading
> * Easier to move to processing submodules in-process instead of
>    calling a processes for the submodule handling.
>    The last patch demonstrates this.
[...]

Very nice.  My eyes glazed over a few times, but I like the
direction you're heading here.

Thanks for tackling this!
Jeff



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-07 11:48 ` Duy Nguyen
@ 2018-02-07 18:06   ` Stefan Beller
  2018-02-10 10:36     ` Duy Nguyen
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Beller @ 2018-02-07 18:06 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, Feb 7, 2018 at 3:48 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, Feb 6, 2018 at 6:51 AM, Stefan Beller <sbeller@google.com> wrote:
>> This series moves a lot of global state into the repository struct.
>> It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
>> It can be found at https://github.com/stefanbeller/git/tree/object-store
>>
>> Motivation for this series:
>> * Easier to reason about code when all state is stored in one place,
>>   for example for multithreading
>> * Easier to move to processing submodules in-process instead of
>>   calling a processes for the submodule handling.
>>   The last patch demonstrates this.
>
> I have a mixed feeling about this. The end game is to keep
> "the_repository" references as minimum as possible, right? Like we
> only need to mention in in cmd_xxx() and not all over the place. If
> so, yay!!

Yes. And the super-end-game long after this series is to have
    cmd_xxx(struct repository *r, argc, argv)
or so.

> Something else.. maybe "struct repository *" should not be a universal
> function argument to avoid global states. Like sha1_file.c is mostly about the
> object store, and I see that you added object store struct (nice!).
> These sha1 related function should take the object_store* (which
> probably is a combination of both packed and loose stores since they
> depend on each other), not repository*. This way if a function needs
> both access to object store and ref store, we can see the two
> dependencies from function arguments.

I tried that in the beginning and it blew up a couple of times when I realized
that I forgot to pass through one of these dependencies.
Maybe we can go to the repository and shrink the scope in a follow up?

> The alternate object store, if modeled right, could share the same
> object store interface. But I don't think we should do anything that
> big right now, just put alternate object store inside object_store.

yup that is the case, see struct raw_object_store at the end of the series
https://github.com/stefanbeller/git/blob/object-store-v2/object-store.h

> Similarly those functions in blob.c, commit.c, tree.c... should take
> object_parser* as argument instead of repository*. Maybe there's a
> better name for this than object_parser. parsed_object_store I guess.
> Or maybe just object_pool.

Note that the initial few patches are misleading in the name,
https://public-inbox.org/git/20180205235735.216710-59-sbeller@google.com/
which splits up the object handling into two layers, the "physical" layer
(where to get the data from, mmaping pack files, alternates, streams of bytes),
which is "struct raw_object_store objects;" in the repository, and the
"object" layer, which is about parsing the objects and making sense of the data
(which tree belongs to a commit, dereferencing tags)

So maybe I'll try to make the first layer into its own series, such
that we'll have a smaller series.

Thanks,
Stefan

> --
> Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
                   ` (4 preceding siblings ...)
  2018-02-07 16:39 ` Jeff Hostetler
@ 2018-02-08 15:42 ` Elijah Newren
  5 siblings, 0 replies; 11+ messages in thread
From: Elijah Newren @ 2018-02-08 15:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Mon, Feb 5, 2018 at 3:51 PM, Stefan Beller <sbeller@google.com> wrote:
> This series moves a lot of global state into the repository struct.
> It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
> It can be found at https://github.com/stefanbeller/git/tree/object-store
>
> Motivation for this series:
> * Easier to reason about code when all state is stored in one place,
>   for example for multithreading
> * Easier to move to processing submodules in-process instead of
>   calling a processes for the submodule handling.
>   The last patch demonstrates this.

For what it's worth...I spot checked some random patches, looked
closely at the changes to merge-recursive, and looked at what was
happening to cache.h

I like how you structured the series, and I like the goals.  Didn't
find any problems in the small pieces I looked at.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 000/194] Moving global state into the repository object
  2018-02-07 18:06   ` Stefan Beller
@ 2018-02-10 10:36     ` Duy Nguyen
  0 siblings, 0 replies; 11+ messages in thread
From: Duy Nguyen @ 2018-02-10 10:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Thu, Feb 8, 2018 at 1:06 AM, Stefan Beller <sbeller@google.com> wrote:
>> Something else.. maybe "struct repository *" should not be a universal
>> function argument to avoid global states. Like sha1_file.c is mostly about the
>> object store, and I see that you added object store struct (nice!).
>> These sha1 related function should take the object_store* (which
>> probably is a combination of both packed and loose stores since they
>> depend on each other), not repository*. This way if a function needs
>> both access to object store and ref store, we can see the two
>> dependencies from function arguments.
>
> I tried that in the beginning and it blew up a couple of times when I realized
> that I forgot to pass through one of these dependencies.
> Maybe we can go to the repository and shrink the scope in a follow up?

I think it's a good thing to do. We need to make these implicit
dependencies explicit sooner or later.

Also, perhaps at the earlier steps you don't need to add everything to
the_respository yet. You can have the_object_store, the_parsed_object
(and we already have get_main_ref_store()), then use them internally
without touching the API, which helps reduce code change. For example,
read_sha1_file() could be converted to take "struct object_store *"
all the way

void *read_sha1_file_extended(const unsigned char *sha1,
      enum object_type *type,
      unsigned long *size,
      int lookup_replace)
{
        struct object_store *store = &the_object_store;
        ...
        // more access to "store"
}

When every piece is in place, the API change can be made be removing
that "store = &the_object_store" line and make "store" an argument of
read_sha1_file().
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-05 23:51 [RFC PATCH 000/194] Moving global state into the repository object Stefan Beller
2018-02-06  0:54 ` Stefan Beller
2018-02-06  3:32 ` brian m. carlson
2018-02-06 20:25 ` Stefan Beller
2018-02-06 23:46   ` Stefan Beller
2018-02-07  9:54   ` Eric Sunshine
2018-02-07 11:48 ` Duy Nguyen
2018-02-07 18:06   ` Stefan Beller
2018-02-10 10:36     ` Duy Nguyen
2018-02-07 16:39 ` Jeff Hostetler
2018-02-08 15:42 ` Elijah Newren

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox