On Tue, Apr 06, 2021 at 01:30:57PM -0400, Jeff King wrote: > On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote: > > > The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly > > provided by the user or not. The most important use case for this is > > when filtering objects: only objects that were not explicitly requested > > will get filtered. > > > > The flag is currently only set for blobs and trees, which has been fine > > given that there are no filters for tags or commits currently. We're > > about to extend filtering capabilities to add object type filter though, > > which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's > > not set, the object wouldn't get filtered at all. > > > > Mark unseen commit parents as NOT_USER_GIVEN when processing parents. > > Like this, explicitly provided parents stay user-given and thus > > unfiltered, while parents which get loaded as part of the graph walk > > can be filtered. > > > > This commit shouldn't have any user-visible impact yet as there is no > > logic to filter commits yet. > > I'm still scratching my head a bit to understand how NOT_USER_GIVEN can > possibly be correct (as opposed to USER_GIVEN). If we visit the commit > in a not-user-given context and add the flag, how do we know it wasn't > _also_ visited in a user-given context? > > Just guessing, but perhaps the SEEN flag is saving us here? If we visit > the user-given commit itself first, then we give it the SEEN flag. Then > if we try to visit it again via parent traversal, we've already > processed it and don't add the NOT_USER_GIVEN flag here. Yes, I think that's mostly it. > That seems the opposite of the order we'd usually traverse, but I think > we set SEEN on each commit in prepare_revision_walk(), before we do any > traversing. > > So I _think_ it all works even with your changes here, but I have to say > this NOT_USER_GIVEN thing seems really fragile to me. Not new in your > series, of course, but something we may want to look at. > > Just grepping around, "rev-list -g" will happily remove SEEN flags, so I > suspect it interacts badly with --filter. Just trying "rev-list -g > --objects --filter=object:type=blob HEAD" shows that it produces quite a > lot of commits (which I think is a more fundamental problem: it is not > walking the parent chain at all to assign these NOT_USER_GIVEN flags). I totally agree that this feels fragile, and developing this series with NOT_USER_GIVEN wasn't the most enjoyable experience either. I wouldn't love to be doing the conversion back to USER_GIVEN as part of this series, but I wouldn't oppose doing that job either. Right now I don't feel like I'm sufficiently sure that it's working for all cases, and indeed your example with "rev-list -g" already shows one case where it's breaking. So let me know whether I should add the conversion as preparatory step. Patrick