Only track built files for final output?

* Only track built files for final output?
@ 2019-08-20 12:21 Leam Hall
  2019-08-20 17:46 ` Pratyush Yadav
  0 siblings, 1 reply; 6+ messages in thread
From: Leam Hall @ 2019-08-20 12:21 UTC
  To: git

Hey all, a newbie could use some help.

We have some code that generates data files, and as a part of our build 
process those files are rebuilt to ensure things work. This causes an 
issue with branches and merging, as the data files change slightly and 
dealing with half a dozen merge conflicts, for files that are in an 
interim state, is frustrating. The catch is that when the code goes to 
the production state, those files must be in place and current.

We use a release branch, and then fork off that for each issue. Testing, 
and file creation, is a part of the pre-merge process. This is what 
causes the merge conflicts.

Right now my thought is to put the "final" versions of the files in some 
other directory, and put the interim file storage directory in 
.gitignore. Is there a better way to do this?

Thanks!

Leam

* Re: Only track built files for final output?
  2019-08-20 12:21 Only track built files for final output? Leam Hall
@ 2019-08-20 17:46 ` Pratyush Yadav
  2019-08-20 18:01   ` Leam Hall
  2019-08-20 18:11   ` Randall S. Becker
  0 siblings, 2 replies; 6+ messages in thread
From: Pratyush Yadav @ 2019-08-20 17:46 UTC
  To: Leam Hall; +Cc: git

On 20/08/19 08:21AM, Leam Hall wrote:
> Hey all, a newbie could use some help.
> 
> We have some code that generates data files, and as a part of our build
> process those files are rebuilt to ensure things work. This causes an issue
> with branches and merging, as the data files change slightly and dealing
> with half a dozen merge conflicts, for files that are in an interim state,
> is frustrating. The catch is that when the code goes to the production
> state, those files must be in place and current.
> 
> We use a release branch, and then fork off that for each issue. Testing, and
> file creation, is a part of the pre-merge process. This is what causes the
> merge conflicts.
> 
> Right now my thought is to put the "final" versions of the files in some
> other directory, and put the interim file storage directory in .gitignore.
> Is there a better way to do this?
> 

My philosophy with Git is to only track files that I need to generate 
the final product. I never track the generated files, because I can 
always get to them via the tracked "source" files.

So for example, I was working on a simple parser in Flex and Bison. Flex 
and Bison take source files in their syntax, and generate a C file each 
that is then compiled and linked to get to the final binary. So instead 
of tracking the generated C files, I only tracked the source Flex and 
Bison files. My build system can always get me the generated files.

So in your case, what's wrong with just tracking the source files needed 
to generate the other files, and then when you want a release binary, 
just clone the repo, run your build system, and get the generated files?  
What benefit do you get by tracking the generated files?

-- 
Regards,
Pratyush Yadav

* Re: Only track built files for final output?
  2019-08-20 17:46 ` Pratyush Yadav
@ 2019-08-20 18:01   ` Leam Hall
  2019-08-20 18:56     ` Pratyush Yadav
  2019-08-20 19:42     ` Phil Hord
  2019-08-20 18:11   ` Randall S. Becker
  1 sibling, 2 replies; 6+ messages in thread
From: Leam Hall @ 2019-08-20 18:01 UTC
  To: git

On 8/20/19 1:46 PM, Pratyush Yadav wrote:
> On 20/08/19 08:21AM, Leam Hall wrote:
>> Hey all, a newbie could use some help.
>>
>> We have some code that generates data files, and as a part of our build
>> process those files are rebuilt to ensure things work. This causes an issue
>> with branches and merging, as the data files change slightly and dealing
>> with half a dozen merge conflicts, for files that are in an interim state,
>> is frustrating. The catch is that when the code goes to the production
>> state, those files must be in place and current.
>>
>> We use a release branch, and then fork off that for each issue. Testing, and
>> file creation, is a part of the pre-merge process. This is what causes the
>> merge conflicts.
>>
>> Right now my thought is to put the "final" versions of the files in some
>> other directory, and put the interim file storage directory in .gitignore.
>> Is there a better way to do this?
>>
> 
> My philosophy with Git is to only track files that I need to generate
> the final product. I never track the generated files, because I can
> always get to them via the tracked "source" files.
> 
> So for example, I was working on a simple parser in Flex and Bison. Flex
> and Bison take source files in their syntax, and generate a C file each
> that is then compiled and linked to get to the final binary. So instead
> of tracking the generated C files, I only tracked the source Flex and
> Bison files. My build system can always get me the generated files.
> 
> So in your case, what's wrong with just tracking the source files needed
> to generate the other files, and then when you want a release binary,
> just clone the repo, run your build system, and get the generated files?
> What benefit do you get by tracking the generated files?

For internal use I agree with you. However, there's an issue.

The generated files are used by another program's build system, and I 
can't guarantee that the other program's build system is built like 
ours. It seems easier to provide them the generated files and decouple 
their build system layout from ours.



* RE: Only track built files for final output?
  2019-08-20 17:46 ` Pratyush Yadav
  2019-08-20 18:01   ` Leam Hall
@ 2019-08-20 18:11   ` Randall S. Becker
  1 sibling, 0 replies; 6+ messages in thread
From: Randall S. Becker @ 2019-08-20 18:11 UTC
  To: 'Pratyush Yadav', 'Leam Hall'; +Cc: git

On August 20, 2019 1:47 PM, Pratyush Yadav
> On 20/08/19 08:21AM, Leam Hall wrote:
> > Hey all, a newbie could use some help.
> >
> > We have some code that generates data files, and as a part of our
> > build process those files are rebuilt to ensure things work. This
> > causes an issue with branches and merging, as the data files change
> > slightly and dealing with half a dozen merge conflicts, for files that
> > are in an interim state, is frustrating. The catch is that when the
> > code goes to the production state, those files must be in place and
> > current.
> >
> > We use a release branch, and then fork off that for each issue.
> > Testing, and file creation, is a part of the pre-merge process. This
> > is what causes the merge conflicts.
> >
> > Right now my thought is to put the "final" versions of the files in
> > some other directory, and put the interim file storage directory in
> > .gitignore.
> > Is there a better way to do this?
> >
>
> My philosophy with Git is to only track files that I need to generate the
> final product. I never track the generated files, because I can always get
> to them via the tracked "source" files.
>
> So for example, I was working on a simple parser in Flex and Bison. Flex
> and Bison take source files in their syntax, and generate a C file each
> that is then compiled and linked to get to the final binary. So instead of
> tracking the generated C files, I only tracked the source Flex and Bison
> files. My build system can always get me the generated files.
>
> So in your case, what's wrong with just tracking the source files needed
> to generate the other files, and then when you want a release binary, just
> clone the repo, run your build system, and get the generated files?
> What benefit do you get by tracking the generated files?

The benefit of putting final release packages into git is based on the
following set of requirements in highly regulated industries:

1. The release artifacts can never change from the point in time at which
they are certified as working (a.k.a. passed tests) to the point when they
are replaced with other artifacts (a subsequent release). Recompiling is not
sufficient as the compilers themselves may change or be compromised. This is
an audit requirement.
2. The source commit(s) used to create the release artifacts must be
immutable so that the origins of the release artifacts are always known.
This is also an audit requirement in regulated industries.
3. Disconnecting the source from the object (as is common in artifact
repositories) breaks #2 and allows malicious code injection in
after-the-test code reproduction. Variant of #2 but from the security
perspective.
4. Metadata on the origin of the release artifacts (the clone URL, the
parent commit, the branch, signed commits) are required for forensic
analysis of code in a compliance environment.

There are other related variants of the above, but those are the essential
ones that are generally accepted in financial, insurance, medical device,
and industrial applications. Increasingly, food production and distribution
sectors are realizing that they are also subject to the above. I sadly
cannot cite specific internal regulations or policies for NDA reasons, but
hope that others are able to do that.
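
As a rough illustration of requirements 1 and 4, the link between release
artifacts and their origin could be recorded in a small manifest that stores
a SHA-256 digest per artifact next to the source metadata. This is only a
sketch: the function names, the manifest fields, and the idea of passing the
commit, clone URL, and branch in as plain strings are assumptions made for
the example, not part of any tool or policy mentioned in this thread.

```python
import hashlib


def build_manifest(artifacts, commit, clone_url, branch):
    """Record origin metadata plus one SHA-256 digest per artifact.

    `artifacts` maps an artifact name to its content as bytes.
    All field names here are hypothetical.
    """
    return {
        "commit": commit,
        "clone_url": clone_url,
        "branch": branch,
        "sha256": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in artifacts.items()
        },
    }


def changed_artifacts(manifest, artifacts):
    """Return the sorted names whose content no longer matches the manifest.

    An artifact that is missing, added, or modified counts as changed.
    """
    recorded = manifest["sha256"]
    changed = []
    for name in sorted(set(recorded) | set(artifacts)):
        data = artifacts.get(name)
        digest = hashlib.sha256(data).hexdigest() if data is not None else None
        if recorded.get(name) != digest:
            changed.append(name)
    return changed
```

A check like `changed_artifacts(manifest, current_files)` returning an empty
list would indicate that the artifacts are byte-for-byte the ones that were
certified. It says nothing about who produced the manifest, so in practice it
would have to be stored somewhere immutable or signed.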

Regards,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.

* Re: Only track built files for final output?
  2019-08-20 18:01   ` Leam Hall
@ 2019-08-20 18:56     ` Pratyush Yadav
  2019-08-20 19:42     ` Phil Hord
  1 sibling, 0 replies; 6+ messages in thread
From: Pratyush Yadav @ 2019-08-20 18:56 UTC
  To: Leam Hall; +Cc: git

On 20/08/19 02:01PM, Leam Hall wrote:
> On 8/20/19 1:46 PM, Pratyush Yadav wrote:
> > On 20/08/19 08:21AM, Leam Hall wrote:
> > > Hey all, a newbie could use some help.
> > > 
> > > We have some code that generates data files, and as a part of our build
> > > process those files are rebuilt to ensure things work. This causes an issue
> > > with branches and merging, as the data files change slightly and dealing
> > > with half a dozen merge conflicts, for files that are in an interim state,
> > > is frustrating. The catch is that when the code goes to the production
> > > state, those files must be in place and current.
> > > 
> > > We use a release branch, and then fork off that for each issue. Testing, and
> > > file creation, is a part of the pre-merge process. This is what causes the
> > > merge conflicts.
> > > 
> > > Right now my thought is to put the "final" versions of the files in some
> > > other directory, and put the interim file storage directory in .gitignore.
> > > Is there a better way to do this?
> > > 
> > 
> > My philosophy with Git is to only track files that I need to generate
> > the final product. I never track the generated files, because I can
> > always get to them via the tracked "source" files.
> > 
> > So for example, I was working on a simple parser in Flex and Bison. Flex
> > and Bison take source files in their syntax, and generate a C file each
> > that is then compiled and linked to get to the final binary. So instead
> > of tracking the generated C files, I only tracked the source Flex and
> > Bison files. My build system can always get me the generated files.
> > 
> > So in your case, what's wrong with just tracking the source files needed
> > to generate the other files, and then when you want a release binary,
> > just clone the repo, run your build system, and get the generated files?
> > What benefit do you get by tracking the generated files?
> 
> For internal use I agree with you. However, there's an issue.
> 
> The generated files are used by another program's build system, and I can't
> guarantee the other build system's build system is built like ours. It seems
> easier to provide them the generated files and decouple their build system
> layout from ours.

Maybe I don't completely understand your use case, but you can still 
pass off the generated files to the external build system without having 
to track them. Unless the external build system exclusively relies on 
git clones/fetches, how about packaging your release with your files 
generated from your build system in a tarball (or anything else that 
works for you) and pushing them to the external build system?
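
A minimal sketch of the tarball idea, assuming the build system writes its
output into a single directory. The function name and the directory and
tarball names used in the usage note are hypothetical; how the tarball
reaches the external build system is left open.

```python
import tarfile
from pathlib import Path


def package_generated(generated_dir, tarball_path):
    """Bundle everything under generated_dir into a gzip-compressed tarball.

    Paths inside the archive are relative to the directory's own name, so
    unpacking recreates e.g. "generated/..." wherever it is extracted.
    """
    generated_dir = Path(generated_dir)
    with tarfile.open(str(tarball_path), "w:gz") as tar:
        # tar.add() recurses into directories by default.
        tar.add(str(generated_dir), arcname=generated_dir.name)
    return str(tarball_path)
```

Called as `package_generated("generated", "release.tar.gz")` after a build,
this keeps the generated files out of the repository while still handing the
other team a single file to consume.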

Assuming you just _have_ to track those files, will always resolving the 
merge conflicts as 'theirs' work?

My guess about how your process works is that you branch off, make a new 
feature or fix, and then merge those changes to your master. In that case, 
the changes that the feature branch made to your generated files should 
always be the ones that get committed, correct? master's version of the 
generated files should be stale. So your merge conflicts always need to 
be resolved as 'theirs', at least on the generated files. I don't know 
whether git-merge supports file-specific merge strategies, though; please 
check. Otherwise, maybe you can write a script that resolves 
conflicts as 'theirs' for the generated files, and lets you figure it 
out manually for the rest.
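
Such a script might look roughly like the sketch below. It assumes it is run
from the top of the work tree while a merge is stopped on conflicts, and that
the generated files can be recognised by path pattern. The pattern
"generated/*" and all function names are hypothetical. On the question of
file-specific handling: Git does allow a per-path merge driver to be selected
through the `merge` attribute in .gitattributes, which may be worth looking at
before scripting this by hand.

```python
import fnmatch
import subprocess


def split_conflicts(paths, patterns):
    """Split conflicted paths into (generated, manual) by glob pattern.

    fnmatch's "*" also matches "/", so "generated/*" covers nested files.
    """
    generated, manual = [], []
    for path in paths:
        if any(fnmatch.fnmatch(path, pattern) for pattern in patterns):
            generated.append(path)
        else:
            manual.append(path)
    return generated, manual


def conflicted_paths():
    """List paths that are currently unmerged, as reported by git diff."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=U"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def resolve_generated_as_theirs(patterns):
    """Take 'theirs' for generated files; return paths left to fix by hand."""
    generated, manual = split_conflicts(conflicted_paths(), patterns)
    if generated:
        # 'theirs' is the side being merged in, i.e. the feature branch.
        subprocess.run(["git", "checkout", "--theirs", "--", *generated], check=True)
        subprocess.run(["git", "add", "--", *generated], check=True)
    return manual
```

Running `resolve_generated_as_theirs(["generated/*"])` after a conflicted
`git merge` would stage the incoming version of each generated file and
return the remaining conflicted paths. Conflicts where one side deleted the
file are not handled here and would make the checkout step fail.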

I'm just thinking out loud. I don't know how well this will scale. Maybe 
the more experienced folks here will have better ideas.

-- 
Regards,
Pratyush Yadav

* Re: Only track built files for final output?
  2019-08-20 18:01   ` Leam Hall
  2019-08-20 18:56     ` Pratyush Yadav
@ 2019-08-20 19:42     ` Phil Hord
  1 sibling, 0 replies; 6+ messages in thread
From: Phil Hord @ 2019-08-20 19:42 UTC
  To: Leam Hall; +Cc: Git

On Tue, Aug 20, 2019 at 11:01 AM Leam Hall <leamhall@gmail.com> wrote:
> On 8/20/19 1:46 PM, Pratyush Yadav wrote:

> > So in your case, what's wrong with just tracking the source files needed
> > to generate the other files, and then when you want a release binary,
> > just clone the repo, run your build system, and get the generated files?
> > What benefit do you get by tracking the generated files?
>
> For internal use I agree with you. However, there's an issue.
>
> The generated files are used by another program's build system, and I
> can't guarantee the other build system's build system is built like
> ours. It seems easier to provide them the generated files and decouple
> their build system layout from ours.

It becomes a burden to keep build products in the repo over time, for
the reasons you already mentioned (they don't merge and you shouldn't
try), but also because those build products never go away, leading to
repo-bloat.  Once you realize the cost is too great, it's often too
late to do something about it cheaply.  My advice is to keep your
source repository clean from the beginning, so it contains only source
code.

This means you still have a problem because you want to distribute
certified build artifacts.  I recommend you use some other tool to
handle that, like Artifactory.

I recognize it seems easy to use Git for this because Git already acts
like a reliable, portable, trackable file distribution system. But
that's secondary to Git's purpose; there are better tools for that. If
you must lean on Git for this, I like to isolate the binaries into a
submodule so developers who don't want or need them aren't bothered by
them, and they can stay out of the way of merges.  But submodules
present new workflow challenges and will require some study and
education.  If you want to keep them out of the way of developers, you
can keep your source code repo and your artifact repo completely
separate and make some "superproject" which contains both of those
repos as submodules.  The nice feature about this setup is you can
positively associate the set of build products with the set of source
code that produced them.
