* Only track built files for final output?
From: Leam Hall @ 2019-08-20 12:21 UTC
To: git

Hey all, a newbie could use some help.

We have some code that generates data files, and as a part of our build
process those files are rebuilt to ensure things work. This causes an issue
with branches and merging, as the data files change slightly and dealing
with half a dozen merge conflicts, for files that are in an interim state,
is frustrating. The catch is that when the code goes to the production
state, those files must be in place and current.

We use a release branch, and then fork off that for each issue. Testing, and
file creation, is a part of the pre-merge process. This is what causes the
merge conflicts.

Right now my thought is to put the "final" versions of the files in some
other directory, and put the interim file storage directory in .gitignore.
Is there a better way to do this?

Thanks!

Leam
* Re: Only track built files for final output?
From: Pratyush Yadav @ 2019-08-20 17:46 UTC
To: Leam Hall; +Cc: git

On 20/08/19 08:21AM, Leam Hall wrote:
> Hey all, a newbie could use some help.
>
> We have some code that generates data files, and as a part of our build
> process those files are rebuilt to ensure things work. This causes an issue
> with branches and merging, as the data files change slightly and dealing
> with half a dozen merge conflicts, for files that are in an interim state,
> is frustrating. The catch is that when the code goes to the production
> state, those files must be in place and current.
>
> We use a release branch, and then fork off that for each issue. Testing, and
> file creation, is a part of the pre-merge process. This is what causes the
> merge conflicts.
>
> Right now my thought is to put the "final" versions of the files in some
> other directory, and put the interim file storage directory in .gitignore.
> Is there a better way to do this?
>

My philosophy with Git is to only track files that I need to generate
the final product. I never track the generated files, because I can
always get to them via the tracked "source" files.

So for example, I was working on a simple parser in Flex and Bison. Flex
and Bison take source files in their syntax, and generate a C file each
that is then compiled and linked to get to the final binary. So instead
of tracking the generated C files, I only tracked the source Flex and
Bison files. My build system can always get me the generated files.

So in your case, what's wrong with just tracking the source files needed
to generate the other files, and then when you want a release binary,
just clone the repo, run your build system, and get the generated files?
What benefit do you get by tracking the generated files?

--
Regards,
Pratyush Yadav
* Re: Only track built files for final output?
From: Leam Hall @ 2019-08-20 18:01 UTC
To: git

On 8/20/19 1:46 PM, Pratyush Yadav wrote:
> On 20/08/19 08:21AM, Leam Hall wrote:
>> Hey all, a newbie could use some help.
>>
>> We have some code that generates data files, and as a part of our build
>> process those files are rebuilt to ensure things work. This causes an issue
>> with branches and merging, as the data files change slightly and dealing
>> with half a dozen merge conflicts, for files that are in an interim state,
>> is frustrating. The catch is that when the code goes to the production
>> state, those files must be in place and current.
>>
>> We use a release branch, and then fork off that for each issue. Testing, and
>> file creation, is a part of the pre-merge process. This is what causes the
>> merge conflicts.
>>
>> Right now my thought is to put the "final" versions of the files in some
>> other directory, and put the interim file storage directory in .gitignore.
>> Is there a better way to do this?
>>
>
> My philosophy with Git is to only track files that I need to generate
> the final product. I never track the generated files, because I can
> always get to them via the tracked "source" files.
>
> So for example, I was working on a simple parser in Flex and Bison. Flex
> and Bison take source files in their syntax, and generate a C file each
> that is then compiled and linked to get to the final binary. So instead
> of tracking the generated C files, I only tracked the source Flex and
> Bison files. My build system can always get me the generated files.
>
> So in your case, what's wrong with just tracking the source files needed
> to generate the other files, and then when you want a release binary,
> just clone the repo, run your build system, and get the generated files?
> What benefit do you get by tracking the generated files?

For internal use I agree with you. However, there's an issue.

The generated files are used by another program's build system, and I can't
guarantee the other build system's build system is built like ours. It seems
easier to provide them the generated files and decouple their build system
layout from ours.
* Re: Only track built files for final output?
From: Pratyush Yadav @ 2019-08-20 18:56 UTC
To: Leam Hall; +Cc: git

On 20/08/19 02:01PM, Leam Hall wrote:
> On 8/20/19 1:46 PM, Pratyush Yadav wrote:
> > On 20/08/19 08:21AM, Leam Hall wrote:
> > > Hey all, a newbie could use some help.
> > >
> > > We have some code that generates data files, and as a part of our build
> > > process those files are rebuilt to ensure things work. This causes an issue
> > > with branches and merging, as the data files change slightly and dealing
> > > with half a dozen merge conflicts, for files that are in an interim state,
> > > is frustrating. The catch is that when the code goes to the production
> > > state, those files must be in place and current.
> > >
> > > We use a release branch, and then fork off that for each issue. Testing, and
> > > file creation, is a part of the pre-merge process. This is what causes the
> > > merge conflicts.
> > >
> > > Right now my thought is to put the "final" versions of the files in some
> > > other directory, and put the interim file storage directory in .gitignore.
> > > Is there a better way to do this?
> > >
> >
> > My philosophy with Git is to only track files that I need to generate
> > the final product. I never track the generated files, because I can
> > always get to them via the tracked "source" files.
> >
> > So for example, I was working on a simple parser in Flex and Bison. Flex
> > and Bison take source files in their syntax, and generate a C file each
> > that is then compiled and linked to get to the final binary. So instead
> > of tracking the generated C files, I only tracked the source Flex and
> > Bison files. My build system can always get me the generated files.
> >
> > So in your case, what's wrong with just tracking the source files needed
> > to generate the other files, and then when you want a release binary,
> > just clone the repo, run your build system, and get the generated files?
> > What benefit do you get by tracking the generated files?
>
> For internal use I agree with you. However, there's an issue.
>
> The generated files are used by another program's build system, and I can't
> guarantee the other build system's build system is built like ours. It seems
> easier to provide them the generated files and decouple their build system
> layout from ours.

Maybe I don't completely understand your use case, but you can still
pass off the generated files to the external build system without having
to track them. Unless the external build system exclusively relies on
git clones/fetches, how about packaging your release with your files
generated from your build system in a tarball (or anything else that
works for you) and pushing them to the external build system?

Assuming you just _have_ to track those files, will always resolving the
merge conflicts as 'theirs' work? My guess about how your process works
is that you branch off, make a new feature or fix, and then merge those
changes to your master. In that case, the changes that the feature
branch made to your generated files should always be the ones that get
committed, correct? master's version of the generated files should be
stale. So your merge conflicts always need to be resolved as 'theirs',
at least on the generated files.

I don't know if git-merge supports file-specific merge strategies
though, please check once. Otherwise, maybe you can write a script that
resolves conflicts as 'theirs' for the generated files, and lets you
figure it out manually for the rest.

I'm just thinking out loud. I don't know how well this will scale. Maybe
the more experienced folks here will have better ideas.
--
Regards,
Pratyush Yadav
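[Editorial note] The script idea in the message above (resolve conflicts as 'theirs' for the generated files, leave the rest for manual resolution) could look roughly like the sketch below. It is only an illustration, not something posted in the thread. The directory name `generated` and all function names are assumptions, and the script is meant to be run from the repository root while a merge is stopped on conflicts. It relies on `git diff --name-only --diff-filter=U` to list unmerged paths and on `git checkout --theirs` to take the merged-in branch's version. As for the open question about per-file behaviour: Git's gitattributes mechanism does allow a merge driver to be assigned per path, which may be worth checking as an alternative to a script.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical location of the generated data files, relative to the repo root.
GENERATED_DIR = "generated"


def split_conflicts(paths, generated_dir=GENERATED_DIR):
    """Partition conflicted paths into (generated, everything_else)."""
    root = PurePosixPath(generated_dir)
    generated, other = [], []
    for path in paths:
        # A path counts as generated only if it lives somewhere under root.
        if root in PurePosixPath(path).parents:
            generated.append(path)
        else:
            other.append(path)
    return generated, other


def conflicted_paths():
    """Ask Git for the paths that are still unmerged (run from the repo root)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "-z", "--diff-filter=U"],
        check=True, capture_output=True, text=True,
    )
    # -z separates paths with NUL bytes, so unusual file names survive intact.
    return [p for p in result.stdout.split("\0") if p]


def resolve_generated_as_theirs(generated_dir=GENERATED_DIR):
    """Take the merged-in side for generated files; return what is left."""
    generated, other = split_conflicts(conflicted_paths(), generated_dir)
    for path in generated:
        subprocess.run(["git", "checkout", "--theirs", "--", path], check=True)
        subprocess.run(["git", "add", "--", path], check=True)
    return other
```

Calling `resolve_generated_as_theirs()` after a conflicted `git merge` returns the paths that still need a human. Two caveats: 'theirs' means the branch being merged in only for a plain merge (during a rebase the roles are swapped), and `git checkout --theirs` will fail for a file that the other side deleted, so such cases would still need manual handling.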
* Re: Only track built files for final output?
From: Phil Hord @ 2019-08-20 19:42 UTC
To: Leam Hall; +Cc: Git

On Tue, Aug 20, 2019 at 11:01 AM Leam Hall <leamhall@gmail.com> wrote:
> On 8/20/19 1:46 PM, Pratyush Yadav wrote:
> > So in your case, what's wrong with just tracking the source files needed
> > to generate the other files, and then when you want a release binary,
> > just clone the repo, run your build system, and get the generated files?
> > What benefit do you get by tracking the generated files?
>
> For internal use I agree with you. However, there's an issue.
>
> The generated files are used by another program's build system, and I
> can't guarantee the other build system's build system is built like
> ours. It seems easier to provide them the generated files and decouple
> their build system layout from ours.

It becomes a burden to keep build products in the repo over time, for
the reasons you already mentioned (they don't merge and you shouldn't
try), but also because those build products never go away, leading to
repo-bloat. Once you realize the cost is too great, it's often too
late to do something about it cheaply.

My advice is to keep your source repository clean from the beginning,
so it contains only source code.

This means you still have a problem because you want to distribute
certified build artifacts. I recommend you use some other tool to
handle that, like Artifactory. I recognize it seems easy to use Git
for this because Git already acts like a reliable, portable, trackable
file distribution system. But that's secondary to Git's purpose;
there are better tools for that.

If you must lean on Git for this, I like to isolate the binaries into
a submodule so developers who don't want or need them aren't bothered
by them, and they can stay out of the way of merges.
But submodules present new workflow challenges and will require some
study and education.

If you want to keep them out of the way of developers, you can keep
your source code repo and your artifact repo completely separate and
make some "superproject" which contains both of those repos as
submodules. The nice feature about this setup is you can positively
associate the set of build products with the set of source code that
produced them.
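[Editorial note] A rough sketch of the "superproject" layout described above, written as a small Python helper that only builds the Git command lines. Nothing here comes from the thread: the repository URLs, the directory names `source` and `artifacts`, and the function names are placeholders. The commands themselves (`git init`, `git -C <dir> submodule add <url> <path>`, `git commit`) are standard Git.

```python
import subprocess


def superproject_commands(source_url, artifacts_url, name="superproject"):
    """Return the git commands (argv lists) that would set up the superproject."""
    return [
        ["git", "init", name],
        # Each submodule is pinned to one specific commit of its repository.
        ["git", "-C", name, "submodule", "add", source_url, "source"],
        ["git", "-C", name, "submodule", "add", artifacts_url, "artifacts"],
        # One superproject commit records one (source, artifacts) pairing.
        ["git", "-C", name, "commit", "-m", "Pin source and artifacts together"],
    ]


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Because a superproject commit stores the exact commit of each submodule, checking out an old superproject commit and running `git submodule update --init` would bring back the matching pair of source and build products, which is the association the message above points out.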
* RE: Only track built files for final output?
From: Randall S. Becker @ 2019-08-20 18:11 UTC
To: 'Pratyush Yadav', 'Leam Hall'; +Cc: git

On August 20, 2019 1:47 PM, Pratyush Yadav
> On 20/08/19 08:21AM, Leam Hall wrote:
> > Hey all, a newbie could use some help.
> >
> > We have some code that generates data files, and as a part of our
> > build process those files are rebuilt to ensure things work. This
> > causes an issue with branches and merging, as the data files change
> > slightly and dealing with half a dozen merge conflicts, for files that
> > are in an interim state, is frustrating. The catch is that when the
> > code goes to the production state, those files must be in place and current.
> >
> > We use a release branch, and then fork off that for each issue.
> > Testing, and file creation, is a part of the pre-merge process. This
> > is what causes the merge conflicts.
> >
> > Right now my thought is to put the "final" versions of the files in
> > some other directory, and put the interim file storage directory in
> > .gitignore.
> > Is there a better way to do this?
> >
>
> My philosophy with Git is to only track files that I need to generate the final
> product. I never track the generated files, because I can always get to them
> via the tracked "source" files.
>
> So for example, I was working on a simple parser in Flex and Bison. Flex and
> Bison take source files in their syntax, and generate a C file each that is then
> compiled and linked to get to the final binary. So instead of tracking the
> generated C files, I only tracked the source Flex and Bison files. My build
> system can always get me the generated files.
>
> So in your case, what's wrong with just tracking the source files needed to
> generate the other files, and then when you want a release binary, just clone
> the repo, run your build system, and get the generated files?
> What benefit do you get by tracking the generated files?

The benefit of putting final release packages into git is based on the
following set of requirements in highly regulated industries:

1. The release artifacts can never change from the point in time at
   which they are certified as working (a.k.a. passed tests) to the
   point when they are replaced with other artifacts (a subsequent
   release). Recompiling is not sufficient as the compilers themselves
   may change or be compromised. This is an audit requirement.
2. The source commit(s) used to create the release artifacts must be
   immutable so that the origins of the release artifacts are always
   known. This is also an audit requirement in regulated industries.
3. Disconnecting the source from the object (as is common in artifact
   repositories) breaks #2 and allows malicious code injection in
   after-the-test code reproduction. Variant of #2 but from the
   security perspective.
4. Metadata on the origin of the release artifacts (the clone URL, the
   parent commit, the branch, signed commits), are required for
   forensic analysis of code in a compliance environment.

There are other related variants of the above, but those are the
essential ones that are generally accepted in financial, insurance,
medical device, and industrial applications. Increasingly, food
production and distribution sectors are realizing that they are also
subject to the above.

I sadly cannot cite specific internal regulations or policies for NDA
reasons, but hope that others are able to do that.

Regards,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.
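[Editorial note] To make requirements 1 and 4 above a little more concrete, here is a minimal sketch of a manifest that records the origin of a set of release artifacts together with a SHA-256 digest of each file, plus a check that reports any artifact whose content has changed since. This is an illustration only and is not the process described in the message, which is about committing the artifacts to Git itself. The field names, function names, and manifest layout are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir, source_commit, clone_url, branch):
    """Record where the artifacts came from and what each file hashes to."""
    root = Path(artifact_dir)
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }
    return {
        "source_commit": source_commit,  # hypothetical field names throughout
        "clone_url": clone_url,
        "branch": branch,
        "files": files,
    }


def changed_artifacts(artifact_dir, manifest):
    """Return names that were added, removed, or altered since the manifest."""
    current = build_manifest(artifact_dir, "", "", "")["files"]
    recorded = manifest["files"]
    return sorted(
        name for name in set(current) | set(recorded)
        if current.get(name) != recorded.get(name)
    )
```

A manifest like this only detects change; it does not prevent it, and it is only as trustworthy as the place it is stored. Committing artifacts to Git, as the message describes, gets comparable content hashing from Git's own object model along with signed commits.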
Thread overview: 6+ messages (newest: 2019-08-20 19:42 UTC)

2019-08-20 12:21 Only track built files for final output? Leam Hall
2019-08-20 17:46 ` Pratyush Yadav
2019-08-20 18:01   ` Leam Hall
2019-08-20 18:56     ` Pratyush Yadav
2019-08-20 19:42     ` Phil Hord
2019-08-20 18:11   ` Randall S. Becker