From: Samuel Wales <samologist@gmail.com>
To: git@vger.kernel.org
Subject: Re: is this data corruption?
Date: Fri, 30 Dec 2022 17:33:20 -0700
Message-ID: <CAJcAo8smwU2ddB96J+G2SOAP+FU2p4ejB2JX9+5QHdQEn43htQ@mail.gmail.com>
In-Reply-To: <CAJcAo8tjMLFisK5_13iD_JGo2xVQDJRX3wAC7wRD_V2GKFGevQ@mail.gmail.com>
p.p.s. git 2.20 has the same problem.

On 12/30/22, Samuel Wales <samologist@gmail.com> wrote:
> i am not subscribed, but am of the impression that's ok. please copy
> me directly.
>
>
> tldr: git diff is showing differences that do not exist in the files
> themselves.
>
> i have nothing staged, nothing fancy like stashing, etc. this is a
> repo of mostly emacs org mode files. mostly ascii text.
>
> git status and these commands show nothing unusual:
>
> git fsck --strict --no-dangling
> git gc --prune="0 days"
>
>
> the problem that seems like data corruption is that a few lines appear
> twice as - and once as +. but in the current version of the files,
> those lines exist only once. here are the lines. there are 2 -
> versions and one + version:
>
> +***************** REF bigpart is a partition
> +biglike and homelike are distracting nonsense i think except
> +to describe inferior filesets. anomalous subset of home
> +might be called homelike or so.
>
>
> emacs magit shows the same problem. however, it shows a slightly
> different diff. i did a meta-diff on git diff vs. magit, and there
> are about 800 + real-content lines that magit shows but git diff does
> not. i do not know what this means. wc -l is like
>
> 62540 aaa.diff
> 62965 bbb--magit.txt
>
> idk why a diff would be different with only + lines being different?
>
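for anyone who wants to repeat the meta-diff, here is a minimal sketch of one way to do it, not necessarily how it was done above. it counts the added ("+") content lines in each saved diff and reports the ones the second file has more of. the function names are made up for illustration, and it assumes both files are plain unified-diff text, which may not hold exactly for a buffer saved from magit.

```python
from collections import Counter


def added_lines(diff_text):
    """Count each added content line in a unified diff.

    Lines starting with '+++' are treated as file headers and skipped,
    so an added line whose own content begins with '++' is also skipped.
    """
    counts = Counter()
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            counts[line[1:]] += 1
    return counts


def only_in_second(first_diff, second_diff):
    """Return added lines that occur more often in second_diff than in first_diff."""
    return added_lines(second_diff) - added_lines(first_diff)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python metadiff.py aaa.diff bbb--magit.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as f:
            first = f.read()
        with open(sys.argv[2], errors="replace") as f:
            second = f.read()
        for text, n in only_in_second(first, second).items():
            print(n, text)
```

if the extra lines turn out to be whole hunks, that would suggest the two tools were run with different options or context settings rather than that the repo is damaged, but that is only a guess.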
>
> in summary, what is wrong with my repo, if anything, and what can i do
> about it? nothing on the web for git corruption seems to say much,
> other than pull from github or whatever. this is my own repo, the
> original repo, so i cannot do that. org annex has an uncorrupt tool
> of some kind, but it did not seem relevant. i do have rsnapshot
> [basically rsync] backups of the repo and the most significant files
> and dirs, but i do not know what one does to use that to repair any
> issues. i won't get into why, but changes were made over months.
>
> is there a protocol for this?
>
> would git fsck have balked?
>
> thank you!
>
>
> p.s.
>
> i have no reason to believe this is related, but git diff has
> intermingled emacs org mode entries. but i don't have to talk about
> it in org terms; in generic text terms, it has intermingled parts of
> different paragraphs. as a user, i'd prefer that completely unrelated
> paragraphs not be mingled, regardless of minimality. if possible.
>
> with respect to the intermingling only, unless this is related to the
> possible corruption, i will presume the diff is correct, in that a
> patch from it would produce the same result as a patch that does not
> intermingle. i believe this intermingling is because diff does not
> understand org, or paragraphs for that matter. in org, an entry
> starts with "^[*]+ " and ends at the beginning of another entry or at
> eof. they consist in my case mostly of ascii text paragraphs. just
> as with paragraphs, if you move an entry, you don't expect it to be
> mingled with a different one in the diff.
>
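to make the entry boundary concrete, here is a small sketch that splits text into entries using only the "^[*]+ " rule described above. it is an illustration of the boundary rule, not an org parser and not anything git diff does; the names HEADING and split_entries are made up.

```python
import re

# A heading is one or more stars followed by a space at the start of a line.
HEADING = re.compile(r"^[*]+ ", re.MULTILINE)


def split_entries(text):
    """Split text into pieces that each start at a heading line.

    Every piece runs to the start of the next heading or to EOF.
    Any text before the first heading is returned as the first piece.
    Joining the pieces gives back the original text.
    """
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts:
        return [text] if text else []
    pieces = []
    if starts[0] > 0:
        pieces.append(text[:starts[0]])
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        pieces.append(text[begin:end])
    return pieces
```

comparing the pieces of the old and new file as whole units would show a moved entry as one removal and one addition, without mixing it with its neighbors.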
> i have been told that this cannot be fixed by merely telling a
> slightly improved differ that stuff between stars is worth preserving,
> but that a parser, not merely a couple of regexps, is needed to reduce
> this intermingling. i have also been told that difftastic uses
> tree-sitter, which might get such a syntax for emacs org mode. and so
> maybe at some point git diff can use that. idk.
>
> idk if any of this is related but i include it for completeness.
>
> also, please don't laugh, but i am using git version 2.11.0. i will
> upgrade pending various library and os stuff but my main concern is
> not for git, but for possible corruption in the repo and what is
> possible to do, at least given rsnapshot, to fix it.
>
--
The Kafka Pandemic
A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com