Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

git - What is visible online on GitHub?



1 Reply


Each hosting provider can set up whatever access controls they like.

In the case of GitHub specifically, if the repository is public, everyone has access to every commit in that repository, by hash ID. From there, they can get to every file (because a repository contains commits and commits contain files).

If sensitive data was ever committed to a public repository, you should assume it has escaped, even if a later commit removed it. See also https://www.collinsdictionary.com/us/dictionary/english/to-close-the-stable-door-after-the-horse-has-bolted.

Edit: consider an example. We create a Git repository:

$ cd tmp
$ mkdir sensitive
$ cd sensitive
$ git init
Initialized empty Git repository in ...
$ echo secret data, no one should ever know > secret
$ echo example > README
$ git add . && git commit -m initial
[master (root-commit) 1211ea7] initial
 2 files changed, 2 insertions(+)
 create mode 100644 README
 create mode 100644 secret
$ git rm secret
rm 'secret'
$ git commit -m "remove secret data, but it's still there"
[master de6528b] remove secret data, but it's still there
 1 file changed, 1 deletion(-)
 delete mode 100644 secret
$ ls
README

If we push this repository somewhere (to GitHub) and someone casually browses the result, they see only the README file. But once they clone the repository, or look at the parent of the one visible commit, they have access to the secret data in the file named secret, just by checking out the previous commit.
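The recovery step can be sketched as a script. This is a minimal sketch, assuming git is installed; the scratch directories from mktemp, the identity values, and the variable names are placeholders for illustration, and a local clone stands in for a clone from GitHub.

```shell
#!/bin/sh
# Sketch: rebuild the two-commit example, clone it, and read the
# "removed" file back out of the parent commit in the clone.
set -e

origin=$(mktemp -d)   # stands in for the pushed repository (assumption)
clone=$(mktemp -d)    # stands in for a visitor's clone (assumption)

cd "$origin"
git init -q
# Local identity so the commits succeed on a machine with no git config.
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "secret data, no one should ever know" > secret
echo example > README
git add . && git commit -q -m initial
git rm -q secret
git commit -q -m "remove secret data, but it's still there"

# A clone receives the commits, not just the files in the tip commit.
git clone -q "$origin" "$clone"
cd "$clone"

# The tip commit's snapshot holds only README ...
tip_files=$(git ls-tree -r --name-only HEAD)
# ... but the parent commit (HEAD~1) still carries the file and its contents.
recovered=$(git show HEAD~1:secret)

echo "files in tip commit: $tip_files"
echo "recovered from HEAD~1: $recovered"
```

Here git show HEAD~1:secret prints the file as stored in the parent commit without changing the working tree; git checkout HEAD~1 would put the file back on disk instead.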

There are no files that you can't find. A repository consists of two databases: the main object database, keyed by hash ID, and a secondary database that maps names (branch names, tag names, and so on) to hash IDs. Cloning usually copies all of the object database and some of the names database (git clone --mirror copies all of the names). With maintenance commands, it's then possible to browse through all the so-called blob objects, which hold the file data. It's also possible to find every commit object in the object database and inspect the source tree attached to each commit.
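One way to browse the object database is git cat-file. The following is a minimal sketch, assuming git 2.6 or later (for --batch-all-objects); the scratch directory, identity values, and variable names are placeholders for illustration. It rebuilds the example repository and then dumps every blob, regardless of which commit refers to it.

```shell
#!/bin/sh
# Sketch: enumerate every object in the object database and print all blobs.
set -e

repo=$(mktemp -d)     # throwaway location (assumption)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "secret data, no one should ever know" > secret
echo example > README
git add . && git commit -q -m initial
git rm -q secret
git commit -q -m "remove secret data, but it's still there"

# One line per object: "<hash> <type> <size>".
all_objects=$(git cat-file --batch-all-objects --batch-check)
echo "$all_objects"

# Keep only the blobs, which hold file contents.
blob_ids=$(printf '%s\n' "$all_objects" | awk '$2 == "blob" { print $1 }')
blob_count=$(printf '%s\n' "$blob_ids" | wc -l | tr -d ' ')

# Print the contents of each blob by hash ID; no file name is needed.
dump=$(for id in $blob_ids; do git cat-file -p "$id"; done)
echo "blobs found: $blob_count"
echo "$dump"
```

The repository holds two blobs, one for README (shared by both commits) and one for secret, so the deleted file's contents appear in the dump even though no file name was supplied.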

The key idea here is this: a file's presence or absence in the tip commit of a branch says nothing about that file's presence or absence in any other commit. (The tip commit is the one Git finds by turning the branch name into a hash ID and using that hash ID to extract the commit.) Each commit's snapshot is completely independent of all other snapshots. Your question talks about files you can see (which would be those in tip commits) but then says:

... and their history ...

and files do not have history. Commits are history. The history in the example repository above consists of two commits. The second commit has one file in it. The first commit has two files. One of those two files, the one named README, is identical in each commit, but that's simply a consequence of how I made the two commits. Except for the fact that we all find the initial commit by starting with the second (i.e., last) commit and working backwards, the two commits are independent.
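The same point can be checked by listing each commit's snapshot separately. This is a minimal sketch, assuming git is installed; the scratch directory, identity values, and variable names are placeholders for illustration, and the hash IDs it prints will differ from the ones shown in the transcript above.

```shell
#!/bin/sh
# Sketch: walk the history commit by commit and list each snapshot on its own.
set -e

repo=$(mktemp -d)     # throwaway location (assumption)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "secret data, no one should ever know" > secret
echo example > README
git add . && git commit -q -m initial
git rm -q secret
git commit -q -m "remove secret data, but it's still there"

# History is the chain of commits, found by walking back from the tip.
commit_count=$(git rev-list --count HEAD)
echo "commits in history: $commit_count"

# Each commit's snapshot is listed without reference to any other commit.
for rev in $(git rev-list HEAD); do
    echo "commit $(git rev-parse --short "$rev") contains:"
    git ls-tree -r --name-only "$rev" | sed 's/^/  /'
done

first_count=$(git ls-tree -r --name-only HEAD~1 | wc -l | tr -d ' ')
second_files=$(git ls-tree -r --name-only HEAD)

# README is stored as the same blob in both snapshots.
readme_first=$(git rev-parse HEAD~1:README)
readme_second=$(git rev-parse HEAD:README)
echo "README blob in first commit:  $readme_first"
echo "README blob in second commit: $readme_second"
```

The loop prints the newer commit first because git rev-list walks backwards from the tip, which is the same direction described above for finding the initial commit.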

