git - What is visible online on GitHub?


深蓝 · Answer 1 · 2021-10-06T18:52:47+0000

Each hosting provider can set up whatever access controls they like.

In the case of GitHub specifically, if the repository is public, anyone can retrieve every commit in that repository by its hash ID. From a commit, they can get to every file in it, because a repository contains commits and each commit contains a snapshot of files.
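Git's own plumbing commands make the "commits contain files" chain concrete. The sketch below assumes only that git is installed. It builds a throwaway repository in a temporary directory, and the demo name and e-mail address are placeholders invented for this illustration. The same commands work in any clone of a repository.

```shell
#!/bin/sh
# Sketch: follow a commit's hash ID down to the file data it contains.
# Assumes git is installed; everything happens in a throwaway temp directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo example > README
git add README
# The demo identity is a placeholder so the commit works without global config.
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

commit=$(git rev-parse HEAD)            # the commit's hash ID
git cat-file -p "$commit"               # "tree <hash>", author, committer, message
tree=$(git rev-parse "$commit^{tree}")  # hash ID of the commit's snapshot
git cat-file -p "$tree"                 # one line per file: mode, type, hash, name
git cat-file -p "$commit:README"        # the file's data itself
```

Each step needs nothing except a hash ID, or a name that resolves to one. Whoever can fetch the commit can therefore read every file in its snapshot.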

If some sensitive data were committed in the past and the repository is public, you should assume they have escaped. See also https://www.collinsdictionary.com/us/dictionary/english/to-close-the-stable-door-after-the-horse-has-bolted.

Edit: consider an example. We create a Git repository:

```
$ cd tmp
$ mkdir sensitive
$ cd sensitive
$ git init
Initialized empty Git repository in ...
$ echo secret data, no one should ever know > secret
$ echo example > README
$ git add . && git commit -m initial
[master (root-commit) 1211ea7] initial
 2 files changed, 2 insertions(+)
 create mode 100644 README
 create mode 100644 secret
$ git rm secret
rm 'secret'
$ git commit -m "remove secret data, but it's still there"
[master de6528b] remove secret data, but it's still there
 1 file changed, 1 deletion(-)
 delete mode 100644 secret
$ ls
README
```

If we push this repository somewhere (such as GitHub) and someone casually browses the result, they see only the README file. But if they clone the repository, or simply look at the parent of the one visible commit, they suddenly have access to the secret data in the file named secret. All they have to do is check out the previous commit.
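The sketch below replays that example in a temporary directory. It then recovers the deleted file in two ways: from the parent of the tip commit, and from a fresh clone. It assumes git is installed. The demo identity, the helper name demo_commit, and the temporary paths are placeholders for this illustration.

```shell
#!/bin/sh
# Sketch: a file removed in the tip commit is still in the history.
# Assumes git is installed; demo_commit and the demo identity are placeholders.
set -e
demo_commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "secret data, no one should ever know" > secret
echo example > README
git add .
demo_commit initial
git rm -q secret
demo_commit "remove secret data, but it's still there"

ls                       # the work tree (and tip commit) shows only README
git show HEAD~1:secret   # the parent commit still holds the secret

# A clone carries the whole history, so the secret comes along too.
copy=$(mktemp -d)
git clone -q "$repo" "$copy"
git -C "$copy" show HEAD~1:secret
```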

There are no files that you can't find. A repository consists of two databases: the main object database, and a secondary database that maps names (such as branch and tag names) to hash IDs. Cloning a repository usually copies all of the object database and some of the names database (git clone --mirror copies all of it). With maintenance commands, it is then possible to browse through all of the so-called blob objects, which hold all of the file data. It is also possible to find every commit object in the object database and inspect the source tree attached to each commit.
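The next sketch shows what such "maintenance commands" can look like. It rebuilds the same two-commit repository, uses git rev-list and git ls-tree to walk every commit's tree, and uses git cat-file --batch-all-objects to dump every blob in the object database, whether or not any branch still reaches it. These particular commands are one reasonable choice, not the only one. Git being installed is assumed, and the demo identity and helper name are placeholders.

```shell
#!/bin/sh
# Sketch: enumerate every commit's snapshot and every blob in the object database.
# Assumes git is installed; demo_commit and the demo identity are placeholders.
set -e
demo_commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "secret data, no one should ever know" > secret
echo example > README
git add .
demo_commit initial
git rm -q secret
demo_commit "remove secret data, but it's still there"

# Every commit reachable from any ref, with the files in its snapshot.
for c in $(git rev-list --all); do
  echo "== commit $c"
  git ls-tree -r --name-only "$c"
done

# Every blob in the object database (reachable or not), with its contents.
git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
while read -r type hash; do
  if [ "$type" = blob ]; then
    echo "== blob $hash"
    git cat-file -p "$hash"
  fi
done
```

The second loop does not consult branch names at all. This is why deleting a file, or even a branch, does not by itself hide the data from someone who holds a copy of the object database.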

The key idea here is this: a file's presence or absence in the tip commit of any given branch (found by turning the branch name into a hash ID and using that hash ID to extract the commit) says nothing about any file's presence or absence in any other commit. Each commit's snapshot is completely independent of all other snapshots. Your question talks about files you can see (which would be those in tip commits) but then says:

... and their history ...

and files do not have history. Commits are history. The history in the example repository above consists of two commits. The second commit has one file in it; the first commit has two. One of those two files, the one named README, is identical in both commits, but that is simply a consequence of the process I used to make them. Except for the fact that we find the initial commit by starting from the second (i.e., last) commit and working backwards, the two commits are independent.
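Here is a short sketch of the same point, assuming git is installed. It rebuilds the two-commit example. Then git ls-tree lists each commit's complete snapshot, and git rev-parse shows that both snapshots refer to the very same README blob. The only link between the commits is the parent hash ID stored in the second one. The demo identity and helper name are placeholders.

```shell
#!/bin/sh
# Sketch: each commit stores a full snapshot; the commits are linked only
# by the parent hash ID recorded in the later one.
# Assumes git is installed; demo_commit and the demo identity are placeholders.
set -e
demo_commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "secret data, no one should ever know" > secret
echo example > README
git add .
demo_commit initial
git rm -q secret
demo_commit "remove secret data, but it's still there"

git ls-tree HEAD~1   # first commit: README and secret
git ls-tree HEAD     # second commit: README only

# Identical content means the same blob, so both lines print one hash ID.
git rev-parse HEAD~1:README HEAD:README

# The second commit records its parent; that is how we walk backwards.
git cat-file -p HEAD | grep '^parent'
```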
