Feature/git cache #2384

Open
b4u-mw wants to merge 2 commits into actions:main from b4u-mw:feature/git-cache

b4u-mw commented Mar 5, 2026

Full implementation of a reference cache. It works for the main repository and for submodules (recursively). The cache is created and updated automatically.
Fixes #2303

b4u-mw added 2 commits March 5, 2026 15:33
- Add `reference-cache` input to action.yml
- Introduce `GitCacheHelper` for bare clone cache management
- Prevent race conditions with `proper-lockfile` and atomic directory renames
- Support iterative submodule caching and robust relative URL resolution
- Append to `info/alternates` preserving existing alternate references
- Add fallback to standard clone on submodule cache failure
- Add unit tests for `GitCacheHelper`

Signed-off-by: Michael Wyraz <mw@brick4u.de>
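The clone-then-rename step listed above can be sketched as a small helper. This is a hypothetical illustration, not the PR's `GitCacheHelper`: `createAtomically` and its `populate` callback are assumed names, the callback stands in for `git clone --bare`, and the `proper-lockfile` locking is omitted. The idea is that a cache only becomes visible under its final name once it is complete, and a process that loses the race discards its own copy.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Hypothetical helper: populate a temporary sibling directory, then move it
// into place with a single rename. Returns true if this call created the
// target, false if a complete target already existed when renaming.
async function createAtomically(
  targetPath: string,
  populate: (tmpDir: string) => Promise<void>
): Promise<boolean> {
  const parent = path.dirname(targetPath)
  await fs.promises.mkdir(parent, {recursive: true})
  // Same parent directory => same filesystem, so rename is atomic.
  const tmpDir = await fs.promises.mkdtemp(
    path.join(parent, `${path.basename(targetPath)}.tmp-`)
  )
  try {
    await populate(tmpDir) // e.g. a bare clone into tmpDir
  } catch (err) {
    await fs.promises.rm(tmpDir, {recursive: true, force: true})
    throw err
  }
  try {
    await fs.promises.rename(tmpDir, targetPath)
    return true
  } catch (err) {
    // Rename fails if another process already moved a cache into place.
    await fs.promises.rm(tmpDir, {recursive: true, force: true})
    if (fs.existsSync(targetPath)) {
      return false
    }
    throw err
  }
}
```

The PR pairs this pattern with `proper-lockfile` to prevent races; the rename alone only guarantees that readers never see a half-written cache.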

When reference-cache is enabled, shallow fetches (fetch-depth > 0) are
counterproductive because objects are served from the local cache.
Shallow negotiation only adds network latency without saving bandwidth.

If fetch-depth was not explicitly set by the user, it is automatically
overridden to 0. If explicitly set, a warning is emitted explaining
the performance impact.

Signed-off-by: Michael Wyraz <mw@brick4u.de>
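The fetch-depth rule in this commit message can be expressed as a pure function. This is a sketch under assumptions: `resolveFetchDepth` and the `FetchSettings` shape are hypothetical, although the `referenceCache` and `fetchDepthExplicit` field names mirror the settings fields this PR adds; the PR's own `adjustFetchDepthForCache` may differ in signature and in how it reports the warning.

```typescript
// Hypothetical subset of the action's settings relevant to this rule.
interface FetchSettings {
  referenceCache: string // empty when the reference-cache input is unset
  fetchDepth: number // 0 means full history
  fetchDepthExplicit: boolean // true if the workflow set fetch-depth itself
}

interface FetchDepthDecision {
  fetchDepth: number
  warning?: string
}

function resolveFetchDepth(settings: FetchSettings): FetchDepthDecision {
  // No cache, or already a full fetch: nothing to adjust.
  if (!settings.referenceCache || settings.fetchDepth <= 0) {
    return {fetchDepth: settings.fetchDepth}
  }
  // Implicit default depth: switch to a full fetch served from the cache.
  if (!settings.fetchDepthExplicit) {
    return {fetchDepth: 0}
  }
  // Explicit depth: respect it, but explain the cost.
  return {
    fetchDepth: settings.fetchDepth,
    warning:
      `fetch-depth=${settings.fetchDepth} is set explicitly while reference-cache ` +
      'is enabled; shallow fetches add negotiation latency without saving bandwidth.'
  }
}
```

Returning the warning instead of logging it keeps the rule testable; a caller would pass it to `core.warning`.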
Copilot AI review requested due to automatic review settings March 5, 2026 15:02

Copilot AI left a comment

Pull request overview

This PR implements a reference cache feature for the actions/checkout GitHub Action, addressing issue #2303. The reference cache allows storing bare clones of repositories locally, so subsequent checkouts can use Git alternates to avoid re-downloading objects over the network. This is particularly valuable for self-hosted runners and custom runner images with persistent storage.

Changes:

  • Adds a new reference-cache input parameter that accepts a path to a local directory for storing bare cache repositories, with associated settings, input parsing, and documentation.
  • Introduces GitCacheHelper class (src/git-cache-helper.ts) that manages bare cache repositories with file-based locking (via proper-lockfile), atomic clone-to-rename patterns, and automatic cache creation/update.
  • Modifies the checkout flow in git-source-provider.ts to set up Git alternates for the main repository and iteratively process submodule updates with per-submodule reference caches, including automatic fetch-depth adjustment when reference cache is active.
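Per-submodule caches are keyed by each submodule's URL, so relative URLs from `.gitmodules` (the "relative URL resolution" named in the first commit) must be made absolute before a cache directory can be chosen. A minimal sketch, assuming URL-style remotes only: `resolveSubmoduleUrl` is a hypothetical name, and scp-like remotes such as `git@github.com:org/app.git` are deliberately not handled. It relies on Git evaluating relative submodule URLs like relative directories, so `../lib.git` next to `app.git` resolves to a sibling repository.

```typescript
// Hypothetical helper: make a .gitmodules URL absolute.
// Supports URL-style remotes (https://, ssh://host/..., file://) only;
// new URL() throws for scp-like remotes, which would need separate parsing.
function resolveSubmoduleUrl(superprojectUrl: string, submoduleUrl: string): string {
  // Only "./" and "../" URLs are relative in .gitmodules; others pass through.
  if (!submoduleUrl.startsWith('./') && !submoduleUrl.startsWith('../')) {
    return submoduleUrl
  }
  // Git treats the superproject URL as a directory, so add a trailing slash
  // before applying standard (WHATWG) relative-URL resolution.
  const base = superprojectUrl.endsWith('/') ? superprojectUrl : `${superprojectUrl}/`
  return new URL(submoduleUrl, base).toString()
}
```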

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `action.yml` | Adds new reference-cache input parameter |
| `src/git-source-settings.ts` | Adds referenceCache and fetchDepthExplicit fields to settings interface |
| `src/input-helper.ts` | Reads reference-cache input and tracks explicit fetch-depth |
| `src/git-cache-helper.ts` | New file: cache management with locking, bare clone/fetch, and URL-based dir naming |
| `src/git-source-provider.ts` | Integrates reference cache into clone and submodule flows; adds adjustFetchDepthForCache |
| `src/git-command-manager.ts` | Makes execGit/GitOutput public; adds referenceAdd for Git alternates |
| `src/git-auth-helper.ts` | Adds removeGlobalAuth method for cache auth lifecycle |
| `package.json` / `package-lock.json` | Adds proper-lockfile dependency and types |
| `adrs/2303-reference-cache.md` | ADR documenting the design and acceptance criteria |
| `README.md` | Documents the new reference-cache input |
| `__test__/git-source-provider.test.ts` | Tests for adjustFetchDepthForCache |
| `__test__/git-cache-helper.test.ts` | Tests for cache dir naming and setup behavior |
| `__test__/git-directory-helper.test.ts` | Updates mock to include new interface methods |
| `__test__/git-auth-helper.test.ts` | Updates mock and settings for new fields |
| `__test__/input-helper.test.ts` | Asserts referenceCache defaults to empty |
| `dist/index.js` | Bundled output including new code and proper-lockfile |


Comment on lines +1 to +37
# Reference cache for fast checkouts

## Summary
Introduce a locally managed Git reference cache for main repositories and submodules to drastically reduce network traffic and checkout times on persistent runners (e.g. self-hosted runners).

## Implementation plan

1. **Inputs:**
- Add a new input `reference-cache` (path to the cache directory) to `action.yml`. The default is empty.
- Read the input in `src/git-source-settings.ts` and `src/input-helper.ts` and expose it as `settings.referenceCache`.

2. **Cache manager (`src/git-cache-helper.ts`):**
- A new class/helper that creates (`git clone --bare`) and updates (`git fetch --force`) bare cache repositories.
- **Cache directory naming convention:** To keep cache directories readable for admins and free of collisions, the name is derived from the repository URL:
- Replace all special characters in the URL with `_`.
- Append a short hash of the real URL (e.g. the first 8 characters of its SHA-256) for uniqueness.
- Example: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git`

3. **Main repository checkout (`src/git-source-provider.ts`):**
- Before setting up the checkout, check whether `reference-cache` is set.
- If so, create or update the cache directory for the main URL.
- After the initial `git.init()`, write the path of the cache repository's `objects` directory into `.git/objects/info/alternates`.

4. **Submodule checkouts (iterative instead of monolithic):**
- The current command `git submodule update --recursive` does not work out of the box with `--reference` when each submodule needs its own reference cache.
- When `reference-cache` is enabled and submodules are to be initialized:
- Read `.gitmodules` (determine all submodule URLs).
- Create or update the cache for each submodule (exactly as in step 2).
- Check out each submodule individually with `git submodule update --init --reference <cache-path/.git> <path>`.
- With the `recursive` setting: change into each submodule directory and repeat the process for its `.gitmodules` at the script level (instead of simply passing Git's `--recursive` flag through).
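Step 2's naming rule can be sketched as follows. The function names are hypothetical and the PR's `GitCacheHelper` may differ in details (for example, whether a trailing `.git` in the URL is stripped first). The 8-character SHA-256 prefix keeps URLs that sanitize to the same string, such as `https://a.b/c` and `https://a_b/c`, in separate directories.

```typescript
import {createHash} from 'crypto'
import * as path from 'path'

// Readable part: every character outside [A-Za-z0-9] becomes "_".
// Unique part: first 8 hex characters of SHA-256 over the unmodified URL.
function cacheDirName(repositoryUrl: string): string {
  const readable = repositoryUrl.replace(/[^A-Za-z0-9]/g, '_')
  const shortHash = createHash('sha256')
    .update(repositoryUrl)
    .digest('hex')
    .slice(0, 8)
  return `${readable}_${shortHash}.git`
}

// Full cache location below the directory given in the reference-cache input.
function cacheDirPath(referenceCache: string, repositoryUrl: string): string {
  return path.join(referenceCache, cacheDirName(repositoryUrl))
}
```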

## Acceptance criteria
1. **New option is configurable**: The `reference-cache` input can be passed, and the code responds to it.
2. **Directory structure is correct**: The cache directories for the main repository and submodules are named according to the "URL with special characters replaced + truncated SHA" rule.
3. **Bandwidth saved / alternates used**: The main checkout creates a `.git/objects/info/alternates` file that points to the local cache. Subsequent `git fetch` commands are significantly faster or download considerably fewer bytes.
4. **Submodules get caches**: Deeply nested (recursive) submodules also benefit from the cache for their respective remote URLs, because a matching `--reference` target is computed dynamically and passed per submodule.
5. **No --dissociate**: For performance reasons, the working directory stays bound to the cache (`git repack` is time-consuming). If the cache disappears, the workspace must first be recreated (which is the norm on Actions runners, if they are not "single-use" runners anyway).
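Because criterion 5 leaves the workspace bound to the cache, a run may want to detect a removed cache before invoking Git. A hypothetical check (`missingAlternates` is not part of the PR) that lists unresolvable entries in `objects/info/alternates`, assuming one unquoted path per line; relative entries are resolved against the object directory, which is how Git interprets them.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Returns every object directory listed in <gitDir>/objects/info/alternates
// that no longer exists. An empty result means all alternates are usable
// (or that the repository has no alternates file at all).
function missingAlternates(gitDir: string): string[] {
  const objectsDir = path.join(gitDir, 'objects')
  const file = path.join(objectsDir, 'info', 'alternates')
  if (!fs.existsSync(file)) {
    return []
  }
  const entries = fs
    .readFileSync(file, 'utf8')
    .split('\n')
    .map(line => line.trim())
    .filter(line => line !== '')
  // Relative entries are relative to the objects directory, not the repo root.
  return entries
    .map(entry => path.resolve(objectsDir, entry))
    .filter(dir => !fs.existsSync(dir))
}
```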

Copilot AI Mar 5, 2026

The ADR is written entirely in German, while the existing ADR (adrs/0153-checkout-v2.md) is in English. For consistency with the rest of the repository and to ensure all contributors can understand the architectural decisions, this document should be written in English.

Suggested change
# Reference cache for fast checkouts
## Summary
Introduce a locally managed Git reference cache for main repositories and submodules to massively reduce network traffic and checkout times on persistent runners (e.g. self-hosted).
## Implementation plan
1. **Inputs:**
- Add a new input `reference-cache` (path to the cache directory) in `action.yml`. The default is empty.
- Read and expose this input in `src/git-source-settings.ts` and `src/input-helper.ts` as `settings.referenceCache`.
2. **Cache manager (`src/git-cache-helper.ts`):**
- Add a new class/helper responsible for creating (`git clone --bare`) and updating (`git fetch --force`) bare cache repositories.
- **Cache directory naming convention:** To keep the cache understandable for admins and avoid collisions, derive the cache directory name from the repository URL:
- Replace all non-alphanumeric characters in the URL with `_`.
- Append a short hash (e.g. first 8 characters of the SHA256) of the real URL to guarantee uniqueness.
- Example: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git`
3. **Main repository checkout (`src/git-source-provider.ts`):**
- Before setting up the checkout, check whether `reference-cache` is set.
- If it is: create or update the cache directory for the primary repository URL.
- After the initial `git.init()`, write the path of the cache repository's `objects` directory into `.git/objects/info/alternates`.
4. **Submodule checkouts (iterative instead of monolithic):**
- The current command `git submodule update --recursive` does not work out of the box with `--reference` when each submodule requires its own reference cache.
- When `reference-cache` is enabled and submodules need to be initialized:
- Read `.gitmodules` (determine all submodule URLs).
- For each submodule, create or update a cache repository (following the same logic as in step 2).
- Check out each submodule individually using `git submodule update --init --reference <cache-path/.git> <path>`.
- When `recursive` is requested: change into each submodule directory and repeat this process for its `.gitmodules` at the script level, instead of simply passing Git's `--recursive` flag through.
## Acceptance criteria
1. **New option is configurable**: The `reference-cache` input can be provided and the code reacts accordingly.
2. **Directory structure is correct**: The cache directory for the main repository and all submodules is named according to the "URL_characters_replaced_plus_truncated_SHA" logic.
3. **Bandwidth is saved / alternates are used**: During the main checkout, a `.git/objects/info/alternates` file is created that points to the local cache. Subsequent `git fetch` commands are significantly faster and download noticeably fewer bytes.
4. **Submodules use caches**: Deep (recursive) submodules also benefit from the cache for their respective remote URLs, because a suitable `--reference` target is dynamically computed and passed per submodule.
5. **No `--dissociate`**: For performance reasons, the working directory remains bound to the cache (`git repack` is expensive). If the cache is removed, the workspace must be recreated (which is the norm for Actions runners, if they are not "single-use" runners anyway).

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +292

// Setup reference cache if requested
if (settings.referenceCache) {
core.startGroup('Setting up reference repository cache')
const cacheHelper = new GitCacheHelper(settings.referenceCache)
const cachePath = await cacheHelper.setupCache(git, repositoryUrl)
const cacheObjects = path.join(cachePath, 'objects')
if (fsHelper.directoryExistsSync(cacheObjects, false)) {
await git.referenceAdd(cacheObjects)
} else {
core.warning(`Reference repository cache objects directory ${cacheObjects} does not exist`)
}
core.endGroup()
}

Copilot AI Mar 5, 2026

The reference cache setup (both updating the bare cache repo and configuring alternates) only runs when .git does not already exist. On self-hosted runners with persistent workspaces — which is the primary use case for reference caching — subsequent runs will find .git already present. This means:

  1. The bare cache repository is never updated after the first clone.
  2. New objects pushed to the remote after the initial cache creation won't be available locally, defeating the purpose of the cache on long-lived runners.

The cache update (setupCache) should also run when .git already exists. Additionally, the alternates file should be verified/re-added even for existing repos, in case the workspace was cleaned but the cache persists (or vice versa).

Suggested change
// Setup reference cache if requested
if (settings.referenceCache) {
core.startGroup('Setting up reference repository cache')
const cacheHelper = new GitCacheHelper(settings.referenceCache)
const cachePath = await cacheHelper.setupCache(git, repositoryUrl)
const cacheObjects = path.join(cachePath, 'objects')
if (fsHelper.directoryExistsSync(cacheObjects, false)) {
await git.referenceAdd(cacheObjects)
} else {
core.warning(`Reference repository cache objects directory ${cacheObjects} does not exist`)
}
core.endGroup()
}
}
}

// Setup reference cache if requested
if (settings.referenceCache) {
core.startGroup('Setting up reference repository cache')
const cacheHelper = new GitCacheHelper(settings.referenceCache)
const cachePath = await cacheHelper.setupCache(git, repositoryUrl)
const cacheObjects = path.join(cachePath, 'objects')
if (fsHelper.directoryExistsSync(cacheObjects, false)) {
await git.referenceAdd(cacheObjects)
} else {
core.warning(
`Reference repository cache objects directory ${cacheObjects} does not exist`
)
}
core.endGroup()
}
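The comment also asks for the alternates entry to be verified or re-added when `.git` already exists. A sketch of an idempotent helper that can run on every checkout, assuming the caller passes the `.git` directory and the cache's `objects` path; `ensureAlternate` is a hypothetical name, not the PR's `referenceAdd`. Like the first commit's "append to `info/alternates` preserving existing alternate references", it keeps any lines already present.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Ensure `cacheObjects` is listed in <gitDir>/objects/info/alternates,
// keeping existing entries. Returns true if the file was changed,
// false if an equivalent entry was already present.
function ensureAlternate(gitDir: string, cacheObjects: string): boolean {
  const objectsDir = path.join(gitDir, 'objects')
  const infoDir = path.join(objectsDir, 'info')
  const file = path.join(infoDir, 'alternates')
  fs.mkdirSync(infoDir, {recursive: true})

  const existing = fs.existsSync(file)
    ? fs
        .readFileSync(file, 'utf8')
        .split('\n')
        .map(line => line.trim())
        .filter(line => line !== '')
    : []

  // Compare resolved paths; relative entries are relative to the objects dir.
  const target = path.resolve(cacheObjects)
  if (existing.some(entry => path.resolve(objectsDir, entry) === target)) {
    return false
  }
  fs.writeFileSync(file, [...existing, target].join('\n') + '\n')
  return true
}
```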

"jest-circus": "^29.7.0",
"js-yaml": "^4.1.0",
"prettier": "^3.3.3",
"proper-lockfile": "^4.1.2",

Copilot AI Mar 5, 2026

proper-lockfile is imported and used in the runtime source code (src/git-cache-helper.ts), but it's listed under devDependencies. Following the existing convention in this project (e.g., uuid is in dependencies with @types/uuid in devDependencies), proper-lockfile should be moved to dependencies and only @types/proper-lockfile should remain in devDependencies.

While ncc bundles everything into dist/index.js regardless of the section, keeping the semantic distinction between runtime and dev dependencies is important for clarity and for tools that rely on this distinction (e.g., npm audit --production).

Comment on lines +26 to +32
async function iterativeSubmoduleUpdate(
git: IGitCommandManager,
cacheHelper: GitCacheHelper,
repositoryPath: string,
fetchDepth: number,
nestedSubmodules: boolean
): Promise<void> {

Copilot AI Mar 5, 2026

The function is named iterativeSubmoduleUpdate, but when nestedSubmodules is true, it recursively calls itself (line 148). The name is misleading since the function uses recursion, not iteration. Consider renaming to recursiveSubmoduleUpdate or submoduleUpdateWithCache to more accurately describe its behavior.
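If the iterative name is kept, the traversal can instead be made genuinely iterative with an explicit stack. A sketch under assumptions: `updateSubmodulesIteratively`, `Submodule`, and the two injected callbacks are hypothetical stand-ins for the PR's `IGitCommandManager` and `GitCacheHelper` calls (reading `.gitmodules`, then running `git submodule update --init --reference ...` for one path).

```typescript
import * as path from 'path'

// Hypothetical shape of one entry read from a repository's .gitmodules.
interface Submodule {
  path: string // relative to the repository that declares it
  url: string
}

// Traversal driven by an explicit stack rather than recursion. A nested
// repository is pushed only after it has been updated, so its working tree
// (and its .gitmodules) exists by the time it is popped.
async function updateSubmodulesIteratively(
  rootPath: string,
  nested: boolean,
  listSubmodules: (repoPath: string) => Promise<Submodule[]>,
  updateWithCache: (repoPath: string, sub: Submodule) => Promise<void>
): Promise<void> {
  const stack: string[] = [rootPath]
  while (stack.length > 0) {
    const repoPath = stack.pop() as string
    for (const sub of await listSubmodules(repoPath)) {
      await updateWithCache(repoPath, sub)
      if (nested) {
        stack.push(path.join(repoPath, sub.path))
      }
    }
  }
}
```

Injecting the callbacks keeps the traversal testable without a Git binary. The visit order differs from the recursive version (siblings are updated before their children), but every submodule is still updated only after its parent.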

Comment on lines +71 to +72
// In rare cases where it somehow exists but objects/ didn't, clean it up
await fs.promises.rm(cachePath, { recursive: true, force: true })

Copilot AI Mar 5, 2026

Lines 71-72 have inconsistent indentation — they use an extra space compared to the surrounding code block. Given the project uses Prettier with 2-space tabWidth, running prettier --write should fix this.

Suggested change
// In rare cases where it somehow exists but objects/ didn't, clean it up
await fs.promises.rm(cachePath, { recursive: true, force: true })
// In rare cases where it somehow exists but objects/ didn't, clean it up
await fs.promises.rm(cachePath, { recursive: true, force: true })



Development

Successfully merging this pull request may close these issues.

Support reference clones
