- Add `reference-cache` input to action.yml
- Introduce `GitCacheHelper` for bare clone cache management
- Prevent race conditions with `proper-lockfile` and atomic directory renames
- Support iterative submodule caching and robust relative URL resolution
- Append to `info/alternates` preserving existing alternate references
- Add fallback to standard clone on submodule cache failure
- Add unit tests for `GitCacheHelper`

Signed-off-by: Michael Wyraz <mw@brick4u.de>
When `reference-cache` is enabled, shallow fetches (`fetch-depth` > 0) are counterproductive because objects are served from the local cache. Shallow negotiation only adds network latency without saving bandwidth. If `fetch-depth` was not explicitly set by the user, it is automatically overridden to 0. If explicitly set, a warning is emitted explaining the performance impact.

Signed-off-by: Michael Wyraz <mw@brick4u.de>
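The decision described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the field names `referenceCache`, `fetchDepth` and `fetchDepthExplicit` appear in the PR, while the `decideFetchDepth` function, its return shape and the warning text are assumptions made for the example.

```typescript
// Hypothetical shapes: only the field names come from the PR.
interface FetchSettings {
  referenceCache: string
  fetchDepth: number
  fetchDepthExplicit: boolean
}

interface FetchDepthDecision {
  fetchDepth: number
  warning?: string
}

function decideFetchDepth(settings: FetchSettings): FetchDepthDecision {
  // No reference cache, or already a full fetch: nothing to adjust.
  if (!settings.referenceCache || settings.fetchDepth <= 0) {
    return {fetchDepth: settings.fetchDepth}
  }
  // The user asked for a shallow fetch explicitly: keep it, but warn.
  if (settings.fetchDepthExplicit) {
    return {
      fetchDepth: settings.fetchDepth,
      warning:
        `fetch-depth=${settings.fetchDepth} with reference-cache adds ` +
        'shallow negotiation without saving bandwidth'
    }
  }
  // Shallow depth came from the default: override to a full fetch.
  return {fetchDepth: 0}
}
```

Returning the warning instead of emitting it keeps the sketch free of side effects; the PR itself may log directly.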
Pull request overview
This PR implements a reference cache feature for the actions/checkout GitHub Action, addressing issue #2303. The reference cache allows storing bare clones of repositories locally, so subsequent checkouts can use Git alternates to avoid re-downloading objects over the network. This is particularly valuable for self-hosted runners and custom runner images with persistent storage.
Changes:
- Adds a new `reference-cache` input parameter that accepts a path to a local directory for storing bare cache repositories, with associated settings, input parsing, and documentation.
- Introduces a `GitCacheHelper` class (`src/git-cache-helper.ts`) that manages bare cache repositories with file-based locking (via `proper-lockfile`), atomic clone-to-rename patterns, and automatic cache creation/update.
- Modifies the checkout flow in `git-source-provider.ts` to set up Git alternates for the main repository and iteratively process submodule updates with per-submodule reference caches, including automatic `fetch-depth` adjustment when reference cache is active.
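The "atomic clone-to-rename" pattern mentioned above can be illustrated with Node's standard library alone. This is a sketch under stated assumptions: `createDirAtomically` and its `populate` callback are hypothetical names, the callback stands in for `git clone --bare`, and the file lock that the PR takes via `proper-lockfile` is left out.

```typescript
import * as fs from 'fs'
import * as os from 'os'
import * as path from 'path'

// Build the directory content in a sibling temporary directory, then rename
// it into place, so readers never observe a half-populated `finalDir`.
function createDirAtomically(
  finalDir: string,
  populate: (tmpDir: string) => void
): boolean {
  if (fs.existsSync(finalDir)) {
    return false // cache already exists, nothing to create
  }
  const parent = path.dirname(finalDir)
  fs.mkdirSync(parent, {recursive: true})
  // Same parent directory, so the rename stays on one filesystem.
  const tmpDir = fs.mkdtempSync(path.join(parent, '.tmp-'))
  try {
    populate(tmpDir)
    fs.renameSync(tmpDir, finalDir)
    return true
  } finally {
    // No-op after a successful rename; removes leftovers after a failure.
    fs.rmSync(tmpDir, {recursive: true, force: true})
  }
}

// Usage: the callback writes a placeholder file instead of cloning.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'reference-cache-'))
const created = createDirAtomically(path.join(demoRoot, 'demo.git'), dir =>
  fs.writeFileSync(path.join(dir, 'HEAD'), 'ref: refs/heads/main\n')
)
console.log(created)
```

Without a lock, two jobs could still both pass the existence check; the rename would then fail for one of them, which is presumably why the PR combines this pattern with locking.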
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `action.yml` | Adds new `reference-cache` input parameter |
| `src/git-source-settings.ts` | Adds `referenceCache` and `fetchDepthExplicit` fields to settings interface |
| `src/input-helper.ts` | Reads `reference-cache` input and tracks explicit `fetch-depth` |
| `src/git-cache-helper.ts` | New file: cache management with locking, bare clone/fetch, and URL-based dir naming |
| `src/git-source-provider.ts` | Integrates reference cache into clone and submodule flows; adds `adjustFetchDepthForCache` |
| `src/git-command-manager.ts` | Makes `execGit`/`GitOutput` public; adds `referenceAdd` for Git alternates |
| `src/git-auth-helper.ts` | Adds `removeGlobalAuth` method for cache auth lifecycle |
| `package.json` / `package-lock.json` | Adds `proper-lockfile` dependency and types |
| `adrs/2303-reference-cache.md` | ADR documenting the design and acceptance criteria |
| `README.md` | Documents the new `reference-cache` input |
| `__test__/git-source-provider.test.ts` | Tests for `adjustFetchDepthForCache` |
| `__test__/git-cache-helper.test.ts` | Tests for cache dir naming and setup behavior |
| `__test__/git-directory-helper.test.ts` | Updates mock to include new interface methods |
| `__test__/git-auth-helper.test.ts` | Updates mock and settings for new fields |
| `__test__/input-helper.test.ts` | Asserts `referenceCache` defaults to empty |
| `dist/index.js` | Bundled output including new code and `proper-lockfile` |
| # Reference Cache für schnelle Checkouts | ||
| ## Zusammenfassung | ||
| Einführung eines lokal verwalteten Git-Referenz-Caches für Haupt-Repositories und Submodule, um Netzwerk-Traffic und Checkout-Zeiten auf persistenten Runnern (z.B. Self-Hosted) massiv zu reduzieren. | ||
| ## Implementierungsplan | ||
| 1. **Inputs:** | ||
| - In `action.yml` einen neuen Input `reference-cache` (Pfad zum Cache-Verzeichnis) hinzufügen. Default ist leer. | ||
| - In `src/git-source-settings.ts` und `src/input-helper.ts` den Input auslesen und bereitstellen (`settings.referenceCache`). | ||
| 2. **Cache Manager (`src/git-cache-helper.ts`):** | ||
| - Eine neue Klasse/Helper-Logik, die das Erstellen (`git clone --bare`) und Aktualisieren (`git fetch --force`) von Bare Cache-Repos übernimmt. | ||
| - **Namenskonvention Cache-Verzeichnis:** Damit Admin-Lesbarkeit und Kollisionsfreiheit gewährleistet sind, wird das Cache-Verzeichnis aus der Repository-URL gebildet: | ||
| - Alle Sonderzeichen in der URL durch `_` ersetzen. | ||
| - Ein kurzer Hash (z. B. erste 8 Zeichen des SHA256) der echten URL zur Eindeutigkeit anhängen. | ||
| - Beispiel: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git` | ||
| 3. **Haupt-Repo Checkout (`src/git-source-provider.ts`):** | ||
| - Vor dem Setup des Checkouts prüfen, ob `reference-cache` gesetzt ist. | ||
| - Wenn ja: den Cache-Ordner für die Haupt-URL aktualisieren/anlegen. | ||
| - Nach dem initialen `git.init()` den Pfad in `.git/objects/info/alternates` schreiben, der auf das `objects`-Verzeichnis des Cache-Ordners zeigt. | ||
| 4. **Submodule Checkouts (Iterativ statt monolithisch):** | ||
| - Der aktuelle Befehl `git submodule update --recursive` funktioniert nicht out-of-the-box mit `reference`, wenn jedes Submodul seinen individuellen Referenz-Cache benötigt. | ||
| - Wenn `reference-cache` aktiv ist und Submodule initialisiert werden sollen: | ||
| - Lese `.gitmodules` aus (alle Sub-URLs ermitteln). | ||
| - Für jedes Submodul den Cache (genauso wie in Step 2) anlegen oder aktualisieren. | ||
| - Submodul einzeln auschecken per `git submodule update --init --reference <cache-pfad/.git> <pfad>`. | ||
| - Bei der Einstellung `recursive`: In jedes Submodul-Verzeichnis wechseln und den Vorgang für `.gitmodules` rekursiv auf Skript-Ebene durchführen (anstatt Git's `--recursive` Flag einfach weiterzugeben). | ||
| ## Akzeptanzkriterien | ||
| 1. **Neue Option konfigurierbar**: Der Input `reference-cache` kann übergeben werden, der Code reagiert darauf. | ||
| 2. **Ordnerstruktur korrekt**: Der Cache-Ordner für das Hauptrepo und Submodule erhält Namen nach der "URL_Sonderzeichen_Ersetzt+SHA_Cut"-Logik. | ||
| 3. **Bandbreite gespart / Alternates genutzt**: Beim Hauptcheckout wird eine `.git/objects/info/alternates`-Datei mit Pfad zum lokalen Cache erzeugt. Danach ausgeführte `git fetch`-Befehle sind signifikant schneller bzw. laden deutlich weniger Bytes herunter. | ||
| 4. **Submodule erhalten Caches**: Auch tiefe (rekursive) Submodule profitieren für deren jeweilige Remote-URL vom Cache, da pro Submodul ein passender `--reference` Punkt dynamisch berechnet und übergeben wird. | ||
| 5. **Kein --dissociate**: Aus Performance-Gründen bleibt der Arbeitsordner an den Cache gebunden (`git repack` ist zeitaufwändig). Fällt der Cache weg, muss der Workspace erst einmal neu erzeugt werden (was bei Action Runnern die Norm ist, falls es nicht ohnehin "single-use" Runner sind). |
The ADR is written entirely in German, while the existing ADR (adrs/0153-checkout-v2.md) is in English. For consistency with the rest of the repository and to ensure all contributors can understand the architectural decisions, this document should be written in English.
| # Reference cache for fast checkouts | |
| ## Summary | |
| Introduce a locally managed Git reference cache for main repositories and submodules to massively reduce network traffic and checkout times on persistent runners (e.g. self-hosted). | |
| ## Implementation plan | |
| 1. **Inputs:** | |
| - Add a new input `reference-cache` (path to the cache directory) in `action.yml`. The default is empty. | |
| - Read and expose this input in `src/git-source-settings.ts` and `src/input-helper.ts` as `settings.referenceCache`. | |
| 2. **Cache manager (`src/git-cache-helper.ts`):** | |
| - Add a new class/helper responsible for creating (`git clone --bare`) and updating (`git fetch --force`) bare cache repositories. | |
| - **Cache directory naming convention:** To keep the cache understandable for admins and avoid collisions, derive the cache directory name from the repository URL: | |
| - Replace all non-alphanumeric characters in the URL with `_`. | |
| - Append a short hash (e.g. first 8 characters of the SHA256) of the real URL to guarantee uniqueness. | |
| - Example: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git` | |
| 3. **Main repository checkout (`src/git-source-provider.ts`):** | |
| - Before setting up the checkout, check whether `reference-cache` is set. | |
| - If it is: create or update the cache directory for the primary repository URL. | |
| - After the initial `git.init()`, write a path into `.git/objects/info/alternates` that points to the `objects` directory of the cache repository. | |
| 4. **Submodule checkouts (iterative instead of monolithic):** | |
| - The current command `git submodule update --recursive` does not work out of the box with `--reference` when each submodule requires its own reference cache. | |
| - When `reference-cache` is enabled and submodules need to be initialized: | |
| - Read `.gitmodules` (determine all submodule URLs). | |
| - For each submodule, create or update a cache repository (following the same logic as in step 2). | |
| - Check out each submodule individually using `git submodule update --init --reference <cache-path/.git> <path>`. | |
| - When `recursive` is requested: change into each submodule directory and repeat this process for its `.gitmodules` at the script level, instead of simply passing Git's `--recursive` flag through. | |
| ## Acceptance criteria | |
| 1. **New option is configurable**: The `reference-cache` input can be provided and the code reacts accordingly. | |
| 2. **Directory structure is correct**: The cache directory for the main repository and all submodules is named according to the "URL_characters_replaced_plus_truncated_SHA" logic. | |
| 3. **Bandwidth is saved / alternates are used**: During the main checkout, a `.git/objects/info/alternates` file is created that points to the local cache. Subsequent `git fetch` commands are significantly faster and download noticeably fewer bytes. | |
| 4. **Submodules use caches**: Deep (recursive) submodules also benefit from the cache for their respective remote URLs, because a suitable `--reference` target is dynamically computed and passed per submodule. | |
| 5. **No `--dissociate`**: For performance reasons, the working directory remains bound to the cache (`git repack` is expensive). If the cache is removed, the workspace must be recreated (which is the norm for action runners, especially if they are "single-use" runners). |
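The cache directory naming convention from step 2 of the ADR could look roughly like this. It is a sketch only: `cacheDirFor` is a hypothetical name, and the `_` separator before the hash and the `.git` suffix are inferred from the ADR's example path rather than taken from the PR's implementation.

```typescript
import * as crypto from 'crypto'
import * as path from 'path'

// Derive a readable, collision-resistant directory name from a repository URL.
function cacheDirFor(referenceCache: string, repositoryUrl: string): string {
  // Readable part: every non-alphanumeric character becomes `_`.
  const readable = repositoryUrl.replace(/[^a-zA-Z0-9]/g, '_')
  // Short hash of the unmodified URL: first 8 hex characters of its SHA-256.
  const shortHash = crypto
    .createHash('sha256')
    .update(repositoryUrl)
    .digest('hex')
    .substring(0, 8)
  return path.join(referenceCache, `${readable}_${shortHash}.git`)
}

// Usage: prints a path ending in `https___github_com_actions_checkout_<hash>.git`.
console.log(cacheDirFor('/opt/cache', 'https://github.com/actions/checkout'))
```

The hash matters because different URLs can share a readable part, for example `a/b` and `a.b`. With only 32 bits of hash, a collision becomes very unlikely rather than impossible.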
| // Setup reference cache if requested | ||
| if (settings.referenceCache) { | ||
| core.startGroup('Setting up reference repository cache') | ||
| const cacheHelper = new GitCacheHelper(settings.referenceCache) | ||
| const cachePath = await cacheHelper.setupCache(git, repositoryUrl) | ||
| const cacheObjects = path.join(cachePath, 'objects') | ||
| if (fsHelper.directoryExistsSync(cacheObjects, false)) { | ||
| await git.referenceAdd(cacheObjects) | ||
| } else { | ||
| core.warning(`Reference repository cache objects directory ${cacheObjects} does not exist`) | ||
| } | ||
| core.endGroup() | ||
| } |
The reference cache setup (both updating the bare cache repo and configuring alternates) only runs when .git does not already exist. On self-hosted runners with persistent workspaces — which is the primary use case for reference caching — subsequent runs will find .git already present. This means:
- The bare cache repository is never updated after the first clone.
- New objects pushed to the remote after the initial cache creation won't be available locally, defeating the purpose of the cache on long-lived runners.
The cache update (setupCache) should also run when .git already exists. Additionally, the alternates file should be verified/re-added even for existing repos, in case the workspace was cleaned but the cache persists (or vice versa).
| // Setup reference cache if requested | |
| if (settings.referenceCache) { | |
| core.startGroup('Setting up reference repository cache') | |
| const cacheHelper = new GitCacheHelper(settings.referenceCache) | |
| const cachePath = await cacheHelper.setupCache(git, repositoryUrl) | |
| const cacheObjects = path.join(cachePath, 'objects') | |
| if (fsHelper.directoryExistsSync(cacheObjects, false)) { | |
| await git.referenceAdd(cacheObjects) | |
| } else { | |
| core.warning( | |
| `Reference repository cache objects directory ${cacheObjects} does not exist` | |
| ) | |
| } | |
| core.endGroup() |
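To make the "verify or re-add the alternates entry on every run" idea concrete, here is a small idempotent helper. It is an assumption-laden sketch: `ensureAlternate` is a hypothetical function, not the PR's `referenceAdd`. It relies only on the documented format of `objects/info/alternates`, which holds one object-directory path per line.

```typescript
import * as fs from 'fs'
import * as os from 'os'
import * as path from 'path'

// Make sure `<gitDir>/objects/info/alternates` lists `cacheObjects`,
// keeping any entries that are already present.
function ensureAlternate(gitDir: string, cacheObjects: string): boolean {
  const infoDir = path.join(gitDir, 'objects', 'info')
  const alternatesFile = path.join(infoDir, 'alternates')
  const existing = fs.existsSync(alternatesFile)
    ? fs
        .readFileSync(alternatesFile, 'utf8')
        .split(/\r?\n/)
        .filter(line => line.trim() !== '')
    : []
  if (existing.includes(cacheObjects)) {
    return false // already configured, e.g. by a previous run
  }
  fs.mkdirSync(infoDir, {recursive: true})
  fs.writeFileSync(alternatesFile, [...existing, cacheObjects].join('\n') + '\n')
  return true
}

// Usage: safe to call on every run, for fresh and existing workspaces alike.
const demoGitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'alternates-demo-'))
ensureAlternate(demoGitDir, '/opt/cache/repo.git/objects')
ensureAlternate(demoGitDir, '/opt/cache/repo.git/objects') // second call changes nothing
```

Because the helper is a no-op when the entry exists, it could run outside the "`.git` does not exist yet" branch without duplicating lines in the file.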
| "jest-circus": "^29.7.0", | ||
| "js-yaml": "^4.1.0", | ||
| "prettier": "^3.3.3", | ||
| "proper-lockfile": "^4.1.2", |
proper-lockfile is imported and used in the runtime source code (src/git-cache-helper.ts), but it's listed under devDependencies. Following the existing convention in this project (e.g., uuid is in dependencies with @types/uuid in devDependencies), proper-lockfile should be moved to dependencies and only @types/proper-lockfile should remain in devDependencies.
While ncc bundles everything into dist/index.js regardless of the section, keeping the semantic distinction between runtime and dev dependencies is important for clarity and for tools that rely on this distinction (e.g., npm audit --production).
| async function iterativeSubmoduleUpdate( | ||
| git: IGitCommandManager, | ||
| cacheHelper: GitCacheHelper, | ||
| repositoryPath: string, | ||
| fetchDepth: number, | ||
| nestedSubmodules: boolean | ||
| ): Promise<void> { |
The function is named iterativeSubmoduleUpdate, but when nestedSubmodules is true, it recursively calls itself (line 148). The name is misleading since the function uses recursion, not iteration. Consider renaming to recursiveSubmoduleUpdate or submoduleUpdateWithCache to more accurately describe its behavior.
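For comparison, a traversal that is iterative in the strict sense would use an explicit work stack instead of calling itself. The sketch below only illustrates that distinction and is not the PR's code: `SubmoduleNode` is a hypothetical in-memory model, whereas a real implementation would read each `.gitmodules` file.

```typescript
// Hypothetical in-memory model of nested submodules.
interface SubmoduleNode {
  path: string
  url: string
  submodules: SubmoduleNode[]
}

// Visit submodules depth-first with an explicit stack, without self-recursion.
function visitSubmodules(
  root: SubmoduleNode[],
  nested: boolean,
  visit: (fullPath: string, url: string) => void
): void {
  // Reverse so that the first declared submodule is popped first.
  const pending = root.map(node => ({node, prefix: ''})).reverse()
  while (pending.length > 0) {
    const item = pending.pop()
    if (!item) break
    const fullPath = item.prefix + item.node.path
    visit(fullPath, item.node.url)
    if (nested) {
      // Push children in reverse so they pop in declaration order.
      for (const child of [...item.node.submodules].reverse()) {
        pending.push({node: child, prefix: `${fullPath}/`})
      }
    }
  }
}
```

Either approach visits the same submodules; the point of the review comment is only that the function name should match whichever approach is used.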
| // In rare cases where it somehow exists but objects/ didn't, clean it up | ||
| await fs.promises.rm(cachePath, { recursive: true, force: true }) |
Lines 71-72 have inconsistent indentation — they use an extra space compared to the surrounding code block. Given the project uses Prettier with 2-space tabWidth, running prettier --write should fix this.
Full implementation of a reference cache. It works for the main repository and for submodules (recursively). The cache is created and updated automatically.
Fixes #2303