Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for shallow cloning repos #363

Merged
merged 6 commits into from
Feb 25, 2025
Merged

Conversation

MeNsaaH
Copy link
Collaborator

@MeNsaaH MeNsaaH commented Feb 7, 2025

This adds support for shallow cloning repos.

With every hook/request, kubechecks clone the full repo. This can become very expensive for large repositories.

Shallow cloning allows us to limit the cloning to just the commits we need.

@MeNsaaH MeNsaaH requested review from djeebus and Greyeye February 7, 2025 01:27
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from ab10be3 to 0837db6 Compare February 7, 2025 01:32
Copy link

github-actions bot commented Feb 7, 2025

Temporary image available at ghcr.io/zapier/kubechecks:0.0.0-pr363.

Signed-off-by: Mmadu Manasseh <[email protected]>
Signed-off-by: Mmadu Manasseh <[email protected]>
VcsUploadUrl string `mapstructure:"vcs-upload-url"` // github enterprise upload URL
VcsToken string `mapstructure:"vcs-token"`
VcsType string `mapstructure:"vcs-type"`
EnableShallowClone bool `mapstructure:"enable-shallow-clone"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should prefix the key with vcs- to make it clear what this configures.

Copy link
Collaborator Author

@MeNsaaH MeNsaaH Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeebus I've used the RepoShallowClone for this instead, what do you think?

I feel Vcs prefixed variables are for vcs configs and this is not "completely" one of them.

@Greyeye
Copy link
Collaborator

Greyeye commented Feb 7, 2025

@MeNsaaH we got caught by dependencies missing several times (e.g. had to implement a kustomize walker)
I am not sure shallow clone is a good idea.

@MeNsaaH
Copy link
Collaborator Author

MeNsaaH commented Feb 7, 2025

@MeNsaaH we got caught by dependencies missing several times (e.g. had to implement a kustomize walker)
I am not sure shallow clone is a good idea.

@Greyeye can you please elaborate how shallow cloning will cause missing dependencies

@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from a52748f to 2c06ea0 Compare February 7, 2025 22:14
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from 2c06ea0 to acd97a9 Compare February 7, 2025 22:41
@MeNsaaH MeNsaaH requested a review from djeebus February 10, 2025 11:41
Signed-off-by: Mmadu Manasseh <[email protected]>
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from a80d3ef to 012a083 Compare February 10, 2025 23:46
@MeNsaaH MeNsaaH requested a review from djeebus February 11, 2025 00:09
@zapier-sre-bot
Copy link
Collaborator

Mergecat's Review

Click to read mergecats review!

😼 Mergecat review of pkg/utils.go

@@ -16,6 +16,7 @@ func Pointer[T interface{}](item T) *T {
 }
 
 func WipeDir(dir string) {
+	log.Debug().Str("path", dir).Msg("wiping path")
 	if err := os.RemoveAll(dir); err != nil {
 		log.Error().
 			Err(err).

Feedback & Suggestions:

  1. Log Level Appropriateness: The addition of a log.Debug() statement is useful for tracing and debugging. However, ensure that the debug logging is appropriate for the production environment, as it might expose sensitive information about the directory structure. Consider using a configuration flag to enable or disable debug logging based on the environment. 🛡️

  2. Performance Consideration: Logging can introduce performance overhead, especially if the WipeDir function is called frequently. Ensure that the logging library is configured to handle high-frequency log messages efficiently. Consider using asynchronous logging if performance becomes an issue. 🚀


😼 Mergecat review of cmd/root.go

@@ -119,6 +119,9 @@ func init() {
 		newStringOpts().
 			withDefault("kubechecks again"))
 	stringSliceFlag(flags, "additional-apps-namespaces", "Additional namespaces other than the ArgoCDNamespace to monitor for applications.")
+	boolFlag(flags, "repo-shallow-clone", "Enable shallow cloning for all git repos.",
+		newBoolOpts().
+			withDefault(false))
 	stringFlag(flags, "identifier", "Identifier for the kubechecks instance. Used to differentiate between multiple kubechecks instances.",
 		newStringOpts().
 			withDefault(""))

Feedback & Suggestions:

  1. Security Consideration: Ensure that enabling shallow cloning does not inadvertently expose sensitive data or reduce the security of the cloning process. Shallow clones can be beneficial for performance but may omit important history that could be relevant for security audits.

  2. Documentation: Update the documentation to reflect the addition of the repo-shallow-clone flag. This will help users understand the implications and use cases for enabling shallow cloning.

  3. Testing: Verify that the shallow clone functionality is thoroughly tested, especially in environments where full history might be required for certain operations. Consider edge cases where shallow cloning might fail or cause unexpected behavior.

  4. Performance: While shallow cloning can improve performance by reducing the amount of data transferred, ensure that this does not negatively impact other parts of the application that might rely on full repository history.


😼 Mergecat review of pkg/config/config.go

@@ -79,6 +79,7 @@ type ServerConfig struct {
 	MonitorAllApplications   bool          `mapstructure:"monitor-all-applications"`
 	OpenAIAPIToken           string        `mapstructure:"openai-api-token"`
 	RepoRefreshInterval      time.Duration `mapstructure:"repo-refresh-interval"`
+	RepoShallowClone         bool          `mapstructure:"repo-shallow-clone"`
 	SchemasLocations         []string      `mapstructure:"schemas-location"`
 	ShowDebugInfo            bool          `mapstructure:"show-debug-info"`
 	TidyOutdatedCommentsMode string        `mapstructure:"tidy-outdated-comments-mode"`

Feedback & Suggestions:

  1. Security Consideration: If RepoShallowClone is intended to control whether repositories are cloned shallowly, ensure that this feature is implemented securely. Shallow clones can be beneficial for performance, but they might miss some history that could be important for security audits or understanding the full context of changes.

  2. Documentation: Consider adding a comment or documentation explaining the purpose of RepoShallowClone. This will help other developers understand its intended use and any implications it might have.

  3. Default Value: If RepoShallowClone is a new configuration option, consider whether it should have a default value set in the configuration initialization. This can prevent potential issues if the value is not explicitly set by the user.

  4. Testing: Ensure that there are tests covering scenarios where RepoShallowClone is both true and false. This will help verify that the behavior is as expected in both cases.


😼 Mergecat review of pkg/events/check.go

@@ -55,7 +55,7 @@ type CheckEvent struct {
 }
 
 type repoManager interface {
-	Clone(ctx context.Context, cloneURL, branchName string) (*git.Repo, error)
+	Clone(ctx context.Context, cloneURL, branchName string, shallow bool) (*git.Repo, error)
 }
 
 func generateMatcher(ce *CheckEvent, repo *git.Repo) error {
@@ -192,7 +192,7 @@ func (ce *CheckEvent) getRepo(ctx context.Context, cloneURL, branchName string)
 		return repo, nil
 	}
 
-	repo, err = ce.repoManager.Clone(ctx, cloneURL, branchName)
+	repo, err = ce.repoManager.Clone(ctx, cloneURL, branchName, ce.ctr.Config.RepoShallowClone)
 	if err != nil {
 		return nil, errors.Wrap(err, "failed to clone repo")
 	}

Feedback & Suggestions:

  1. Security Consideration: Ensure that the shallow parameter is properly validated and sanitized before being used in the Clone method. This is crucial to prevent any potential misuse or security vulnerabilities related to repository cloning.

  2. Performance Improvement: The introduction of the shallow parameter is a good step towards improving performance by potentially reducing the amount of data cloned. However, ensure that the RepoShallowClone configuration is set appropriately based on the use case, as shallow clones might not be suitable for all operations.

  3. Documentation Update: Update any relevant documentation or comments to reflect the change in the Clone method signature. This will help maintain clarity for future developers working with this code.

  4. Testing: Ensure that there are adequate tests covering both shallow and full clone scenarios to verify that the behavior is as expected and that no regressions are introduced.


😼 Mergecat review of pkg/git/manager.go

@@ -22,8 +22,11 @@ func NewRepoManager(cfg config.ServerConfig) *RepoManager {
 	return &RepoManager{cfg: cfg}
 }
 
-func (rm *RepoManager) Clone(ctx context.Context, cloneUrl, branchName string) (*Repo, error) {
+func (rm *RepoManager) Clone(ctx context.Context, cloneUrl, branchName string, shallow bool) (*Repo, error) {
 	repo := New(rm.cfg, cloneUrl, branchName)
+	if shallow {
+		repo.Shallow = true
+	}
 
 	if err := repo.Clone(ctx); err != nil {
 		return nil, errors.Wrap(err, "failed to clone repository")

Feedback & Suggestions:

  1. Parameter Documentation: 📄 Consider updating the function documentation to include the new shallow parameter. This will help other developers understand the purpose and effect of this parameter.

  2. Boolean Parameter: 🤔 Using a boolean parameter like shallow can sometimes lead to less readable code, especially if the function is called with a literal value (e.g., Clone(ctx, url, branch, true)). Consider using an options struct or constants to improve readability.

  3. Testing: 🧪 Ensure that there are corresponding tests for this new functionality. Testing both the shallow and non-shallow clone scenarios will help maintain code reliability.

  4. Error Handling: ⚠️ If the Shallow field in Repo has any constraints or specific behavior, ensure that the error handling in the Clone method accounts for these scenarios.


😼 Mergecat review of localdev/kubechecks/values.yaml

@@ -22,16 +22,18 @@ configMap:
     #
     # KUBECHECKS_LABEL_FILTER: "test" # On your PR/MR, prefix this with "kubechecks:"
     # KUBECHECKS_SCHEMAS_LOCATION: https://github.com/zapier/kubecheck-schemas.git
+    KUBECHECKS_REPO_REFRESH_INTERVAL: 30s
     KUBECHECKS_TIDY_OUTDATED_COMMENTS_MODE: "delete"
     KUBECHECKS_ENABLE_CONFTEST: "false"
+    KUBECHECKS_REPO_SHALLOW_CLONE: "true"
     KUBECHECKS_IDENTIFIER: "test"
 
 deployment:
   annotations:
     reloader.stakater.com/auto: "true" 
   
   image:
-    pullPolicy: Never
+    pullPolicy: IfNotPresent
     name: "kubechecks"
     tag: ""
 

Feedback & Suggestions:

  1. Security Concern: Ensure that the KUBECHECKS_REPO_REFRESH_INTERVAL and KUBECHECKS_REPO_SHALLOW_CLONE values are validated and sanitized to prevent any potential injection attacks. Consider documenting the expected format and constraints for these values. 🛡️

  2. Performance Consideration: The addition of KUBECHECKS_REPO_SHALLOW_CLONE: "true" is a good move for performance as it reduces the amount of data cloned. Ensure that this does not affect any functionality that might require full clone history. ⚡

  3. Image Pull Policy: Changing pullPolicy from Never to IfNotPresent is generally a good practice for environments where images might be updated. However, ensure that this aligns with your deployment strategy and that the latest images are available in your registry. 🐳


😼 Mergecat review of docs/usage.md

@@ -71,6 +71,7 @@ The full list of supported environment variables is described below:
 |`KUBECHECKS_POLICIES_LOCATION`|Sets rego policy locations to be used for every check request. Can be common path inside the repos being checked or git urls in either git or http(s) format.|`[./policies]`|
 |`KUBECHECKS_REPLAN_COMMENT_MSG`|comment message which re-triggers kubechecks on PR.|`kubechecks again`|
 |`KUBECHECKS_REPO_REFRESH_INTERVAL`|Interval between static repo refreshes (for schemas and policies).|`5m`|
+|`KUBECHECKS_REPO_SHALLOW_CLONE`|Enable shallow cloning for all git repos.|`false`|
 |`KUBECHECKS_SCHEMAS_LOCATION`|Sets schema locations to be used for every check request. Can be a common path on the host or git urls in either git or http(s) format.|`[]`|
 |`KUBECHECKS_SHOW_DEBUG_INFO`|Set to true to print debug info to the footer of MR comments.|`false`|
 |`KUBECHECKS_TIDY_OUTDATED_COMMENTS_MODE`|Sets the mode to use when tidying outdated comments. One of hide, delete.|`hide`|

Feedback & Suggestions:

  1. Clarification on Shallow Cloning: The new environment variable KUBECHECKS_REPO_SHALLOW_CLONE is a great addition for performance optimization. However, it would be beneficial to clarify what "shallow cloning" means in this context. Consider adding a brief explanation or a link to documentation about shallow cloning in Git. This will help users understand the implications of enabling this option. 📚

  2. Security Consideration: If shallow cloning is enabled, ensure that it does not inadvertently skip fetching important commit history that might be necessary for security audits or compliance checks. Consider adding a note about this potential trade-off. 🔒

  3. Default Value Explanation: The default value is set to false, which is good for maintaining current behavior. However, it might be helpful to explain why false is the default and under what circumstances a user might want to change it to true. This can guide users in making an informed decision. 🤔


😼 Mergecat review of cmd/locations.go

@@ -18,7 +18,7 @@ import (
 
 func processLocations(ctx context.Context, ctr container.Container, locations []string) error {
 	for index, location := range locations {
-		if newLocation, err := maybeCloneGitUrl(ctx, ctr.RepoManager, ctr.Config.RepoRefreshInterval, location, ctr.VcsClient.Username()); err != nil {
+		if newLocation, err := maybeCloneGitUrl(ctx, ctr.RepoManager, ctr.Config.RepoRefreshInterval, location, ctr.VcsClient.Username(), ctr.Config.RepoShallowClone); err != nil {
 			return errors.Wrapf(err, "failed to clone %q", location)
 		} else if newLocation != "" {
 			locations[index] = newLocation
@@ -31,12 +31,12 @@ func processLocations(ctx context.Context, ctr container.Container, locations []
 }
 
 type cloner interface {
-	Clone(ctx context.Context, cloneUrl, branchName string) (*git.Repo, error)
+	Clone(ctx context.Context, cloneUrl, branchName string, shallow bool) (*git.Repo, error)
 }
 
 var ErrCannotUseQueryWithFilePath = errors.New("relative and absolute file paths cannot have query parameters")
 
-func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDuration time.Duration, location, vcsUsername string) (string, error) {
+func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDuration time.Duration, location, vcsUsername string, shallow bool) (string, error) {
 	result := strings.SplitN(location, "?", 2)
 	if !isGitURL(result[0]) {
 		if len(result) > 1 {
@@ -51,7 +51,7 @@ func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDurati
 	}
 	cloneUrl := repoUrl.CloneURL(vcsUsername)
 
-	repo, err := repoManager.Clone(ctx, cloneUrl, query.Get("branch"))
+	repo, err := repoManager.Clone(ctx, cloneUrl, query.Get("branch"), shallow)
 	if err != nil {
 		return "", errors.Wrap(err, "failed to clone")
 	}

Feedback & Suggestions:

  1. Parameter Addition: The addition of the shallow parameter to the Clone function and its propagation through maybeCloneGitUrl is a good enhancement for performance, especially when dealing with large repositories. However, ensure that the shallow parameter is being correctly utilized within the Clone method implementation to actually perform a shallow clone. 🏗️

  2. Backward Compatibility: Ensure that all calls to maybeCloneGitUrl and Clone are updated to include the new shallow parameter. If there are other parts of the codebase or external packages that use these functions, they will need to be updated to maintain compatibility. 🔄

  3. Documentation: Update any relevant documentation or comments to reflect the new parameter and its purpose. This will help other developers understand the change and its impact. 📚

  4. Testing: Add or update tests to cover scenarios with both shallow and non-shallow clones. This will ensure that the new functionality works as expected and does not introduce any regressions. 🧪


😼 Mergecat review of cmd/locations_test.go

@@ -19,7 +19,7 @@ type fakeCloner struct {
 	err                  error
 }
 
-func (f *fakeCloner) Clone(_ context.Context, cloneUrl, branchName string) (*git.Repo, error) {
+func (f *fakeCloner) Clone(_ context.Context, cloneUrl, branchName string, shallow bool) (*git.Repo, error) {
 	f.cloneUrl = cloneUrl
 	f.branchName = branchName
 	return f.result, f.err
@@ -43,7 +43,7 @@ func TestMaybeCloneGitUrl_NonGitUrl(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: nil, err: nil}
-			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.NoError(t, err)
 			assert.Equal(t, "", fc.branchName)
 			assert.Equal(t, "", fc.cloneUrl)
@@ -137,7 +137,7 @@ func TestMaybeCloneGitUrl_HappyPath(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: nil}
-			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.NoError(t, err)
 			assert.Equal(t, tc.expected.branch, fc.branchName)
 			assert.Equal(t, tc.expected.cloneUrl, fc.cloneUrl)
@@ -165,7 +165,7 @@ func TestMaybeCloneGitUrl_URLError(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: nil}
-			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.ErrorContains(t, err, tc.expected)
 			require.Equal(t, "", result)
 		})
@@ -193,7 +193,7 @@ func TestMaybeCloneGitUrl_CloneError(t *testing.T) {
 			defer cancel()
 
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: tc.cloneError}
-			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.ErrorContains(t, err, tc.expected)
 			require.Equal(t, "", result)
 		})

Feedback & Suggestions:

  1. Parameter Addition: The addition of the shallow parameter to the Clone method and the maybeCloneGitUrl function calls is a good enhancement for supporting shallow cloning. However, ensure that the shallow parameter is utilized within the Clone method to affect the cloning behavior. If it's not yet implemented, consider adding logic to handle shallow cloning based on this parameter.

  2. Test Coverage: Since a new parameter shallow has been introduced, it would be beneficial to add test cases that specifically test the behavior when shallow is set to true. This will ensure that the shallow cloning functionality is correctly implemented and tested.

  3. Code Consistency: Ensure that all instances where maybeCloneGitUrl is called are updated to include the new shallow parameter. It seems like this has been done, but double-check to avoid any missed instances.

  4. Documentation: If this code is part of a larger codebase, consider updating any relevant documentation to reflect the new parameter and its intended use. This will help other developers understand the purpose and usage of the shallow parameter.

Overall, the changes are a step in the right direction for enhancing functionality. Just make sure to follow through with implementation and testing for the new parameter. 🚀


😼 Mergecat review of pkg/git/repo.go

@@ -28,6 +28,7 @@ type Repo struct {
 	BranchName string
 	Config     config.ServerConfig
 	CloneURL   string
+	Shallow    bool
 
 	// exposed state
 	Directory string
@@ -46,11 +47,17 @@ func New(cfg config.ServerConfig, cloneUrl, branchName string) *Repo {
 }
 
 func (r *Repo) Clone(ctx context.Context) error {
+	if r.Shallow {
+		return r.shallowClone(ctx)
+	}
+
 	var err error
 
-	r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
-	if err != nil {
-		return errors.Wrap(err, "failed to make temp dir")
+	if r.Directory == "" {
+		r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
+		if err != nil {
+			return errors.Wrap(err, "failed to make temp dir")
+		}
 	}
 
 	log.Info().
@@ -85,6 +92,63 @@ func (r *Repo) Clone(ctx context.Context) error {
 	return nil
 }
 
+func (r *Repo) shallowClone(ctx context.Context) error {
+	var err error
+
+	if r.Directory == "" {
+		r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
+		if err != nil {
+			return errors.Wrap(err, "failed to make temp dir")
+		}
+	}
+
+	log.Info().
+		Str("temp-dir", r.Directory).
+		Str("clone-url", r.CloneURL).
+		Str("branch", r.BranchName).
+		Msg("cloning git repo")
+
+	//  Attempt to locally clone the repo based on the provided information stored within
+	_, span := tracer.Start(ctx, "ShallowCloneRepo")
+	defer span.End()
+
+	args := []string{"clone", r.CloneURL, r.Directory, "--depth", "1"}
+	cmd := r.execGitCommand(args...)
+	out, err := cmd.CombinedOutput()
+	if err != nil {
+		log.Error().Err(err).Msgf("unable to clone repository, %s", out)
+		return err
+	}
+
+	if r.BranchName != "HEAD" {
+		// Fetch SHA
+		args = []string{"fetch", "origin", r.BranchName, "--depth", "1"}
+		cmd = r.execGitCommand(args...)
+		out, err = cmd.CombinedOutput()
+		if err != nil {
+			log.Error().Err(err).Msgf("unable to fetch %s repository, %s", r.BranchName, out)
+			return err
+		}
+		// Checkout SHA
+		args = []string{"checkout", r.BranchName}
+		cmd = r.execGitCommand(args...)
+		out, err = cmd.CombinedOutput()
+		if err != nil {
+			log.Error().Err(err).Msgf("unable to checkout branch %s repository, %s", r.BranchName, out)
+			return err
+		}
+	}
+
+	if log.Trace().Enabled() {
+		if err = filepath.WalkDir(r.Directory, printFile); err != nil {
+			log.Warn().Err(err).Msg("failed to walk directory")
+		}
+	}
+
+	log.Info().Msg("repo has been cloned")
+	return nil
+}
+
 func printFile(s string, d fs.DirEntry, err error) error {
 	if err != nil {
 		return err
@@ -118,8 +182,24 @@ func (r *Repo) MergeIntoTarget(ctx context.Context, ref string) error {
 			attribute.String("sha", ref),
 		))
 	defer span.End()
+	merge_command := []string{"merge", ref}
+	// For shallow clones, we need to pull the ref into the repo
+	if r.Shallow {
+		ref = strings.TrimPrefix(ref, "origin/")
+		cmd := r.execGitCommand("fetch", "origin", fmt.Sprintf("%s:%s", ref, ref), "--depth", "1")
+		out, err := cmd.CombinedOutput()
+		if err != nil {
+			telemetry.SetError(span, err, "fetch origin ref")
+			log.Error().Err(err).Msgf("unable to fetch ref %s, %s", ref, out)
+			return err
+		}
+		// When merging shallow clones, we need to allow unrelated histories
+		// and use the "theirs" strategy to avoid conflicts
+		// cons of this is that it may not be entirely accurate and may overwrite changes in the target branch
+		merge_command = []string{"merge", ref, "--allow-unrelated-histories", "-X", "theirs"}
+	}
 
-	cmd := r.execGitCommand("merge", ref)
+	cmd := r.execGitCommand(merge_command...)
 	out, err := cmd.CombinedOutput()
 	if err != nil {
 		telemetry.SetError(span, err, "merge commit into branch")
@@ -131,6 +211,15 @@ func (r *Repo) MergeIntoTarget(ctx context.Context, ref string) error {
 }
 
 func (r *Repo) Update(ctx context.Context) error {
+	// Since we're shallow cloning, to update we need to wipe the directory and re-clone
+	if r.Shallow {
+		r.Wipe()
+		err := os.Mkdir(r.Directory, 0700)
+		if err != nil {
+			return errors.Wrap(err, "failed to create repo directory")
+		}
+		return r.Clone(ctx)
+	}
 	cmd := r.execGitCommand("pull")
 	cmd.Stdout = os.Stdout
 	cmd.Stderr = os.Stdout

Feedback & Suggestions:

  1. Security Concern with Shallow Cloning: The use of --allow-unrelated-histories and -X theirs during merges for shallow clones can lead to potential data loss or overwriting of important changes. Consider implementing a more robust conflict resolution strategy or at least logging a warning to inform users of the potential risks. ⚠️

  2. Error Handling: In the shallowClone method, the error handling is consistent, but consider adding more context to the error messages to aid in debugging. For example, specify which step of the cloning process failed. 🐛

  3. Directory Creation: In the Update method, after wiping the directory, the directory is recreated with os.Mkdir. Ensure that the directory path is valid and consider using os.MkdirAll to handle nested directories gracefully. 🗂️

  4. Code Duplication: The logic for creating a temporary directory is duplicated in both Clone and shallowClone. Consider refactoring this into a separate helper function to reduce code duplication and improve maintainability. 🔄

  5. Performance Consideration: Shallow cloning and fetching with --depth 1 can be beneficial for performance, but ensure that this approach aligns with the use case requirements, especially if full history is needed later. 📈



Dependency Review

Click to read mergecats review!

No suggestions found

@MeNsaaH MeNsaaH requested review from djeebus and removed request for djeebus February 24, 2025 22:52
@djeebus
Copy link
Collaborator

djeebus commented Feb 25, 2025

I suspect he's talking about things like scripts or helm plugins that depend on git history to exist. It looks like this can produce some nice speed ups - one synthetic test showed it drop check times from ~16 seconds down to 11 seconds. As long as it's disabled by default (which it is), I think this is quite useful. Thanks!

@djeebus djeebus merged commit ed4056c into main Feb 25, 2025
5 checks passed
@djeebus djeebus deleted the enable-shallow-cloning branch February 25, 2025 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants