Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug: make indexing timeout configurable #676

Merged
merged 3 commits into from
Oct 31, 2023
Merged

Conversation

jtibshirani
Copy link
Member

On large repos, indexing might take quite a while and hit the indexing timeout.
This change helps debug these situations:

  • Make the indexing timeout configurable through an env variable
    INDEXING_TIMEOUT
  • Add more info to progress logging: log the total number of files being
    indexed, plus the file count per shard

@jtibshirani jtibshirani changed the title Debug: make timeout configurable Debug: make indexing timeout configurable Oct 31, 2023
@jtibshirani jtibshirani marked this pull request as ready for review October 31, 2023 04:31
@@ -183,7 +181,13 @@ func gitIndex(c gitIndexConfig, o *indexArgs, sourcegraph Sourcegraph, l sglog.L

buildOptions := o.BuildOptions()

ctx, cancel := context.WithTimeout(context.Background(), indexTimeout)
// indexingTimeout defines how long the indexserver waits before killing an indexing job.
indexingTimeout := getEnvWithDefaultDuration("INDEXING_TIMEOUT", defaultIndexingTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this part of indexArgs and set it in Server.indexArgs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted about this over video, @stefanhengl is okay with an environment variable since we want to be able to change it without bouncing the server.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at this, and decided to go with your suggestion, to match what we do for other debugging options like SKIP_SYMBOLS_REPOS_ALLOWLIST. I think it's best to be consistent, even though that means you need to restart the server to pick up new config.

@stefanhengl stefanhengl self-requested a review October 31, 2023 15:46
@jtibshirani jtibshirani merged commit 503302f into main Oct 31, 2023
8 checks passed
@jtibshirani jtibshirani deleted the jtibs/index-timeout branch October 31, 2023 17:34
jtibshirani added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Oct 31, 2023
Bumps the Zoekt version to pick up debugging improvements for slow indexing
jobs (sourcegraph/zoekt#676). There are no other changes since the last bump.
keegancsmith pushed a commit that referenced this pull request Nov 1, 2023
On large repos, indexing might take quite a while and hit the indexing timeout.
This change helps debug these situations:
* Make the indexing timeout configurable through an env variable
`INDEXING_TIMEOUT`
* Add more info to progress logging: log the total number of files being
indexed, plus the file count per shard
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, for some reason my PR comments never submitted. Better late than never :)

@@ -1088,8 +1088,11 @@ func (b *Builder) writeShard(fn string, ib *zoekt.IndexBuilder) (*finishedShard,
return nil, err
}

log.Printf("finished %s: %d index bytes (overhead %3.1f)", fn, fi.Size(),
float64(fi.Size())/float64(ib.ContentSize()+1))
log.Printf("finished shard %s: %d index bytes (overhead %3.1f), %d files processed \n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: you don't need the \n with the log pkg.

Suggested change
log.Printf("finished shard %s: %d index bytes (overhead %3.1f), %d files processed \n",
log.Printf("finished shard %s: %d index bytes (overhead %3.1f), %d files processed",

@@ -183,7 +181,13 @@ func gitIndex(c gitIndexConfig, o *indexArgs, sourcegraph Sourcegraph, l sglog.L

buildOptions := o.BuildOptions()

ctx, cancel := context.WithTimeout(context.Background(), indexTimeout)
// indexingTimeout defines how long the indexserver waits before killing an indexing job.
indexingTimeout := getEnvWithDefaultDuration("INDEXING_TIMEOUT", defaultIndexingTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor: I think prefixing the envvar with something can help when reading configuration files. I see we have lost our consistency here, but will still have a lot of envvars prefixed with SRC_. We don't use single container deployments as much these days so likely less important, but the prefix can help with accidental collisions as well.

vovakulikov pushed a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Dec 12, 2023
Bumps the Zoekt version to pick up debugging improvements for slow indexing
jobs (sourcegraph/zoekt#676). There are no other changes since the last bump.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants