Using a managed object storage service (S3 or GCS)

By default, Sourcegraph will use a sourcegraph/blobstore server bundled with the instance to temporarily store code graph indexes uploaded by users as well as the results of search jobs.

Alternatively, you can configure your instance to store this data in an S3 or GCS bucket. Doing so may decrease your hosting costs, as persistent volumes are often more expensive than the same storage space in an object storage service.

`sourcegraph` bucket

Self-hosted Sourcegraph instances using S3 or GCS object storage should now provision an additional bucket named `sourcegraph`. Sourcegraph currently reports a warning when this bucket is not present, and the bucket will become required for new features in a future release.

The `sourcegraph` bucket is intended to be the single bucket for new Sourcegraph features. Instead of creating one bucket per feature, new features store objects under namespaced key prefixes within this bucket.

Existing buckets for code graph indexes and search jobs remain in use. This change ensures future features can be enabled without requiring a new bucket for each feature.
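As an illustration of namespaced key prefixes, the sketch below builds object keys for two features that share one bucket. The feature names (`feature-a`, `feature-b`), the object name, and the `<feature>/<object>` layout are hypothetical; the prefixes Sourcegraph actually uses are not specified here.

```shell
# Hypothetical layout: <feature>/<object name>, all inside one bucket.
# object_key FEATURE NAME prints the namespaced key for an object.
object_key() {
  printf '%s/%s\n' "$1" "$2"
}

# Two features can store an object with the same name without colliding,
# because each key starts with its own prefix.
object_key "feature-a" "upload-1.bin"
object_key "feature-b" "upload-1.bin"
```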

Using S3 for the `sourcegraph` bucket

Set the following environment variables to target an S3 bucket for shared Sourcegraph uploads.

SOURCEGRAPH_UPLOAD_BACKEND=S3
SOURCEGRAPH_UPLOAD_BUCKET=sourcegraph (default)
SOURCEGRAPH_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
SOURCEGRAPH_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
SOURCEGRAPH_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
SOURCEGRAPH_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
SOURCEGRAPH_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use EC2 metadata API over static credentials)
SOURCEGRAPH_UPLOAD_AWS_USE_PATH_STYLE=false (optional)
SOURCEGRAPH_UPLOAD_AWS_REGION=us-east-1 (default)
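To make the `SOURCEGRAPH_UPLOAD_AWS_USE_PATH_STYLE` option more concrete, the sketch below prints the two standard S3 addressing styles for the same object. It assumes the option selects between these two styles, as the equivalent AWS SDK setting does; the helper `s3_object_url` and the key `example/object` are hypothetical and only for illustration.

```shell
# s3_object_url STYLE BUCKET REGION KEY
# STYLE is "path" or "virtual"; prints the URL used to address the object.
s3_object_url() {
  style=$1
  bucket=$2
  region=$3
  key=$4
  case "$style" in
    # Path style: the bucket is the first path segment.
    path)    printf 'https://s3.%s.amazonaws.com/%s/%s\n' "$region" "$bucket" "$key" ;;
    # Virtual-hosted style: the bucket is a subdomain of the endpoint.
    virtual) printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key" ;;
    *)       echo "unknown style: $style" >&2; return 1 ;;
  esac
}

s3_object_url path    sourcegraph us-east-1 example/object
s3_object_url virtual sourcegraph us-east-1 example/object
```

Path-style addressing is typically only needed for S3-compatible services that cannot serve buckets as subdomains, so the value `false` shown above is likely the right choice for AWS itself.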

Using GCS for the `sourcegraph` bucket

Set the following environment variables to target a GCS bucket for shared Sourcegraph uploads.

SOURCEGRAPH_UPLOAD_BACKEND=GCS
SOURCEGRAPH_UPLOAD_BUCKET=sourcegraph (default)
SOURCEGRAPH_UPLOAD_GCP_PROJECT_ID=<my project id>
SOURCEGRAPH_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file>
SOURCEGRAPH_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}>

Automatically provision the `sourcegraph` bucket

To let your Sourcegraph instance manage the target bucket configuration, set the following environment variable:

SOURCEGRAPH_UPLOAD_MANAGE_BUCKET=true

This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).

Code Graph Indexes

To target a managed object storage service for storing code graph index uploads, you will need to set a handful of environment variables for configuration and authentication to the target service.

If you are running a sourcegraph/server deployment, set the environment variables on the server container.
If you are running via Docker Compose or Kubernetes, set the environment variables on the frontend, worker, and precise-code-intel-worker containers.
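Because the same variables must reach several containers, one option is to keep them in a single env file and hand that file to each container. The sketch below writes such a file using the S3 variables described in the next section; the file location, bucket name, and chosen values are placeholders, not recommendations.

```shell
# Write the shared settings once. Path and values are placeholders.
env_file="${TMPDIR:-/tmp}/object-storage.env"
cat > "$env_file" <<'EOF'
PRECISE_CODE_INTEL_UPLOAD_BACKEND=S3
PRECISE_CODE_INTEL_UPLOAD_BUCKET=my-bucket-name
PRECISE_CODE_INTEL_UPLOAD_AWS_REGION=us-east-1
PRECISE_CODE_INTEL_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true
EOF

# Each container that needs the settings can then read the same file, e.g.
#   docker run --env-file "$env_file" <image>
# Count the settings written as a quick sanity check.
grep -c '^PRECISE_CODE_INTEL_UPLOAD_' "$env_file"
```

Docker Compose can consume the same file through a service's `env_file` key, which keeps the frontend, worker, and precise-code-intel-worker definitions from drifting apart.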

Using S3 for the Code Graph Indexes bucket

To target an S3 bucket you've already provisioned, set the following environment variables. Authentication can be done through an access and secret key pair (and optional session token), or via the EC2 metadata API.

Never commit AWS access keys to Git. Consider using a secret management service offered by your cloud provider.

PRECISE_CODE_INTEL_UPLOAD_BACKEND=S3
PRECISE_CODE_INTEL_UPLOAD_BUCKET=<my bucket name>
PRECISE_CODE_INTEL_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
PRECISE_CODE_INTEL_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
PRECISE_CODE_INTEL_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
PRECISE_CODE_INTEL_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use EC2 metadata API over static credentials)
PRECISE_CODE_INTEL_UPLOAD_AWS_REGION=us-east-1 (default)

If a non-default region is supplied, ensure that the subdomain of the endpoint URL (the AWS_ENDPOINT value) matches the target region.
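One way to keep the endpoint and region in step is to derive the endpoint from the region rather than typing both. The sketch below assumes the `s3.<region>.amazonaws.com` pattern seen in the default value; other AWS partitions and S3-compatible services use different hostnames. The helper name `s3_endpoint_for_region` and the region `eu-west-1` are illustrative.

```shell
# s3_endpoint_for_region REGION prints the regional S3 endpoint URL,
# assuming the standard s3.<region>.amazonaws.com hostname pattern.
s3_endpoint_for_region() {
  printf 'https://s3.%s.amazonaws.com\n' "$1"
}

region="eu-west-1"   # example non-default region
export PRECISE_CODE_INTEL_UPLOAD_AWS_REGION="$region"
export PRECISE_CODE_INTEL_UPLOAD_AWS_ENDPOINT="$(s3_endpoint_for_region "$region")"

echo "$PRECISE_CODE_INTEL_UPLOAD_AWS_ENDPOINT"   # https://s3.eu-west-1.amazonaws.com
```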

You don't need to set the PRECISE_CODE_INTEL_UPLOAD_AWS_ACCESS_KEY_ID environment variable when using PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true because role credentials will be automatically resolved. Attach the IAM role to the EC2 instances hosting the frontend, worker, and precise-code-intel-worker containers in a multi-node environment.
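The two authentication modes can be checked before starting the containers. The following is a minimal sketch of such a preflight check; `check_s3_credentials` is a hypothetical helper, not part of Sourcegraph, and it only inspects which variables are set. It takes the variable prefix as its argument so the same function could be pointed at other `*_UPLOAD` prefixes.

```shell
# check_s3_credentials PREFIX
# Prints "role" when PREFIX_AWS_USE_EC2_ROLE_CREDENTIALS=true, or "static"
# when both PREFIX_AWS_ACCESS_KEY_ID and PREFIX_AWS_SECRET_ACCESS_KEY are
# non-empty. Otherwise prints an error to stderr and returns 1.
# PREFIX must be a trusted, literal variable prefix (it is passed to eval).
check_s3_credentials() {
  prefix=$1
  eval "use_role=\${${prefix}_AWS_USE_EC2_ROLE_CREDENTIALS:-}"
  eval "key_id=\${${prefix}_AWS_ACCESS_KEY_ID:-}"
  eval "secret=\${${prefix}_AWS_SECRET_ACCESS_KEY:-}"

  if [ "$use_role" = "true" ]; then
    echo "role"
  elif [ -n "$key_id" ] && [ -n "$secret" ]; then
    echo "static"
  else
    echo "missing credentials for $prefix" >&2
    return 1
  fi
}

# Example: role credentials, so no access key variables are required.
PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true
check_s3_credentials PRECISE_CODE_INTEL_UPLOAD   # prints "role"
```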

Using GCS for the Code Graph Indexes bucket

To target a GCS bucket you've already provisioned, set the following environment variables. Authentication is done through a service account key, supplied as either a path to a volume-mounted file, or the contents read in as an environment variable payload.

PRECISE_CODE_INTEL_UPLOAD_BACKEND=GCS
PRECISE_CODE_INTEL_UPLOAD_BUCKET=<my bucket name>
PRECISE_CODE_INTEL_UPLOAD_GCP_PROJECT_ID=<my project id>
PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file>
PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}>
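When the key cannot be volume-mounted, its contents can be read into the payload variable instead. The sketch below shows one way to do that in a shell; `load_gcs_credentials` is a hypothetical helper and the key file path in the usage comment is a placeholder.

```shell
# load_gcs_credentials FILE
# Reads a service account key file and exports its contents as the
# credentials payload variable. Returns 1 if the file is not readable.
load_gcs_credentials() {
  [ -r "$1" ] || { echo "cannot read key file: $1" >&2; return 1; }
  PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT="$(cat "$1")"
  export PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT
}

# Usage (placeholder path):
#   load_gcs_credentials /path/to/service-account-key.json
```

The payload variable holds the full key, so it deserves the same care as the key file itself: avoid echoing it or writing it to logs.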

Automatically provision the Code Graph Indexes bucket

To let your Sourcegraph instance create the target bucket and manage its lifecycle configuration, set the following environment variables:

PRECISE_CODE_INTEL_UPLOAD_MANAGE_BUCKET=true
PRECISE_CODE_INTEL_UPLOAD_TTL=168h (default; 168 hours is 7 days)

This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).

Search Job Results

To target a third-party managed object storage service for storing search job results, you must set a handful of environment variables for configuration and authentication to the target service.

If you are running a sourcegraph/server deployment, set the environment variables on the server container.
If you are running via Docker Compose or Kubernetes, set the environment variables on the frontend and worker containers.

Using S3 for the Search Job Results bucket

Set the following environment variables to target an S3 bucket you've already provisioned. Authentication can be done through an access and secret key pair (and optional session token), or via the EC2 metadata API.

Never commit AWS access keys to Git. Consider using a secret management service offered by your cloud provider.

SEARCH_JOBS_UPLOAD_BACKEND=S3
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
SEARCH_JOBS_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
SEARCH_JOBS_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use EC2 metadata API over static credentials)
SEARCH_JOBS_UPLOAD_AWS_REGION=us-east-1 (default)

If a non-default region is supplied, ensure that the subdomain of the endpoint URL (the AWS_ENDPOINT value) matches the target region.

You don't need to set the SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID environment variable when using SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true because role credentials will be automatically resolved.

Using GCS for the Search Job Results bucket

Set the following environment variables to target a GCS bucket you've already provisioned. Authentication is done through a service account key, supplied either as a path to a volume-mounted file or as the contents read in as an environment variable payload.

SEARCH_JOBS_UPLOAD_BACKEND=GCS
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_GCP_PROJECT_ID=<my project id>
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file>
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}>

Automatically provision the Search Job Results bucket

To let your Sourcegraph instance create the target bucket and manage its lifecycle configuration, set the following environment variable:

SEARCH_JOBS_UPLOAD_MANAGE_BUCKET=true

This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).