Secrets Detection

One of the most common and damaging security mistakes is accidentally committing secrets (API keys, database passwords, private keys, access tokens, and other credentials) to a source code repository. Once a secret is pushed to a Git repository, it remains in the commit history, and in every clone and fork, even if you delete it from the current code. If the repository is public, or if any developer's machine is compromised, those secrets can be harvested and exploited within minutes. Automated bots continuously scan GitHub and other public repositories for leaked credentials.

This lesson covers the tools and practices that prevent secrets from leaking and detect them when they do.

What are secrets?

In the context of software security, "secrets" refers to any piece of sensitive data that grants access to a system, service, or resource. They should never appear in source code, configuration files committed to version control, build logs, or client-side web pages.

Types of secrets

API keys: Access tokens for services like AWS, Google Cloud, Stripe, Twilio, SendGrid, and thousands of other APIs.
Database credentials: Usernames and passwords for MySQL, PostgreSQL, MongoDB, Redis, and other data stores.
Private keys: SSH keys, TLS/SSL private keys, JWT signing keys, and PGP private keys.
OAuth tokens: Access and refresh tokens for GitHub, Google, Facebook, and other OAuth providers.
Connection strings: Database URLs, SMTP credentials, and other service endpoints with embedded credentials.
Environment-specific secrets: Webhook signing secrets, encryption keys, session secrets, and cookie signing keys.

Gitleaks

Gitleaks is one of the most widely used open-source tools for detecting secrets in Git repositories. It scans the entire commit history (not just the current state of files) using a large set of regular expression patterns to identify API keys, tokens, passwords, and other credentials.

How it works

Scans every commit in a repository's history, including deleted files and branches.
Uses over 150 built-in rules to detect secrets from major providers (AWS, GCP, Azure, GitHub, Stripe, etc.).
Supports custom rules so you can add patterns specific to your organization's internal services.
Produces reports in JSON, CSV, and SARIF formats for integration with CI/CD pipelines and security dashboards.
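
The rule-based matching described above can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not Gitleaks' actual implementation: the three rules, their names, and the scan_text helper are all assumptions made for this example, and a real scanner ships far more rules and tunes them against false positives.

```python
import re

# Hypothetical rule set for illustration only.
RULES = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "private-key-header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text, path="<memory>"):
    """Return one finding per rule match, with file path and line number."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append(
                    {
                        "rule": rule_id,
                        "file": path,
                        "line": line_number,
                        "match": match.group(0),
                    }
                )
    return findings
```

Calling scan_text on a file's contents returns a list of findings. A history-aware scanner applies the same idea to the content of every commit rather than only to the working tree.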

Example usage

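# Scan the current repository, including its commit history
# (Gitleaks 8.19 and later deprecate detect/protect in favor of "gitleaks git" and "gitleaks dir")
gitleaks detect --source . --verbose

# Scan only staged changes (for a pre-commit hook)
gitleaks protect --staged --verbose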

# Generate a JSON report
gitleaks detect --source . --report-format json --report-path gitleaks-report.json
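
A JSON report like the one generated above can be summarized with a short script, for example to fail a pipeline or feed a dashboard. The sketch below assumes the report is a JSON array of findings and that each finding carries RuleID and File fields, as Gitleaks v8 reports appear to. The helper names are hypothetical, so check the field names against your own report before relying on them.

```python
import json
from collections import Counter


def summarize_findings(findings):
    """Count findings per rule and collect the files each rule fired in."""
    per_rule = Counter()
    files_by_rule = {}
    for finding in findings:
        # "RuleID" and "File" are assumed field names; adjust if yours differ.
        rule = finding.get("RuleID", "unknown")
        per_rule[rule] += 1
        files_by_rule.setdefault(rule, set()).add(finding.get("File", "unknown"))
    return per_rule, files_by_rule


def load_report(path):
    """Load a report file that is assumed to hold a JSON array of findings."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, summarize_findings(load_report("gitleaks-report.json")) returns the per-rule counts and the set of affected files for each rule.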

Tip: CodeFrog uses Gitleaks to scan for exposed secrets in your website's page source. When you run a CodeFrog mega report, it downloads and analyzes the HTML source of every page looking for accidentally exposed API keys, tokens, and credentials that are visible to anyone viewing your site.

TruffleHog

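TruffleHog takes a different approach to secrets detection. In addition to pattern matching, it can use entropy analysis to flag high-entropy strings that look like secrets even if they do not match a known pattern. Entropy analysis was the core technique of its early versions, while current v3 releases rely mainly on provider-specific detectors. It also verifies whether detected secrets are actually active by attempting to authenticate with them. Verification sends live requests to the provider, so turn it off in environments where that is not acceptable.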

Key features

Entropy analysis: Detects random strings that look like credentials even without a matching pattern.
Verification: Can confirm whether a detected secret is still valid and active.
Multi-source scanning: Scans Git repos, S3 buckets, filesystems, and even Slack messages.
700+ detectors: Covers a wide range of credential types with verified detection.
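
Entropy analysis, the first feature in the list above, can be illustrated with Shannon entropy over the characters of a candidate token. This is a generic sketch of the technique rather than TruffleHog's own code. The token pattern, the 20-character minimum, and the 4.0 bits-per-character threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters that commonly appear in keys.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, threshold=4.0):
    """Return candidate tokens in line whose entropy meets the threshold."""
    return [t for t in TOKEN.findall(line) if shannon_entropy(t) >= threshold]
```

A repeated character has zero entropy, while a random-looking 40-character key scores well above 4 bits per character. Entropy alone also flags hashes, UUIDs, and minified code, which is why it is usually combined with patterns or verification.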

Example usage

# Scan a Git repository
trufflehog git https://github.com/your-org/your-repo.git

# Scan the local filesystem
trufflehog filesystem /path/to/project

# Scan the local repository and report only secrets that verified as live
trufflehog git file://. --only-verified

git-secrets

Developed by AWS Labs, git-secrets is a lightweight tool specifically designed to prevent AWS credentials from being committed to Git repositories. It installs as a Git hook and blocks commits that contain patterns matching AWS access keys, secret keys, and other AWS-specific credential formats.

Example setup

# Install git-secrets (macOS with Homebrew)
brew install git-secrets

# Initialize in a repository
cd your-repo
git secrets --install

# Register AWS patterns
git secrets --register-aws

# Add custom patterns
git secrets --add 'PRIVATE_KEY'
git secrets --add --allowed 'PRIVATE_KEY_EXAMPLE'
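
The interplay between prohibited and allowed patterns in the last two commands can be sketched as follows. This is a simplified model, not git-secrets' implementation: the is_blocked helper is hypothetical, and it uses Python regular expressions where git-secrets uses grep-style patterns.

```python
import re


def is_blocked(line, prohibited, allowed):
    """Return True if line matches a prohibited pattern and no allowed one."""
    if any(re.search(pattern, line) for pattern in allowed):
        # An allowed pattern acts as an exception and overrides any match.
        return False
    return any(re.search(pattern, line) for pattern in prohibited)
```

With the patterns registered above, a line containing PRIVATE_KEY is blocked, while a line containing PRIVATE_KEY_EXAMPLE is let through because the allowed pattern takes precedence.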

Pre-commit hooks

The most effective way to prevent secrets from entering your repository is to catch them before they are committed. Pre-commit hooks run automatically when a developer attempts to make a commit, scanning the staged changes for secrets and blocking the commit if any are found.

Setting up Gitleaks as a pre-commit hook

Using the popular pre-commit framework, add this to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks

Every time a developer runs git commit, Gitleaks scans the staged files. If a secret is detected, the commit is rejected with a clear error message showing what was found and where.
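
To make the mechanism concrete, here is a minimal sketch of what a secrets-scanning hook does with staged changes: it reads the staged diff, keeps only the added lines, and checks each one against a rule. It is an illustration under stated assumptions, not how Gitleaks is implemented. The single AWS-style rule and all function names are hypothetical, and the diff parsing is deliberately naive.

```python
import re
import subprocess

# One illustrative rule; a real hook would apply a full rule set.
SECRET = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def added_lines(diff_text):
    """Yield (file, text) for each line added in a unified diff."""
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file the following hunks apply to.
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+"):
            yield current_file, line[1:]


def find_secrets_in_diff(diff_text):
    """Return the added lines that match the secret rule."""
    return [(f, text) for f, text in added_lines(diff_text) if SECRET.search(text)]


def staged_diff():
    """Return the staged changes as unified diff text (requires git)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A hook script would call find_secrets_in_diff(staged_diff()) and exit with a non-zero status when the list is not empty, which makes Git abort the commit.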

Why pre-commit hooks matter

Shift-left security: Catching secrets before they enter the repository is far cheaper than remediating after the fact.
Developer experience: Developers get immediate feedback without waiting for a CI pipeline to run.
Defense in depth: Pre-commit hooks are the first layer. CI/CD scanning and regular audits are additional layers.

How CodeFrog scans for exposed secrets

CodeFrog takes a unique approach to secrets detection by focusing on what is actually exposed to the public. Rather than scanning your local Git repository, CodeFrog downloads and analyzes the rendered HTML source of your web pages — the same content any visitor to your site can see.

This catches a class of leaks that repository-level scanning misses:

API keys embedded in client-side JavaScript that gets bundled and deployed.
Credentials accidentally rendered in HTML comments or hidden form fields.
Access tokens injected by server-side templates that were not properly configured for production.
Third-party scripts that expose keys in their configuration objects.
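
The kinds of page-level leaks listed above can be illustrated with a small scanner built on Python's standard html.parser module. This is a sketch of the general idea only and makes no claim about how CodeFrog works internally. The class name, the location labels, and the single AWS-style rule are assumptions for the example.

```python
import re
from html.parser import HTMLParser

# One illustrative rule; a real scanner would apply many.
SECRET = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


class ExposedSecretFinder(HTMLParser):
    """Collect (location, match) pairs for secrets found in page source."""

    def __init__(self):
        super().__init__()
        self.findings = []
        self._in_script = False

    def _check(self, location, text):
        for match in SECRET.finditer(text or ""):
            self.findings.append((location, match.group(0)))

    def handle_comment(self, data):
        self._check("comment", data)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            self._in_script = True
        elif tag == "input" and attributes.get("type") == "hidden":
            self._check("hidden-input", attributes.get("value"))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        self._check("inline-script" if self._in_script else "text", data)


def scan_html(html):
    """Return the findings for one HTML document, in document order."""
    finder = ExposedSecretFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Labeling each finding with its location (comment, hidden input, inline script, or visible text) makes it easier to trace the leak back to the template or bundle that produced it.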

Real examples of leaked secrets causing breaches

Leaked secrets are not a theoretical risk. Major incidents have resulted from exposed credentials:

Uber (2016): Attackers gained access to a private GitHub repository used by Uber engineers and found AWS credentials in it. They used those credentials to access an S3 bucket containing personal data of 57 million riders and drivers. Uber later paid $148 million to settle claims over its handling of the breach.
Samsung (2019): A security researcher found that a GitLab instance used by Samsung engineers was publicly accessible, exposing internal source code along with secret keys and credentials. The exposed data included credentials and private keys for Samsung's SmartThings platform.
Toyota (2022): A developer accidentally published source code to a public GitHub repository that contained an access key to a server handling customer data. The key remained public for nearly five years, putting the email addresses and customer management numbers of nearly 300,000 customers at risk.
CircleCI (2023): After a security breach, CircleCI advised all customers to rotate every secret stored in their platform. Any environment variable, context, or project API token that had been stored in CircleCI was potentially compromised.

Warning: If you discover a leaked secret, rotate it immediately. Deleting it from the code is not enough, because it still exists in Git history. Generate a new key, update all systems that use it, and revoke the old one. Then, if the repository is public, use a tool like git filter-repo or BFG Repo-Cleaner to scrub the secret from history, keeping in mind that existing clones and forks may still hold a copy.

Best practices for secrets management

Use environment variables: Store secrets in environment variables, not in code or in configuration files that are committed to version control.
Use a secrets manager: Services like AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, or 1Password provide secure, audited storage and rotation of secrets.
Maintain a .gitignore file: Ensure .env, *.pem, *.key, and other sensitive files are listed in .gitignore.
Use .gitleaksignore: For known false positives, add each finding's fingerprint to .gitleaksignore rather than disabling scanning.
Layer your defenses: Combine pre-commit hooks, CI/CD scanning, repository monitoring, and page-level scanning (CodeFrog) for comprehensive coverage.
Rotate secrets regularly: Even if no leak is detected, regular rotation limits the blast radius of any undiscovered exposure.
Audit access: Regularly review who has access to production secrets and revoke access for departed team members immediately.
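
As a small illustration of the first practice in the list above, application code can read a secret from the environment at startup and fail fast when it is missing, instead of falling back to a hard-coded default. The helper and exception names below are hypothetical, and DATABASE_PASSWORD is just an example variable name.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Return the named secret, or raise if it is unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name; never log or echo secret values.
        raise MissingSecretError(f"Required secret {name!r} is not set")
    return value
```

A call such as require_secret("DATABASE_PASSWORD") at startup surfaces a misconfigured deployment immediately. The optional environ argument exists only so the function can be tested without touching the real environment.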

Resources

Gitleaks — Open-source secrets scanner for Git repositories
TruffleHog — Secrets detection with entropy analysis and verification
git-secrets — AWS Labs tool to prevent committing AWS credentials
pre-commit — Framework for managing pre-commit hooks