Search Techniques

This guide covers URL discovery with WayHack's built-in search command: basic and advanced search patterns, the supported data sources, practical reconnaissance workflows, and ways to keep large searches manageable.

Search Fundamentals

Basic Search Patterns

Domain-based Search:

wayhack search --domain example.com

Subdomain Discovery:

wayhack search --domain example.com --include-subdomains

Path-specific Search:

wayhack search --domain example.com --path "/admin"

File Extension Targeting:

wayhack search --domain example.com --extensions "pdf,doc,xls"

Output Format Options:

# JSON output
wayhack search --domain example.com --output json

# CSV output
wayhack search --domain example.com --output csv

# Text output (default)
wayhack search --domain example.com --output text

Result Limiting:

# Limit results to 500
wayhack search --domain example.com --limit 500

# Raise the limit above the default of 1000 for broader coverage
wayhack search --domain example.com --limit 5000

Advanced Search Options

Source Selection:

# Use specific data sources
wayhack search --domain example.com --sources wayback,crtsh

# Name the three default sources explicitly (same as omitting --sources)
wayhack search --domain example.com --sources wayback,crtsh,commoncrawl

# Single source for faster results
wayhack search --domain example.com --sources wayback

Combined Filters:

# Combine multiple filters
wayhack search --domain example.com --path "/api" --extensions "json"

# Subdomain discovery with specific extensions
wayhack search --domain example.com --include-subdomains --extensions "pdf,doc"

# Path and subdomain combination
wayhack search --domain example.com --include-subdomains --path "/admin"

Multiple Search Strategy:

# Separate searches for different purposes
wayhack search --domain example.com --path "/api" --output json
wayhack search --domain example.com --path "/admin" --output json
wayhack search --domain example.com --extensions "pdf,doc,xls" --output json

# View all results
wayhack view --latest --count 3

Data Sources

Available Sources

WayHack's search command supports multiple data sources for comprehensive URL discovery:

  • wayback: Wayback Machine archives for historical web data

  • urlscan: URLScan.io for live web scanning and analysis

  • otx: AlienVault OTX for threat intelligence data

  • commoncrawl: Common Crawl web archive data

  • shodan: Shodan for internet-connected device discovery

  • profundis: Profundis.io for deep web crawling

  • virustotal: VirusTotal for domain and URL intelligence

  • securitytrails: SecurityTrails for historical DNS data

  • censys: Censys for internet asset discovery

  • intelx: IntelX.io for threat intelligence gathering

  • leakix: LeakIX.net for data leak discovery

  • fofa: Fofa for cyber asset discovery

  • crtsh: Certificate Transparency logs

  • netlas: Netlas.io for internet asset intelligence

  • builtwith: BuiltWith for technology stack analysis

  • zoomeye: ZoomEye for cyberspace search

  • hunter: Hunter.how for attack surface discovery

  • github: GitHub code and repository search

  • gitlab: GitLab code and repository search
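
Because --sources takes a comma-separated list, a typo in one name is easy to miss. The sketch below checks a list against the source names given above before any search is run. The variable KNOWN_SOURCES and the function validate_sources are hypothetical helpers written for this guide, not part of WayHack, and how WayHack itself reacts to an unknown source name is not covered here.

```shell
# Source names copied from the list above (hypothetical helper variable).
KNOWN_SOURCES="wayback urlscan otx commoncrawl shodan profundis virustotal securitytrails censys intelx leakix fofa crtsh netlas builtwith zoomeye hunter github gitlab"

# validate_sources "a,b,c": succeed only if every name appears in KNOWN_SOURCES.
validate_sources() {
  local list="$1" name
  [ -n "$list" ] || { echo "no sources given" >&2; return 1; }
  local IFS=','            # split the argument on commas only
  for name in $list; do
    case " $KNOWN_SOURCES " in
      *" $name "*) ;;      # known name, keep checking
      *) echo "unknown source: $name" >&2; return 1 ;;
    esac
  done
  return 0
}
```

A typical use would be: validate_sources "wayback,crtsh" && wayhack search --domain example.com --sources wayback,crtsh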

Wayback Machine

Basic Usage:

# Wayback Machine only
wayhack search --domain example.com --sources wayback

# Include subdomains for broader coverage
wayhack search --domain example.com --sources wayback --include-subdomains

Best Practices:

# Combine with path filtering for targeted discovery
wayhack search --domain example.com --sources wayback --path "/api"

# Focus on specific file types
wayhack search --domain example.com --sources wayback --extensions "js,json,xml"

Certificate Transparency (crt.sh)

Subdomain Discovery:

# Basic certificate transparency search
wayhack search --domain example.com --sources crtsh

# Include subdomains for comprehensive enumeration
wayhack search --domain example.com --sources crtsh --include-subdomains

Targeted Discovery:

# Combine with other sources for validation
wayhack search --domain example.com --sources crtsh,wayback

# Focus on specific paths in discovered subdomains
wayhack search --domain example.com --sources crtsh --include-subdomains --path "/admin"
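
One way to use a second source for validation is to keep only the hosts that both sources report. The sketch below assumes each search was saved as a text file with one URL per line; extract_hosts and hosts_in_both are hypothetical helper names, and the file paths passed to them are up to you.

```shell
# extract_hosts FILE: print the unique, lowercased hostnames found in FILE.
extract_hosts() {
  sed -n -E 's#^https?://([^/:?]+).*#\1#p' "$1" | tr 'A-Z' 'a-z' | sort -u
}

# hosts_in_both FILE_A FILE_B: print hostnames that appear in both files.
hosts_in_both() {
  local a b
  a=$(mktemp) || return 1
  b=$(mktemp) || { rm -f "$a"; return 1; }
  extract_hosts "$1" > "$a"
  extract_hosts "$2" > "$b"
  comm -12 "$a" "$b"       # lines common to both sorted lists
  rm -f "$a" "$b"
}
```

Hosts that appear in only one source are not necessarily wrong. Treat the intersection as the higher-confidence subset to look at first.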

Common Crawl

Large-scale Discovery:

# Common Crawl data mining
wayhack search --domain example.com --sources commoncrawl

# Limit results for faster processing
wayhack search --domain example.com --sources commoncrawl --limit 2000

Practical Search Workflows

Comprehensive Domain Reconnaissance

Step 1: Initial Discovery:

# Start with the three default sources for broad coverage
wayhack search --domain example.com --sources wayback,crtsh,commoncrawl

Step 2: Subdomain Enumeration:

# Focus on subdomain discovery
wayhack search --domain example.com --sources crtsh --include-subdomains

Step 3: Targeted Path Discovery:

# Look for admin interfaces
wayhack search --domain example.com --path "/admin" --include-subdomains

# API endpoint discovery
wayhack search --domain example.com --path "/api" --include-subdomains

# Common sensitive paths
wayhack search --domain example.com --path "/backup" --include-subdomains

Document and File Discovery

Sensitive File Types:

# Configuration files
wayhack search --domain example.com --extensions "xml,json,yml,yaml"

# Documents and spreadsheets
wayhack search --domain example.com --extensions "pdf,doc,docx,xls,xlsx"

# Database and backup files
wayhack search --domain example.com --extensions "sql,db,bak,backup"

Development Files:

# Source code and configs
wayhack search --domain example.com --extensions "js,php,py,rb,java"

# Environment and config files
wayhack search --domain example.com --extensions "env,config,ini,conf"
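
The extension sets above can be given names so that scripts do not repeat the lists. This is a minimal sketch: the category names and the functions extensions_for and search_category are hypothetical, while the extension lists and the wayhack flags are the ones shown in this section.

```shell
# extensions_for CATEGORY: print the extension list used in this section.
extensions_for() {
  case "$1" in
    config)    echo "xml,json,yml,yaml" ;;
    documents) echo "pdf,doc,docx,xls,xlsx" ;;
    backups)   echo "sql,db,bak,backup" ;;
    source)    echo "js,php,py,rb,java" ;;
    env)       echo "env,config,ini,conf" ;;
    *)         return 1 ;;
  esac
}

# search_category DOMAIN CATEGORY: run one extension-filtered search.
search_category() {
  local domain="$1" category="$2" exts
  exts=$(extensions_for "$category") || {
    echo "unknown category: $category" >&2
    return 2
  }
  wayhack search --domain "$domain" --extensions "$exts" --output json
}
```

For example, search_category example.com backups runs the database and backup file search shown earlier.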

Advanced Discovery Techniques

Multi-Source Strategy

Comprehensive Discovery:

# The three default sources together for broad coverage
wayhack search --domain example.com --sources wayback,crtsh,commoncrawl

# Archive sources for historical data
wayhack search --domain example.com --sources wayback,commoncrawl

# Certificate transparency for subdomain discovery
wayhack search --domain example.com --sources crtsh --include-subdomains

Source Prioritization Strategy:

# Fast discovery with crt.sh
wayhack search --domain example.com --sources crtsh --output json

# Comprehensive follow-up with Wayback
wayhack search --domain example.com --sources wayback --output json

# Large dataset mining with Common Crawl
wayhack search --domain example.com --sources commoncrawl --limit 3000 --output json

# View all results
wayhack view --latest --count 3

Automated Discovery Workflows

Batch Subdomain Discovery:

#!/bin/bash
# Automated subdomain enumeration
domain="example.com"

# Initial subdomain discovery
echo "Starting subdomain discovery for $domain"
wayhack search --domain "$domain" --sources crtsh --include-subdomains --output json

# Get the latest scan ID for processing
# (this parsing assumes the ID is the first field on the first output line)
latest_scan=$(wayhack view --latest --tool search | head -n 1 | awk '{print $1}')
echo "Latest scan ID: $latest_scan"

# View results
wayhack view "$latest_scan"
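
The same enumeration can be run over several domains. The sketch below reads targets from a file, one domain per line, and skips blank lines and lines starting with #. The function name enumerate_domains, the DELAY variable, and the targets file are hypothetical; the wayhack flags are the ones used in the script above.

```shell
# enumerate_domains FILE: run the crt.sh subdomain search for each domain in FILE.
enumerate_domains() {
  local file="$1" domain
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r domain || [ -n "$domain" ]; do
    case "$domain" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    echo "Starting subdomain discovery for $domain" >&2
    # </dev/null keeps the command from consuming the rest of the domain list.
    wayhack search --domain "$domain" --sources crtsh --include-subdomains --output json </dev/null
    sleep "${DELAY:-2}"      # pause between searches, 2 seconds unless DELAY is set
  done < "$file"
}
```

Call it as enumerate_domains targets.txt, then review the saved searches with wayhack view --latest --count N, where N is the number of domains searched.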

Path Enumeration Workflow:

#!/bin/bash
# Systematic path discovery
domain="example.com"
common_paths=("/admin" "/api" "/dashboard" "/login" "/upload" "/backup")

echo "Starting path enumeration for $domain"
for path in "${common_paths[@]}"; do
  echo "Searching for path: $path"
  wayhack search --domain "$domain" --path "$path" --include-subdomains --output json
  sleep 2  # Rate limiting
done

echo "Path enumeration complete. View results with: wayhack view --latest --count ${#common_paths[@]}"

Result Analysis

Viewing Search Results:

# View latest search
wayhack view --latest

# View specific search by ID
wayhack view scan_1234567890

# View multiple recent searches
wayhack view --latest --count 5

# View detailed scan information
wayhack view --detailed

Processing Results:

# Search results are automatically saved in:
# ~/.wayhack-outputs/scan_ID/results.txt (or .json, .csv)

# Example: Extract unique origins (scheme and host) from text results
grep -ohE 'https?://[^/]+' ~/.wayhack-outputs/scan_*/results.txt | sort -u

# Example: Filter for specific file types, including URLs with a query string or fragment
grep -hE '\.(pdf|doc|xls)([?#]|$)' ~/.wayhack-outputs/scan_*/results.txt
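
A quick tally by file extension shows where to look first in a large result set. The sketch below assumes text results with one URL per line, as in the examples above. The function count_by_extension is a hypothetical helper, and it ignores query strings, fragments, and URLs without a path.

```shell
# count_by_extension FILE...: tally URLs by file extension, most common first.
count_by_extension() {
  cat -- "$@" \
    | sed -E -e 's#^https?://[^/]*##' -e 's/[?#].*$//' \
    | grep -oE '\.[A-Za-z0-9]{1,5}$' \
    | sed 's/^\.//' \
    | tr 'A-Z' 'a-z' \
    | sort | uniq -c | sort -rn
}
```

For example: count_by_extension ~/.wayhack-outputs/scan_*/results.txt | head -n 10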

Best Practices and Tips

Search Optimization

Start Small, Scale Up:

# Begin with fast sources
wayhack search --domain example.com --sources crtsh

# Expand to comprehensive search
wayhack search --domain example.com --sources wayback,crtsh,commoncrawl

# Use limits for large domains
wayhack search --domain example.com --limit 2000

Targeted Discovery:

# Focus on specific areas of interest
wayhack search --domain example.com --path "/api" --extensions "json,xml"

# Combine filters for precision
wayhack search --domain example.com --include-subdomains --extensions "pdf,doc" --limit 500

Managing Large Result Sets

Use Appropriate Limits:

# Small test run
wayhack search --domain example.com --limit 100

# Medium discovery
wayhack search --domain example.com --limit 1000

# Comprehensive search
wayhack search --domain example.com --limit 5000

Output Format Selection:

# JSON for programmatic processing
wayhack search --domain example.com --output json

# CSV for spreadsheet analysis
wayhack search --domain example.com --output csv

# Text for simple viewing
wayhack search --domain example.com --output text

Workflow Integration

Sequential Searches:

# Progressive discovery approach
wayhack search --domain example.com --sources crtsh --include-subdomains
wayhack search --domain example.com --sources wayback --path "/admin"
wayhack search --domain example.com --sources commoncrawl --extensions "pdf,doc"

# Review all results
wayhack view --latest --count 3

Command Reference

Search Command Syntax

wayhack search [flags]

Available Flags

Flag                  Short  Description                              Default
--domain              -d     Target domain to search (required)       -
--sources             -s     Comma-separated list of data sources     wayback,crtsh,commoncrawl
--include-subdomains  -i     Include subdomains in search             false
--extensions          -e     Comma-separated list of file extensions  -
--path                -p     Specific path to search for              -
--output              -o     Output format (text, json, csv)          text
--limit               -l     Maximum number of results                1000
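
In scripts it can help to check flag values before a search starts. The wrapper below is a minimal sketch that uses the defaults from the flag table (text output, limit 1000). The function name safe_search and its exit code 2 are hypothetical choices, and WayHack's own validation of these values is not described here.

```shell
# safe_search DOMAIN [OUTPUT] [LIMIT]: validate arguments, then run the search.
safe_search() {
  local domain="$1" output="${2:-text}" limit="${3:-1000}"
  [ -n "$domain" ] || { echo "domain is required" >&2; return 2; }
  case "$output" in
    text|json|csv) ;;
    *) echo "unsupported output format: $output" >&2; return 2 ;;
  esac
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a positive integer" >&2; return 2 ;;
  esac
  [ "$limit" -gt 0 ] || { echo "limit must be a positive integer" >&2; return 2; }
  wayhack search --domain "$domain" --output "$output" --limit "$limit"
}
```

For example, safe_search example.com json 500 runs the same search as the "JSON output with limit" quick reference example.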

Quick Reference Examples

# Basic domain search
wayhack search -d example.com

# Subdomain discovery
wayhack search -d example.com -i

# Specific file types
wayhack search -d example.com -e pdf,doc,xls

# Admin panel discovery
wayhack search -d example.com -p "/admin" -i

# JSON output with limit
wayhack search -d example.com -o json -l 500

# Multiple sources
wayhack search -d example.com -s wayback,crtsh

Troubleshooting

Common Issues

API Connection Problems:

# Check API configuration
wayhack check

# Verify API key setup
wayhack setup

Large Result Sets:

# Use limits to manage large datasets
wayhack search --domain example.com --limit 1000

# Use specific sources for faster results
wayhack search --domain example.com --sources crtsh

No Results Found:

# Try different sources
wayhack search --domain example.com --sources wayback
wayhack search --domain example.com --sources commoncrawl

# Include subdomains for broader coverage
wayhack search --domain example.com --include-subdomains
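
The retries above can be chained so that the first source returning anything wins. This sketch relies on an assumption that is not confirmed in this guide: that text output is written to standard output and is empty when nothing is found. Check that against your WayHack version before depending on it. The function name search_until_found is hypothetical.

```shell
# search_until_found DOMAIN SOURCE...: try each source in order, stop at the first hit.
search_until_found() {
  local domain="$1" source out
  shift
  for source in "$@"; do
    # Assumption: an empty stdout means this source returned no results.
    out=$(wayhack search --domain "$domain" --sources "$source" --include-subdomains --output text)
    if [ -n "$out" ]; then
      echo "results found via $source" >&2
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "no results from: $*" >&2
  return 1
}
```

For example: search_until_found example.com wayback commoncrawl crtsh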

Conclusion

The WayHack search command provides a single interface for URL discovery and OSINT reconnaissance. It combines multiple data sources, flexible filters, and automatic result storage, which shortens the work of gathering intelligence about target domains.

Key benefits:

  • Multiple data sources: Wayback Machine, Certificate Transparency, and Common Crawl by default, with further sources selectable through --sources

  • Flexible filtering: Domain, subdomain, path, and extension filters

  • Multiple output formats: Text, JSON, and CSV support

  • Automatic result management: All searches are saved and can be reviewed later

  • Integration ready: Results integrate seamlessly with other WayHack tools

For more information on viewing and managing search results, see the CLI Tool Mastery guide.