This document was automatically generated from the README in the test folder.
NewsFork Seeds - Test Directory
This directory is a separate project for testing GitHub Actions-based automation pipelines.
Directory Structure
test/
├── .github/                      # GitHub Actions workflows
│   ├── workflows/                # CI/CD workflows
│   │   ├── research-engine.yml   # Phase 1: URL discovery & dataset generation
│   │   ├── analysis-engine.yml   # Phase 2: Seed candidate analysis
│   │   ├── contract-engine.yml   # Phase 3: Seed JSON generation
│   │   └── ci.test.yml           # Integrated CI tests
│   ├── scripts/                  # CI/CD scripts
│   │   ├── ci.sh
│   │   └── steps/
│   └── README.md
│
├── cli/                          # CLI runners
│   ├── research-engine.ts        # Research stage runner
│   ├── analysis-engine.ts        # Analysis stage runner
│   └── contract-engine.ts        # Contract stage runner
│
├── schemas/                      # Zod schemas (Single Source of Truth)
│   ├── domain.schema.ts          # Domain/Seed ID validation
│   ├── research.schema.ts        # Research/Liveness validation
│   ├── seed.schema.ts            # Enhanced Seed V1 validation
│   └── index.ts                  # Combined export
│
├── services/                     # Pure-function business logic
│   ├── domain.service.ts         # Domain normalization (Pure Functions)
│   ├── research.service.ts       # Research discovery (Pure Functions)
│   ├── seed.service.ts           # Seed generation (Pure Functions)
│   └── index.ts                  # Combined export
│
├── utils/                        # Core utilities
│   ├── domain-normalizer.ts      # Legacy compatibility
│   └── kv-deduplication.ts       # KV registry
│
├── research/                     # Research results (test data)
│   ├── datasets/
│   ├── liveness/
│   ├── blocked/
│   └── dead/
│
├── analysis/                     # Analysis results (test data)
│   └── country=sg/
│
├── results/                      # Final results (test data)
│   └── seeds/drafts/
│
├── services.test.ts              # Service layer tests
├── kv-deduplication.test.ts      # KV system tests
└── README.md                     # This file
GitHub Actions Workflow
1. Research Engine (research-engine.yml)
Trigger: Daily at 2:00 AM UTC, or manual execution
Stages:
- Phase 1-A: Liveness Check (Domain Accessibility Verification)
- Phase 1-B: URL Discovery (Google Search, crt.sh, Domain Directory)
- Dataset Generation (Generate Immutable Research Dataset)
Output: test/research/datasets/, test/research/liveness/
CLI: npm run research:full -- --country=SG --category=news
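The CLI line above passes its options as `--key=value` flags after the `--` separator. A minimal sketch of how such flags could be read is shown below. `parseFlags` is a hypothetical helper written for illustration; the actual parsing in cli/research-engine.ts is not shown in this README and may differ.

```typescript
// Hypothetical helper: not taken from cli/research-engine.ts.
type Flags = Record<string, string>;

function parseFlags(argv: readonly string[]): Flags {
  const flags: Flags = {};
  for (const arg of argv) {
    // Accept only the "--key=value" form; anything else is ignored in this sketch.
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (match === null) continue;
    const key = match[1];
    const value = match[2];
    if (key !== undefined && value !== undefined) {
      flags[key] = value;
    }
  }
  return flags;
}

// The arguments npm forwards for the CLI line above.
const researchFlags = parseFlags(["--country=SG", "--category=news"]);
console.log(researchFlags["country"], researchFlags["category"]);
```

Taking the argument list as a parameter, rather than reading it from the process, keeps the helper a pure function that can be unit-tested without spawning the CLI.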
2. Analysis Engine (analysis-engine.yml)
Trigger: Runs automatically after the Research Engine completes, or manual execution
Steps:
- Load Research Registry
- Domain analysis (KV duplicate check)
- Generate Seed Candidates
- Candidate validation
Output: test/analysis/country=*/
CLI: npm run analysis:full -- --country=SG --category=news
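The KV duplicate check in the steps above can be pictured as splitting incoming domains into ones already registered and ones that are new. The sketch below is illustrative only: a `Set` stands in for the Cloudflare KV namespace, and `splitDuplicates` is a hypothetical name. The real logic lives in utils/kv-deduplication.ts and may work differently.

```typescript
// Assumption: an in-memory Set stands in for the Cloudflare KV registry.
interface DedupResult {
  fresh: string[];      // domains not seen before
  duplicates: string[]; // domains already registered or repeated in this batch
}

function splitDuplicates(
  domains: readonly string[],
  known: ReadonlySet<string>,
): DedupResult {
  const seenInBatch = new Set<string>();
  const result: DedupResult = { fresh: [], duplicates: [] };
  for (const domain of domains) {
    // Compare on a trimmed, lower-cased key so "Example.com" and "example.com" match.
    const key = domain.trim().toLowerCase();
    if (known.has(key) || seenInBatch.has(key)) {
      result.duplicates.push(key);
    } else {
      seenInBatch.add(key);
      result.fresh.push(key);
    }
  }
  return result;
}

const registry = new Set(["b.example"]);
console.log(splitDuplicates(["a.example", "B.example", "a.example"], registry));
```

Only the `fresh` list would go on to candidate generation in this picture; the `duplicates` list is kept so the run can report what was skipped.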
3. Contract Engine (contract-engine.yml)
Trigger: Triggered manually from the Analysis Engine, or executed manually
Steps:
- Load Seed Candidates
- Collect R2 Assets (robots.txt, sitemap.xml, homepage.html)
- Generate Enhanced Seed V1
- Perform Legal Compliance Check
- Upload to R2 Storage
Output: test/results/seeds/drafts/, Cloudflare R2
CLI: npm run contract:full -- --country=SG --category=news --fetch-r2=true
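The asset collection step above names three files per domain. As a rough sketch, the URLs to fetch for one domain could be derived as follows. The `assetUrls` helper is hypothetical, and it assumes HTTPS and the conventional `/sitemap.xml` location, neither of which this README confirms.

```typescript
// Hypothetical helper; assumes HTTPS and the conventional /sitemap.xml path.
interface AssetUrls {
  "robots.txt": string;
  "sitemap.xml": string;
  "homepage.html": string;
}

function assetUrls(domain: string): AssetUrls {
  const origin = `https://${domain}`;
  return {
    // new URL(path, base) resolves the path against the domain's origin.
    "robots.txt": new URL("/robots.txt", origin).href,
    "sitemap.xml": new URL("/sitemap.xml", origin).href,
    "homepage.html": new URL("/", origin).href,
  };
}

console.log(assetUrls("example.com"));
```

Sites may publish their sitemap elsewhere (for example via a `Sitemap:` line in robots.txt), so a real collector would likely need a fallback beyond this fixed path.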
Local Development and Testing
Run Individual Steps
# Research stage
npm run research:liveness -- --country=SG --category=news
npm run research:discovery -- --country=SG --category=news --max-results=50
npm run research:dataset -- --country=SG --category=news

# Analysis stage
npm run analysis:registry -- --country=SG --category=news
npm run analysis:domains -- --country=SG --category=news
npm run analysis:candidates -- --country=SG --category=news

# Contract stage
npm run contract:generate -- --country=SG --category=news --fetch-r2=true
npm run contract:validate -- --country=SG --category=news
npm run contract:legal -- --country=SG --category=news
Run Full Pipeline
# Full Research pipeline
npm run research:full -- --country=SG --category=news

# Full Analysis pipeline
npm run analysis:full -- --country=SG --category=news

# Full Contract pipeline
npm run contract:full -- --country=SG --category=news --fetch-r2=true
Test Execution
# Service layer tests
npm test test/services.test.ts

# KV system tests
npm test test/kv-deduplication.test.ts

# All tests
npm test test/
Data Flow
1. Research Phase
- Input: Country/Category Parameters
- Processing: URL Discovery, Domain Normalization, Liveness Check
- Output: Immutable Research Dataset (JSON)
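Domain normalization, listed in the processing step above, is what lets differently written URLs map to a single domain entry. The sketch below shows one plausible set of rules (drop scheme, path, port, a leading `www.`, and a trailing dot). These rules and the `normalizeDomain` name are assumptions for illustration; services/domain.service.ts may apply different ones.

```typescript
// Assumed rules for illustration; not taken from services/domain.service.ts.
function normalizeDomain(input: string): string {
  const trimmed = input.trim();
  // Add a scheme when missing so the URL parser accepts bare hostnames.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme ? trimmed : `https://${trimmed}`;
  // The URL parser lower-cases the hostname and separates it from port and path.
  const host = new URL(withScheme).hostname;
  return host.replace(/^www\./, "").replace(/\.$/, "");
}

console.log(normalizeDomain("HTTPS://WWW.Example.com/news?x=1"));
console.log(normalizeDomain("example.com:8080/path"));
```

Because the function has no side effects, the same input always yields the same key, which is what a later duplicate check relies on.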
2. Analysis Phase
- Input: Research Dataset
- Processing: Build domain registry, KV duplicate check, Generate Seed candidates
- Output: Seed Candidates (JSON)
3. Contract Phase
- Input: Seed Candidates
- Processing: R2 Asset Collection, Enhanced Seed V1 Generation, Legal Verification
- Output: Seed Contracts (GitHub) + R2 Assets (Cloudflare R2)
Environment Variable Setup
Secrets used in GitHub Actions:
# Cloudflare KV (Analysis stage)
CLOUDFLARE_KV_ACCOUNT_ID
CLOUDFLARE_KV_NAMESPACE_ID
CLOUDFLARE_API_TOKEN

# Cloudflare R2 (Contract stage)
CLOUDFLARE_R2_ACCOUNT_ID
CLOUDFLARE_R2_ACCESS_KEY_ID
CLOUDFLARE_R2_SECRET_ACCESS_KEY
CLOUDFLARE_R2_BUCKET

# NewsFork settings
NEWSFORK_DEDUP=true
Principal Architect Design Principles
SOLID + GitHub Actions Optimization
1. Single Responsibility
- Each workflow handles only one step
- CLI scripts have clear command separation
2. Dependency Inversion
- Service layer implemented as pure functions
- GitHub Actions calls services only via CLI
3. Open/Closed
- Adding new countries/categories requires only parameter changes, no code modification
- New Research Methods can be added
4. Separation of Concerns
- The Research (discovery) → Analysis → Contract stages are completely separated
- Each step can be executed and tested independently
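To illustrate the Open/Closed point above, the sketch below builds one command per country/category pair, so extending coverage means adding a value to a list rather than changing code. `pipelineCommands` is a hypothetical helper, and `MY` is an example value that does not appear in this README.

```typescript
// Hypothetical helper: builds the npm command for every country/category pair.
function pipelineCommands(
  script: string,
  countries: readonly string[],
  categories: readonly string[],
): string[] {
  const commands: string[] = [];
  for (const country of countries) {
    for (const category of categories) {
      commands.push(
        `npm run ${script} -- --country=${country} --category=${category}`,
      );
    }
  }
  return commands;
}

// "MY" is an illustrative second country, not one listed in this README.
for (const command of pipelineCommands("research:full", ["SG", "MY"], ["news"])) {
  console.log(command);
}
```

Each generated command is independent of the others, so the same list could feed parallel jobs, one per pair.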
Advantages
- Automation: Daily automated execution for latest domain discovery
- Scalability: Parallel processing possible by country/category
- Stability: Verification and error handling at each stage
- Traceability: All results version-controlled on GitHub
- Cost Efficiency: Utilizes GitHub Actions free quota
- Legal Safety: Automated compliance checks
Notes
This directory is the main project's test pipeline. Actual tests for the main project run in the .test.ts files within the src/ directory.
Main project test execution:
# From the project root
pnpm test
Related Documentation
- Project README
- /v1/guides/research/
- /v1/guides/seeds/
- Architecture Guidelines