Source
githubarchive.month.*
Date range
2021 – 2026
Granularity
User × Month
Repo groups
12 + 2 isolated

What it does

Scans every GitHub event from January 2021 through March 2026, keeps only events touching AI-relevant repositories matched by repo name via regex, then aggregates per user per calendar month into a weighted score.

Final score = sum of (event-type weight × repo-tier multiplier) across all matching events in that month. The result is streamed into PostgreSQL by load_scores.py.

Output columns

ColumnTypeDescription
usernameSTRINGGitHub actor login
periodDATEFirst day of the calendar month
scoreFLOATSum of event weight × repo multiplier
total_eventsINTRaw event count for that user-month
unique_reposINTDistinct repos touched
event_varietyINTDistinct event types used
efficiencyFLOATAverage weighted score per event (score / total_events)

Event-type weights

Event typeWeightRationale
ReleaseEvent30Shipping software publicly
PullRequestReviewEvent25High-effort code review
DeploymentEvent25Production deployment
PullRequestEvent20Substantive contribution
CreateEvent15Branch / tag / repo creation
CheckRunEvent15CI automation
PushEvent10Code commit
WorkflowRunEvent10Workflow execution
MemberEvent10Team management
IssuesEvent5Issue opened/closed
IssueCommentEvent4Discussion participation
ForkEvent2Passive interest
WatchEvent1Star / watch
all others1Baseline

Repo-tier multipliers

TierMultiplierPattern examples
ELITE L5-L6 5.0× grpo, vllm, sglang, triton-lang, sae-lens, unsloth, deepseek-r1, transformer-lens, cuda-mode
AGENTIC L4 3.5× langgraph, pydantic-ai, mcp-server, model-context-protocol, fastmcp, crewai, dspy, dify, langflow
BUILDER L3 2.0× vercel-ai, ai-sdk, langchain, llamaindex, huggingface, trl, axolotl, llama-factory, ragas, deepeval
VIBE/USER L1-L2 1.5× cursor, aider, bolt-new, ollama, open-webui, lm-studio, lmstudio
BASELINE 1.0× everything else that passes the WHERE filter

Repository filter — 12 groups

Any event whose repo.name matches at least one of these patterns is included. Two patterns are isolated for correct dot-escaping (llama\.cpp, llm\.c).

1 · Core Ecosystem
huggingfacetransformersdiffusers peftacceleratesafetensors hf-datasetstokenizershf-hub
2 · Agents & Orchestration
langgraphcrewaipydantic-ai autogenmemgptbrowser-use smolagentsphidataagno openhandse2bagentic langchainllamaindexhaystack difylangflow
3 · Reasoning & Advanced Models
deepseek-v3deepseek-r1open-r1 grpoverifiersprm reasoning-tracemeta-llamamistral-ai google-geminianthropic-sdkclaude-code gemmaphi-
4 · Interpretability & Alignment
transformer-lensnnsightsae-lens sparse-autoencoderlogit-lensactivation-patching alignment-handbookdpo-trainerrlhf reward-modelmath-verify
5 · Training & Hardware Kernels
unslothaxolotlllama-factory torchtunetorchaotrl qlorabitnettriton-lang modular-mojocuda-modenanogpt
6 · Protocols & Infra
model-context-protocolmcp-serverfastmcp vllmsglangtensorrt-llm ggufexl2mlx-lm ollamaopen-webuilitellm
7 · Vector Stores & RAG
lancedbchromadbpgvector weaviateqdrantmilvus graphraglightragcolpali ragatouillecolbert
8 · Evaluation & Observability
ragasdeepevalpromptfoo langsmithlm-evaluation-harnesslighteval
9 · Generative Media
flux-1black-forest-labsstable-diffusion comfyuikokoromochi ltx-videowhisper
10 · AI Coding Tools
aidercontinue-devtabbyml coderabbitgreptile
11 · Qwen & Chinese Ecosystem
qwenchatglmcogvlm codegeexzhipu
12 · MLOps & Deployment
modal-labsgroq-api bentomlreplicate-ai
Isolated (dot-escaped)
llama.cppllm.c

Full query

-- GitHub AI Talent - BigQuery Aggregated Monthly Scores
-- Source: githubarchive.month.* | Range: 202101–202603

SELECT
  actor.login                                       AS username,
  DATE_TRUNC(DATE(created_at), MONTH)               AS period,

  -- score = SUM(event_weight × repo_multiplier)
  SUM(
    CASE
      WHEN type = 'ReleaseEvent'           THEN 30
      WHEN type = 'PullRequestReviewEvent' THEN 25
      WHEN type = 'DeploymentEvent'        THEN 25
      WHEN type = 'PullRequestEvent'       THEN 20
      WHEN type = 'CreateEvent'            THEN 15
      WHEN type = 'PushEvent'              THEN 10
      WHEN type = 'IssuesEvent'            THEN 5
      WHEN type = 'IssueCommentEvent'      THEN 4
      WHEN type = 'ForkEvent'              THEN 2
      WHEN type = 'WatchEvent'             THEN 1
      ELSE 1
    END
    *
    CASE
      WHEN REGEXP_CONTAINS(LOWER(repo.name), r'grpo|vllm|triton-lang|...') THEN 5.0  -- ELITE
      WHEN REGEXP_CONTAINS(LOWER(repo.name), r'langgraph|crewai|...')     THEN 3.5  -- AGENTIC
      WHEN REGEXP_CONTAINS(LOWER(repo.name), r'langchain|llamaindex|...')  THEN 2.0  -- BUILDER
      WHEN REGEXP_CONTAINS(LOWER(repo.name), r'cursor|aider|ollama|...')  THEN 1.5  -- VIBE/USER
      ELSE 1.0
    END
  )                                                 AS score,

  COUNT(*)                                          AS total_events,
  COUNT(DISTINCT repo.name)                         AS unique_repos,
  COUNT(DISTINCT type)                              AS event_variety,
  ROUND(score / total_events, 2)                    AS efficiency

FROM `githubarchive.month.*`

WHERE _TABLE_SUFFIX BETWEEN '202101' AND '202603'
  AND (
    REGEXP_CONTAINS(LOWER(repo.name), r'llama\.cpp')
    OR REGEXP_CONTAINS(LOWER(repo.name), r'llm\.c')
    OR REGEXP_CONTAINS(LOWER(repo.name), /* 12-group pattern */)
  )

GROUP BY actor.login, DATE_TRUNC(DATE(created_at), MONTH)
ORDER BY username, period;