mirror of https://gitea.jeffemmett.com/jeffemmett/p2p-translation-cache.git synced 2026-06-25 16:35:19 +02:00

Lazy AI translation cache for p2pwiki and p2pblog (FastAPI + Redis + LiteLLM)

Python 49.2%
JavaScript 23.9%
PHP 11.1%
Shell 9.2%
CSS 4.3%
Other 2.3%

Jeff Emmett 9aae226ceb	2026-05-21 01:12:34 +01:00
runbook: mark consolidation Step 3 complete + document 2026-05-21 legacy-link cleanup

The runbook had been stuck at "Step 3 pending, awaiting wp-admin wizard" since 2026-05-10, but the actual import + Polylang language tagging had been completed at some undocumented later point. The `wp_term_taxonomy.count` cache lagged at 0 for fr/el/nl which made the runbook look correct; the real `wp_term_relationships` rows tell the true story.

Verified 2026-05-21 (cache refreshed via `wp term recount language`): English 19,473 / Nederlands 314 / Français 210 / Ελληνικά 173. Per-source matches by guid LIKE: bloggr=171, blogfr=205, blognl=312, exactly matching source DB published-post counts.

Cleanup also done 2026-05-21: 164 posts that still held legacy blog{gr,fr,nl}.p2pfoundation.net references in post_content are now all rewritten (uploads re-hosted at /wp-content/uploads/migrated-<src>/, ?p=NN permalinks remapped to new IDs via post_name join, bare-domain links rewritten to /<lang>/). Final residual count: 0/0/0. guids correctly preserved (immutable identifiers). Pre-cleanup backup at /opt/backups/p2p-blog-pre-step3-2026-05-20/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
app	llm: bump LiteLLM timeout to 360s	2026-05-09 21:08:27 -04:00
tests	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
wp-plugin	runbook: mark consolidation Step 3 complete + document 2026-05-21 legacy-link cleanup	2026-05-21 01:12:34 +01:00
.env.example	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
.gitignore	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
docker-compose.yml	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
Dockerfile	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
entrypoint.sh	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
README.md	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00
requirements.txt	Initial: FastAPI+Redis lazy translation cache for p2pwiki/p2pblog	2026-05-09 20:23:06 -04:00

README.md

p2p-translation-cache

Lazy-translation cache service for p2pwiki and p2pblog. When a reader picks a target language, the service translates that one article on first request via LiteLLM (qwen-coder by default) and caches the result keyed by (source, id, revision, lang). Subsequent readers hit the cache.

API

`POST /translate`

{
  "source": "wiki",
  "id": "Commons",
  "revision": "75811",
  "lang": "fr",
  "html": "<p>The commons is...</p>"
}

Returns:

{
  "translated_html": "<p>Les communs sont...</p>",
  "cached": false,
  "model": "qwen-coder",
  "elapsed_ms": 4823
}
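A minimal client sketch for the request above, using only the standard library. The base URL is taken from the Deploy section; the helper name `build_translate_request` and the `Content-Type` header are assumptions for illustration, not part of the service's documented contract.

```python
import json
import urllib.request

# Host as it appears in the Deploy section; adjust for local testing.
BASE_URL = "https://translate.p2pfoundation.net"


def build_translate_request(source, id_, revision, lang, html, base_url=BASE_URL):
    """Build (but do not send) a POST /translate request.

    Hypothetical helper: field names mirror the JSON body shown above.
    """
    body = json.dumps(
        {
            "source": source,
            "id": id_,
            "revision": revision,
            "lang": lang,
            "html": html,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed; typical for a JSON body
        method="POST",
    )


req = build_translate_request("wiki", "Commons", "75811", "fr", "<p>The commons is...</p>")
# To actually send it (cold pages can take a while, so use a generous timeout):
# with urllib.request.urlopen(req, timeout=400) as resp:
#     result = json.load(resp)
#     print(result["cached"], result["elapsed_ms"])
```

A first request for a given revision should come back with `"cached": false`; repeating it with the same four key fields should be served from cache.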

`GET /health`

{ "status": "ok", "redis": true, "llm": true }

`GET /stats`

{ "cache_hits": 142, "cache_misses": 38, "llm_errors": 0, "inflight": 0 }

Cache

Key: translate:{source}:{lang}:{sha256(source|id|revision|lang)[:32]}
No TTL. A revision change on the client side produces a new key, so old entries sit inert until Redis evicts them under the allkeys-lru policy (memory capped at 512 MB).
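The key scheme can be sketched as follows. This assumes the four fields are joined with a literal `|`, encoded as UTF-8, and that the hash is the hex digest truncated to 32 characters; the README's notation suggests this but does not spell it out, and the function name `cache_key` is hypothetical.

```python
import hashlib


def cache_key(source: str, id_: str, revision: str, lang: str) -> str:
    """Derive a Redis key from (source, id, revision, lang).

    Assumption: fields are joined with "|" and hashed as UTF-8 bytes.
    """
    material = "|".join((source, id_, revision, lang))
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:32]
    return f"translate:{source}:{lang}:{digest}"


old = cache_key("wiki", "Commons", "75811", "fr")
new = cache_key("wiki", "Commons", "75812", "fr")
# Same article, new revision: the prefix is unchanged but the digest differs,
# so the old entry is simply never read again.
print(old != new)
```

Keeping `source` and `lang` in the readable prefix makes it possible to scan or count entries per site and language without decoding the hash.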

Inflight dedup

When N readers request the same (source, id, revision, lang) simultaneously, only one LiteLLM call fires; the others await the same future and return together. Avoids duplicate spend on cold pages.
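One way to sketch this pattern with asyncio: the first caller for a key starts a task, and later callers await that same task. The class name `InflightDedup` and the `work` callable are hypothetical; the service's actual implementation may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Key = Tuple[str, str, str, str]  # (source, id, revision, lang)


class InflightDedup:
    """Coalesce concurrent requests for the same key into one call."""

    def __init__(self) -> None:
        self._inflight: Dict[Key, asyncio.Future] = {}

    async def run(self, key: Key, work: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # First caller: start the work and register it for latecomers.
            task = asyncio.ensure_future(work())
            self._inflight[key] = task
            # Drop the entry once the work finishes, whether it succeeded or failed.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # shield() keeps one cancelled waiter from cancelling the shared task.
        return await asyncio.shield(task)


async def demo() -> int:
    calls = 0

    async def fake_llm() -> str:  # stand-in for the LiteLLM call
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return "<p>Les communs sont...</p>"

    dedup = InflightDedup()
    key = ("wiki", "Commons", "75811", "fr")
    await asyncio.gather(*(dedup.run(key, fake_llm) for _ in range(5)))
    return calls


print(asyncio.run(demo()))  # number of times fake_llm actually ran
```

This only deduplicates within one process. With several workers, each would still make its own call for a cold page unless a shared lock (for example in Redis) were added.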

Output sanitization

LLM output is passed through bleach with a wiki-friendly tag/attribute allowlist. <script> and event-handler attributes are stripped. Models sometimes wrap output in ```html ... ``` despite instructions; the fences are stripped before sanitization.
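The fence-stripping step might look like the sketch below. It covers only the unwrapping; the bleach allowlist pass that follows is not shown. The function name `strip_code_fences` and the exact pattern are assumptions.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable

# Matches output that is entirely wrapped in one fenced block, with an
# optional language tag such as "html" after the opening fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\r?\n(.*?)\r?\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the inner content if the text is fence-wrapped, else the text itself."""
    match = _FENCED.match(text)
    return match.group(1).strip() if match else text.strip()


wrapped = FENCE + "html\n<p>Les communs sont...</p>\n" + FENCE
print(strip_code_fences(wrapped))
```

Unwrapping before sanitization matters because the fence characters are plain text to an HTML sanitizer and would otherwise end up visible in the rendered page.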

Deploy

cp .env.example .env
# fill in LITELLM_API_KEY
docker compose up -d --build
curl https://translate.p2pfoundation.net/health

Why qwen by default

Free via the project's existing LiteLLM stack, with acceptable quality for major European languages (FR, ES, DE, NL, IT, PT). Long-tail languages may warrant a paid model: change LITELLM_MODEL in .env and restart the service.