
pmwiki-to-outline

End-to-end migration guide and tool for moving a PmWiki installation into Outline.

This document walks an admin through the full journey — from copying wiki.d off the PmWiki server to a working import in Outline. Tool internals (architecture, extending rules, tests) are covered at the bottom.

wiki.d/                            wiki-out/
  Main.HomePage          ─→          Main/
  Main.GettingStarted                   HomePage.md
  Recipes.FooBar                        GettingStarted.md
                                      Recipes/
uploads/                                 FooBar.md
  Main/                              attachments/
    image.png                           Main/image.png

TODO — Domain conversion is NOT YET IMPLEMENTED

Absolute-URL links that reference the old PmWiki server by its hostname are currently left unchanged in the output.
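Until domain conversion is implemented, one way to gauge the remaining manual work is to list the links in the converted output that still point at the old server. The following is a minimal sketch, not part of the converter: the function name find_old_host_links and the hostname wiki.example.com are hypothetical placeholders.

```python
import re
from urllib.parse import urlsplit

# Matches the target of Markdown links and images: [text](url) or ![alt](url).
MD_LINK = re.compile(r"\[[^\]]*\]\(\s*(<[^>]*>|[^)\s]+)")

def find_old_host_links(markdown: str, old_host: str) -> list[str]:
    """Return absolute URLs in `markdown` whose hostname equals `old_host`."""
    hits = []
    for match in MD_LINK.finditer(markdown):
        url = match.group(1).strip("<>")
        try:
            hostname = urlsplit(url).hostname  # None for relative links
        except ValueError:
            continue  # malformed URL; not something this check can classify
        if hostname == old_host.lower():
            hits.append(url)
    return hits
```

Running this over every .md file in wiki-out/ gives a list of links to rewrite by hand once the import is done.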


Prerequisites

  • Local machine with Python 3.9+ installed (no third-party Python packages required — stdlib only).
  • Access to the PmWiki server via SSH/SFTP to retrieve wiki.d/ and uploads/ directories.
  • Outline workspace access:
    • For bulk ZIP import, you need Outline admin rights (Settings → Preferences → Import).
    • Without admin: you can still drag-and-drop .md files into any Collection you have write access to, one folder at a time.

Step 1 — Copy the PmWiki data to your local machine

PmWiki keeps everything in two directories on the server. Typical paths:

  • wiki.d/ — page files (one file per page, named Group.PageName, no extension)
  • uploads/ — attachments (organized by group: uploads/<Group>/<filename>)

Paths may differ; check the PmWiki install's config.php for the $WorkDir and $UploadDir variables.

Copy both to the directory where you'll run the converter:

rsync -av user@pmwiki-server:/var/www/pmwiki/wiki.d/    ./wiki.d/
rsync -av user@pmwiki-server:/var/www/pmwiki/uploads/   ./uploads/
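
Before converting, it can help to confirm that the copy looks like a PmWiki page store. The sketch below counts page files per group, assuming only the Group.PageName naming described above; count_pages_by_group is a hypothetical helper, not part of the converter.

```python
from collections import Counter
from pathlib import Path

def count_pages_by_group(wiki_dir: str) -> Counter:
    """Count files named Group.PageName in a local copy of wiki.d."""
    counts = Counter()
    for path in Path(wiki_dir).iterdir():
        # Split at the first dot; dotfiles such as .htaccess have an empty
        # group part and are skipped, as are directories and dotless names.
        group, dot, page = path.name.partition(".")
        if path.is_file() and group and dot and page:
            counts[group] += 1
    return counts
```

An empty result usually means the path points at a parent directory rather than at wiki.d itself.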

Step 2 — Run the converter

python3 convert.py \
  --wiki-dir ./wiki.d \
  --uploads-dir ./uploads \
  --out ./wiki-out \
  --report

Flags:

| Flag | Required | Description |
|---|---|---|
| `--wiki-dir` | yes | Path to PmWiki's wiki.d directory |
| `--uploads-dir` | no | Path to PmWiki's uploads directory (attachments). Omit if you have none. |
| `--out` | yes | Where to write the converted Markdown tree (must be empty, or pass `--overwrite`) |
| `--overwrite` | no | Allow writing into a non-empty `--out` directory |
| `--report` | no | After conversion, scan output for any surviving PmWiki syntax and print a per-category summary |
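
For readers who want to wrap the converter in their own scripts, the flag table maps naturally onto argparse. This is an illustrative sketch under that assumption only; it is not the code in convert.py, and build_parser and check_out_dir are hypothetical names.

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the table above (illustrative, not convert.py)."""
    parser = argparse.ArgumentParser(prog="convert.py")
    parser.add_argument("--wiki-dir", required=True, type=Path)
    parser.add_argument("--uploads-dir", type=Path, default=None)
    parser.add_argument("--out", required=True, type=Path)
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--report", action="store_true")
    return parser

def check_out_dir(out: Path, overwrite: bool) -> None:
    """Refuse to write into a non-empty directory unless overwrite is set."""
    if out.is_dir() and any(out.iterdir()) and not overwrite:
        raise SystemExit(f"{out} is not empty; pass --overwrite to write into it")
```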

The output tree mirrors Outline's Collection/document structure:

wiki-out/
├── Main/
│   ├── HomePage.md
│   ├── GettingStarted.md
│   └── ...
├── Recipes/
│   └── FooBar.md
└── attachments/
    └── Main/
        └── logo.png

Each top-level directory will become a Collection in Outline; the .md files within become documents. Attachments are referenced from page content as ../attachments/<Group>/<filename>.

Step 3 — Package for Outline import

cd wiki-out
zip -r ../wiki.zip .
cd ..

Outline's import accepts a ZIP whose top-level folders become Collections — which is exactly what we produce.

Size note: Outline's bulk import is capped at ~1.5 GB. For most wikis this is fine. If your ZIP exceeds that, split by Collection:

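cd wiki-out
for group in */; do
  # attachments/ is bundled into every ZIP below; it is not a Collection itself
  [ "$group" = "attachments/" ] && continue
  zip -r "../${group%/}.zip" "$group" attachments
done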

You'll import one group at a time.
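
To decide whether splitting is needed at all, a rough size estimate per top-level directory is usually enough. The sketch below sums uncompressed file sizes; tree_size_by_group is a hypothetical helper, and the result is only an approximation of the final ZIP size.

```python
from pathlib import Path

def tree_size_by_group(out_dir: str) -> dict[str, int]:
    """Sum file sizes in bytes under each top-level directory of out_dir."""
    sizes = {}
    for top in sorted(Path(out_dir).iterdir()):
        if top.is_dir():
            sizes[top.name] = sum(
                f.stat().st_size for f in top.rglob("*") if f.is_file()
            )
    return sizes
```

Markdown text compresses well, while images and other binary attachments barely shrink, so the attachments/ figure tends to dominate the ZIP size.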

Step 4 — Import into Outline

4a. Admin bulk import (ZIP)

  1. In Outline: Settings → Preferences → Import
  2. Choose Markdown
  3. Upload wiki.zip
  4. Wait for the async import to finish (email notification, or refresh the Import page)
  5. Each top-level directory in the ZIP becomes a Collection; files within become documents

4b. Non-admin per-collection (drag-drop)

If you don't have admin rights but have write access to at least one Collection:

  1. Open the target Collection in Outline
  2. Drag-and-drop .md files (or a folder of them) directly into the Collection
  3. Repeat for each Collection

4c. API (for scripted re-imports)

If you expect to re-import (e.g. to refine the converter and re-run), scripting against the Outline API is worth it:

curl -X POST https://outline.example.com/api/documents.import \
  -H "Authorization: Bearer $OUTLINE_TOKEN" \
  -F "file=@Main/HomePage.md" \
  -F "collectionId=$COLLECTION_ID" \
  -F "publish=true"

The response includes a file operation ID; poll fileOperations.info to know when each import completes.
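
The polling step can be sketched as a small loop with a timeout. Everything here is an assumption for illustration: fetch_state is a hypothetical callable that you would implement to query fileOperations.info and return the operation's state, and the terminal state names "complete" and "error" should be checked against your Outline version.

```python
import time
from typing import Callable

def wait_until_done(fetch_state: Callable[[], str],
                    timeout: float = 300.0,
                    interval: float = 2.0) -> str:
    """Call fetch_state() until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in ("complete", "error"):  # assumed terminal states
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout} s")
        time.sleep(interval)
```

Keeping the HTTP call outside the loop makes the waiting logic easy to test without a running Outline instance.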

Step 5 — Verify after import

  1. Counts match: the number of documents in Outline matches the .md count in wiki-out/ (find wiki-out -name '*.md' | wc -l).
  2. Spot-check 5 to 10 random pages: formatting is intact, links resolve, images render.
  3. Broken-link pass: inside Outline, search for pages containing (.md) in prose — any such literal pattern is a link that didn't resolve. The converter tries to match the output tree, so these should be rare.
  4. Attachment pass: open a page with images. The files should render, not show as broken images.
  5. TODO pass: search for TODO: inside Outline. These are the converter's visible markers for manual attention (see next section).
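
The count and attachment checks can also be run locally on wiki-out/ before uploading. The sketch below is not part of the converter; it assumes attachments are referenced as ../attachments/<Group>/<filename> as described in Step 2, and verify_tree is a hypothetical name.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# Targets of Markdown links/images that point into the attachments tree.
ATTACH_REF = re.compile(r"\]\((\.\./attachments/[^)\s]+)\)")

def verify_tree(out_dir: str) -> tuple[int, list[tuple[str, str]]]:
    """Return (.md count, [(document, missing attachment reference), ...])."""
    root = Path(out_dir)
    docs = sorted(root.rglob("*.md"))
    missing = []
    for doc in docs:
        text = doc.read_text(encoding="utf-8")
        for ref in ATTACH_REF.findall(text):
            # References are relative to the document's own directory;
            # unquote() handles percent-encoded file names.
            if not (doc.parent / unquote(ref)).exists():
                missing.append((doc.relative_to(root).as_posix(), ref))
    return len(docs), missing
```

The first value should equal the document count Outline reports after import; any entry in the second is an image or file link that will render as broken.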

Known lossy conversions

| Construct | What the converter does | Action for the reviewer |
|---|---|---|
| Edit history | Not preserved (Outline's import doesn't support per-document history anyway) | Optional: keep a separate git-history backup using pmwiki-to-git (see Optional section below) |
| Authorship | All docs land under the importing user | Create a dedicated "Import" user in Outline so migrated content is attributable |
| `(:pagelist ...:)` | Replaced with `<!-- TODO: (:pagelist ...:) -->` | Manually rebuild the page list or delete the TODO |
| `(:include OtherPage:)` | Replaced with `<!-- TODO: (:include OtherPage:) -->` | Inline the target content or delete |
| `(:redirect OtherPage:)` | Replaced with a visible pointer: `→ [OtherPage](./OtherPage.md) <!-- was (:redirect OtherPage:) -->` | Fine as-is, or edit the line |
| `(:Summary: X:)` | Stripped silently; summary text lost | If summaries matter: switch the converter to `<!-- Summary: X -->` comments |
| `(:input ...:)` form widgets | Stripped | No Outline equivalent; accept the loss |
| `(:table:)...(:tableend:)` advanced tables | Replaced with `<!-- TODO: PmWiki advanced table -->` + inner content (cell markers stripped) | Manually reconstruct as a Markdown pipe table |
| Long-tail custom PmWiki recipes (e.g. `(:workadventure-url:)`, `(:jitsi-url:)`, `(:e_preview:)`) | Left as literal text | Decide per-recipe: delete, or replace with meaningful content |
| Unpaired style markers outside the whitelist (e.g. `%Siteem%`, `%LOCALAPPDATA%`) | Preserved as literal text | Hand-fix on affected pages |
| URL-encoded UTF-8 bytes like `%C3%A4` inside URLs | Preserved (correct) | Not a bug: these are valid URL syntax |
| `[[SomePage]]` with unusual characters (e.g. `[`, `]` in page names) | Some edge cases survive as literal `[[...]]` | Manually replace with proper Markdown links |
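
To size up the manual cleanup, the TODO markers listed above can be tallied per kind across the converted files. A minimal sketch, assuming the marker format shown in the table; tally_todos is a hypothetical helper, separate from the converter's own --report.

```python
import re
from collections import Counter

# Body of markers such as <!-- TODO: (:pagelist ...:) -->
TODO_MARKER = re.compile(r"<!--\s*TODO:\s*(.*?)\s*-->", re.DOTALL)
# Directive name at the start of a marker body, e.g. "pagelist"
DIRECTIVE = re.compile(r"\(:([\w-]+)")

def tally_todos(markdown: str) -> Counter:
    """Count TODO markers, grouped by directive name or by marker text."""
    counts = Counter()
    for body in TODO_MARKER.findall(markdown):
        directive = DIRECTIVE.match(body)
        counts[directive.group(1) if directive else body] += 1
    return counts
```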

Re-running

The converter is deterministic: the same input always produces the same output. To re-run after refining the converter or cleaning up source pages:

rm -rf wiki-out/
python3 convert.py --wiki-dir ./wiki.d --uploads-dir ./uploads --out ./wiki-out --report

Re-import into a fresh Collection or fresh workspace to avoid duplicate content — Outline imports append, they don't replace.

Troubleshooting

  • "Converted 0 page(s)" — check that --wiki-dir points at the directory that contains files named Group.PageName, not at a parent directory.
  • Parse errors listed after the "Converted N" line — the file names are shown; usually indicates unusual characters in filenames. Rename the source file or skip it and add a note.
  • Mojibake in output (e.g. LÃ¶sung instead of Lösung): the page file's charset isn't being detected correctly. The parser reads the header's charset= field if present (default UTF-8). Check a page file for a charset= line. If it says iso-8859-1 or similar but the file actually contains UTF-8 bytes, decoding with the declared charset turns each multi-byte character into two wrong ones. In the opposite case, Latin-1 bytes decoded as UTF-8 may show up as replacement characters. Open an issue if you hit this.
  • Import fails with "file too large" — split the ZIP by Collection (see Step 3).
  • Pages appear but images don't render — confirm the attachments/ folder was included in the ZIP at the top level (same level as the Collection folders).
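
The two charset failure modes from the mojibake entry can be reproduced in a few lines, which helps when deciding which one a damaged page shows. This is a standalone illustration using only Python's built-in codecs; it says nothing about how the converter's parser is implemented.

```python
original = "Lösung"
utf8_bytes = original.encode("utf-8")          # ö becomes two bytes: C3 B6
latin1_bytes = original.encode("iso-8859-1")   # ö becomes one byte: F6

# UTF-8 bytes read with a wrongly declared iso-8859-1 charset: mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# Latin-1 bytes read as UTF-8 with errors="replace": U+FFFD replacement char.
replaced = latin1_bytes.decode("utf-8", errors="replace")

# The first kind is reversible as long as no bytes were dropped on the way.
repaired = mojibake.encode("iso-8859-1").decode("utf-8")
```

Mojibake of the first kind can be repaired after the fact; replacement characters cannot, because the original byte is gone.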

Optional: keep a git history backup

PmWiki's per-page edit history is lost in the Outline migration. If you want a browsable backup with full history, run pmwiki-to-git separately — it's independent of this tool:

go install github.com/oxzi/pmwiki-pagefileformat-go/cmd/pmwiki-to-git@latest
git init pmwiki-git
pmwiki-to-git -pmwiki ./wiki.d -git pmwiki-git

You now have a git repo with one commit per page edit. Useful as a long-term archive alongside the Outline deployment.


The tool

Architecture

Conversion runs in phases (see converter/ — one module per phase):

| # | Module | What it handles | Status |
|---|---|---|---|
| 0 | pagefile.py | PmWiki page-file parsing, respecting the `charset=` header field (default UTF-8) | done |
|   | pipeline.py | Orchestration + code-block protection (`[@...@]` and `(:markup:)...(:markupend:)` stashed as placeholders before Phase 1 to survive unchanged through every phase) | done |
| 1 | inline.py | Headings, bold, italic, bullet/numbered lists, `(:comment:)` | done |
| 2 | directives.py | `(:title:)`, `(:if:)`/`(:else:)`/`(:ifend:)` variants, `(:include:)`, `(:redirect:)`, `(:pagelist:)`, `(:div:)`, display-mode flags (accepts both `(:name arg:)` and `(:name: arg:)` separators) | done |
| 3 | links.py | All `[[...]]` forms: `[[Page]]`, `[[Group.Page]]`, `[[Group/Page]]`, display text, anchors, external URLs, mailto | done |
| 4 | attachments.py | `Attach:file.ext` (bare and `[[...]]`-wrapped, with/without caption, same/cross-group) → Markdown image or link (image extensions rendered inline) | done |
| 5 | tables.py | `\|\|` simple tables → Markdown pipe tables; `(:table:)...(:tableend:)` → TODO comment + stripped inner content | done |
| 6 | misc.py | `%Group%`/`%Page%` PTVs, paired `%class%text%%` inline styles, lone-marker whitelist strip, strikethrough `{-...-}`, revision-insertion `{+...+}`, line continuation `\\`, horizontal rule `----` | done |
| 7 | cleanup.py | Whitespace normalization, blank line before headings, preserves Markdown hard breaks, leading/trailing whitespace stripped, trailing newline | done |
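
The code-block protection in pipeline.py follows a common stash-and-restore pattern. The sketch below illustrates that pattern only: it is not the module's actual code, the placeholder format is an assumption, and a single bold rule stands in for the real phases.

```python
import re

# PmWiki escaped text: [@ ... @]; DOTALL lets a block span several lines.
CODE_BLOCK = re.compile(r"\[@(.*?)@\]", re.DOTALL)

def protect(text: str) -> tuple[str, list[str]]:
    """Swap each [@...@] span for a placeholder; return new text and stash."""
    stash: list[str] = []

    def _stash(match: re.Match) -> str:
        stash.append(match.group(1))
        return f"\x00CODE{len(stash) - 1}\x00"  # hypothetical placeholder

    return CODE_BLOCK.sub(_stash, text), stash

def restore(text: str, stash: list[str]) -> str:
    """Put stashed code back, rendered here as Markdown inline code."""
    return re.sub(r"\x00CODE(\d+)\x00",
                  lambda m: f"`{stash[int(m.group(1))]}`", text)

def convert(text: str) -> str:
    protected, stash = protect(text)
    # Stand-in for the real phases: PmWiki '''bold''' to Markdown **bold**.
    protected = re.sub(r"'''(.+?)'''", r"**\1**", protected)
    return restore(protected, stash)
```

Because the placeholders contain no PmWiki markup, later rules cannot rewrite anything inside a protected block.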

Extending

Each rule has a fixture pair in tests/fixtures/:

tests/fixtures/
  headings.pmwiki          ← PmWiki input
  headings.expected.md     ← expected Markdown output

To add a rule:

  1. Add a fixture pair (or extend an existing one) showing the input/output transformation.
  2. Add the regex to the right phase module.
  3. Run pytest — the test harness auto-discovers all fixture pairs.
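
Auto-discovery of fixture pairs can be as simple as globbing for inputs and deriving the expected file name. This is a sketch of the idea, assuming the naming shown above; discover_fixture_pairs is a hypothetical helper and may differ from the harness in tests/.

```python
from pathlib import Path

def discover_fixture_pairs(fixtures_dir: str) -> list[tuple[Path, Path]]:
    """Pair every <name>.pmwiki with <name>.expected.md in fixtures_dir."""
    pairs = []
    for source in sorted(Path(fixtures_dir).glob("*.pmwiki")):
        expected = source.with_name(source.stem + ".expected.md")
        if not expected.exists():
            raise FileNotFoundError(f"missing expected output for {source.name}")
        pairs.append((source, expected))
    return pairs
```

A list like this can be handed to pytest.mark.parametrize so that each pair shows up as its own test case.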

Running the tests (requires pytest):

pip install pytest
pytest

Credits

Phase 1 regex rules derive from dohliam/pmdown (MIT). Page-file format reference: PmWiki PageFileFormat docs and oxzi/pmwiki-pagefileformat-go (the canonical Go implementation, including revision-history reconstruction which we don't need for Outline).

License

MIT.