Paperless-ngx — Self-Hosted Document Management

A practical guide to deploying, organising, and running a self-hosted Paperless-ngx instance on a home NAS — covering setup, multi-user access, daily operations, and backup.

What Is Paperless-ngx?
Who Is It For?
Solution Architecture
3.1 The Five-Service Stack
3.2 Document Processing Pipeline
3.3 OCR Configuration
Document Organisation System
4.1 The Three Classification Tools
4.2 Document Types
4.3 Tags
4.4 Saved Views
4.5 Automated Workflows
Multi-User Setup
Running It Day-to-Day
6.1 Daily Routine
6.2 Adding Documents
6.3 Exception Handling
6.4 Annual Tasks
Going Further
Repository Contents

1. What Is Paperless-ngx?

Paperless-ngx is a free, open-source document management system you host yourself. It ingests scanned documents and PDFs, runs OCR to make them fully searchable, classifies them with metadata, and archives everything in a structured, browsable web interface.

Key idea: You scan or drop a document into a folder. Paperless-ngx handles the rest — text recognition, archiving, indexing — and makes it searchable in seconds.

Core capabilities

Capability	Description
OCR	Converts scans and image-based PDFs into searchable text using Tesseract
Full-text search	Find any document by content, not just filename
Classification	Organise documents with types, correspondents, tags, and custom fields
Automation	Auto-classify incoming documents using configurable workflow rules
Archive	Produces standardised PDF/A output alongside original files
Web UI	Clean, responsive interface accessible from any browser
API	Full REST API for integration with other tools and scripts
Multi-user	Role-based access with per-user saved views and permissions

2. Who Is It For?

Paperless-ngx is designed for anyone who deals with a recurring flow of documents and wants to stop losing things in folders or filing cabinets.

Everyday household users

Most households accumulate hundreds of documents per year — bank statements, insurance policies, tax records, medical letters, utility bills. Paperless-ngx solves the common problems:

"I can't find that document." — Full-text search across everything, instantly.
"I'm not sure if I kept that." — Automatic ingestion means nothing gets lost.
"Tax time is a nightmare." — Tag documents as they arrive; at tax time, open a saved view.
"I have paper everywhere." — Scan once, discard the paper, find it digitally forever.

Small home offices and freelancers

Centralised storage for invoices, contracts, receipts, and client correspondence
Search by client name, document type, or date range
Export documents for accountants or legal review

Families managing shared documents

Multiple user accounts with separate saved views
Shared correspondents and document types
Personal health records, identity documents, and property papers in one searchable archive

In practice, a non-technical second user can be onboarded with a single one-page guide covering VPN access, the daily inbox routine, and the tag system. The role-based permission model (§5) keeps family access simple without exposing system configuration.

Privacy note: Because Paperless-ngx is self-hosted, your documents never leave your own hardware. No third-party cloud service has access to your files.

3. Solution Architecture

This deployment runs Paperless-ngx on a Synology NAS using Docker containers managed through Synology Container Manager. The same stack can run on any Linux host with Docker.

3.1 The Five-Service Stack

                    ┌─────────────────────────────────────────┐
                    │         paperless-net (bridge)           │
                    │                                          │
  [browser] :8000 ──►  webserver (paperless-ngx)              │
                    │      │          │          │             │
                    │   broker      db (pg)   gotenberg        │
                    │  (redis)    port 5432   port 3000        │
                    │                           │              │
                    │                         tika             │
                    │                        port 9998         │
                    └─────────────────────────────────────────┘

Container	Image	Role
`webserver`	`ghcr.io/paperless-ngx/paperless-ngx:latest`	Django web application — the main interface and task runner
`db`	`postgres:17`	Stores all document metadata, tags, users, and classification data
`broker`	`redis:8`	Task queue — coordinates background OCR and ingestion jobs
`gotenberg`	`gotenberg/gotenberg:8.20`	Converts non-PDF formats (Word, HTML, etc.) to PDF
`tika`	`apache/tika:latest`	Extracts content and metadata from complex file formats

All containers communicate over a single user-defined Docker bridge network. Only the webserver exposes a port (8000) to the host.

3.2 Document Processing Pipeline

When a file is dropped into the consume folder, the following happens automatically:

Drop file into consume/
        │
        ▼
  Format detection
  (Tika + Gotenberg)
        │
        ▼
  OCR processing
  (Tesseract via paperless)
        │
        ▼
  Workflow rules evaluated
  (auto-assign type, correspondent, tags)
        │
        ▼
  Archived PDF/A created
  Original file preserved
        │
        ▼
  Full-text index updated
  Document visible in UI

3.3 OCR Configuration

The OCR pipeline is configured for high-quality archival output:

Setting	Value	Effect
Output format	PDF/A-2b	Long-term archival standard
OCR mode	Force	Processes all pages even if text layer exists
Image DPI	400	High-resolution processing for accurate recognition
Cleaning	clean-final	Applies image correction before OCR
Deskew	Enabled	Corrects tilted scans automatically
Auto-rotate	Enabled	Corrects rotated pages automatically
Colour conversion	Grayscale	Improves contrast; reduces file size
Archive files	Always	Originals are preserved alongside archived copies
Language	`eng`	English OCR; multi-language supported (e.g. `eng+deu`, `eng+fra`)
Tesseract args	`--psm 1 --oem 3`	Automatic page layout detection; best available OCR engine
Barcode detection	Disabled	Skipped to reduce processing time

4. Document Organisation System

Paperless-ngx provides three classification tools. Used together, they make any document findable in seconds without relying on folder structure or filename conventions.

4.1 The Three Classification Tools

Tool	Question it answers	Cardinality
Document Type	What kind of document is it?	One per document
Correspondent	Who sent it, or who is it about?	One per document
Tags	What topics does it relate to?	Many per document

Key principle: Document Type and Correspondent narrow down what and who. Tags handle everything else — year, topic, tax relevance, processing status. Use tags generously.

4.2 Document Types

Document types are broad, stable categories. The goal is a short list that covers all document varieties without becoming granular.

Recommended categories:

Type	Covers
Bank Statement	Monthly or quarterly statements from financial institutions
Invoice	Bills received — utilities, services, subscriptions
Receipt	Proof of payment — purchases, donations, expenses
Payslip	Salary and wage records
Tax Document	Returns, assessments, income statements
Contract	Signed agreements — employment, rental, services
Insurance Policy	Policies, certificates of currency, renewals
Medical Record	Test results, referrals, discharge summaries
Government Notice	Official correspondence from government agencies
Property Document	Titles, rates notices, inspection reports, leases
Correspondence	General letters and emails not covered above
Identity Document	Passports, licences, birth certificates, visas

4.3 Tags

Tags are the most flexible part of the system. A single document can carry many tags, making cross-cutting searches possible.

Recommended tag groups:

Financial year tags — one per document to enable year-based filtering: FY2024, FY2025, FY2026 (add a new one each July)

Tax tags — applied throughout the year so nothing needs to be found at tax time: tax-deductible, income, capital-gain, donation, work-expense, home-office, vehicle, private-health, tax-return

Topic tags — for personal finance, property, and health: superannuation, investment, mortgage, property-home, medical-expense, prescription, dental

Status tags — for managing the processing pipeline: needs-review, important, pending, filed, archive

Tip: Apply needs-review automatically via a catch-all workflow. This creates a reliable inbox of documents that need human attention before being considered filed.

4.4 Saved Views

Saved Views are pre-configured filtered lists that act as one-click shortcuts. Create them once; use them permanently.

Recommended views:

View	Purpose
Needs Review	All documents awaiting classification confirmation
Tax — Current FY	All deductible expenses for the current financial year
Tax — Income Current FY	All income documents for the current year
Recent Documents	Everything added in the last 30 days
Pending Actions	Documents tagged `pending` — awaiting a response or follow-up
Current Insurance	Active insurance policies
Medical Records	All health-related documents

4.5 Automated Workflows

Workflows automatically apply classification rules when a document is ingested. They eliminate most manual classification work.

How they work: - Each workflow has a trigger (e.g. document added), a filter (e.g. title contains a keyword), and actions (e.g. set type, add tag). - Workflows run in priority order — specific rules first, a catch-all last.

Recommended workflow set:

Priority	Purpose	Filter	Actions
10	Tax authority documents	Title contains tax authority name	Set Correspondent; Type: Government Notice; tag: `needs-review`
20	Bank statements	Title contains bank name	Set Correspondent; Type: Bank Statement; tag: `needs-review`
30	Payslips	Title contains employer name	Set Correspondent; Type: Payslip; tags: `income`, `needs-review`
40	Health / Medicare	Title contains health agency name	Set Correspondent; Type: Government Notice; tags: `medical-expense`, `needs-review`
50	Utility bills	Title contains provider names	Type: Invoice; tag: `needs-review`
999	Catch-all (always last)	(none — matches everything)	Tag: `needs-review`

Priority tip: Number specific workflows 10–50. Set the catch-all to 999. This guarantees specific rules always run before the fallback, and leaves room to insert new rules without renumbering.

5. Multi-User Setup

Paperless-ngx supports role-based access with group-level permissions. Using named groups rather than individual account flags keeps the access model explicit and auditable.

Account and group structure

Group	Who	Permissions
`Administrators`	Admin account	Full permissions on all objects — documents, taxonomy, settings, users
`Household`	Day-to-day user; family members	View and edit documents and taxonomy; no system configuration
`ReadOnly`	(reserved)	View only

Two accounts are used in practice:

Admin account — used only for system configuration, user management, and break-glass access. Keep the username non-obvious (not admin). Strong password distinct from the database password.
Power user account — used for all daily document work. Member of Household group.

Additional family members get their own account in the Household group.

Object-level permissions

In addition to group membership, Paperless-ngx requires object-level permissions on documents and taxonomy. When setting up:

Set owner of all existing documents, tags, correspondents, document types, and storage paths to the admin account
Grant view permission to: Administrators group + Household group
Grant edit permission to: Administrators group only

Auto-assign permissions on consumption

Create a workflow that fires on every new document so family members can see newly ingested documents without manual intervention:

Trigger: Document added
Filter: (none)
Actions: Set owner = admin account; grant view to Administrators + Household; grant edit to Administrators

Without this, documents consumed after the initial setup are invisible to non-admin users until permissions are applied manually — a common gotcha.

Design rationale

A named Administrators group (rather than the raw superuser flag) makes the access model consistent and auditable — what the admin account can do is visible in the group permissions list, not implied by a hidden flag. The superuser flag remains available as a break-glass fallback.

6. Running It Day-to-Day

6.1 Daily Routine

Open the Needs Review saved view. Work through each document from top to bottom.

Step	Action	Notes
1	Open the document	Check what workflows auto-assigned
2	Confirm or correct Correspondent	Fix if OCR misidentified the sender
3	Confirm or correct Document Type	Fix if the wrong type was assigned
4	Add Financial Year tag	e.g. `FY2026` for July 2025 – June 2026
5	Add any topic or tax tags	e.g. `tax-deductible`, `income`, `medical-expense`
6	Action required?	If you need to pay, reply, or act — leave `needs-review` on and return once done
7	Remove `needs-review`	Document moves out of the inbox

Under 30 seconds per document once workflows are running. A full week's mail takes about 5 minutes. At tax time, everything is already tagged — just open the Tax saved view.

6.2 Adding Documents

Desktop (Windows): Map the consume folder as a network drive — \\[your-nas-ip]\docker\paperless\consume. Drag and drop PDFs directly in File Explorer. Requires home network or VPN when remote.

Mobile: Use the DS File app — navigate to docker/paperless/consume/, add it to Favourites, then tap + to upload. Works over QuickConnect without VPN.

Phone scanning: Scan paper documents with Microsoft Lens or Adobe Scan (or the iOS built-in document scanner). Save as PDF — not JPG — for better OCR results. Upload to the consume folder via DS File.

Email: Forward relevant emails to a dedicated Gmail address configured for IMAP ingestion. Paperless pulls attachments automatically and processes them like any other consumed file. Manual forwarding (rather than giving the address to senders directly) gives selective control — not every email from a given sender is worth archiving.

Source	Method
Paper	Scan with phone → save as PDF → upload via DS File or SMB
Digital PDF (email, download)	Save to consume folder via SMB or DS File
Screenshot / image	JPG or PNG to consume — Paperless OCRs images as well as PDFs
Email attachment	Forward to dedicated Gmail address (IMAP ingestion)

Naming tip: Rename files descriptively before dropping them in — e.g. bank-statement-mar-2026.pdf. The filename becomes the initial document title, which workflow filters match against.

6.3 Exception Handling

Problem	Symptom	Fix
Duplicate document	Same document ingested twice	Search by correspondent or phrase before classifying; delete the newer copy if confirmed duplicate
Failed OCR	Document has no searchable text	Check Settings → Tasks for errors. Re-scan at 400 DPI minimum, or remove password protection. If re-scanning is not possible, set Title / Type / Correspondent manually — the document is stored regardless
Workflow didn't fire	Document in Needs Review with no auto-assigned fields	OCR likely misread the title. Correct the title manually → re-save → the catch-all workflow fires and moves the document to `filed` automatically
Garbled OCR / wrong language	Text is meaningless	English-only OCR — foreign-language documents store safely but text won't be searchable. Set metadata manually as a workaround

6.4 Annual Tasks

Do these every July at the start of the new financial year:

Task	Where	Details
Add new FY tag	Settings → Tags → Add	e.g. `FY2027` each July
Create Tax saved views for new year	Documents → Filter → Save	Tax — FY2027 (tag: FY2027 + tax-deductible); Tax — Income FY2027 (tag: FY2027 + income)
Clear the inbox	Needs Review view	Classify or action any documents left over from the previous year
Review `filed` documents	Optional	Archive documents older than 5 years by adding the `archive` tag

7. Going Further

Security hardening

These steps should be done before the instance is in regular use:

Step	What to do
PostgreSQL password	Run via Container Manager → db → Terminal: `ALTER USER paperless WITH PASSWORD '[strong-password]';`
Secret key	Generate: `openssl rand -base64 37 \\| tr -d '=+/' \\| cut -c1-50` → set result as `PAPERLESS_SECRET_KEY` env var on the webserver container
Allowed hosts	Set `PAPERLESS_ALLOWED_HOSTS` to `[your-nas-ip],localhost,127.0.0.1,[your-ddns-hostname]`. Django returns HTTP 400 if the Host header is not on this list — the DDNS hostname must be included if you access Paperless by hostname.
Admin account	Rename the default `admin` account to a non-obvious username. Use a strong password distinct from the database password.

Remote access: For a private home deployment, VPN-only access is the practical choice — no extra open port, no TLS certificate to manage, and the VPN tunnel handles transport security. Connect via OpenVPN, then access http://[your-nas-ip]:8000/ or the DDNS hostname from any browser.

For deployments that need internet-facing access without VPN, a reverse proxy (e.g. Nginx, Traefik, or Synology Reverse Proxy) with a Let's Encrypt certificate is the standard path.

Automated backups

A two-layer approach covers both full recovery and document-level recovery:

Layer	What it covers	How
Volume backup	Everything — database, documents, config, Redis state	Back up entire `/volume1/docker/` to an external USB or NAS backup destination. Schedule: three times a week (e.g. Mon/Thu/Sat) via Hyper Backup
Document export	All documents and metadata in a portable, human-readable format	DSM Task Scheduler: daily at 01:00. Script: `docker exec webserver document_exporter ../export`. Output: `/volume1/docker/paperless/export/`

The export produces a manifest.json + documents/ folder that can be imported into a fresh Paperless instance independently of the volume backup.

What the document export does NOT include — these must be re-created manually after a fresh-install restore:

User accounts and passwords
Saved views
Mail rules and email ingestion config
Automation workflows
Environment variables (PAPERLESS_SECRET_KEY, PAPERLESS_ALLOWED_HOSTS)

Restore

Scenario	When to use	Approach
Add missing documents	Database intact, some documents missing	`docker exec webserver document_importer ../export`
Full wipe and reimport	Corrupted data, fresh database needed	Stop webserver → drop and recreate the PostgreSQL schema → start webserver → wait 30 s → reimport → recreate superuser
Disaster recovery (new NAS)	Total hardware loss	Restore `/volume1/docker/` from volume backup → recreate container stack → reimport from export if pgdata is unusable

After any restore: verify documents are visible, tags and correspondents are present, the consume folder is being monitored, and backup jobs are re-enabled.

Email ingestion

Paperless-ngx can pull email attachments via IMAP and process them exactly like files dropped into the consume folder.

Gmail setup:

Setting	Value
Dedicated address	Create a separate Gmail account (e.g. `[yourname]-paperless@gmail.com`)
App password	Google Account → Security → 2-Step Verification → App passwords (16-character)
IMAP server	`imap.gmail.com`, port `993`, SSL/TLS

Mail rule (catch-all):

Setting	Value
Order	999
Action	Consume attachments
Assign tags	`needs-review`
After processing	Mark email as read

Forward relevant emails to the dedicated address rather than giving it directly to senders — this keeps selective control over what gets archived. Because Gmail rewrites the From: header on forwarded mail, per-sender filter rules don't work reliably; a single catch-all rule handles everything, with document workflows doing classification after OCR.

Optional enhancements

Feature	What it adds
Storage Paths	Auto-organise archived files into a folder structure — e.g. `{correspondent}/{document_type}/{created_year}/` produces `ATO/Government Notice/2025/[title].pdf`
Custom Fields	Add structured metadata to documents: Amount, Account Number, Policy Number, Expiry Date, Reference Number
Share Links	Generate temporary read-only links to individual documents — no account required for the recipient
API integration	Use the REST API to trigger ingestion, query documents, or integrate with home automation tools

Ongoing maintenance

Task	Frequency
Add new financial year tag and saved views	Annually (start of each fiscal year)
Update container images	Every 2–3 months
Check task queue for failed jobs	Monthly
Verify backups are completing successfully	Monthly
Annual restore test — confirm export imports cleanly to a fresh instance	Annually
Review and archive old status-tagged documents	Quarterly

8. Repository Contents

This repository contains deployment documentation and configuration references for the above setup. It is not the Paperless-ngx source code.

File	Description
`Paperless-ngx_Complete_Reference/`	Full deployment reference split across 8 files: architecture, containers, volumes, environment variables, OCR settings, users and permissions, backup and restore, and quick reference
`paperless-ngx-organisation-guide.md`	Complete taxonomy guide — document types, correspondents, tags, saved views, workflows, and day-to-day process
`paperless-ngx-day-to-day-sop.md`	Day-to-day SOP — inbox routine, scanning, bulk import, exception handling, phone workflow, and annual tasks
`paperless-ngx-next-steps.md`	Actionable checklist — UI setup tasks, security improvements, backup automation, and optional enhancements
`Paperless-ngx_Restore_Procedure.md`	Step-by-step restore procedures: add missing documents and full wipe + reimport
`rbac-design.md`	RBAC design — accounts, groups, object-level permissions, and implementation steps
`gabi-onboarding-guide.md`	Non-technical quick guide for household users — access, daily routine, classification, and quick reference
`Paperless-ngx.md`	Original per-container setup notes for Synology Container Manager GUI
`OCR Settings.md`	OCR parameter reference with descriptions
`Paperlessngx - Poweruser.md`	Power user permission template — full CRUD matrix
`Paperlessngx-FolderCheckScript.md`	Bash script to verify folder structure and container volume mounts

Built with Paperless-ngx — open-source document management.