# cg.cx
> End-to-end encrypted content sharing via Telegram — with a modern web frontend.
**cg.cx** is a privacy-first file and text sharing platform built as a Telegram bot and Axum web service. Users upload content through a Telegram bot; the service encrypts every file with unique per-content keys, stores them securely, and shares them via short 12-character IDs. Recipients view or download content through a lightweight Svelte 5 web interface with automatic decryption on the fly.
---
## Project Overview
### What it is
cg.cx lets Telegram users upload media, documents, or plain text and receive a short shareable link (`https://cg.cx/?cxid=AbCdEfGhIjKl`). All content is encrypted at rest using **XChaCha20-Poly1305** with per-file content encryption keys (CEKs) wrapped by a master key. The server holds the master key and decrypts content only while streaming it to a recipient.
### Key Features
| Feature | Description |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| **End-to-End Encryption** | Every file is encrypted with a unique CEK using XChaCha20 secretstream; only the server (with the master key) can decrypt for delivery. |
| **Short Shareable IDs** | Content is addressed by 12-character alphanumeric IDs (e.g., `AbCdEfGhIjKl`). |
| **Auto-Destruct** | Uploaders can set a max view count; content self-destructs once the limit is reached. |
| **Password Protection** | Optional per-content passwords with Argon2id-hashed verification and HMAC-SHA256 session cookies. |
| **Admin Moderation** | Blacklist / whitelist user IDs, delete content, review reports via Telegram admin groups. |
| **Reporting** | Users can report content; reports are routed to review groups with inline admin actions. |
| **Streaming Decryption** | Large encrypted files are decrypted and streamed chunk-by-chunk without loading into memory. |
| **Content Typing & Safety** | Automatic MIME detection; render flags mark dangerous or executable files for safe handling. |
### Architecture at a Glance
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Telegram User  │────▶│    cgcx-bot     │────▶│   cgcx-server   │
│ (upload / cmd)  │     │   (Teloxide)    │     │  (Axum / web)   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 ▼                       ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │   cgcx-file-    │────▶│    Svelte 5     │
                        │    pipeline     │     │    Frontend     │
                        │ (encrypt/store) │     │    (viewer)     │
                        └─────────────────┘     └─────────────────┘
                        ┌─────────────────┐
                        │  SQLite3 + WAL  │
                        │   (metadata)    │
                        └─────────────────┘
```
---
## Architecture
cg.cx is organized as a **Rust workspace** with 10 focused crates. This modular design separates concerns, enables independent unit testing, and allows the bot and server binaries to pull in only the crates they need.
| Crate | Purpose |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cgcx-core` | Shared domain types: `ContentId`, `User`, `Content`, `ContentFile`, `Report`, error enums, and result types. Zero external dependencies beyond `serde` and `chrono`. |
| `cgcx-config` | Hierarchical configuration loader (`config/default.toml` → `config/local.toml` → `CGCX_*` env vars) with validation. |
| `cgcx-crypto` | Cryptographic primitives: XChaCha20 secretstream encryption/decryption, AES-KW key wrapping, BLAKE3 hashing, master key loading. |
| `cgcx-db` | SQLite access layer with `rusqlite`, embedded migrations (`rusqlite_migration`), and async repository patterns for users, content, files, reports, and admin actions. |
| `cgcx-storage` | Filesystem abstraction: path generation by MIME type, directory creation, temp file handling, and cleanup. |
| `cgcx-content-typing` | MIME type detection (`infer` + `mime_guess`) and render-flag computation for safe UI handling of dangerous files. |
| `cgcx-file-pipeline` | High-level upload orchestration: ingests raw bytes, detects type, encrypts via `cgcx-crypto`, stores via `cgcx-storage`, and records metadata via `cgcx-db`. |
| `cgcx-moderation` | Runtime moderation lists (blacklist / whitelist) loaded from JSON, with configurable share modes (`b` = blocklist, `w` = allowlist) and auto-reload. |
| `cgcx-bot` | **Binary crate** — Telegram bot built on `teloxide`. Handles dialogue flows, uploads, terms acceptance, reporting, and admin commands. |
| `cgcx-server` | **Binary crate** — Axum HTTP server. Serves the Svelte frontend, streams decrypted files, enforces view limits, and validates password cookies. |
### Why a Modular Crate Structure?
- **Separation of concerns**: Crypto logic cannot accidentally depend on Telegram bot internals; the database layer knows nothing about HTTP.
- **Testability**: Each crate can be unit-tested in isolation. `cgcx-core` and `cgcx-crypto` have no async runtime requirements, making them fast to test.
- **Independent deployment**: In the future, the bot and server could be built as separate container images sharing only the library crates.
- **Compile-time enforcement**: The workspace dependency graph guarantees that, for example, `cgcx-crypto` never depends on the bot or server crates.
---
## Security Design
### Cryptographic Primitives
| Layer | Algorithm | Purpose |
| -------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| **Secretstream** | XChaCha20-Poly1305 (libsodium) | Encrypts file plaintext into an authenticated ciphertext stream. |
| **Key Wrapping** | AES-KW (AES-256 Key Wrap, RFC 3394) | Wraps each per-file CEK with the master key. |
| **Integrity Hash** | BLAKE3 | Computes a hash over the ciphertext stream (including the secretstream header) for tamper detection. |
| **Password Hashing** | Argon2id | Hashes optional per-content passwords. |
| **Cookie MAC** | HMAC-SHA256 | Integrity MAC for password-verification session cookies using constant-time comparison. |
| **ID Entropy** | Rejection sampling over `[A-Za-z0-9]` | 12-character IDs provide ~71 bits of entropy. |
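
The ID row above can be illustrated with a small, standard-library-only sketch of rejection sampling. It is not the project's actual generator: `id_from_bytes` and `id_entropy_bits` are hypothetical names, and the random bytes are passed in by the caller where a real implementation would draw them from a CSPRNG.

```rust
/// The 62-symbol alphabet `[A-Za-z0-9]` named in the table above.
const ALPHABET: &[u8; 62] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Largest multiple of 62 that fits in a byte (62 * 4 = 248).
/// Bytes at or above this limit are rejected to avoid modulo bias.
const REJECT_FROM: u8 = 248;

/// Builds an ID of `len` characters from a stream of random bytes.
/// Returns `None` if the stream runs dry before `len` bytes are accepted.
/// (Hypothetical helper; the byte source stands in for a CSPRNG.)
fn id_from_bytes(random_bytes: impl IntoIterator<Item = u8>, len: usize) -> Option<String> {
    let id: String = random_bytes
        .into_iter()
        .filter(|&b| b < REJECT_FROM)
        .map(|b| ALPHABET[(b % 62) as usize] as char)
        .take(len)
        .collect();
    (id.len() == len).then_some(id)
}

/// Entropy in bits of an ID drawn uniformly from the alphabet:
/// len * log2(62).
fn id_entropy_bits(len: u32) -> f64 {
    f64::from(len) * (ALPHABET.len() as f64).log2()
}
```

With `len = 12`, `id_entropy_bits` evaluates to roughly 71.45, which matches the ~71 bits quoted in the table.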
### Encryption Flow
1. **Generate CEK**: For every uploaded file, `cgcx-crypto` generates a random 256-bit `ContentKey`.
2. **Encrypt**: The file is fed through `sodiumoxide::crypto::secretstream::xchacha20poly1305` in chunks (up to 1 MiB). The final chunk is tagged `Final`.
3. **Hash**: A running BLAKE3 hash covers the secretstream header and every ciphertext chunk.
4. **Wrap Key**: The CEK is wrapped with AES-KW using the 256-bit master key. The wrapped key + a version byte is stored in SQLite.
5. **Store**: The ciphertext file is moved from temp storage to its final path (`data/media|documents|text/<cxid>/...`).
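
The chunking in step 2 can be sketched without any crypto dependency. The sketch below only plans chunk boundaries and tags; it does not encrypt. `plan_chunks` and `stream_len` are hypothetical names, and the size calculation assumes the 24-byte secretstream header is counted together with the chunks, which the steps above do not state. The overhead constants are libsodium's `HEADERBYTES` (24) and `ABYTES` (17) for this construction.

```rust
/// Chunk tags, mirroring the `Final` tag mentioned in step 2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Message,
    Final,
}

/// Per-stream header and per-chunk overhead of libsodium's
/// secretstream_xchacha20poly1305 (HEADERBYTES = 24, ABYTES = 17).
const HEADER_BYTES: u64 = 24;
const CHUNK_OVERHEAD: u64 = 17;

/// Splits a plaintext of `len` bytes into chunks of at most `chunk_size`
/// bytes and tags the last one `Final`. An empty input still yields one
/// empty `Final` chunk so the stream has an authenticated end marker.
fn plan_chunks(len: u64, chunk_size: u64) -> Vec<(u64, Tag)> {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let full = len / chunk_size;
    let rest = len % chunk_size;
    let mut plan: Vec<(u64, Tag)> = (0..full).map(|_| (chunk_size, Tag::Message)).collect();
    if rest > 0 || plan.is_empty() {
        plan.push((rest, Tag::Message));
    }
    if let Some(last) = plan.last_mut() {
        last.1 = Tag::Final;
    }
    plan
}

/// Total size of header plus all encrypted chunks.
/// (Assumption: the header is counted with the ciphertext.)
fn stream_len(plan: &[(u64, Tag)]) -> u64 {
    HEADER_BYTES + plan.iter().map(|(n, _)| n + CHUNK_OVERHEAD).sum::<u64>()
}
```

For a file of 2 MiB plus 5 bytes and a 1 MiB chunk size, the plan has three chunks and the last carries `Tag::Final`; a decryptor that reaches end of input without seeing `Final` can treat the stream as truncated.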
### Decryption Flow
1. **Unwrap CEK**: The server unwraps the per-file CEK using the master key.
2. **Init Stream**: `DecryptStream` is initialized with the stored secretstream header.
3. **Stream**: Ciphertext is read from disk in ~1 MiB chunks, decrypted, and pushed to the HTTP response body via a Tokio channel.
4. **Verify**: If authentication of any chunk fails (tampered or truncated data), the stream aborts and the client receives a truncated response instead of unauthenticated plaintext.
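
The abort-on-failure behaviour of steps 3 and 4 can be sketched with the standard library alone. This is not the server's code: it uses a thread and `std::sync::mpsc::sync_channel` as a stand-in for the Tokio channel, and `decrypt` is a caller-supplied placeholder for the real per-chunk decryption. `read_chunk` and `stream_chunks` are hypothetical names.

```rust
use std::io::{ErrorKind, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Fills `buf` as far as the reader allows; returns 0 only at end of input.
fn read_chunk<R: Read>(source: &mut R, buf: &mut [u8]) -> std::io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match source.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Reads chunks from `source`, passes each through `decrypt`, and forwards
/// the results over a bounded channel. The first error is forwarded to the
/// consumer and then ends the stream.
fn stream_chunks<R, F>(
    mut source: R,
    chunk_size: usize,
    mut decrypt: F,
) -> Receiver<Result<Vec<u8>, String>>
where
    R: Read + Send + 'static,
    F: FnMut(&[u8]) -> Result<Vec<u8>, String> + Send + 'static,
{
    // A capacity of 2 bounds how many chunks are buffered (backpressure).
    let (tx, rx) = sync_channel(2);
    thread::spawn(move || {
        let mut buf = vec![0u8; chunk_size];
        loop {
            let item = match read_chunk(&mut source, &mut buf) {
                Ok(0) => break,
                Ok(n) => decrypt(&buf[..n]),
                Err(e) => Err(e.to_string()),
            };
            let failed = item.is_err();
            // Stop if the consumer went away or this chunk failed.
            if tx.send(item).is_err() || failed {
                break;
            }
        }
    });
    rx
}
```

The consumer iterates over the receiver and stops writing to the response as soon as it sees an `Err`, so no data after a failed chunk is ever delivered.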
### Password Protection
- Passwords are hashed with **Argon2id** and stored in the `contents` table.
- On successful verification, the server issues an `__Host-pw` cookie containing a base64-encoded `cxid:MAC` pair.
- The MAC is computed via **HMAC-SHA256** over the content ID using a server-side secret (derived from the master key).
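- Cookie attributes: `Secure`, `HttpOnly`, `SameSite=Strict`, `Max-Age=3600`. Browsers additionally require `Path=/` and no `Domain` attribute for any `__Host-` prefixed cookie.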
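
A rough sketch of two pieces of this flow, using only the standard library: formatting the `Set-Cookie` value and comparing MACs without an early exit. The HMAC computation and base64 encoding are left out, and `pw_cookie` and `constant_time_eq` are hypothetical helpers rather than the server's functions. Production code would more likely rely on a vetted crate (for example `subtle`) for the comparison, since a hand-written loop is not guaranteed to stay constant-time after optimization.

```rust
/// Compares two byte strings without short-circuiting on the first
/// mismatch. The length is treated as public information.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; zero means all bytes matched.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Formats a `Set-Cookie` header value for the `__Host-pw` cookie.
/// `value` is assumed to be the already base64-encoded `cxid:MAC` pair.
/// `Path=/` is included because browsers require it for `__Host-` cookies.
fn pw_cookie(value: &str) -> String {
    format!("__Host-pw={value}; Path=/; Secure; HttpOnly; SameSite=Strict; Max-Age=3600")
}
```

On a later request the server would recompute the MAC for the requested content ID and accept the cookie only if `constant_time_eq` returns `true`.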
### Master Key Handling
- The master key is a 256-bit value loaded from either an environment variable (`CGCX_AES_MASTER_KEY`) or a file.
- If loaded from a file, the key is expected as 64 hex characters.
- On Unix systems, newly generated key files are automatically chmodded to `0o600`.
- The key fingerprint (first 8 bytes of BLAKE3 hash) is logged at startup for audit purposes; the full key is never logged.
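
Parsing the 64-character hex form into a 256-bit key might look like the sketch below. `parse_master_key` is a hypothetical name, and trimming surrounding whitespace (such as a trailing newline in a key file) is an assumption, not documented behaviour.

```rust
/// Parses a 64-character hex string into a 256-bit key.
/// Surrounding whitespace is ignored (assumption, see lead-in).
fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    let hex = hex.trim();
    if hex.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", hex.len()));
    }
    // Checking every byte up front also rules out signs like '+',
    // which `from_str_radix` would otherwise accept.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("key contains non-hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|e| format!("invalid hex at offset {}: {e}", 2 * i))?;
    }
    Ok(key)
}
```

Rejecting anything other than exactly 64 hex digits keeps a truncated or mistyped key from silently becoming a weaker one.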
---
## Tech Stack
| Layer | Technology |
| ----------------- | --------------------------------------------------------------------------- |
| **Backend** | Rust (edition 2021), Tokio async runtime |
| **Web Server** | Axum 0.7, Tower HTTP middleware |
| **Telegram Bot** | Teloxide 0.13 |
| **Frontend** | Svelte 5, Vite 5 |
| **Database** | SQLite 3 (WAL mode), `rusqlite` + `rusqlite_migration` |
| **Cryptography** | libsodium (via `sodiumoxide`), `aes-kw`, `blake3`, `argon2`, `hmac`, `sha2` |
| **Serialization** | `serde`, `serde_json` |
| **Observability** | `tracing` + `tracing-subscriber` |
---
## Prerequisites
- **Rust** toolchain: stable 1.78 or newer (nightly also works)
- **Node.js** 20+ and `npm` (for the frontend)
- **SQLite 3** (bundled via `rusqlite`, but the CLI is useful for inspection)
- A **Telegram Bot Token** from [@BotFather](https://t.me/botfather)
- A 256-bit master key (64 hex characters) for encryption
---
## Building
### Rust Workspace
Build all crates (library + binaries):
```bash
cargo build --workspace
```
Build optimized release binaries:
```bash
cargo build --workspace --release
```
The release profile enables thin LTO, single codegen unit, and binary stripping for minimal size.
### Frontend
```bash
cd frontend
npm install
npm run build
```
The static assets are emitted to `frontend/dist/` and served by `cgcx-server` at runtime.
---
## Configuration
cg.cx uses a layered configuration system:
1. `config/default.toml` — committed defaults
2. `config/local.toml` — local overrides (gitignored)
3. `CGCX_*` environment variables — runtime overrides
Environment variables use double-underscore as a separator, e.g.:
```bash
export CGCX_SERVER__PORT=3000
export CGCX_TELEGRAM__BOT_TOKEN="your_token_here"
export CGCX_CRYPTO__AES_MASTER_KEY_SOURCE__TYPE="env"
export CGCX_CRYPTO__AES_MASTER_KEY_SOURCE__VAR="CGCX_AES_MASTER_KEY"
export CGCX_AES_MASTER_KEY="aabbccdd..." # 64 hex chars
```
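
To make the layering and the double-underscore convention concrete, here is a minimal sketch. It is not the `cgcx-config` loader: `env_key_to_path` and `merge_layers` are hypothetical helpers, and settings are modelled as flat `section.key` strings purely for illustration.

```rust
use std::collections::HashMap;

/// Maps an environment variable name such as `CGCX_SERVER__PORT` to the
/// config path `["server", "port"]`. Returns `None` for names without the
/// `CGCX_` prefix or with an empty path segment.
fn env_key_to_path(name: &str) -> Option<Vec<String>> {
    let rest = name.strip_prefix("CGCX_")?;
    let parts: Vec<String> = rest.split("__").map(|p| p.to_ascii_lowercase()).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(parts)
}

/// Merges configuration layers in order; later layers override earlier
/// ones (defaults, then local overrides, then environment).
fn merge_layers(layers: &[HashMap<String, String>]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for layer in layers {
        merged.extend(layer.iter().map(|(k, v)| (k.clone(), v.clone())));
    }
    merged
}
```

Under this model, `CGCX_SERVER__PORT=3000` resolves to the path `server.port` and overrides whatever the file layers set for it.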
### Config Sections
| Section | Description |
| ----------------------------- | ----------------------------------------------------------------------------------------------------- |
| `[content]` | Auto-destruct behavior (`keep_content`, `share_mode`, `default_allow_download`, `default_max_views`). |
| `[crypto]` | Master key source (`env` or `file`). |
| `[telegram]` | Bot token and optional custom API URL. |
| `[groups]` | `admin_group_ids` and `review_group_ids` (Telegram chat IDs). |
| `[storage]` | Filesystem paths for `media`, `documents`, `text`, `temp`, and the streaming chunk size. |
| `[upload_limits]` | `max_batch_size`, `max_file_size_bytes`, `max_total_batch_bytes`. |
| `[server]` | `base_url`, `bind_address`, `port`. |
| `[rate_limiting]` | Per-minute request limits, burst capacity, and password-attempt limits. |
| `[logging]` | `level` (e.g., `info`, `debug`). |
| `[frontend.behavior_toggles]` | Feature flags for retro animations and particles. |
### Validating Config
Both binaries validate configuration on startup. Key checks include:
- Chunk size between 8 KiB and 256 MiB
- Bot token is set and not the placeholder
- Upload and rate-limiting values are non-zero
- Master key source is fully specified
---
## Running
### Run the Web Server
```bash
cargo run -p cgcx-server
```
The server binds to `127.0.0.1:8080` by default and serves:
- `/` — Svelte frontend
- `/api/health` — health check
- `/api/content/:cxid` — metadata JSON
- `/api/content/:cxid/verify-password` — password verification
- `/api/content/:cxid/file/:file_idx` — streamed decrypted file
- `/assets/*` — static frontend assets
### Run the Telegram Bot
```bash
cargo run -p cgcx-bot
```
The bot processes updates from Telegram, handles user dialogues, and triggers the file pipeline for uploads.
### Run Both Simultaneously
Because the bot and server are separate binaries, they can run side-by-side sharing the same SQLite database and data directories:
```bash
# Terminal 1
cargo run -p cgcx-server
# Terminal 2
cargo run -p cgcx-bot
```
Ensure both processes point to the same database path and storage directories via shared configuration.
---
## Database Migrations
Migrations are managed by `rusqlite_migration` and embedded into the `cgcx-db` crate at compile time.
- `migrations/001_init.sql` — Creates `users`, `contents`, `content_files`, `reports`, and `admin_actions` tables.
- `migrations/002_indexes.sql` — Adds performance indexes on foreign keys, status columns, and report state.
On startup, both the bot and server call `db.run_migrations()`, which applies any pending migrations automatically. The database is opened with:
- `PRAGMA journal_mode = WAL;`
- `PRAGMA foreign_keys = ON;`
- `PRAGMA busy_timeout = 5000;`
### Manual Inspection
```bash
sqlite3 data/db.sqlite ".schema"
sqlite3 data/db.sqlite ".indexes"
```
---
## Deployment
### systemd Service
Create `/etc/systemd/system/cgcx-server.service`:
```ini
[Unit]
Description=cg.cx Web Server
After=network.target
[Service]
Type=simple
User=cgcx
Group=cgcx
WorkingDirectory=/opt/cgcx
Environment="RUST_LOG=info"
Environment="CGCX_AES_MASTER_KEY=<64-hex-chars>"
ExecStart=/opt/cgcx/cgcx-server
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
```
Create a similar service for `cgcx-bot`. Reload and enable:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now cgcx-server cgcx-bot
```
### Reverse Proxy (nginx)
```nginx
server {
    listen 443 ssl http2;
    server_name cg.cx;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Support streaming
        proxy_buffering off;
        proxy_request_buffering off;
    }
}
```
### TLS
Use Let's Encrypt (certbot) or a managed TLS terminator. The `__Host-pw` cookie requires HTTPS (`Secure` flag).
### File Permissions
- The master key file (if used instead of env) **must** be readable only by the service user:
```bash
chmod 600 /opt/cgcx/master.key
chown cgcx:cgcx /opt/cgcx/master.key
```
- Data directories (`data/media`, `data/documents`, `data/text`, `data/temp`) should be owned by the service user.
---
## Administration
### Admin Commands
Admin commands are restricted to users in configured `admin_group_ids` who also have the `admin` role in the database.
| Command | Usage | Description |
| ---------------- | -------------------------- | ---------------------------------------------------------------------------------------------- |
| `/reload` | `/reload` | Reloads moderation lists from disk (`data/blacklisted_ids.json`, `data/whitelisted_ids.json`). |
| `/blacklist_uid` | `/blacklist_uid <user_id>` | Blacklists a Telegram user ID and sets their role to `banned`. |
| `/whitelist_uid` | `/whitelist_uid <user_id>` | Whitelists a Telegram user ID (relevant in whitelist mode). |
### Review Groups
Reports submitted by users are forwarded to all configured `review_group_ids` with an inline keyboard:
- **🗑 Delete** — Sets content status to `deleted`.
- **⛔ Blacklist User** — Blacklists the uploader and bans them.
- **📝 Ignore** — Dismisses the report.
### Moderation Modes
- **Blocklist mode (`share_mode = "b"`)**: Everyone can upload except blacklisted IDs.
- **Allowlist mode (`share_mode = "w"`)**: Only whitelisted IDs can upload.
Moderation lists are hot-reloaded every 30 seconds by a background task, or immediately via `/reload`.
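
The two modes reduce to a small membership check. The sketch below follows the definitions above literally; `ShareMode`, `ModerationLists`, and `may_upload` are hypothetical names, the use of `i64` for Telegram user IDs is an assumption, and how the real crate treats a user who appears on both lists is not specified here.

```rust
use std::collections::HashSet;

/// The two share modes, parsed from the `share_mode` config value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShareMode {
    Blocklist,
    Allowlist,
}

impl ShareMode {
    /// `"b"` selects blocklist mode, `"w"` allowlist mode.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "b" => Some(Self::Blocklist),
            "w" => Some(Self::Allowlist),
            _ => None,
        }
    }
}

/// In-memory moderation lists, keyed by Telegram user ID.
struct ModerationLists {
    blacklist: HashSet<i64>,
    whitelist: HashSet<i64>,
}

impl ModerationLists {
    /// Blocklist mode: everyone except blacklisted IDs may upload.
    /// Allowlist mode: only whitelisted IDs may upload.
    fn may_upload(&self, mode: ShareMode, user_id: i64) -> bool {
        match mode {
            ShareMode::Blocklist => !self.blacklist.contains(&user_id),
            ShareMode::Allowlist => self.whitelist.contains(&user_id),
        }
    }
}
```

A hot reload would then amount to replacing the two sets with freshly parsed contents of the JSON files.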
---
## Development
### Dev Mode (Frontend)
```bash
cd frontend
npm install
npm run dev
```
The Vite dev server runs separately; if needed, set `server.base_url` in `config/local.toml` to your local frontend proxy.
### Dev Mode (Backend)
```bash
# Server with tracing
cargo run -p cgcx-server
# Bot with tracing
cargo run -p cgcx-bot
```
Set `RUST_LOG=debug` for verbose output:
```bash
RUST_LOG=debug cargo run -p cgcx-server
```
### Testing
Run workspace tests:
```bash
cargo test --workspace
```
Individual crate tests:
```bash
cargo test -p cgcx-core
cargo test -p cgcx-crypto
cargo test -p cgcx-content-typing
```
### Useful Debug Tips
- Inspect SQLite directly: `sqlite3 data/db.sqlite "SELECT * FROM contents;"`
- Check moderation lists: `cat data/blacklisted_ids.json`
- Verify master key fingerprint in logs on startup.
---
## License
MIT License — see [LICENSE](LICENSE) for details.
---
## Security Disclosure
If you discover a security vulnerability, please do not open a public issue. Contact the maintainers directly through the admin channels configured in the bot.