Major improvement, security handling, file handling +fixes

unknown
2026-05-23 00:13:56 +02:00
parent 2129081599
commit a7b44af91a
25 changed files with 925 additions and 116 deletions


@@ -23,7 +23,7 @@ For a self-hosted, single-tenant service handling encrypted file metadata, **SQL
1. **Operational simplicity**: No separate database server to install, upgrade, or network-secure. A single `.sqlite` file is trivial to back up, replicate, or inspect.
2. **WAL mode performance**: With `PRAGMA journal_mode = WAL`, SQLite handles concurrent readers and a single writer efficiently - enough for a bot + web server pair.
-3. **Schema simplicity**: The schema is small (5 tables, 2 migration files). The overhead of a client/server RDBMS is unjustified.
+3. **Schema simplicity**: The schema is small (10 tables, 7 migration files). The overhead of a client/server RDBMS is unjustified.
4. **Deployment footprint**: Ideal for running on a small VPS or even an embedded edge device without container orchestration.
If future requirements demand horizontal scaling or heavy analytics, the repository pattern in `cgcx-db` makes it straightforward to swap in PostgreSQL without touching the bot or server code.
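
As a rough illustration of why the repository pattern keeps a backend swap local, here is a minimal sketch. The trait `FileRepository`, the struct `FileMeta`, and the helper `register_upload` are hypothetical names chosen for this example and are not taken from `cgcx-db`; an in-memory map stands in for the SQLite (or PostgreSQL) backend.

```rust
use std::collections::HashMap;

/// Hypothetical metadata row; the real schema in `cgcx-db` may differ.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    id: u64,
    encrypted_hash: String,
}

/// Storage-agnostic interface that callers (bot, server) would depend on.
trait FileRepository {
    fn insert(&mut self, meta: FileMeta);
    fn find(&self, id: u64) -> Option<FileMeta>;
}

/// In-memory stand-in for a concrete backend such as SQLite.
#[derive(Default)]
struct InMemoryRepo {
    rows: HashMap<u64, FileMeta>,
}

impl FileRepository for InMemoryRepo {
    fn insert(&mut self, meta: FileMeta) {
        self.rows.insert(meta.id, meta);
    }

    fn find(&self, id: u64) -> Option<FileMeta> {
        self.rows.get(&id).cloned()
    }
}

/// Caller code sees only the trait, so replacing the backend
/// does not require changes here.
fn register_upload(repo: &mut dyn FileRepository, id: u64, hash: &str) -> FileMeta {
    let meta = FileMeta {
        id,
        encrypted_hash: hash.to_string(),
    };
    repo.insert(meta.clone());
    meta
}
```

Under this assumption, a PostgreSQL backend would be another `impl FileRepository`, and `register_upload` would stay unchanged.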
@@ -121,6 +121,17 @@ The **cg.cx server** is a trusted party for decryption and delivery. It is not a
---
## Hashing for Deduplication and Blacklist
`cgcx-crypto` computes a **BLAKE3 hash over the ciphertext stream** (including the secretstream header) for tamper detection. This hash is stored per-file in `content_files.encrypted_hash`.
In addition, the file pipeline now computes a **plaintext BLAKE3 hash** during ingestion:
1. A running hash of the plaintext chunks is computed alongside encryption.
2. The resulting `plaintext_hash` is stored in `content_files` and used for deduplication: when identical plaintext is uploaded again, the existing encrypted file is reused and its `ref_count` is incremented.
3. A `hash_blacklist` table (migration `007_hash_blacklist.sql`) allows moderators to block re-uploads of known-banned content by its plaintext hash. The pipeline checks this blacklist before storing any new file and rejects blocked content with a `BlockedHash` error.
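
The decision flow in the steps above can be sketched as follows. This is a simplified model, not the pipeline's actual code: the `Store` type, `IngestOutcome`, and the `ingest` function are hypothetical, in-memory collections stand in for the `content_files` and `hash_blacklist` tables, and the 32-byte plaintext BLAKE3 digest is assumed to have been computed by the caller during encryption.

```rust
use std::collections::{HashMap, HashSet};

/// 32-byte plaintext digest (BLAKE3 in the real pipeline), supplied by the caller.
type PlaintextHash = [u8; 32];

#[derive(Debug, PartialEq)]
enum IngestError {
    /// Mirrors the `BlockedHash` error named in the text.
    BlockedHash,
}

#[derive(Debug, PartialEq)]
enum IngestOutcome {
    /// New content: a fresh record was created.
    Stored { file_id: u64 },
    /// Identical plaintext was already stored: the record is reused.
    Reused { file_id: u64, ref_count: u32 },
}

struct FileRecord {
    file_id: u64,
    ref_count: u32,
}

/// In-memory stand-in for `content_files` and `hash_blacklist`.
#[derive(Default)]
struct Store {
    files: HashMap<PlaintextHash, FileRecord>,
    blacklist: HashSet<PlaintextHash>,
    next_id: u64,
}

impl Store {
    fn ingest(&mut self, hash: PlaintextHash) -> Result<IngestOutcome, IngestError> {
        // 1. Reject blacklisted content before anything is stored.
        if self.blacklist.contains(&hash) {
            return Err(IngestError::BlockedHash);
        }
        // 2. Deduplicate: reuse the existing file and bump its ref_count.
        if let Some(rec) = self.files.get_mut(&hash) {
            rec.ref_count += 1;
            return Ok(IngestOutcome::Reused {
                file_id: rec.file_id,
                ref_count: rec.ref_count,
            });
        }
        // 3. Otherwise record a new file with an initial ref_count of 1.
        self.next_id += 1;
        let file_id = self.next_id;
        self.files.insert(hash, FileRecord { file_id, ref_count: 1 });
        Ok(IngestOutcome::Stored { file_id })
    }
}
```

In this model, ingesting the same hash twice yields `Stored` and then `Reused` with `ref_count` 2, while a hash present in `blacklist` yields `Err(IngestError::BlockedHash)` and leaves `files` untouched.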
---
## Future Considerations
- **Client-side decryption**: A future iteration could deliver the wrapped CEK to the browser and decrypt via WebAssembly / libsodium-js. This would remove the server from the trust boundary for delivery.