Major improvement, security handling, file handling +fixes
@@ -23,7 +23,7 @@ For a self-hosted, single-tenant service handling encrypted file metadata, **SQL
1. **Operational simplicity**: No separate database server to install, upgrade, or network-secure. A single `.sqlite` file is trivial to back up, replicate, or inspect.
2. **WAL mode performance**: With `PRAGMA journal_mode = WAL`, SQLite handles concurrent readers and a single writer efficiently, which is enough for a bot and web server pair.
3. **Schema simplicity**: The schema is small (5 tables, 2 migration files). The overhead of a client/server RDBMS is unjustified.
3. **Schema simplicity**: The schema is small (10 tables, 7 migration files). The overhead of a client/server RDBMS is unjustified.
4. **Deployment footprint**: Ideal for running on a small VPS or even an embedded edge device without container orchestration.

If future requirements demand horizontal scaling or heavy analytics, the repository pattern in `cgcx-db` makes it straightforward to swap in PostgreSQL without touching the bot or server code.
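The WAL-mode point in the list above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. This is not the project's code: the file name, the `notes` table, and the `open_wal` helper are hypothetical, and the two connections merely stand in for the bot (writer) and the web server (reader).

```python
import os
import sqlite3
import tempfile


def open_wal(path: str) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchall()[0][0]
    if mode.lower() != "wal":
        raise RuntimeError(f"WAL not enabled, got {mode!r}")
    return conn


# Hypothetical file and table names, for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
writer = open_wal(db_path)  # stands in for the bot
reader = open_wal(db_path)  # stands in for the web server

writer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
writer.commit()

# Start a write transaction and leave it uncommitted for a moment.
writer.execute("INSERT INTO notes (body) VALUES ('pending')")

# In WAL mode the reader is not blocked; it sees the last committed snapshot.
rows_during_write = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]

writer.commit()
rows_after_commit = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]
```

Here the reader proceeds while the writer's transaction is still open, and only observes the new row once the writer commits.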
@@ -121,6 +121,17 @@ The **cg.cx server** is a trusted party for decryption and delivery. It is not a

---

## Hashing for Deduplication and Blacklist

`cgcx-crypto` computes a **BLAKE3 hash over the ciphertext stream** (including the secretstream header) for tamper detection. This hash is stored per-file in `content_files.encrypted_hash`.
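As a rough sketch of the idea, the hash can be fed incrementally with the header first and then each ciphertext chunk in order. Python's standard library has no BLAKE3, so `hashlib.blake2b` is used below purely as a stand-in; the function names, the 24-byte placeholder header, and the sample chunks are assumptions for illustration, not the `cgcx-crypto` API.

```python
import hashlib
import hmac
from typing import Iterable


def hash_ciphertext_stream(header: bytes, chunks: Iterable[bytes]) -> str:
    """Hash the stream header followed by every ciphertext chunk, in order."""
    # Stand-in hash: BLAKE2b from the stdlib, NOT the BLAKE3 the project uses.
    h = hashlib.blake2b(digest_size=32)
    h.update(header)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def matches_stored_hash(header: bytes, chunks: Iterable[bytes], stored_hex: str) -> bool:
    """Recompute the hash and compare it with the stored value."""
    return hmac.compare_digest(hash_ciphertext_stream(header, chunks), stored_hex)


# Placeholder bytes standing in for a real header and encrypted chunks.
header = bytes(24)
chunks = [b"ciphertext-chunk-1", b"ciphertext-chunk-2"]

encrypted_hash = hash_ciphertext_stream(header, chunks)
intact = matches_stored_hash(header, chunks, encrypted_hash)

# Flipping a single bit in one chunk changes the hash, so the check fails.
tampered_chunks = [chunks[0], bytes([chunks[1][0] ^ 0x01]) + chunks[1][1:]]
tampered_ok = matches_stored_hash(header, tampered_chunks, encrypted_hash)
```

Because the hash is updated chunk by chunk, the whole file never has to be held in memory, and the result is the same as hashing the concatenated bytes in one pass.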

In addition, the file pipeline now computes a **plaintext BLAKE3 hash** during ingestion:

1. A running hash of the plaintext chunks is computed alongside encryption.
2. The resulting `plaintext_hash` is stored in `content_files` and used for deduplication: when identical plaintext is uploaded, the existing encrypted file is reused and its `ref_count` is incremented.
3. A `hash_blacklist` table (migration `007_hash_blacklist.sql`) allows moderators to block re-uploads of known-banned content by its plaintext hash. The pipeline checks this blacklist before storing any new file and rejects blocked content with a `BlockedHash` error.

---

## Future Considerations

- **Client-side decryption**: A future iteration could deliver the wrapped CEK to the browser and decrypt via WebAssembly / libsodium-js. This would remove the server from the trust boundary for delivery.