GHC 2020-11-22

5 comments.

, https://git.io/JkoWz in jarun/buku
> For a short stint at it, just add 2 new functions for the encrypt and decrypt logic. 

Sure, here's a patch implementing a fresh crypto scheme with PBKDF2 + Fernet. See commit message for detailed explanations.

Patch (also in a gist: https://gist.github.com/7cbd65b2cffac771045cf9999a936383):

```diff
From 41987f1697992b48da13bbebc5e0c42036ea5dc5 Mon Sep 17 00:00:00 2001
From: Zhiming Wang <[email protected]>
Date: Mon, 23 Nov 2020 01:01:19 +0800
Subject: [PATCH] Improve crypto

The less hand-rolled stuff the better.

- Switch from hand-rolled crappy KDF to PBKDF2;
- Switch from AES-CBC with hand-rolled padding and authentication to
  Fernet;
- Future-proofing cipher text format (can use different PBKDF2
  iterations, longer salt, or a new cipher altogether).

Rationale:

- PBKDF2 because it's the obvious choice; remove the option of
  user-specified iterations because what's the point. 100k is good
  enough. Just use a stronger password if they want better security.

- Fernet because it's been oftly recommended by experts, including the
  authors of the very cryptography package we're using, and most
  importantly, it has the minimum number of moving parts. It takes a
  key, and the plaintext, and that's it. All the crypto stuff that can
  go wrong (iv, tag, whatever hogwash) is implemented by experts, we
  just can't do wrong.

- My output format borrows heavily from bcryptm, using $ as the
  delimiter between algorithm version (future-proofing), decryption
  parameters (iterations, salt) and cipher text.

Downsides:

- Fernet requires the entire cleartext and ciphertext in memory. I don't
  think that's much of a practical concern given the nature of the
  application at hand.
---
 buku | 196 +++++++++++++++++++++--------------------------------------
 1 file changed, 68 insertions(+), 128 deletions(-)

diff --git a/buku b/buku
index 9495cef..a51c6dc 100755
--- a/buku
+++ b/buku
@@ -31,6 +31,7 @@ import re
 import shutil
 import signal
 import sqlite3
+from sqlite3.dbapi2 import DataError
 import struct
 import subprocess
 from subprocess import Popen, PIPE, DEVNULL
@@ -110,38 +111,11 @@ class BukuCrypt:
     """
 
     # Crypto constants
-    BLOCKSIZE = 0x10000  # 64 KB blocks
-    SALT_SIZE = 0x20
-    CHUNKSIZE = 0x80000  # Read/write 512 KB chunks
+    PBKDF2_SALT_SIZE = 16  # 128 bit salt
+    PBKDF2_ITERATIONS = 100000
 
     @staticmethod
-    def get_filehash(filepath):
-        """Get the SHA256 hash of a file.
-
-        Parameters
-        ----------
-        filepath : str
-            Path to the file.
-
-        Returns
-        -------
-        hash : bytes
-            Hash digest of file.
-        """
-
-        from hashlib import sha256
-
-        with open(filepath, 'rb') as fp:
-            hasher = sha256()
-            buf = fp.read(BukuCrypt.BLOCKSIZE)
-            while len(buf) > 0:
-                hasher.update(buf)
-                buf = fp.read(BukuCrypt.BLOCKSIZE)
-
-            return hasher.digest()
-
-    @staticmethod
-    def encrypt_file(iterations, dbfile=None):
+    def encrypt_file(dbfile=None):
         """Encrypt the bookmarks database file.
 
         Parameters
@@ -152,19 +126,16 @@ class BukuCrypt:
             Custom database file path (including filename).
         """
 
+        import base64
+        from getpass import getpass
         try:
-            from cryptography.hazmat.backends import default_backend
-            from cryptography.hazmat.primitives.ciphers import (Cipher, modes, algorithms)
-            from getpass import getpass
-            from hashlib import sha256
+            from cryptography.fernet import Fernet
+            from cryptography.hazmat.primitives import hashes
+            from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
         except ImportError:
             LOGERR('cryptography lib(s) missing')
             sys.exit(1)
 
-        if iterations < 1:
-            LOGERR('Iterations must be >= 1')
-            sys.exit(1)
-
         if not dbfile:
             dbfile = os.path.join(BukuDb.get_default_dbdir(), 'bookmarks.db')
         encfile = dbfile + '.enc'
@@ -191,81 +162,56 @@ class BukuCrypt:
             LOGERR('Passwords do not match')
             sys.exit(1)
 
-        try:
-            # Get SHA256 hash of DB file
-            dbhash = BukuCrypt.get_filehash(dbfile)
-        except Exception as e:
-            LOGERR(e)
-            sys.exit(1)
-
-        # Generate random 256-bit salt and key
-        salt = os.urandom(BukuCrypt.SALT_SIZE)
-        key = ('%s%s' % (password, salt.decode('utf-8', 'replace'))).encode('utf-8')
-        for _ in range(iterations):
-            key = sha256(key).digest()
-
-        iv = os.urandom(16)
-        encryptor = Cipher(
-            algorithms.AES(key),
-            modes.CBC(iv),
-            backend=default_backend()
-        ).encryptor()
-        filesize = os.path.getsize(dbfile)
+        salt = os.urandom(BukuCrypt.PBKDF2_SALT_SIZE)
+        kdf = PBKDF2HMAC(
+            algorithm=hashes.SHA256(),
+            length=32,
+            salt=salt,
+            iterations=BukuCrypt.PBKDF2_ITERATIONS,
+        )
+        key = base64.urlsafe_b64encode(kdf.derive(password.encode('utf-8')))
+        fernet = Fernet(key)
 
         try:
             with open(dbfile, 'rb') as infp, open(encfile, 'wb') as outfp:
-                outfp.write(struct.pack('<Q', filesize))
-                outfp.write(salt)
-                outfp.write(iv)
-
-                # Embed DB file hash in encrypted file
-                outfp.write(dbhash)
-
-                while True:
-                    chunk = infp.read(BukuCrypt.CHUNKSIZE)
-                    if len(chunk) == 0:
-                        break
-                    if len(chunk) % 16 != 0:
-                        chunk = b'%b%b' % (chunk, b' ' * (16 - len(chunk) % 16))
-
-                    outfp.write(encryptor.update(chunk))
-
-                outfp.write(encryptor.finalize())
-
+                # Use $ as delimiter (taking a page from bcrypt).
+                outfp.write(b'$2')  # algorithm version for future proofing
+                outfp.write(b'$%d' % BukuCrypt.PBKDF2_ITERATIONS)
+                outfp.write(b'$%s$' % base64.urlsafe_b64encode(salt))
+                outfp.write(fernet.encrypt(infp.read()))
             os.remove(dbfile)
             print('File encrypted')
             sys.exit(0)
         except Exception as e:
-            os.remove(encfile)
+            try:
+                os.remove(encfile)
+            except OSError:
+                pass
             LOGERR(e)
             sys.exit(1)
 
     @staticmethod
-    def decrypt_file(iterations, dbfile=None):
+    def decrypt_file(dbfile=None):
         """Decrypt the bookmarks database file.
 
         Parameters
         ----------
-        iterations : int
-            Number of iterations for key generation.
         dbfile : str, optional
             Custom database file path (including filename).
             The '.enc' suffix must be omitted.
         """
 
+        import base64
+        from getpass import getpass
         try:
-            from cryptography.hazmat.backends import default_backend
-            from cryptography.hazmat.primitives.ciphers import (Cipher, modes, algorithms)
+            from cryptography.fernet import Fernet, InvalidToken
+            from cryptography.hazmat.primitives import hashes
+            from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
             from getpass import getpass
-            from hashlib import sha256
         except ImportError:
             LOGERR('cryptography lib(s) missing')
             sys.exit(1)
 
-        if iterations < 1:
-            LOGERR('Decryption failed')
-            sys.exit(1)
-
         if not dbfile:
             dbfile = os.path.join(BukuDb.get_default_dbdir(), 'bookmarks.db')
         else:
@@ -294,50 +240,44 @@ class BukuCrypt:
 
         try:
             with open(encfile, 'rb') as infp:
-                size = struct.unpack('<Q', infp.read(struct.calcsize('Q')))[0]
-
-                # Read 256-bit salt and generate key
-                salt = infp.read(32)
-                key = ('%s%s' % (password, salt.decode('utf-8', 'replace'))).encode('utf-8')
-                for _ in range(iterations):
-                    key = sha256(key).digest()
-
-                iv = infp.read(16)
-                decryptor = Cipher(
-                    algorithms.AES(key),
-                    modes.CBC(iv),
-                    backend=default_backend(),
-                ).decryptor()
-
-                # Get original DB file's SHA256 hash from encrypted file
-                enchash = infp.read(32)
+                # Read algorithm version, PBKDF2 iterations and salt.
+                # Read until we have encountered four $'s.
+                buf = b''
+                delim_count = 0
+                while delim_count < 4:
+                    ch = infp.read(1)
+                    if len(ch) == 0:
+                        raise DataError('malformed data')
+                    buf += ch
+                    if ch == b'$':
+                        delim_count += 1
+                _, iterations_bytes, urlsafe_b64_salt = buf.strip(b'$').split(b'$')
+                iterations = int(iterations_bytes)
+                salt = base64.urlsafe_b64decode(urlsafe_b64_salt)
+
+                kdf = PBKDF2HMAC(
+                    algorithm=hashes.SHA256(),
+                    length=32,
+                    salt=salt,
+                    iterations=iterations,
+                )
+                key = base64.urlsafe_b64encode(kdf.derive(password.encode('utf-8')))
+                fernet = Fernet(key)
 
+                decrypted = fernet.decrypt(infp.read())
                 with open(dbfile, 'wb') as outfp:
-                    while True:
-                        chunk = infp.read(BukuCrypt.CHUNKSIZE)
-                        if len(chunk) == 0:
-                            break
-
-                        outfp.write(decryptor.update(chunk) + decryptor.finalize())
-
-                    outfp.truncate(size)
-
-            # Match hash of generated file with that of original DB file
-            dbhash = BukuCrypt.get_filehash(dbfile)
-            if dbhash != enchash:
+                    outfp.write(decrypted)
+            os.remove(encfile)
+            print('File decrypted')
+        except Exception as e:
+            try:
                 os.remove(dbfile)
+            except OSError:
+                pass
+            if isinstance(e, InvalidToken):
                 LOGERR('Decryption failed')
-                sys.exit(1)
             else:
-                os.remove(encfile)
-                print('File decrypted')
-        except struct.error:
-            os.remove(dbfile)
-            LOGERR('Tainted file')
-            sys.exit(1)
-        except Exception as e:
-            os.remove(dbfile)
-            LOGERR(e)
+                LOGERR(e)
             sys.exit(1)
 
 
@@ -5084,9 +5024,9 @@ POSITIONAL ARGUMENTS:
 
     # Handle encrypt/decrypt options at top priority
     if args.lock is not None:
-        BukuCrypt.encrypt_file(args.lock)
+        BukuCrypt.encrypt_file()
     elif args.unlock is not None:
-        BukuCrypt.decrypt_file(args.unlock)
+        BukuCrypt.decrypt_file()
 
     # Set up title
     if args.title is not None:
-- 
2.25.1
```

, https://git.io/JkoIG in jarun/buku
Unfortunately it's only gonna be fewer lines of code if you start anew. With legacy encrypted files out there, if you change the crypto (to be simpler and more robust) at the very least you'll have to provide a migration path with the old decryption code, so it wouldn't be simpler at all... Consider the above an unactionable rant.

This example demonstrates the importance the choosing the right application file format the first time round.

, https://git.io/JkoIZ in jarun/buku
#481.

Also have some qualms with your crypto itself (minus the bugs). It's kind of a hand-rolled authenticated (through writing the SHA-256 hash as part of the data) AES-CBC with a hand-rolled padding scheme (pad spaces and remove them through writing the file length as a struct !! as part of the data). Why not just use a mode with authentication built-in? Like AES-GCM, which doesn't even require padding? And for modes that require padding, there are standard padding schemes like PKCS7, which are implemented in `cryptography.hazmat.primitives`.

Disclaimer: not a crypto expert, but did learn a few best practices.

, https://git.io/JkoIn in jarun/buku
Fix BukuCrypt.encrypt_file
==========================

Two bugs:

- When padding data, `chunk`'s type was changed from bytes to str.
- `encryptor.finalize` was called in a loop, causing `AlreadyFinalized` whenever there are more than one chunk.

To reproduce old bugs:

Use a file larger than `CHUNKSIZE` (512KiB):

```console
$ ln buku buku.py
$ dd if=/dev/zero bs=1024 count=513 of=data
$ python3 -c 'import buku; buku.BukuCrypt.encrypt_file(8, "data")'
Password:
Password:
Context was already finalized.
```

Similarly, to reproduce the padding bug, just create a file that's not a multiple of 16 bytes. I think this buggy code path wasn't ever triggered due to SQLite databases always being full blocks.

, https://git.io/JkoIc in jarun/buku
Looked at the crypto code, found bugs. To save you trouble I'll just open a PR.