Stream Codecs

Object Data Encryption and Compression

When storing data, each object is compressed and encrypted according to global settings. The object data holds the resulting data stream, while the information describing the stream format is stored in S3 metadata (x-amz-meta-* headers).

Our stream codec system is extensible; so far it supports three data formats:

  1. ‘legacy’ was used in early beta versions (not described here)
  2. ‘simple’ was used until the 1.0.2 release
  3. ‘s3bk-v2’ is used starting with the 1.0.2 release

All formats are supported for download. Uploads are processed with the newest codec.

The stream format name is stored in the stream-format metadata field.
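A minimal sketch of reading these fields, assuming the object's response headers are available as a dict of header names to values. The helper names extract_metadata and stream_format are hypothetical. Header names are compared case-insensitively, as HTTP requires.

```python
from typing import Mapping, Optional

META_PREFIX = "x-amz-meta-"


def extract_metadata(headers: Mapping[str, str]) -> dict:
    """Collect S3 user metadata (x-amz-meta-* headers) into a dict keyed by field name."""
    meta = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(META_PREFIX):
            meta[lowered[len(META_PREFIX):]] = value
    return meta


def stream_format(meta: Mapping[str, str]) -> Optional[str]:
    """Return the stream-format field, or None if the object does not declare one."""
    return meta.get("stream-format")
```

Some S3 clients already strip the prefix for you; boto3's head_object, for example, returns user metadata in its Metadata field.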

Current stream codec

This format is identified by the stream-format field being present and set to s3bk-v2. If no other fields are present, the data stream is stored unprocessed. If the compression field is present, the stream is compressed. If the encryption field is present, the stream is encrypted, and all other fields starting with encryption- must also be present.

Field                  Valid values
stream-format          s3bk-v2
compression            bz2
encryption             AES-256
encryption-kdf         bcrypt-10
encryption-salt        hex-encoded
encryption-key-digest  hex-encoded
encryption-iv          hex-encoded
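The rules above can be checked before any decoding starts. This is a sketch, not the reference implementation: check_v2_metadata is a hypothetical name, and it assumes the table lists every accepted value.

```python
from typing import Mapping

# Accepted values, taken from the table above (assumed exhaustive).
ALLOWED = {
    "compression": {"bz2"},
    "encryption": {"AES-256"},
    "encryption-kdf": {"bcrypt-10"},
}
HEX_FIELDS = ("encryption-salt", "encryption-key-digest", "encryption-iv")


def check_v2_metadata(meta: Mapping[str, str]) -> None:
    """Raise ValueError if meta is not a consistent s3bk-v2 description."""
    if meta.get("stream-format") != "s3bk-v2":
        raise ValueError("not an s3bk-v2 stream")
    for field, allowed in ALLOWED.items():
        if field in meta and meta[field] not in allowed:
            raise ValueError(f"unsupported {field}: {meta[field]!r}")
    if "encryption" in meta:
        # An encrypted stream must carry every encryption-* field.
        missing = [f for f in ("encryption-kdf",) + HEX_FIELDS if f not in meta]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        for field in HEX_FIELDS:
            bytes.fromhex(meta[field])  # raises ValueError on malformed hex
```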

Decoding the stream

  • If the stream is encrypted:
    • Find the password:
      • Ask the user, or try the secret key or any password already used in this session.
      • Calculate the key as key = sha256(kdf(password, salt)), where kdf is the function named by encryption-kdf and salt comes from encryption-salt.
      • For the correct password, sha256(key) matches encryption-key-digest.
    • Use the key to decrypt the stream with AES-256 in CFB mode, using encryption-iv as the initialization vector.
  • Decompress the result if the compression field is present.
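The decoding steps above might look like this in Python. This is a sketch under stated assumptions: the KDF and the AES-256-CFB decryption are passed in as callables, since neither bcrypt nor AES is in the standard library and this page does not pin down exactly how password and salt are fed to bcrypt. The code also assumes the password is UTF-8 encoded and the salt hex-decoded before reaching the KDF. derive_key, find_key and decode_v2 are hypothetical names.

```python
import bz2
import hashlib
import hmac
from typing import Callable, Iterable, Mapping, Optional

# kdf(password_bytes, salt_bytes) -> bytes; for 'bcrypt-10' this would wrap a
# bcrypt call with cost factor 10 (left pluggable: exact invocation assumed).
Kdf = Callable[[bytes, bytes], bytes]


def derive_key(password: str, meta: Mapping[str, str], kdf: Kdf) -> Optional[bytes]:
    """Return sha256(kdf(password, salt)) if it matches encryption-key-digest, else None."""
    salt = bytes.fromhex(meta["encryption-salt"])  # assumption: salt is hex-decoded
    # Assumption: the password is UTF-8 encoded before being passed to the KDF.
    key = hashlib.sha256(kdf(password.encode("utf-8"), salt)).digest()
    expected = bytes.fromhex(meta["encryption-key-digest"])
    if hmac.compare_digest(hashlib.sha256(key).digest(), expected):
        return key
    return None


def find_key(candidates: Iterable[str], meta: Mapping[str, str], kdf: Kdf) -> Optional[bytes]:
    """Try each candidate (secret key, session passwords, user input) in order."""
    for password in candidates:
        key = derive_key(password, meta, kdf)
        if key is not None:
            return key
    return None


def decode_v2(data: bytes, meta: Mapping[str, str], key: Optional[bytes],
              decrypt: Callable[[bytes, bytes, bytes], bytes]) -> bytes:
    """Decrypt (if encrypted), then decompress (if compressed)."""
    if "encryption" in meta:
        if key is None:
            raise ValueError("encrypted stream needs a verified key")
        # decrypt(key, iv, ciphertext) is expected to run AES-256 in CFB mode.
        data = decrypt(key, bytes.fromhex(meta["encryption-iv"]), data)
    if meta.get("compression") == "bz2":
        data = bz2.decompress(data)
    return data
```

Keeping the cipher behind a callable lets the same pipeline run with whichever AES library is installed.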

Previous stream codec

Relevant metadata fields

Field                       Expected values              Description
stream-format               simple                       Marks this stream format.
compression-algorithm       bz2 or zlib                  Compression algorithm. If absent, the stream is not compressed.
compression-original-size   Integer                      Length of the stream after decompression.
encryption-cipher           AES or Blowfish              Encryption cipher. If absent, the stream is not encrypted.
encryption-salt             ASCII string of random data  Random value that makes the encryption key harder to find by brute force or with precomputed tables.
encryption-key-digest       Hex-encoded string           Used to check the validity of the decryption key (see below).
encryption-original-length  Integer                      Length of the stream before encryption (but after compression, if applied).

Decoding the stream

To restore the stream data, first decrypt it (if encrypted), then decompress it (if compressed). Block ciphers pad the stream with extra data that must be discarded: truncate the decrypted stream to encryption-original-length bytes before decompressing it or saving it to disk.
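A sketch of the decrypt-then-trim step, assuming decrypt is a caller-supplied function that already holds a validated key and returns the raw decrypted bytes, padding included. decrypt_and_trim is a hypothetical name.

```python
from typing import Callable, Mapping


def decrypt_and_trim(data: bytes, meta: Mapping[str, str],
                     decrypt: Callable[[bytes], bytes]) -> bytes:
    """Undo encryption for a 'simple' stream and drop block-cipher padding."""
    if "encryption-cipher" not in meta:
        return data  # not encrypted: pass through unchanged
    plain = decrypt(data)
    length = int(meta["encryption-original-length"])
    if len(plain) < length:
        raise ValueError("decrypted stream is shorter than encryption-original-length")
    # Everything past the original length is padding added by the block cipher.
    return plain[:length]
```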

To decompress the data, use the standard zlib and bzip2 libraries, for example the zlib and bz2 modules of the Python standard library.
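Selecting the decompressor from compression-algorithm with the Python standard library could look like this. It assumes zlib streams carry the usual zlib header (so zlib.decompress applies rather than raw deflate), and it additionally checks the result against compression-original-size. decompress_simple is a hypothetical name.

```python
import bz2
import zlib
from typing import Mapping

DECOMPRESSORS = {"bz2": bz2.decompress, "zlib": zlib.decompress}


def decompress_simple(data: bytes, meta: Mapping[str, str]) -> bytes:
    """Decompress a 'simple' stream according to its metadata."""
    algorithm = meta.get("compression-algorithm")
    if algorithm is None:
        return data  # stream was not compressed
    try:
        decompress = DECOMPRESSORS[algorithm]
    except KeyError:
        raise ValueError(f"unknown compression-algorithm: {algorithm!r}") from None
    out = decompress(data)
    expected = meta.get("compression-original-size")
    if expected is not None and len(out) != int(expected):
        raise ValueError("decompressed size does not match compression-original-size")
    return out
```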

To decrypt the data, use a standard implementation as well (we use PyCrypto). However, you need the correct decryption key to get the original data back. The following algorithm derives and validates the decryption key from the metadata:

  1. The user enters a decryption key, which can be any Unicode string.
  2. The application calculates the “salted key” by concatenating the UTF-8 encoding of the user's key (an ASCII-only key can be used verbatim) with the salt value from encryption-salt.
  3. The application checks the validity of the salted key by computing its SHA-1 hash and comparing the hex digest to encryption-key-digest.
    1. If the cipher is Blowfish, use the salted key itself to decrypt the stream.
    2. If the cipher is AES, compute the SHA-256 hash of the salted key and use its binary digest (exactly 32 bytes) as the decryption key.
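The key derivation steps above map directly onto Python's hashlib. This is a sketch with simple_decryption_key as a hypothetical name. It assumes the salt is encoded as ASCII bytes and that the stored digest may use either letter case.

```python
import hashlib
from typing import Mapping, Optional


def simple_decryption_key(user_key: str, meta: Mapping[str, str]) -> Optional[bytes]:
    """Return the cipher key for a 'simple' stream, or None if user_key is wrong."""
    # Step 2: salted key = UTF-8(user key) + salt (the salt is an ASCII string).
    salted = user_key.encode("utf-8") + meta["encryption-salt"].encode("ascii")
    # Step 3: validate against the SHA-1 hex digest stored in metadata.
    if hashlib.sha1(salted).hexdigest() != meta["encryption-key-digest"].lower():
        return None
    cipher = meta["encryption-cipher"]
    if cipher == "Blowfish":
        return salted  # Blowfish uses the salted key directly
    if cipher == "AES":
        return hashlib.sha256(salted).digest()  # 32-byte AES key
    raise ValueError(f"unknown encryption-cipher: {cipher!r}")
```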