Stream Codecs
Object Data Encryption and Compression
When data is stored, each object is encrypted and compressed according to global settings. The object data contains the resulting data stream, while information describing the format of the stream is stored in S3 metadata (x-amz-meta- headers). The stream codec system is extensible, and so far there are three data formats:
- 'legacy' was used in early beta versions (not described here)
- 'simple' was used until the 1.0.2 release
- 's3bk-v2' is used starting with the 1.0.2 release
All formats are supported for download. Uploads are processed with the newest codec.
The stream format name is stored in the stream-format metadata field.
Current stream codec
This format is marked by the stream-format field being present and set to s3bk-v2. If no other fields are present, the data stream is unprocessed. If the compression field is present, the stream is compressed. If the encryption field is present, the stream is encrypted and all the other fields starting with encryption- must be present.
Field | Valid values |
stream-format | s3bk-v2 |
compression | bz2 |
encryption | AES-256 |
encryption-kdf | bcrypt-10 |
encryption-salt | HEX-encoded |
encryption-key-digest | HEX-encoded |
encryption-iv | HEX-encoded |
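The field rules above can be checked mechanically before any decoding is attempted. The following is a minimal sketch, not the project's actual code: the function name `describe_stream` is hypothetical, and the metadata is assumed to be a plain dict whose keys already have the x-amz-meta- prefix removed.

```python
# Hypothetical helper: `metadata` is assumed to be a dict of field names
# with the "x-amz-meta-" prefix already stripped.
REQUIRED_ENCRYPTION_FIELDS = (
    "encryption-kdf",
    "encryption-salt",
    "encryption-key-digest",
    "encryption-iv",
)


def describe_stream(metadata):
    """Return (is_compressed, is_encrypted) for an s3bk-v2 stream."""
    if metadata.get("stream-format") != "s3bk-v2":
        raise ValueError("not an s3bk-v2 stream")
    is_compressed = "compression" in metadata
    is_encrypted = "encryption" in metadata
    if is_encrypted:
        # An encrypted stream must carry every other encryption-* field.
        missing = [f for f in REQUIRED_ENCRYPTION_FIELDS if f not in metadata]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
    return is_compressed, is_encrypted
```

A stream with only the stream-format field yields `(False, False)`, meaning the object data can be used as is.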
Decoding the stream
- If encrypted
  - Find a password
    - Ask the user, or use the secret key or any of the passwords used this session
  - Calculate the key as
    key = sha256(kdf(password, salt))
  - For the correct password the following will be true:
    sha256(key) == key-digest
  - Use the key to decrypt the stream with AES-256 in CFB mode with iv as the initialization vector
- Decompress if necessary
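The key search in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: the bcrypt-based KDF is passed in as a callable `kdf(password, salt)` because its exact invocation is not specified here, passwords are assumed to be UTF-8 encoded, the raw SHA-256 digest is assumed to be the key (it is 32 bytes, which fits AES-256), and the key digest is assumed to be compared as a hex string. The names `derive_key` and `find_key` are hypothetical.

```python
import hashlib
import hmac


def derive_key(password, salt, kdf):
    """key = sha256(kdf(password, salt)); `kdf` is supplied by the caller."""
    return hashlib.sha256(kdf(password, salt)).digest()


def find_key(candidates, metadata, kdf):
    """Return the key for the first matching password, or None."""
    salt = bytes.fromhex(metadata["encryption-salt"])
    expected = metadata["encryption-key-digest"].lower()
    for password in candidates:
        # Assumption: passwords are encoded as UTF-8 before the KDF.
        key = derive_key(password.encode("utf-8"), salt, kdf)
        # Assumption: sha256(key) is stored as a hex digest.
        if hmac.compare_digest(hashlib.sha256(key).hexdigest(), expected):
            return key
    return None
```

The returned key would then be handed to an AES-256 CFB implementation together with the decoded encryption-iv value, and the result decompressed with bz2 if the compression field is present.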
Previous stream codec
Relevant metadata fields
Field | Expected values | Description |
stream-format | simple | Marks this stream format |
compression-algorithm | bz2 or zlib | Compression algorithm. If not present, the stream is not compressed. |
compression-original-size | Integer value | Length of the stream after decompression. |
encryption-cipher | AES or Blowfish | Encryption cipher. If not present, the stream is not encrypted. |
encryption-salt | ASCII string of random data | This is a random value used to make encryption key harder to find by brute force or precomputed maps. |
encryption-key-digest | Hex encoded string | Used to check validity of decryption key. See below for more info. |
encryption-original-length | Integer value | Length of the stream before encryption (but after compression if it was applied). |
Decoding the stream
To restore the stream data, decrypt it first (if necessary) and decompress it second (if necessary). For block ciphers the stream is padded with extra data that has to be discarded, i.e. trim the decrypted stream to encryption-original-length before decompressing it or saving it to disk. To decompress the data one can use the standard zlib and bzip2 libraries, for example the zlib and bz2 modules of the Python standard library.
To decrypt the data, use a standard implementation as well (we use PyCrypto). However, you will need the correct decryption key to get the original data back. Here’s a description of the algorithm to obtain and validate the decryption key based on metadata:
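The trimming and decompression order described above could look like the sketch below. It assumes the stream has already been decrypted by some cipher implementation and that the metadata is available as a dict of strings; the function name `finish_simple_stream` and the size check against compression-original-size are illustrative additions, not part of the format description.

```python
import bz2
import zlib

# Maps compression-algorithm values to standard library decompressors.
DECOMPRESSORS = {"bz2": bz2.decompress, "zlib": zlib.decompress}


def finish_simple_stream(decrypted, metadata):
    """Trim cipher padding, then decompress if the metadata asks for it."""
    data = decrypted
    if "encryption-cipher" in metadata:
        # Discard block cipher padding before any further processing.
        data = data[: int(metadata["encryption-original-length"])]
    algorithm = metadata.get("compression-algorithm")
    if algorithm is not None:
        data = DECOMPRESSORS[algorithm](data)
        expected = metadata.get("compression-original-size")
        # Optional sanity check (an assumption, not required by the format).
        if expected is not None and len(data) != int(expected):
            raise ValueError("unexpected decompressed size")
    return data
```

Trimming has to happen before decompression because the padding bytes are not part of the compressed stream.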
- User enters a decryption key, which can be any Unicode string.
- Application calculates the “salted key” by concatenating the UTF-8 encoding of the key entered by the user (the key can be used verbatim if it’s strictly ASCII) with the salt value found in metadata.
- Application checks the validity of salted key by computing a SHA-1 hash of the salted key and comparing its hex digest to a digest found in metadata.
- If the cipher used is Blowfish, the salted key can be used to decrypt the stream.
- If the cipher is AES, take a SHA-256 hash of the salted key, take the binary digest of that and use it for decryption. (The resulting key will be exactly 32 bytes long).
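The key validation steps above can be sketched with the Python standard library alone. This is a hedged illustration: the function name `simple_decryption_key` is hypothetical, the concatenation order (user key first, then salt) is an assumption based on the wording of the steps, and the actual decryption with the returned key is left to a cipher library.

```python
import hashlib
import hmac


def simple_decryption_key(user_key, metadata):
    """Return the cipher key for a 'simple' stream, or None if the key is wrong."""
    # Assumption: the salted key is the UTF-8 user key followed by the salt.
    salted = user_key.encode("utf-8") + metadata["encryption-salt"].encode("ascii")
    digest = hashlib.sha1(salted).hexdigest()
    if not hmac.compare_digest(digest, metadata["encryption-key-digest"].lower()):
        return None
    cipher = metadata["encryption-cipher"]
    if cipher == "Blowfish":
        return salted  # used directly as the Blowfish key
    if cipher == "AES":
        return hashlib.sha256(salted).digest()  # exactly 32 bytes
    raise ValueError("unsupported cipher: " + cipher)
```

Returning `None` for a wrong key lets the caller prompt again without attempting a decryption that would only produce garbage.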