
Securing data at shipit

Created by Artjom Vassiljev


Photo by Matthew Henry

The insurance industry plays on our fears: pay a small fee now for an unlikely future event, and if it happens you are covered. If it is a calculated risk, that money is worth spending. If you care about your partner's and your children's future well-being, you might insure your life so that in the worst-case scenario they get financial support for some time. A similar idea applies when building software. We assess various risks and implement the appropriate counter-measures. This takes more time up front, but if an incident does happen, the development effort pays for itself.

The security problem of storing sensitive data

When setting up third-party integrations with shipit, users provide API keys to their accounts. These keys allow us to read and write data in those systems, and sometimes to delete it. Now imagine that for some reason our whole database leaked to the public: a clever attack that compromised our or a partner's systems, a disgruntled engineer, or simple negligence. Anyone would be able to use these API keys to access customers' services such as Google Drive files, Jira tasks, Confluence pages, or sales data at PipeDrive. Not only does this affect our system, it also affects our users' external services, which amplifies the consequences.

The technical solution to this particular problem is to encrypt sensitive data inside the database. We host everything on Google Cloud and use KMS for managing encryption keys. This means that if someone were to dump the database, all of the sensitive data would either be hashed (passwords), or encrypted with a key stored and managed by another server.

Securely storing sensitive data

Below is a description of how we do that in code. We use Django and Python 3.7 for everything on the backend. Here we will create a new model field that encrypts data before storing it in the database, and decrypts it when reading it back.

First create a service client and helper functions to encrypt/decrypt the data:

import base64
from google.cloud import kms_v1


client = None
resource_name = None


def _get_client(project_id=None, location=None, keyring=None, crypto_key=None):
    """Create a Google KMS client that will be used for encrypting/decrypting the data.
    https://cloud.google.com/kms/docs/reference/rest/
    """
    global resource_name
    global client

    if client and resource_name:
        return client, resource_name
    elif not any([project_id, location, keyring, crypto_key]):
        # for development environment we do not provide any encryption
        # so just return None
        return None, None

    client = kms_v1.KeyManagementServiceClient()
    resource_name = client.crypto_key_path_path(project_id, location, keyring, crypto_key)
    return client, resource_name


def encrypt(clear_text, **kwargs):
    client, resource_name = _get_client(**kwargs)
    if not client and not resource_name:
        # we're in development mode, so simply return unencrypted data
        return clear_text
    if isinstance(clear_text, str):
        clear_text = clear_text.encode('utf-8')

    response = client.encrypt(resource_name, clear_text)
    # base64 encoding is just for convenience in order to store the data
    # in a text field rather than binary
    return base64.b64encode(response.ciphertext).decode('utf-8')


def decrypt(ciphertext, **kwargs):
    client, resource_name = _get_client(**kwargs)
    if not client and not resource_name:
        # we're in development mode, so simply return unencrypted data
        return ciphertext
    decoded = base64.b64decode(ciphertext)
    response = client.decrypt(resource_name, decoded)
    return response.plaintext.decode('utf-8')
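
Used on their own, the helpers round-trip a value like this (a quick illustration; the settings names are the same ones the model field below relies on, and the key value is made up):

from django.conf import settings

kms_kwargs = dict(
    project_id=settings.GOOGLE_PROJECT_ID, location=settings.GOOGLE_LOCATION,
    keyring=settings.GOOGLE_KEYRING, crypto_key=settings.GOOGLE_CRYPTOKEY)

token = encrypt('my-api-key', **kms_kwargs)  # base64-encoded ciphertext
original = decrypt(token, **kms_kwargs)      # 'my-api-key' again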

Using the above we can then create a Django model field that automatically encrypts the data when saving it and decrypts it when loading:

from django.db import models
from django.conf import settings


class EncryptedCharField(models.CharField):
    def __init__(self, *args, **kwargs):
        kwargs['max_length'] = 1024
        super().__init__(*args, **kwargs)

    def from_db_value(self, value, expression, connection):
        # this function is called by Django when reading the data from the database
        if value is None:
            return value
        # `decrypt` is the helper function implemented above. You should place it
        # in a separate module and import it here
        return decrypt(
            value, project_id=settings.GOOGLE_PROJECT_ID,
            location=settings.GOOGLE_LOCATION, keyring=settings.GOOGLE_KEYRING,
            crypto_key=settings.GOOGLE_CRYPTOKEY)

    def get_db_prep_value(self, value, connection, prepared=False):
        # this function is called before saving the data to the database
        if not value:
            return value
        encrypted = encrypt(
            value, project_id=settings.GOOGLE_PROJECT_ID,
            location=settings.GOOGLE_LOCATION, keyring=settings.GOOGLE_KEYRING,
            crypto_key=settings.GOOGLE_CRYPTOKEY)
        return encrypted

Now you can use the EncryptedCharField in your model, just like any other field.
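
For example, a model holding a third-party API key could look like this (the model and field names are illustrative, and EncryptedCharField is imported from wherever you placed it):

from django.db import models


class Integration(models.Model):
    # the key is encrypted before it hits the database and decrypted
    # transparently when the row is loaded
    name = models.CharField(max_length=255)
    api_key = EncryptedCharField()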

Avoiding sensitive data in code

There has been plenty of news about developers storing API and session keys in their code and then uploading it to GitHub, where anyone can see them (in the case of a public repository). But even if you have a private repository that only your company's employees can access, it is bad practice to store anything sensitive in code.

Amazon AWS has Secrets Manager and Parameter Store, which simplify this considerably. Unfortunately, Google doesn't offer anything nearly as developer-friendly as Amazon, so we have created a quick solution of our own.

We use a simple text file that is encrypted with KMS and then stored on Google Cloud Storage. Whenever Django needs to load its settings, it fetches the file, decrypts it, and populates the necessary fields. Here's the code:

from google.cloud import storage

SECRETS_BUCKET = 'secrets-bucket'
SECRETS_FILE = 'secrets'


def _fetch_secrets():
    client = storage.Client()
    bucket = client.get_bucket(SECRETS_BUCKET)
    return bucket.blob(SECRETS_FILE).download_as_string()

Now that we have the encrypted contents, we can decrypt them and turn them into a settings dictionary:

def _to_settings(blob):
    items = blob.strip().split('\n')
    settings = {}
    for item in items:
        # split only on the first '=' so that values may contain '=' themselves
        k, v = item.split('=', 1)
        if v in ['True', 'False']:
            v = v == 'True'
        settings[k] = v
    return settings

# the file is assumed to be stored as the base64-encoded ciphertext produced
# by the `encrypt` helper above, so we reuse `decrypt` here; the GOOGLE_*
# values are assumed to be plain constants defined earlier in this module
settings_blob = decrypt(
    _fetch_secrets().decode('utf-8'),
    project_id=GOOGLE_PROJECT_ID, location=GOOGLE_LOCATION,
    keyring=GOOGLE_KEYRING, crypto_key=GOOGLE_CRYPTOKEY)
settings_processed = _to_settings(settings_blob)
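
For illustration, the decrypted file is just one KEY=value pair per line, something like this (the values here are made up):

SECRET_KEY=some-long-random-string
DATABASE_PASSWORD=not-a-real-password
SENTRY_DSN=https://example@sentry.example.com/1
DEBUG=False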

Finally, we populate Django settings with that data:

import sys

# attach each key as an attribute of this settings module
for k, v in settings_processed.items():
    setattr(sys.modules[__name__], k, v)

To make this work, we split our settings files by environment, so all of the above code lives in settings/production.py, which is loaded only on live servers.
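
The encrypted file itself is produced out of band. A minimal sketch of that upload step, reusing the `encrypt` helper and the bucket constants from above (upload_secrets is an illustrative name, not part of the code shown earlier), could look like this:

from google.cloud import storage


def upload_secrets(plaintext, **kms_kwargs):
    # encrypt the plaintext secrets with KMS and store the base64-encoded
    # ciphertext in the secrets bucket
    ciphertext = encrypt(plaintext, **kms_kwargs)
    client = storage.Client()
    bucket = client.get_bucket(SECRETS_BUCKET)
    bucket.blob(SECRETS_FILE).upload_from_string(ciphertext)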

Third-party permissions

When configuring a server or a workstation, or setting up a user account on a computer, it is considered best practice to apply the principle of least privilege: grant the user or the application only the minimum rights needed to do their job, and disable everything else. It is dangerous to give a user admin rights on a machine if all they need to do is edit Word documents; in the event of a cyber attack, it becomes much easier for malware to gain a foothold on that machine. We try to follow the same principle when asking for permissions to your third-party services. A good example is Google.

If you just want to quickly log in to the website via Google, we do not need anything other than your full name and email address (even though we could ask for more and set up a Google Drive integration automatically). When connecting your Drive, we only ask for permission to create files and modify the ones created by shipit. From the user's perspective, it would take fewer clicks to be able to select any Drive folder and store PRDs inside it. However, that would mean elevated permissions, with shipit able to edit and delete your entire Drive contents. In the worst-case scenario, if the access key leaked, an attacker would be able to wipe our users' entire drives. Instead, we create a folder at the root directory and have access to just its contents, limiting the potential damage should this ever happen.
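
In OAuth terms, this comes down to requesting only narrow scopes, for example drive.file instead of full Drive access. As a sketch of what such a scope list looks like with google-auth-oauthlib (the client secrets filename is illustrative, and this is not our exact login code):

from google_auth_oauthlib.flow import Flow

# drive.file only covers files created or opened by the app,
# not the user's entire Drive
SCOPES = [
    'openid',
    'https://www.googleapis.com/auth/userinfo.email',
    'https://www.googleapis.com/auth/userinfo.profile',
    'https://www.googleapis.com/auth/drive.file',
]

flow = Flow.from_client_secrets_file('client_secret.json', scopes=SCOPES)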