Security in GitHub

Protecting information in GitHub

GitHub should primarily be used to store code.

Generally, you should not store any data in GitHub, especially sensitive data. Instead, you should use warehouse data sources.

You can use the following approaches to reduce the risk of accidentally publishing sensitive data to GitHub. See also repository visibility and managing access in GitHub.

What	How it’s configured	Reasoning	How to override
Publishing data files (.csv, .xlsx, etc.)	gitignore file	You should not store data in GitHub.	Manually add the file using `git add -f <filename>`
Publishing file archives (.zip, .tar, .7z, .gz, .bz, .rar, etc.)	gitignore file	It’s better to unzip file archives and commit their raw contents so files can be tracked individually. This helps prevent data from being accidentally published within a file archive.	Manually add the file using `git add -f <filename>`
Publishing large files (>5MB)	Pre-commit hook	Large files are likely to be data.	Skip the pre-commit hooks using `git commit --no-verify`
Publishing Jupyter notebooks	nbstripout as a pre-commit hook	Jupyter notebook outputs often contain data.	Disable nbstripout using `ENABLE_NBSTRIPOUT=false; git commit`
Pushing to repositories outside the MoJ Analytical Services GitHub organisation	Pre-push hook	You should only store code in the MoJ Analytical Services GitHub organisation.	Force push using `git push -f <remote> <branch>`
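
To illustrate the large-file check in the table above, here is a minimal sketch of how a pre-commit hook might flag staged files over 5MB. It is not the hook actually installed on the platform. The names `MAX_BYTES`, `oversized`, `staged_files` and `main` are hypothetical, and treating 5MB as 5 × 1024 × 1024 bytes is an assumption.

```python
import subprocess
from pathlib import Path

# Assumed threshold: the table says ">5MB"; the exact byte count is a guess.
MAX_BYTES = 5 * 1024 * 1024


def oversized(paths, limit=MAX_BYTES):
    """Return the paths that are existing files larger than `limit` bytes."""
    return [p for p in paths if Path(p).is_file() and Path(p).stat().st_size > limit]


def staged_files():
    """List files staged for commit (added, copied or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main():
    """Print each oversized staged file; return 1 to block the commit, else 0."""
    too_big = oversized(staged_files())
    for path in too_big:
        print(f"{path} is larger than 5MB and may contain data")
    return 1 if too_big else 0
```

A hook script built on this sketch would end with `raise SystemExit(main())`, because Git aborts the commit when a pre-commit hook exits with a non-zero status.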

You should also not store secrets in GitHub, including passwords, credentials and keys. You can use parameters to securely store secrets on the Analytical Platform.
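
As a rough illustration of keeping secrets out of committed code, the sketch below reads a secret at runtime instead of hard-coding it. It assumes the secret reaches your code as an environment variable, which may not match how parameters are surfaced on the Analytical Platform. The helper `get_secret` and the name `DB_PASSWORD` are hypothetical.

```python
import os


def get_secret(name, env=os.environ):
    """Return the secret called `name`, failing loudly if it is missing or empty."""
    value = env.get(name)
    if not value:
        # Fail rather than fall back to a default that would live in the repository.
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

Calling `get_secret("DB_PASSWORD")` then keeps the value itself out of the repository, so only the name of the secret is ever committed.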

Accidentally publishing data to GitHub

If you accidentally publish sensitive data to GitHub, you should:

follow the GitHub guidance on removing sensitive data from a repository
report a security incident and follow any instructions given by the security team

If you need further support, contact the Data Engineering team in the #ask-data-engineering Slack channel.

This page was last reviewed on 30 January 2023. It needs to be reviewed again on 30 January 2024 by the page owner #analytical-platform-support.