Quick Reference
⚠️ This service is in beta ⚠️
This page is intended to give users who have read through the detailed create-a-derived-table instructions a quick reference to refresh their memory. Please post suggestions to improve this document in our slack channel #ask-data-modelling, or edit and raise a PR.
Contents
Glossary
Glossary of key words in the context of create-a-derived-table
/ dbt
.
_dim
: dimension model suffix naming convention._fct
: fact model suffix naming convention._stg
: staging model suffix naming convention..yaml
: preferred YAML file extension (rather than.yml
).create-a-derived-table
: MoJ tool to create derived tables and serve them in AWS Athena.dbt
: data build tool open source software whichcreate-a-derived-table
is built on.dbt-athena
: an open source community builtdbt
adapter to allowdbt
to work with AWS Athena. MoJ currently uses its own fork.model
: more or less synonymous withtable
.run_artefacts
: files created when models are compiled or run.seed
: lookup table.source
: any table on the MoJ Analytical Platform not created bycreate-a-derived-table
which may be used as a starting point to derived models.table
: tabular data in the usual sense; also the default materialisation type for amodel
.
Set up
This list comprises everything you need to do and consider to get set up and ready to start building models collaboratively. It is intended as a quick check or reference list and assumes you have read the detailed create-a-derived-table instructions. See Troubleshooting if you have any problems.
Read the detailed create-a-derived-table instructions.
Check your use case is appropriate; you may contact the Data Modelling team for advice at #ask-data-modelling.
Decide an appropriate domain within
create-a-derived-table
for your project.Decide on naming conventions for your
models
in the formdatabase_name__table_name
, note separation using__
(“dunder”). Database name must be unique within MoJ.Set up an MoJ Analytical Platform account.
Add yourself to standard_database_access and raise a PR to gain access to the
create_a_derived_table/basic
resource, which includes access toseeds
andrun_artefacts
.Create a project access file for your project in data-engineering-database-access/project_access.
In the project access file under
Resources
include thecreate-a-derived-table
domains required to write models to, as well as the source databases you will be buildung models from.If an MoJ Analytical Platform database is not listed as a source under mojap_derived_tables/models/sources then it will need to be added, see CONTRIBUTING.
Set up RStudio IDE; set up a project and clone the repo into it. See Set up the RStudio working environment for GUI instructions. Using Terminal navigate to where you want the
create-a-derived-table
project to sit and rungit clone git@github.com:moj-analytical-services/create-a-derived-table.git
.Navigate to the root of your
create-a-derived-table
directory in Terminal and set up a Python virtual environment; activate it, upgrade pip, and install requirements. See Setting up a Python virtual environment.- Set the environment variable
DBT_PROFILES_DIR
in your Bash profile and source it. - Navigate to the
mojap_derived_tables
directory in Terminal to rundbt
commands. Check you have an active connection,dbt debug
- Install
dbt
packages withdbt deps
.
- Set the environment variable
Use Github Workflow method to collaborate on a project. Branch off
main
and create a main branch for your project,project-name-main
; all subsequent developers should branch offproject-name-main
to create feature branches for this project. When raising a PR ensure you merge into this branch, before merging intomain
; the PR summary should read something like “github-user
wants to merge X commits intoproject-name-main
fromproject-name-feature-branch
”. See also Collaborating with Git.
You are now ready to start building models collaboratively with create-a-derived-tbale
. If you have any problems please check Troubleshooting, or ask at #ask-data-modelling providing context and links if appropriate.
Tips
- You can test out how your SQL model files look once rendered by running
dbt compile --select <path_to_file(s)>
. This saves on running and deploying your tables if you want to test your sql. The compiled model files will be saved in themojap_derived_tables/target/compiled
folder. - Make sure you deploy your seeds with
dbt seed --select <path_to_seeds>
if your models depend on them. - If you define any variables to inject into your model sql files using
{{ var(...) }}
, they need to be in thedbt_project.yml
file.