Tools
The MoJ Analytical Platform comes with various tools including:
Control panel
The main entry point to the Analytical Platform.
RStudio
A development environment for writing R code and R Shiny apps.
JupyterLab
A development environment for writing Python code.
Data Discovery
The data engineering team maintain a number of databases on the Analytical Platform (curated databases). The best way to find out about these is to use the Data Discovery tool.
Upload File Data
A web application to upload data (.csv, .json, .jsonl) to the MoJ Analytical Platform in a standardised way.
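As a rough illustration of the three accepted file formats, the sketch below serialises the same records as .csv, .json and .jsonl using only the Python standard library. The records and the helper names (`to_csv`, `to_json`, `to_jsonl`) are made up for this example and are not part of the upload tool, which may impose further requirements on files.

```python
import csv
import io
import json

# Hypothetical example records; any list of flat dictionaries would do.
records = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]


def to_csv(rows):
    """Serialise rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise rows as a single JSON array."""
    return json.dumps(rows)


def to_jsonl(rows):
    """Serialise rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)
```

The main difference between .json and .jsonl is that the latter holds one complete JSON object per line rather than a single enclosing array.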
Upload Microservices Data
Tools for uploading and refreshing data from microservices to the MoJ Analytical Platform in a standardised way:
- data-engineering-data-extractor extracts data from applications/services/microservices
- register-my-data moves the data into the AP curated databases
Airflow
A tool for scheduling and monitoring workflows.
Create a Derived Table
A tool for creating persistent derived tables in Athena.
Python packages
The data engineering team maintain a number of Python packages to help with data manipulation and with interfacing with data through our preferred services. We consider the following Python packages the most useful:
pydbtools
Standard package for querying MoJAP Athena databases, with useful features including temporary table creation.
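A minimal sketch of querying Athena with pydbtools might look like the following. It assumes the package's `read_sql_query` function, which returns a pandas DataFrame; the helper `preview_table`, its `run_query` parameter, and the database and table names in the usage note are hypothetical and exist only for illustration.

```python
def preview_table(database, table, limit=5, run_query=None):
    """Return the first few rows of an Athena table.

    `run_query` is a hypothetical hook so the function can be exercised
    without AWS access; by default it uses pydbtools.read_sql_query.
    """
    if run_query is None:
        # Imported lazily: pydbtools needs AWS credentials to run queries.
        import pydbtools as pydb

        run_query = pydb.read_sql_query

    # Names are interpolated directly, so only pass trusted identifiers.
    sql = f"SELECT * FROM {database}.{table} LIMIT {int(limit)}"
    return run_query(sql)
```

On the Analytical Platform this could be called as `preview_table("my_database", "my_table")`, where both names are placeholders for a database and table you have access to.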
mojap-arrow-pd-parser
Useful package for ensuring type conformance when reading with arrow or pandas.
mojap-metadata
MoJAP-defined metadata that interacts with other packages (including mojap-arrow-pd-parser) to ensure type conformance, along with a number of schema converters.
dataengineeringutils3
A collection of useful utilities for interacting with AWS.
athena_tools
A user-friendly way of making small, persistent ad hoc databases. It is in alpha release, so please report any problems!
mojap-aws-tools-demo
A repo containing some helpful guides on how to use some of the above packages. You can also ask for help with these on #ask-data-engineering.
R packages
The data engineering team maintain the following R package:
dbtools
A package for accessing Athena databases from the Analytical Platform.
The Analytical Platform community maintain the following R packages, which avoid the need to use Python in R projects:
Rdbtools
A native R package for accessing Athena databases from the Analytical Platform.
Rs3tools
A native R package for accessing AWS S3 from the Analytical Platform, which is mostly compatible with the legacy package s3tools.