The MoJ Analytical Platform comes with various tools, including:
The main entry point to the Analytical Platform.
A development environment for writing R code and R Shiny apps.
A development environment for writing Python code.
The data engineering team maintain a number of databases on the Analytical Platform (curated databases). The best way to find out about them is to use the data discovery tool.
Upload File Data
A web application to upload data (.csv, .json, .jsonl) to the MoJ Analytical Platform in a standardised way.
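To make the three accepted formats concrete, here is a minimal sketch of how their contents map onto the same list of records. It is an illustration only, not the uploader's own code: the function name `parse_records` and the constant `SUPPORTED_SUFFIXES` are hypothetical, and it uses only the Python standard library.

```python
import csv
import io
import json
from pathlib import Path

# The three formats named in the text; everything else here is hypothetical.
SUPPORTED_SUFFIXES = {".csv", ".json", ".jsonl"}


def parse_records(filename: str, text: str) -> list[dict]:
    """Parse file contents into a list of row dictionaries."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    if suffix == ".csv":
        # CSV values are read as strings, keyed by the header row.
        return list(csv.DictReader(io.StringIO(text)))
    if suffix == ".json":
        # Accept either a single object or an array of objects.
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    # .jsonl: one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Note that CSV carries no type information, so every value arrives as a string, whereas JSON and JSON Lines preserve numbers and booleans.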
Upload Microservices Data
Tools for uploading and refreshing data from microservices to the MoJ Analytical Platform in a standardised way:
- data-engineering-data-extractor extracts data from applications/services/microservices
- register-my-data moves the data into the AP curated databases
A tool for scheduling and monitoring workflows.
Create a Derived Table
A tool for creating persistent derived tables in Athena.
The data engineering team maintain a number of Python packages to help with data manipulation, as well as interfacing with data using our preferred services. The following Python packages are those we consider the most useful:
Standard package for querying MoJAP Athena databases, with useful features including temp table creation.
Useful package for ensuring type conformance when reading with arrow or pandas.
MoJAP-defined metadata that interacts with other packages (including arrow-pd-parser) to ensure type conformance, along with a number of schema converters.
A collection of useful utilities for interacting with AWS.
User-friendly way of making small, persistent ad hoc databases. It is in its alpha release, so please report all problems!
A repo containing some helpful guides on how to use some of the above packages. You can also ask for help with these on #ask-data-engineering.
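The idea of type conformance mentioned above, where a metadata schema determines the type of each column regardless of how the data was read, can be sketched as follows. This is a simplified illustration using only the Python standard library. The schema layout, the `CASTERS` mapping and the `conform_row` function are all hypothetical and are not the API or metadata format of any of the packages listed.

```python
from datetime import date

# Hypothetical schema layout for illustration; not the real metadata format.
SCHEMA = {
    "name": "example_table",
    "columns": [
        {"name": "id", "type": "int"},
        {"name": "score", "type": "float"},
        {"name": "joined", "type": "date"},
        {"name": "label", "type": "string"},
    ],
}

# Map each declared type name to a function that performs the cast.
CASTERS = {
    "int": int,
    "float": float,
    "date": date.fromisoformat,
    "string": str,
}


def conform_row(row: dict, schema: dict) -> dict:
    """Return a new row with every schema column cast to its declared type."""
    conformed = {}
    for column in schema["columns"]:
        name, type_name = column["name"], column["type"]
        value = row.get(name)
        # Treat missing values and empty strings as nulls rather than failing.
        conformed[name] = None if value in (None, "") else CASTERS[type_name](value)
    return conformed
```

For example, `conform_row({"id": "7", "score": "1.5", "joined": "2021-03-04", "label": "a"}, SCHEMA)` turns the string values into an `int`, a `float`, a `date` and a `str`. Columns that are not declared in the schema are dropped, so the output always matches the declared shape.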
The data engineering team maintain the following R package:
A package for accessing Athena databases from the Analytical Platform.
The Analytical Platform community maintain the following R packages, which avoid the need for using Python in R projects:
A native R package for accessing Athena databases from the Analytical Platform.
A native R package for accessing AWS S3 from the Analytical Platform, which is largely compatible with the legacy package s3tools.