Tools and services
The Analytical Platform (AP) provides a range of tools, services and packages. This page describes the core tools and services that comprise the platform, as well as additional packages you can use to perform data analysis.
Note that we only provide support for third-party tools and services where a feature directly involves the Analytical Platform, such as bespoke configurations. For any other support with third-party tools and services, see the vendor’s documentation; we have provided links where possible.
Core Tools and Services
Control Panel
Main entry point to the Analytical Platform. Allows you to configure tools and view their status.
Data Discovery
Allows you to browse the databases that are available on the Analytical Platform.
GitHub
Online hosting platform for git. Git is a distributed version control system that allows you to track changes in files, while GitHub hosts the Analytical Platform’s code.
Data Ingest
Data Extractor
Extracts data from applications, services or microservices to the Analytical Platform in a standardised way.
Data Uploader
Web application for uploading data (.csv, .json, .jsonl) to the Analytical Platform in a standardised way.
Ingestion
An SFTP-based service that allows users to ingest data into their Analytical Platform data warehouse.
Register my data
Moves data from microservices into the Analytical Platform’s curated databases in a standardised way.
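The three file formats accepted by the Data Uploader (.csv, .json, .jsonl) differ mainly in how records are laid out. The sketch below uses only the Python standard library to serialise the same two records in each format. The records, field names and helper functions are invented for illustration, and nothing here reflects the uploader's own validation rules, which this page does not describe.

```python
import csv
import io
import json

# Hypothetical records, invented purely for illustration.
records = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]


def to_csv(rows):
    """Serialise rows as CSV: one header line, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise rows as a single JSON array containing every record."""
    return json.dumps(rows)


def to_jsonl(rows):
    """Serialise rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


csv_text = to_csv(records)
json_text = to_json(records)
jsonl_text = to_jsonl(records)

for label, text in (("csv", csv_text), ("json", json_text), ("jsonl", jsonl_text)):
    print(f"--- {label} ---")
    print(text)
```

Note that CSV does not carry type information, so the integer `id` becomes plain text in that format, whereas JSON and JSON Lines keep it as a number.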
Integrated Development Environments (IDE)
Amazon Athena Console
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. See Athena documentation for more information.
JupyterLab
Development environment for writing Python code. For more information, see the JupyterLab documentation.
RStudio
Development environment for writing R code and R Shiny apps. For more information, see the RStudio documentation.
Visual Studio Code
General purpose code editor. For more information, see the Visual Studio Code documentation.
Orchestration
Airflow
A tool for scheduling and monitoring workflows.
Create a Derived Table
A tool for creating persistent derived tables in Athena.
💡 For guidance on which tools to use for data transformation, please refer to tools for data transformation.
Python Packages
The Data Engineering team maintain Python packages that help with data manipulation. We consider the following packages the most useful:
athena_tools
Provides a simple way to create small persisting ad hoc databases. Currently in Alpha.
dataengineeringutils3
Collection of useful utilities for interacting with AWS.
mojap-arrow-pd-parser
Ensures type conformance when reading with arrow or pandas.
mojap-aws-tools-demo
Contains helpful guides on how to use the Python packages listed in this section. You can also ask for help with these in the #ask-data-engineering Slack channel on the Justice Digital workspace.
mojap-metadata
Defines metadata that interacts with other packages (including arrow-pd-parser) to ensure type conformance, and provides schema converters.
pydbtools
Queries MoJAP Athena databases, with features such as temporary table creation.
splink
Provides the ability to link datasets at scale. Splink is the matching engine behind the linked data on the Analytical Platform. This package is maintained by the Internal Data Linking team; support is offered via the #ask-data-linking Slack channel.
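To illustrate the pydbtools entry in the list above, here is a minimal sketch of querying an Athena database from Python. It assumes pydbtools is installed and that your environment has Analytical Platform credentials. The helper names `build_preview_query` and `preview_table` are hypothetical, as are any database and table names you pass in. `read_sql_query` is the pydbtools function we understand returns query results as a pandas DataFrame; check the package's own documentation before relying on it.

```python
def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Build a simple SQL query that previews the first rows of a table.

    Hypothetical helper: identifiers are double-quoted, as Athena expects
    for SELECT statements.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f'SELECT * FROM "{database}"."{table}" LIMIT {limit}'


def preview_table(database: str, table: str, limit: int = 10):
    """Run the preview query through pydbtools and return the result.

    The import is deferred so that this module can be loaded on machines
    where pydbtools or AWS credentials are not available.
    """
    import pydbtools as pydb  # assumption: installed in your environment

    return pydb.read_sql_query(build_preview_query(database, table, limit))


# Building the SQL needs no credentials; running it does.
example_sql = build_preview_query("my_database", "my_table", limit=5)
print(example_sql)
```

On the platform, calling `preview_table("my_database", "my_table")` with a database you have access to would run the query; the names here are placeholders only.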
R Packages
The following R packages are maintained by the AP community:
dbtools
Allows you to access Athena databases from the Analytical Platform using a reticulate wrapper around pydbtools.
Rdbtools
Allows you to access Athena databases from the Analytical Platform using an extension of the noctua R package.
Rs3tools
Allows you to access AWS S3 from the Analytical Platform. It is mostly compatible with the legacy s3tools package.
Data Science Tools and Services
Data Science Asset Register
Process for managing deployed data science assets. The register itself can be found here.
MLFlow Tracking Server
A user interface for MLFlow Tracking Server that allows users to track their model experiments.