Tools and services

The Analytical Platform (AP) provides a range of tools, services and packages. This page describes the core tools and services that make up the platform, as well as additional packages you can use to perform data analysis.

Note that we only provide support for third-party tools and services where a feature directly involves the Analytical Platform, such as a bespoke configuration. For any other support with third-party tools and services, see the vendor’s documentation; we have provided links where possible.

Core Tools and Services

Control Panel

Main entry point to the Analytical Platform. Allows you to configure tools and view their status.

Data Discovery

Allows you to browse the databases that are available on the Analytical Platform.

GitHub

Online hosting platform for Git repositories. Git is a distributed version control system that allows you to track changes in files; the Analytical Platform’s code is hosted on GitHub.

Data Ingest

Data Extractor

Extracts data from applications, services or microservices to the Analytical Platform in a standardised way.

Data Uploader

Web application for uploading data (.csv, .json, .jsonl) to the Analytical Platform in a standardised way.
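As an illustration of the three accepted formats, the sketch below checks locally that a file parses before you upload it. It uses only the Python standard library; the function name `check_upload_file` is hypothetical, and this is not the Data Uploader's own validation logic.

```python
import csv
import json
from pathlib import Path

# File extensions the Data Uploader accepts, as listed above.
ALLOWED_EXTENSIONS = {".csv", ".json", ".jsonl"}


def check_upload_file(path):
    """Return the number of records in the file, or raise ValueError.

    Hypothetical pre-upload check; not part of the Data Uploader itself.
    """
    path = Path(path)
    ext = path.suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")

    if ext == ".csv":
        # newline="" lets the csv module handle quoted line breaks itself.
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # The first row is treated as the header, so it is not counted.
        return max(len(rows) - 1, 0)

    text = path.read_text(encoding="utf-8")
    try:
        if ext == ".json":
            data = json.loads(text)
            # A top-level array counts each element; anything else is one record.
            return len(data) if isinstance(data, list) else 1
        # .jsonl: one JSON document per non-empty line.
        lines = [line for line in text.splitlines() if line.strip()]
        for line in lines:
            json.loads(line)
        return len(lines)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path.name} is not valid {ext[1:].upper()}: {exc}") from exc
```

Running a check like this first means a malformed file fails on your machine with a readable error rather than part-way through an upload.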

Ingestion

An SFTP-based service that allows users to ingest data into their Analytical Platform data warehouse.

Register my data

Moves data from microservices into the Analytical Platform’s curated databases in a standardised way.

Integrated Development Environments (IDE)

Amazon Athena Console

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. See Athena documentation for more information.

JupyterLab

Development environment for writing Python code. For more information, see the JupyterLab documentation.

RStudio

Development environment for writing R code and R Shiny apps. For more information, see the RStudio documentation.

Visual Studio Code

General purpose code editor. For more information, see the Visual Studio Code documentation.

Orchestration

Airflow

A tool for scheduling and monitoring workflows.
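To give a feel for what a workflow scheduler does, the sketch below runs a small set of tasks in dependency order using Python's standard-library `graphlib` module (Python 3.9+). It is not Airflow code and uses none of Airflow's API; the task names and the `run_pipeline` helper are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "link": {"clean"},
    "publish": {"aggregate", "link"},
}


def run_pipeline(tasks, dependencies):
    """Run each task once, only after all of its upstream tasks have run."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        completed.append(name)
    return completed


# Each stand-in task just records that it ran.
log = []
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in PIPELINE}
order = run_pipeline(tasks, PIPELINE)
```

A real orchestrator adds scheduling, retries and monitoring on top of this ordering, but the dependency graph is the core idea.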

Create a Derived Table

A tool for creating persistent derived tables in Athena.

💡 For guidance on which tools to use when doing data transformation, please refer to tools for data transformation.

Python Packages

The Data Engineering team maintain Python packages that help with data manipulation. We consider the following packages the most useful:

athena_tools

Provides a simple way to create small persisting ad hoc databases. Currently in Alpha.

dataengineeringutils3

Collection of useful utilities for interacting with AWS.

mojap-arrow-pd-parser

Ensures type conformance when reading with arrow or pandas.

mojap-aws-tools-demo

Contains helpful guides on how to use the Python packages listed in this section. You can also ask for help with these in the #ask-data-engineering Slack channel on the Justice Digital workspace.

mojap-metadata

Defines metadata that interacts with other packages (including arrow-pd-parser) to ensure type conformance, and provides schema converters.
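To illustrate what type conformance means in practice, the sketch below casts string values (as you might read them from a CSV) to the types declared in a schema. It uses only the standard library; the schema layout, the type names, the `CASTERS` mapping and `conform_row` are hypothetical simplifications, not the mojap-metadata or arrow-pd-parser API.

```python
from datetime import date

# Hypothetical schema: column name -> declared type name.
SCHEMA = {"id": "int", "amount": "float", "is_open": "bool", "opened": "date"}

# How to cast a raw string to each declared type.
CASTERS = {
    "int": int,
    "float": float,
    "string": str,
    "bool": lambda value: value.strip().lower() in {"true", "1", "yes"},
    "date": date.fromisoformat,
}


def conform_row(row, schema):
    """Cast every value in `row` to its declared type; empty strings become None."""
    missing = set(schema) - set(row)
    unexpected = set(row) - set(schema)
    if missing or unexpected:
        raise ValueError(
            f"columns differ: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return {
        column: None if row[column] == "" else CASTERS[type_name](row[column])
        for column, type_name in schema.items()
    }


raw = {"id": "7", "amount": "12.50", "is_open": "TRUE", "opened": "2022-12-08"}
typed = conform_row(raw, SCHEMA)
```

Declaring the schema once and applying it on every read is what keeps the same column from arriving as a string in one file and a number in the next.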

pydbtools

Queries MoJAP Athena databases, with features such as temporary table creation.
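The temp table pattern can be sketched without Athena access. The example below uses Python's built-in `sqlite3` module as a stand-in engine, so it shows the general idea (store an intermediate query result under a name, then query it again) rather than pydbtools' own functions. The table, columns and data are made up for illustration.

```python
import sqlite3

# In-memory database standing in for a real query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, team TEXT, days_open INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "alpha", 12), (2, "alpha", 40), (3, "beta", 7), (4, "beta", 95)],
)

# Store an intermediate result as a temporary table...
conn.execute(
    "CREATE TEMP TABLE long_running AS "
    "SELECT id, team, days_open FROM jobs WHERE days_open > 30"
)

# ...then build on it in a later query instead of repeating the filter.
rows = conn.execute(
    "SELECT team, COUNT(*) FROM long_running GROUP BY team ORDER BY team"
).fetchall()
conn.close()
```

Splitting a long query into named intermediate steps like this tends to make analysis code easier to read and debug.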

splink

Provides the ability to link datasets at scale. Splink is the matching engine behind the linked data on the Analytical Platform. This package is maintained by the Internal Data Linking team; support is offered via the #ask-data-linking Slack channel.
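Splink's matching is based on the Fellegi-Sunter model of probabilistic record linkage. The sketch below shows the core arithmetic of that model in plain Python. It does not use Splink's API, and the `m` and `u` values, the prior and the function names are hypothetical.

```python
import math

# Hypothetical parameters for each compared field:
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
PARAMS = {
    "first_name": {"m": 0.9, "u": 0.01},
    "surname": {"m": 0.95, "u": 0.005},
    "dob": {"m": 0.98, "u": 0.001},
}


def match_weight(record_a, record_b, params):
    """Sum log2 Bayes factors over fields.

    Agreement adds log2(m / u); disagreement adds log2((1 - m) / (1 - u)).
    """
    total = 0.0
    for field, p in params.items():
        if record_a[field] == record_b[field]:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def match_probability(weight, prior=0.001):
    """Convert a match weight to a probability, given a prior match probability."""
    prior_odds = prior / (1 - prior)
    odds = prior_odds * 2 ** weight
    return odds / (1 + odds)
```

Fields on which two unrelated records rarely agree by chance (a low `u`) contribute the most evidence when they do agree, which is why a matching date of birth counts for more than a matching first name in this example.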

R Packages

The following R packages are maintained by the AP community:

dbtools

Allows you to access Athena databases from the Analytical Platform using a reticulate wrapper around pydbtools.

Rdbtools

Allows you to access Athena databases from the Analytical Platform using an extension of the noctua R package.

Rs3tools

Allows you to access AWS S3 from the Analytical Platform. It is mostly compatible with the legacy package s3tools.

Data Science Tools and Services

Data Science Asset Register

Process for managing deployed data science assets. The register itself can be found here.

MLFlow Tracking Server

A user interface for MLFlow Tracking Server that allows users to track their model experiments.

This page was last reviewed on 8 December 2022. It needs to be reviewed again on 8 December 2023 by the page owner #analytical-platform-support.