Metadata Models

Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.

Every tenant has a root metadata model that is accessible to all projects of that tenant. This allows an organization to collect the same piece of information, such as an ID number, for every sample in every project. Within this root model, you can configure multiple metadata submodels, even at different levels. These submodels inherit all fields and groups from their parent models.

Illumina recommends that you limit the number of fields or field groups you add to the root model. Fields can have various types and contain single or multiple values, and field groups contain fields that belong together, such as all fields related to quality metrics. Any misconfigured item in the root model carries over into all other metadata models in the tenant. Once a root model is published, the fields and groups defined within it cannot be deleted; you can only add more fields.

Illumina recommends that you limit the number of fields or field groups you add to the root model, as this model cannot be deprecated and anything you add to it cannot be removed. Always consider creating submodels before adding anything to the root model.

Do not use dots (.) in metadata model names, field group names, or field names, as this can cause issues with field data.

When configuring a project, you can assign a published metadata model for all samples in the project. This can be any published metadata model in your tenant, such as the root model or one of the lower-level submodels. When a metadata model is selected for a project, all fields configured for that model and all fields in any parent models are applied to the samples in the project.
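
The inheritance described above can be pictured with a small sketch. This is only an illustration of how a submodel picks up the fields of its parents, not how ICA stores models internally. The MetadataModel class, the effective_fields function, and the model and field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataModel:
    """Hypothetical in-memory stand-in for a metadata model (not an ICA API type)."""
    name: str
    fields: list[str] = field(default_factory=list)
    parent: Optional["MetadataModel"] = None


def effective_fields(model: MetadataModel) -> list[str]:
    """Return the fields a sample receives: inherited fields first, then the model's own."""
    inherited = effective_fields(model.parent) if model.parent else []
    return inherited + model.fields


# Hypothetical three-level hierarchy: root > oncology > lung.
root = MetadataModel("root", ["sample_id"])
oncology = MetadataModel("oncology", ["tumor_type"], parent=root)
lung = MetadataModel("lung", ["smoking_status"], parent=oncology)

print(effective_fields(lung))  # ['sample_id', 'tumor_type', 'smoking_status']
```

Assigning the lowest-level model to a project would, under this picture, give every sample the fields of that model plus those of each parent up to the root.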

Metadata gives information about a sample and can be provided by the user, the pipeline, and the API. There are two general categories of metadata models: Project Metadata models and Pipeline Metadata models. Both contain metadata fields and groups.

The project metadata model is specific to a tenant and has metadata linked to a specific project. Values are known upfront, general information is required for each sample of the project, and the model may include general mandatory company information.
The pipeline metadata model is linked to a pipeline, not to a project, and can be shared across tenants. Values are populated during pipeline execution, which requires an output file named metadata.response.json.

Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.

Each sample can have multiple metadata models. When you link a project metadata model to your project, its groups and fields are present on each sample. The root model of the tenant is also present, because every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a single sample, its groups and fields are also present for each analysis resulting from the pipeline execution.

Creating a Metadata Model

In the main navigation, go to System Settings > Metadata Models. Here you will see the root metadata model and any underlying sub-metadata models. To create a new submodel, select +Create at the bottom of the screen.

The new metadata model screen shows an overview of all the higher-level metadata models. Use the down arrow next to the model name to expand these for more information.

For your new metadata model, add a unique name and optional description. Once this is done, start adding the metadata fields with the +Add button. The field type will determine the parameters which you can configure.

To edit your metadata model later on, select it and choose Manage > Edit. Keep in mind that fields can be added, but not removed once the model is published.

Field Types & Properties

Field types:

Text: Free text.
Keyword: Automatically completes the value based on values already used.
Numeric: Numbers only.
Boolean: True or false. Cannot be multiple value.
Date: For example, 23/02/2022.
Date time: For example, 23/02/2022 11:43:53. Saved in UTC.
Enumeration: Select a value from a list. Enter the values in the options field, which appears when you have selected the Enumeration type.
Field Group: Groups fields. Once you have chosen this type, +Add group field becomes available to add fields to this group.

The following properties can be selected for groups & fields:

Required: A pipeline cannot be started with this sample until the required group/field is filled in.
Sensitive: Values of this group/field are only visible to project users of your own tenant. When a sample is shared across tenants, these fields are not visible.
Multi value: This group/field can consist of multiple (grouped) values.
Filled by pipeline: Fields that need to be filled by a pipeline should be part of the same group. This group is automatically multiple value, and its values are available after pipeline execution. This property is only available for the Field Group type.

If you have fields that are filled by the pipeline, you can generate an example JSON structure that shows what the analysis output file metadata.response.json should look like to fill in the metadata fields of this model. Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON. Only fields in groups marked as Filled by pipeline are included.

Fields cannot be both required and filled by pipeline at the same time.
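
The property restrictions listed in this section can be summarized as a small validation sketch. It only encodes the rules stated above (a Boolean cannot be multiple value, Filled by pipeline is only available for field groups, and nothing can be both required and filled by pipeline). The FieldConfig class and config_problems function are hypothetical and are not part of ICA.

```python
from dataclasses import dataclass


@dataclass
class FieldConfig:
    """Hypothetical description of one field or field group (not an ICA API type)."""
    name: str
    field_type: str  # for example "Text", "Boolean", or "Field Group"
    required: bool = False
    multi_value: bool = False
    filled_by_pipeline: bool = False


def config_problems(cfg: FieldConfig) -> list[str]:
    """Return a list of rule violations; an empty list means the combination is allowed."""
    problems = []
    if cfg.field_type == "Boolean" and cfg.multi_value:
        problems.append("Boolean fields cannot be multi value")
    if cfg.filled_by_pipeline and cfg.field_type != "Field Group":
        problems.append("Filled by pipeline is only available for field groups")
    if cfg.required and cfg.filled_by_pipeline:
        problems.append("cannot be both required and filled by pipeline")
    return problems


# A field group filled by the pipeline is a valid combination.
print(config_problems(FieldConfig("qc_metrics", "Field Group", filled_by_pipeline=True)))  # []
```

Checking a planned model against these rules before publishing can be worthwhile, since published fields and groups cannot be deleted afterwards.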

To help retrieve the field values via API calls, you can use System Settings > Metadata Models > your_metadata_model > Manage > Show Field Paths.

Metadata Actions

Publish a Metadata Model

Newly created and updated metadata models are not available for use within the tenant until they are published. Once a metadata model is published, fields and field groups cannot be deleted, but their names and descriptions can still be edited. A model can only be published once all of its parent models are published, so verify this first. To publish your model, select System Settings > Metadata Models > your_metadata_model > Manage > Publish.

Retire a Metadata Model

If a published metadata model is no longer needed, you can retire the model (except the root model). A retired model can be published again if you need to reactivate it.

First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, select System Settings > Metadata Models > your_metadata_model > Manage > Retire Metadata Model.
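
The publish and retire rules above can be read as simple checks on a model tree. The following sketch only restates those rules and does not call ICA. The Model class and the can_publish and can_retire functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    """Hypothetical stand-in for a metadata model and its publication state."""
    name: str
    published: bool = False
    parent: Optional["Model"] = field(default=None, repr=False, compare=False)
    children: list["Model"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Model") -> "Model":
        child.parent = self
        self.children.append(child)
        return child


def can_publish(model: Model) -> bool:
    """A model can be published only when every parent model is already published."""
    ancestor = model.parent
    while ancestor is not None:
        if not ancestor.published:
            return False
        ancestor = ancestor.parent
    return True


def can_retire(model: Model) -> bool:
    """The root model cannot be retired, nor can a model with published submodels."""
    if model.parent is None:
        return False
    return model.published and not any(c.published for c in model.children)


root = Model("root", published=True)
assay = root.add_child(Model("assay"))
panel = assay.add_child(Model("panel"))

print(can_publish(panel))  # False: the parent "assay" is not published yet
assay.published = True
panel.published = True
print(can_retire(assay))   # False: the submodel "panel" is still published
panel.published = False
print(can_retire(assay))   # True
```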

Assign a Metadata Model to a Project

To add metadata to your samples, you first need to assign a metadata model to your project.

Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.

Add Metadata to Samples Manually

If you have a metadata model assigned to your project, you can manually fill out the defined metadata of the samples in your project:

Go to Projects > your_project > Samples > your_sample.
Click your sample to open the sample details and choose Edit Sample.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline will not be able to start.
Select Save.
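
If you keep sample metadata in your own records before entering it, a quick check for empty required fields can help avoid a blocked pipeline start. The sketch below is a local helper only. The missing_required function, the field names, and the sample values are hypothetical, and the sketch does not read anything from ICA.

```python
def missing_required(required_fields: list[str], sample_metadata: dict[str, object]) -> list[str]:
    """Return the required field names that have no value for this sample."""
    return [
        name
        for name in required_fields
        if sample_metadata.get(name) in (None, "", [])
    ]


# Hypothetical required fields and one sample's values.
required = ["sample_id", "collection_date"]
sample = {"sample_id": "S-001", "collection_date": ""}

print(missing_required(required, sample))  # ['collection_date']
```

Values such as 0 or False count as filled in here, because only missing, empty-string, and empty-list values are treated as empty.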

Populating a Pipeline Metadata Model

To fill metadata by pipeline executions, a pipeline model must be created.

In the main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Click on your pipeline to open the pipeline details and choose Edit.
Create or edit your model on the Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.

In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.

Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON to see an example JSON for these fields.
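
As a minimal sketch of the last pipeline step, the following helper writes a payload to a file named metadata.response.json in an output directory. The write_metadata_response function and the payload contents are hypothetical. The real structure must be taken from Generate example JSON for your model; the placeholder shown here is not the ICA format.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_response(payload: dict, output_dir: str) -> Path:
    """Write the payload as metadata.response.json inside output_dir."""
    target = Path(output_dir) / "metadata.response.json"
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target


# Placeholder only: replace with the structure shown by Generate example JSON.
payload = {"placeholder_group": [{"placeholder_field": "value"}]}

# A temporary directory stands in for the pipeline's output location.
with tempfile.TemporaryDirectory() as out_dir:
    path = write_metadata_response(payload, out_dir)
    print(path.name)  # metadata.response.json
```

Only the file name is taken from the text above. Where the file has to be placed among the analysis outputs depends on how your pipeline is set up.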

Field names cannot contain dots (.). For example, for the metric name Q30 bases (excl. dup & clipped bases), the dot after excl must be removed.
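
When metric names come from an existing tool, the dots can be removed programmatically before the names are used as field names. This is a minimal sketch; the strip_dots helper is hypothetical and simply deletes every dot.

```python
def strip_dots(name: str) -> str:
    """Remove every '.' from a model, field group, or field name."""
    return name.replace(".", "")


metric = "Q30 bases (excl. dup & clipped bases)"
print(strip_dots(metric))  # Q30 bases (excl dup & clipped bases)
```

The same cleaned name has to be used both in the metadata model and in the JSON written by the pipeline, otherwise the values will not match the fields.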

Pushing Metadata Metrics to Base

Populating the metadata models of samples gives you a sample-centric view of all the metadata. You can also synchronize that data into your project's Base warehouse.

In ICA, select Projects > your_project > Base > Schedule.
Select +Create > From metadata.
Type a name for your schedule, optionally add a description, and set it to active. You can select whether sensitive metadata fields should be included. Keep in mind that values of sensitive metadata fields are not visible to users outside of the project.
Select Save.
Navigate to Base > Tables in your project.
Two new table schemas should be added with your current metadata models.
