
Base

hashtag
Introduction to Base

Base is a genomics data aggregation and knowledge management solution suite. It is a secure, scalable and integrated genomics data analysis solution that provides information management and knowledge mining. You can analyze, aggregate and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing and patient care. Clinically relevant data needs to be generated and extracted from routine clinical testing, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment in which to accumulate data and efficiently explore the aggregated result. This data consists of test results, patient data, metadata, reference data, consent and QC data.

hashtag
Use Cases

Base can be used for different use cases:

  • Clinical and Academic Researchers:

    • Big data storage solution housing all aggregated sample test outcomes

    • Analyze information by way of a convenient query formalism

    • Look for signals in combined phenotypic and genotypic data

    • Analyze QC patterns over large cohorts of patients

    • Securely share (sub)sets of data with other scientists

    • Generate reports and analyze trends in a straightforward and simple manner

  • Bioinformaticians:

    • Access, consult, audit, and query all relevant data and QC information for tests run

    • All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis

    • Metadata is captured via automatic pipeline version tracking: information on individual tools and/or reference files used during processing for each sample analyzed, the duration of the pipeline, the execution path of the different analytical steps, or, in case of failure, exit codes can be warehoused

  • Product Developers and Service Providers:

    • Better understand the efficiency of kits and tests

    • Analyze usage, understand QC data trends, improve products

    • Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, and render test result outcome trends and much more

hashtag
Base Action Possibilities

  • Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.

  • Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation (see the sketch after this list). All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.

  • Detect Signals and Patterns: Extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information, for a single individual or a group of samples with a specific variant. Virtually any combination of stored sample and patient information allows for detecting signals and patterns with a single query on the big data set.

  • Profile/Cluster Patients: Use and re-analyze patient cohort information based on specific sample or individual characteristics, for instance to run a next, agile iteration of a clinical trial with only the patients that respond. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects by the use of a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information for further research.

  • Share Your Data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited; in this way Base data can be shared with people in and outside of an organization in a compliant and controlled fashion.
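As an illustration of such a mining operation, the following is a minimal sketch of a variant frequency aggregation. The table and column names (SAMPLE_VARIANTS, GENE, VARIANT, SAMPLE_ID) are hypothetical placeholders and not part of the platform:

    SELECT GENE, VARIANT, COUNT(DISTINCT SAMPLE_ID) AS observed_samples
    FROM SAMPLE_VARIANTS
    GROUP BY GENE, VARIANT
    ORDER BY observed_samples DESC;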

hashtag
Access

The Base module can be found at Projects > your_project > Base. In order to use Base, you need to meet the following requirements:

hashtag
Subscription

On the domain level, Base needs to be included in the subscription (full and premium subscriptions give access to Base).

hashtag
Enabling Base

Once a project is created, the project owner must navigate to Projects > your_project > Base and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.

hashtag
Enabling User Access

Access to the projects and Base is configured on the Projects > your_project > Project Settings > Team page. Here you can add or edit a user or workgroup and give them the appropriate Base permissions.

hashtag
Activity

The status and history of Base activities and jobs are shown on the Activity page.


    Tables

    All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.

    hashtag
    Create a new Table

    To create a new table, click Projects > your_project > Base > Tables > +Create. Tables can be created from scratch or from a template that was previously saved. Views on data from Illumina hardware and processes are selected with the option Import from catalogue.

    circle-info

    If you make a mistake in the order of columns when creating your table, then as long as you have not saved your table, you can switch to Edit definition to change the column order. The text editor can swap or move columns whereas the built-in editor can only delete columns or add columns to the end of the sequence. When editing in text mode, it is best practice to copy the content of the text editor to a notepad before you make changes because a corrupted syntax will result in the text being wiped or reverted when switching between text and non-text mode.

    triangle-exclamation

Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. The workaround is switching to text mode, copying the schema of the table to which you want to make modifications, and pasting it into a new empty table where the necessary changes can be made before saving.

    triangle-exclamation

Once created, do not try to modify your table column layout via the Query module: even though you can execute ALTER TABLE commands, the table definition and the actual table will go out of sync, resulting in processing issues.

    circle-exclamation

Be careful when naming tables that you want to use in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.

    hashtag
    Empty Table

    To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.

    hashtag
    Table information

The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers or underscores. The description is optional.

    hashtag
    References

Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the schema which can contain references to the data on the platform (see the example query after this list):

    • data_reference: reference to the data element in the Illumina platform from which the record originates

    • data_name: original name of the data element in the Illumina platform from which the record originates

    • sample_reference: reference to the sample in the Illumina platform from which the record originates

    • sample_name: name of the sample in the Illumina platform from which the record originates

    • pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates

    • pipeline_name: name of the pipeline in the Illumina platform from which the record originates

    • execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates

    • account_reference: reference to the account in the Illumina platform from which the record originates

    • account_name: name of the account in the Illumina platform from which the record originates
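    For example, a minimal sketch of using these reference columns to trace aggregated records back to their source, assuming a hypothetical table named MY_RESULTS that was created with the Include reference checkbox checked:

        SELECT sample_name, pipeline_name, data_reference, COUNT(*) AS record_count
        FROM MY_RESULTS
        GROUP BY sample_name, pipeline_name, data_reference
        ORDER BY record_count DESC;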

    hashtag
    Schema

    In an empty table, you can create a schema by adding a field with the +Add button for each column of the table and defining it. At any time during the creation process, it is possible to switch to the edit definition mode and back. The definition mode shows the JSON code, whereas the original view shows the fields in a table.

    Each field requires:

    • a unique name (*1) with optional description.

    • a type

      • String – collection of characters

      • Bytes – raw binary data

      • Integer – whole numbers

      • Float – fractional numbers (*2)

      • Numeric – any number (*3)

      • Boolean – only options are "true" or "false"

      • Timestamp – stores the number of (milli)seconds passed since the Unix epoch

      • Date – stores a date in the format YYYY-MM-DD

      • Time – stores a time in the format HH:MI:SS

      • Datetime – stores date and time information in the format YYYY-MM-DD HH:MI:SS

      • Record – has a child field

      • Variant – can store a value of any other type, including OBJECT and ARRAY

    • a mode

      • Required – mandatory field

      • Nullable – field is allowed to have no value

      • Repeated – multiple values are allowed in this field (will be recognized as an array in Snowflake)

    circle-exclamation

    (*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your field names, as this will lead to SQL compilation errors.

    circle-info

    (*2) Float values will be exported differently depending on the output format. For example, JSON uses scientific notation, so verify that your downstream processing methods support this.

    circle-info

    (*3) Defining the precision when creating tables with SQL is not supported as this will result in rounding issues.

    hashtag
    From template

    Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Manage (top right) > Save as template.

    If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.

    hashtag
    Table information

    hashtag
    Table status

    The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:

    • Available: Ready to be used, both with or without data

    • Pending: The system is still processing the table, there is probably a process running to fill the table with data

    • Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the Show deleted tables/views button

    Additional Considerations

    • Tables created from empty data or from a template are available faster.

    • When copying a table with data, it can remain in a Pending state for a longer period of time.

    • Clicking on the page's refresh button will update the list.

    hashtag
    Table details

    For any available table, the following details are shown:

    • Table information: Name, description, status, number of records and data size.

    circle-info

    The data size of tables with the same layout and content may vary slightly, depending on when and how the data was written by Snowflake.

    • Definition: An overview of the table schema, also available in text. Fields can be added to the schema but not deleted. Tip for deleting fields: copy the schema as text and paste in a new empty table where the schema is still editable.

    • Preview: A preview of the first 50 rows of the table (when data is uploaded into the table). Select Show details to see record details.

    • Source Data: The files that are currently uploaded into the table. You can see the Load Status of the files, which can be Prepare Started, Prepare Succeeded or Prepare Failed, and finally Load Succeeded or Load Failed.

    hashtag
    Table actions

    From within the details of a table it is possible to perform the following actions from the Manage menu (top right) of the table:

    • Edit: Add fields to the table and change the table description.

    • Copy: Create a copy of this table in the same or a different project. In order to copy to another project, data sharing must be enabled in the details of the original project. The user also has to have access to both the original and the target project.

    • Export as file: Export this table as a CSV or JSON file. The exported file can be found in the project data, where the user can download it.

    • Save as template: Save the schema, or an edited form of it, as a template.

    • Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It is also possible to load data into a table manually or automatically via a pre-configured job. This can be done on the Schedule page.

    • Delete: Delete the table.

    hashtag
    Manually importing data to your Table

    To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > Manage (top right) > Add Data

    hashtag
    Data selection

    The data selection screen will show options to select the structure as CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation) and the location of your source data. In the first step, you select the data format and the files containing the data.

    • Data format (required): Select the format of the data which you want to import.

    • Write preference: Define if data can be written to the table only when the table is empty, if the data should be appended to the table or if the table should be overwritten.

    • Delimiter: Which delimiter is used in the delimiter-separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.

    • Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.

    • Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.

    • References: Choose which references must be added to the table.

    Most of the advanced options are legacy functions and should not be used. The only exceptions are:

    • Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).

    • Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.

      • If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.

      • If no headers are used: The fields are loaded in order of occurrence; trailing missing fields are loaded with NULL and trailing additional fields are discarded.

    hashtag
    Data import progress

    To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective action if the input did not match the table design and run the import again (with a new copy of the file, as each input file can only be used once).

    If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.

    hashtag
    List of table data sources

    To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This lists all the source data files, including those that failed to be imported. To prevent double entries, these files can not be used again for import. The load status remains empty while the data is being processed and is set to Load Succeeded or Load Failed after loading completes.

    hashtag
    How to load array data in Base

    Base Table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (i.e., String), or the Variant type.
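    As an illustration, a minimal sketch of querying such a column with Snowflake's FLATTEN function, assuming a hypothetical table MY_ARRAY_TABLE with a name column and a tags column defined as Repeated (or Variant) and holding an array of strings:

        SELECT t.name, f.value::STRING AS tag
        FROM MY_ARRAY_TABLE t,
             LATERAL FLATTEN(input => t.tags) f;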

    hashtag
    Parsing nested JSON data

    If you have a nested JSON structure, you can import it into individual fields of your table. For example, suppose your nested JSON structure looks like this:

    {
        "one": {
          "a": "1",
          "b": "1"
        },
        "three": {
          "a": "3",
          "b": "3",
          "c": "3"
        }
    }

    If you want to get it imported into a table with a, b and c having integers as values, you need to create a matching table. This can be done either manually or via the SQL command CREATE OR REPLACE TABLE json_data ( a INTEGER, b INTEGER, c INTEGER);

    Format your JSON data to have single lines per structure:

    {"A":1,"B":1}
    {"A":3,"B":3,"C":3}

    Finally, create a schedule to import your data or perform a manual import.

    The resulting table will look like this:

    # | A | B | C
    1 | 1 | 1 |
    2 | 3 | 3 | 3


    Query

    Queries can be used for data mining. On the Projects > your_project > Base > Query page:

    • New queries can be created and executed

    • Already executed queries can be found in the query history

    Saved queries and query templates are listed under the saved queries tab.

    hashtag
    New Query

    hashtag
    Available tables

    All available tables are listed on the Run tab.

    circle-info

    Metadata tables are created by syncing with the Base module. This synchronization is configured on the Details page within the project.

    hashtag
    Input

    Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.

    hashtag
    Best practices and notes

    triangle-exclamation

    Do not use queries such as ALTER TABLE to modify your table structure as it will go out of sync with the table definition and will result in processing errors.

    • When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name (see the sketch after this list).

    • Case sensitive column names (such as the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.

    • The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'. As these are case sensitive, the upper- and lowercasing must be respected.

    • If you want to query data from a table shared from another tenant (indicated in green), select the table (Projects > your_project > Base > Tables > your_table) to see its unique name. For a shared table named TestFiles, for example, the query would be select * from demo_alpha_8298.public.TestFiles

    • For more information on queries, please also see the Snowflake documentation: https://docs.snowflake.com/en/user-guide/
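    A minimal sketch combining the notes above: explicit aliases for duplicate column names, double quotes around case-sensitive columns, and unquoted ICA subfields. MY_TABLE and its ica column come from the examples above; it is assumed, for illustration only, that the shared demo_alpha_8298.public.TestFiles table also has a case-sensitive PROJECT_NAME column:

        SELECT a."PROJECT_NAME" AS project_name_a,
               b."PROJECT_NAME" AS project_name_b,
               a.ica:Tenant::STRING AS tenant
        FROM MY_TABLE a
        JOIN demo_alpha_8298.public.TestFiles b
          ON a."PROJECT_NAME" = b."PROJECT_NAME"
        WHERE a.ica:Tenant = 'MyTenant';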

    hashtag
    Querying data within columns

    Some tables contain columns with an array of values instead of a single value.

    hashtag
    Querying data within an array

    circle-info

    As of ICA version 2.27, there is a change in the use of capitals for ICA array fields. In previous versions, the data name within the array would start with a capital letter. As of 2.27, lowercase is used. For example ICA:Data_reference has become ICA:data_reference.

    You can use the GET_IGNORE_CASE option to adapt existing queries when you have both data in the old syntax and new data in the lowercase syntax. The syntax is GET_IGNORE_CASE(Table_Name.Column_Name,'Array_field')

    For example:

    select ICA:Data_reference as MY_DATA_REFERENCE from TestTable becomes:

    select GET_IGNORE_CASE(TESTTABLE.ICA,'Data_reference') as MY_DATA_REFERENCE from TestTable

    You can also modify the data to have consistent capital usage by executing the query update YOUR_TABLE_NAME set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name') and repeating this process for all field names (Data_name, Data_reference, Execution_reference, Pipeline_name, Pipeline_reference, Sample_name, Sample_reference, Tenant_name and Tenant_reference).
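    For example, a sketch of the same pattern applied to the Data_reference field (repeat analogously for the remaining field names):

        update YOUR_TABLE_NAME set ica = object_delete(object_insert(ica, 'data_reference', ica:Data_reference), 'Data_reference');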

    Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:

    NameField | CodeField | ArrayField
    Name A | Code A | { "userEmail": "[email protected]", "bundleName": null, "boolean": false }
    Name B | Code B | { "userEmail": "[email protected]", "bundleName": "thisbundle", "boolean": true }

    Examples

    You can use the name field and code field to do queries by running

    Select * from YOUR_TABLE_NAME where NameField = 'Name A'.

    If you want to show specific data like the email and bundle name from the array, this becomes

    Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where NameField = 'Name A'.

    If you want to use data in the array as your selection criteria, the expression becomes

    Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:boolean = true.

    If your criterion is text in the array, use ' to delimit the text. For example:

    Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail = '[email protected]'.

    You can also use the LIKE operator with the % wildcard if you do not know the exact content.

    Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail LIKE '%A@server%'

    hashtag
    Query results

    If the query is valid for execution, the result will be shown as a table underneath the input box. Only the first 200 characters of a string, record or variant field are included in the query results grid. The complete value is available by clicking the Show details link.

    From within the result page of the query, it is possible to save the result in several ways:

    • Export to > New table saves the query result as a new table with contents.

    • Export to > New view saves the query results as a new view.

    • Export to > Project file saves the query result as a file in the project in CSV (tab, pipe or a custom delimiter are also allowed) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located on the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.

    hashtag
    Run a new query

    1. Navigate to Projects > your_project > Base > Query.

    2. Enter the query to execute using SQL.

    3. Select Run.

    4. Optionally, select Save to add the query to your saved queries list.

    If the query takes more than 30 seconds without returning a result, a message will be displayed to inform you that the query is still in progress; its status can be consulted at Projects > your_project > Activity > Base Jobs. Once the query has successfully completed, the results can be found on the Projects > your_project > Base > Query > Query History tab.

    hashtag
    Query history

    The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.

    1. Navigate to Projects > your_project > Base > Query.

    2. Select the History tab.

    3. Select a query.

    4. Perform one of the following actions:

      • Use—Open the query for editing and running in the Run tab. You can then select Run to execute the query again.

      • Save —Save the query to the saved queries list.

      • View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.

    hashtag
    Saved Queries

    All queries saved within the project are listed under the Saved tab together with the query templates.

    The saved queries can be:

    • Used — The query is opened for editing and running in the Run tab. You can then select Run to execute the query again.

    • Saved as template — The saved query becomes a query template.

    • Deleted — The query is removed from the list and cannot be opened again.

    The query templates can be:

    • Opened: This will open the query again in the “New query” tab.

    • Deleted: The query is removed from the list and cannot be opened again.

    It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:

    • Account: The query template will be available for everyone within the account

    • User: The query template will be available for the user who created it

    hashtag
    Run a saved Query

    If you have saved a query, you can run the query again by selecting it from the list of saved queries.

    1. Navigate to Projects > your_project > Base > Query.

    2. Select the Saved Queries tab.

    3. Select a query.

    4. Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.

    hashtag
    Shared database for project

    Shared databases are displayed under the list of Tables as Shared Database for project <project name>.

    For ICA Cohorts customers, shared databases are available in a project's Base instance. For more information on the specific Cohorts shared database tables that are viewable, see Cohorts Base.

    Snowflake

    hashtag
    User

    Every Base user has one Snowflake username: ICA_U_<id>

    hashtag
    User/Project-Bundle

    For each user/project-bundle combination a role is created: ICA_UR_<id>_<name project/bundle>__<id>

    This role receives the viewer or contributor role of the project/bundle, depending on their permissions in ICA.

    hashtag
    Roles

    Every project or bundle has a dedicated Snowflake database.

    For each database, 2 roles are created:

    • <project/bundle name>_<id>_VIEWER

    • <project/bundle name>_<id>_CONTRIBUTOR

    hashtag
    Project viewer role

    This role receives

    • REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.

    • Grants on the viewer roles of the bundles linked to the project.

    hashtag
    Project contributor role

    This role receives the following rights on current and future objects in the PUBLIC schema of the project's/bundle's database:

    • ownership

    • select, insert, update, delete, truncate and references on tables/views/materialized views

    • usage on sequences/functions/procedures/file formats

    • write, read and usage on stages

    • select on streams

    • monitor and operate on tasks

    It also receives a grant on the viewer role of the project.
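    To verify these rights from a Snowflake session (for example one opened with snowsql and an OAuth token, as shown in the Warehouses section below), you can use the standard SHOW GRANTS command. The role name here reuses the example identifiers from this page and is illustrative only:

        show grants to role ATESTBASE2_264891_CONTRIBUTOR;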

    hashtag
    Warehouses

    For each project (not bundle!), 2 warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.

    • <projectname>_<id>_QUERY

    • <projectname>_<id>_LOAD

    hashtag
    Using the LOAD instead of the QUERY warehouse

    When you generate an OAuth token, ICA always uses the QUERY warehouse by default (see the -w parameter in the connection string below):

    snowsql -a iap.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_274853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_QUERY --token=<token>

    If you wish to use the LOAD warehouse in a session, you have two options:

    1. Change the warehouse name in the connection string: snowsql -a iapdev.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_277853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_LOAD --token=<token>

    2. Execute the following statement after logging in: use warehouse ATESTBASE2_264891_LOAD

    To determine which warehouse you are using, execute: select current_warehouse();

    hashtag
    Synchronizing Tables

    If you have created tables directly in Snowflake with the OAuth token, you can synchronize them so that they appear in ICA by means of the Projects > your_project > Base > Tables > Sync button.
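    As an illustration, a minimal sketch of creating such a table in a snowsql session before synchronizing it. The table and column names are hypothetical; the warehouse and database names reuse the example from the Warehouses section above:

        use warehouse ATESTBASE2_264891_LOAD;
        use schema atestbase2_264891.PUBLIC;
        create or replace table EXTERNAL_RESULTS (
            sample_id    STRING,
            metric_name  STRING,
            metric_value FLOAT
        );
        -- After creating and loading the table, click Projects > your_project > Base > Tables > Sync in ICA.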

    Schedule

    On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.

    When creating or editing a schedule, automatic import is performed when the Active box is checked. The job will run at 10-minute intervals. In addition, for both active and inactive schedules, a manual import is performed by selecting the schedule and clicking the Run button.

    hashtag
    Configure a schedule

    There are different types of schedules that can be set up:

    • Files

    • Metadata

    • Administrative data.

    hashtag
    Files

    This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:

    • Name (required): The name of the scheduled job

    • Description: Extra information about the schedule

    • File name pattern (required): Define in this field a part or the full name of the file name or of the tag that the files you want to upload contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, … you can fill in _reads.txt in this field to have all files that contain _reads.txt imported to the table.

    • Generated by Pipelines: Only files generated by the selected pipelines are taken into account. When left clear, files from all pipelines are used.

    • Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.

    • Write preference (required): Define if data can be written to the table only when the table is empty, if the data should be appended to the table or if the table should be overwritten.

    • Data format (required): Select the data format of the files (CSV, TSV, JSON).

    • Delimiter (required): Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.

    • Active: The job will run automatically if checked.

    • Custom delimiter: The custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.

    • Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.

    • References: Choose which references must be added to the table.

    • Advanced Options:

      • Encoding (required): Select the encoding of the file.

      • Null Marker: Specifies a string that represents a null value in a CSV/TSV file.

      • Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.

      • Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.

        • If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.

        • If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.

    hashtag
    Metadata

    This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. For example, the following query counts how many times each pipeline was executed and sorts the result accordingly:

    SELECT PIPELINE_NAME, COUNT(*) AS Appearances
    FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
    GROUP BY PIPELINE_NAME
    ORDER BY Appearances DESC;

    To obtain a similar table for the failed runs, you can execute the following SQL query:

    SELECT PIPELINE_NAME, COUNT(*) AS Appearances
    FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
    WHERE PIPELINE_STATUS = 'Failed'
    GROUP BY PIPELINE_NAME
    ORDER BY Appearances DESC;

    When adding or editing this schedule you can define the following parameters:

    • Name (required): the name of this scheduled job.

    • Description: Extra information about the schedule.

    • Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.

    • Active: The job will run automatically if ticked.

    hashtag
    Administrative data

    This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.

    When adding or editing this schedule the following parameters can be defined:

    • Name (required): The name of this scheduled job.

    • Description: Extra information about the schedule.

    • Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.

    • Active: The job will run automatically if checked.

    • Source (Tenant Administrators Only):

      • Project (default): All administrative data from this project will be added.

      • Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.

    hashtag
    Delete schedule

    Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.

    hashtag
    Run schedule

    When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables. The load status is empty while the data is being processed and set to failed or succeeded once loading completes.

    Data Catalogue

    Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.

    hashtag
    Available views

    • Project-level views

      • ICA_PIPELINE_ANALYSES_VIEW (Lists project-specific ICA pipeline analysis data)

      • ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (project-specific quality control metrics)


  • Tenant-level views

    • ICA_PIPELINE_ANALYSES_VIEW (Lists ICA pipeline analysis data)

    • CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)

    • CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)

    • CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)

    • CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)

    • ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)

    hashtag
    Preconditions for view content

    • DRAGEN metrics will only have content when DRAGEN pipelines have been executed.

    • Analysis views will only have content when analyses have been executed.

    • Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentationarrow-up-right for more information.

    • When you use your own AWS S3 storage in a project, metrics can not be collected and thus the DRAGEN METRICS-related views can not be used.

    hashtag
    Who can add or remove Catalogue data (views) to a project?

    Members of a project who have both Base Contributor and Project Contributor or Administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can only remove the catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but cannot add them again.

    hashtag
    Adding Catalogue data (views) to your project

    To add Catalogue data,

    1. Go to Projects > your_project > Base > Tables.

    2. Select Add table > Import from Catalogue.

    3. A list of available views will be displayed. (Note that views which are already part of your project are not listed)

    4. Select the table you want to add and choose +Select

    Catalogue data will have View as type, the same as tables which are linked from other projects.

    hashtag
    Removing Catalogue data (views) from your project

    To delete Catalogue data,

    1. go to Projects > your_project > Base > Tables.

    2. Select the table you want to delete and choose Delete.

    3. A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.

    hashtag
    Catalogue table details (Catalogue Table Selection Screen)

    • View: The name of the Catalogue table.

    • Description: An explanation of which data is contained in the view.

    • Category: The identification of the source system which provided the data.

    • Tenant/project: Appended to the view name as _tenant or _project. Determines whether the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.

    hashtag
    Catalogue table details (Table Schema Definition)

    In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.

    hashtag
    Querying views

    In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.

    Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It's important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the USER_NAME value in the SQL query below is "recycled" for each flattened row.

    The following query extracts

    • USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.

    • PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.

    • Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.

    Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow', which allows the user to find the steps that failed:

    SELECT
        USER_NAME as user_name,
        PIPELINE_ANALYSIS_DATA:reference as reference,
        PIPELINE_ANALYSIS_DATA:price as price,
        PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
        f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
        f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
        f.value:bpeResourceType::STRING as bpeResourceType,
        f.value:completionTime::TIMESTAMP as completionTime,
        f.value:durationInSeconds::INT as durationInSeconds,
        f.value:price::FLOAT as price,
        f.value:pricePerSecond::FLOAT as pricePerSecond,
        f.value:startTime::TIMESTAMP as startTime,
        f.value:status::STRING as status,
        f.value:stepId::STRING as stepId
    FROM
        ICA_PIPELINE_ANALYSES_VIEW_project,
        LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
    WHERE
        f.value:status::STRING = 'FAILED'
        AND f.value:stepId::STRING NOT LIKE '%Workflow%';

    Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts

    • Sample name from the file names.

    • Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.

    The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:

    SELECT DISTINCT
        SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
        f.value:column_2::STRING as metric,
        f.value:column_3::FLOAT as value
    FROM
        DRAGEN_METRICS_VIEW_project,
        LATERAL FLATTEN(input => ANALYSIS_DATA) f
    WHERE
        FILE_NAME LIKE '%wgs_coverage_metrics.csv'
        AND (
            f.value:column_2::STRING = 'Aligned bases in genome'
            OR f.value:column_2::STRING = 'Aligned bases'
        )
    ORDER BY
        sample_name;

    Lastly, you can combine these views (or rather intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:

    • Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.

    • Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.

    • Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.

    WITH flattened_dragen_scrna AS (
    SELECT DISTINCT
        SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
        ANALYSIS_UUID,
        f.value:column_2::STRING as metric,
        f.value:column_3::FLOAT as value
    FROM
        DRAGEN_METRICS_VIEW_project,
        LATERAL FLATTEN(input => ANALYSIS_DATA) f
    WHERE
        FILE_NAME LIKE '%scRNA.metrics.csv'
        AND (
            f.value:column_2::STRING = 'Invalid barcode read'
            OR f.value:column_2::STRING = 'Passing cells'
        )
    ),
    pipeline_table AS (
    SELECT
        PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
        PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
        PIPELINE_ANALYSIS_DATA:status::STRING as status,
        PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
        PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
    FROM
        ICA_PIPELINE_ANALYSES_VIEW_project
    WHERE
        PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
        )
    SELECT * FROM flattened_dragen_scrna JOIN pipeline_table
    ON
         flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;

    hashtag
    An example of how to obtain the costs incurred by the individual steps of an analysis

    You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet, you can retrieve the costs of individual steps for every analysis run in the past week:

    SELECT
        USER_NAME as user_name,
        PROJECT_NAME as project,
        SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
        PIPELINE_ANALYSIS_DATA:status as status,
        ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
        PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
        PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
        f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
        f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
        f.value:bpeResourceType::STRING as bpeResourceType,
        f.value:durationInSeconds::INT as durationInSeconds,
        f.value:price::FLOAT as priceStep,
        f.value:status::STRING as status,
        f.value:stepId::STRING as stepId
    FROM
        ICA_PIPELINE_ANALYSES_VIEW_project,
        LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
    WHERE
       PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
    ORDER BY
       priceStep DESC;

    hashtag
    Limitations

    • Data Catalogue views cannot be shared as part of a Bundle.

    • Data size is not shown for views because views are a subset of data.

    • By removing Base from a project, the Data Catalogue will also be removed from that project.

    hashtag
    Best Practices

    As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
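    For example, a minimal sketch of such a filtering query, assuming the tenant-level ICA_PIPELINE_ANALYSES_VIEW_tenant view shares the column layout described above for the project-level view; the project name is a placeholder:

    SELECT PROJECT_NAME, USER_NAME, PIPELINE_ANALYSIS_DATA:reference AS reference
    FROM ICA_PIPELINE_ANALYSES_VIEW_tenant
    WHERE PROJECT_NAME = 'MyProject';

    The result can then be exported via Export to > New table and that table shared instead of the view.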
