All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.
Create a new Table
To create a new table, click Projects > your_project > Base > Tables > +Create. Tables can be created from scratch or from a previously saved template. Views on data from Illumina hardware and processes can be added with the Import from Catalogue option.
If you make a mistake in the order of columns while creating your table, you can still change the column order by switching to Edit definition, as long as you have not yet saved the table. The text editor can swap or move columns, whereas the built-in editor can only delete columns or add columns to the end of the sequence. When editing in text mode, it is best practice to copy the content of the text editor to a notepad before making changes, because corrupted syntax will cause the text to be wiped or reverted when switching between text and non-text mode.
Once a table is saved, its schema can no longer be edited; only new fields can be added. The workaround is to switch to text mode, copy the schema of the table you want to modify, and paste it into a new, empty table where the necessary changes can be made before saving.
Once created, do not try to modify your table column layout via the Query module: even though you can execute ALTER TABLE commands, the table definition and syntax will go out of sync, resulting in processing issues.
Be careful when naming tables that you want to use in bundles. Table names must be unique per bundle, so no two tables with the same name can be part of the same bundle.
Empty Table
To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.
Table information
The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers or underscores. The description is optional.
References
Including or excluding references is done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns are added to the table which can contain references to the data on the platform:
data_reference: reference to the data element in the Illumina platform from which the record originates
data_name: original name of the data element in the Illumina platform from which the record originates
sample_reference: reference to the sample in the Illumina platform from which the record originates
sample_name: name of the sample in the Illumina platform from which the record originates
pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates
pipeline_name: name of the pipeline in the Illumina platform from which the record originates
execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates
account_reference: reference to the account in the Illumina platform from which the record originates
account_name: name of the account in the Illumina platform from which the record originates
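As a sketch of how these reference columns can be used once the table contains data, the following query (run from the Base Query module) groups records per originating sample and pipeline; the table name my_results is hypothetical and only serves as an illustration:

-- Hypothetical example: count records per originating sample and pipeline
SELECT
    sample_name,
    pipeline_name,
    COUNT(*) AS record_count
FROM my_results
GROUP BY sample_name, pipeline_name
ORDER BY record_count DESC;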
Schema
In an empty table, you can create a schema by adding a field with the +Add button for each column of the table and defining it. At any time during the creation process, it is possible to switch to the edit definition mode and back. The definition mode shows the JSON code, whereas the original view shows the fields in a table.
Each field requires:
a unique name (*1) with optional description.
a type
String – collection of characters
Bytes – raw binary data
Integer – whole numbers
Float – fractional numbers (*2)
Numeric – any number (*3)
Boolean – only options are “true” or “false”
Timestamp – Stores number of (milli)seconds passed since the Unix epoch
Date – Stores date in the format YYYY-MM-DD
Time – Stores time in the format HH:MI:SS
Datetime – Stores date and time information in the format YYYY-MM-DD HH:MI:SS
Record – has a child field
Variant – can store a value of any other type, including OBJECT and ARRAY
a mode
Required – Mandatory field
Nullable – Field is allowed to have no value
Repeated – Multiple values are allowed in this field (will be recognized as array in Snowflake)
(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your field names, as this will lead to SQL compilation errors.
(*2) Float values will be exported differently depending on the output format. For example, JSON will use scientific notation, so verify that your downstream processing methods support this.
(*3) Defining the precision when creating tables with SQL is not supported, as this will result in rounding issues.
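For comparison, creating a comparable table with SQL could look like the sketch below. The table and column names are hypothetical, and the Snowflake types shown (VARCHAR, INTEGER, FLOAT, BOOLEAN, DATE) are common equivalents of the Base types listed above rather than a guaranteed one-to-one mapping. No precision is defined, in line with (*3):

-- Hypothetical example: creating a simple table via SQL without precision definitions
CREATE OR REPLACE TABLE sample_metrics (
    sample_id VARCHAR,
    read_count INTEGER,
    mean_coverage FLOAT,
    qc_passed BOOLEAN,
    run_date DATE
);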
From template
Users can create their own template by making a table and turning it into a template at Projects > your_project > Base > Tables > your_table > Manage (top right) > Save as template.
If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table, but the schema will be pre-filled. The template-based schema can still be edited.
Table information
Table status
The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:
Available: Ready to be used, both with or without data
Pending: The system is still processing the table; a process is probably running to fill the table with data
Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the Show deleted tables/views button
Additional Considerations
Tables created empty or from a template become available faster.
When copying a table with data, it can remain in the Pending status for a longer period of time.
Clicking on the page's refresh button will update the list.
Table details
For any available table, the following details are shown:
Table information: Name, description, status, number of records and data size.
The data size of tables with the same layout and content may vary slightly, depending on when and how the data was written by Snowflake.
Definition: An overview of the table schema, also available as text. Fields can be added to the schema but not deleted. Tip for deleting fields: copy the schema as text and paste it into a new empty table where the schema is still editable.
Preview: A preview of the first 50 rows of the table (when data has been uploaded into the table). Select Show details to see record details.
Source Data: The files that are currently uploaded into the table. You can see the Load Status of the files, which can be Prepare Started, Prepare Succeeded or Prepare Failed, and finally Load Succeeded or Load Failed.
Table actions
From within the details of a table it is possible to perform the following actions from the Manage menu (top right) of the table:
Edit: Add fields to the table and change the table description.
Copy: Create a copy of this table in the same or a different project. To copy to another project, data sharing must be enabled in the details of the original project, and the user must have access to both the original and the target project.
Export as file: Export this table as a CSV or JSON file. The exported file can be found in a project where the user has access to download it.
Save as template: Save the schema, or an edited form of it, as a template.
Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It is also possible to load data into a table manually or automatically via a pre-configured job; this can be done on the Schedule page.
Delete: Delete the table.
Manually importing data to your Table
To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > Manage (top right) > Add Data
Data selection
The data selection screen will show options to select the structure as CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation) and the location of your source data. In the first step, you select the data format and the files containing the data.
Data format (required): Select the format of the data which you want to import.
Write preference: Define if data can be written to the table only when the table is empty, if the data should be appended to the table or if the table should be overwritten.
Delimiter: The delimiter used in the delimiter-separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.
Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table.
Most of the advanced options are legacy functions and should not be used. The only exceptions are:
Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL, and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence, trailing missing fields are loaded with NULL, and trailing additional fields are discarded.
Data import progress
To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective actions if the input mismatched the table design and run the import again (with a new copy of the file, as each input file can only be used once).
If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.
List of table data sources
To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This lists all the source data files, including those that failed to be imported. To prevent double entries, these files cannot be used again for import. The load status remains empty while the data is being processed and is set to Load Succeeded or Load Failed after loading completes.
How to load array data in Base
Base Table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (i.e. String), or the Variant type.
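As a sketch of the Repeated option, assume a hypothetical table my_table with a Required string field id and a Repeated string field tags. Because the Repeated field is recognized as an array in Snowflake, it can later be expanded into one row per element with the FLATTEN function:

-- Hypothetical example: expand a Repeated (array) field into one row per element
SELECT
    t.id,
    f.value::STRING AS tag
FROM my_table t,
LATERAL FLATTEN(input => t.tags) f;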
Parsing nested JSON data
If you have a nested JSON structure, you can import it into individual fields of your table.
For example, if your nested JSON structure contains values for a, b and c and you want them imported into a table with a, b and c as integer columns, you need to create a matching table. This can be done either via the schema editor described above or via the SQL command CREATE OR REPLACE TABLE json_data ( a INTEGER, b INTEGER, c INTEGER );
Format your JSON data to have single lines per structure.
Finally, create a schedule to import your data or perform a manual import.
The resulting table will contain one row per JSON structure, with the integer values in columns a, b and c.
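Assuming the table was created as json_data in the example above, a quick query (sketch) can confirm that each single-line JSON structure produced one row:

-- Sketch: inspect the imported rows
SELECT a, b, c
FROM json_data
ORDER BY a;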
Data Catalogue
Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions. Available Catalogue views include:
CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)
CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)
CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)
CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)
Preconditions for view content
DRAGEN metrics will only have content when DRAGEN pipelines have been executed.
Analysis views will only have content when analyses have been executed.
Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.
When you use your own storage in a project, metrics cannot be collected and thus the DRAGEN METRICS-related views cannot be used.
Who can add or remove Catalogue data (views) to a project?
Members of a project who have both Base Contributor rights and Project Contributor or Administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can only remove catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but cannot add them again.
Adding Catalogue data (views) to your project
To add Catalogue data,
Go to Projects > your_project > Base > Tables.
Select Add table > Import from Catalogue.
A list of available views will be displayed. (Note that views which are already part of your project are not listed.)
Select the table you want to add and choose +Select.
Catalogue data will have View as type, the same as tables which are linked from other projects.
Removing Catalogue data (views) from your project
To delete Catalogue data,
Go to Projects > your_project > Base > Tables.
Select the table you want to delete and choose Delete.
A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.
Catalogue table details (Table Schema Definition)
In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see the Table details section above. In addition, the following fields are shown for Catalogue views:
Description: An explanation of which data is contained in the view.
Category: The identification of the source system which provided the data.
Tenant/project: Appended to the view name as _tenant or _project. Determines if the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.
Querying views
In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.
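As a minimal sketch before the more elaborate examples below, individual values can already be pulled out of the VARIANT column with Snowflake's colon path syntax and a cast, without any flattening:

-- Sketch: extract single values from the PIPELINE_ANALYSIS_DATA VARIANT column
SELECT
    USER_NAME,
    PIPELINE_ANALYSIS_DATA:reference::STRING AS reference,
    PIPELINE_ANALYSIS_DATA:status::STRING AS status
FROM ICA_PIPELINE_ANALYSES_VIEW_project
LIMIT 10;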
Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing entries to be filtered based on specific criteria. It is important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the USER_NAME value in the SQL query below is "recycled" for every flattened row.
The following query extracts
USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.
PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.
Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.
Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow', which allows the user to find the steps that failed.

-- Failed, non-workflow steps across all analyses in the project
SELECT
    USER_NAME as user_name,
    PIPELINE_ANALYSIS_DATA:reference as reference,
    PIPELINE_ANALYSIS_DATA:price as price,
    PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
    f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
    f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
    f.value:bpeResourceType::STRING as bpeResourceType,
    f.value:completionTime::TIMESTAMP as completionTime,
    f.value:durationInSeconds::INT as durationInSeconds,
    f.value:price::FLOAT as price,
    f.value:pricePerSecond::FLOAT as pricePerSecond,
    f.value:startTime::TIMESTAMP as startTime,
    f.value:status::STRING as status,
    f.value:stepId::STRING as stepId
FROM
    ICA_PIPELINE_ANALYSES_VIEW_project,
    LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
    f.value:status::STRING = 'FAILED'
    AND f.value:stepId::STRING NOT LIKE '%Workflow%';
Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts
Sample name from the file names.
Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.
The query looks only for SAMPLE.wgs_coverage_metrics.csv files and sorts the result based on the sample name:

-- Aligned-bases metrics per sample from the WGS coverage metrics files
SELECT DISTINCT
    SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
    f.value:column_2::STRING as metric,
    f.value:column_3::FLOAT as value
FROM
    DRAGEN_METRICS_VIEW_project,
    LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
    FILE_NAME LIKE '%wgs_coverage_metrics.csv'
    AND (
        f.value:column_2::STRING = 'Aligned bases in genome'
        OR f.value:column_2::STRING = 'Aligned bases'
    )
ORDER BY
    sample_name;
Lastly, you can combine these views (or rather intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:
Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.
Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.
Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.

WITH flattened_dragen_scrna AS (
    SELECT DISTINCT
        SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
        ANALYSIS_UUID,
        f.value:column_2::STRING as metric,
        f.value:column_3::FLOAT as value
    FROM
        DRAGEN_METRICS_VIEW_project,
        LATERAL FLATTEN(input => ANALYSIS_DATA) f
    WHERE
        FILE_NAME LIKE '%scRNA.metrics.csv'
        AND (
            f.value:column_2::STRING = 'Invalid barcode read'
            OR f.value:column_2::STRING = 'Passing cells'
        )
),
pipeline_table AS (
    SELECT
        PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
        PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
        PIPELINE_ANALYSIS_DATA:status::STRING as status,
        PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
        PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
    FROM
        ICA_PIPELINE_ANALYSES_VIEW_project
    WHERE
        PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
)
SELECT * FROM flattened_dragen_scrna JOIN pipeline_table
ON
    flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;
An example of how to obtain the costs incurred by the individual steps of an analysis
You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of the individual steps of an analysis. The following SQL snippet retrieves the costs of the individual steps for every analysis run in the past week.

-- Per-step costs for all analyses started in the past week, most expensive steps first
SELECT
    USER_NAME as user_name,
    PROJECT_NAME as project,
    SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
    PIPELINE_ANALYSIS_DATA:status as status,
    ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
    PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
    PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
    f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
    f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
    f.value:bpeResourceType::STRING as bpeResourceType,
    f.value:durationInSeconds::INT as durationInSeconds,
    f.value:price::FLOAT as priceStep,
    f.value:status::STRING as status,
    f.value:stepId::STRING as stepId
FROM
    ICA_PIPELINE_ANALYSES_VIEW_project,
    LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
    PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
ORDER BY
    priceStep DESC;
Limitations
Data Catalogue views cannot be shared as part of a Bundle.
Data size is not shown for views because views are a subset of data.
When Base is removed from a project, the Data Catalogue is also removed from that project.
Best Practices
As tenant-level Catalogue views can contain sensitive data, it is best to save the (filtered) data you need to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
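As an illustration, a query along the following lines could be used to keep only the relevant records before exporting the result as a new table; the view name comes from the list above, but the filter column and value are hypothetical and should be adapted to your own data:

-- Sketch: filter a tenant-level catalogue view before exporting the result as a new table
SELECT *
FROM CLARITY_SAMPLE_VIEW_tenant
WHERE SAMPLE_NAME LIKE 'PROJ123%';  -- hypothetical filter column and value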