On the Schedule page (Projects > your_project > Base > Schedule), you can create jobs that import different types of data you have access to into an existing table.
When creating or editing a schedule, checking the Active box enables automatic import: the job then runs at 10-minute intervals. In addition, for both active and inactive schedules, a manual import is performed by selecting the schedule and clicking the Run button.
There are different types of schedules that can be set up:
Files
Metadata
Administrative data
The Files type loads the content of specific files from this project into a table. When adding or editing this schedule, you can define the following parameters:
Active: The job will run automatically if checked
Name (required): The name of the scheduled job
Description: Extra information about the schedule
Source:
Project: All files with the correct naming from this project will be used.
Search for a part of a specific ‘Original Name’ or Tag (required): Enter all or part of the file name, or of the tag, that the files you want to import contain. For example, to import files named sample1_reads.txt, sample2_reads.txt, …, fill in _reads.txt in this field to have all files containing _reads.txt imported into the table.
Generated by Pipelines: Only files generated by the selected pipelines are taken into account. When left empty, files from all pipelines are used.
Target Base Table (required): The table to which the information will be added. A drop-down list of all created tables is shown, which means the table must be created before the schedule can be set up.
Write preference: Defines how the data is handled when the table already contains data, for example whether existing data may be overwritten.
Data format (required): CSV, TSV, JSON, AVRO, PARQUET
Delimiter: Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be specified as custom.
Custom delimiter: The custom delimiter that is used in the file.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options - Ignore unknown values: This applies to CSV-formatted files. You can use this option to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors. (See the example after this list.)
If headers are used: Columns that have matching fields are loaded, columns that have no matching fields are loaded with NULL, and remaining fields are discarded.
If no headers are used: Fields are loaded in order of occurrence; trailing missing fields are loaded with NULL, and trailing additional fields are discarded.
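As an illustration, consider a hypothetical CSV file with three header columns, where the last field is missing from the second data row:

```
sample,coverage,qc_passed
S1,30,true
S2,28
```

With Ignore unknown values enabled, the row for S2 is still loaded and its qc_passed column is set to NULL; without this option, the missing trailing separator results in an error.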
The Metadata type creates two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job loads metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA; the process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler adds provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify potential sources of errors or biases. For example, the following query counts how many times each pipeline was executed and sorts the result accordingly:
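A minimal sketch, assuming the table exposes a PIPELINE_NAME column; the actual column name may differ in your table schema:

```sql
-- Count executions per pipeline and sort by frequency.
-- PIPELINE_NAME is an assumed column name; adjust to your schema.
SELECT PIPELINE_NAME,
       COUNT(*) AS EXECUTION_COUNT
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY EXECUTION_COUNT DESC;
```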
To obtain a similar overview for failed runs only, you can execute the following SQL query:
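A minimal sketch under the same assumptions, additionally assuming a STATUS column with an illustrative 'Failed' value:

```sql
-- Count failed executions per pipeline.
-- STATUS and the value 'Failed' are assumptions; adjust to your schema.
SELECT PIPELINE_NAME,
       COUNT(*) AS EXECUTION_COUNT
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY EXECUTION_COUNT DESC;
```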
When adding or editing this schedule you can define the following parameters:
Active: The job will run automatically if checked
Name (required): The name of this scheduled job
Description: Extra information about the schedule
Source (required):
Project: All metadata from this project will be added
Account: All metadata from every project in the account will be added. This feature is only available to the tenant admin. When a tenant admin creates the tenant-wide table with metadata in a project and invites other users to this project, these users will see this table as well.
Anonymize references: When selected, the references will not be added
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
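Once the metadata job has run, you can inspect the loaded data directly; for example, a quick preview of the first rows of the table created by this schedule type:

```sql
-- Preview the sample metadata table created by the Metadata schedule.
SELECT *
FROM ICA_PROJECT_SAMPLE_META_DATA
LIMIT 10;
```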
The Administrative data type automatically creates a table and loads administrative data into it. A usage overview of all executions is considered administrative data.
When adding or editing this schedule the following parameters can be defined:
Active: The job will run automatically if checked
Name (required): The name of this scheduled job
Description: Extra information about the schedule
Source:
Project: All administrative data from this project will be added
Account: All administrative data from every project in the account will be added. This feature is only available to the tenant admin. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Anonymize references: When checked, any platform references will not be added
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.
When clicking the Run button, or Save & Run when editing, the schedule starts the job of importing the configured data into the configured tables. This way, the schedule can be run manually. The result of the job can be seen in the tables.
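As a quick sanity check after a manual run, you can, for example, count the rows in the target table; MY_TARGET_TABLE below is a placeholder for the Target Base Table configured in the schedule:

```sql
-- Verify that the import added rows; MY_TARGET_TABLE is a placeholder name.
SELECT COUNT(*) AS ROW_COUNT
FROM MY_TARGET_TABLE;
```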