Base: SnowSQL

You can access the databases and tables within the Base module using the snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. In this tutorial, we describe how to obtain the token and use it to access the Base module. This tutorial does not cover how to install and configure snowSQL.

Obtaining OAuth token and URL

Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.


After clicking the button Create OAuth access token, the pop-up authenticator is displayed.

After clicking the button Generate snowSQL command, the pop-up authenticator presents the snowSQL command.

Copy the snowSQL command and run it in the console to log in.

You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.

Example:

API Call:

curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/projects/<PROJECT ID>/base:connectionDetails' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'X-API-Key: <YOUR KEY>'

Response:

{
  "authenticator": "oauth",
  "accessToken": "XXXXXXXXXX",
  "dnsName": "use1sf01.us-east-1.snowflakecomputing.com",
  "userPrincipalName": "xxxxx",
  "databaseName": "xxxxx",
  "schemaName": "xxx",
  "warehouseName": "xxxxxx",
  "roleName": "xxx"
}

Template snowSQL command:

snowsql -a use1sf01.us-east-1 -u <userPrincipalName> --authenticator=oauth -r <roleName> -d <databaseName> -s PUBLIC -w <warehouseName> --token="<accessToken>"

Now you can perform a variety of tasks such as:

  1. Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.

  2. Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.

  3. Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.

Overall, the snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage the data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.

Example Queries:

Show all tables in the database:

SHOW TABLES;

Create a new table:

CREATE TABLE demo1(sample_name VARCHAR, count INT);

List records in a table:

SELECT * FROM demo1;

Load data from a file: To load data from a file, start by creating a staging area in the internal storage using the following command:

CREATE STAGE myStage;

You can then upload the local file to the internal storage using the following command:

PUT file:///path/to/data.tsv @myStage;

You can check whether the file was uploaded properly using the LIST command:

LIST @myStage;

Finally, load the data using the COPY INTO command. The command below assumes data.tsv is a tab-delimited file. You can easily modify it to import a JSON file by setting TYPE = 'JSON'.

COPY INTO demo1(sample_name, count) FROM @myStage/data.tsv FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '\t');

Load data from a string: If you have data as a JSON string, you can insert it into the table using the following commands.

SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
INSERT INTO demo1(sample_name, count)
SELECT
    PARSE_JSON($myJSON_str):sample_name::STRING,
    PARSE_JSON($myJSON_str):count::INT;

Load data into specific columns: If you only want to load sample_name into the table, remove count from the column list and the value list, as shown below:

SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
INSERT INTO demo1(sample_name)
SELECT
    PARSE_JSON($myJSON_str):sample_name::STRING;

List the views of the database to which you are connected. Shared database and catalogue views are created within the project database, so they will be listed. However, this does not show views which are granted via another database or role, or from bundles.

SHOW VIEWS;

Show grants, both grants directly on the tables and views and grants to roles which in turn have grants on tables and views.

SHOW GRANTS;

Base: Access Tables via Python

You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas DataFrame, the data can be processed further. In this tutorial, we describe how to create a Python script which retrieves the data and visualizes it using the Dash framework. The script contains the following parts:

  • Importing dependencies and variables.

  • Function to fetch the data from Base table.

  • Creating and running the Dash app.

Importing dependencies and variables

This part of the code imports the dependencies, which have to be installed on your machine (for example with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from a local file named config.py.
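The config file itself is not part of this tutorial; a minimal sketch of what config.py could contain (the variable names match the import in the snippet below, the values are placeholders you must replace with your own) is:

# config.py -- placeholder values, replace with your own credentials
API_KEY = "your-ica-api-key"        # ICA API key sent in the X-API-Key header
PROJECT_ID = "your-ica-project-id"  # ID of the ICA project with Base enabled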

from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px

from config import API_KEY, PROJECT_ID
import requests
import snowflake.connector
import pandas as pd

Function to fetch the data from Base table

We will create a function called fetch_data to obtain the data from the Base table. It can be broken into several logically separated parts:

  • Retrieving the token to access the Base table, together with the other connection variables, using the API.

  • Establishing the connection using the token.

  • The SQL query itself. In this particular example, we extract values from two tables, Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g. the number of bases with quality at least 30, Q30_BASES) and metadata in the column ica, which needs to be flattened to access the Execution_reference value. Both tables are joined on this Execution_reference value.

  • Fetching the data using the connection and the SQL query.

Here is the corresponding snippet:

def fetch_data():
    # retrieve the Base OAuth token and connection details via the ICA API
    url = 'https://ica.illumina.com/ica/rest/api/projects/' + PROJECT_ID + '/base:connectionDetails'

    # set the API headers
    headers = {
        'X-API-Key': API_KEY,
        'accept': 'application/vnd.illumina.v3+json'
    }

    response = requests.post(url, headers=headers)

    # establish the Snowflake connection using the OAuth token
    ctx = snowflake.connector.connect(
        account=response.json()['dnsName'].split('.snowflakecomputing.com')[0],
        authenticator='oauth',
        token=response.json()['accessToken'],
        database=response.json()['databaseName'],
        role=response.json()['roleName'],
        warehouse=response.json()['warehouseName']
    )
    cur = ctx.cursor()

    # flatten the ica metadata column and join both tables on Execution_reference
    sql = '''
    WITH flattened_Demo_Ingesting_Metrics AS (
        SELECT
            flattened.value::STRING AS execution_reference_Demo_Ingesting_Metrics,
            t1.SAMPLEID,
            t1.VARIANTS_TOTAL_PASS,
            t1.VARIANTS_SNPS_PASS,
            t1.Q30_BASES,
            t1.READS_WITH_MAPQ_3040_PCT
        FROM
            Demo_Ingesting_Metrics t1,
            LATERAL FLATTEN(input => t1.ica) AS flattened
        WHERE
            flattened.key = 'Execution_reference'
    ) SELECT
        f.execution_reference_Demo_Ingesting_Metrics,
        f.SAMPLEID,
        f.VARIANTS_TOTAL_PASS,
        f.VARIANTS_SNPS_PASS,
        t2."EXECUTION_REFERENCE",
        t2.END_DATE,
        f.Q30_BASES,
        f.READS_WITH_MAPQ_3040_PCT
    FROM
        flattened_Demo_Ingesting_Metrics f
    JOIN
        BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL t2
    ON
        f.execution_reference_Demo_Ingesting_Metrics = t2."EXECUTION_REFERENCE";
    '''

    # fetch the query result as a pandas DataFrame
    cur.execute(sql)
    data = cur.fetch_pandas_all()
    return data

Creating and running the Dash app

Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE on the x axis and the column chosen from the dropdown on the y axis.

df = fetch_data()

app = Dash(__name__)
#server = app.server


app.layout = html.Div([
    html.H1("My Dash Dashboard"),

    html.Div([
        html.Label("Select X-axis:"),
        dcc.Dropdown(
            id='x-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[5]  # default value
        ),
        html.Label("Select Y-axis:"),
        dcc.Dropdown(
            id='y-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[2]  # default value
        ),
    ]),

    dcc.Graph(id='scatterplot')
])


@callback(
    Output('scatterplot', 'figure'),
    Input('y-axis-dropdown', 'value')
)
def update_graph(value):
    # only the y-axis dropdown drives the plot; the x axis is fixed to END_DATE
    return px.scatter(df, x='END_DATE', y=value, hover_name='SAMPLEID')

if __name__ == '__main__':
    app.run(debug=True)

Now we can create a single Python script called dashboard.py by concatenating the three snippets above and running it, for example with python dashboard.py. The dashboard will be accessible in the browser on your machine; with the settings above, Dash serves it at http://127.0.0.1:8080 by default if not otherwise configured (the standard Dash default is http://127.0.0.1:8050).
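If the dashboard comes up empty or fails to start, it can help to check the query result on its own before involving Dash. A minimal check, assuming the fetch_data function above:

# quick sanity check of the fetched data, independent of the Dash app
df = fetch_data()
print(df.shape)    # number of rows and columns returned by the query
print(df.head())   # first few records, e.g. SAMPLEID, END_DATE, Q30_BASES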

Base Basics

Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.

This tutorial provides an example of the basic operations used with Base, including how to create a table, load the table with data, and query the table.

Prerequisites

• An ICA project with access to Base

  • If you don't already have a project, please follow the instructions in the Project documentation to create a project.

• File to import

  • A tab-delimited gene expression file (sampleX.final.count.tsv). Example format:

HES4-NM_021170-T00001	1392
ISG15-NM_005101-T00002	46
SLC2A5-NM_003039-T00003	14
H6PD-NM_004285-T00004	30
PIK3CD-NM_005026-T00005	200
MTOR-NM_004958-T00006	156
FBXO6-NM_018438-T00007	10
MTHFR-NM_005957-T00008	154
FHAD1-NM_052929-T00009	10
PADI2-NM_007365-T00010	12

Create table

Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data by creating a schema definition from scratch.

1. Go to Projects > your_project > Base > Tables and enable Base by clicking the Enable button.

2. Select Add Table > New Table.

3. Create your table.

  1. To create your table from scratch, select Empty Table from the Create table from dropdown.

  2. Name your table FeatureCounts.

  3. Uncheck the box next to Include reference, to exclude reference data from your table.

  4. Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.

  5. Copy the schema text below and paste it into the text box to create your schema.

  6. Click the Save button.

{
  "Fields": [
    {
      "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
      "Name": "TranscriptID",
      "Type": "STRING",
      "Mode": "REQUIRED",
      "Description": null,
      "DataResolver": null,
      "SubBluebaseFields": []
    },
    {
      "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
      "Name": "ExpressionCount",
      "Type": "INTEGER",
      "Mode": "REQUIRED",
      "Description": null,
      "DataResolver": null,
      "SubBluebaseFields": []
    }
  ]
}

Upload data to load into your table

1. Upload the sampleX.final.count.tsv file with the final counts.

  1. Select the Data tab from the left menu.

  2. Click on the grey box to choose the file to upload, or drag and drop sampleX.final.count.tsv into the grey box.

  3. Refresh the screen.

  4. The uploaded file will appear on the data page after a successful upload.

Create a schedule to load data into your table

Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules check for new files every 24 hours.

In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.

1. Go to Projects > your_project > Base > Schedule and click the + Add New button.

2. Select the option to load the contents from files into a table.

3. Create your schedule.

  1. Name your schedule LoadFeatureCounts.

  2. Choose Project as the source of data for your table.

  3. To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.

  4. Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.

  5. Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.

  6. The .final.count.tsv files that will be loaded into your table are tab-delimited and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:

    • Data format: TSV

    • Delimiter: Tab

    • Header rows to skip: 0

  7. Click the Save button.

4. Highlight your schedule. Click the Run button to run your schedule now. It will take a short time to prepare and load data into your table.

5. Check the status of your job on your Projects > your_project > Activity page.

  1. Click BASE ACTIVITY to view Base activity.

  2. Click the BASE JOBS tab to view the status of scheduled Base jobs.

6. Check the data in the table.

  1. Go back to your Projects > your_project > Base > Tables page.

  2. Double-click your table to view its details. You will land on the SCHEMA DEFINITION page.

  3. Click the PREVIEW tab to view the records that were loaded into your table.

  4. Click the DATA tab to view a list of the files whose data has been loaded into your table.

Query a table

To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.

In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.

SELECT TranscriptID, ExpressionCount FROM FeatureCounts WHERE ExpressionCount > 100;

1. Go to your Projects > your_project > Base > Query page.

2. Paste the above query into the NEW QUERY text box.

3. Click the Run Query button to run your query.

4. View your query results.

5. Save your query for future use by clicking the Save Query button. You will be asked to "Name" the query before clicking on the "Create" button.

Export table data

Find the table you want to export on the "Tables" page under BASE. Go to the table details page by double-clicking the table you want to export.

Click on the Export As File icon and complete the required fields:

1. Name: Name of the exported file.

2. Data Format: A table can be exported in CSV or JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.

  • CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.

  • JSON Format: Selecting JSON format exports the table in a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.

3. Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide "Maximum file size (in bytes)" for the exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.
