The Partek Flow Metadata Tab has an option to import data, and is where sample/cell attributes are managed. This is also where users can modify the location of the project output folder.
The Metadata tab can be used to import data. To add samples to the project, click Add data under Import, different import options are displayed using the cascading menu (Figure 1).
This method adds samples by creating them simultaneously as the data gets imported into a project. The sample names are assigned automatically based on filenames.
Before proceeding, it is ideal that you have already transfer the data you wish to analyze in a folder (with appropriate permissions) within the Partek Flow server. Please seek assistance from your system administrator in uploading your data directly.
Select the Automatically create samples from files button. The next screen will feature a file browser that will show any folders you have access to in the Partek Flow server (Figure 2). Select a folder by clicking the folder name. Files in the selected folder that have file formats that can be imported by Partek Flow will be displayed and tick-marked on the right panel. You can exclude some files from the folder by unselecting the check mark on the left side of the filename. When you have made your selections, click the Create sample button.
Alternatively, files can also be uploaded and imported into the project from the user's local computer -only use this option if your file size is less than 500MB. Select the My computer radio button (Figure 3) and the options of selecting the local file and the upload (destination) directory will appear. Only one file at a time can be imported to a project using this method.
Multiple data files can be compressed a single .zip file before uploading. Partek Flow will automatically unzip the files and put them in the upload directory.
Please be aware that the use of the method illustrated in Figure 3 highly depends on the speed and latency of the Internet connection between the user's computer and the server where Partek Flow is installed. Given the large size of most genomics data sets, is not recommended in most cases.
After successful creation of samples from files, the Data tab now contains a Sample management table (Figure 4). The Sample name column in the table is automatically generated based on the filenames and the table is sorted in alphabetical order.
Clicking the on the** Show data files** link on the lower right side of the sample management table will expand the table and reveal the filenames of the files associated with each sample. Conversely, clicking on Hide data files will hide the file information.
The columns in the expanded view show the files associated with each sample. Files are organized by file type. Any filename extensions that indicate compression (such as .gz) are not shown.
Once a sample is created in a project, the files associated with it can be modified. In the expanded view, mouse over the +/- column of a sample. The highlighted icons will correspond to the options for the sample on that row.
Samples can be added one at a time by selecting the Create a new blank sample option (Figure 5). In the following dialog box, type a sample name and click Create. This process creates a sample entry in the sample management table but there is no associated file with it, hence it is a "blank sample."
Expanding the Sample management table by clicking Show data files on the lower left corner of the table will reveal the option to associate files to the blank sample.
Alternatively, if you have a matrix of data, such as raw read count data in text format, select Import count matrix. The requirements of this text file are listed below:
The file contains numeric values in a tab-delimited format, samples can be on rows while features (e.g.gene names) are in columns, or vice versa
The file contains unique sample IDs and feature IDs
If the data contains sample attribute information, all these attributes have to be ether
The leftmost columns when samples are on rows (Figure 6)
The first few rows when samples are on columns (Figure 7)
Like all other input files, you can upload the file from the Partek Flow server, My Computer or via a URL. Uploading the file brings up a file preview window (Figure 8). The preview of the first few rows and columns of the text file should help you determine on which rows/columns the relevant counts are located (the preview will display up to 100 rows and 100 columns). Inspect the text preview and indicate the orientation of the text file under File format>Input format.
If the read counts are based on a compatible annotation file in Partek Flow, you can specify that annotation file under Gene/feature annotation. Select the appropriate genome build and annotation model for your count data. Select the Contain sample attributes checkbox if your data includes additional sample information.
The example above is showing an example text file with samples listed on rows. The gene ID is compatible with the hg19 RefSeq Transcripts - 2016-08-01 annotation model. Under the Column information and Row information sections, indicate the location of the Sample ID, which in this case is on Column 1. Indicate the sample attribute location by marking where it starts, which in the example is at Column 2. Mark the Feature ID, which in this case are gene IDs and starts at Column 4.
If the data has been log transformed, specify the base under Counts format.
The project output directory is the folder within the Partek Flow server where all output files produced during analysis will be stored.
The default directory is configured by the Partek Flow Administrator under the Settings menu (under System Preferences > Default project output directory).
If the user does not override the default, the task output will go to a subdirectory with the name of the Project.
After samples have been added in the project, additional information about the samples can be added. Information such as disease type, age, treatment, or sex can be annotated to the data by assigning the Attributes for each sample.
Certain tasks in Partek Flow, such as Gene-Specific Analysis, require that samples be assigned attributes in order to do statistical comparisons between groups or samples. As attributes are added to the project, additional columns in the sample management table will be created.
Attributes can be managed or created within a project. Under the Data tab, click the button to open the Manage attributes page (Figure 9).
To prepare for later data analysis using statistical tools, attributes can either be categorical or numeric (i.e., continuous).
For categorical attributes, there are two levels of visibility. Project-specific categorical attributes are visible only within the current project. System-wide categorical attributes are visible across all the projects within the Partek Flow server, and are useful for maintaining uniformity of terms. Importing samples in a new project will retain the system-wide attributes, but not the project-specific attributes.
A feature of Partek Flow is the use of controlled vocabulary for categorical attributes, allowing samples to be assigned only within pre-defined categories. It was designed to effectively manage content and data and allow teams to share common definitions. The use of standard terms minimizes confusion.
To add a categorical attribute in the Manage attributes page, click the Add new attribute (Figure 10). In the dialog box, type a Name for the attribute, select the Categorical radio button next to Attribute type, select the visibility of the attribute and then click the Add button.
Repeat the process for additional attributes of the samples in your study. When done, click Back to sample management table. Categorical attributes will default to Project-specific visibility.
Click an attribute name to drag and drop can change the order of the attributes displayed on the data tab. Click on group name to drag and drop vertically can change the order of the group name, which can be reflected on visualization.
To add a numeric attribute in the Manage attributes page, click the Add new attribute. In the dialog box (Figure 13), type a Name for the attribute, select the Numeric radio button next to Attribute type, and then click the Add button. Some optional parameters for numeric attributes include the Minimum value, Maximum value, and Units. When done, click Add to return to the Manage attributes page. Repeat the process add more numeric attributes. When done, click Back to sample management table.
Since system-wide attributes do not have to be created by the current user, they only need to be added to the sample management table in a project.
In the Data tab, click Add a system-wide attribute button. In the dialog box that follows (Figure 14), a drop down menu is located next to Add attribute where you can select the System-wide attribute you would like to add to the project. Once selected, it will be recognized automatically as either Categorical, system-wide or the Numeric attribute.
For an System-wide categorical attribute, the different categories are listed and you have the option of pre-filling the columns with N/A (or any other category within the attribute). Click Add column and you will return to the Data Tab.
After adding all the desired attributes to a project, the sample management table will show a new column for each attribute (Figure 15). The columns will initially as "N/A", as the samples have not yet been categorized or assigned a value. To edit the table, click Edit attributes. Assign the sample attributes by using a drop down for categorical attributes (controlled vocabulary) or typing with a keyboard for numeric attributes.
When all the attributes have been entered, click Apply changes and the sample management table will be updated. After editing the sample table, make sure there are no fields with blank or N/A values before proceeding. To rename or delete attributes, click Manage attributes from the Data tab to access the Manage attributes page.
Another way to assign attributes to samples in the Data tab is to use a text file that contains the table of attributes and categories/values. This table is prepared outside of Partek Flow using any text editing software capable of saving tab-separated text files.
Using a text editor, prepare a table containing the attributes. An example is shown in Figure 16. There should only be one tab between columns with no extra tabs after the last column. In this particular example, the first column contains the filename and the text file is saved as Sampleinfo.txt.
The first row of the table in the text file contains the attributes (as headers). The first column of the table in the text file, regardless of the header of the first column, should contain either the sample names or the file names of the samples already added in Partek Flow. The first column is the unique identifier that will match the samples to the correct values or categories.
To upload sample attributes, click Assign sample attributes from a file in the Data tab. Then indicate where the attribute text file is stored and navigate to it. Partek Flow will parse the text file and present attributes that will be available for import (Figure 17).
Select the attributes you want to import by clicking the Import check box. Imported attributes that do not currently exist in the project will create new project-specific attributes.
You can change the name of a specific attribute by editing the Attribute name text box. Columns containing letter characters are automatically selected as categorical attributes. Columns containing numbers are suggested to be numeric attributes and can be changed to categorical using the drop down menu under Attribute type.
The first column is always the unique identifier and can refer only to File names or Sample names.
If using Sample names in the first column, they must match the entries of the Sample name column in the Sample management table.
If using File names in the first column, use the filenames shown in the fastq column of the expanded sample management table (see Figure 4) then add the extension .gz. All filenames must include the complete file extension (e.g., Samplename.fastq.gz).
The header name of the first column of the table (top left cell of our text table) is irrelevant but should not be left blank. Whether the first column contains File names or Sample names will be chosen during the process.
The last column cannot have empty values
Missing data (blank cells) can only be handled if the attribute is numeric. If it is categorical, please put a character in it.
It is advisable to use Sample name as the first column identifier when:
Samples are associated with more than one file (for instance, paired-end reads and/or technical replicates).
The files were imported in the SRA format (from the Sequence Read Archive database). In Partek Flow, they are automatically converted to the FASTQ format. Consequently, their filenames would change once they are imported. The new file names can be seen by expanding the sample management table, the new extension would be .fastq.gz.
If attributes are assigned from two different text files, the following will happen:
If the previous attributes have the same header and type (both are either categorical or numeric), the values are overwritten.
If there are different/additional headers on the "second round" of assignment, these new attributes will be appended to the table.
For numeric attributes, a "blank" value will not override a previous value.
The attributes assigned to the samples within the Data Tab will be associated with the samples throughout the project. During the course of analysis, Partek Flow tasks generate various tables and any attributes associated with a sample can be included in the table as optional columns. An example is shown in Figure 18 for a pre-alignment QA/QC report where the Optional columns link on the top left of the table reveal the different sample attributes.
In the Data tab, each sample can be renamed or deleted from the project by clicking the gear icon next to the sample name. The gear icon is readily visible upon mouse over (Figure 19). Sample can only be deleted if no analysis has been performed on the data yet. If any analysis has been performed on the data node, then delete sample operation is invisible. You can perform filter samples in downstream analysis if you want to exclude certain samples in further analysis. Deleting a sample from a project does not delete the associated files, which will remain on the disk.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Click the green icon ( ) to associate additional files or the red icon ( ) to dissociate a file from a sample. You can manually associate multiple files with one sample. Dissociating a file from a sample does not delete the file from the Partek Flow server.
Mouse over the +/- column and click the green icon ( ) to associate a file(s) to the sample. Perform the process for every sample in your project.
The user has the option of specifying an existing folder or creating a new one as the project output directory. To do so, click the icon next to the directory and specify or create a new folder in the dialog box.
Individual categories for the attribute must then be entered. Enter a name of the New category in the New category text box and click Add (Figure 11). The Name of the new category will show up in the table. The category can also be edited by clicking or deleted by clicking (visible on mouse-over). Repeat to add additional categories within the attribute.