Steps definition

This document describes the steps that can be used in our pipelines. A step is the unit of transformation generated by a user interaction with the visual query builder.

Unless explicitly specified, all steps are supported by all backends.

aggregate step

Perform aggregations on one or several columns.

An aggreation step has the following strucure:

{
   name: 'aggregate',
   on: ['column1', 'column2'],
   aggregations:  [
      {
          newcolumn: 'sum_value1',
          aggfunction: 'sum',
          column: 'value1'
      }
    // ...
  ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'aggregate',
   on: ['Group'],
   aggregations:  [
    {
      newcolumn: 'Total',
      aggfunction: 'sum',
      column: 'Value'
    }
  ]
}

Output dataset:

Group Total
Group 1 30
Group 2 16

append step

Appends to the current dataset, one or several datasets resulting from other pipelines. WeaverBird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  name: 'append',
  pipelines: ['pipeline1', 'pipeline2'] // the name of the pipelines to append
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7

dataset1 (saved in the application Vuex store):

Label Group Value
Label 3 Group 1 20
Label 4 Group 2 1

dataset2 (saved in the application Vuex store):

Label Group Value
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'append',
  datasets: ['dataset1', 'dataset2']
}

Output dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

argmax step

Get row(s) matching the maximum value in a given column, by group if groups is specified.

{
  name: 'argmax',
  column: 'value', // column in which to search for max value
  groups: ['group1', 'group2']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: without groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmax',
  column: 'Value'
}

Output dataset:

Label Group Value
Label 3 Group 1 20

Example 2: with groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmax',
  column: 'Value',
  groups: ['Group']
}

Output dataset:

Label Group Value
Label 3 Group 1 20
Label 5 Group 2 10

argmin step

Get row(s) matching the minimum value in a given column, by group if groups is specified.

{
  name: 'argmin',
  column: 'value', // column in which to search for max value
  groups: ['group1', 'group2'] // optional
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: without groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmin',
  column: 'Value'
}

Output dataset:

Label Group Value
Label 2 Group 1 7

Example 2: with groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmin',
  column: 'Value',
  groups: ['Groups']
}

Output dataset:

Label Group Value
Label 2 Group 1 7
Label 4 Group 2 1

concatenate step

This step allows to concatenate several columns using a separator.

{
  name: 'concatenate',
  columns: ['Company', 'Group']
  separator: ' - ' // can be a string of any length
  new_column_name: 'Label' // The new column in which to write the concatenation
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 7
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 10
Company 6 Group 2 5

Step configuration:

{
  name: 'concatenate',
  columns: ['Company', 'Group']
  separator: ' - '
  new_column_name: 'Label'
}

Output dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

convert step

This step allows to columns data types.

{
  name: 'convert',
  columns: ['col1', 'col2']
  data_type: 'integer' // The data type to convert the column into. Can be either
                       // 'integer', 'float', 'text', 'date' or 'boolean'
}

This step is supported by the following backends:

  • Mongo 4.0

Example

Input dataset:

Company Value
Company 1 ‘13’
Company 2 ‘7’
Company 3 ‘20’
Company 4 ‘1’
Company 5 ‘10’
Company 6 ‘5’

Step configuration:

{
  name: 'convert',
  columns: ['Value']
  data_type: 'integer'
}

Output dataset:

Company Value
Company 1 13
Company 2 7
Company 3 20
Company 4 1
Company 5 10
Company 6 5

custom step

This step allows to define a custom query that can’t be expressed using the other existing steps.

{
    name: 'custom',
    query: '$group: {_id: ...}'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example: using Mongo query language

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 7
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 10
Company 6 Group 2 5

Step configuration:

{
  name: 'custom',
  query: '$addFields: { Label: { $concat: [ "$Label", " - ", "$Group" ] ] } }'
}

Output dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

dateextract step

Extract a part of a date (e.g. day, week, year) column. The following properties can be extracted:

  • year': extract ‘year’ from date,
  • month': extract ‘month’ from date,
  • day': extract ‘day of month’ from date,
  • hour': extract ‘hour’ from date,
  • minutes': extract ‘minutes’ from date,
  • seconds': extract ‘seconds’ from date,
  • milliseconds': extract ‘milliseconds’ from date,
  • dayOfYear': extract ‘day of year’ from date,
  • dayOfWeek': extract ‘day of week’ from date,
  • week': extract ‘week number’ from date.

Here’s an example of such a step:

{
    name: 'dateextract',
    column: 'date',
    operation: 'day',
    new_column_name: 'date_day',
    format: '%Y-%m-%d',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date
Company 1 2019-10-06T00:00:00.000Z
Company 1 2019-10-07T00:00:00.000Z
Company 1 2019-10-08T00:00:00.000Z
Company 2 2019-10-06T00:00:00.000Z
Company 2 2019-10-07T00:00:00.000Z
Company 2 2019-10-08T00:00:00.000Z

Step configuration:

{
    name: 'dateextract',
    column: 'Date',
    operation: 'day',
    new_column_name: 'Date_day',
    format: '%Y-%m-%d',
}

Output dataset:

Company Date Date_day
Company 1 2019-10-06T00:00:00.000Z 6
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 8
Company 2 2019-10-06T00:00:00.000Z 6
Company 2 2019-10-07T00:00:00.000Z 7
Company 2 2019-10-08T00:00:00.000Z 8

delete step

Delete a column.

{
    name: 'delete',
    columns: ['my-column', 'some-other-column']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

Step configuration:

{
  name: 'delete',
  columns: ['Company', 'Group']
}

Output dataset:

Value Label
13 Company 1 - Group 1
7 Company 2 - Group 1
20 Company 3 - Group 1
1 Company 4 - Group 2
10 Company 5 - Group 2
5 Company 6 - Group 2

domain step

This step is meant to select a specific domain (using MongoDB terminology).

{
    name: 'domain',
    domain: 'my-domain'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

duplicate step

This step is meant to duplicate a column.

{
    name: 'duplicate',
    column: 'my-column'
    new_column_name: 'my-duplicate'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Value
Company 1 13
Company 2 0
Company 3 20

Step configuration:

{
  name: 'duplicate',
  column: "Company",
  new_column_name: "Company-copy",
}

Output dataset:

Company Value Company-copy
Company 1 13 Company 1
Company 2 0 Company 2
Company 3 20 Company 3

fillna step

Replace null values by a given value in a column.

{
    name: 'fillna',
    column: "foo",
    value: "bar"
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1  
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2  
Company 6 Group 2 5

Step configuration:

{
  name: 'fillna',
  column: "Value",
  value: 0
}

Output dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 0
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 0
Company 6 Group 2 5

filter step

Filter out lines that don’t match a filter definition.

{
    name: 'filter',
    column: 'my-column',
    value: 42,
    operator: 'ne'
}

operator is optional, and defaults to eq. Allowed operators are eq, ne, gt, ge, lt, le, in, nin, matches, notmatches isnull or notnull.

value can be an arbitrary value depending on the selected operator (e.g a list when used with the in operator, or null when used with the isnull operator).

matches and notmatches operators are used to test value against a regular expression.

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

formula step

Add a computation based on other columns or on values. Column names must not be escaped. Strings have to be escaped with quotes.

{
  {
    name: 'formula',
    new_column: 'result', // if the column already exists, overwrites it
    formula: '(Value1 + Value2) / Value3 - Value4 * 2'
  }
}

Example

Input dataset:

Label Value1 Value2 Value3 Value4
Label 1 10 2 3 1
Label 2 1 13 7 3
Label 3 5 20 5 2

Step configuration:

{
  name: 'formula',
  new_column: 'Result',
  formula: '(Value1 + Value2) / Value3 - Value4 * 2'
}

Output dataset:

Label Value1 Value2 Value3 Value4 Result
Label 1 10 2 3 1 2
Label 2 1 13 7 3 -4
Label 3 5 20 5 2 1

join step

Joins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based on columns correspondance. It is similar to a JOIN clause in SQL, or to a VLOOKUP in excel. The joined dataset is the result from the query of the right_pipeline.

The join type can be:

  • ‘left’: will keep every row of the current dataset and fill unmatched rows with null values,

  • ‘inner’: will only keep rows that match rows of the joined dataset.

In the on parameter, you must specify 1 or more column couple(s) that will be compared to determine rows correspondance between the 2 datasets. The first element of a couple is for the current dataset column, and the second for the corresponding column in the right dataset to be joined. If you specify more than 1 couple, the matching rows will be those that find a correspondance between the 2 datasets for every column couple specified (logical ‘AND’).

Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  name: 'join',
  right_pipeline: 'somePipelineReference',
  type: 'left', // or 'inner'
  on: [
    ['currentDatasetColumn1', 'rightDatasetColumn1'],
    ['currentDatasetColumn2', 'rightDatasetColumn2'],
  ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: Left join with one column couple as on parameter

Input dataset:

Label Value
Label 1 13
Label 2 7
Label 3 20
Label 4 1
Label 5 1
Label 6 1

rightDataset (saved in the application Vuex store):

Label Group
Label 1 Group 1
Label 2 Group 1
Label 3 Group 2
Label 4 Group 2

Step configuration:

{
  name: 'join',
  right_pipeline: 'rightDataset',
  type: 'left',
  on: [['Label', 'Label']];
}

Output dataset:

Label Value Group
Label 1 13 Group 1
Label 2 7 Group 1
Label 3 20 Group 2
Label 4 1 Group 2
Label 5 1  
Label 6 1  

Example 2: inner join with different column names in the on parameter

Input dataset:

Label Value
Label 1 13
Label 2 7
Label 3 20
Label 4 1
Label 5 1
Label 6 1

rightDataset (saved in the application Vuex store):

LabelRight Group
Label 1 Group 1
Label 2 Group 1
Label 3 Group 2
Label 4 Group 2

Step configuration:

{
  name: 'join',
  right_pipeline: 'rightDataset',
  type: 'inner',
  on: [['Label', 'LabelRight']];
}

Output dataset:

Label Value LabelRight Group
Label 1 13 Label 1 Group 1
Label 2 7 Label 2 Group 1
Label 3 20 Label 3 Group 2
Label 4 1 Label 4 Group 2

fromdate step

Converts a date column into a string column based on a specified format.

{
    name: 'fromdate',
    column: 'myDateColumn'
    format: '%Y-%m-%d' // For further details on supported format specifiers,
                       // see https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/#datefromstring-format-specifiers
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date Value
Company 1 2019-10-06T00:00:00.000Z 13
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 20
Company 2 2019-10-06T00:00:00.000Z 1
Company 2 2019-10-07T00:00:00.000Z 10
Company 2 2019-10-08T00:00:00.000Z 5

Step configuration:

{
  name: 'fromdate',
  column: 'Date',
  format: '%d/%m/%Y'
}

Output dataset:

Company Date Value
Company 1 06/10/2019 13
Company 1 07/10/2019 7
Company 1 08/10/2019 20
Company 2 06/10/2019 1
Company 2 07/10/2019 10
Company 2 08/10/2019 5

lowercase step

Converts a string column to lowercase.

{
  name: 'lowercase',
  column: 'foo',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
LABEL 1 Group 1 13
LABEL 2 Group 1 7
LABEL 3 Group 1 20

Step configuration:

{
  name: 'lowercase',
  column: 'Label',
}

Output dataset:

Label Group Value
label 1 Group 1 13
label 2 Group 1 7
label 3 Group 1 20

percentage step

Compute the percentage of total, i.e. for every row the value in column divided by the total as the sum of every values in column. The computation can be performed by group if specified. The result is written inplace.

{
  name: 'percentage',
  column: 'bar',
  group: ['foo'] // optional
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 5
Label 2 Group 1 10
Label 3 Group 1 15
Label 4 Group 2 2
Label 5 Group 2 7
Label 6 Group 2 5

Step configuration:


  name: 'percentage',
  new_column: 'Percentage_of_total',
  column: 'Value',
  group: ['Group']
}

Output dataset:

Label Group Value
Label 1 Group 1 0.167
Label 2 Group 1 0.333
Label 3 Group 1 0.5
Label 4 Group 2 0.143
Label 5 Group 2 0.5
Label 6 Group 2 0.357

pivot step

Pivot rows into columns around a given index (expressed as a combination of column(s)). Values to be used as new column names are found in the column column_to_pivot. Values to populate new columns are found in the column value_column. The function used to aggregate data (when several rows are found by index group) must be among sum, avg, count, min or max.

{
 name: 'pivot',
 index: ['column_1', 'column_2'],
 column_to_pivot: 'column_3',
 value_column: 'column_4',
 agg_function: 'sum',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Country Value
Label 1 Country1 13
Label 2 Country1 7
Label 3 Country1 20
Label 1 Country2 1
Label 2 Country2 10
Label 3 Country2 5
label 3 Country2 1

Step configuration:

{
 name: 'pivot',
 index: ['Label'],
 column_to_pivot: 'Country',
 value_column: 'Value',
 agg_function: 'sum',
}

Output dataset:

Label Country1 Country2
Label 1 13 1
Label 2 7 10
Label 3 20 6

rename step

Rename a column.,

{
    name: 'rename',
    oldname: 'old-column-name',
    newname: 'new-column-name'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'rename',
  oldname: 'Label',
  newname: 'Company'
}

Output dataset:

Company Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

replace step

Replace one or several values in a column.

A replace step has the following strucure:

{
   name: 'replace',
   search_column: "column_1",
   to_replace: [
     ['foo', 'bar'], // The first value is the one to be replace, the second is the new value
     [42, 0]
   ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

COMPANY COUNTRY
Company 1 Fr
Company 2 UK

Step configuration:

{
   name: 'replace',
   search_column: "COUNTRY",
   to_replace: [
     ['Fr', 'France']
     ['UK', 'United Kingdom']
   ]
}

Output dataset:

COMPANY COUNTRY
Company 1 France
Company 2 United Kingdom

select step

Select a column. The default is to keep every columns of the input domain. If the select is used, it will only keep selected columns in the output.

{
    name: 'select',
    columns: ['my-column', 'some-other-column']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

Step configuration:

{
  {
    name: 'select',
    columns: ['Value', 'Label']
}
}

Output dataset:

Value Label
13 Company 1 - Group 1
7 Company 2 - Group 1
20 Company 3 - Group 1
1 Company 4 - Group 2
10 Company 5 - Group 2
5 Company 6 - Group 2

sort step

Sort values in one or several columns. Order can be either ‘asc’ or ‘desc’. When sorting on several columns, order of columns specified in columns matters.

{
    name: 'sort',
    columns: [{column: 'foo', order: 'asc'}, {column: 'bar', order: 'desc'}],
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
    name: 'sort',
    columns: [{ column: 'Group', order: 'asc'}, {column: 'Value', order: 'desc' }]
}

Output dataset:

Company Group Value
Label 3 Group 1 20
Label 1 Group 1 13
Label 2 Group 1 7
Label 5 Group 2 10
Label 6 Group 2 5
Label 4 Group 2 1

split step

Split a string column into several columns based on a delimiter.

{
  name: 'split',
  column: 'foo', // the columnn to split
  delimiter: ' - ', // the delimiter can e a strinng of any length
  number_cols_to_keep: 3, // the numer of columns to keep resulting from the
                          // split (starting from first chunk)
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1

Input dataset:

Label Value
Label 1 - Group 1 - France 13
Label 2 - Group 1 - Spain 7
Label 3 - Group 1 - USA 20
Label 4 - Group 2 - France 1
Label 5 - Group 2 - Spain 10
Label 6 - Group 2 - USA 5

Step configuration:

{
  name: 'split',
  column: 'Label',
  delimiter: ' - ',
  number_cols_to_keep: 3,
}

Output dataset:

Label_1 Label_2 Label_3 Value
Label 1 Group 1 Spain 13
Label 2 Group 1 USA 7
Label 3 Group 1 France 20
Label 4 Group 2 USA 1
Label 5 Group 2 France 10
Label 6 Group 2 Spain 5

Example 2: keeping less columns

Input dataset:

Label Value
Label 1 - Group 1 - France 13
Label 2 - Group 1 - Spain 7
Label 3 - Group 1 - USA 20
Label 4 - Group 2 - France 1
Label 5 - Group 2 - Spain 10
Label 6 - Group 2 - USA 5

Step configuration:

{
  name: 'split',
  column: 'Label',
  delimiter: ' - ',
  number_cols_to_keep: 2,
}

Output dataset:

Label_1 Label_2 Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

substring step

Extract a substring in a string column. The substring begins at index start_index (beginning at 1) and stops at end_index. You can specify negative indexes, in such a case the index search will start from the end of the string (with -1 being the last index of the string). Please refer to the examples below for illustration. Neither start_index nor end_index can be equal to 0.

{
  name: 'substring',
  column: 'foo',
  start_index: 1, // 1 = fisrt character
  end_index: -1, // -1 = last character
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: positive start_index and end_index

Input dataset:

Group Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  column: 'Label',
  name: 'substring',
  start_index: 1,
  end_index: 4,
}
Label Value
foo 13
over 7
some 20
a_wo 1
touc 10
toco 5

Example 2: start_index is positive and end_index is negative

Input dataset:

Label Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  name: 'substring',
  column: 'Label',
  start_index: 2,
  end_index: -2,
}

Output dataset:

Label Value
o 13
verflo 7
ome_tex 20
_wor 1
ouca 10
oc 5

Example 3: start_index and end_index are negative

Input dataset:

Label Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  name: 'substring',
  column: 'Label',
  start_index: -3,
  end_index: -1,
}

Output dataset:

Label Value
foo 13
low 7
ext 20
ord 1
can 10
oco 5

text step

Use this step to add a text column where every value will be equal to the specified text.

{
  {
    name: 'text',
    new_column: 'new', // if the column already exists, overwrites it
    text: 'some text'
  }
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Value1
Label 1 10
Label 2 1
Label 3 5

Step configuration:

{
  name: 'text',
  new_column: 'KPI',
  formula: 'Sales'
}

Output dataset:

Label Value1 KPI
Label 1 10 Sales
Label 2 1 Sales
Label 3 5 Sales

todate step

Converts a string column into a date column based on a specified format.

{
    name: 'todate',
    column: 'myTextColumn'
    format: '%Y-%m-%d' // Optional. If not specified, the backend tries to guess
                        // For further details on supported format specifiers,
                        // see https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/#datefromstring-format-specifiers
                        // Note: format is unsupported in Mongo versions older than 4.0
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date Value
Company 1 06/10/2019 13
Company 1 07/10/2019 7
Company 1 08/10/2019 20
Company 2 06/10/2019 1
Company 2 07/10/2019 10
Company 2 08/10/2019 5

Step configuration:

{
  name: 'todate',
  column: 'Date',
  format: '%d/%m/%Y'
}

Output dataset:

Company Date Value
Company 1 2019-10-06T00:00:00.000Z 13
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 20
Company 2 2019-10-06T00:00:00.000Z 1
Company 2 2019-10-07T00:00:00.000Z 10
Company 2 2019-10-08T00:00:00.000Z 5

top step

Return top N rows by group if groups is specified, else over full dataset.

{
  name: 'top',
  groups: ['foo'],
  rank_on: 'bar',
  sort: 'desc', // or 'asc'
  limit: 10
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: top without groups, ascending order

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'top',
  rank_on: 'Value',
  sort: 'asc',
  limit: 3
}

Output dataset:

Label Group Value
Label 4 Group 2 1
Label 6 Group 2 5
Label 2 Group 1 7

Example 2: top with groups, descending order

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'top',
  groups: ['Group'],
  rank_on: 'Value',
  sort: 'desc',
  limit: 1
}

Output dataset:

Company Group Value
Label 3 Group 1 20
Label 5 Group 2 10

unpivot step

Unpivot a list of columns to rows.

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'], // columns to keep fixed around which to unpivot columns
  unpivot: ['NB_CLIENTS', 'REVENUES'], // columns to unpivot
  unpivot_column_name: 'KPI', // name of the new dimension column created after unpivot
  value_column_name: 'VALUE', // name of the new value column created after unpivot
  dropna: true // whether null values have to be kept or the corresponding rows discarded
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: with dropnaparameter to true

Input dataset:

COMPANY COUNTRY NB_CLIENTS REVENUES
Company 1 France 7 10
Company 2 France 2  
Company 1 USA 12 6
Company 2 USA 1 3

Step configuration:

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'],
  unpivot: ['NB_CLIENTS', 'REVENUES'],
  unpivot_column_name: 'KPI',
  value_column_name: 'VALUE',
  dropna: true
}

Output dataset:

COMPANY COUNTRY KPI VALUE
Company 1 France NB_CLIENTS 7
Company 1 France REVENUES 10
Company 2 France NB_CLIENTS 2
Company 1 USA NB_CLIENTS 12
Company 1 USA REVENUES 6
Company 2 USA NB_CLIENTS 1
Company 2 USA REVENUES 3

Example 1: with dropnaparameter to false

Input dataset:

COMPANY COUNTRY NB_CLIENTS REVENUES
Company 1 France 7 10
Company 2 France 2  
Company 1 USA 12 6
Company 2 USA 1 3

Step configuration:

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'],
  unpivot: ['NB_CLIENTS', 'REVENUES'],
  unpivot_column_name: 'KPI',
  value_column_name: 'VALUE',
  dropna: false
}

Output dataset:

COMPANY COUNTRY KPI VALUE
Company 1 France NB_CLIENTS 7
Company 1 France REVENUES 10
Company 2 France NB_CLIENTS 2
Company 2 France REVENUES  
Company 1 USA NB_CLIENTS 12
Company 1 USA REVENUES 6
Company 2 USA NB_CLIENTS 1
Company 2 USA REVENUES 3

uppercase step

Converts a string column to uppercase.

{
  name: 'uppercase',
  column: 'foo',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20

Step configuration:

{
  name: 'uppercase',
  column: 'Label',
}

Output dataset:

Label Group Value
LABEL 1 Group 1 13
LABEL 2 Group 1 7
LABEL 3 Group 1 20

uniquegroups step

Allow to get unique groups of values from one or several columns.

{
  name: 'uniquegroups',
  on: ['foo', 'bar']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 1 Group 2 1
Label 2 Group 1 2
Label 3 Group 1 3

Step configuration:

{
  name: 'uniquegroups',
  column: ['Label', 'Group'],
}

Output dataset:

Label Group
Label 1 Group 1
Label 1 Group 2
Label 2 Group 1
Label 3 Group 1