Steps definition

This document describes the steps that can be used in our pipelines. A step is the unit of transformation generated by a user interaction with the visual query builder.

Unless explicitly specified, all steps are supported by all backends.

aggregate step

Perform aggregations on one or several columns.

An aggreation step has the following strucure:

{
   name: 'aggregate',
   on: ['column1', 'column2'],
   aggregations:  [
      {
          newcolumn: 'sum_value1',
          aggfunction: 'sum',
          column: 'value1'
      }
    // ...
  ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'aggregate',
   on: ['Group'],
   aggregations:  [
    {
      newcolumn: 'Total',
      aggfunction: 'sum',
      column: 'Value'
    }
  ]
}

Output dataset:

Group Total
Group 1 30
Group 2 16

append step

Appends to the current dataset, one or several datasets resulting from other pipelines. WeaverBird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  name: 'append',
  pipelines: ['pipeline1', 'pipeline2'] // the name of the pipelines to append
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7

dataset1 (saved in the application Vuex store):

Label Group Value
Label 3 Group 1 20
Label 4 Group 2 1

dataset2 (saved in the application Vuex store):

Label Group Value
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'append',
  datasets: ['dataset1', 'dataset2']
}

Output dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

argmax step

Get row(s) matching the maximum value in a given column, by group if groups is specified.

{
  name: 'argmax',
  column: 'value', // column in which to search for max value
  groups: ['group1', 'group2']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: without groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmax',
  column: 'Value'
}

Output dataset:

Label Group Value
Label 3 Group 1 20

Example 2: with groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmax',
  column: 'Value',
  groups: ['Group']
}

Output dataset:

Label Group Value
Label 3 Group 1 20
Label 5 Group 2 10

argmin step

Get row(s) matching the minimum value in a given column, by group if groups is specified.

{
  name: 'argmin',
  column: 'value', // column in which to search for max value
  groups: ['group1', 'group2'] // optional
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: without groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmin',
  column: 'Value'
}

Output dataset:

Label Group Value
Label 2 Group 1 7

Example 2: with groups

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'argmin',
  column: 'Value',
  groups: ['Groups']
}

Output dataset:

Label Group Value
Label 2 Group 1 7
Label 4 Group 2 1

concatenate step

This step allows to concatenate several columns using a separator.

{
  name: 'concatenate',
  columns: ['Company', 'Group']
  separator: ' - ' // can be a string of any length
  new_column_name: 'Label' // The new column in which to write the concatenation
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 7
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 10
Company 6 Group 2 5

Step configuration:

{
  name: 'concatenate',
  columns: ['Company', 'Group']
  separator: ' - '
  new_column_name: 'Label'
}

Output dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

convert step

This step allows to convert columns data types.

{
  name: 'convert',
  columns: ['col1', 'col2']
  data_type: 'integer' // The data type to convert the column into. Can be either
                       // 'integer', 'float', 'text', 'date' or 'boolean'
}

This step is supported by the following backends:

  • Mongo 4.0

Example

Input dataset:

Company Value
Company 1 ‘13’
Company 2 ‘7’
Company 3 ‘20’
Company 4 ‘1’
Company 5 ‘10’
Company 6 ‘5’

Step configuration:

{
  name: 'convert',
  columns: ['Value']
  data_type: 'integer'
}

Output dataset:

Company Value
Company 1 13
Company 2 7
Company 3 20
Company 4 1
Company 5 10
Company 6 5

cumsum step

This step allows to compute the cumulated sum of value column based on a reference column (usually dates) to be sorted by ascending order for the needs of the computation. The computation can be scoped by group if needed.

{
  name: 'cumsum',
  valueColumn: 'myValues',
  referenceColumn: 'myDates',
  groupby: ['foo', 'bar'],
  newColumn: 'myCumsum'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: Basic usage

Input dataset:

DATE VALUE
2019-01 2
2019-02 5
2019-03 3
2019-04 8
2019-05 9
2019-06 6

Step configuration:

{
  name: 'cumsum',
  valueColumn: 'VALUE',
  referenceColumn: 'DATE',
}

Output dataset:

DATE VALUE VALUE_CUMSUM
2019-01 2 2
2019-02 5 7
2019-03 3 10
2019-04 8 18
2019-05 9 27
2019-06 6 6 33

Example 2: With more advanced options

Input dataset:

COUNTRY DATE VALUE
France 2019-01 2
France 2019-02 5
France 2019-03 3
France 2019-04 8
France 2019-05 9
France 2019-06 6 6
USA 2019-01 10
USA 2019-02 6
USA 2019-03 6
USA 2019-04 4
USA 2019-05 8
USA 2019-06 6 7

Step configuration:

{
  name: 'cumsum',
  valueColumn: 'VALUE',
  referenceColumn: 'DATE',
  groupby: ['COUNTRY'],
  newColumn: 'MY_CUMSUM'
}

Output dataset:

COUNTRY DATE VALUE MY_CUMSUM
France 2019-01 2 2
France 2019-02 5 7
France 2019-03 3 10
France 2019-04 8 18
France 2019-05 9 27
France 2019-06 6 6 33
USA 2019-01 10 10
USA 2019-02 6 16
USA 2019-03 6 22
USA 2019-04 4 26
USA 2019-05 8 34
USA 2019-06 6 7 41

custom step

This step allows to define a custom query that can’t be expressed using the other existing steps.

{
    name: 'custom',
    query: '$group: {_id: ...}'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example: using Mongo query language

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 7
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 10
Company 6 Group 2 5

Step configuration:

{
  name: 'custom',
  query: '$addFields: { Label: { $concat: [ "$Label", " - ", "$Group" ] ] } }'
}

Output dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

dateextract step

Extract a part of a date (e.g. day, week, year) column. The following properties can be extracted:

  • year': extract ‘year’ from date,
  • month': extract ‘month’ from date,
  • day': extract ‘day of month’ from date,
  • hour': extract ‘hour’ from date,
  • minutes': extract ‘minutes’ from date,
  • seconds': extract ‘seconds’ from date,
  • milliseconds': extract ‘milliseconds’ from date,
  • dayOfYear': extract ‘day of year’ from date,
  • dayOfWeek': extract ‘day of week’ from date,
  • week': extract ‘week number’ from date.

Here’s an example of such a step:

{
    name: 'dateextract',
    column: 'date',
    operation: 'day',
    new_column_name: 'date_day',
    format: '%Y-%m-%d',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date
Company 1 2019-10-06T00:00:00.000Z
Company 1 2019-10-07T00:00:00.000Z
Company 1 2019-10-08T00:00:00.000Z
Company 2 2019-10-06T00:00:00.000Z
Company 2 2019-10-07T00:00:00.000Z
Company 2 2019-10-08T00:00:00.000Z

Step configuration:

{
    name: 'dateextract',
    column: 'Date',
    operation: 'day',
    new_column_name: 'Date_day',
    format: '%Y-%m-%d',
}

Output dataset:

Company Date Date_day
Company 1 2019-10-06T00:00:00.000Z 6
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 8
Company 2 2019-10-06T00:00:00.000Z 6
Company 2 2019-10-07T00:00:00.000Z 7
Company 2 2019-10-08T00:00:00.000Z 8

delete step

Delete a column.

{
    name: 'delete',
    columns: ['my-column', 'some-other-column']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

Step configuration:

{
  name: 'delete',
  columns: ['Company', 'Group']
}

Output dataset:

Value Label
13 Company 1 - Group 1
7 Company 2 - Group 1
20 Company 3 - Group 1
1 Company 4 - Group 2
10 Company 5 - Group 2
5 Company 6 - Group 2

domain step

This step is meant to select a specific domain (using MongoDB terminology).

{
    name: 'domain',
    domain: 'my-domain'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

duplicate step

This step is meant to duplicate a column.

{
    name: 'duplicate',
    column: 'my-column'
    new_column_name: 'my-duplicate'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Value
Company 1 13
Company 2 0
Company 3 20

Step configuration:

{
  name: 'duplicate',
  column: "Company",
  new_column_name: "Company-copy",
}

Output dataset:

Company Value Company-copy
Company 1 13 Company 1
Company 2 0 Company 2
Company 3 20 Company 3

evolution step

Use this step if you need to compute the row-by-row evolution of a value column, based on a date column. It will output 2 columns: one for the evolution in absolute value, the other for the evolution in percentage.

You must be careful that the computation is scoped so that there are no dates duplicates (so that any date finds no more than one previous date). That means that you may need to specify index columns to make any date unique by index. You should use the indexColumns parameter to specifiy what series of columns allows to build a unique index (by concatenation with the date column). If index is not unique and more than one previous date is found, you will get an error at row level (“Error: More than one previous date found for the specified index columns”). Please refere to example 3 and 4 below for a concrete illustration.

{
  name: 'evolution',
  dateCol: 'DATE',
  valueCol: 'VALUE',
  evolutionType: 'vsLastYear', // or vsLastMonth, vsLastWeek, vsLastDay
  evolutionFormat: 'abs', // or pct
  // optional, if computation has to be performed by group so that any date finds no more than one previous date
  indexColumns: ['COUNTRY'],
  newColumn: 'MY_EVOL', // optional, <originalColumnName>_EVOL_ABS or <originalColumnName>_EVOL_PCT by default
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: Basic configuration - evolution in absolute value

Input dataset:

DATE VALUE
2019-06 79
2019-07 81
2019-08 77
2019-09 75
2019-11 78
2019-12 88

Step configuration:

{
  name: 'evolution',
  dateCol: 'DATE',
  valueCol: 'VALUE',
  evolutionType: 'vsLastMonth',
  evolutionFormat: 'abs',
  indexColumns: [],
}

Output dataset:

DATE VALUE VALUE_EVOL_ABS
2019-06 79  
2019-07 81 2
2019-08 77 -4
2019-09 75 -2
2019-11 78  
2019-12 88 10

Example 2: Basic configuration - evolution in percentage

Input dataset:

DATE VALUE
2019-06 79
2019-07 81
2019-08 77
2019-09 75
2019-11 78
2019-12 88

Step configuration:

{
  name: 'evolution',
  dateCol: 'DATE',
  valueCol: 'VALUE',
  evolutionType: 'vsLastMonth',
  evolutionFormat: 'pct',
  indexColumns: [],
}

Output dataset:

DATE VALUE VALUE_EVOL_PCT
2019-06 79  
2019-07 81 0.02531645569620253
2019-08 77 -0.04938271604938271
2019-09 75 -0.025974025974025976
2019-11 78  
2019-12 88 0.1282051282051282

Example 3: Error on duplicate dates

If ‘COUNTRY’ is not specified as index column, the computation will not be scoped by country. Then there are duplicate dates in the “DATE” columns which is prohibited.

Input dataset:

DATE COUNTRY VALUE
2014-12 France 79
2015-12 France 81
2016-12 France 77
2017-12 France 75
2014-12 USA 74
2015-12 USA 74
2016-12 USA 73
2017-12 USA 72

Step configuration:

{
  name: 'evolution',
  dateCol: 'DATE',
  valueCol: 'VALUE',
  evolutionType: 'vsLastYear',
  evolutionFormat: 'abs',
  indexColumns: [],
}

Output dataset:

DATE COUNTRY VALUE MY_EVOL
2014-12 France 79  
2015-12 France 81 Error …
2016-12 France 77 Error …
2017-12 France 75 Error …
2014-12 USA 74  
2015-12 USA 74 Error …
2016-12 USA 73 Error …
2017-12 USA 72 Error …

Example 4: Complete configuration with index columns

Input dataset:

DATE COUNTRY VALUE
2014-12 France 79
2015-12 France 81
2016-12 France 77
2017-12 France 75
2019-12 France 78
2020-12 France 88
2014-12 USA 74
2015-12 USA 74
2016-12 USA 73
2017-12 USA 72
2018-11 USA 75
2020-12 USA 76

Step configuration:

{
  name: 'evolution',
  dateCol: 'DATE',
  valueCol: 'VALUE',
  evolutionType: 'vsLastYear',
  evolutionFormat: 'abs',
  indexColumns: ['COUNTRY'],
  newColumn: 'MY_EVOL',
}

Output dataset:

DATE COUNTRY VALUE MY_EVOL
2014-12 France 79  
2015-12 France 81 2
2016-12 France 77 -4
2017-12 France 75 -2
2019-12 France 78  
2020-12 France 88 10
2014-12 USA 74  
2015-12 USA 74 0
2016-12 USA 73 -1
2017-12 USA 72 -1
2018-11 USA 75 3
2020-12 USA 76  

fillna step

Replace null values by a given value in a column.

{
    name: 'fillna',
    column: "foo",
    value: "bar"
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1  
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2  
Company 6 Group 2 5

Step configuration:

{
  name: 'fillna',
  column: "Value",
  value: 0
}

Output dataset:

Company Group Value
Company 1 Group 1 13
Company 2 Group 1 0
Company 3 Group 1 20
Company 4 Group 2 1
Company 5 Group 2 0
Company 6 Group 2 5

filter step

Filter out lines that don’t match a filter definition.

{
    name: 'filter',
    condition: {
      column: 'my-column',
      value: 42,
      operator: 'ne'
    }
}

operator is optional, and defaults to eq. Allowed operators are eq, ne, gt, ge, lt, le, in, nin, matches, notmatches isnull or notnull.

value can be an arbitrary value depending on the selected operator (e.g a list when used with the in operator, or null when used with the isnull operator).

matches and notmatches operators are used to test value against a regular expression.

Conditions can be grouped and nested with logical operators and and or.

{
    name: 'filter',
    condition: {
      and: [
        {
          column: 'my-column',
          value: 42,
          operator: 'gte'
        },
        {
          column: 'my-column',
          value: 118,
          operator: 'lte'
        },
        {
          or: [
            {
              column: 'my-other-column',
              value: 'blue',
              operator: 'eq'
            },
            {
              column: 'my-other-column',
              value: 'red',
              operator: 'eq'
            }
          ]
        }
      ]
    }
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

formula step

Add a computation based on a formula. Usually column names do not need to be escaped, unless they include whitespaces, in such a case you need to use brackets ‘[]’ (e.g. [myColumn]). Any characters string escaped with quotes (simple or double) will be considered as a string.

{
  {
    name: 'formula',
    new_column: 'result', // if the column already exists, overwrites it
    formula: '(Value1 + Value2) / Value3 - Value4 * 2'
  }
}

Example 1: Basic usage

Input dataset:

Label Value1 Value2 Value3 Value4
Label 1 10 2 3 1
Label 2 1 13 7 3
Label 3 5 20 5 2

Step configuration:

{
  name: 'formula',
  new_column: 'Result',
  formula: '(Value1 + Value2) / Value3 - Value4 * 2'
}

Output dataset:

Label Value1 Value2 Value3 Value4 Result
Label 1 10 2 3 1 2
Label 2 1 13 7 3 -4
Label 3 5 20 5 2 1

Example 2: Column name with whitespaces

Input dataset:

Label Value1 Value2 Value3 Value 4
Label 1 10 2 3 1
Label 2 1 13 7 3
Label 3 5 20 5 2

Step configuration:

{
  name: 'formula',
  new_column: 'Result',
  formula: '(Value1 + Value2) / Value3 - [Value 4] * 2'
}

Output dataset:

Label Value1 Value2 Value3 Value 4 Result
Label 1 10 2 3 1 2
Label 2 1 13 7 3 -4
Label 3 5 20 5 2 1

join step

Joins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based on columns correspondance. It is similar to a JOIN clause in SQL, or to a VLOOKUP in excel. The joined dataset is the result from the query of the right_pipeline.

The join type can be:

  • ‘left’: will keep every row of the current dataset and fill unmatched rows with null values,

  • ‘inner’: will only keep rows that match rows of the joined dataset.

In the on parameter, you must specify 1 or more column couple(s) that will be compared to determine rows correspondance between the 2 datasets. The first element of a couple is for the current dataset column, and the second for the corresponding column in the right dataset to be joined. If you specify more than 1 couple, the matching rows will be those that find a correspondance between the 2 datasets for every column couple specified (logical ‘AND’).

Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.

{
  name: 'join',
  right_pipeline: 'somePipelineReference',
  type: 'left', // or 'inner'
  on: [
    ['currentDatasetColumn1', 'rightDatasetColumn1'],
    ['currentDatasetColumn2', 'rightDatasetColumn2'],
  ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: Left join with one column couple as on parameter

Input dataset:

Label Value
Label 1 13
Label 2 7
Label 3 20
Label 4 1
Label 5 1
Label 6 1

rightDataset (saved in the application Vuex store):

Label Group
Label 1 Group 1
Label 2 Group 1
Label 3 Group 2
Label 4 Group 2

Step configuration:

{
  name: 'join',
  right_pipeline: 'rightDataset',
  type: 'left',
  on: [['Label', 'Label']];
}

Output dataset:

Label Value Group
Label 1 13 Group 1
Label 2 7 Group 1
Label 3 20 Group 2
Label 4 1 Group 2
Label 5 1  
Label 6 1  

Example 2: inner join with different column names in the on parameter

Input dataset:

Label Value
Label 1 13
Label 2 7
Label 3 20
Label 4 1
Label 5 1
Label 6 1

rightDataset (saved in the application Vuex store):

LabelRight Group
Label 1 Group 1
Label 2 Group 1
Label 3 Group 2
Label 4 Group 2

Step configuration:

{
  name: 'join',
  right_pipeline: 'rightDataset',
  type: 'inner',
  on: [['Label', 'LabelRight']];
}

Output dataset:

Label Value LabelRight Group
Label 1 13 Label 1 Group 1
Label 2 7 Label 2 Group 1
Label 3 20 Label 3 Group 2
Label 4 1 Label 4 Group 2

fromdate step

Converts a date column into a string column based on a specified format.

{
    name: 'fromdate',
    column: 'myDateColumn'
    format: '%Y-%m-%d' // For further details on supported format specifiers,
                       // see https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/#datefromstring-format-specifiers
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date Value
Company 1 2019-10-06T00:00:00.000Z 13
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 20
Company 2 2019-10-06T00:00:00.000Z 1
Company 2 2019-10-07T00:00:00.000Z 10
Company 2 2019-10-08T00:00:00.000Z 5

Step configuration:

{
  name: 'fromdate',
  column: 'Date',
  format: '%d/%m/%Y'
}

Output dataset:

Company Date Value
Company 1 06/10/2019 13
Company 1 07/10/2019 7
Company 1 08/10/2019 20
Company 2 06/10/2019 1
Company 2 07/10/2019 10
Company 2 08/10/2019 5

lowercase step

Converts a string column to lowercase.

{
  name: 'lowercase',
  column: 'foo',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
LABEL 1 Group 1 13
LABEL 2 Group 1 7
LABEL 3 Group 1 20

Step configuration:

{
  name: 'lowercase',
  column: 'Label',
}

Output dataset:

Label Group Value
label 1 Group 1 13
label 2 Group 1 7
label 3 Group 1 20

percentage step

Compute the percentage of total, i.e. for every row the value in column divided by the total as the sum of every values in column. The computation can be performed by group if specified. The result is written in a new column.

{
  name: 'percentage',
  column: 'bar',
  group: ['foo'] // optional
  newColumName: 'myNewColumn' // optional, <originalColumname>_PCT by default
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 5
Label 2 Group 1 10
Label 3 Group 1 15
Label 4 Group 2 2
Label 5 Group 2 7
Label 6 Group 2 5

Step configuration:


  name: 'percentage',
  new_column: 'Percentage_of_total',
  column: 'Value',
  group: ['Group']
  newColumn: 'Percentage'
}

Output dataset:

Label Group Value Percentage
Label 1 Group 1 5 0.167
Label 2 Group 1 10 0.333
Label 3 Group 1 15 0.5
Label 4 Group 2 2 0.143
Label 5 Group 2 7 0.5
Label 6 Group 2 5 0.357

pivot step

Pivot rows into columns around a given index (expressed as a combination of column(s)). Values to be used as new column names are found in the column column_to_pivot. Values to populate new columns are found in the column value_column. The function used to aggregate data (when several rows are found by index group) must be among sum, avg, count, min or max.

{
 name: 'pivot',
 index: ['column_1', 'column_2'],
 column_to_pivot: 'column_3',
 value_column: 'column_4',
 agg_function: 'sum',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Country Value
Label 1 Country1 13
Label 2 Country1 7
Label 3 Country1 20
Label 1 Country2 1
Label 2 Country2 10
Label 3 Country2 5
label 3 Country2 1

Step configuration:

{
 name: 'pivot',
 index: ['Label'],
 column_to_pivot: 'Country',
 value_column: 'Value',
 agg_function: 'sum',
}

Output dataset:

Label Country1 Country2
Label 1 13 1
Label 2 7 10
Label 3 20 6

statistics step

Compute statistics of a column.,

{
    name: 'statistics',
    column: 'Value',
    groupby: [],
    statistics: ['average', 'count'],
    quantiles: [{label: 'median', nth: 1, order: 2}],
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
    name: 'statistics',
    column: 'Value',
    groupby: [],
    statistics: ['average', 'count'],
    quantiles: [{label: 'median', nth: 1, order: 2}],
}

Output dataset:

average count median
9.33333 6 8.5

rename step

Rename a column.,

{
    name: 'rename',
    oldname: 'old-column-name',
    newname: 'new-column-name'
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'rename',
  oldname: 'Label',
  newname: 'Company'
}

Output dataset:

Company Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

replace step

Replace one or several values in a column.

A replace step has the following strucure:

{
   name: 'replace',
   search_column: "column_1",
   to_replace: [
     ['foo', 'bar'], // The first value is the one to be replace, the second is the new value
     [42, 0]
   ]
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

COMPANY COUNTRY
Company 1 Fr
Company 2 UK

Step configuration:

{
   name: 'replace',
   search_column: "COUNTRY",
   to_replace: [
     ['Fr', 'France']
     ['UK', 'United Kingdom']
   ]
}

Output dataset:

COMPANY COUNTRY
Company 1 France
Company 2 United Kingdom

rollup step

Use this step if you need to compute aggregated data at every level of a hierarchy, specified as a series of columns from top to bottom level. The output data structure stacks the data of every level of the hierarchy, specifying for every row the label, level and parent in dedicated columns.

{
   name: 'rollup',
   hierarchy: ['continent', 'country', 'city'], // Hierarchical columns, from top to bottom level
   aggregations: [ // The columns to be aggregated, and how
    {
      newcolumn: 'revenues',
      aggfunction: 'sum', // aggregate using sum
      column: 'revenues'
    },
    {
      newcolumn: 'average_cost',
      aggfunction: 'avg', // aggregate using average
      column: 'cost'
    }
   ],
   groupby: ['date'], // optional, if needing to segment the rollup computation by group
   labelCol: 'label', // optional, name of the output label column, 'label' by default
   levelCol: 'label', // optional, name of the output level column, 'level' by default
   parentLabelCol: 'label', // optional, name of the output parent column, 'parent' by default
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1 : Basic configuration

Input dataset:

CITY COUNTRY CONTINENT YEAR VALUE
Paris France Europe 2018 10
Bordeaux France Europe 2018 5
Barcelona Spain Europe 2018 8
Madrid Spain Europe 2018 3
Boston USA North America 2018 12
New-York USA North America 2018 21
Montreal Canada North America 2018 10
Ottawa Canada North America 2018 7
Paris France Europe 2019 13
Bordeaux France Europe 2019 8
Barcelona Spain Europe 2019 11
Madrid Spain Europe 2019 6
Boston USA North America 2019 15
New-York USA North America 2019 24
Montreal Canada North America 2019 10
Ottawa Canada North America 2019 13

Step configuration:

{
   name: 'rollup',
   hierarchy: ['CONTINENT', 'COUNTRY', 'CITY'],
   aggregations: [
    {
      newcolumn: 'revenues',
      aggfunction: 'sum',
      column: 'revenues'
    },
   ],
}

Output dataset:

CITY COUNTRY CONTINENT label level parent VALUE
    North America Europe CONTINENT   64
    North America North America CONTINENT   112
  France Europe France COUNTRY Europe 36
  Spain Europe Spain COUNTRY Europe 28
  USA North America USA COUNTRY North America 72
  Canada North America Canada COUNTRY North America 40
Paris France Europe Paris CITY France 23
Bordeaux France Europe Bordeaux CITY France 13
Barcelona Spain Europe Barcelona CITY Spain 19
Madrid Spain Europe Madrid CITY Spain 9
Boston USA North America Boston CITY USA 27
New-York USA North America New-York CITY USA 45
Montreal Canada North America Montreal CITY Canada 17
Ottawa Canada North America Ottawa CITY Canada 23

Example 2 : Configuration with optional parameters

Input dataset:

CITY COUNTRY CONTINENT YEAR VALUE
Paris France Europe 2018 10
Bordeaux France Europe 2018 5
Barcelona Spain Europe 2018 8
Madrid Spain Europe 2018 3
Boston USA North America 2018 12
New-York USA North America 2018 21
Montreal Canada North America 2018 10
Ottawa Canada North America 2018 7
Paris France Europe 2019 13
Bordeaux France Europe 2019 8
Barcelona Spain Europe 2019 11
Madrid Spain Europe 2019 6
Boston USA North America 2019 15
New-York USA North America 2019 24
Montreal Canada North America 2019 10
Ottawa Canada North America 2019 13

Step configuration:

{
   name: 'rollup',
   hierarchy: ['CONTINENT', 'COUNTRY', 'CITY'],
   aggregations: [
    {
      newcolumn: 'VALUE-sum',
      aggfunction: 'sum',
      column: 'VALUE'
    },
    {
      newcolumn: 'VALUE-avg',
      aggfunction: 'avg',
      column: 'VALUE'
    }
   ],
   groupby: ['date'],
   labelCol: 'label',
   levelCol: 'label',
   parentLabelCol: 'label',
}

Output dataset:

CITY COUNTRY CONTINENT YEAR MY_LABEL MY_LEVEL MY_PARENT VALUE-sum VALUE-avg
    North America 2018 Europe CONTINENT   26 6.5
    North America 2018 North America CONTINENT   50 12.5
  France Europe 2018 France COUNTRY Europe 15 7.5
  Spain Europe 2018 Spain COUNTRY Europe 11 5.5
  USA North America 2018 USA COUNTRY North America 33 16.5
  Canada North America 2018 Canada COUNTRY North America 17 8.5
Paris France Europe 2018 Paris CITY France 10 10
Bordeaux France Europe 2018 Bordeaux CITY France 5 5
Barcelona Spain Europe 2018 Barcelona CITY Spain 8 8
Madrid Spain Europe 2018 Madrid CITY Spain 3 3
Boston USA North America 2018 Boston CITY USA 12 12
New-York USA North America 2018 New-York CITY USA 21 21
Montreal Canada North America 2018 Montreal CITY Canada 7 7
Ottawa Canada North America 2018 Ottawa CITY Canada 10 10
    North America 2019 Europe CONTINENT   38 9.5
    North America 2019 North America CONTINENT   62 15.5
  France Europe 2019 France COUNTRY Europe 21 10.5
  Spain Europe 2019 Spain COUNTRY Europe 17 8.5
  USA North America 2019 USA COUNTRY North America 39 19.5
  Canada North America 2019 Canada COUNTRY North America 23 11.1
Paris France Europe 2019 Paris CITY France 13 13
Bordeaux France Europe 2019 Bordeaux CITY France 8 8
Barcelona Spain Europe 2019 Barcelona CITY Spain 11 11
Madrid Spain Europe 2019 Madrid CITY Spain 6 6
Boston USA North America 2019 Boston CITY USA 15 15
New-York USA North America 2019 New-York CITY USA 24 24
Montreal Canada North America 2019 Montreal CITY Canada 17 10
Ottawa Canada North America 2019 Ottawa CITY Canada 23 13

select step

Select a column. The default is to keep every columns of the input domain. If the select is used, it will only keep selected columns in the output.

{
    name: 'select',
    columns: ['my-column', 'some-other-column']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Group Value Label
Company 1 Group 1 13 Company 1 - Group 1
Company 2 Group 1 7 Company 2 - Group 1
Company 3 Group 1 20 Company 3 - Group 1
Company 4 Group 2 1 Company 4 - Group 2
Company 5 Group 2 10 Company 5 - Group 2
Company 6 Group 2 5 Company 6 - Group 2

Step configuration:

{
  {
    name: 'select',
    columns: ['Value', 'Label']
}
}

Output dataset:

Value Label
13 Company 1 - Group 1
7 Company 2 - Group 1
20 Company 3 - Group 1
1 Company 4 - Group 2
10 Company 5 - Group 2
5 Company 6 - Group 2

sort step

Sort values in one or several columns. Order can be either ‘asc’ or ‘desc’. When sorting on several columns, order of columns specified in columns matters.

{
    name: 'sort',
    columns: [{column: 'foo', order: 'asc'}, {column: 'bar', order: 'desc'}],
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
    name: 'sort',
    columns: [{ column: 'Group', order: 'asc'}, {column: 'Value', order: 'desc' }]
}

Output dataset:

Company Group Value
Label 3 Group 1 20
Label 1 Group 1 13
Label 2 Group 1 7
Label 5 Group 2 10
Label 6 Group 2 5
Label 4 Group 2 1

split step

Split a string column into several columns based on a delimiter.

{
  name: 'split',
  column: 'foo', // the columnn to split
  delimiter: ' - ', // the delimiter can e a strinng of any length
  number_cols_to_keep: 3, // the numer of columns to keep resulting from the
                          // split (starting from first chunk)
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1

Input dataset:

Label Value
Label 1 - Group 1 - France 13
Label 2 - Group 1 - Spain 7
Label 3 - Group 1 - USA 20
Label 4 - Group 2 - France 1
Label 5 - Group 2 - Spain 10
Label 6 - Group 2 - USA 5

Step configuration:

{
  name: 'split',
  column: 'Label',
  delimiter: ' - ',
  number_cols_to_keep: 3,
}

Output dataset:

Label_1 Label_2 Label_3 Value
Label 1 Group 1 Spain 13
Label 2 Group 1 USA 7
Label 3 Group 1 France 20
Label 4 Group 2 USA 1
Label 5 Group 2 France 10
Label 6 Group 2 Spain 5

Example 2: keeping less columns

Input dataset:

Label Value
Label 1 - Group 1 - France 13
Label 2 - Group 1 - Spain 7
Label 3 - Group 1 - USA 20
Label 4 - Group 2 - France 1
Label 5 - Group 2 - Spain 10
Label 6 - Group 2 - USA 5

Step configuration:

{
  name: 'split',
  column: 'Label',
  delimiter: ' - ',
  number_cols_to_keep: 2,
}

Output dataset:

Label_1 Label_2 Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

substring step

Extract a substring in a string column. The substring begins at index start_index (beginning at 1) and stops at end_index. You can specify negative indexes, in such a case the index search will start from the end of the string (with -1 being the last index of the string). Please refer to the examples below for illustration. Neither start_index nor end_index can be equal to 0.

{
  name: 'substring',
  column: 'foo',
  start_index: 1, // 1 = fisrt character
  end_index: -1, // -1 = last character
  newColumName: 'myNewColumn' // optional, <originalColumame>_SUBSTR by default
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: positive start_index and end_index

Input dataset:

Group Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  column: 'Label',
  name: 'substring',
  start_index: 1,
  end_index: 4,
}
Label Value Label_PCT
foo 13 foo
overflow 7 over
some_text 20 some
a_word 1 a_wo
toucan 10 touc
toco 5 toco

Example 2: start_index is positive and end_index is negative

Input dataset:

Label Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  name: 'substring',
  column: 'Label',
  start_index: 2,
  end_index: -2,
  newColumName: 'short_label',
}

Output dataset:

Label Value short_label
foo 13 o
overflow 7 verflo
some_text 20 ome_tex
a_word 1 _wor
toucan 10 ouca
toco 5 oc

Example 3: start_index and end_index are negative

Input dataset:

Label Value
foo 13
overflow 7
some_text 20
a_word 1
toucan 10
toco 5

Step configuration:

{
  name: 'substring',
  column: 'Label',
  start_index: -3,
  end_index: -1,
}

Output dataset:

Label Value Label_PCT
foo 13 foo
overflow 7 low
some_text 20 ext
a_word 1 ord
toucan 10 can
toco 5 oco

text step

Use this step to add a text column where every value will be equal to the specified text.

{
  {
    name: 'text',
    new_column: 'new', // if the column already exists, overwrites it
    text: 'some text'
  }
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Label Value1
Label 1 10
Label 2 1
Label 3 5

Step configuration:

{
  name: 'text',
  new_column: 'KPI',
  text: 'Sales'
}

Output dataset:

Label Value1 KPI
Label 1 10 Sales
Label 2 1 Sales
Label 3 5 Sales

todate step

Converts a string column into a date column based on a specified format.

{
    name: 'todate',
    column: 'myTextColumn'
    format: '%Y-%m-%d' // Optional. If undefined, the backend tries to infer the date format.
                       // For further details on supported format specifiers,
                       // see https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/#datefromstring-format-specifiers
                       // Note: custom format is not supported in Mongo versions older than 4.0
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example

Input dataset:

Company Date Value
Company 1 06/10/2019 13
Company 1 07/10/2019 7
Company 1 08/10/2019 20
Company 2 06/10/2019 1
Company 2 07/10/2019 10
Company 2 08/10/2019 5

Step configuration:

{
  name: 'todate',
  column: 'Date',
  format: '%d/%m/%Y'
}

Output dataset:

Company Date Value
Company 1 2019-10-06T00:00:00.000Z 13
Company 1 2019-10-07T00:00:00.000Z 7
Company 1 2019-10-08T00:00:00.000Z 20
Company 2 2019-10-06T00:00:00.000Z 1
Company 2 2019-10-07T00:00:00.000Z 10
Company 2 2019-10-08T00:00:00.000Z 5

top step

Return top N rows by group if groups is specified, else over full dataset.

{
  name: 'top',
  groups: ['foo'],
  rank_on: 'bar',
  sort: 'desc', // or 'asc'
  limit: 10
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: top without groups, ascending order

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'top',
  rank_on: 'Value',
  sort: 'asc',
  limit: 3
}

Output dataset:

Label Group Value
Label 4 Group 2 1
Label 6 Group 2 5
Label 2 Group 1 7

Example 2: top with groups, descending order

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 4 Group 2 1
Label 5 Group 2 10
Label 6 Group 2 5

Step configuration:

{
  name: 'top',
  groups: ['Group'],
  rank_on: 'Value',
  sort: 'desc',
  limit: 1
}

Output dataset:

Company Group Value
Label 3 Group 1 20
Label 5 Group 2 10

unpivot step

Unpivot a list of columns to rows.

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'], // columns to keep fixed around which to unpivot columns
  unpivot: ['NB_CLIENTS', 'REVENUES'], // columns to unpivot
  unpivot_column_name: 'KPI', // name of the new dimension column created after unpivot
  value_column_name: 'VALUE', // name of the new value column created after unpivot
  dropna: true // whether null values have to be kept or the corresponding rows discarded
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example 1: with dropnaparameter to true

Input dataset:

COMPANY COUNTRY NB_CLIENTS REVENUES
Company 1 France 7 10
Company 2 France 2  
Company 1 USA 12 6
Company 2 USA 1 3

Step configuration:

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'],
  unpivot: ['NB_CLIENTS', 'REVENUES'],
  unpivot_column_name: 'KPI',
  value_column_name: 'VALUE',
  dropna: true
}

Output dataset:

COMPANY COUNTRY KPI VALUE
Company 1 France NB_CLIENTS 7
Company 1 France REVENUES 10
Company 2 France NB_CLIENTS 2
Company 1 USA NB_CLIENTS 12
Company 1 USA REVENUES 6
Company 2 USA NB_CLIENTS 1
Company 2 USA REVENUES 3

Example 1: with dropnaparameter to false

Input dataset:

COMPANY COUNTRY NB_CLIENTS REVENUES
Company 1 France 7 10
Company 2 France 2  
Company 1 USA 12 6
Company 2 USA 1 3

Step configuration:

{
  name: 'unpivot',
  keep: ['COMPANY', 'COUNTRY'],
  unpivot: ['NB_CLIENTS', 'REVENUES'],
  unpivot_column_name: 'KPI',
  value_column_name: 'VALUE',
  dropna: false
}

Output dataset:

COMPANY COUNTRY KPI VALUE
Company 1 France NB_CLIENTS 7
Company 1 France REVENUES 10
Company 2 France NB_CLIENTS 2
Company 2 France REVENUES  
Company 1 USA NB_CLIENTS 12
Company 1 USA REVENUES 6
Company 2 USA NB_CLIENTS 1
Company 2 USA REVENUES 3

uppercase step

Converts a string column to uppercase.

{
  name: 'uppercase',
  column: 'foo',
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20

Step configuration:

{
  name: 'uppercase',
  column: 'Label',
}

Output dataset:

Label Group Value
LABEL 1 Group 1 13
LABEL 2 Group 1 7
LABEL 3 Group 1 20

uniquegroups step

Allow to get unique groups of values from one or several columns.

{
  name: 'uniquegroups',
  on: ['foo', 'bar']
}

This step is supported by the following backends:

  • Mongo 4.0
  • Mongo 3.6

Example:

Input dataset:

Label Group Value
Label 1 Group 1 13
Label 2 Group 1 7
Label 3 Group 1 20
Label 1 Group 2 1
Label 2 Group 1 2
Label 3 Group 1 3

Step configuration:

{
  name: 'uniquegroups',
  column: ['Label', 'Group'],
}

Output dataset:

Label Group
Label 1 Group 1
Label 1 Group 2
Label 2 Group 1
Label 3 Group 1