BIRD methodology

Introduction

The BIRD methodology refers to the technique used to provide formal documentation of the data model proposed by the BIRD.

The description of the BIRD proposed data model is composed of:

  • Description of datasets
  • Transformation process
  • Technical guidelines on how to populate the datasets and implement transformations

The BIRD documents a dynamic data process, in which data are provided as input, validated and subsequently transformed into other datasets until the final datasets are generated. These final datasets reflect the secondary reporting, i.e. the information that NCBs need to provide to the authorities (e.g. the ECB or the EBA). The following section describes the high-level BIRD data process.

The BIRD provides a formal description of both the datasets and the validation and transformation rules. Annex I introduces the methodology for the formal description of the BIRD datasets, while Annex II deals with the validation and transformation language.

High-level BIRD data process

The BIRD data process may be divided into four layers and three phases separating those layers. The four layers are:

  • The Input Layer (IL)
  • The Enriched Input Layer (EIL)
  • The Reference Output Layer (ROL)
  • The Non-reference Output Layer (NROL)

While the three phases may be denoted by

  • Phase 1 (IL → EIL)
  • Phase 2 (EIL → ROL)
  • Phase 3 (ROL → NROL)

Please note that the layers are described by Cubes while the phases are described by Transformations and Mappings in the BIRD database.

A system following the BIRD process would start by feeding the input layer from banks’ internal IT systems, following the structure of the input layer cubes defined by the BIRD. The system would then apply the first phase of transformations to the input layer data in order to generate the Enriched Input Layer.

Layers

The Input Layer (IL)

The Input Layer represents a harmonised entity relationship model (ERM) that comprises the information available in banks’ internal systems which is necessary to generate the output layer(s). Among other things this layer contains information about entities (e.g. debtors, protection providers…), commitments (e.g. credit facilities), instruments (e.g. loans, securities, deposits…) and protections (e.g. financial protection, physical protection like real estate protection) as well as aggregated information.

The Enriched Input Layer (EIL)

The Enriched Input Layer is another ERM and acts as an intermediate layer between the Input Layer and the Reference Output Layer. It represents the information of the input layer enriched with additional information (e.g. the result of particular derivations where the input parameters of the derivation may not be needed any more and are therefore not present in the Enriched Input Layer).

The Reference Output Layer (ROL)

The Reference Output Layer describes the Non-reference Output Layer using reference codes, i.e. the codes used in the Input Layer. The relationship between the Reference Output Layer and the Non-reference Output Layer is represented by Mappings (see Annex I: Non-technical introduction to SMCube methodology).

The Non-reference Output Layer (NROL)

The non-reference output layer describes the reporting requirements as defined in the related documents (e.g. regulations, directives, guidelines and manuals), sometimes using different codification systems (e.g. EBA ITS, SDMX…).

Please note that, although the output layer is part of the BIRD process, the actual definition of the output requirements (e.g. regulations) is out of scope of the BIRD. Instead, the reporting requirements are translated into the language of the BIRD (i.e. the SMCube methodology), meaning that they are represented as cubes (e.g. FinRep of the EBA ITS, which is described in DPM / XBRL, is (automatically) extracted and represented in the dictionary).

Phases

All phases may comprise validations, e.g. validations that ensure the integrity of the Input Layer, validations that ensure the consistency of the Enriched Input Layer, validations that apply (external) validation rules on the Non-reference Output Layer.

Phase 1 (IL → EIL)

The first phase comprises the transformations that are applied on the Input Layer in order to generate the Enriched Input Layer. These transformations may be classified into the Preparation (of data) and the Enrichment (of data).

Phase 2 (EIL → ROL)

This phase comprises the transformations that generate the Reference Output Layer based on the Enriched Input Layer. Additionally this phase may comprise technical adjustments necessary in order to comply with the Non-reference Output Layer.

Phase 3 (ROL → NROL)

The third phase of the process comprises the application of Mappings describing the relationship between the Reference Output Layer and the Non-reference Output Layer.

Description of the datasets

The description of the datasets in the BIRD is provided following the SMCube methodology, which is a methodology developed with the objectives of:

  • Enabling the description within one dictionary of all types of datasets (registries, dynamic data for supervisory purposes, for monetary purposes…)
  • Facilitating the integration of dictionaries developed with different methodologies (like DPM/XBRL or SDMX) in one single dictionary by keeping a high degree of compatibility with the other methodologies

Version 1.0 of the SMCube methodology has been finalised. The SMCube information model can be explored in the section Information Model ERM, and additional information about the model can be found on the ECB website. A non-technical introduction to the main concepts of the methodology is provided in Annex I of this handbook.

The content of the BIRD dictionary is accessible via a (read-only) web service (www.public-test.banks-integrated-reporting-dictionary.eu/web-service-interface) or by downloading the Access database (www.public-test.banks-integrated-reporting-dictionary.eu/bird-database).

Transformation rules in the BIRD model

Transformation rules represent a description of logical operations applied to objects of the BIRD database to create new / additional information.

Transformation rules are organised following the VTL model, which consists of Transformation schemes, Functions, Rulesets and Procedures. One Transformation scheme may contain one or many transformations.

Transformations constitute the basic element of the calculations, and consist of a statement assigning the outcome of an expression to a VTL element. Transformations have thus (except for procedure calls) the following structure:

transformationResult := expression

Transformations are grouped into transformation schemes, which are sets of transformations aimed at obtaining some meaningful result for the user. Transformation schemes group transformations that share some functional or business purpose. An example of a transformation scheme is the derivation of enterprise size; this scheme is formed by several transformations that together produce the final enterprise size.

Transformations are defined in verbal and formal form. The verbal description (“Natural language”) represents the business perspective and tries to avoid specifications regarding the technical implementation. The formal description is based on the “Validation and Transformation Language (VTL)”, with minor adaptations to improve usability / readability for the end user, and aims at a technical description of the transformation rules independent of the technical implementation.

Regarding the formal description, please note that version 2.0 of the Validation and Transformation Language (VTL) is still under development in the VTL Task Force; the BIRD transformations are therefore still based on VTL version 1.1.

Transformation schemes are classified in the following four types:

  1. Validation rules: Transformations that evaluate a logical condition that data should comply with. Validations are subdivided in:
    1. Consistency: Check the consistency among variables (for instance, the inception date has to be earlier than the reference date)
    2. Completeness: Check that all related information exists (for instance, for each loan there should be at least one debtor).
    3. Integrity: Check that, for one instance of one object, there is a related instance in another object. This includes referential integrity validations (for instance, if there is an issuer of a security, this issuer has to exist in the Counterparties cube; see the sketch after this list).
    4. Uniqueness: Checks that identifiers used in separate cubes are unique (for instance, that an Instrument unique identifier can only be used for one instrument).
  2. Derivation rules: Transformations that create new variables from existing data.
  3. Generation rules: Transformations focused on the data preparation based on the formalities described for each output framework.
  4. Technical rules: Transformations required for technical purposes, in order to have a complete description of the transformation process, but that are not relevant for business users.
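As a purely illustrative sketch of the integrity check mentioned above (expressed here in SQL for convenience; the BIRD itself formalises such rules in VTL, and the names SCRTS and ISSR_ID are hypothetical, while CNTRPRTS and CNTRPRTY_ID follow the naming used later in this document), the rule “every issuer of a security must exist in the Counterparties cube” could be checked as follows:

/*illustrative sketch: list all issuers recorded in a securities cube that do not exist in the counterparties cube*/
SELECT DISTINCT s.ISSR_ID
FROM SCRTS s
LEFT JOIN CNTRPRTS c ON c.CNTRPRTY_ID = s.ISSR_ID
WHERE s.ISSR_ID IS NOT NULL
  AND c.CNTRPRTY_ID IS NULL;

Any rows returned by such a query indicate a violation of the referential integrity rule.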

Annex II provides a technical description of the model used in the database to represent VTL as well as an overview of the most important VTL functionality used in BIRD transformations.

Annex I: Non-technical introduction to SMCube methodology

The main purpose of the Statistical Multidimensional Metadata Model methodology (SMCube methodology) is to provide functionality to a dictionary (here the so called Single Data Dictionary (SDD)). The main principle is that concepts1 are univocally identifiable and consequently that one concept is only represented once using one code only.

Maintenance agencies

All the elements described with the SMCube methodology belong to one maintenance agency. The reason for this is that the elements are owned by different agencies, and it has to be clear who the owner of each element is.

Data set definition

SMCube is a methodology for defining datasets based on their metadata. Its pivotal element is the cube, which defines the structure of a dataset, understood as a set of data organised as a table with fields (columns) and records (rows).

The following table provides an example of a dataset comprising information about granular loans:

Granular Loans
Instrument unique identifier Type of instrument Inception date Legal final maturity date Currency Carrying amount
aGranularLoan Other loans 17/03/2015 17/03/2025 Euro 10,000
anotherGranularLoan Finance leases 01/01/2016 01/01/2021 United States Dollar 13,000

Table 1: data set of granular loans

The minimum information required to describe the structure of the dataset can be summarised with the following three questions:

1. What are the fields (columns) of the dataset?

In the SMCube methodology, the fields of a data set are called variables (of the cube defining this data set). The variables are defined independently of the cube, allowing for reusability of concepts, meaning that a variable may be used in multiple cubes. A cube may comprise as many variables as needed to define the data set.

Referring to Table 1 the variables are: Instrument unique identifier, Type of instrument, Inception date, Legal final maturity date, Currency and Carrying amount.

2. What are the allowed values for each field?

The possible values (of variables) are organised in so called domains. Domains can be enumerated, if they provide a finite list of allowed values, which we then call members (e.g. countries, currencies,…), or non-enumerated if they provide a data type2.

The allowed values of a variable in the context of a cube are determined by a subset of its domain, a so called subdomain3.

The domain of the variable Currency, for example, comprises all possible currencies (e.g. EUR, USD) as well as aggregates like Currencies of the European Union; it becomes obvious that these possible values need to be restricted to a subset (i.e. the allowed values) when the variable is used in a cube. The same holds true for non-enumerated domains, e.g. the variable Carrying amount may be defined on the monetary domain (allowing positive and negative values) while in the context of a cube it may be restricted to a subset of this domain (e.g. positive values only). Please note that a subdomain may cover the whole domain.

In the example dataset, some variables are defined on non-enumerated domains, like Instrument unique identifier (String domain), Inception date (Date domain), Legal final maturity date (Date domain), Carrying amount (Monetary domain). The other variables are defined on enumerated domains: Type of instrument, Currency.

The subdomains used for the variables in the context of the data set illustrated in Table 1 are as follows: Instrument unique identifier (String up to 120 characters limited to letters (upper and lower case), numbers, dash and underscore), Inception date (All dates), Legal final maturity date (All dates), Carrying amount (Non-negative monetary amounts), Type of instrument ({Credit card debt, Current accounts, Factoring, Financial leases, Other loans, Other trade receivables}), Currency (ISO 4217).

Please also note that for each concept (i.e. variables, members) the dictionary shall provide additional information like a description or a legal reference.

3. What is the role of one field within one dataset?

One of the most relevant aspects of the structure of the dataset is what the identifier of the record is or, in other words, what combination of fields makes a record unique. In the example dataset, if nothing is said regarding the structure, applying some business knowledge one may conclude that each record is uniquely identified by its Instrument unique identifier. On the other hand there may be other datasets that may not contain an explicit identifier.

Thus in order to get a thorough understanding of the described dataset, the role of the variables needs to be explicit. In the SMCube methodology, variables that serve as identifiers of the records take the role of a dimension.

Variables that provide information related to the primary key (i.e. the set of dimensions) take the role of observations and variables that provide additional information related to a dimension or observation take the role of an attribute.

It is worth highlighting that one variable may take different roles in different datasets. For instance, the example dataset is based on granular loans and therefore the variable Type of instrument acts as an observation. In an aggregated dataset (e.g. table 2) this variable may act as a dimension.

Aggregated data set
Type of instrument Institutional sector Currency Carrying amount
Loans and advances Non-financial corporations Euro 11,000
Equity instruments Financial corporations United States Dollar 13,000

Table 2: aggregated dataset

In the context of this data set a record is uniquely identified by the variables Type of instrument, Institutional sector and Currency. Consequently the variable Type of instrument is part of the primary key and therefore acts as a dimension.

Summary

With the SMCube methodology, one cube serves to define the structure of a dataset. A cube is a set of variables, each of which has its allowed values specified by a subdomain and takes a role in the context of the cube.

The meta data description of the Granular Loans data set (table 3) could be summarised as follows:

Role Variable Subdomain
Dimension Instrument unique identifier {String up to 120 characters limited to letters (upper and lower case), numbers, dash and underscore}
Observation Type of instrument {Other loans, Financial leases, Reverse repurchase agreements, Factoring, Other trade receivables, Current accounts, Credit card debt}
Observation Inception date {All dates}
Observation Legal final maturity date {All dates}
Observation Currency {ISO 4217}
Observation Carrying amount {Non-negative monetary amounts}

Table 3: meta data description of Granular Loans data set
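To make the roles in Table 3 more tangible, the following is a minimal relational sketch of how such a cube could be materialised as a table (the SQL types and physical column names are illustrative assumptions, not part of the BIRD dictionary): the dimension becomes the primary key, while observations are ordinary columns.

/*illustrative sketch of a table matching the structure described in Table 3*/
CREATE TABLE GRANULAR_LOANS (
    INSTRMNT_UNQ_ID   VARCHAR(120) PRIMARY KEY,               /*dimension*/
    TYP_INSTRMNT      VARCHAR(50),                            /*observation, enumerated subdomain*/
    DT_INCPTN         DATE,                                   /*observation*/
    DT_LGL_FNL_MTRTY  DATE,                                   /*observation*/
    CRRNCY            CHAR(3),                                /*observation, ISO 4217*/
    CRRYNG_AMNT       DECIMAL(18,2) CHECK (CRRYNG_AMNT >= 0)  /*observation, non-negative monetary amount*/
);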

The meta data description of the Aggregated data set (table 2) could be summarised as follows:

Role Variable Subdomain
Dimension Type of instrument {Loans and advances, Equity instruments, Debt securities}
Dimension Institutional sector {Non-financial corporations, Financial corporations, General government, Households}
Dimension Currency {Euro, United States Dollar, Yen, Other currency}
Observation Carrying amount {Monetary amounts}

Table 4: meta data description of Aggregated data set

Cube hierarchies, cube hierarchy nodes, cube groups and cube group enumerations

These new tables serve to organise the cubes in a hierarchical manner. This is used, for instance, to organise the BIRD cubes in different groups of cubes (counterparties, protection, instruments…).

Cube structure

This new table serves to ensure compatibility with the SDMX standard, where more than one cube can be defined based on the same cube structure. In practice, in most of the cases (including the BIRD, AnaCredit, FinRep and SHS) there is a 1 to 1 equivalence between cube and cube structure.

Cube relationships

The cube relationships table serves to describe primary/foreign key relationships between cubes and therefore allows for the definition of relations between cubes in the BIRD model.
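A minimal sketch of such a relationship, assuming the cubes are materialised as relational tables (the table and column names below are illustrative; the actual relationships are recorded in the cube relationships table of the dictionary):

/*illustrative sketch of a primary/foreign key relationship between two cubes*/
CREATE TABLE CNTRPRTS (
    CNTRPRTY_ID VARCHAR(60) PRIMARY KEY
);

CREATE TABLE INSTRMNTS (
    INSTRMNT_UNQ_ID VARCHAR(120) PRIMARY KEY,
    DBTR_ID         VARCHAR(60),
    FOREIGN KEY (DBTR_ID) REFERENCES CNTRPRTS (CNTRPRTY_ID)
);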

Combinations

A cube, or more specifically a cube structure, can be seen as a (hyper)space spanned by an orthogonal coordinate system whose axes are given by the dimension(s) of the cube. Combinations can be interpreted as the points inside this space4. Each combination is determined by an allowed value (e.g. a member) for each variable that acts in the role of a dimension. Combinations also provide the possibility to restrict the possible “points in the coordinate system” in the sense that only those combinations that are related to the cube may be valid combinations of this cube.

Please note that one cube may contain multiple combinations and one combination may be present in multiple cubes (i.e. a many-to-many relationship).

For example, the cube of the Aggregated data set (see Table 2 of the previous section) can be thought of as a Cartesian coordinate system where the variables (i.e. Type of instrument, Institutional sector and Currency) represent the axes and the possible values are specific points on those axes.

The space generated by these three axes contains 48 points (i.e. 3 possible values for the variable Type of instrument, 4 possible values for the variable Institutional sector and 4 possible values for the variable Currency), the so called Combinations (of allowed values with respect to the underlying subdomains).
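This Cartesian-product view can be sketched in SQL as a CROSS JOIN over the three subdomains (member labels taken from Table 4; the VALUES syntax used for the inline lists differs slightly between SQL dialects):

/*illustrative sketch: enumerate all 3 x 4 x 4 = 48 combinations of the aggregated cube*/
WITH typ_instrmnt(v) AS (VALUES ('Loans and advances'), ('Equity instruments'), ('Debt securities')),
     instttnl_sctr(v) AS (VALUES ('Non-financial corporations'), ('Financial corporations'), ('General government'), ('Households')),
     crrncy(v) AS (VALUES ('Euro'), ('United States Dollar'), ('Yen'), ('Other currency'))
SELECT t.v AS type_of_instrument, s.v AS institutional_sector, c.v AS currency
FROM typ_instrmnt t
CROSS JOIN instttnl_sctr s
CROSS JOIN crrncy c;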

It is worth highlighting that with respect to the translation of DPM / XBRL in the SMCube methodology every data point (of DPM / XBRL) is presented as a combination.

Variable sets

One of the principles of the SMCube methodology is that a concept is represented only once and is univocally identifiable. In order to comply with this principle, but at the same time cover different data sets based on different use cases, we need the functionality provided by so called variable sets. Imagine that the concept of Carrying amount is already stored in the dictionary as a variable (so we can think of it as a column of a data set). At the same time another data set may use the concept of Carrying amount not as a column but as the value of a column. In order to comply with the above stated principle this value must not be stored as a member but as a variable. The following tables show the same data represented in different data sets: the first one does not use the concept of variable sets (i.e. the concepts of Carrying amount and Fair value are represented as columns), while the second one applies the concept of a variable set (i.e. the concepts of Carrying amount and Fair value appear as values of the column Type of value even though they are defined as variables in the dictionary).

Data set representation without variable set
Type of instrument Carrying amount Fair value
Reverse repurchase loan 31 29
Factoring 19 23

Table 5: Representation of a data set without a variable set

Data set representation with variable set
Type of instrument Type of value Value
Reverse repurchase loan Carrying amount 31
Reverse repurchase loan Fair value 29
Factoring Carrying amount 19
Factoring Fair value 23

Table 6: Representation of the same information illustrated in Table 5 using a variable set

Please note that in the second case the variable Type of value is part of the primary key.
Please also note that the VTL operators fold and unfold allow switching between representations with and without a variable set.
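In relational terms, moving from the representation of Table 5 to the representation of Table 6 is an unpivot operation (the VTL fold); a minimal SQL sketch, with the table and column names invented purely for this example, could look as follows:

/*illustrative sketch: fold the two value columns of Table 5 into the (Type of value, Value) pair of Table 6*/
SELECT TYP_INSTRMNT, 'Carrying amount' AS TYP_OF_VL, CRRYNG_AMNT AS VL
FROM DATASET_WITHOUT_VARIABLE_SET
UNION ALL
SELECT TYP_INSTRMNT, 'Fair value' AS TYP_OF_VL, FV AS VL
FROM DATASET_WITHOUT_VARIABLE_SET;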

Mappings

Due to the fact that different reporting frameworks use different codification systems (e.g. DPM codification, SHS codification, AnaCredit codification) it is necessary to provide functionality for aligning these codification systems. In the SMCube methodology this functionality is called mapping. The distinction between codes of different codification systems is achieved by different so called maintenance agencies. The codes maintained by the maintenance agency “SDD reference dictionary (ECB)” will be referred to as reference codes while all other codes will be denoted as non-reference codes. Simply speaking, mappings make it possible to represent a cube (and its content) using another codification system, e.g. representing a FinRep cube (and its content) with reference codes instead of DPM codes.

Historisation

Historisation refers to the ability of knowing the structure of the (meta) data at a certain point in time. Note that historisation differs from an audit log, which deals with how the database changed. To illustrate the difference, suppose that one new cube needs to be reported from time t1, and the cube is created in the database at t0. The audit log will deal with the fact that at t0 a certain cube was created, while historisation serves to specify that the new cube is valid from t1.

The SMCube methodology uses two historisation methods.

Versioning

In some cases, historisation is done with versions of elements. One element can have different versions, and each version has a validity range. This is the case for cubes and cube structures.

As an illustrative example, suppose a cube structure ABC that up to the date t1 has three variables (A, B and C), but from t1 onwards needs to have four (A, B, C, D). This implies two records in the cube structure table, one per version, and the full composition of both versions in the cube structure item table. The database tables (simplified for illustration purposes) would be:

Cube structure table

CUBE_STRUCTURE_ID CODE VERSION VALID_FROM VALID_TO
ABC_1 ABC 1 t0 t1
ABC_2 ABC 2 t1 31/12/9999

Note that the code is the same, but there are two different ids.

Cube structure item table

CUBE_STRUCTURE_ID CUBE_VARIABLE_CODE
ABC_1 A
ABC_1 B
ABC_1 C
ABC_2 A
ABC_2 B
ABC_2 C
ABC_2 D

This approach is used for cubes, cube structures and combinations.
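With this versioning approach, retrieving the structure valid at a given reference date means selecting the version whose validity range contains that date. A sketch against the simplified tables above (assuming they are named CUBE_STRUCTURE and CUBE_STRUCTURE_ITEM and that :t is a reference-date parameter):

/*illustrative sketch: variables of cube structure ABC valid at reference date :t*/
SELECT i.CUBE_VARIABLE_CODE
FROM CUBE_STRUCTURE s
JOIN CUBE_STRUCTURE_ITEM i ON i.CUBE_STRUCTURE_ID = s.CUBE_STRUCTURE_ID
WHERE s.CODE = 'ABC'
  AND :t >= s.VALID_FROM
  AND :t < s.VALID_TO;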

Enumeration validity

In some other cases, the validity is provided with the enumeration of an item. In these cases, the elements that belong to the item evolve over time without creating different versions. This is the case for the hierarchies. Suppose a hierarchy with the composition of the Euro Area. Some members may join the Euro Area over time, but the Euro Area concept is always the same, so there are no different versions. A member hierarchy with a changing composition is illustrated below:

Member hierarchy table

MEMBER_HIERARCHY_ID NAME
EA Euro Area

Member hierarchy node table

MEMBER_HIERARCHY_ID MEMBER_ID LEVEL PARENT VALID_FROM VALID_TO
EA EA 0 t0 31/12/9999
EA A 1 EA t0 31/12/9999
EA B 1 EA t0 t1
EA C 1 EA t2 31/12/9999
EA D 1 EA t3 31/12/9999
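With enumeration validity, a point-in-time view is obtained by filtering the nodes rather than by selecting a version. A sketch for the composition of the Euro Area at reference date :t (table and column names follow the illustrative tables above):

/*illustrative sketch: members of the Euro Area hierarchy valid at reference date :t*/
SELECT MEMBER_ID
FROM MEMBER_HIERARCHY_NODE
WHERE MEMBER_HIERARCHY_ID = 'EA'
  AND PARENT = 'EA'
  AND :t >= VALID_FROM
  AND :t < VALID_TO;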

Additional considerations

There are practical reasons for providing both ways of historisation. Creating versions is very useful to stress that there are changes. But, as seen in the example, creating new versions implies more changes and redundancy.

Annex II: Transformations

The representation of transformations in the BIRD database is based on the SDMX information model (see section II, 13.2 Model – Inheritance View, 13.2.1 Class Diagram).

According to the VTL model, a Transformation scheme is an ordered list of transformations; such a transformation scheme therefore contains one or many transformations (i.e. lines of valid VTL code). The SDMX model specifies that a transformation can contain one or many transformation nodes (also referred to as transformation elements), i.e. the components of this line of valid VTL code. A transformation node is therefore either a constant, an operation or a BIRD model object (i.e. a variable, cube, etc.). In case a transformation node represents an operation, it can itself have a relation to one or many further transformation nodes.

The BIRD database contains the complete information about transformation schemes in the sense that not only the decomposition of each transformation scheme into its transformations but also the decomposition of transformations into their transformation nodes according to the SDMX information model is stored in the database.

The next sections (“Example”, “Representation in the database”) explain the relation between transformation schemes, transformations and transformation nodes and their representation in the database. In the section “Description of VTL artefacts” we also describe functionality of VTL that is used regularly in various transformation schemes.

Conventions regarding transformations

Please consider the following conventions when analysing the transformation rules.

Implicit variables in join operations

When joining cubes we do not take into account the variables Reference date and Perspective identifier. Consequently we assume equality of these two variables when applying a join operation.

Implicit aggregation

We also assume aggregation of measures (i.e. numeric observations) when reducing the dimensions of a cube (e.g. via a keep operation that does not comprise all dimensions of the cube).
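In relational terms this convention corresponds to grouping by the remaining dimensions and summing the measures; a sketch with hypothetical cube and variable names:

/*illustrative sketch: dropping the Currency dimension implicitly sums the measure over the removed dimension*/
SELECT TYP_INSTRMNT, INSTTTNL_SCTR, SUM(CRRYNG_AMNT) AS CRRYNG_AMNT
FROM AGGRGTD_CB
GROUP BY TYP_INSTRMNT, INSTTTNL_SCTR;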

Known issues regarding the transformations

Dependency tree of transformation schemes

Due to improvements in the way we derive the relationships between the transformations, we are currently updating the generation of the dependency tree and will update this feature as soon as possible. In the meantime, please take a look at the table NODE_CONNECTIONS in the BIRD database, which provides all relationships between the transformation nodes.

Validations / Data point rulesets

The update of the validation rules (w.r.t. the new Input Layer structure) is currently in progress. Please note that the results of the Work stream on data modelling may also affect the Input Layer structure and therefore we plan to publish the updated validation rules after this work stream has finished its mandate.

Regarding the representation of certain VTL artefacts

We are aware of the fact that certain VTL artefacts are represented in a not very user friendly way, e.g. Procedures or larger if-then-else structures. We are working on improving the website representation of these artefacts. Note however that this issue only relates to the representation of the information on the website; the content of the database (in the transformation package) is sound.

Example

The following example will try to clarify the current status of the representation of transformations in the BIRD database:

Let’s assume we have a (database-)table named “coordinates” containing the columns (i.e. variables) x and y which (clearly) relate to some coordinate system. Our transformation scheme’s goal is to derive a new variable distance for all records where x and y are greater than or equal to 0 defined in the following way:

distance = sqrt(x * x + y * y).

Using VTL syntax we would write the following lines:

/*extract all records from the (database-)table coordinates and store the result in a dataset named “coordinates”*/

coordinates := get(“coordinates”);

/*extract all records from the dataset “coordinates” where x and y are greater or equal to zero, keep only x and y and store the result in a dataset named “result”*/

result := coordinates [filter (x >= 0 and y >= 0), keep (x, y)];

/*apply a calculation on the dataset “result” which takes the square root of the sum of x squared and y squared and stores the result in a new column (i.e. variable) named “distance”*/

finalResult := result [calc sqrt (x * x + y * y) as “distance”];

The tree structure with respect to the second line can be illustrated as follows:

FIG 1: tree structure representation of result := coordinates [filter (x >= 0 and y >= 0), keep (x, y)];

The term written in brackets is the type of transformation element which can be used to identify constants and BIRD model objects (i.e. variables, cubes, etc.).

Please note that the Boolean condition applied to the filter operator (i.e. “x>=0 and y>=0”) is completely decomposed into its components (i.e. transformation elements) in a structured way in the sense that the Boolean condition can be reengineered from this tree structure.

The decomposition of transformation into its transformation elements supports specific implementations of these transformations. For example in case of a SQL implementation we could apply the following mappings:

  • :=  →  CREATE VIEW _______ AS
  • filter  →  WHERE
  • keep  →  SELECT

Walking the tree and creating the corresponding line of SQL code yields:

CREATE VIEW result AS SELECT x, y FROM coordinates WHERE x >= 0 AND y >= 0;

Please note that – in order to generate such an SQL statement – one must additionally rearrange the nodes of the tree according to the SQL syntax. Please also note that the elements after each keyword (i.e. CREATE VIEW, SELECT, FROM, WHERE) are similar to the elements represented in the tree structure.

For the sake of completeness you find the tree representations of the first and third lines here:

FIG 2: tree structure representation of coordinates := get(“coordinates”);

FIG 3: tree structure representation of finalResult := result [calc sqrt(x * x + y * y) as “distance”];

Representation in the database

The transformation scheme is stored in the table TRANSFORMATION_SCHEME:

SCHEME_ID: TEST_SCHEME

EXPRESSION:
/*extract all records from the (database-)table coordinates and store the result in a dataset named “coordinates”*/
coordinates := get(“coordinates”);
/*extract all records from the dataset “coordinates” where x and y are greater or equal to zero, keep only x and y and store the result in a dataset named “result”*/
result := coordinates [filter (x >= 0 and y >= 0), keep (x, y)];
/*apply a calculation on the dataset “result” which takes the square root of the sum of x squared and y squared and stores the result in a new column (i.e. variable) named “distance”*/
finalResult := result [calc sqrt (x * x + y * y) as “distance”];

DESCRIPTION: Transformation scheme example

NATURAL_LANGUAGE: Transformation scheme deriving the distance between x and y

Please note that this is a reduced version of the original table, presented for illustrative purposes.

Each individual transformation is stored in the TRANSFORMATION table:

TRANSFORMATION_ID EXPRESSION SCHEME_ID ORDER
156430 /*extract all records from the (database-)table coordinates and store the result in a dataset named “coordinates”*/

coordinates := get(“coordinates”);

TEST_SCHEME 0
156435 /*extract all records from the dataset “coordinates” where x and y are greater or equal to zero, keep only x and y and store the result in a dataset named “result”*/

result := coordinates [filter (x >= 0 and y >= 0), keep (x, y)];

TEST_SCHEME 1
156451 /*apply a calculation on the dataset “result” which takes the square root of the sum of x squared and y squared and stores the result in a new column (i.e. variable) named “distance”*/

finalResult := result [calc sqrt (x * x + y * y) as “distance”];

TEST_SCHEME 2

Using the SCHEME_ID we can connect these transformations with the related transformation scheme (which is similar to stating that these transformations are children of the transformation scheme with SCHEME_ID “TEST_SCHEME”).
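This parent/child relationship can be queried directly (column names as shown in the simplified tables above; ORDER is a reserved word in SQL and may need quoting):

/*illustrative sketch: list all transformations of the scheme TEST_SCHEME in execution order*/
SELECT s.SCHEME_ID, t.TRANSFORMATION_ID, t.EXPRESSION
FROM TRANSFORMATION_SCHEME s
JOIN TRANSFORMATION t ON t.SCHEME_ID = s.SCHEME_ID
WHERE s.SCHEME_ID = 'TEST_SCHEME'
ORDER BY t."ORDER";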

All Transformation elements are stored in the table TRANSFORMATION_NODE:

TRANSFORMATION_ID NODE_ID EXPRESSION TYPE_OF_NODE LEVEL PARENT ORDER
156430 156431 := OperatorNode 0
156430 156432 coordinates ReferenceNode 1 156431 0
156430 156433 get OperatorNode 1 156431 1
156430 156434 “coordinates” ReferenceNode 2 156433 0
156435 156436 := OperatorNode 0
156435 156437 result ReferenceNode 1 156436 0
156435 156438 keep OperatorNode 1 156436 1
156435 156439 filter OperatorNode 2 156438 0
156435 156440 coordinates ReferenceNode 3 156439 0
156435 156441 and OperatorNode 3 156439 1
156435 156442 >= OperatorNode 4 156441 0
156435 156443 x ReferenceNode 5 156442 0
156435 156444 0 ConstantNode 5 156442 1
156435 156445 >= OperatorNode 4 156441 1
156435 156446 y ReferenceNode 5 156445 0
156435 156447 0 ConstantNode 5 156445 1
156435 156448 Parameters OperatorNode 2 156438 1
156435 156449 x ReferenceNode 3 156448 0
156435 156450 y ReferenceNode 3 156448 1
156451 156452 := OperatorNode 0
156451 156453 finalResult ReferenceNode 1 156452 0
156451 156454 calc OperatorNode 1 156452 1
156451 156455 result ReferenceNode 2 156454 0
156451 156456 as OperatorNode 2 156454 1
156451 156457 sqrt OperatorNode 3 156456 0
156451 156458 + OperatorNode 4 156457 0
156451 156459 * OperatorNode 5 156458 0
156451 156460 x ReferenceNode 6 156459 0
156451 156461 x ReferenceNode 6 156459 1
156451 156462 * OperatorNode 5 156458 1
156451 156463 y ReferenceNode 6 156462 0
156451 156464 y ReferenceNode 6 156462 1
156451 156465 “distance” ConstantNode 3 156456 1

This structure supports easy access to the components of each Transformation. If, for example, we are interested in the operators that are used in the second line (result := coordinates [filter (x >= 0 and y >= 0), keep (x, y)]; compare FIG 1) we simply select all rows where the TRANSFORMATION_ID equals 156435 and restrict the result to those records where the TYPE_OF_NODE equals OperatorNode. The result reflects the blue squares in FIG 1.
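The selection described above can be written directly against the TRANSFORMATION_NODE table (a sketch using the column names shown above):

/*illustrative sketch: all operator nodes of the second transformation*/
SELECT NODE_ID, EXPRESSION
FROM TRANSFORMATION_NODE
WHERE TRANSFORMATION_ID = 156435
  AND TYPE_OF_NODE = 'OperatorNode';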

Please note that not only Transformations are represented in this structure but also Functions, Datasets and Procedures.

Description of VTL artefacts

Procedures

Procedures are aimed at automating common processing tasks, and can be used as a means for shortening the code by replacing common processing tasks with a procedure call.

Procedures take input and output arguments and describe the set of transformations performed with those arguments.

The following example shows a procedure for checking the identifiers that are present in a cube (FRGN_CB) but are not present in another cube (PRMRY_CB).

define procedure PRCDR_RFRNTL_INTGRTY(input FRGN_CB as dataset, input FRGN_VRBL as string, input PRMRY_CB as dataset, input PRMRY_VRBL as string, input VLDTN_ID as string, output RSLT as dataset) {

/*extract a set of Foreign variable (FRGN_VRBL), rename to ID, store in dataset Foreign identifiers (FRGN_IDS)*/

FRGN_IDS := FRGN_CB[keep (FRGN_VRBL), rename FRGN_VRBL as “ID” ];

/*extract a set of Primary variable (PRMRY_VRBL), rename to ID, store in dataset Primary identifiers (PRMRY_IDS)*/

PRMRY_IDS := PRMRY_CB[keep (PRMRY_VRBL), rename PRMRY_VRBL as “ID” ];

/*calculate the set difference between Foreign identifiers (FRGN_IDS) and Primary identifiers (PRMRY_IDS)*/

RSLT := setdiff (FRGN_IDS, PRMRY_IDS);

/*rename ID to FRGN_IDS, make VLDTN_ID a measure variable*/

RSLT := RSLT [rename (ID as “FRGN_IDS”, VLDTN_ID role Measure)];

}

The procedure can be called afterwards with concrete arguments:

call PRCDR_RFRNTL_INTGRTY(TRNSCTNS_CNTRPRTS, CNTRPRTY_ID, CNTRPRTS, CNTRPRTY_ID, V_TRNSCTNS_CNTRPRTS_ID);

Functions

VTL allows extending the available operators by defining functions. Functions take as input some variables and give as a result a predefined calculation. Currently the BIRD uses functions as n:1 maps, creating one output value while taking into account n input parameters.

The following example shows a function to calculate the carrying amount from the required input variables:

/*map: (Accounting classification, Fair value, Gross carrying amount excluding accrued interest, Accrued interest, Fair value changes due to hedge accounting, Accumulated impairment) → Carrying amount*/

create function D_CRRYNG_AMNT(ACCNTNG_CLSSFCTN, FV, GRSS_CRRYNG_AMNT_E_INTRST, ACCRD_INTRST, FV_CHNG_HDG_ACCNTNG, ACCMLTD_IMPRMNT) {

returns

if (ACCNTNG_CLSSFCTN in (“2”, “4”, “8”, “41”)) then FV

elseif (ACCNTNG_CLSSFCTN in (“6”, “14”)) then (GRSS_CRRYNG_AMNT_E_INTRST + ACCRD_INTRST - ACCMLTD_IMPRMNT + FV_CHNG_HDG_ACCNTNG)

else null

as integer}

This function can then be used to derive new data:

RESULT := CUBE [calc D_CRRYNG_AMNT(ACCNTNG_CLSFCTN, FV, GRSS_CRRYNG_AMNT_E_INTRST, ACCRD_INTRST, FV_CHNGS_HDG_ACCNTNG, ACCMLTD_IMPRMNT) as “CRRYNG_AMNT” role Measure];

The line illustrated above adds a column named “CRRYNG_AMNT” to the dataset CUBE, where the value of this new column for each row is determined by the function D_CRRYNG_AMNT, and stores the result in a dataset named RESULT.

Rulesets

Rulesets define validation rules between variables that are applied to each individual record of a given dataset. Rulesets take as input the variables to be validated, and contain at least one consistency rule that the data need to comply with. Each rule has two conditions (introduced by the clauses when and then); a record satisfies the rule if, whenever the when condition holds, the then condition holds as well.

As an example, the following ruleset includes two consistency rules between the variables accounting classification and accumulated impairment:

define datapoint ruleset DR_ACCMLTD_IMPRMNT1(ACCNTNG_CLSSFCTN, ACCMLTD_IMPRMNT) {

RL1:

when ACCNTNG_CLSSFCTN in (“2”, “4”, “41”)

then isnull(ACCMLTD_IMPRMNT)

errorcode(“Instruments classified as ‘IFRS: Financial assets held for trading (2)’, ‘IFRS: Financial assets designated at fair value through profit or loss (4)’ or ‘IFRS: Non-trading financial assets mandatorily at fair value through profit or loss (41)’ are not subject to impairment”);

RL2:

when ACCNTNG_CLSSFCTN in (“6”, “8”, “14”)

then not isnull(ACCMLTD_IMPRMNT)

errorcode(“For instruments classified as ‘IFRS: Financial assets at amortised cost (6)’, ‘IFRS: Financial assets at fair value through other comprehensive income (8)’, ‘IFRS: Cash balances at central banks and other demand deposits (14)’ the ‘Accumulated impairment’ should not be null”);

}

The ruleset can then be used with the check operator:

VALIDATION_RESULT := check (DATASET, DR_ACCMLTD_IMPRMNT1);

Transformation parser

We developed a parser for VTL in order to visualise tree structures and support the production of the output data model required for the BIRD database. The parser is written in Java and is available on GitHub.

Dependencies of transformation schemes

Most transformation schemes depend on other schemes in the sense that they use datasets generated in those schemes. We therefore provide a graphical illustration of such dependencies for each transformation scheme, following the header “Scheme dependencies” in the “Natural language” section. These dependencies can be computed from the transformation content in the database (i.e. the tables TRANSFORMATION_SCHEME, TRANSFORMATION, TRANSFORMATION_NODE).
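A rough sketch of how such dependencies could be derived from these tables (this is one possible query based on the node layout shown earlier, not the actual BIRD implementation): the dataset assigned in one scheme (the ReferenceNode directly below := with ORDER 0) is looked up among the ReferenceNodes used in the expressions of other schemes.

/*rough illustrative sketch: a scheme depends on another scheme if it references a dataset that the other scheme assigns to*/
WITH assigned AS (
  SELECT t.SCHEME_ID, n.EXPRESSION AS dataset_name
  FROM TRANSFORMATION t
  JOIN TRANSFORMATION_NODE n ON n.TRANSFORMATION_ID = t.TRANSFORMATION_ID
  WHERE n.TYPE_OF_NODE = 'ReferenceNode' AND n."LEVEL" = 1 AND n."ORDER" = 0
),
referenced AS (
  SELECT t.SCHEME_ID, n.EXPRESSION AS dataset_name
  FROM TRANSFORMATION t
  JOIN TRANSFORMATION_NODE n ON n.TRANSFORMATION_ID = t.TRANSFORMATION_ID
  WHERE n.TYPE_OF_NODE = 'ReferenceNode' AND NOT (n."LEVEL" = 1 AND n."ORDER" = 0)
)
SELECT DISTINCT r.SCHEME_ID AS scheme, a.SCHEME_ID AS depends_on
FROM referenced r
JOIN assigned a ON a.dataset_name = r.dataset_name AND a.SCHEME_ID <> r.SCHEME_ID;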

Please note that we implemented some restrictions when computing these dependencies: first, we do not include validation and put schemes in the dependency tree and second, such a dependency tree does not contain any duplication of related schemes (although multiple schemes in such a tree may depend on the same transformation scheme).

Annex III: FinRep translation

The content of EBA’s DPM is imported into the SMCube methodology automatically from the XBRL taxonomies. All the DPM content remains untouched, only a translation between methodologies is performed. All the items belong to the maintenance agency “EBA”, reflecting the fact that the content is left exactly as published by the EBA through the taxonomies.

Core package translation

The core package of the SMCube methodology is very similar to the dictionary layer of the DPM, so in most cases the translation is very straightforward.

The most relevant changes that should be noted are:

  • Metrics are members of the domain Amount type in the DPM. As explained above, in SMCube metrics are treated as variables. Nevertheless, metrics are also imported as members, because the DPM contains member hierarchies in which the members are metrics, so they are imported into the SDD as members in order not to lose the hierarchies.
  • New domains are added with the metric data types. The reason is the conversion of metrics into variables. Variables in SMCube are defined on a domain, and metrics have a data type, so the data types of the metrics are created as domains. These new domains are:
    • Monetary
    • String
    • Percent
    • Boolean
    • Date
    • Integer
    • Decimal
  • New variables are added to make explicit some dimensions that the DPM keeps implicit but that are included in the XBRL instances. These variables deal with the reporting agent, the date and the observation value, and are:
    • Period
    • Entity
    • Unit
    • Observation_Value

Data definition package translation

The translation of the data definition package is not straightforward, since the DPM does not represent multidimensional structures. Therefore, a convention on what is equivalent to a cube is required. The convention followed has been to translate one table as one cube.

In the cube structure items, the implicit variables are added to the cube. All the DPM dimensions have the dimension role also under the SMCube methodology.

Data points are translated as combinations: each combination within a non-reference cube is considered a data point described by the DPM. The FinRep reference cubes are a normalised template described by reference codes and reference combinations.

It is possible to link a non-reference combination – a datapoint – to a reference combination5.


  1. The term concept refers to Variables or Members in the dictionary.
  2. The allowed values of non-enumerated domains may be specified in more detail using so called facets. These facets allow us to apply additional constraints onto non-enumerated domains, e.g. a pattern for the last day of the month.
  3. Please note that the utility of subdomains for variables in cubes allows organising similar concepts (e.g. countries, regions, …) in domains while only allowing a subset (e.g. countries only) of this domain in a cube.
  4. Please note that a combination may also be represented as a subset of the whole space and is not necessarily restricted to one point in that space.
  5. Non-reference combination IDs are coded using the DataPointVID of the DPM database. Reference combination IDs are structured in the same way, ending with a “_REF” string (e.g. EBA_100 <-> EBA_100_REF, both identifying DataPointVID = 100).