Annex I: Introduction to the SMCube Methodology

The main purpose of the Statistical Multidimensional Metadata Model methodology (SMCube methodology) is to provide functionality to a dictionary, here the so-called Single Data Dictionary (SDD). The main principle is that concepts [1] are univocally identifiable, so that each concept is represented only once, using a single code.

Maintenance agencies

A Maintenance agency is the agency responsible for the maintenance of associated dictionary elements (e.g. Variables, Members,…).
The concept of Maintenance agency allows the dictionary to manage different codification systems maintained by different Maintenance agencies. Examples of Maintenance agencies are the SDD reference dictionary (ECB), whose codes are used to describe the Input layer, and the European Banking Authority (EBA), which is responsible for the maintenance of the DPM content (which is imported into the dictionary; for further details see Annex III: FinRep translation).
We define the Maintenance agency SDD reference dictionary (ECB) as the reference Maintenance agency and refer to the dictionary elements associated with this Maintenance agency as reference objects or elements.

Data set definition

SMCube is a methodology for defining datasets based on their metadata. Its central role is to define cubes. A cube describes the structure of a dataset, understood as a set of data organised as a table with fields (columns) and records (rows).
The following table provides an example of a dataset comprising information about granular loans:

Granular Loans
Instrument unique identifier | Type of instrument | Inception date | Legal final maturity date | Currency | Carrying amount
aGranularLoan | Other loans | 17/03/2015 | 17/03/2025 | Euro | 10,000
anotherGranularLoan | Finance leases | 01/01/2016 | 01/01/2021 | United States Dollar | 13,000

Table 1: data set of granular loans
The minimum information required to describe the structure of the dataset can be summarised with the following three questions:
1. What are the fields (columns) of the dataset?
In the SMCube methodology, the fields of a data set are called variables (of the cube defining this data set). Variables are defined independently of the cube, which makes concepts reusable: a variable may be used in multiple cubes. A cube may comprise as many variables as needed to define the data set.
Referring to Table 1 the variables are: Instrument unique identifier, Type of instrument, Inception date, Legal final maturity date, Currency and Carrying amount.
2. What are the allowed values for each field?
The possible values (of variables) are organised in so-called domains. Domains can be enumerated, if they provide a finite list of allowed values, which we then call members (e.g. countries, currencies,…), or non-enumerated, if they provide a data type [2].
The allowed values of a variable in the context of a cube are determined by a subset of its domain, a so-called subdomain [3].
Consider the domain of the variable Currency, which comprises all possible currencies (e.g. EUR, USD) as well as aggregates such as Currencies of the European Union. When the variable is used in a cube, these possible values clearly need to be restricted to a subset (i.e. the allowed values). The same holds true for non-enumerated domains: the variable Carrying amount may be defined on the monetary domain (allowing positive and negative values), while in the context of a cube it may be restricted to a subset of this domain (e.g. positive values only). Please note that a subdomain may cover the whole domain.
In the example dataset, some variables are defined on non-enumerated domains: Instrument unique identifier (String domain), Inception date (Date domain), Legal final maturity date (Date domain) and Carrying amount (Monetary domain). The other variables, Type of instrument and Currency, are defined on enumerated domains.
The subdomains used for the variables in the context of the data set illustrated in table 1 are as follows: Instrument unique identifier (String up to 120 characters limited to upper-case and lower-case letters, numbers, dash and underscore), Inception date (All dates), Legal final maturity date (All dates), Carrying amount (Non-negative monetary amounts), Type of instrument ({Credit card debt, Current accounts, Factoring, Financial leases, Other loans, Other trade receivables}), Currency (ISO 4217).
Please also note that for each concept (i.e. variables, members) the dictionary shall provide additional information like a description or a legal reference.
3. What is the role of one field within one dataset?
One of the most relevant aspects of the structure of a dataset is the identifier of a record or, in other words, the combination of fields that makes a record unique. In the example dataset, if nothing is said about the structure, one may conclude from business knowledge that each record is uniquely identified by its Instrument unique identifier. Other datasets, however, may not contain an explicit identifier.
Thus, in order to get a thorough understanding of the described dataset, the role of each variable needs to be explicit. In the SMCube methodology, variables that serve as identifiers of the records take the role of a dimension.
Variables that provide information related to the primary key (i.e. the set of dimensions) take the role of an observation, and variables that provide additional information related to a dimension or observation take the role of an attribute.
It is worth highlighting that one variable may take different roles in different datasets. For instance, the example dataset is based on granular loans and therefore the variable Type of instrument acts as an observation. In an aggregated dataset (e.g. table 2) this variable may act as a dimension.

Aggregated data set
Type of instrument | Institutional sector | Currency | Carrying amount
Loans and advances | Non-financial corporations | Euro | 11,000
Equity instruments | Financial corporations | United States Dollar | 13,000

Table 2: aggregated dataset
In the context of this data set a record is uniquely identified by the variables Type of instrument, Institutional sector and Currency. Consequently the variable Type of instrument is part of the primary key and therefore acts as a dimension.
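The statement that a set of dimensions uniquely identifies each record can be checked mechanically. The following is a minimal sketch in Python, assuming the records of table 2 are held as plain dictionaries; the function name duplicate_keys is a hypothetical helper introduced for illustration and is not part of the SMCube methodology.

```python
from collections import Counter

# Records of the aggregated data set shown in table 2.
records = [
    {"Type of instrument": "Loans and advances",
     "Institutional sector": "Non-financial corporations",
     "Currency": "Euro", "Carrying amount": 11000},
    {"Type of instrument": "Equity instruments",
     "Institutional sector": "Financial corporations",
     "Currency": "United States Dollar", "Carrying amount": 13000},
]

# Variables that take the role of a dimension in this data set.
dimensions = ["Type of instrument", "Institutional sector", "Currency"]


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur in more than one record."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# An empty result means the dimensions form a valid primary key.
print(duplicate_keys(records, dimensions))
```

If a second record with the same Type of instrument, Institutional sector and Currency were added, the helper would report that key, signalling that the chosen dimensions no longer identify records uniquely.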
Summary
With the SMCube methodology, one cube serves to define the structure of a dataset. One cube is a set of variables, for which the allowed values are specified by a subdomain, and that have a role in the context of the cube.
The meta data description of the Granular Loans data set (table 3) could be summarised as follows:

Role | Variable | Subdomain
Dimension | Instrument unique identifier | {String up to 120 characters limited to upper-case and lower-case letters, numbers, dash and underscore}
Observation | Type of instrument | {Other loans, Financial leases, Reverse repurchase agreements, Factoring, Other trade receivables, Current accounts, Credit card debt}
Observation | Inception date | {All dates}
Observation | Legal final maturity date | {All dates}
Observation | Currency | {ISO 4217}
Observation | Carrying amount | {Non-negative monetary amounts}

Table 3: meta data description of Granular Loans data set
The meta data description of the Aggregated data set (table 2) could be summarised as follows:

Role | Variable | Subdomain
Dimension | Type of instrument | {Loans and advances, Equity instruments, Debt securities}
Dimension | Institutional sector | {Non-financial corporations, Financial corporations, General government, Households}
Dimension | Currency | {Euro, United States Dollar, Yen, Other currency}
Observation | Carrying amount | {Monetary amounts}

Table 4: meta data description of Aggregated data set
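To make the summary more concrete, a cube can be sketched as a list of variables, each carrying a role and a membership test for its subdomain. The Python sketch below encodes table 3 under several assumptions: the class CubeVariable and the function violations are hypothetical names, the identifier pattern assumes at least one character, and the currency set is an abbreviated stand-in for the full ISO 4217 list.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable


@dataclass(frozen=True)
class CubeVariable:
    name: str
    role: str                       # "dimension", "observation" or "attribute"
    allowed: Callable[[Any], bool]  # membership test for the subdomain


# Assumption: identifiers have between 1 and 120 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,120}")
INSTRUMENT_TYPES = {
    "Other loans", "Financial leases", "Reverse repurchase agreements",
    "Factoring", "Other trade receivables", "Current accounts",
    "Credit card debt",
}
# Assumption: a two-element stand-in for the ISO 4217 subdomain.
CURRENCIES = {"Euro", "United States Dollar"}


def is_identifier(value):
    return isinstance(value, str) and ID_PATTERN.fullmatch(value) is not None


def is_date(value):
    return isinstance(value, date)


def is_non_negative_amount(value):
    return isinstance(value, (int, float)) and value >= 0


GRANULAR_LOANS = [
    CubeVariable("Instrument unique identifier", "dimension", is_identifier),
    CubeVariable("Type of instrument", "observation",
                 lambda v: v in INSTRUMENT_TYPES),
    CubeVariable("Inception date", "observation", is_date),
    CubeVariable("Legal final maturity date", "observation", is_date),
    CubeVariable("Currency", "observation", lambda v: v in CURRENCIES),
    CubeVariable("Carrying amount", "observation", is_non_negative_amount),
]


def violations(cube, record):
    """Return the names of variables whose value lies outside the subdomain."""
    return [v.name for v in cube if not v.allowed(record[v.name])]


record = {
    "Instrument unique identifier": "aGranularLoan",
    "Type of instrument": "Other loans",
    "Inception date": date(2015, 3, 17),
    "Legal final maturity date": date(2025, 3, 17),
    "Currency": "Euro",
    "Carrying amount": 10000,
}
print(violations(GRANULAR_LOANS, record))
```

The first record of table 1 passes all subdomain checks. A record with a negative Carrying amount would be reported, because the subdomain in this cube is restricted to non-negative monetary amounts.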

About the Information Model

For detailed information regarding the dictionary’s Information model we recommend users to explore the Information model’s Entity Relationship Model. By double-clicking on an entity of the model users may explore the structure of each entity.
The dictionary’s Information model is separated into the following packages:

  • The Core package
  • The Data definition package
  • The Mapping package
  • The Transformation package
  • The Rendering package
  • The (legal) reference package

Core package

The main idea behind the dictionary objects of the core package is that these objects are reusable by various Frameworks. The rationale behind using the same concepts in different contexts is that the dictionary may provide users with the information in which Framework(s) or Cube(s) a specific concept is used, e.g. where may I find Loans?

Cube structure

A Cube structure is an intermediary between a Cube and its Cube structure items, which represent the columns of a table, their roles (e.g. primary key) and the allowed values. Implementing the Cube structure ensures compatibility with the SDMX standard, where more than one Cube may refer to the same Cube structure. In most cases the relationship between a Cube and its Cube structure will be a one-to-one relationship.

Cube relationships

A Cube relationship establishes a relationship (e.g. primary / foreign key) between Cubes. For the documentation of Entity Relationship Models (ERM) such as the BIRD input layer, but also AnaCredit, the relationships between the Cubes are essential as they define referential integrity constraints.

Variable sets

One of the principles of the SMCube methodology is that a concept is represented only once and is univocally identifiable. To comply with this principle while still covering different data sets based on different use cases, we need the functionality provided by so-called variable sets. Suppose the concept of Carrying amount is already stored in the dictionary as a variable (so we can think of it as a column of a data set). Another data set may use the same concept not as a column but as the value of a column. To comply with the principle stated above, this value must not be stored as a member; it must remain a variable. The following tables show the same data represented in two data sets. The first does not use a variable set (i.e. the concepts of Carrying amount and Fair value are represented as columns), while the second applies a variable set (i.e. the concepts of Carrying amount and Fair value are represented as members although they are already defined as variables).

Data set representation without variable set
Type of instrument | Carrying amount | Fair value
Reverse repurchase loan | 31 | 29
Factoring | 19 | 23

Table 5: Representation of a data set without a variable set

Data set representation with variable set
Type of instrument | Type of value | Value
Reverse repurchase loan | Carrying amount | 31
Reverse repurchase loan | Fair value | 29
Factoring | Carrying amount | 19
Factoring | Fair value | 23

Table 6: Representation of the same information illustrated in Table 5 using a variable set
Please note that in the second case the variable Type of value is part of the primary key.
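The relationship between the two representations can be sketched as a simple unpivot. The Python sketch below turns the rows of table 5 into the rows of table 6; the function name unpivot is a hypothetical helper, and the column names Type of value and Value are taken from table 6.

```python
# Rows of table 5: Carrying amount and Fair value appear as columns.
wide = [
    {"Type of instrument": "Reverse repurchase loan",
     "Carrying amount": 31, "Fair value": 29},
    {"Type of instrument": "Factoring",
     "Carrying amount": 19, "Fair value": 23},
]

# The variables that together form the variable set.
variable_set = ["Carrying amount", "Fair value"]


def unpivot(rows, variables):
    """Emit one row per (record, variable) pair, as in table 6."""
    long_rows = []
    for row in rows:
        # Columns outside the variable set are carried over unchanged.
        fixed = {k: v for k, v in row.items() if k not in variables}
        for variable in variables:
            long_rows.append(
                {**fixed, "Type of value": variable, "Value": row[variable]}
            )
    return long_rows


for row in unpivot(wide, variable_set):
    print(row)
```

In the resulting rows, Type of instrument alone no longer identifies a record, which is why Type of value becomes part of the primary key.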

Member hierarchies, Member hierarchy nodes

Member hierarchies serve to establish a hierarchical relationship between the members of a domain. These relationships are independent of datasets, and therefore they are part of the core package. Member hierarchies are described using so-called member hierarchy nodes. In the BIRD model, for example, a Member hierarchy can represent the composition of the European Union in terms of its member states.

Data definition package

Cube hierarchies, cube hierarchy nodes, cube groups and cube group enumerations

A Cube group allows Cubes to be organised in groups. Technically, this many-to-many relationship is established using so-called Cube group enumerations. Cube groups are useful to group Cubes that share specific characteristics, e.g. all the Cubes related to Counterparties.
A Cube hierarchy allows the representation of Cube groups in a hierarchical (tree) structure. The individual components of such a tree structure are called Cube hierarchy nodes.
These concepts are used, as an example, to represent the content of the BIRD input layer in a hierarchical (tree) structure as you can see in our BIRD input layer Entity Relationship Model.

Combinations

A cube, or more specifically a cube structure, can be seen as a (hyper) space spanned by an orthogonal coordinate system where the axes are given by the dimension(s) of the cube. Combinations can be interpreted as the points inside this space [4]. Each combination is determined by an allowed value (e.g. a member) for each variable that acts in the role of a dimension. Combinations also make it possible to restrict the possible “points in the coordinate system”, in the sense that only those combinations that are related to the cube are valid combinations of this cube.
Please note that one cube may contain multiple combinations and one combination may be present in multiple cubes (i.e. a many-to-many relationship).
For example, the cube of the Aggregated data set (see table 2 and its description in table 4) can be thought of as a Cartesian coordinate system where the variables (i.e. Type of instrument, Institutional sector and Currency) represent the axes and the possible values are specific points on those axes.
The space generated by the three axes contains 3 × 4 × 4 = 48 points (i.e. 3 possible values for the variable Type of instrument, 4 possible values for the variable Institutional sector and 4 possible values for the variable Currency), the so-called Combinations (of allowed values with respect to the underlying subdomains).
It is worth highlighting that with respect to the translation of DPM / XBRL in the SMCube methodology every data point (of DPM / XBRL) is presented as a combination.
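The count of 48 points can be reproduced by enumerating the Cartesian product of the dimension subdomains listed in table 4. The following Python sketch is illustrative only; the dictionary layout and the restricted list of valid combinations are assumptions made for the example.

```python
from itertools import product

# Subdomains of the dimensions of the Aggregated data set (table 4).
subdomains = {
    "Type of instrument": [
        "Loans and advances", "Equity instruments", "Debt securities",
    ],
    "Institutional sector": [
        "Non-financial corporations", "Financial corporations",
        "General government", "Households",
    ],
    "Currency": ["Euro", "United States Dollar", "Yen", "Other currency"],
}

# Every point of the space: one allowed value per dimension.
all_combinations = [
    dict(zip(subdomains, values)) for values in product(*subdomains.values())
]
print(len(all_combinations))  # 3 * 4 * 4 = 48

# Hypothetical restriction: only combinations linked to the cube are valid.
valid_combinations = [
    c for c in all_combinations
    if not (c["Type of instrument"] == "Equity instruments"
            and c["Institutional sector"] == "Households")
]
print(len(valid_combinations))  # 48 minus the 4 excluded currency points = 44
```

The restriction shown is invented purely to illustrate how a cube may admit only a subset of the points in the space.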

Mapping package

Because different reporting frameworks use different codification systems (e.g. DPM codification, SHS codification, AnaCredit codification), it is necessary to provide functionality for aligning these codification systems. In the SMCube methodology this functionality is called mapping. Codes of different codification systems are distinguished by their Maintenance agencies. The codes maintained by the maintenance agency “SDD reference dictionary (ECB)” will be referred to as reference codes, while all other codes will be denoted as non-reference codes. Simply speaking, mappings allow a cube (and its content) to be represented using another codification system, e.g. representing a FinRep cube (and its content) with reference codes instead of DPM codes.
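The idea of a mapping can be sketched as a lookup from non-reference codes to reference codes. In the Python sketch below, every code, the agency label NON_REF_AGENCY and the function to_reference are invented placeholders; they do not correspond to real DPM or SDD codes, and the actual Mapping package is richer than a pair of lookup tables.

```python
# Hypothetical variable mapping: (maintenance agency, code) -> reference code.
VARIABLE_MAPPING = {
    ("NON_REF_AGENCY", "nr_TYP"): "REF_TYPE_OF_INSTRUMENT",
    ("NON_REF_AGENCY", "nr_AMT"): "REF_CARRYING_AMOUNT",
}

# Hypothetical member mapping for enumerated values.
MEMBER_MAPPING = {
    ("NON_REF_AGENCY", "nr_LOANS"): "REF_LOANS_AND_ADVANCES",
}


def to_reference(record, agency):
    """Translate the variable and member codes of one record."""
    translated = {}
    for variable, value in record.items():
        ref_variable = VARIABLE_MAPPING[(agency, variable)]
        # Values without a member mapping (e.g. amounts) pass through.
        translated[ref_variable] = MEMBER_MAPPING.get((agency, value), value)
    return translated


record = {"nr_TYP": "nr_LOANS", "nr_AMT": 11000}
print(to_reference(record, "NON_REF_AGENCY"))
```

Keying the lookup on the pair of maintenance agency and code reflects the point made above that the agency is what distinguishes one codification system from another.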

Rendering package

The rendering package in SMCube is a copy of the Rendering package in the DPM. Please note that in the SMCube methodology the table cells can be linked to Combinations, which are the equivalent of data points (in DPM).

Historisation

Historisation refers to the ability of knowing the structure of the (meta) data at a certain point in time. Note that historisation differs from an audit log, which deals with how the database changed. To illustrate the difference, suppose that one new cube needs to be reported from time t1, and the cube is created in the database at t0. The audit log will deal with the fact that at t0 a certain cube was created, while historisation serves to specify that the new cube is valid from t1.
The SMCube methodology uses two historisation methods.

Versioning

In some cases, historisation is done with versions of elements. One element can have different versions, and each version has a validity range. This is the case for cubes and cube structures.
As an illustrative example, suppose a cube structure ABC that up to the date t1 has three variables (A, B and C), but from t1 will need to have four (A, B, C and D). This implies two records in the cube structure table, one per version, and the complete list of variables for each version in the cube structure item table. The database tables (simplified for illustration purposes) would be:
Cube structure table

CUBE_STRUCTURE_ID | CODE | VERSION | VALID_FROM | VALID_TO
ABC_1 | ABC | 1 | t0 | t1
ABC_2 | ABC | 2 | t1 | 31/12/9999

Note that the code is the same, but there are two different ids.
Cube structure item table

CUBE_STRUCTURE_ID | CUBE_VARIABLE_CODE
ABC_1 | A
ABC_1 | B
ABC_1 | C
ABC_2 | A
ABC_2 | B
ABC_2 | C
ABC_2 | D

This approach is used for cubes, cube structures and combinations.
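Looking up the structure that applies at a given date can be sketched as follows. The Python sketch assumes concrete dates for t0 and t1 (the originals are symbolic) and treats the validity range as half-open, so that a version is valid from VALID_FROM up to but excluding VALID_TO; both are assumptions made here to resolve the shared boundary t1, and variables_at is a hypothetical helper.

```python
from datetime import date

# Assumed placeholder dates for the symbolic t0 and t1.
T0, T1 = date(2020, 1, 1), date(2021, 1, 1)
END = date(9999, 12, 31)

cube_structures = [
    {"id": "ABC_1", "code": "ABC", "version": 1,
     "valid_from": T0, "valid_to": T1},
    {"id": "ABC_2", "code": "ABC", "version": 2,
     "valid_from": T1, "valid_to": END},
]

cube_structure_items = [
    ("ABC_1", "A"), ("ABC_1", "B"), ("ABC_1", "C"),
    ("ABC_2", "A"), ("ABC_2", "B"), ("ABC_2", "C"), ("ABC_2", "D"),
]


def variables_at(code, on):
    """Return the variables of the version of `code` valid on date `on`."""
    for structure in cube_structures:
        # Assumption: half-open range [valid_from, valid_to).
        if (structure["code"] == code
                and structure["valid_from"] <= on < structure["valid_to"]):
            return [variable for structure_id, variable in cube_structure_items
                    if structure_id == structure["id"]]
    return []  # no version of this cube structure is valid on that date


print(variables_at("ABC", date(2020, 6, 30)))
print(variables_at("ABC", T1))
```

The query is made by code, not by id: the code ABC stays stable across versions, while each version has its own id and its own complete set of items.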

Enumeration validity

In some other cases, the validity is provided with the enumeration of an item. In these cases, the elements that belong to the item evolve over time without creating different versions. This is the case for the hierarchies. Suppose a hierarchy with the composition of the Euro Area. Some members may join the Euro Area over time, but the Euro Area concept is always the same, so there are no different versions. A member hierarchy with a changing composition is illustrated below:
Member hierarchy table

MEMBER_HIERARCHY_ID | NAME
EA | Euro Area

Member hierarchy node table

MEMBER_HIERARCHY_ID | MEMBER_ID | LEVEL | PARENT | VALID_FROM | VALID_TO
EA | EA | 0 | | t0 | 31/12/9999
EA | A | 1 | EA | t0 | 31/12/9999
EA | B | 1 | EA | t0 | t1
EA | C | 1 | EA | t2 | 31/12/9999
EA | D | 1 | EA | t3 | 31/12/9999
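With enumeration validity, the composition at a given date is obtained by filtering the nodes on their validity range; the hierarchy itself has no versions. The Python sketch below assumes concrete dates with t0 < t1 < t2 < t3 (the originals are symbolic) and a half-open validity range; the helper name children_at is hypothetical.

```python
from datetime import date

# Assumed placeholder dates for the symbolic t0..t3, in ascending order.
T0, T1 = date(2000, 1, 1), date(2005, 1, 1)
T2, T3 = date(2010, 1, 1), date(2015, 1, 1)
END = date(9999, 12, 31)

# (hierarchy id, member id, level, parent, valid from, valid to)
nodes = [
    ("EA", "EA", 0, None, T0, END),
    ("EA", "A", 1, "EA", T0, END),
    ("EA", "B", 1, "EA", T0, T1),
    ("EA", "C", 1, "EA", T2, END),
    ("EA", "D", 1, "EA", T3, END),
]


def children_at(hierarchy_id, parent, on):
    """Return the members below `parent` that are valid on date `on`."""
    return [
        member
        for hid, member, _level, node_parent, valid_from, valid_to in nodes
        # Assumption: half-open range [valid_from, valid_to).
        if hid == hierarchy_id and node_parent == parent
        and valid_from <= on < valid_to
    ]


print(children_at("EA", "EA", date(2003, 1, 1)))
print(children_at("EA", "EA", date(2020, 1, 1)))
```

Only node rows are added or closed as the composition changes; the single row in the member hierarchy table stays untouched.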

Additional considerations

There are practical reasons for providing both ways of historisation. Creating versions is very useful to stress that there are changes. But, as seen in the example, creating new versions implies more changes and more redundancy.


  1. The term concept refers to Variables or Members in the dictionary.
  2. The allowed values of non-enumerated domains may be specified in more detail using so-called facets. These facets allow us to apply additional constraints to non-enumerated domains, e.g. a pattern for the last day of the month.
  3. Please note that using subdomains for variables in cubes makes it possible to organise similar concepts (e.g. countries, regions, …) in one domain while allowing only a subset of this domain (e.g. countries only) in a cube.
  4. Please note that a combination may also represent a subset of the whole space and is not necessarily restricted to one point in that space.