PivotCube User's Guide Table of contents

Section 2


2. Working with multidimensional cube

2.1 Building a new cube

Building a new cube involves the creation of a logical cube structure by means of setting the TPivotCube properties and loading data from a data source.  All operations connected with building a multidimensional cube may be divided into 3 main stages:
2.1.1 Data schemas for populating a cube
2.1.1.1 Table

The Table schema is a logical structure where all data for the dimensions and measures are kept in one table. An example of this schema is a spreadsheet like that found in Microsoft Excel.

Transaction date Customer Shop Item Quantity sold
10-Jan-2003 Acme Shop A Bicycles 23
10-Jan-2003 Acme Shop A Cars 10
10-Jan-2003 Acme Shop B Bicycles 15
10-Jan-2003 Acme Shop B Cars 12
10-Jan-2003 Ecma Shop A Bicycles 53
10-Jan-2003 Ecma Shop A Cars 17
10-Jan-2003 Ecma Shop B Bicycles 35

For loading data from a table into a cube, the data sources for the dimensions are identical to the Facts table's data source.

2.1.1.2 Star

The Star schema is a logical structure built like this: in the center of this structure is a Facts table. Surrounding this table are the dimension tables which contains the reference data.

Clients
ClientName string[32]
ClientID integer

Wares
WareName string[32]
WareID integer

Facts
ClientID integer
WareID integer
Summary float
sDate date

For loading data from the Star schema into a cube, every dimension must have its own data source.

In the above figure, you can see that 'Ware' dimension has its own data source, named ‘aWare’.

2.1.1.3 Advantages and disadvantages of the different schemas

Advantages of Table schema

  • Data is kept in a single table, and allows editing by means of any data editor (e.g. DBGrid)

Disadvantages of Table schema

  • Loading data to cube is very slow. OLAP engine 'does not know' if loading dimension value is new or not – this brings down the loading speed.

Advantages of Star schema

  • Every dimension can be loaded from the database before starting the  loading of the fact data, so it is not necessary to check every row from the Fact tables to find all dimension items.

Disadvantages of Star schema

  • It is necessary to create separate data sources for every dimension, which makes for a more complex configuration which may lead to mistakes.
2.1.2 Measures

Adding a new measure to a cube is done by using the Measures property editor of TPivotCube.

FieldName   field name from Fact table, that will be used as the Measure
AliasName   a custom title for this measure which is displayed in the cube e.g. ‘Summary of Sales’ for the field ‘Summary’.
CalcType   aggregating type.
DataType obsolete
FormatString  is a string that determines the formatting rules of the measure for display purposes.  For a detailed description of the ‘FormatString’ structure, see the description of the ‘FormatFloat’ function in the Delphi VCL help.
2.1.2.1 Measure datatypes

PivotCube permits using any datatypes as measures with the following limitation:

  • Non-numeric data types can only use ctCount aggregation type.
2.1.2.2 Aggregating types

By loading data from database, measures can be aggregated using the CalcType property.

For example, the aggregation type ctSumma sums up the values of all records for the field referenced by dimension's FieldName property. 

For example

Client Shop Ware Quantity sold
Acme Shop A Bicycles 10
Acme Shop A Cars 20
Acme Shop B Bicycles 30
Acme Shop B Cars 40
Ecma Shop A Bicycles 50
Ecma Shop A Cars 60
Ecma Shop B Bicycles 70

If we choose to use the ctSumma aggregation type on the 'Client' and 'Ware' dimensions, we will get the following results:

Acme - Bicycles : 10 + 30 = 40
Acme - Cars : 20 + 40 = 60
Ecma - Bicycles: 50 + 70 = 120
Ecma - Cars: 60

If using the ctCount aggregation type :

Acme - Bicycles : 2
Acme - Cars : 2
Ecma - Bicycles: 2
Ecma - Cars: 1

If using the ctAverage aggregation type :

Acme - Bicycles : 10 + 30 = 40 / 2 = 20
Acme - Cars : 20 + 40 = 60 / 2 = 30
Ecma - Bicycles: 50 + 70 = 120 / 2 = 60
Ecma - Cars: 60 = 60 / 1 = 60

PivotCube supports following functions in standard mode

  • Summa
  • Count
  • Average (as summa/count)
2.1.2.3 Extended statistical functions

If the minimal set of aggregation functions is insufficient, PivotCube allows you to to use an extended set of aggregation functions. However, you need to build a supersaturated cube.

To build the supersaturated cube, you have to set the ExtendedMode property of the TPivotCube to True.

2.1.2.3.1 Supersaturated cube

When the ExtendedMode is True, PivotCube does not make aggregation of database records by loading cube.  Instead, all measure values are placed in a special store “as is” for special processing. Internally, PivotCube creates a full copy of  the Facts table inside of itself, which enlarges the cube size.

2.1.2.3.2 How it works?

Calculation of aggregating values in case of supersaturated cube is made ‘on the fly’ by every query on aggregation.

For example

Fact table:

Client Ware Quantity sold
Acme Bicycles 10
Acme Cars 20
Acme Bicycles 30
Acme Cars 40
Ecma Bicycles 50
Ecma Cars 60
Ecma Bicycles 70

Pre-aggregating cube:
Acme - Bicycles: 40
Acme - Cars: 40
Ecma - Bicycles: 120
Ecma - Cars: 60

By calculation of total value on ‘Client’:

Pre-aggregating cube:
Acme = 40 + 60 = 100
Ecma = 120 + 60 = 180

Supersaturated cube:
Acme = 10 + 20 + 30 + 40 = 100
Ecma = 50 + 60 + 70 = 180

In calculating the median:

Pre-aggregating cube:
Acme = (40 + 60) / 2 = 50
Ecma = (60 + 120) / 2 = 90

Supersaturated cube:
Acme = (20 + 30) / 2 = 25
Ecma = 60

As we can see the results differ from each other very much.

2.1.2.3.3 List of supported functions

PivotCube supports the following set of functions in ExtendedMode:

  • Min
  • Max
  • Median
  • 1stQuartile
  • 3rdQuartile
  • InterQuartile
  • Quartile Deviation
  • Coeff of. Quartile Deviation
  • Skewness
  • Kurtosis
  • Standard Deviation
  • Variance
  • Coeff. of Deviation
  • Mean St. Error
  • Mean Abs. Deviation
  • Non-equal set
  • Custom aggregation
2.1.2.3.4 Custom aggregation

When PivotCube works in supersaturated mode (PivotCube.ExtendedMode = True), PivotCube allows you to define your own aggregation calculations using the TPivotMap.OnCustomMeasure event handler.

To use custom aggregation measures, you need to do the following:

1. Define a new measure with the CalcType property set to ctCustom

2. Create your own event handler in TPivotMap.OnCustomMeasure

TOnCustomMeasure = procedure(Measure: Integer; Count: Integer; Data: OleVariant; var Value: Double) of object

where

  • Measure - index of Measure in TPivotCube.Measures collection
  • Count - number of items in the Data object
  • Data - variant array of double. It is filled with sorted values of the measure's field.
  • Value - the output value

Example:

{calculate rounded average - if number of values more then 3, remove min and max values and calculate average, otherwise - simple average calculated (Summa/Count)}

procedure TForm1.PivotMap1CustomMeasure(Measure, Count: Integer;
Data: OleVariant; var Value: Double);
 var
   i : integer;
   pData : PDoubleArray;
  begin
  Value := 0;
  pData := VarArrayLock(Data);
try
  if Count >= 3 then
    begin
    for i := 1 to Count - 2 do
      Value := Value + pData[i];
      Dec(Count,2);
    end
  else
    begin
    for i := 0 to Count - 1 do
      Value := Value + pData[i];
    end;
  Value := Value / Count;
  finally
      VarArrayUnLock(Data);
  end;
end;


Note: Custom aggregated measures are not saved into the cube file.
2.1.3 Dimensions

Adding a new dimension to the cube structure is carried out by using the Dimensions property editor of TPivotCube.

DataSet   the data source that will be used for loading the present dimension. PivotCube will check if Dimension.DataSet property is equal to the TPivotCube.DataSet property.  If it is not equal, the dimension data will be loaded before loading the facts.  PivotCube  will not check for the presence of new dimension values when loading the Facts data.
AliasName   the string value that will be displayed for the dimension.
FieldName    the field name from the selected data source that will be used for the dimension's values.
KeyField   the field name from the selected data source that is used as the key value in the Facts table when the 'Star' data schema is used.  If using 'Table' schema, this value must be equal to the FieldName value.
LookupField   the field from the data source that is interpreted as string representation of dimension element when building cube slices for user.
DescriptionField   the field name from the data source that keeps the description of the value represented by AliasName
DisplayName String value, which will display in TPivotGrid dimension box or as caption of item on dimension toolbars. By default is same as AliasName. But DisplayName may be changed at any time to any different string. It allows translate dimension names in multi-language applications. Also DisplayName value saved/restored when slice(map) saved/restored
Enabled   determines, if the dimension should be used when building a new cube.  Using this property allows you to build different cubes using the same data schema and Facts data source.

For example:
You have the following dimensions

  • Client
  • Ware
  • Store
  • Date
  • Shop

You can then choose to enable only 2 dimensions at run-time e.g.

Date and Ware or Client and Shop

By enabling only the required columns, the resulting cube will be smaller in size.

Sorting   determines the sorting order of the dimension. Possible values: 
  • dmNoSort – don’t prepare any sorting by default. This is the default value. It is used when the data from the data source is already sorted in the required order. 
  • dmNameSort – to sort the dimension by the field referenced in the 'LookupField' property. 
  • dmKeySort – to sort the dimension by the field referenced in the 'KeySort' property.

Sorting by Key (dmKeySort) may be used when the order for the description values do not correspond with the key values e.g. a 'Month' field where the month number (e.g. 4 for April) doesn’t coincide with alphabet (A for April).

2.1.3.1 Hierarchical dimensions

Besides simple dimensions that are linear lists (enumerations) of elements, PivotCube allows the use of hierarchical dimensions, organized as an unbalanced tree of arbitrary size. Hierarchical dimensions can be set by using the KeyField and ParentField properties, where ParentField has the same data type as KeyField.

For an example of such a data structure, please see city.dbf file in demo package

Here we can see the following hierarchy

North America -> US -> Santa Monica
North America -> Canada -> Vancouver
Islands -> Fiji -> Suva
etc.

By using of hierarchical dimensions aggregating is making both for parents and for descendents.

North America
20,000 
US
10,000 
Santa Monica
3,000
San Jose
7,000

Using hierarchical dimensions imposes the following limitations:

  • The data source for hierarchical dimensions cannot be used as the data source for the Facts table.
  • Before loading the cube, the data source of the hierarchical must have been sorted in the following manner: first, there must be loaded elements of the highest hierarchical level, then elements of next hierarchical level etc (see the table).
2.1.3.2 Wrapping

Wrapping in PivotCube terms means deriving a new dimension value based on the underlying field value from the database.  Wrapping can only be done for dimensions that obtain their values from the Fact table.  This also means that hierarchical dimensions cannot be wrapped.

2.1.3.2.1 Dimensions with ‘Date’ datatype

Dimensions with with ‘Date’ datatype are very wrapped to extract their individual elements e.g. year, month, day.

For automatic wrapping of dimensions with the 'Date' datatype, use the WrapTo property.

Available values are as follows:

wt_None no wrapping is performed.
wt_Year the 'Year' element from the underlying field will be extracted, and the count of items in the dimension will be equal to the number of unique years in the Facts table.
wt_Month the 'Month' element from the underlying field will be extracted, and the count of items in the dimension will be equal to the number of unique months in the Facts table.  This count will never exceed 12, regardless of the period selected.
wt_Day the 'Day' element from the underlying field will be extracted, and the count of items in the dimension will be equal to the number of unique days in the Facts table.  This count will never exceed 31, regardless of the period selected.
wt_Custom is used for custom processing of dimension with algorithm given by programmer.

2.1.3.2.2 Custom wrapping

There are cases when the standard wrapping variants (Year, Month, Day) are inadequate, or wrapping is required for other field types.

For example
Financial analysis may demand dimension based on ‘Quarter' or ‘Season’, or ‘Week’, or ‘Decade’, or ‘Half-year’ etc.

For fuzzy-logic application it may be required to derive dimension values in the form ‘Too much/Too low/Average’ based on a summary field.

The custom wrapping algorithm is realized in the OnCustomWrapData event handler in TPivotCube. It is also necessary to set WrapTo property to wt_Custom.

Event handler example for getting information by seasons:

procedure TForm1.PivotCube1CustomWrapData(const AliasName: String; const Value: OleVariant; var ID: Integer; var Name: String);
var
  m,d,y,h : Word;
begin
  if AliasName = 'Seasons' then 
  begin
    DecodeDate(Value,y,m,d);

    case m of
      3,4,5: begin
        ID := 1;
        Name := 'Spring';
      end;

      6,7,8: begin
        ID := 2;
        Name := 'Summer';
      end;

      9,10,11: begin
        ID := 3;
        Name := 'Autumn';
      end;

      1,2,12: begin
        ID := 4;
        Name := 'Winter';
      end;
    end;  
end;

The ID values will be interpreted as KeyField values, and the Name values as LookupField values in a simple dimension.

2.1.3.3 Processing of incorrect data and Null

Sometimes the Facts table contains no values for a dimension field, or the Facts table contains a reference to the key that doesn’t exist in a lookup table.  In the first situation, the field is categorized as 'having a NULL value', and in the second, the key field 'refers to an unknown value'.  To handle these situations, use the following properties:

NullName : string
UnknownName: string

Defining these properties gives a more descriptive text when the above situations are encountered.

For example, we could set the properties to the following for a 'Client' dimension ‘Client’:

NullName = 'Unknown client'
UnknownName = ‘Client was deleted for inactivity’

2.1.3.4 Forecasting ability

PivotCube provides some forecasting abilities, allowing you to forecast the value of the item in the dimension before the first actual value, and the value of the item after the last actual value.  If a dimension has less than 3 items, the forecasted values will not be calculated and 0 will be displayed

You enable forecasting by setting the TDimensionItem.Forecasting.Enabled property to true.

ConsequentName The string constant which will be displayed as the caption of the consequent dimension item
PrecedingName The string constant which will be displayed as caption of the  preceding dimension item
Enabled If True, forecast values will be calculated for that dimension. 
Method
mathematical algorithm which will be used to calculate the forecast values
 

dftMovingAverage

Simple Moving Averages
The best-known forecasting method is the moving averages method. It simply takes a certain number of past periods and adds them together, then divides the result by the number of periods. Simple Moving Averages (MA) is an effective and efficient method provided the time series is stationary in both mean and variance. The following formula is used in finding the moving average of order n, MA(n) for a period t+1:

MAt+1 = [Dt + Dt-1 + ... +Dt-n+1] / n

where n is the number of observations used in the calculation
 

dftWeightedMovingAverage

Weighted Moving Averages
Very powerful and economical. It is widely used where repeated forecasts requires methods like sum-of-the-digits and trend adjustment methods. An example of a Weighted Moving Averages calculation is as follows:

Weighted MA(3) = w1.Dt + w2.Dt-1 + w3.Dt-2

where the weights are any positive numbers such that: w1 + w2 + w3 =1. A typical weights for this example is, w1 = 3/(1 + 2 + 3) = 3/6, w2 = 2/6, and w3 = 1/6.

 

dftDoubleExponentialSmoothing
dftTripleExponentialSmoothing

Exponential Smoothing Techniques
One of the most successful forecasting methods is the exponential smoothing (ES) method. While the simple MA method is a special case of the ES, the ES is more conservative in its data usage.  It also offers the following advantages:
  • can be modified to be used effectively for time series with seasonal patterns
  • easy to adjust for past errors
  • easy to prepare follow-on forecasts, which is ideal for situations where many forecasts must be prepared and several different forms are used depending on the presence of trend or cyclical variations 

In short, an ES is an averaging technique that uses unequal weights, where the weights applied to past observations decline in an exponential manner, as follows:

Ft+1 = a Dt + (1 - a) Ft

where:

Dt is the actual value
Ft is the forecasted value
a is the weighing factor, which ranges from 0 to 1
t is the current time period.

Notice that, the smoothed value becomes the forecast for period  t + 1.

A small a provides a lot of smoothing while a large a provides a fast response to the recent changes in the time series and a smaller amount of smoothing. Notice that the exponential smoothing and simple moving average techniques will generate forecasts having the same average age of information if moving average of order n is the integer part of (2-a)/a.

An exponential smoothing over an already smoothed time series is called double-exponential smoothing. In some cases, it might be necessary to extend it to  triple-exponential smoothing.  While simple exponential smoothing requires stationary condition, the double-exponential smoothing can capture linear trends and triple-exponential smoothing can handle almost all other business time series.


Note: Forecasted values are not part of the cube data, so some views on the measure (like ranks, record count,  prev/next items etc) will be 0 when applied to row/column with forecast values. Also, forecast values are not included in total/subtotal calculations, but included in running total calculations.

2.1.4 Facts table definition

The data for the Facts table may be obtained from any TDataset descendant class.  You specify this in the FactTableDataset property of TPivotCube.

2.1.4.1 Using GROUP BY

It is a common question whether it is necessary to include a ‘GROUP BY’ clause in SQL queries to carry out pre-aggregating on server. 

In PivotCube's case, when building a supersaturated cube, using a ‘GROUP BY’ clause leads to irreplaceable data distortion, and should be avoided.  For standard cubes, using a ‘GROUP BY’ clause may speed up loading of the cube, if data is loaded from a high-performance database server and grouping is performed by using index keys.  Also, using ‘GROUP BY’ is usually not possible when used with user-defined functions in most SQL databases.

2.1.5 Cube building

Building a cube involves loading the dimensions data and fact data from the defined data sources.  To control loading you may use the OnNextRecord event handler of TPivotCube.  This event handler is called every time the Build method is reading a record from the  Facts table.

2.1.5.1 Using the 'Build' method

The Build method of TPivotCube loads data from the dimensions and facts data sources into the cube.  If successful, the Active property is set to True.

The Build method does not check if the cube is active before starting the cube process.  If the cube is already active and the CanUpgrade property is True, the Build method will then add new data to the existing cube.

Example of using Build method for adding new data to existing cube:

if Pivotcube1.Active then
begin
  if PivotCube1.CanUpgrade then
  begin
    PivotCube1.Build;
    PivotGrid1.RefreshData;
  end
end;

 

2.1.5.2 Using the “Active” property

The Active property may be used start the building of a new cube and to unload an existing cube from memory.  

To build a new cube, simply set the value of the Active property to True.  To free the memory used by the cube, set Active to False.

Before the cube is built, an event handler allows you to stop the build process.  This event handler is named OnCubeActivate, and has a parameter named Allow.  Setting Allow to False will disable data loading to the cube.  This may be useful in situations where you want to disable the loading of the cube if certain conditions were not met e.g. having complete use of computer resources etc.

Similarly, there is an OnCubeDeactivate event handler to disable the deactivation of a cube.  A situation where this may be useful is where you want to ensure the cube is saved before it is deactivated.

2.1.5.3 ‘CanUpgrade’ property

The CanUpgrade property is used to check if data may be appended to the cube.  The internal structure of a cube may be different from the structure defined in the TPivotCube settings, which determines if a cube can be upgraded.

When loading data from a file or stream, PivotCube automatically checks if the new data may be appended to the existing cube.

2.2 Saving data to a file

Once a cube has been built, it may be saved to a file for future analysis.  To save a cube, use the Save method of TPivotCube.  You can determine if a cube should be allowed to be saved by handling the OnBeforeSave event handler.

When saving to a file, if the SaveCubeName parameter does not contain an absolute path mark (checking for substring presence “:\”), the PivotCube data engine will try to save the file using the following rules and order:

2.3 Loading data from file

A saved cube can be loaded by calling the Load method of TPivotCube.

If the 'LoadCubeName' parameter does not contain an absolute path mark (checking for substring presence “:\”), the PivotCube data engine checks for presence of the zCubeData environment variable and also the presence of the CUBEDATA variable in the following registry path:

HKEY_CURRENT_USER/SOFTWARE/PIVOTWARE/PIVOTCUBE/CUBEDATA

The PivotCube data engine will look for the path referenced in these settings.  If no settings are found, PivotCube will try to load the cube from the current directory.

2.4 Working with streams

PivotCube also allows saving and loading of cubes to/from streams.  This is done by means of using the standard OLE interface for ‘Istream’ streams.

To obtain IStream interface from standard TStream object use

IStream := TStreamAdapter.Create(TStreamObject);

2.5 Working with XML

PivotCube allows saving and loading of cubes to/from XML storage. It used MSXML30 interfaces to save or parse xml file. 

Use TPivotCube.SaveToXML(root : IXMLDOMElement) to save cube into branch of XML storage

Use TPivotCube.LoadFromXML(root : IXMLDOMElement) to load cube from branch of XML storage
 


PivotCube User Guide rev. 2004.02.10