SAP HANA FAQs


What is the reason for going In-memory?

One reason is that the number of CPU cycles per second is increasing while the cost of processors is decreasing. For managing data in memory, there is the five-minute rule, which is based on the observation that it can cost more to wait for data to be fetched from disk than to keep that data in memory, depending on how often the data is fetched.
For example, if a table, no matter how large it is, is touched by a query at least once every 55 minutes, it is less expensive (in hardware costs) to keep it in memory than to read it from disk. The more frequently the table is accessed, the stronger the case for keeping it in memory.

What is a Five-minute rule?

It is a rule of thumb for deciding whether a data item should be kept in memory, or stored on disk and read back into memory when required. The rule states that randomly accessed disk pages that are re-used every 5 minutes or more often should be kept in memory.
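The break-even point behind the rule can be worked out with simple arithmetic. The sketch below uses the break-even formula from Gray and Graefe's later revisit of the rule; the function name and the input figures (late-1990s hardware prices) are illustrative assumptions, not current values.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_ram):
    """Break-even reference interval for a randomly accessed disk page.

    A page re-used more often than this interval is cheaper to keep in
    memory; a page re-used less often is cheaper to read from disk.
    """
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Illustrative late-1990s figures (assumptions, not current prices):
# 8 KB pages -> 128 pages per MB, 64 random accesses/s per disk,
# 2000 USD per disk drive, 15 USD per MB of RAM.
interval = break_even_interval_seconds(128, 64, 2000, 15)
print(f"break-even interval: {interval:.0f} s (about {interval / 60:.1f} min)")
```

With these inputs the interval comes out at roughly five minutes. As memory gets cheaper relative to disk accesses, the interval grows, which is the economic argument for keeping more data in memory.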

What is multi-core CPU?

Multiple CPUs on one chip or in one package are called a multi-core CPU. Traditional databases for online transaction processing (OLTP) do not use current hardware efficiently.

What is Stall?

Waiting for data to be loaded from main memory into the CPU cache is called a stall.

What is SAP In-Memory Appliance (SAP HANA)?

HANA is an in-memory technique for storing data that is particularly suited to handling very large amounts of tabular, or relational, data with extraordinary performance. Common databases store tabular data row-wise. Reorganizing the data in memory column-wise brings a large speed increase when accessing a subset of the data in each table row.

What are the components or products of HANA?

SAP HANA contains the following components and administration tools:

• SAP® In-Memory Computing Engine (IMCE) Server 1.0

• SAP® IMCE Clients 1.0 – The IMCE clients are the interfaces by which the IMCE can communicate with other components. The following subcomponents are included:

IMCE ODBO 1.0

IMCE ODBC 1.0

IMCE JDBC 1.0

IMCE SQLDBC 1.0

• SAP® IMCE Studio 1.0 (includes SAP HANA Modeler)

• Sybase Replication Server 15 + Sybase Enterprise Connect Data Access (ECDA)

• Sybase Replication Agent

• SAP® HANA Load Controller 1.0 (includes R3Load, RepServer De-cluster Add-On)

• SAP® Host Agent 7.20

What are the different editions available in HANA appliance software?

Platform, Enterprise, and Extended editions.
Platform edition is intended for customers who want to use ETL-based replication and already have a license for SAP BO Data Services.
Enterprise edition is intended for customers who want to use either trigger-based replication or ETL-based replication and do not already have all of the necessary licenses for SAP BO Data Services.
Extended edition is intended for customers who want to use the full potential of all available replication scenarios, including log-based replication.

What are columnar and row-based data storage?

Fig: Row and Column-based storage

A database table contains data in the form of rows and columns. However, computer memory is organized as a linear structure. To store a table in linear memory, there are two options. Row-based storage stores a table as a sequence of records, each of which contains the fields of one row. In columnar storage, the entries of a column are stored in contiguous memory locations.

The SAP HANA database lets you specify whether a table is to be stored column-wise or row-wise. It is also possible to alter an existing table from columnar to row-based and vice versa.

Search operations in tabular data can be accelerated by organizing data in columns instead of in rows.
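The two layouts can be illustrated with a small example. This is only a sketch using Python lists to stand in for linear memory; the table contents and function names are made up for illustration and do not reflect how SAP HANA stores data internally.

```python
# A small table with three columns: (id, country, amount).
table = [
    (1, "DE", 100),
    (2, "US", 250),
    (3, "DE", 75),
]
N_ROWS, N_COLS = len(table), 3

# Row-based storage: records are laid out one after another.
row_store = [field for record in table for field in record]

# Columnar storage: all values of one column are laid out contiguously.
column_store = [record[col] for col in range(N_COLS) for record in table]


def sum_amount_row_store(store, amount_col=2):
    # Has to step across every record, skipping the unrelated fields.
    return sum(store[i] for i in range(amount_col, len(store), N_COLS))


def sum_amount_column_store(store, amount_col=2):
    # The whole column is one contiguous slice.
    start = amount_col * N_ROWS
    return sum(store[start:start + N_ROWS])


print(row_store)
print(column_store)
print(sum_amount_row_store(row_store), sum_amount_column_store(column_store))
```

Both scans return the same total, but the columnar scan reads one contiguous block, while the row-based scan has to jump over the fields it does not need.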

What are the advantages of column-based tables?


Column-based tables are advantageous when:

Calculations are typically executed on a single column or only a few columns.
The table is searched based on the values of a few columns.
The table has a large number of columns.
The table has a large number of rows and columnar operations are required (aggregate, scan, etc.).
High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows).

What are the advantages of Row-based tables?

Row-based tables are advantageous when:

The application only needs to process a single record at a time (many selects and/or updates of single records).
The application typically needs to access a complete record (or row).
The columns contain mainly distinct values, so the compression rate would be low.
Neither aggregations nor fast searching are required.
The table has a small number of rows (e.g. configuration tables).

In which case should data be stored in columnar storage?

To enable fast on-the-fly aggregations and ad-hoc reporting, and to benefit from compression mechanisms, it is recommended that transaction data be stored in a column-based table.

What are the advantages of Columnar tables?

Higher Data Compression Rates

Higher Performance for Column Operations

Elimination of Additional Indexes

Parallelization

Elimination of Materialized Aggregates

What are the different compression techniques?

Run-length encoding

Cluster encoding

Dictionary encoding
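Two of these techniques are easy to sketch in a few lines. The example below shows a minimal dictionary encoding followed by run-length encoding of the resulting IDs; the function names and sample column are hypothetical, cluster encoding is left out, and this is not SAP HANA's actual implementation.

```python
def dictionary_encode(column):
    """Replace each value with an integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_to_id[v] for v in column]


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


country = ["DE", "DE", "DE", "US", "US", "DE", "FR", "FR"]
dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)

print(dictionary)
print(ids)
print(runs)
```

Dictionary encoding pays off when a column has few distinct values, and run-length encoding pays off when equal values sit next to each other, for example in a sorted column.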

Why are materialized aggregates not required?

With a scanning speed of several gigabytes per millisecond, in-memory column stores make it possible to calculate aggregates on large amounts of data on the fly with high performance. This is expected to eliminate the need for materialized aggregates in many cases.

What are the advantages of Eliminating materialized aggregates?

No additional tables for storing aggregate results means:

Simplified data model

Simplified application logic

Higher level of concurrency

Aggregated values that are always up to date, because they are calculated on the fly
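The difference between a stored aggregate and an aggregate calculated on the fly can be shown with a toy example. The column contents and names below are hypothetical, and the sketch ignores scale entirely.

```python
from collections import defaultdict

# Columns of a hypothetical sales table.
country = ["DE", "US", "DE"]
amount = [100, 250, 75]


def totals_on_the_fly():
    """Scan the base columns and aggregate at query time."""
    totals = defaultdict(int)
    for c, a in zip(country, amount):
        totals[c] += a
    return dict(totals)


# A materialized aggregate is a stored copy of the result.
materialized = totals_on_the_fly()

# A new sale arrives; only the base columns are updated.
country.append("US")
amount.append(50)

fresh = totals_on_the_fly()
print("materialized:", materialized)
print("on the fly:  ", fresh)
```

The stored copy is now stale and would need extra application logic to keep it in sync, whereas the on-the-fly result reflects the new row without any additional table or maintenance code.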

What is parallelization?

Column-based storage makes it easy to execute operations in parallel using multiple processor cores. In a column store, data is already vertically partitioned, which means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores.
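Both forms of parallelism can be sketched with a thread pool. The helper names and data here are assumptions made for illustration. Note also that threads in standard CPython do not run Python code on several cores at once, so this shows the structure of the work split rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(column, n_parts):
    """Split a column into at most n_parts contiguous sections."""
    size = max(1, -(-len(column) // n_parts))  # ceiling division
    return [column[i:i + size] for i in range(0, len(column), size)]


def parallel_sum(column, n_workers=4):
    """Sum one column by aggregating its sections in parallel."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(sum, partition(column, n_workers)))


amount = list(range(1, 1001))
quantity = [2] * 1000

# Different columns can be aggregated independently of each other.
with ThreadPoolExecutor(max_workers=2) as pool:
    total_amount, total_quantity = pool.map(sum, [amount, quantity])

print(total_amount, total_quantity, parallel_sum(amount))
```

The partial sums of the sections are combined in a final step, which works for any aggregate that can be merged from partial results (sum, count, min, max).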

What is the purpose of DB trigger in SLT?

The DB trigger ensures that only relevant table changes are recorded. These changes are written to logging tables, and once they have been replicated they are deleted from the logging tables. This approach is intended to have no measurable performance impact on the source system.
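The general pattern of trigger-based change recording can be sketched with SQLite from the Python standard library. This is not how SLT itself is implemented; the table names, trigger names and the replicate function are all hypothetical, and a plain dictionary stands in for the target table.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE irrelevant (id INTEGER PRIMARY KEY, note TEXT);
    -- Logging table: holds keys of changed rows until they are replicated.
    CREATE TABLE orders_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                             order_id INTEGER);
    -- Triggers exist only on the table selected for replication.
    CREATE TRIGGER orders_ins AFTER INSERT ON orders
        BEGIN INSERT INTO orders_log (order_id) VALUES (NEW.id); END;
    CREATE TRIGGER orders_upd AFTER UPDATE ON orders
        BEGIN INSERT INTO orders_log (order_id) VALUES (NEW.id); END;
""")

target = {}  # stands in for the target table


def replicate():
    """Copy logged changes to the target, then clear them from the log."""
    rows = source.execute(
        "SELECT l.seq, o.id, o.amount FROM orders_log l "
        "JOIN orders o ON o.id = l.order_id ORDER BY l.seq").fetchall()
    for _seq, order_id, amount in rows:
        target[order_id] = amount
    if rows:
        source.execute("DELETE FROM orders_log WHERE seq <= ?",
                       (rows[-1][0],))
    source.commit()


source.execute("INSERT INTO orders VALUES (1, 100)")
source.execute("INSERT INTO irrelevant VALUES (1, 'not replicated')")
source.execute("UPDATE orders SET amount = 120 WHERE id = 1")
replicate()
print(target)
```

Changes to the table without triggers never reach the logging table, and the logging table is empty again after each replication run.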

What is the purpose of Controller Module in SLT?


The Controller Module ensures the mapping between the HANA target database structure and the source system structure. It also allows conversion of data values and provides scheduling options for replicating the source data.

Which tool supports ETL based replication into HANA?

SAP BusinessObjects Data Services (BODS) supports ETL-based replication into SAP HANA.

What are the advantages of using BODS to load data into SAP HANA?

The advantages of using BODS to load data into SAP HANA are:

a) Loads unstructured data into SAP HANA

b) Sorts and filters relevant business data

c) Reads BI Content extractors or SAP function modules thereby reusing the logic

d) Merges multiple data streams

e) Transforms the data before loading using advanced and complex transformations

f) Supports connectivity to a wide range of data sources

What are the limitations while using BODS for data replication?


The limitations while using BODS for data replication are:

a) Since BODS uses batch mode to load the data, real-time capabilities will be limited; only near real-time can be achieved

b) ETL-based replication takes longer to implement
