Stable release: SAP HANA 1.0 SPS5 (November 14, 2012)
Written in: C, C++
- SAP HANA DB (or HANA DB) refers to the database technology itself,
- SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
- SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,
- SAP HANA One refers to a deployment of SAP HANA certified for production use on the Amazon Web Services (AWS) cloud. (see below)
- SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).
HANA DB takes advantage of the low cost of main memory (RAM), data processing abilities of multi-core processors and the fast data access of solid-state drives relative to traditional hard drives to deliver better performance of analytical and transactional applications. It offers a multi-engine query processing environment which allows it to support both relational data (with both row- and column-oriented physical representations in a hybrid engine) as well as graph and text processing for semi- and unstructured data management within the same system. HANA DB is 100% ACID compliant.
SAP HANA is the synthesis of three separate products – TREX, P*Time and MaxDB.
- TREX (Text Retrieval and Extraction) is a search engine. It began in 1996 as a student project at SAP in collaboration with DFKI. TREX became a standard component in SAP NetWeaver in 2000. In-memory attributes were added in 2002 and columnar data store was added in 2003, both as ways to enhance performance.
- In 2005 SAP acquired Menlo Park based Transact in Memory, Inc. With the acquisition came P*Time, an in-memory light-weight online transaction processing (OLTP) RDBMS technology with a row-based data store.
- MaxDB (formerly SAP DB), a relational database coming from Nixdorf via Software AG (Adabas D) to SAP, was added to TREX and P*Time to provide persistence and more traditional database features like backup.
In 2008, SAP CTO Vishal Sikka wrote about HANA “…our teams working together with the Hasso Plattner Institute and Stanford University demonstrated how a new application architecture is possible, one that enables real-time complex analytics and aggregation, up to date with every transaction, in a way never thought possible in financial applications”. In 2009 a development initiative was launched at SAP to integrate the three technologies above to provide a more comprehensive feature set. The resulting product was named internally and externally as NewDB until the change to HANA DB was finalized in 2011.
SAP HANA is not SAP’s first in-memory product. Business Warehouse Accelerator (BWA, formerly termed BIA) was designed to accelerate queries by storing BW infocubes in memory. This was followed in 2009 by Explorer Accelerated, in which SAP combined the Explorer BI tool with BWA as a tool for performing ad hoc analyses. Other SAP products using in-memory technology were CRM Segmentation, Business ByDesign (for analytics) and Enterprise Search (for role-based search on structured and unstructured data). All of these were based on the TREX engine.
Taking a different approach, Advanced Planning and Optimization (APO) used LiveCache for its analytics.
Versions, service packs
- SP0 – released 20 November 2010; HANA first public release
- SP1 – released 20 June 2011; HANA general availability (GA); focus on use as an operational data mart
- SP2 – released 27 June 2011; more data mart functions
- SP3, a.k.a. HANA 1.5 – released 7 November 2011; focus on HANA as the underlying database for Business Warehouse (BW); also named Project Orange
- SP4 – Q2 2012; resolved a variety of stability issues and added new features for BW, according to SAP
- SP5 – Feb, 2013; introduces Extended Application Services (REST driver)
Big data refers to datasets that exceed the abilities of commonly used tools. While no formal definition based on size exists, these datasets typically reach terabytes (TB), petabytes (PB), or even exabytes in size. SAP has positioned HANA as its solution to big data challenges at the low end of this scale. At launch, HANA ran on hardware with 1 TB of RAM, supporting up to 5 TB of uncompressed data. In late 2011, hardware with 8 TB of RAM became available, supporting up to 40 TB of uncompressed data. SAP-owned Sybase IQ, with its more mature MapReduce-like functionality, has been cited as a potentially better fit for larger datasets. By May 2012, HANA was able to run on IBM servers with 100 TB of main memory. Hasso Plattner claimed that this system was big enough to run the eight largest SAP customers.
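The two hardware configurations quoted above imply the same ratio of uncompressed data to RAM. The short calculation below makes that explicit. It is only a back-of-the-envelope sketch: the function names are illustrative, and treating the ratio as constant for other memory sizes is an assumption, not an SAP sizing rule.

```python
# Figures quoted in the text: (RAM in TB, uncompressed data supported in TB).
CONFIGS = {"launch": (1, 5), "late 2011": (8, 40)}


def implied_ratio(ram_tb, data_tb):
    """Uncompressed data volume supported per unit of RAM."""
    return data_tb / ram_tb


def estimated_capacity_tb(ram_tb, ratio=5.0):
    """Rough uncompressed capacity, ASSUMING the same ratio keeps holding."""
    return ram_tb * ratio


ratios = {name: implied_ratio(*cfg) for name, cfg in CONFIGS.items()}
```

Both configurations work out to a ratio of 5, which is where the default value of `ratio` comes from.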
Other databases marketed by SAP
SAP still offers other database products:
Offering its own database solution to support its Business Suite ERP puts SAP in direct competition with some of its largest partners IBM, Microsoft and Oracle. Among the more prominent competing products are:
- In-memory database management systems
Strategic workforce planning
SAP BusinessObjects Strategic Workforce Planning (SWP) was among the first SAP applications to be redesigned to take advantage of HANA’s abilities. SWP on HANA is aimed at HR executives who want to simulate workforce models in real time, taking into account turnover, retirement, hiring and other variables.
Smart Meter Analytics
In September 2011 SAP released its Smart Meter Analytics tool. This is to help utility companies with large smart meter deployments to manage and use the large amount of data generated by such meters.
The focal point of the community of developers on SAP HANA platform is SAP HANA Developer Center or “the DevCenter”. The DevCenter offers general information, education materials, community forums, plus access to SAP HANA database with free licenses:
- 30-day evaluation,
- free developer license to images hosted in the public cloud (Amazon Web Services)
Access to some materials and features may require free registration.
SAP HANA Cloud Options
In September 2011 SAP announced its intention to partner with EMC and VMware to enable a HANA-based application infrastructure cloud. This platform as a service (PaaS) offering includes HANA DB-as-a-service in conjunction with a choice of either a Java-based or an ABAP-based stack. Applications built for either stack have access to HANA DB through a variety of APIs. The Java-based approach, codenamed Project River, is based on the NetWeaver 7.3.1 Java application server. The ABAP-based approach is designed more for SAP’s existing user base, for example in the SAP Business ByDesign suite of business applications, which includes ERP, CRM and supply chain management.
On October 16, 2012 SAP announced general availability of two SAP HANA options delivered in the cloud:
- SAP NetWeaver Cloud (now called SAP HANA Cloud) – an open standards-based application service and
- SAP HANA One – a deployment of SAP HANA on the Amazon Web Services cloud, billed on an hourly basis. Only a 60 GB option is available, and a 24/7 instance costs $30,572/year, though an upfront commitment with Amazon can substantially reduce the hardware portion of the cost.
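For a sense of scale, the annual figure quoted for an always-on SAP HANA One instance can be converted into an effective hourly rate. The sketch below assumes a 365-day year. The split between software and hardware charges is not given here, so only the combined rate is computed.

```python
ANNUAL_COST_USD = 30_572        # 24/7 instance, figure quoted in the text
HOURS_PER_YEAR = 365 * 24       # assumption: non-leap year, instance never stopped

# Combined (software + infrastructure) cost per hour of uptime.
effective_hourly_rate = ANNUAL_COST_USD / HOURS_PER_YEAR

print(f"about ${effective_hourly_rate:.2f} per hour")
```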
At its most basic, the architecture of the HANA database system has the following components.
- Four Management services
- The Connection and Session Management component manages sessions/connections for database clients. Clients can use a variety of languages to communicate with the HANA database.
- The Transaction Manager component helps with ACID compliance by coordinating transactions, controlling transactional isolation and tracking running and closed transactions.
- The Authorization Manager component handles all security and credentialing (see Security below).
- The Metadata Manager component manages all metadata such as table definitions, views, indexes and the definition of SQL Script functions. All metadata, even of different types, is stored in a common catalog.
- Three Database Engine components
- Calculation Engine component executes on calculation models received from SQL Script (and other) compilers.
- Optimizer and Plan Generator component parses and optimizes client requests.
- Execution Engine component invokes the various In-Memory Processing Engines and routes intermediate results between consecutive execution steps based on the optimized execution plan.
- Three In-Memory Storage Engines
- Relational Engine (see Column and row store below)
- Graph Engine (see Unstructured data below)
- Text Engine (see Unstructured data below)
- Persistency Layer (see Storage below)
Column and row store
The Relational Engine supports both row- and column-oriented physical representations of relational tables. A system administrator specifies at definition time whether a new table is to be stored in a row- or in a column-oriented format. Row- and column-oriented database tables can be seamlessly combined into one SQL statement, and subsequently, tables can be moved from one representation form to the other.
The row store is optimized for concurrent WRITE and READ operations. It keeps all index structures in memory rather than persisting them on disk. It uses a technology that is optimized for concurrency and scalability in multi-core systems. Typically, metadata or rarely accessed data is stored in a row-oriented format.
Compared to this, the column store is optimized for performance of READ operations. Column-oriented data is stored in a highly compressed format in order to improve the efficiency of memory resource usage and to speed up the data transfer from storage to memory or from memory to CPU. The column store offers significant advantages in terms of data compression enabling access to larger amounts of data in main memory. Typically, user and application data is stored in a column-oriented format to benefit from the high compression rate and from the highly optimized access for selection and aggregation queries.
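The difference between the two layouts can be illustrated with a small Python sketch. It is not HANA code: the sample table, the column names and the `dictionary_encode` helper are hypothetical, and dictionary encoding is used here only as one common example of column compression.

```python
# Hypothetical sample table: (order_id, country, amount)
rows = [
    (1, "DE", 120.0),
    (2, "US", 80.0),
    (3, "DE", 45.5),
    (4, "FR", 300.0),
    (5, "US", 20.0),
]

# Row-oriented layout: each record is kept together, which suits
# reading or writing whole records.
row_store = list(rows)

# Column-oriented layout: one array per attribute, which suits
# scans and aggregations that touch only a few attributes.
column_store = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def dictionary_encode(values):
    """Replace each value by a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


country_dict, country_codes = dictionary_encode(column_store["country"])


def sum_amount_for(country):
    """Aggregate by scanning only the two columns the query needs."""
    code = country_dict.index(country)
    amounts = column_store["amount"]
    return sum(a for c, a in zip(country_codes, amounts) if c == code)
```

Repeated values such as country codes shrink to small integers, which is why columns with few distinct values compress well, and the aggregation never touches the `order_id` column.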
Business Function Library
The Business Function Library is a reusable library (similar to stored procedures) for business applications embedded in the HANA calculation engine. This eliminates the need for developing such calculations from scratch. Some of the functions offered are
Predictive Analysis Library
Similar to the Business Function Library, the Predictive Analysis Library is a collection of compiled analytic functions for predictive analytics. Among the algorithms supported are
R is a programming language designed for statistical analysis. An open source initiative (under the GNU Project), R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange, to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation. HANA developers can write R scripts in SQL, and the types are automatically converted in HANA. R scripts can be invoked in SQLScript with HANA tables as both input and output. An R environment needs to be deployed before R can be used within SQLScript.
The Persistency Layer is responsible for the durability and atomicity of transactions. It manages data and log volumes on disk and provides interfaces for writing and reading data that are leveraged by all storage engines. This layer is based on the proven persistency layer of MaxDB, SAP’s commercialized disk-centric relational database. The persistency layer ensures that the database is restored to the most recent committed state after a restart and that transactions are either completely executed or completely undone. To achieve this efficiently, it uses a combination of write-ahead logs, shadow paging, and savepoints.
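The interplay of logging and savepoints can be sketched with a toy key-value store. This is a simplified illustration, not the MaxDB or HANA implementation: the `TinyStore` class and its methods are hypothetical, everything lives in Python objects instead of disk volumes, and shadow paging is not modeled.

```python
import copy


class TinyStore:
    """Toy store with a redo log and savepoints (illustrative only)."""

    def __init__(self):
        self.data = {}        # in-memory state
        self.savepoint = {}   # last persisted snapshot (stands in for the data volume)
        self.log = []         # redo entries since the last savepoint (the log volume)
        self._next_tx = 1

    def commit(self, changes):
        """Write the transaction to the log first, then apply it in memory."""
        tx_id = self._next_tx
        self._next_tx += 1
        self.log.append((tx_id, dict(changes)))   # write-ahead: log before apply
        self.data.update(changes)
        return tx_id

    def write_savepoint(self):
        """Persist a consistent snapshot and truncate the log."""
        self.savepoint = copy.deepcopy(self.data)
        self.log.clear()

    def recover(self):
        """After a restart: load the savepoint, then replay committed log entries."""
        self.data = copy.deepcopy(self.savepoint)
        for _tx_id, changes in self.log:
            self.data.update(changes)
```

A crash can be simulated by discarding `data` and calling `recover()`. Because only committed transactions ever reach the log, the restored state is the most recent committed one.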
Logging and transactions
HANA’s persistence layer manages logging of all transactions in order to provide standard backup and restore functions. The same persistence layer manages both row and column stores. It offers regular savepoints and logging of all database transactions since the last savepoint.
Concurrency and locking
HANA DB uses the multiversion concurrency control (MVCC) principle for concurrency control. This enables long-running read transactions without blocking update transactions. MVCC, in combination with a time-travel mechanism, allows temporal queries inside the Relational Engine.
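The core idea of MVCC can be shown in a few lines of Python. This is a minimal sketch under simplifying assumptions (a single global commit counter, no aborts, no garbage collection of old versions). The `MVCCStore` class is hypothetical and does not reflect HANA internals.

```python
class MVCCStore:
    """Keeps every committed version of a value, tagged with its commit timestamp."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value) in ascending order
        self.clock = 0       # global commit counter

    def begin(self):
        """A reader's snapshot is the latest commit timestamp when it starts."""
        return self.clock

    def write(self, key, value):
        """Commit a new version; older versions stay readable."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version visible at snapshot_ts, or None."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None
```

A reader that took its snapshot before an update keeps seeing the old value without blocking the writer. Passing an older timestamp to `read` acts as a simple time-travel query.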
Since ever more applications require the enrichment of normally structured data with semi-structured, unstructured, or text data, the HANA database provides a text search engine in addition to its classic relational query engine.
The Graph Engine supports the efficient representation and processing of data graphs with a flexible typing system. A new dedicated storage structure and a set of optimized base operations are introduced to enable efficient graph operations via the domain-specific WIPE query and manipulation language. The Graph Engine is positioned to optimally support resource planning applications with huge numbers of individual resources and complex mash-up interdependencies. The flexible type system additionally supports the efficient execution of transformation processes, like data cleansing steps in data-warehouse scenarios, to adjust the types of the individual data entries, and it enables the ad-hoc integration of data from different sources.
The Text Engine provides text indexing and search abilities, such as exact search for words and phrases, fuzzy search (which tolerates typing errors), and linguistic search (which finds variations of words based on linguistic rules). In addition, search results can be ranked and federated search abilities support searching across multiple tables and views. This functionality is available to applications via specific SQL extensions. For text analyses, a separate Preprocessor Server is used that leverages SAP’s Text Analysis library.
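To illustrate what a fuzzy search with ranking does, the following sketch uses `difflib.SequenceMatcher` from the Python standard library. It is a stand-in for the concept only. HANA exposes this functionality through its own SQL extensions, which are not shown here, and the `fuzzy_search` function, the similarity threshold and the sample documents are all hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_search(term, documents, threshold=0.8):
    """Return (score, doc_id) pairs for documents containing a word similar
    to `term`, best match first."""
    results = []
    for doc_id, text in documents.items():
        # Score each document by its best-matching word.
        best = max(
            (SequenceMatcher(None, term.lower(), word.lower()).ratio()
             for word in text.split()),
            default=0.0,
        )
        if best >= threshold:
            results.append((best, doc_id))
    return sorted(results, reverse=True)


documents = {
    1: "in-memory database appliance",
    2: "columnar storage engine",
    3: "databse tuning guide",       # deliberate typo
}
hits = fuzzy_search("database", documents)
```

The exact match in document 1 ranks first, the misspelled "databse" in document 3 is still found, and document 2 is filtered out by the threshold.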
The figure above gives an overview of the alternative methods for data replication from a source system to a HANA database. Each method handles the required data replication differently, and consequently each method has different strengths. It depends on your specific application field and the existing system landscape as to which of the methods best serves your needs.
Trigger-Based Data Replication Using SAP Landscape Transformation (LT) Replication Server is based on capturing database changes at a high level of abstraction in the source ERP system. This method of replication benefits from being database-independent, and can also parallelize database changes on multiple tables or by segmenting large table changes.
Extract, transform, load (ETL) based data replication uses SAP BusinessObjects Data Services to extract the relevant business data from a source system such as ERP and load it into a HANA database. In addition, the ETL-based method offers options for the integration of third-party data providers. Replication jobs and data flow are configured in Data Services. This permits the use of multiple data sources (including external ones) and data validation.
Transaction Log-Based Data Replication Using Sybase Replication is based on capturing table changes from low-level database log files. This method is database-dependent. Database changes are propagated for each database transaction and are then replayed on the HANA database. This maintains consistency, but at the cost of being unable to parallelize the propagation of changes.
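The log-based approach amounts to replaying captured transactions on the target strictly in commit order, which is why it preserves consistency but cannot easily be parallelized. A minimal sketch follows. The log record format (`commit_seq`, `changes`) and the `replay_log` function are hypothetical and do not correspond to the Sybase Replication format.

```python
def replay_log(log, target):
    """Apply captured changes to the target in commit order, one transaction at a time."""
    for tx in sorted(log, key=lambda t: t["commit_seq"]):
        for op, table, key, row in tx["changes"]:
            rows = target.setdefault(table, {})
            if op == "delete":
                rows.pop(key, None)
            else:                      # "insert" or "update"
                rows[key] = row
    return target


captured = [
    {"commit_seq": 2, "changes": [("update", "orders", 1, {"status": "shipped"})]},
    {"commit_seq": 1, "changes": [("insert", "orders", 1, {"status": "new"}),
                                  ("insert", "orders", 2, {"status": "new"})]},
    {"commit_seq": 3, "changes": [("delete", "orders", 2, None)]},
]
replica = replay_log(captured, {})
```

Applying the same changes in a different order (for example, the update before the insert) would leave the replica in a state the source never had, which is the constraint the serial replay avoids.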
Backup and recovery
Immediately after launch, with Service Pack 2, backup and recovery abilities were limited to either Recovery to Last Back-up or Older Data Back-up or Recovery to Last State Before Crash. Additional backup features were implemented in Service Pack 3. These included a Full Automatic or Manual Log Backup option and a Point In-Time Recovery option. New administration features included a new Backup Catalog which records all backup attempts.
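Point-in-time recovery can be thought of as restoring a data backup and then replaying log backups up to the requested moment. The sketch below shows that idea with plain dictionaries. The record layout, the integer timestamps and the `recover_to` function are illustrative assumptions, not HANA's backup format.

```python
def recover_to(backup, log_backups, target_time):
    """Restore the data backup, then replay logged changes made after the
    backup and no later than target_time."""
    state = dict(backup["data"])
    for entry in sorted(log_backups, key=lambda e: e["time"]):
        if backup["time"] < entry["time"] <= target_time:
            state.update(entry["changes"])
    return state


backup = {"time": 100, "data": {"a": 1}}
log_backups = [
    {"time": 110, "changes": {"a": 2}},
    {"time": 120, "changes": {"b": 5}},
    {"time": 90, "changes": {"a": 0}},    # older than the backup, ignored
]
```

Choosing `target_time` equal to the backup time corresponds to recovery to the last backup, while choosing the time of the newest log entry corresponds to recovery to the last state before a crash.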
One implication of HANA’s ability to work with a full database in memory is that computationally intensive KPI calculations can be completed rapidly when compared to disk based databases. Pre-aggregation of data in cubes or storage of results in materialized views is no longer necessary.
SAP HANA Information Composer is a web based tool which allows users to upload data to a HANA database and manipulate that data by creating Information Views. In the data acquisition portion, data can be uploaded, previewed and cleansed. In the data manipulation portion objects can be selected, combined and placed in Information Views which can be used by SAP BusinessObjects tools.
Security and role-based permissions are managed by the Authorization Manager in HANA DB. Besides standard database privileges such as create, update or delete, HANA DB also supports analytical privileges, which represent filters or drill-down limitations on queries, as well as access-control privileges on values with certain attributes. HANA DB components invoke the Authorization Manager whenever they need to check user privileges. Authentication can be done either by the database itself or delegated to an external authentication provider, such as an LDAP directory.
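An analytical privilege behaves like a row filter attached to a user: a query returns only rows whose attribute values the user is allowed to see. The sketch below models this with a dictionary of allowed values per attribute. The `PRIVILEGES` structure, the user names and the `authorized_rows` function are hypothetical and are not HANA's privilege model or syntax.

```python
SALES = [
    {"region": "EMEA", "year": 2011, "revenue": 100},
    {"region": "EMEA", "year": 2012, "revenue": 150},
    {"region": "APJ", "year": 2012, "revenue": 70},
    {"region": "AMER", "year": 2012, "revenue": 200},
]

# Per-user restrictions: attribute -> set of values the user may see.
PRIVILEGES = {
    "alice": {"region": {"EMEA"}},
    "bob": {"region": {"EMEA", "APJ"}, "year": {2012}},
}


def authorized_rows(user, rows):
    """Return only the rows that satisfy every restriction granted to the user."""
    restrictions = PRIVILEGES.get(user)
    if restrictions is None:
        return []                      # no analytical privilege: nothing is visible
    return [
        row for row in rows
        if all(row[attr] in allowed for attr, allowed in restrictions.items())
    ]


def total_revenue(user):
    """An aggregate computed over the filtered rows only."""
    return sum(row["revenue"] for row in authorized_rows(user, SALES))
```

The same aggregate therefore yields different results for different users, without any change to the query itself.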
Performance and scalability
SAP has stated that customers have realized gains as high as 100,000x in improved query performance when compared to disk based database systems.
In March 2011, Wintercorp (an independent testing firm specializing in large-scale data management) was retained by SAP to audit test specifications and results from test runs. The test used concepts similar to those of the industry standard TPC-H benchmark. The test data had between 600 million and 1.8 billion rows, and the test ran five analytical query types and three operational report query types. The combined throughput of analytical and operational report queries ranged between 3,007 and 10,042 queries per hour, depending on the volume of data.
To enable scalability in terms of data volumes and the number of application requests, the HANA database supports scale-up and scale-out. For scale-up, all algorithms and data structures are designed to work on large multi-core architectures especially focusing on cache-aware data structures and code fragments. For scale-out, the HANA database is designed to run on a cluster of individual machines allowing the distribution of data and query processing across multiple nodes.
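The scale-out idea (partition the data across nodes, aggregate locally, then merge the partial results) can be sketched as follows. The hash-partitioning scheme, the node count and all function names are assumptions chosen for illustration and do not describe how HANA actually distributes tables.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Deterministic hash partitioning (crc32 is stable across runs, unlike hash())."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Assign each row to a node based on its customer key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row["customer"], node_count)].append(row)
    return nodes


def local_sum(partition):
    """Per-node partial aggregation."""
    totals = defaultdict(float)
    for row in partition:
        totals[row["customer"]] += row["amount"]
    return totals


def global_sum(nodes):
    """Merge the partial results from all nodes."""
    merged = defaultdict(float)
    for partition in nodes:
        for customer, subtotal in local_sum(partition).items():
            merged[customer] += subtotal
    return dict(merged)


rows = [
    {"customer": "A", "amount": 10.0},
    {"customer": "B", "amount": 5.0},
    {"customer": "A", "amount": 2.5},
    {"customer": "C", "amount": 7.0},
]
cluster = distribute(rows, node_count=3)
```

Because rows with the same key always land on the same node, each node can aggregate independently and the merge step only has to combine disjoint partial results.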
Competing in-memory databases for online transaction processing and analytics workloads include:
- IBM solidDB, a DB2 front-ended high-speed database suite
- Oracle In-Memory Database Cache, a performance extension of Oracle 11g