Kakao Pay | Customers

Impact

Ensure compliance with legal requirements that include data updates through the ability to modify or delete row-level data without the need to rewrite an entire partition.

Improve data recovery, even in the case of accidental deletion, through the snapshot capability.

By utilizing Cloudera's Open Data Lakehouse, Kakao Pay has significantly boosted query performance, reducing data processing time by approximately 30%.

Solutions

Cloudera on premises

Cloudera Professional Services

Streaming data powered by Apache NiFi

SQL support for operational databases powered by Apache Phoenix

Data Architecture

Open Data Lakehouse powered by Apache Iceberg

Industry

Financial Services

Country

South Korea

Website

www.kakaopay.com

Kakao Pay is pursuing several strategies to effectively utilize data, including improving data quality, strengthening data analytics capabilities, driving data-driven decision-making, and strengthening data security.

Kakao Pay provides mobile payment and financial services via KakaoTalk. Launched in September 2014 as Korea’s first simple payment service, Kakao Pay has since expanded to include remittance, overseas payment, loan comparison, and wealth management services like securities, funds, and insurance, making financial services more accessible.

As a result, the strategy focuses on building a data platform that integrates various data sources, seamlessly analyzes them, and utilizes data at scale to provide end-users with a better financial experience and sustain growth.

Kakao Pay’s data platform collects and processes real-time and batch data, provides data to customers, visualizes it with business intelligence (BI) tools, operates the core platform, and establishes data governance to ensure a stable analysis environment.

Kakao Pay Enhances Data Management with Cloudera: Improved Analysis, Real-Time Processing, and Seamless Querying

Kakao Pay upgraded and migrated to the latest Cloudera version to modernize its data platform and take advantage of newer capabilities.

Kakao Pay’s deployment consists of three clusters that work together to deliver seamless management and analytics capabilities.

The first is an analysis cluster, which is the main cluster used for analyzing data. It consists of a Cloudera Base cluster and a Cloudera Data Services cluster, both deployed on-premises.

The analysis cluster runs the following components:

Apache HDFS and Apache Kudu are used for data storage.
Apache Ranger is used for managing data access and auditing.
Apache Oozie manages and schedules workflows.
Apache Impala, Hive, and Spark are used for data processing.
Apache Iceberg’s open table format is used for row-level data updates and deletes as well as snapshots.

The second is a real-time data serving cluster. It is a Phoenix cluster configured for disaster recovery across multiple Internet Data Centers (IDCs). Kakao Pay created an HBase connection manager system that detects problem clusters and switches the active cluster to a healthy one. It also uses a NiFi cluster for real-time data collection.
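The failover behavior described above can be sketched in a few lines. This is only an illustration of the idea, not Kakao Pay's implementation: the Cluster and ConnectionManager names, the health probe callables, and the "first healthy cluster wins" policy are all assumptions, since the source does not describe how the connection manager detects problems or chooses a replacement.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cluster:
    """Hypothetical handle for one HBase/Phoenix cluster in one IDC."""
    name: str
    is_healthy: Callable[[], bool]  # assumed health probe; the real check is not described


class ConnectionManager:
    """Tracks the active cluster and fails over when its health probe fails."""

    def __init__(self, clusters: List[Cluster]) -> None:
        if not clusters:
            raise ValueError("at least one cluster is required")
        self.clusters = clusters
        self.active = clusters[0]

    def refresh(self) -> Optional[Cluster]:
        """Keep the active cluster if it is healthy, else switch to the first healthy one."""
        if self.active.is_healthy():
            return self.active
        for candidate in self.clusters:
            if candidate.is_healthy():
                self.active = candidate
                return candidate
        return None  # no healthy cluster; the caller decides how to degrade
```

A client would call refresh() before opening a connection and route traffic to whichever cluster it returns. A real system would also need to handle replication lag and avoid flapping between clusters, which this sketch ignores.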

The last is a heterogeneous query cluster. It is a Trino cluster built on Kubernetes. Trino allows querying across multiple sources without first collecting the data into the platform. Kakao Pay uses this cluster to check in advance whether a data source is suitable for regular collection.

Row-level changes that consumed too many resources and too much time

As a fintech company, Kakao Pay provides financial services and has to comply with legal requirements. One of these requirements is to periodically delete the data of users who have unsubscribed. However, most of the data stored in the Hadoop Distributed File System (HDFS) cannot be deleted row by row, and even with Impala on top of HDFS for query analytics, it was difficult for Kakao Pay to modify or delete row-level data. This meant that whenever data had to be updated, Kakao Pay had to rewrite the entire partition, consuming too many resources and too much time.

Kakao Pay considered using Kudu as a solution to this problem, but Kudu is designed for processing near-real-time data, making it a poor fit for simply deleting the data of users who have left the service. Additionally, Kudu tables take a long time to load when they hold too much data.

Another challenge was recovering data that was accidentally deleted by users. Previously, Kakao Pay had to go through the deleted data directory and recover data that had not yet expired. If the Time to Live (TTL) had expired, recovery was impossible, and the data had to be re-ingested through ETL.

Cloudera enabled row-level data modification and deletion

Kakao Pay adopted Apache Iceberg with Cloudera. After the implementation, the team found that Impala could now query, delete, and update data in Apache Iceberg tables.

This enabled Kakao Pay to modify and delete data on a row-level basis using Apache Iceberg. Additionally, the snapshot feature provided by Apache Iceberg simplifies data recovery if a user accidentally deletes data by allowing the viewing and rollback of past snapshots.
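To make the contrast with rewriting a whole partition concrete, here is a toy in-memory model of the two ideas in this section: a row-level delete that records which rows to hide instead of rewriting data files, and snapshots that can be rolled back. It is a simplified sketch and not Iceberg's actual file format or API. The ToyTable and Snapshot classes, the (user_id, payload) schema, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Row = Tuple[int, str]  # (user_id, payload); a made-up schema for illustration


@dataclass(frozen=True)
class Snapshot:
    """Immutable table state: the data files plus the row positions marked deleted."""
    data_files: Tuple[str, ...]
    deleted: FrozenSet[Tuple[str, int]]  # (file name, row position)


class ToyTable:
    def __init__(self, files: Dict[str, List[Row]]) -> None:
        self.files = files  # data files are never rewritten in this model
        self.snapshots = [Snapshot(tuple(files), frozenset())]
        self.current = 0  # index of the current snapshot

    def scan(self) -> List[Row]:
        """Read every row of the current snapshot that is not marked deleted."""
        snap = self.snapshots[self.current]
        return [row
                for name in snap.data_files
                for pos, row in enumerate(self.files[name])
                if (name, pos) not in snap.deleted]

    def delete_user(self, user_id: int) -> None:
        """Row-level delete: record positions in a new snapshot, leave files untouched."""
        snap = self.snapshots[self.current]
        hits = {(name, pos)
                for name in snap.data_files
                for pos, row in enumerate(self.files[name])
                if row[0] == user_id}
        self.snapshots.append(Snapshot(snap.data_files, snap.deleted | hits))
        self.current = len(self.snapshots) - 1

    def rollback(self, snapshot_index: int) -> None:
        """Recover from an accidental delete by pointing back at an older snapshot."""
        if not 0 <= snapshot_index < len(self.snapshots):
            raise IndexError("unknown snapshot")
        self.current = snapshot_index
```

In this model, deleting a user adds a small set of markers and a new snapshot, while the original files stay as they were. That is why an accidental delete can be undone with rollback(0) until old snapshots are expired.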

“Since the Apache Iceberg architecture searches metadata and filters before reading data, the amount of data that needs to be read to process a query has been significantly reduced,” said Steven Yoon, Senior Data Engineer at Kakao Pay. “This ultimately led to improved query performance, and we received feedback from users that query performance improved by about 30%.”
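The metadata filtering Yoon describes can be illustrated with a small model in which each data file carries lower and upper bounds for a column, and the planner skips files whose range cannot match the query. This is a sketch of the general technique only. The DataFile class, the day and amount columns, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataFile:
    path: str
    min_day: int  # lower bound of the hypothetical `day` column in this file
    max_day: int  # upper bound of the same column
    rows: List[Tuple[int, int]]  # (day, amount)


def plan_files(files: List[DataFile], lo: int, hi: int) -> List[DataFile]:
    """Use metadata only: keep files whose [min_day, max_day] overlaps [lo, hi]."""
    return [f for f in files if f.max_day >= lo and f.min_day <= hi]


def total_amount(files: List[DataFile], lo: int, hi: int) -> Tuple[int, int]:
    """Return (sum of amounts with lo <= day <= hi, number of files actually read)."""
    selected = plan_files(files, lo, hi)
    total = sum(amount
                for f in selected
                for day, amount in f.rows
                if lo <= day <= hi)
    return total, len(selected)
```

With three files covering days 1 to 10, 11 to 20, and 21 to 30, a query for days 11 to 20 reads one file instead of three. The answer is the same as a full scan, but less data is read.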

Yoon added, “Previously, both computing resources and storage resources were used on a single server, so if computing resources were insufficient, both had to be added even if storage resources were sufficient. However, in the Cloudera platform environment, if more computing resources are needed, they can be added independently. This resulted in efficient resource management and reduced hardware costs.”

Future Data Strategy with Cloudera

“Using open source technology can be difficult. However, Cloudera provides pre-verified packaging to address the difficulties encountered when utilizing open source,” Yoon said. “Many data experts at Cloudera also analyzed the issues and provided solutions, related documents, and test results, which were very helpful.”

Kakao Pay is now considering building a hybrid environment in which data can be loaded to the cloud and analyzed as needed. Kakao Pay noted that Cloudera's biggest competitive advantage and differentiation from other services is that it can use the cloud environment alongside its current on-premises setup. Kakao Pay is also reviewing whether the LLM provided by Cloudera can meet its needs.

Kakao Pay: Iceberg innovations to make financial services more accessible
