Improving Data Security, Interoperability, and Veracity using Blockchain for One Data Governance, Case Study of Local Tax Big Data


 


 

 

Satriyo Wibowo
Indonesia Cyber Security Forum

Jakarta, Indonesia

satriyowibowo@icsf.or.id

Tesar Sandikapura

Blockchain Nusantara

Jakarta, Indonesia

tesar@litebig.com

 

 

 

Abstract— The Presidential Decree on One Data Indonesia is intended to govern data produced by central and local agencies to support planning, implementation, evaluation, and development control; local tax data is one such data set. One Data Indonesia is a Big Data initiative that holds a large amount of data from the central and local governments of Indonesia. The defining factors of data collection in Big Data are volume, velocity, variety, and veracity. Volume and velocity describe how much data is generated and how quickly. Variety describes whether the data is structured or not, while veracity describes the level of trust in the validity of the data. Data veracity is a major problem in Big Data Analytics. Through its data integrity protection feature and other methods, Blockchain can offer solutions to improve data interoperability, security, and veracity.

Keywords—One Data Indonesia, Local Tax, Big Data Analytics, Blockchain

I.   Introduction

The Presidential Decree on One Data Indonesia is intended to govern data produced by central and local agencies to support planning, implementation, evaluation, and development control. The rule was issued because previously collected data was inaccurate, out of date, not integrated, unaccountable, and difficult to access and share. Data is a record of a collection of facts or descriptions in the form of numbers, characters, symbols, pictures, maps, signs, signals, writing, voice, and/or sound, which represents the actual situation or indicates an idea, object, condition, or situation. The forms of data are statistical, geospatial, and national financial data [1].

Statistical data is data in the form of numbers describing the characteristics of a population, obtained through collection, processing, presentation, and analysis. Geospatial data describes the geographical location, dimensions, or characteristics of natural and/or man-made objects that are below, on, or above the surface of the earth. National financial data is data compiled based on a government accounting system that covers all rights and obligations of the state that can be valued in money, as well as everything in the form of money or goods that can be owned by the state in connection with the implementation of these rights and obligations [1].

The principles of One Data Indonesia are compliance with data standards, possession of metadata, data interoperability, and the use of reference codes and main data. Data Standards are the standards that underlie certain data. Metadata is information, in a standard structure and format, that describes and explains data and facilitates the search, use, and management of data. Data Interoperability is the ability of data to be shared between interacting electronic systems. A Reference Code is a sign made up of characters that carry or illustrate a certain meaning, intention, or norm and that serves as a unique identity reference for data. Main Data is data that represents objects in government business processes and that is determined, in accordance with the provisions of this Presidential Regulation, to be used jointly.

Local tax, governed by the Local Tax and Retribution Law, is a mandatory contribution to a region owed by individuals or entities. It is coercive under the Law, carries no direct compensation, and is used for regional needs for the greatest prosperity of the people; the proceeds go to the Local Budget (APBD). There are two levels of local tax and retribution: province and city/regency.

Province tax and retribution consist of vehicle tax, transfer of vehicle name retribution, vehicle fuel tax, surface water tax, and cigarette tax. City/regency tax and retribution consist of hotel tax, restaurant tax, entertainment tax, billboard tax, street lighting tax, nonmetallic and rock mineral tax, parking tax, groundwater tax, swallow bird nest tax, rural and urban land and building tax, and land or building acquisition retribution.

Even though these taxes and retributions are the main income for local governments besides development funds from the central government, not all of them are managed properly. Local governments differ in resources, budget, and competency, which leads to inaccurate reports and leakage and, in the end, to inaccurate planning by the central government. One Data Indonesia acts as a Big Data platform to cope with this problem. However, data security, interoperability, and veracity will be the biggest challenges, because without reliable and trusted data planners will not be confident in using it. Blockchain, as an emerging technology, has the potential to address the problem.

II.   Problem Statement and Methods

A.   Problem Statement

The problem addressed in this paper is: how can Blockchain help to improve the data security, interoperability, and veracity of Local Tax Big Data?

B.   Research Method and Limitation

The method used is normative, with primary data derived from discussions with local government administrators and secondary data derived from One Data and local tax regulations and their implementation. The Local Tax Big Data system is assessed using the Blockchain Implementation Assessment Framework (BIAF) to determine whether the case is suitable for a Blockchain solution. If it is, the analysis then identifies the areas in which Blockchain can provide a solution.

III.   Empirical Study

One Data Indonesia is essentially a central government effort to build a Big Data platform that contains development data from the central to the local government level. It is built on the principles of compliance with data standards, possession of metadata, data interoperability, and the use of reference codes and main data. However, the quality of Big Data analytics depends on data veracity.

A.   Big Data Analytics

Big data analytics is the use of advanced techniques to analyse very large and diverse data sets that include structured, semi-structured, and unstructured data, collected from different sources and in sizes ranging from terabytes to zettabytes [2]. Big data refers to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency, and which have one or more of the following characteristics: high volume, high velocity, or high variety. Volume and velocity describe how much data is generated and how quickly, while variety describes whether the data is structured or not. Big data comes from many kinds of sources, including sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media, and is usually generated in real time and on a very large scale.

The defining factors of data collection in Big Data are volume, velocity, variety, and veracity. Volume and velocity describe how much data is generated and how quickly. Variety describes whether the data is structured or not, while veracity describes the level of trust in the validity of the data. Data is being generated at a tremendous pace, and sufficient measures must be in place to verify it. Research has shown that 80% of big data is uncertain data [3], and analysis performed on uncertain data may lead to untrusted results or poor decisions. Veracity means that the data is certain enough to be trusted. However, data may also be known only to some extent or to some confidence level, as with sensor data that is known only up to some precision. In Fig. 1, sourced from IBM, the red curve shows the proportion of data whose veracity is unknown. By the end of 2015, uncertain data reached approximately 80%. Uncertainty about the veracity of data can come from various sources, for instance measurement error in the case of sensors, or a lack of credentials in the case of social media. From this point of view, if no solution is found, it will become harder and harder to trust the analysis of Big Data because of the low level of data veracity.


Fig. 1.   Data Veracity (source: IBM)

B.   Blockchain

Blockchain is still closely associated with Bitcoin, the booming digital currency that started the era of cryptocurrency. An analysis using the DEA tools developed by Drone Emprit [4] and Indonesia Islamic University, covering July 4 to October 23, 2019, found that almost all Twitter conversations with the Blockchain hashtag were about ICOs (Initial Coin Offerings). Blockchain is indeed the technology that underlies cryptocurrency; however, its use is not limited to it.

The World Economic Forum (WEF) defines Blockchain as a technology that allows parties to transfer assets to each other securely without intermediaries and that enables transparency, immutable records, and autonomous execution of business rules. Blockchain is in fact a kind of database technology with distinctive working concepts: a distributed database, peer-to-peer transmission, transparency, irreversible records, and computational logic. Being decentralized, Blockchain relies on consensus to decide on a transaction.

Distributed Database means that each party that joins the Blockchain has access to all data and the complete transaction history without exception. Each party may verify its partner's transactions directly without a middleman. Peer-to-Peer Transmission means that communication or transactions occur between one party and another without any intermediary node. Each node can store and forward information to another node. Transparency with Pseudo-anonymity (identities are hidden behind addresses, especially on Public Blockchain) means that each node or user in the Blockchain has an address of 30 or more alphanumeric characters that serves as its identifier, much like a username. Irreversible records, or finality, means that once a transaction has been recorded in the database, the record cannot be changed because the records are protected cryptographically. Computational logic means that the Blockchain can be programmed so that transactions are performed automatically when a criterion has been met.
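The irreversibility described above is commonly achieved by having each block store a cryptographic hash of the block before it. The following is a minimal sketch of that idea, not the design of any particular Blockchain platform. The function names, the block fields, and the tax payloads are all hypothetical and chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Return the SHA-256 digest of a block's contents."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
    )
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical local tax records, used only to exercise the sketch.
ledger = []
append_block(ledger, {"tax": "hotel", "amount": 1_500_000})
append_block(ledger, {"tax": "parking", "amount": 250_000})
valid_before = is_valid(ledger)

# Changing a recorded payload no longer matches the stored hash.
ledger[0]["payload"]["amount"] = 1
valid_after = is_valid(ledger)
```

In this sketch a change to any recorded payload is detected because the stored hash no longer matches the recomputed one. A real deployment would additionally replicate the chain across nodes, so that a tampered copy also disagrees with the copies held by other parties.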

As in any crowd, the participants need a way to reach consensus on a decision. Because of their different natures, consensus on a Public Blockchain is very different from consensus on a Private Blockchain [5]. A Public Blockchain exists on an untrusted network: anybody can join the network, and anybody can read and write the ledger. Consensus on a Private Blockchain is much easier because it is built on a trusted network. However, it still uses a Byzantine Fault Tolerant (BFT) mechanism to make sure that all information is delivered and that no compromised node can influence the validation. An easier consensus method requires less computing power.
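To give a sense of what BFT implies for network sizing, the sketch below applies the classical bound for Byzantine agreement, under which n nodes can tolerate f compromised nodes only if n >= 3f + 1. This bound is general background rather than something stated in this paper, and the function names are hypothetical.

```python
def max_faulty_nodes(n):
    """Largest number of compromised nodes f that n nodes can tolerate (n >= 3f + 1)."""
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return (n - 1) // 3


def quorum_size(n):
    """Smallest vote count such that any two quorums share an honest node."""
    f = max_faulty_nodes(n)
    # Two quorums of size q overlap in at least 2q - n nodes; requiring
    # 2q - n >= f + 1 gives q >= (n + f + 1) / 2, rounded up.
    return (n + f) // 2 + 1


# Hypothetical network sizes, for illustration only.
sizing = {n: (max_faulty_nodes(n), quorum_size(n)) for n in (4, 7, 10)}
```

Under this assumption a private network of four validating nodes tolerates one compromised node, and tolerating two requires at least seven nodes.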

C.   Blockchain Implementation Assessment Framework

The Blockchain Implementation Assessment Framework (BIAF) is a framework for measuring the suitability of a use-case for Blockchain implementation [6]. It is built for general use because not all problems can be solved by Blockchain. For example, even though logistics, supply chain management, and remittance are ideal use-cases, implementation will surely suffer if the supporting regulation is not yet in place.

The assessment has two levels. The first is a set of questions to make sure that the case is suitable for Blockchain implementation. The second is a framework built on the Process-People-Technology triangle. For each side of the triangle, there is a set of assessments to make sure that everything is ready before Blockchain is implemented.

These are the checkpoints used to ascertain whether Blockchain could be the solution:

1.   a centralized database solution is considered deficient, for example because it is less secure, requires too many database synchronization processes, or makes data history hard to track

2.   the asset, or key data that moves across the system and is changed by transactions, is one whose history matters

3.   the system needs a single database with a redundant dataset so that all stakeholders together can improve performance

4.   there are multiple stakeholders that hold different sets of data that complement each other to enrich the asset

5.   the data sources are spread over many locations

6.   many participants need to change the data, but only certain participants can change the basic application

7.   there are personal data protection requirements

The most important checkpoints are the asset and the multiple stakeholders, because all nodes will see the Blockchain as one giant database even though there are many stakeholders, each with its own dataset. Moreover, Blockchain can be configured so that communication within specific groups cannot be accessed by others.
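The first-level checklist can be recorded as simple structured data so that assessments of different use-cases are comparable. The sketch below is only an illustration: the checkpoint identifiers, the function name, and the example answers are hypothetical. BIAF as described here does not define a scoring rule, so the code reports which checkpoints are met and does not compute a verdict.

```python
# Hypothetical identifiers for the seven first-level checkpoints, in order.
CHECKPOINTS = (
    "centralized_database_deficient",
    "asset_history_matters",
    "shared_redundant_dataset",
    "multiple_stakeholders",
    "distributed_data_sources",
    "many_writers_restricted_admins",
    "personal_data_protection",
)

# The two checkpoints the text singles out as most important.
KEY_CHECKPOINTS = ("asset_history_matters", "multiple_stakeholders")


def summarize_first_level(answers):
    """Group yes/no answers into met and unmet checkpoints."""
    missing = [name for name in CHECKPOINTS if name not in answers]
    if missing:
        raise ValueError(f"unanswered checkpoints: {missing}")
    return {
        "met": [name for name in CHECKPOINTS if answers[name]],
        "unmet": [name for name in CHECKPOINTS if not answers[name]],
        "key_checkpoints_met": all(answers[name] for name in KEY_CHECKPOINTS),
    }


# Made-up answers to exercise the sketch; these are not findings of the paper.
example_answers = {name: True for name in CHECKPOINTS}
example_answers["personal_data_protection"] = False
summary = summarize_first_level(example_answers)
```

Keeping the key checkpoints in a separate field reflects the emphasis in the text without inventing a pass threshold for the remaining items.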

The second level is performed after concluding that the case is suitable for Blockchain implementation. The assessment is based on the Process-People-Technology framework, whose elements must be prepared before the implementation can take place [7]. The steps are not all serial; most run in parallel so that the effort gains enough momentum to move forward.

In the Process assessment, there are checkpoints to complete, such as:

1. stakeholder identification

2. regulation or governance readiness

3. risk identification

Published by International Conference on ICT for Smar on 23/10/2019