DATA PRICING MECHANISM BASED ON PROPERTY RIGHTS COMPENSATION DISTRIBUTION

Abstract

While machine learning (ML) benefits from data, it also faces challenges caused by ambiguous data ownership, including privacy violations and increased costs of using data. This suggests that the value created by data is determined not only by its utility but also by the cost of using it (negative externalities). Existing pricing methods mainly value data based on its utility and ignore the negative externalities caused by fuzzy ownership, so they cannot yield an efficient pricing mechanism. Throughout the data life cycle (creation, pre-processing, training, etc.), the usufruct and ownership of the data are transferred at the same time, so benefits and costs are generated simultaneously. Because data rights confirmation and data pricing cannot be separated in a data transaction, we propose the first data valuation mechanism based on modern property rights theory. Specifically, we propose to clarify ownership by integrating property rights, to improve the final revenue of the whole workflow "data link" by treating it as a single collective, and to compensate the process performers who lose ownership after the integration. We then consider the expectations of both the integrator and the integrated parties during compensation allocation. For the former, we apply compound interest to assess a total compensation equivalent to the time value of the data chain. For the latter, we respect and meet their expectations as much as possible. To achieve this, we provide a framework based on the Least-core to assign the compensation and prove that our framework also works in comparison with existing algorithms. Finally, to cope with more complex situations, we adjust the traditional Least-core and demonstrate theoretically and experimentally that the compensation mechanism is feasible and effective in solving the data pricing problem.

1. INTRODUCTION

While ML benefits from data, it also faces challenges brought by the ambiguity of data ownership (Maini et al., 2021). Over its life cycle, data is generated by the producer, passes through agents such as data preprocessors and model pre-trainers, and finally generates value through model training. In this process, data ownership and access are transferred simultaneously, which makes the use of data costly. On the one hand, it leaves unclear who is responsible for privacy protection, creating the potential for privacy violations. In the Cambridge Analytica scandal, for example, Facebook and Cambridge Analytica collected the personal data of up to 87 million Facebook users without their consent (Kelly), which gave rise to an unprecedented discussion on data privacy. On the other hand, even though a large number of data pricing models are already available in the data marketplace to value data (Chen et al., 2019; Liu, 2020; Li et al., 2013; Koutris et al., 2013; 2015), each use of a dataset requires a new valuation, since datasets behave differently in different models. This increases the negative externalities of ML (e.g., large volumes of computation). The imbalance between the benefits of data and the costs of using it makes clarifying property rights an important issue. Demsetz (1967) points out that the generation of property rights is essentially a process of cost-benefit tradeoff: property rights arise when the benefits of internalizing externalities by defining property rights are greater than the costs of doing so. Furthermore, the purpose

