TABDDPM: MODELLING TABULAR DATA WITH DIFFUSION MODELS

Abstract

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Most prevalent in the computer vision community, diffusion models have also recently gained attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate whether the framework of diffusion models can be advantageous for general tabular problems, where datapoints are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes accurate modeling quite challenging, since the individual features can be of completely different natures, e.g., some continuous and some discrete. To address such data types, we introduce TabDDPM, a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, we show that TabDDPM is well-suited for privacy-oriented setups, where the original datapoints cannot be publicly shared.

1. INTRODUCTION

Denoising diffusion probabilistic models (DDPM) (Sohl-Dickstein et al., 2015; Ho et al., 2020) have recently become an object of great research interest in the generative modelling community, since they often outperform the alternative approaches both in terms of the realism of individual samples and their diversity (Dhariwal & Nichol, 2021). The most impressive successes of DDPM have been demonstrated in the domain of natural images (Dhariwal & Nichol, 2021; Saharia et al., 2022; Rombach et al., 2022), where the advantages of diffusion models are successfully exploited in applications such as colorization (Song et al., 2021), inpainting (Song et al., 2021), segmentation (Baranchuk et al., 2021), super-resolution (Saharia et al., 2021; Li et al., 2021), semantic editing (Meng et al., 2021), and others. Beyond computer vision, the DDPM framework has also been investigated in other fields, such as NLP (Austin et al., 2021; Li et al., 2022), waveform signal processing (Kong et al., 2020; Chen et al., 2020), molecular graphs (Jing et al., 2022; Hoogeboom et al., 2022), and time series (Tashiro et al., 2021), testifying to the universality of diffusion models across a wide range of problems.

The aim of our work is to understand whether the universality of DDPM can be extended to general tabular problems, which are ubiquitous in industrial applications that involve data described by a set of heterogeneous features. For many such applications, the demand for high-quality generative models is especially acute because of modern privacy regulations, like GDPR, which prevent publishing real user data, while the synthetic data produced by generative models can be shared. Training a high-quality model of tabular data, however, can be more challenging than in computer vision or NLP due to the heterogeneity of individual features and the relatively small sizes of typical tabular datasets.
In our paper, we show that despite these two intricacies, diffusion models can successfully approximate typical distributions of tabular data, leading to state-of-the-art performance on most of the benchmarks. In more detail, the main contributions of our work are the following:

1. We introduce TabDDPM, the simplest design of DDPM for tabular problems, which can be applied to any tabular task and can work with mixed data that includes both numerical and categorical features.
2. We demonstrate that TabDDPM outperforms the alternative approaches designed for tabular data, including GAN-based and VAE-based models from the literature, and illustrate the sources of this advantage on several datasets.
3. We show that the data produced by TabDDPM is a "sweet spot" for privacy-concerned scenarios, where synthetics are used to substitute real user data that cannot be shared.

The source code of TabDDPM is publicly available.

2. RELATED WORK

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) are a paradigm of generative modelling that aims to approximate the target distribution by the endpoint of a Markov chain, which starts from a given parametric distribution, typically a standard Gaussian. Each Markov step is performed by a deep neural network that effectively learns to invert the diffusion process with a known Gaussian kernel. Ho et al. (2020) demonstrated the equivalence of diffusion models and score matching (Song & Ermon, 2019; 2020), showing them to be two different perspectives on the gradual conversion of a simple known distribution into a target distribution via an iterative denoising process. Several recent works (Nichol, 2021; Dhariwal & Nichol, 2021) have developed more powerful model architectures as well as advanced learning protocols, which led to the "victory" of DDPM over GANs in terms of generative quality and diversity in the computer vision field. In our work, we demonstrate that one can also successfully use diffusion models for tabular problems.

Generative models for tabular problems are currently an active research direction in the machine learning community (Xu et al., 2019; Engelmann & Lessmann, 2021; Jordon et al., 2018; Fan et al., 2020; Torfi et al., 2022; Zhao et al., 2021; Kim et al., 2021; Zhang et al., 2021; Nock & Guillame-Bert, 2022; Wen et al., 2022), since high-quality synthetic data is in high demand for many tabular tasks. First, tabular datasets are often limited in size, unlike in vision or NLP problems, for which huge amounts of "extra" data are available on the Internet. Second, proper synthetic datasets do not contain actual user data, therefore they are not subject to GDPR-like regulations and can be publicly shared without violating anonymity.
The recent works have developed a large number of models, including tabular VAEs (Xu et al., 2019) and GAN-based approaches (Xu et al., 2019; Engelmann & Lessmann, 2021; Jordon et al., 2018; Fan et al., 2020; Torfi et al., 2022; Zhao et al., 2021; Kim et al., 2021; Zhang et al., 2021; Nock & Guillame-Bert, 2022; Wen et al., 2022). Through extensive evaluations on a large number of public benchmarks, we show that our TabDDPM model surpasses the existing alternatives, often by a large margin.

"Shallow" synthetics generation. Unlike unstructured images or natural texts, tabular data is typically structured, i.e., the individual features are often interpretable, and it is not clear whether modelling them requires the multiple layers of "deep" architectures. Therefore, simple interpolation techniques, like SMOTE (Chawla et al., 2002) (originally proposed to address class imbalance), can serve as simple and powerful solutions, as demonstrated in (Camino et al., 2020), where SMOTE is shown to outperform tabular GANs for minor-class oversampling. In our experiments, we demonstrate the advantage of the synthetics produced by TabDDPM over synthetics produced by interpolation techniques from the privacy-preserving perspective.
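The interpolation idea is easy to state precisely: a synthetic row is a convex combination of a real row and one of its nearest neighbours. A minimal numpy/scikit-learn sketch of this scheme follows; the function name and sampling details are our illustrative choices, not the implementation used in the experiments:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X, n_samples, k=5, seed=0):
    """Generate synthetic rows as convex combinations of a real point
    and one of its k nearest neighbours (SMOTE-style interpolation)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                     # idx[:, 0] is the point itself
    rows = rng.integers(0, len(X), n_samples)     # pick random real points
    nbrs = idx[rows, rng.integers(1, k + 1, n_samples)]  # pick one of k neighbours
    lam = rng.random((n_samples, 1))              # interpolation coefficient in [0, 1)
    return X[rows] + lam * (X[nbrs] - X[rows])

X = np.random.default_rng(1).standard_normal((100, 4))  # toy "real" data
S = smote_sample(X, 50)
```

Because each synthetic row lies on a segment between two real rows, it can sit arbitrarily close to a real record, which is exactly the privacy weakness examined later.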

3. BACKGROUND

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) are likelihood-based generative models that handle the data through forward and reverse Markov processes. The forward process $q(x_{1:T}|x_0) = \prod_{t=1}^{T} q(x_t|x_{t-1})$ gradually adds noise to an initial sample $x_0$ from the data distribution $q(x_0)$, sampling the noise from the predefined distributions $q(x_t|x_{t-1})$ with variances $\{\beta_1, \ldots, \beta_T\}$. The reverse diffusion process $p(x_{0:T}) = \prod_{t=1}^{T} p(x_{t-1}|x_t)$ gradually denoises a latent variable $x_T \sim q(x_T)$ and allows generating new data samples from $q(x_0)$. The distributions $p(x_{t-1}|x_t)$ are usually unknown and are approximated by a neural network with parameters $\theta$. These parameters are learned from the data by optimizing a variational lower bound:

$$\log q(x_0) \ge \mathbb{E}_{q(x_0)}\Big[\underbrace{\log p_\theta(x_0|x_1)}_{L_0} - \underbrace{\mathrm{KL}\big(q(x_T|x_0)\,\|\,q(x_T)\big)}_{L_T} - \sum_{t=2}^{T} \underbrace{\mathrm{KL}\big(q(x_{t-1}|x_t, x_0)\,\|\,p_\theta(x_{t-1}|x_t)\big)}_{L_t}\Big] \quad (1)$$

Gaussian diffusion models operate in continuous spaces ($x_t \in \mathbb{R}^n$), where the forward and reverse processes are characterized by Gaussian distributions:

$$q(x_t|x_{t-1}) := \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big) \qquad q(x_T) := \mathcal{N}(x_T;\, 0,\, I) \qquad p_\theta(x_{t-1}|x_t) := \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big)$$

Ho et al. (2020) suggest using a diagonal $\Sigma_\theta(x_t, t)$ with a constant $\sigma_t$ and computing $\mu_\theta(x_t, t)$ as a function of $x_t$ and $\epsilon_\theta(x_t, t)$:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\Big), \qquad \alpha_t := 1 - \beta_t, \quad \bar\alpha_t := \prod_{i \le t} \alpha_i$$

where $\epsilon_\theta(x_t, t)$ predicts the "ground-truth" noise component $\epsilon$ for the noisy data sample $x_t$. In practice, the objective (1) can be simplified to the sum of mean-squared errors between $\epsilon_\theta(x_t, t)$ and $\epsilon$ over all timesteps $t$:

$$L^{simple}_t = \mathbb{E}_{x_0, \epsilon, t}\,\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|_2^2 \quad (2)$$

Multinomial diffusion models (Hoogeboom et al., 2021) are designed to generate categorical data, where $x_t \in \{0,1\}^K$ is a one-hot encoded categorical variable with $K$ values.
The multinomial forward diffusion process defines $q(x_t|x_{t-1})$ as a categorical distribution that corrupts the data by uniform noise over the $K$ classes:

$$q(x_t|x_{t-1}) := \mathrm{Cat}\big(x_t;\, (1-\beta_t)\, x_{t-1} + \beta_t / K\big) \qquad q(x_T) := \mathrm{Cat}(x_T;\, 1/K) \qquad q(x_t|x_0) = \mathrm{Cat}\big(x_t;\, \bar\alpha_t x_0 + (1-\bar\alpha_t)/K\big)$$

From the equations above, the posterior $q(x_{t-1}|x_t, x_0)$ can be derived:

$$q(x_{t-1}|x_t, x_0) = \mathrm{Cat}\Big(x_{t-1};\, \pi \Big/ \textstyle\sum_{k=1}^{K} \pi_k\Big), \qquad \pi = \big[\alpha_t x_t + (1-\alpha_t)/K\big] \odot \big[\bar\alpha_{t-1} x_0 + (1-\bar\alpha_{t-1})/K\big]$$

The reverse distribution $p_\theta(x_{t-1}|x_t)$ is parameterized as $q(x_{t-1}|x_t, \hat{x}_0(x_t, t))$, where $\hat{x}_0$ is predicted by a neural network. The model is then trained to maximize the variational lower bound (1).
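Both forward processes above admit closed forms, which makes them cheap to simulate. The toy numpy sketch below (our own illustration under an assumed linear β-schedule, not the paper's implementation) draws a Gaussian $x_t$ directly from $x_0$ and evaluates the multinomial posterior probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)        # \bar{alpha}_t = prod_{i<=t} alpha_i

# Gaussian diffusion: x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form.
def q_sample_gauss(x0, t, eps):
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
xt = q_sample_gauss(x0, 500, eps)

# Multinomial diffusion over K categories: probabilities of q(x_t | x_0)
# and the normalized posterior q(x_{t-1} | x_t, x_0).
K = 5
def onehot(k):
    v = np.zeros(K); v[k] = 1.0; return v

def q_probs_cat(x0_oh, t):
    return alpha_bar[t] * x0_oh + (1.0 - alpha_bar[t]) / K

def posterior_cat(xt_oh, x0_oh, t):
    pi = (alphas[t] * xt_oh + (1 - alphas[t]) / K) \
         * (alpha_bar[t - 1] * x0_oh + (1 - alpha_bar[t - 1]) / K)
    return pi / pi.sum()

probs = q_probs_cat(onehot(2), 500)
post = posterior_cat(onehot(3), onehot(2), 500)
```

Note that the uniform-noise term keeps every category's probability strictly positive, so the posterior is always well-defined.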

4. TABDDPM

In this section, we describe the design of TabDDPM as well as its main hyperparameters, which affect the model's effectiveness. TabDDPM uses multinomial diffusion to model the categorical and binary features, and Gaussian diffusion to model the numerical ones. In more detail, for a tabular data sample $x = [x_{num}, x_{cat_1}, \ldots, x_{cat_C}]$ that consists of $N_{num}$ numerical features $x_{num} \in \mathbb{R}^{N_{num}}$ and $C$ categorical features $x_{cat_i}$ with $K_i$ categories each, our model takes one-hot encoded versions of the categorical features as input (i.e., $x^{ohe}_{cat_i} \in \{0,1\}^{K_i}$) together with normalized numerical features. Therefore, the input $x_0$ has a dimensionality of $N_{num} + \sum_i K_i$. For preprocessing, we use the Gaussian quantile transformation from the scikit-learn library (Pedregosa et al., 2011). Each categorical feature is handled by a separate forward diffusion process, i.e., the noise components for all features are sampled independently. The reverse diffusion step in TabDDPM is modelled by a multi-layer neural network with an output of the same dimensionality as $x_0$, where the first $N_{num}$ coordinates are the predictions of $\epsilon$ for the Gaussian diffusion and the rest are the predictions of $x^{ohe}_{cat_i}$ for the multinomial diffusions.
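A plausible preprocessing pipeline matching this description can be written with scikit-learn directly; the specific settings below (number of quantiles, toy data) are our assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer

rng = np.random.default_rng(0)
n = 200
X_num = rng.exponential(size=(n, 3))                       # N_num = 3 numerical features
X_cat = rng.integers(0, [4, 2], size=(n, 2)).astype(str)   # K_1 = 4, K_2 = 2 categories

# Gaussian quantile transformation for the numerical features.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=100,
                         random_state=0)
x_num = qt.fit_transform(X_num)

# One-hot encoding for the categorical features.
x_cat = OneHotEncoder().fit_transform(X_cat).toarray()

# Model input x_0 of dimensionality N_num + sum_i K_i = 3 + 4 + 2 = 9.
x0 = np.hstack([x_num, x_cat])
```

The quantile transform maps each skewed numerical column to an approximately standard normal marginal, which matches the Gaussian diffusion's assumption about the data scale.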

TabDDPM is trained by minimizing the mean-squared error $L^{simple}_t$ (Equation (2)) for the Gaussian diffusion term and the KL divergences $L^i_t$ for each multinomial diffusion term (Equation (1)). The total loss of the multinomial diffusions is additionally divided by the number of categorical features:

$$L^{TabDDPM}_t = L^{simple}_t + \frac{\sum_{i \le C} L^i_t}{C}$$

For classification datasets, we use a class-conditional model, i.e., $p_\theta(x_{t-1}|x_t, y)$ is learned. For regression datasets, we consider the target value as an additional numerical feature, and the joint distribution is learned. To model the reverse process, we use a simple MLP architecture adapted from (Gorishniy et al., 2021):

$$\mathrm{MLP}(x) = \mathrm{Linear}(\mathrm{MLPBlock}(\ldots(\mathrm{MLPBlock}(x))))$$
$$\mathrm{MLPBlock}(x) = \mathrm{Dropout}(\mathrm{ReLU}(\mathrm{Linear}(x)))$$

As in (Nichol, 2021; Dhariwal & Nichol, 2021), a tabular input $x_{in}$, a timestep $t$, and a class label $y$ are processed as follows:

$$t_{emb} = \mathrm{Linear}(\mathrm{SiLU}(\mathrm{Linear}(\mathrm{SinTimeEmb}(t))))$$
$$y_{emb} = \mathrm{Embedding}(y)$$
$$x = \mathrm{Linear}(x_{in}) + t_{emb} + y_{emb}$$

where SinTimeEmb refers to a sinusoidal time embedding as in (Nichol, 2021; Dhariwal & Nichol, 2021).

Hyperparameters in TabDDPM are essential: in our experiments, we observed them to have a strong influence on the model's effectiveness. Table 1 lists the main hyperparameters as well as the search spaces we recommend for each of them. The process of tuning is described in detail in the experimental section.

Table 2: List of datasets used for the evaluation and their descriptions.
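To make the conditioning path concrete, here is a small numpy sketch of the input embedding with random weights (the hidden width, initialization scale, and helper names are our assumptions; the actual model is a trained PyTorch network):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                                   # hidden width (hypothetical)

def sin_time_emb(t, dim=d):
    """Sinusoidal timestep embedding, as in Nichol (2021)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def linear(x, W, b):
    return x @ W + b

def silu(x):
    return x / (1.0 + np.exp(-x))

# Random weights standing in for trained parameters.
d_in = 9                                  # input dim N_num + sum_i K_i (illustrative)
W_in, b_in = 0.02 * rng.standard_normal((d_in, d)), np.zeros(d)
W_t1, b_t1 = 0.02 * rng.standard_normal((d, d)), np.zeros(d)
W_t2, b_t2 = 0.02 * rng.standard_normal((d, d)), np.zeros(d)
y_table = 0.02 * rng.standard_normal((2, d))   # class-label embedding table

def embed(x_in, t, y):
    """x = Linear(x_in) + t_emb + y_emb, as in the equations above."""
    t_emb = linear(silu(linear(sin_time_emb(t), W_t1, b_t1)), W_t2, b_t2)
    y_emb = y_table[y]
    return linear(x_in, W_in, b_in) + t_emb + y_emb

h = embed(rng.standard_normal(d_in), t=10, y=1)
```

The summed embedding is then fed through the stack of MLPBlocks, whose output head predicts $\epsilon$ for the numerical coordinates and logits for each one-hot block.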

5. EXPERIMENTS

In this section, we extensively evaluate TabDDPM against existing alternatives.

Datasets. For a systematic investigation of the performance of tabular generative models, we consider a diverse set of 15 real-world public datasets. These datasets vary in size, nature, number of features, and feature distributions. Most of the datasets were previously used for tabular model evaluation in (Zhao et al., 2021; Gorishniy et al., 2021). The full list of datasets and their properties is presented in Table 2.

Baselines. Since the number of generative models proposed for tabular data is enormous, we evaluate TabDDPM only against the leading approaches from each paradigm of generative modelling. Also, we consider only the baselines with published source code.

• TVAE (Xu et al., 2019) - the state-of-the-art variational auto-encoder for tabular data generation. To the best of our knowledge, there are no alternative VAE-like models that outperform TVAE and have public source code.
• CTABGAN (Zhao et al., 2021) - a recent GAN-based model that is shown to outperform the existing tabular GANs on a diverse set of benchmarks. This approach cannot handle regression tasks.
• CTABGAN+ (Zhao et al., 2022) - an extension of the CTABGAN model, published in a very recent preprint. We are not aware of any GAN-based model for tabular data that was proposed after CTABGAN+ and has public source code.
• SMOTE (Chawla et al., 2002) - a "shallow" interpolation-based method that "generates" a synthetic point as a convex combination of a real data point and its k-th nearest neighbour from the dataset. The method was originally proposed for minor-class oversampling; here, we generalize it and apply it to synthetic data generation as a simple sanity check.

Evaluation measure. Our primary evaluation measure is machine learning (ML) efficiency (or utility) (Xu et al., 2019).
In more detail, ML efficiency quantifies the performance of classification or regression models that are trained on synthetic data and evaluated on the real test set. Intuitively, models trained on high-quality synthetics should be competitive with (or even superior to) models trained on real data. In our experiments, we use two evaluation protocols to compute ML efficiency. In the first protocol, which is more common in the literature (Xu et al., 2019; Zhao et al., 2021; Kim et al., 2022), we compute the average efficiency with respect to a set of diverse ML models (logistic regression, decision tree, and others). In the second protocol, we evaluate ML efficiency only with respect to the CatBoost model (Prokhorenkova et al., 2018), which is arguably the leading GBDT implementation, providing state-of-the-art performance on tabular tasks (Gorishniy et al., 2021). In our experiments in subsection 5.2, we show that it is crucial to use the second protocol, while the first one can often be misleading.

Tuning process. To tune the hyperparameters of TabDDPM and the baselines, we use the Optuna library (Akiba et al., 2019). The tuning process is guided by the ML efficiency (with respect to CatBoost) of the generated synthetic data on a hold-out validation dataset (the score is averaged over five different sampling seeds). The search spaces for all hyperparameters of TabDDPM are reported in Table 1 (for the baselines, in Appendix C). Additionally, we demonstrate that tuning the hyperparameters with the CatBoost guidance does not introduce any "CatBoost-biasedness": the CatBoost-tuned TabDDPM produces synthetics that are also superior for other models, like MLP. These results are reported in Appendix A.
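The train-on-synthetic, test-on-real protocol is straightforward to implement. Below is a self-contained sketch in which a noisy copy of the train set stands in for a generative model's output; the dataset, classifier, and noise level are our illustrative choices, not the paper's experimental setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# A toy "real" dataset split into train and test.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Stand-in "synthetic" data: a perturbed copy of the real train set
# (samples from a generative model would be used here instead).
rng = np.random.default_rng(0)
X_syn = X_tr + 0.05 * rng.standard_normal(X_tr.shape)
y_syn = y_tr

def ml_efficiency(X_train, y_train):
    """F1 on the REAL test set of a model trained on the given data."""
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    return f1_score(y_te, clf.predict(X_te))

real_score = ml_efficiency(X_tr, y_tr)   # baseline: train on real data
syn_score = ml_efficiency(X_syn, y_syn)  # efficiency of the "synthetic" data
```

Comparing `syn_score` against `real_score` is exactly the efficiency gap reported in the tables; in the paper's second protocol, the evaluator would be a tuned CatBoost model rather than a default random forest.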

5.1. QUALITATIVE COMPARISON

Here, we qualitatively investigate the ability of TabDDPM to model the individual and joint feature distributions compared to the TVAE and CTABGAN+ baselines. In particular, for each dataset, we sample from TabDDPM, TVAE, and CTABGAN+ a synthetic dataset of the same size as the real train set. For classification datasets, each class is sampled according to its proportion in the real dataset. Then, we visualize typical individual feature distributions for the real and synthetic data in Figure 2. For completeness, features of different types and distributions are presented. In most cases, TabDDPM produces more realistic feature distributions than TVAE and CTABGAN+. The advantage is more pronounced (1) for numerical features that are uniformly distributed, (2) for categorical features with high cardinality, and (3) for mixed-type features that combine continuous and discrete distributions. We also visualize the differences between the correlation matrices computed on real and synthetic data for different datasets, see Figure 3. To compute the correlation matrices, we use the Pearson correlation coefficient for numerical-numerical pairs, the correlation ratio for categorical-numerical pairs, and Theil's U statistic between categorical features. Compared to CTABGAN+ and TVAE, TabDDPM generates synthetic datasets with more realistic pairwise correlations. These illustrations indicate that our TabDDPM model is more flexible than the alternatives and produces superior synthetic data.
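The two non-Pearson statistics are less standard, so we sketch minimal numpy implementations of them here (our own sketch; library implementations of both statistics exist and may differ in edge-case handling):

```python
import numpy as np

def correlation_ratio(cats, values):
    """Correlation ratio eta between a categorical and a numerical feature:
    sqrt of (between-group variance / total variance)."""
    cats = np.asarray(cats)
    values = np.asarray(values, dtype=float)
    total = ((values - values.mean()) ** 2).sum()
    between = sum(
        len(grp) * (grp.mean() - values.mean()) ** 2
        for c in np.unique(cats)
        for grp in [values[cats == c]]
    )
    return np.sqrt(between / total)

def theils_u(x, y):
    """Theil's U (uncertainty coefficient) U(x|y): the fraction of the
    entropy of x explained by knowing y."""
    x, y = np.asarray(x), np.asarray(y)

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    hx = entropy(np.unique(x, return_counts=True)[1] / len(x))
    hxy = 0.0
    for cy, cnt in zip(*np.unique(y, return_counts=True)):
        sub = x[y == cy]
        hxy += (cnt / len(y)) * entropy(np.unique(sub, return_counts=True)[1] / len(sub))
    return (hx - hxy) / hx if hx > 0 else 1.0
```

Both statistics lie in [0, 1], so the absolute differences between real and synthetic matrices (as in Figure 3) are directly comparable across feature-type pairs.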

5.2. MACHINE LEARNING EFFICIENCY

In this section, we compare TabDDPM to alternative generative models in terms of machine learning efficiency. From each generative model, we sample a synthetic dataset of the same size as the real train set. This synthetic data is then used to train a classification/regression model, which is then evaluated on the real test set. In our experiments, classification performance is evaluated by the F1 score and regression performance by the R2 score. We use two protocols:

1. First, we compute the average ML efficiency for a diverse set of ML models, as performed in previous works (Xu et al., 2019; Zhao et al., 2021; Kim et al., 2022). This set includes the Decision Tree, Random Forest, Logistic Regression (or Ridge Regression), and MLP models from the scikit-learn library (Pedregosa et al., 2011) with default hyperparameters, except that "max depth" is set to 28 for Decision Tree and Random Forest, "maximum iterations" to 500 for Logistic and Ridge regressions, and "maximum iterations" to 100 for MLPs.

2. Second, we compute ML efficiency with respect to the current state-of-the-art models for tabular data. Specifically, we consider CatBoost (Prokhorenkova et al., 2018) and the MLP architecture from (Gorishniy et al., 2021) for evaluation. The CatBoost and MLP hyperparameters are thoroughly tuned on each dataset using the search spaces from (Gorishniy et al., 2021). We argue that this evaluation protocol demonstrates the practical value of synthetic data more reliably, since in most real scenarios practitioners are not interested in using weak and suboptimal classifiers/regressors.

Main results. The ML efficiency values computed by both protocols are presented in Table 3 and Table 4; for each generated dataset, we average over ten random seeds for training classifiers/regressors.

Table 4: The values of machine learning efficiency computed with respect to the state-of-the-art tuned CatBoost model.
The key observations are described below:

• In both evaluation protocols, TabDDPM significantly outperforms TVAE and CTABGAN+ on most datasets, which highlights the advantage of diffusion models for tabular data, consistent with what has been demonstrated for other domains in prior works.
• The interpolation-based SMOTE method is competitive with TabDDPM and often significantly outperforms the GAN/VAE approaches. Interestingly, most prior works on generative models for tabular data do not compare against SMOTE, even though it is a simple baseline that is challenging to beat.
• While many prior works use the first evaluation protocol to compute ML efficiency, we argue that the second one (which uses a state-of-the-art model, like CatBoost) is more appropriate. Table 3 and Table 4 show that the absolute values of classification/regression performance are much lower for the first protocol, i.e., weak classifiers/regressors are substantially inferior to CatBoost on the considered benchmarks. Therefore, one can hardly justify using these suboptimal models instead of CatBoost, and their performance values are uninformative for practitioners. Moreover, in the first protocol, training on synthetic data often appears advantageous compared to training on real data. This creates the impression that the data produced by generative models is more valuable than the real data. However, this is not the case when one uses a tuned ML model, as in most practical scenarios. Appendix A confirms this observation for a properly tuned MLP model.

Overall, TabDDPM provides state-of-the-art generative performance and can serve as a source of high-quality synthetic data. Interestingly, in terms of ML efficiency, the simple "shallow" SMOTE method is competitive with TabDDPM, which raises the question of whether sophisticated deep generative models are needed at all. In the section below, we provide an affirmative answer to this question.

5.3. PRIVACY

Here, we demonstrate that TabDDPM is preferable to SMOTE in setups with privacy concerns, e.g., sharing the data without disclosure of personal or sensitive information. In these setups, one is interested in high-quality synthetics that do not reveal the datapoints from the original real dataset. To quantify the privacy of synthetic data, we use the median Distance to Closest Record (DCR) (Zhao et al., 2021) between synthetic and real datapoints. Specifically, for each synthetic sample, we find the minimum distance to the real datapoints and take the median of these distances. Low DCR values indicate that the synthetic samples are essentially copies of some real datapoints, which violates the privacy requirements. In contrast, larger DCR values indicate that the generative model produces something "new" rather than just copies of the real data.
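A minimal sketch of the metric (our own implementation; the Euclidean distance and the toy data are illustrative choices):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def median_dcr(real, synth):
    """Median Distance to Closest Record: for every synthetic row, the
    distance to its nearest real row; the median summarizes the set."""
    nn = NearestNeighbors(n_neighbors=1).fit(real)
    dists, _ = nn.kneighbors(synth)
    return float(np.median(dists))

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 5))
copies = real[:50]                    # "synthetic" data that memorizes real rows
fresh = rng.standard_normal((50, 5))  # independently drawn samples

low = median_dcr(real, copies)   # near 0: synthetic data copies real records
high = median_dcr(real, fresh)   # larger: the samples are genuinely "new"
```

In practice, the features would first be brought to a common scale (as in the preprocessing above), so that no single feature dominates the distance.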

6. CONCLUSION

In this paper, we have investigated the prospects of the diffusion modelling framework in the field of tabular data. In particular, we have described a design of DDPM that can handle mixed data consisting of numerical, ordinal, and categorical features. We have also demonstrated the importance of the model's hyperparameters and explained the protocol for tuning them. On most of the considered benchmarks, the synthetics produced by our model have consistently higher quality than those produced by the GAN/VAE-based rivals and interpolation techniques, especially in setups where the privacy of the data must be ensured.

APPENDIX A MLP EVALUATION AND TUNING

Here, we show that tuning the hyperparameters using the CatBoost guidance results in TabDDPM models that produce synthetics that are also optimal for other classifiers/regressors. The results for a subset of datasets are presented in Table 6. The "-CB" and "-MLP" suffixes denote CatBoost-guided tuning with CatBoost and MLP evaluation, respectively. The "-MLP-tune" suffix stands for MLP-guided tuning with MLP evaluation.

B ADDITIONAL RESULTS

Here, we provide results for the CTGAN (Xu et al., 2019) model (Table 7). We also follow Zhao et al. (2021) and provide an additional quantitative comparison that shows how well the individual feature distributions are modelled (Table 9, Table 10).



https://github.com/Team-TUD/CTAB-GAN-Plus
https://github.com/sdv-dev/CTGAN



Figure 1: TabDDPM scheme for classification problems; t, y and ℓ denote a diffusion timestep, a class label, and logits, respectively.

Figure 2: The individual feature distributions for the real data and the data generated by TabDDPM, CTABGAN+, and TVAE. TabDDPM produces more realistic feature distributions than alternatives in most cases.

Figure 3: The absolute difference between correlation matrices computed on real and synthetic datasets. A more intense red colour indicates a higher difference between the real and synthetic correlation values.

Figure 4: Histograms of minimal synthetic-to-real distances for TabDDPM and SMOTE.

Table 1: The main hyperparameters of TabDDPM.

The ML efficiency for the tuned MLP is reported in Appendix A. To compute each value, we average the results over five random seeds for synthetics generation.

Table 3: The values of machine learning efficiency computed with respect to five weak classification/regression models. Negative scores denote a negative R2, which means that the performance is worse than an optimal constant prediction.

Table 5 compares the DCR values for SMOTE and TabDDPM and demonstrates the advantage of TabDDPM consistently across all datasets. We also visualize the histograms of the minimal synthetic-to-real distances in Figure 4. For SMOTE, most distance values are concentrated around zero, while TabDDPM samples are better separated from the real datapoints. This experiment confirms that TabDDPM synthetics, while providing high ML efficiency, are also more appropriate for privacy-concerned scenarios.

Table 5: ML efficiency (CatBoost) scores and privacy scores for the SMOTE and TabDDPM models.

Table 6: ML utility score with MLP evaluation and MLP tuning, compared with CatBoost evaluation and CatBoost tuning.

Finally, we include the density and coverage metrics from Naeem et al. (2020), which are improved alternatives to precision and recall, respectively (Table 11, Table 12).

ML utility score with CatBoost evaluation.

Wasserstein distance between numerical features.

Jensen-Shannon divergence between categorical features.

L2 distance between correlation matrices.

Density of synthetic data.

D DATASETS

We used the following datasets:
• Abalone (OpenML)
• Adult (income estimation, Kohavi (1996))
• House 16H (OpenML)
• Insurance (Kaggle)
• King (Kaggle)
• MiniBooNE (OpenML)
• Wilt (OpenML)

E ENVIRONMENT AND RUNTIME

Experiments were conducted under Ubuntu 20.04 on a machine equipped with a GeForce RTX 2080 Ti GPU and an Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz. We used PyTorch 1.10, CUDA 11.3, scikit-learn 1.1.2, and imbalanced-learn 0.9.1 (for SMOTE). The runtime of the proposed method depends on the dataset and the hyperparameters.

