



1. The paper defines "node stability" as "the number of epochs wherein nodes held their rank in terms of weight value compared to their rank on the last epoch". By this definition, stability will always reach 100% in the last epoch, so little can be concluded from the "exponential-looking" stability curve of DNNs: the sharp increase at the end is, by definition, inevitable, and does not really indicate that the network's weights are "more stable". The underlying idea of measuring NN stability is nevertheless interesting. It might be better measured as the stability of ranks between each pair of consecutive epochs, rather than using the last epoch as an anchor.

Response: Yes, by our definition stability must always reach 100 percent. That, however, is not the critical point; what matters is when each node stabilizes. That process, over time, forms a tree structure. The importance of the definition lies in establishing decision-tree behavior and its implications for model simplification.

2. The equivalence between a node's significance and its weight value is unfounded (although proving it would be interesting). Throughout the paper, the absolute weight value of a node is used interchangeably with its "significance" or "influence on performance". This is not backed up by theory or by empirical results. A node with a high weight value may consistently receive inputs with very low absolute value (depending on the dataset), and a node with a low weight value may receive inputs with high absolute value. The relationship between absolute weight value and a node's influence on model performance is never tested in the paper. It would be interesting to run an experiment where each node (or group of nodes) is pruned in turn and the performance of the pruned model is assessed, so that empirical conclusions could be drawn on whether intermediate nodes are indeed less valuable than higher-ranked nodes.

Response: That is intrinsic in the definition of node rank. If the rank is maintained, the weights must have high absolute value. If a node's absolute weight value is high, that node must have been fed high-valued inputs, or it would have lost its rank during training. And a high weight value must be more influential in model outcomes.

3. The use of the term "decision tree", both in the title and throughout the paper, is somewhat misleading. No decision trees are actually used anywhere in the paper; the term serves only as a (somewhat arbitrary) analogy. Nodes with high absolute weight value do not behave as root nodes do in decision trees (or at least this is not proven in the paper).
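The consecutive-epoch stability measure suggested in point 1 could be sketched as follows. This is a minimal illustration, not the paper's metric: the `weight_history` array (one scalar weight per node per epoch) is a hypothetical simplification of however node weights are summarized in the paper.

```python
import numpy as np

def rank_stability(weight_history):
    """Fraction of nodes whose rank (by absolute weight) is unchanged
    between each pair of consecutive epochs.

    weight_history: array of shape (epochs, nodes), one hypothetical
    scalar weight per node per epoch.
    Returns one stability score per epoch-to-epoch transition.
    """
    w = np.abs(np.asarray(weight_history, dtype=float))
    # argsort applied twice yields each node's rank within an epoch
    ranks = np.argsort(np.argsort(w, axis=1), axis=1)
    # compare each epoch's ranks against the previous epoch's ranks
    return (ranks[1:] == ranks[:-1]).mean(axis=1)
```

Unlike the last-epoch-anchored definition, this score is not forced to 100% at the end of training, so a late rise would be informative rather than inevitable.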
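The pruning experiment suggested in point 2 could be sketched as below. This is a toy illustration under assumed names: a hypothetical two-layer network `relu(X @ W1) @ W2`, where "pruning node j" means zeroing that hidden node's outgoing weights and "importance" is the resulting increase in mean-squared error.

```python
import numpy as np

def ablation_importance(W1, W2, X, y):
    """Importance of each hidden node: the increase in MSE when the
    node's outgoing weights are zeroed, for the hypothetical model
    y_hat = relu(X @ W1) @ W2."""
    def mse(W2_masked):
        h = np.maximum(X @ W1, 0.0)         # hidden activations (ReLU)
        return np.mean((h @ W2_masked - y) ** 2)

    base = mse(W2)
    scores = np.empty(W1.shape[1])
    for j in range(W1.shape[1]):
        W2_pruned = W2.copy()
        W2_pruned[j] = 0.0                  # prune hidden node j
        scores[j] = mse(W2_pruned) - base   # performance drop
    return scores
```

Correlating `scores` with the nodes' absolute weight values would directly test whether high-weight nodes are in fact the influential ones, which is the empirical check the review asks for.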

