DENSITY DISTRIBUTION OF DATA CONSIDERING OBJECT RELATIONSHIPS
Abstract
This paper considers how the distribution density of features influences the structure of relationships between objects. Analyzing this structure is necessary for finding ways to improve the generalization ability of recognition algorithms. The structure is assessed by the values of a compactness measure computed for the objects of each class under a given metric. Many existing methods assume that the data follow a normal distribution. A technique is proposed for calculating the parameters of the real density, whose analytical form is not known in advance. The advantage of using the real density over the normal density is substantiated with ordered sequences of between-class difference values computed over pairs of nominal features. Data analysis based on the proposed compactness measure is one way to address the curse of dimensionality in Big Data.
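To make the idea of class compactness under a given metric concrete, the following minimal sketch computes a simple stand-in score: the share of objects whose nearest other object belongs to the same class. This nearest-neighbour proxy, the function name `nn_compactness`, and the toy data are assumptions made for illustration only; the compactness measure proposed in the paper is not specified in the abstract and may differ.

```python
import math


def nn_compactness(objects, labels, metric=math.dist):
    """Share of objects whose nearest other object (under `metric`)
    carries the same class label. Illustrative proxy only."""
    n = len(objects)
    if n < 2:
        raise ValueError("need at least two objects")
    same = 0
    for i in range(n):
        # Index of the nearest neighbour of object i among all other objects.
        j = min((k for k in range(n) if k != i),
                key=lambda k: metric(objects[i], objects[k]))
        same += labels[j] == labels[i]
    return same / n


# Hypothetical toy sample: two tight, well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.3), (0.4, 0.1),
          (5.0, 5.0), (5.1, 5.3), (5.4, 5.1)]
grouped_labels = [0, 0, 0, 1, 1, 1]   # classes coincide with the groups
mixed_labels = [0, 1, 0, 1, 0, 1]     # classes interleaved inside each group

score_grouped = nn_compactness(points, grouped_labels)
score_mixed = nn_compactness(points, mixed_labels)
```

The `metric` argument accepts any function of two objects, so the same score can be compared across metrics or across feature subsets. Under this proxy, a higher value indicates a more compact class structure.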