Algorithms for cleaning statistical samples of anomalies for data science tasks


Bibliographic details
Publisher: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
Date: 2023
Authors: Pysarchuk, Oleksii; Baran, Danylo; Mironov, Yurii; Pysarchuk, Illya
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2023
Online access: http://journal.iasa.kpi.ua/article/view/260175

Organization

System research and information technologies
Description
Summary: The paper considers the nature of the input data used by Data Science algorithms in modern application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as part of the Data Science pipeline. The main advantages of the proposed algorithms are their relative simplicity and small number of configurable parameters. The parameters are determined by machine learning with respect to the properties of the input data. The algorithms are flexible and have no strict dependency on the nature and origin of the data. The efficiency of the proposed approaches is verified with a modeling experiment conducted using algorithms implemented in Python. The results are illustrated with plots built from raw and processed datasets. The application of the algorithms is analyzed, and the results are compared.