The epistemological problem of Big Data in the production of scientific knowledge
Date
2023-06-07
Type
Master's thesis
Authors
Martén Saborío, Sergio
Abstract
Data science occupies an important place among the sciences today, owing to advances in Machine Learning and in the collection and storage of ever larger amounts of data (Big Data). These developments have brought a higher level of precision to many disciplines and have made knowable certain objects that were not knowable at first glance. However, the reason why weights and biases are distributed as they are in some complex Machine Learning models remains unknowable precisely because of those models' complexity, which, in the context of the production of scientific knowledge, could imply omitting the explanatory content of theories. Here, we lay out the epistemological reasons that ground the problem of epistemic opacity. We then analyze the emerging field of explainable AI (XAI), an attempt to circumvent epistemic opacity through local explanations of opaque models, and make clear its scope and limitations. Finally, we argue that explanation is not accidental to science but, epistemologically, should be an essential part of it. While the use of complex Machine Learning models for scientific purposes raises sui generis problems, these problems do not directly undermine current scientific methods or practices. An explanation of how such complex models work, although it does not afford an absolute comprehension of the reason for every parameter, may cover enough to allow the legitimate production of scientific knowledge, at least as long as the choice of model is reasonable, the explanation model applied is adequate, and its limitations are understood.
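The abstract refers to XAI's local explanations of opaque models. As a rough illustration of what such a local explanation looks like in practice (not a method taken from the thesis itself), the sketch below fits a LIME-style weighted linear surrogate around a single instance of a stand-in opaque model. The random-forest classifier, the synthetic data, and all parameter values are illustrative assumptions; only standard NumPy and scikit-learn calls are used.

```python
# Minimal sketch of a LIME-style local explanation. The "opaque model" (a random
# forest), the synthetic data, and the perturbation scale are illustrative
# assumptions, not the thesis's own experiments.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data standing in for the training set of an epistemically opaque model.
X = rng.normal(size=(500, 4))
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)

opaque_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def local_explanation(model, x, n_samples=1000, scale=0.5):
    """Fit a weighted linear surrogate around the instance x.

    The surrogate's coefficients approximate how each feature moves the
    opaque model's predicted probability near x: a local, not global,
    explanation.
    """
    # Perturb the instance and query the opaque model on the perturbations.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    p = model.predict_proba(Z)[:, 1]

    # Weight perturbations by proximity to x (exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (2 * scale ** 2))

    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    return surrogate.coef_

x0 = X[0]
print("Local feature attributions around x0:", local_explanation(opaque_model, x0))
```

The surrogate's coefficients answer only the local question of how small changes around x0 move the prediction; as the abstract stresses, this is narrower than a global account of why the model's weights and biases are distributed as they are.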
Keywords
Big Data, Artificial Intelligence, Machine Learning, Interpretable Machine Learning, Explainable Machine Learning, Epistemology, Knowledge