Some Extensions of Mining of Linguistic Associations

Summary


This paper is a contribution to the theoretical foundations of data mining. More precisely, we contribute to the part of data mining that searches for associations among attributes that can be expressed as natural language sentences. The theoretical background and a method for mining such associations were published recently in [V. Novák et al., Mining pure linguistic associations from numerical data, Int. Journal of Approximate Reasoning 48 (2008), 4-22]. We elaborate other mathematical representations of the model presented in that paper in order to extend its applicability.


Extract



(ProQuest: ... denotes formulae omitted.)

1. Introduction

Data mining is regarded as a non-trivial process of identifying valid, novel, potentially useful and ultimately understandable knowledge in large-scale data sets ([5]). The process of data mining has attracted a lot of research interest in the last two decades. It should be mentioned, however, that the first data mining method, the GUHA method presented in [7], appeared even earlier. Probably because of different terminology (the author of [7] did not use the term "data mining"), the GUHA method is not well known, and some of its results were forgotten and then rediscovered in the nineties (e.g. [11], [20]). For more precise information on the GUHA method we refer to [8] and references therein.

This paper is a contribution to the theoretical foundations of data mining and partially extends the use of the GUHA method. We follow a direction that was recently developed by V. Novák in several papers (cf., e.g. [15] and [12]). Within Novák's novel approach, a method for searching for so-called linguistic associations was elaborated ([16]). This method is based on the GUHA method and the results of formal fuzzy logic ([14]), and it allows us to mine linguistic associations of the form: IF the area of the base of a cylinder is big AND the height of this cylinder is also big, THEN the volume of this cylinder is big.
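To make the shape of such an association concrete, the following is a minimal sketch of how the cylinder rule above could be scored against numerical data. It is not the model of [16]: the ramp-shaped membership function `big`, the use of the minimum as conjunction, the fuzzy support and confidence measures, and the sample data are all assumptions chosen purely for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, float, float]  # (base area, height, volume)


def big(x: float, lo: float, hi: float) -> float:
    """Hypothetical membership degree of 'big' on the range [lo, hi].

    Assumption: 0 up to the midpoint of the range, then rising
    linearly to 1 at the upper end of the range.
    """
    mid = (lo + hi) / 2.0
    if x <= mid:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - mid) / (hi - mid)


def score_association(rows: List[Row]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule
    IF area is big AND height is big THEN volume is big.

    Assumption: conjunction is modelled by the minimum, and the
    ranges of the attributes are taken from the data itself.
    """
    areas = [r[0] for r in rows]
    heights = [r[1] for r in rows]
    volumes = [r[2] for r in rows]

    antecedent_sum = 0.0
    both_sum = 0.0
    for area, height, volume in rows:
        antecedent = min(
            big(area, min(areas), max(areas)),
            big(height, min(heights), max(heights)),
        )
        consequent = big(volume, min(volumes), max(volumes))
        antecedent_sum += antecedent
        both_sum += min(antecedent, consequent)

    support = both_sum / len(rows)
    confidence = both_sum / antecedent_sum if antecedent_sum > 0 else 0.0
    return support, confidence


# Illustrative data only: volume = base area * height.
cylinders: List[Row] = [
    (1.0, 1.0, 1.0),
    (2.0, 5.0, 10.0),
    (9.0, 9.0, 81.0),
    (10.0, 10.0, 100.0),
    (8.0, 2.0, 16.0),
]

support, confidence = score_association(cylinders)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

A high confidence here means that, in the sample, cylinders whose base area and height are both big to some degree also tend to have a big volume to a comparable degree, which is the kind of statement a linguistic association expresses.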

The main advantage of this approach is the high understandability of the found associations, since they are presented in natural language. Additionally, it should also be mentioned that the found linguistic associations can be interpreted as standard fuzzy IF-THEN rules (see [2] and references therein). Further, any data mining procedure working with categorical or logical data can be applied to Novák's mathematical model of linguistic expressions and predications. However, this mathematical model has some disadvantag...
