Further notes on weeks 7 and 8

Back to definitions

Anahit had found a nice definition for expert systems (here):

"The definition of an expert or knowledge-based system has evolved over the years. Rauch [1984] defines knowledge-based systems as:

A class of computer programs intended to serve as consultants for decision making. These programs use a collection of facts, rules of thumb, and other knowledge about a limited field to help make inferences in the field. They differ substantially from conventional computer programs in that their goals may have no algorithmic solution, and they must make inferences based on incomplete or uncertain information. They are called expert systems because they address problems normally thought to require human specialists for solution, and knowledge-based because researchers have found that amassing a large amount of knowledge, rather than sophisticated reasoning techniques, is responsible for the success of the approach.

Expert systems are thus software systems that mimic the deductive or inductive reasoning of a human expert. This does not mean that the computer processes information like the human expert, nor that the silicon brain is physically organized like the expert's. Expert knowledge in a particular domain not only includes published information and theories, but more importantly includes private knowledge that is not readily available in the literature. This private knowledge consists largely of rules of thumb that have come to be called heuristics. Representing and encoding such knowledge is the central task in knowledge engineering."

On the same site, they also discuss artificial intelligence, knowledge engineering, and other disciplines related to expert systems.
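The quoted description, a collection of facts plus rules of thumb that drive inferences, can be sketched as a tiny forward-chaining loop. The facts and rules below are made-up toy examples, not taken from any real expert system:

```python
# Known facts (the "collection of facts" in the quote above).
facts = {"fever", "cough"}

# Rules of thumb: if all premises hold, conclude the consequent.
# These are hypothetical illustration rules, not real medical knowledge.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected"}, "recommend_rest"),
]

# Forward chaining: keep firing rules until no new fact can be derived.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
```

Note that the second rule fires only because the first one added a new fact; this chaining of conclusions is what the quote calls making inferences, and the rules themselves are exactly the heuristics that knowledge engineering tries to capture.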

Margins

We had a discussion about different kinds of margins. The common or regular margin is the distance between the separating hyperplane and the nearest data points of either class; equivalently, it is the distance between the two parallel hyperplanes that touch the nearest points on each side while all points are still divided correctly into classes. A maximum-margin classifier chooses the separating hyperplane that maximizes this distance. When every training point must lie on the correct side, this is also called a hard margin.

The soft margin is a distance between hyperplanes which do not separate all data points, but allow some data points to be on the wrong side (i.e. misclassified). The idea is to find a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. Why do we do this, when we could transform the data into such a high-dimensional space that we could learn a linear boundary? An obvious reason is that the resulting model could be seriously overfitted. It is also possible that we want to use a (simpler) kernel function, which can create only soft margins for the given data. Finally, it is possible that the data set is not consistent, but contains several data points which agree on all attribute values but differ in the class value. This last problem cannot be solved by a transformation to any higher-dimensional space; one of the conflicting data points will always lie on the wrong side.
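The two notions can be illustrated numerically. The sketch below uses a hypothetical hyperplane w·x + b = 0 and made-up toy points: the geometric margin is the smallest signed distance of any point to the boundary (positive only if the hyperplane separates the data), while the hinge loss is the penalty a soft-margin formulation pays for points inside the margin or on the wrong side, instead of forbidding them outright:

```python
import math

# Hypothetical separating hyperplane x + y - 3 = 0 (illustration only).
w, b = [1.0, 1.0], -3.0

# Toy points as (coordinates, class label): the last one violates the boundary.
points = [([1.0, 1.0], -1), ([2.0, 3.0], +1), ([2.5, 0.0], +1)]

norm_w = math.sqrt(sum(wi * wi for wi in w))

def score(x):
    """Signed value of w.x + b for point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hard-margin view: smallest signed distance over all points.
# A negative value means this hyperplane misclassifies at least one point.
margin = min(y * score(x) / norm_w for x, y in points)

# Soft-margin view: hinge loss max(0, 1 - y * (w.x + b)) per point.
# Correctly classified points beyond the margin contribute zero;
# the violating point contributes a positive penalty instead.
hinge = sum(max(0.0, 1.0 - y * score(x)) for x, y in points)

print(margin, hinge)
```

With the violating point removed, the margin becomes positive and the hinge loss drops to zero, which is exactly the hard-margin situation described above.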

Some easy-reading explanations can be found here and here.

Language notes: to learn or to know?

I have noticed that quite many native Russian speakers use the verb "know" instead of "learn". For example, somebody says "Last week I knew three topics", when quite obviously s/he means "Last week I learnt three new topics". Is it so that in Russian you use the know-verb to express learning in some cases? (Learning is increasing your knowledge, getting to know something!) Maybe the same holds for other Slavic languages as well? I am interested to hear the reasons behind this! Maybe we could invent good rules for when to use which verb.

Another important distinction is between two types of knowledge, data and information: data is raw knowledge (if we can call it knowledge at all), just strings of bits that contain hidden knowledge which nobody has interpreted. When the data is interpreted, information is extracted from it. I.e. information is knowledge that people can understand and use, while machines just process data (without understanding). This is also the explanation behind the concept "data mining": we have a rock of data, which we mine (like gold miners) to find valuable, new pieces of information.