Free Essay

Consistent Query Answers in Inconsistent Databases

In:

Submitted By dumspirospero
Words 8845
Pages 36
Consistent Query Answers in Inconsistent Databases
Marcel0 Arenas Pontificia Universidad Cat6lica de Chile Escuela de Ingenieria Departamento de Ciencia de Computaci6n Casilla 306, Santiago 22, Chile marenas@ ing.puc.cl Leopold0 Bertossi Pontificia Universidad Cat6lica de Chile Escuela de Ingenieria Departamento de Ciencia de Computaci6n Casilla 306, Santiago 22, Chile bertossi@ing.puc.cl

Jan Chomicki Monmouth University Department of Computer Science West Long Branch, NJ 07764 chomicki @monmouth.edu

Abstract
In this paper we consider the problem of the logical characterization of the notion of consistent answer in a relational database that may violate given integrity constraints. This notion is captured in terms of the possible repaired versions of the database. A rnethod for computing consistent answers is given and its soundness and completeness (for some classes of constraints and queries) proved. The method is based on an iterative procedure whose termination for several classes of constraints is proved as well.

1

Introduction

Integrity constraints capture an important normative aspect of every database application. However, it is often the case that their satisfaction cannot be guaranteed, allowing for the existence of inconsistent database instances. In that case, it is important to know which query answers are consistent with the integrity comtraints and which are not. In this paper, we provide a logical characterization of consistent query answers in relational databases that may be inconsistent with the given integrity constraints. Intuitively, an answer to a query posed to a database that violates the integrity constraints will be consistent in a precise sense: It should be the same as the answer obtained from any minimally repaired version of the original database. We also provide a method for computing such answers and prove its properties. On the basis of a query Q, the. method computes, using an iterative procedure, a new query Tw(Q) whose evaluation in an arbitrary, consistent or inconsistent, database returns the set of
Permission to make digital or hard copies of all or part of this work 1‘01 personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the fill1 citation on the iirst page. To copy otherwise, to republish, to post on servers or to redistribute to lists. rcquircs prior specific permission and/or a fee. PODS ‘99 Philadelphia PA Copyright ACM 1999 I-581 I:!-062-7/99/05...$5.00

consistent answers to the original query Q. We envision the application of our results in a number of areas: Data warehousing. A data warehouse contains data coming from many different sources. Some of it typically does not satisfy the given integrity constraints. The usual approach is thus to clean the data by removing inconsistencies before the data is stored in the warehouse [6]. Our results make it possible to determine which data is already clean and proceed to safely remove unclean data. Moreover, a different scenario becomes possible, in which the inconsis,tenties are not removed but rather query answers are marked as “consistent” or “inconsistent”. In this way, information Iloss due to data cleaning may be prevented. Database integration. Often many different databases are integrated together to provide a single unified view for the users. Database integration is difficult since it requires the resolution of many different kinds of discrepancies of the integrated databases. One possible discrepancy is due to different sets of integrity constraints. Moreover, even if every integrated database locally satisfies the same integrity constraint, the constraint may be globally violated. For example, different databases may assign different addresses to the same student. Such conflicts may fail to be resolved al:all and inconsistent data cannot be “cleaned” because of the autonomy of different databases. Therefore, it is important to be able to find out, given a set of local integrity constraints, which query answers returned from the integrated database are consistent with the constraints and which are not. Active and reactive d&abases. A violation of integrity constraints may be acceptable under the provision that it will be repaired in the near future. For example, the stock level in a warehouse may be allowed to fall below the required minimum if the necessary replenishments have been ordered. During this temporary inconsistency, however, query answers should give an indication whether they are consistent with the constraints or not. This problem is particularly acute in

68

active databases that allow such consistency lapses. The result of evaluating a trigger condition that is consistent with the integrity constraints should be treated differently from the one that isn’t. The following example presents the basic intuitions behind the notion of consistent query answer. Example 1. Consider a database subject to the following ZC: v4W The instance 1 Q(x))

2.1

Repairs

Given a database instance r, we denote by Z(r) the set of formulas {P(c)(r b P(a)}, where the Ps are relation names and E is ground tuple. Definition 2. (Distance) The distance A(r,r’) between database instances r and r’ is the symmetric difference: A(r, r’) = (X(r) - C(r’)) U (Z(r’) -E(r)). 3. For the instances r, r’, r”, r’ Lr r” if A(r, r’) c i.e., if the distance between r and r’ is less than or equal to the distance between r and r”. Notice that built-in predicates do not contribute to the As because they have fixed extensions, identical in every database instance.
A(r,r”),

Definition

violates this constraint. Now if the query asks for all x such that Q(X), only a is returned as an answer consistent with the integrity constraint. The plan of this paper is as follows. In section 2 we introduce the basic notions of our approach, including those of repair and consistent query answer. In section 3 we show a method how to compute the query To(Q) for a given firstorder query Q. In subsequent sections, the properties of this method are analyzed: soundness in section 4, completeness in section 5, and termination in section 6. In section 7 we discuss related work. In section 8 we conclude and outline some of the prospects for future work in this area. The proofs are given in the appendix.
2 Basic Notions

Definition 4. (Repair) Given database instances r and r’, we say that r’ is a repair of r if r’ k IC and r’ is Q(x))}. Clearly, r does not satisfy ZC because r k P(b) A TQ(b). In this case we have two possibles repairs for r. First, we can falsify P(b), obtaining an instance r’ with C(r’) = {f%),Q(a)>Q(c)). A s a second alternative, we can make Q(b) true, obtaining an instance r” with C(r”) = {P(a), P(b),
Q(a),Q(b),Q(c)>.

In this paper we assume we have a fixed database schema and a fixed infinite database domain D. We also have a first order language based on this schema with names for the elements of D. We assume that elements of the domain with different names are different. The instances of the schema are finite structures for interpreting the first order language. As such they all share the given domain D, nevertheless, since relations are finite, every instance has a finite active domain which is a subset of D. As usual, we allow built-in predicates that have infinite extensions, identical for all database instances. There is also a set of integrity constraints IC, expressed in that language, which the database instances are expected to satisfy. We will assume that ZC is consistent in the sense that there is a database instance that makes it true. Definition 1. (Consistency) A database instance r is consistent if r satisfies IC in the standard model-theoretic sense, that is, r k ZC; r is inconsistent otherwise. This paper addresses the issue of obtaining meaningful and useful query answers in any, consistent or inconsistent, database. It is well known how to obtain query answers in consistent databases. Therefore, the challenging part is how to deal with the inconsistent ones.

The definition of a repair satisfies certain desirable and expected properties. Firstly, a consistent database does not need to be repaired, because if r satisfies IC, then, by the minimality condition wrt the relation &, r is the only repair of itself (since A(r,r) is empty). Secondly, any database r can always be repaired because there is a database r’ that satisfies 1C, and A(r, r’) is finite. Example 3. (motivated by [ 191) Consider the IC saying that C is the only supplier of items of class T4:
V(x,y,z)(Supply(x,y,z) A Class(z,T4) 3 x = C). (1)

The following database instance rl violates the IC:
C D SUPPlY Dl 11
D2 I2

Class 11 T4
12 T4

The only repairs of this database are
C SUPPlY Q 11 Class 11
12

T4
T4

69

and
SWPlY CD1 D D2 I2

2.2

Consistent

query

answers

We assume all queries are in prefix disjunctive normal form. Definition 5. A formula Q is a query if it has the following syntactical form:

Example 4. (motivated by [ 191) Consider the IC:
\J(~,Y)(S~PE;,~Y(X,Y,~l) 3

SuPPlY(GY:Iz)),

(2)

i=l

j=l

j=l

saying that item Z2is supplied whenever item 11 is supplied; and the following inconsistent instance, r;?,of the database
C
C

SUPPlY DI 11
D1 13

where 0 is a sequence of quantifiers and every vi contains only built-in predicates. If 0 contains only universal quantifiers, then we say that Q is a universal query. If 0 con.tains existential (and possibly universal) quantifiers, we say that Q is non-universal query. Definition 6. (Query answer) A (ground) tuple i is an an-) swer to a query Q( x in a database instance r if r k Q(I~?. A (ground) tuple F is an answer to a set of queries { Ql , . . . , Qnj if r b Ql A . . . A Qn. Definition 7. (Consistent answer) Given a set of integrity constraints, we say that a (ground) tuple i is a consistem answer to a query Q(Z) in a database instance r, and we. write r kc Q(g ( or r kc Q(Z)[t)), if for every repair ,r’ of r, r’ != Q(tJ. If Q is a sentence, then true (false) is a consistent answer to Q in r, and we write r +=c Q (r kc Q), if for every repair r’ of r, r’ b Q (r’ F Q). Example 6. (example 3 continued) The only consistent answer to the query Class(z, T4), posed to the database instance rl, is II because rl kc CZass(z, T4)[Zl]. Example 7. (example 4 continued) The only consistent answer to the query Supply(C, D1, z), posed to the database instance r2, is 13because r:! /==cSuppZy(C, D1, z) [13]. Example 8. (example 5 continued) By considering all the:repairs of the database instance r3, we obtain Cl and C2 as the consistent answers to the query 3zCourse(Sl, y,z), posed to r3. For the query 3(u, v)(Student(u, Nl , v) A Course(u,x,y)), we obtain no (consistent) answers.
3 The General Approach

This instance has two repairs:
C
C c

SUPPlY DI 11
DI Dl 12 13

and
SUPPlY
C Dl 13

Example 5. Consider a student database. Student@, y,z) means that x is the student number, y is the student’s name, and z is the student’s address. The two following ICs state that the first argument is a key of the relation V(x, y, z, u, v) (Student(x, y, z) A Student(x, u, v) 3 y = u) , V(x, y, z, u, v) (Student(x, y, z) A Student(x, u, v) > z = v) . The inconsistent database instance r3
Student Sl Nl CT
Sl N2 J%

Course SI
Sl

Cl
C2

G
G2

has two repairs:
Student Sl NI &-SI
Sl

Course Cl
C2

Gl
G2

and
Student
Sl N2 0; Sl SI

Course
Cl C2 GI G2

We present here a method to compute consistent answers to queries. Given a query Q, the query T,(Q) is defined based on the notion of residue developed in the context of sem,antic query optimization (SQO) [5]. In the context of deductive databases, SQO is used to optimize the process of answering queries using the semantic knowledge about the domain that is contained in the ICs. In this case, the basic assumption is that the ICs are satisfied by the database. In our case, since we allow inconsistent databases, we do not assume the satisfaction of the KS while answering queries. A first attempt to obtain consistent answers to a query Q(Z) may be to use query modification, i.e., ask the query Q(2) A IC. However,

70

this does not work, as we obtain false as the answer if the DB is inconsistent. Instead, we iteratively modify the query Q using the residues. As a result, we obtain the query To(Q) with the property that the set of all answers to To(Q) is the same as as the set of consistent answers to Q. (As shown later, the property holds only for restricted classes of queries and constraints.)
3.1 Generating residues in relational DBs

and each positive occurrence of a predicate Pj(Tj) in it, the following residue for +j(%j) is generated j-l Q(Vf?(%)V i=l \j

f?(G)VQTQi(Yi)VV); i=l (4)

i=j+ 1

We consider only universal constraints. We begin by transforming every integrity constraint to the standard format (Edpansion step). Definition 8. An integrity constraint is in standardformat if it has the form

where Q is a sequence of universal quantifiers over all the variables in the formula not appearing in Xi. IfR 1,. . . , R, are all the residues for -Pi, then the following rule is generated: -Pi(G) ++ lPj($){Rl(C),... ,R,(G)},

where tt, are new variables. If there are no residues for TPj, then the rule 7Pj(W) H -IPC is generated. For each negative occurrence of a predicate Qj(yi) in (3), the following residue for Qj (yj) is generated

v(~Pi(~i)Vil~Qi(Bi)VW), i=l i=l

where V represents the universal closure of the formula, Zi, yi are tuples of variables and w is a formula that mentions only built-in predicates, in particular, equality. Notice that in such an IC there are no constants in the Pi, Qi; if they are needed they can be pushed into w. Many usual ICs that appear in DBs can be transformed to the standard format, e.g. functional dependencies, set inclusion dependencies of the form ‘v’Z(P(Z) > Q(Z)), transitivity constraints of the formVx,y,z(P(n,y) ~P(y,z) > P(x,z)). The usual ICs that appear in SQO in deductive databases as rules [5] can be also accommodated in this format, including rules with disjunction and logical negation in their heads. An inclusion dependency of the form Vf(P(2) > 3y Q&y)) cannot be transformed to the standard format. After the expansion of ZC,rules associated with the database schema are generated. This could be seen as considering an instance of the database as an extensional database expanded with new rules, and so obtaining an associated deductive database where semantical query optimization can be used. For each predicate, its negative and positive occurrences in the ICs (in standard format) will be treated separately with the purpose of generating corresponding residues and rules. First, a motivating example. Example 9. Consider the IC Vx (+(x) V Q(x)). If Q(x) is false, then +(x) must be true. Then, when asking about ~Q(x), we make sure that +‘(x) .becomes true. That is, we generate the query ~Q(x) A +(x) where +(x) is the residue attached to the query. For each IC in standard format
(3)
i=l i=l

&(~fi(%)V’~-Qi(Y~)V i=l i=l

T) TQi(jji)VW), i=j+l where 0 is a sequence of universal quantifiers over all the variables in the formula not appearing in yj. IfR’,,... , Ri are all the residues for Qj (yj), the following rule is generated:

Qj(4 ++ Qj(n){R’l(E),... ,R:(fi)).
If there are no residues for Qj(yj), then the rule Qj(z2) H Qj(E) is generated. Notice that there is exactly one new rule for each positive predicate, and exactly one rule for each negative predicate. If there are more than one positive (negative) occurrences of a predicate, say P, in an IC, then more then one residue is computed for 1P. In some cases, e.g., for functional dependencies, the subsequent residues will be redundant. In other cases cases, e.g., for transitivity constraints, multiple residues are not redundant. Example 10. If we have the following ICs in standard format IC = {WW v+‘(x) v ~Q(x)),V~;P(x),V~Q(x))),

the following rules are generated: P(x) w P(x) {R(x) v lQ(4 > Q(4{W v +‘(4,P(x)) R(x) -P(x) {-Q(x) >

Q(x)
R(x) +(4

-Q(x) lR(x) -Q(x)
FR { -P(x) V -Q(x)}.

Notice that no rules are generated for built-in predicates, but such predicates may appear in the residues. They have

71

fixed extensions and ,thus cannot contribute to the violation of an IC or be modified to make an IC true. For example, if we have the IC Vx,y,z(lP(x,y) V ~P(x,z) Vy = z), and the database satisfies P( 1,2),P( 1,3), the IC cannot be made true by making 2 = 3. Once the rules have been generated, it is possible to simplify the associated residues. In every new rule of the form P(C) b--P P(fi){Rl(ii),. . . , R,( 6)) the auxiliary quantifications introduced in the expansion step are eliminated (both the quantifier and the associated variable in the formula) from the residues by the process inverse to the one applied in the expansion. The same is done with rules of the form TP +--k ,P{. . .}.
3.2 Computing

Definition 10. The application of operator Tw on a query is. defined as T,(g) = U {T,(q)}. n, +(x1 A ((+(x) A ~Q(x)) vlQ(x,)}.
We show first that the operator T, conservatively extends standard query evaluation on consistent databases. Proposition 1. Given a database instance r and a set of integrity constraints IC, such that r t= IC, then for every query Q(Z) and every natural number n: r !=VX(Q(X) E Tn(Q(f))). Corollary 1. Given a database instance r and a set of integrity constraints IC, such that r b ZC, then for every qulery Q(T) and every tuple f: r l= Q(o if and only if r b T,(Q(i)).
4 Soundness

To(Q)

In order to determine consistent answers to queries in arbitrary databases, we will make use of a family of operators consisting of T,, n 1 0, and T,. Definition 9. The application of an operator T, to a query is defined inductively by :meansof the following rules 1. T,(O) := 0, T,(-10) := 10, for every IZ 2 0 (0 is the empty clause). 2. To(q) := q. 3. For each predicate P(ii), if there is a rule P(ii) e P(L){Ri(c), . . . ,t;r,(c)}, then T,+l (P(il)) := P(G) A A Tn(Ri(c)). i=l Now we will show the relationship between consistent answers to a query Q in a database instance r (definition 7) and answers to the query Tw(Q) (definition 6). We show that Tw(Q) returns only consistent answers to Q. Theorem 1. (Soundness) Let I be a database instance, ZC a set of integrity constraints and Q(X) a query (see definition 5) such that r b To(Q(Z))[q. If Q is universal or non-universal and domain independent[20], then i is a consistent answer to Q in r (in the sense of definition 7), that is, r kc Q(q . The second condition in the theorem excludes non-univlzrsal, but domain dependent queries like Elx~P(x). Example 12. (example 6 continued) The IC (1) transformed into the standard format becomes

IPf5y) u

does not have residues, then T,+t(P(fi))

:=

4. For each negated predicate -Q(c), if there is a rule --JQ(~) c) lQ(+){Ri(P),. . ,Ri(G)}, then
T,+l(~Q(i;)) := ~Q(ts) A i\ T,(R:(~)) i=l If -Q(c) does not have any residues, thenT,+t (-Q(E)) :=

-QY, 4 v

Xlass(z, The following rule is generated:

w) V w # T? V x = C).

i=l

j=l

j=l

Class(z, w) H

CZass(z,w)

where Q is a sequence of quantifiers and vi is a formula that includes only built-in predicates, then for every n 2 0: TV := CS y( Tn(f’i,j(%,j)) A $( is1 j:=l I h Tn(lQi,j(Qi,j)) AWi) j=l Given the database instance rt that violates the IC as before, if we pose the query Class(z,Td), asking for the items of class T4, directly to t-1, we obtain Ii and Z2. Nevertheless, if we pose the query T, (CZass(z, T4)), that is

CZass(z, T4) AV(x,y)(+uppZy(x,y,z)

Vx = C))

72

we obtain only II, eliminating 12. Ii is the only consistent answer. Example 13. (example 8 continued) In the standard format, the ICs take the form V(x, y, z, u, v) (-Q4dent(x,y, z) v 1Student(x, u, v) v y = u) )
V(~,y,z,~,v)(~Student(x,y,z)V Y%dent(x, u, v) V z = v).

where 11 and 12 are literals, and w is a formula that only contains built-in predicates. Examples of BICs include: functional dependencies, symmetry constraints, set inclusions dependencies of the form ‘v’~(P(~) > Q(2)). Definition 12. Given a set of sentences C in the language of the database schema DB, and a sentence rp, we denote by Z I=DBcpthe fact that, for every instance r of the database, if r != C, then r k cp. Theorem 2. (Completeness for BZCs) Given a set IC of binary integrity constraints, if for every literal l’(G), ICP~DB l’(n), then the operator T, is complete, that is, for every ground literal Z(q, if r kc l(Q then r b T,(l(q). The theorem says that every consistent answer to a query of the form L(1) is captured by the Tw operator. Actually, proposition 2 in the appendix and the completeness theorem can be easily extended to the case of queries that are conjunctions of literals. Notice that the finiteness Tw(Z(X)) is not a part of the hypothesis in this theorem. The hypothesis of the theorem requires that the ICs are not enough to answer a literal query by themselves; they do not contain definite knowledge about the literals. Example 14. We can see in the example 12 where BICs and queries which are conjunctions of literals appear, that the operator Tw gave us ail the consistent answers, as implied by the theorem. Corollary 2. If ZC is a set of functional dependencies (FDs)

The following rule is generated Student(x, y,z) I--+ Student(x, y, z) {V(~,~)(~Student(x,u,v)
V(u,v)(~Student(x,u,v)

Vy = u), Vz = v)}.

Given the inconsistent database instance Q, if we pose the query ZlzCourse(S1,y, z), asking for the names of the courses of the student with number Sr, we obtain Cl and C2. If we pose the query T&JzCourse(Sl ,y,z)) = {flzCourse(Sl
,y,z)}

we obviously obtain the same answers which, in this case, are the consistent answers. Intuitively, in this case the Tw operator helps us to establish that even when the name of the student with number St is undetermined, it is still possible to obtain the list of courses in which he/she is registered. On the other hand, if we pose the query

about the courses and grades for a student with name Nr , to r-3, we obtain (Cl, Gl) and (Cz, G2). Nevertheless, if we ask
To(3(u,v)(Student(u, Nl ,v) A Course(u,n,y)))

IC=

{v’(-~l(~l,yl)~~~l(~l,zl)vYl

=z1),

(5)

. ..)
W+n(%,Yn)

we obtain, in conjunction with the original query, the formula:
Vy’ = N1) A Vz’ = v) A Course(u,x,y)),

v+n(%,zn)

VYn

= zn)),

then the operator To is complete for consistent answers to queries that are conjunctions of literals. Example 15. In example 13 we had FDs that are also BICs. Thus the operator Tw found all the consistent answers, even for some queries that are not conjunctions of literals, showing that this is not a necessary condition. Example 16. Here we will show that in general completeness is not obtained for queries that are not conjunctions of AP(x,z) > y = z) literals. Consider the IC: Vx,y,z(P(x,y) and the inconsistent instance r with Z(r) = {P(a, b), P(a, c)}. This database has two repairs: r’ with C(r/) = {P(a,b)}; and r” with C(r”) = {P(a,c)}. We have that r kc 3xP(u,x), because the query is true in the two repairs. Now, it is easy to see that To(3uP(u, u)) is logically equivalent to 3u(P(u,u) AVz(~P(a,z) Vz = u)). So, we have r k T,(33cP(u,n)). Th us, the consistent answer true is not captured by the operator Tw.

V(y’,z’)(-Student(u,y’,z’) V(y’,z’)(Gtudent(u,y’,z’)

from this we obtain the empty set of tuples. This answer is intuitively consistent, because the number of the student with name Nr is uncertain, and in consequence it is not possible to find out in which courses he/she is registered. The set of answers obtained with the Tw operator coincides with the set of consistent answers which is empty.
5 5.1 Completeness Binary ICs

Definition 11. A binary integrity constraint tence of the form
\J(h (Xl) v Z2@2) v w(4),

(BIC) is a sen-

73

5.2

Other

Constraints

6.1

Syntactical

finiteness

The following the0re.m applies to arbitrary ICs and generalizes Theorem 2. Theorem 3. (Completeness) Let IC be a set of integrity constraints, l(1) a literal, and T,(Z(X)) of the form l(F) A ~V(.fi,~i)(Ci(i,Ti) i=l Vvi(.F,ri)).

The notion of syntactical finiteness is important because then for some n and all m > n, Tm(Q(Z)) will be exactly the same. In consequence, T,(Q) will be a finite set of formulas. In addition, a point of finiteness n can be detected (if it exists) by syntactically comparing every two consecutive steps in the iteration. No simplification rules need to be considered, because the iterative procedure is fully deterministic. Here we introduce a necessary and sufficient condition for syntactical finiteness. Definition 14. A set of integrity constraints IC is a&& if there exists a function f from predicate names plus negations of predicate names in the database to the natural numbers, that is, f : {PI,. . ,pn,lpl,. . ,lp,} -+ N, such thal: for every integrity constraint V(VfZ1 Zi(Zi) VW(T)) E ZC as in (3), and every i and j (1 2 i, j < k), if i # j, then f(-li) > f(li). (Here Tli is the literal complementary to li.) Example 17. The set of ICs
IC = {Vx(+(x) v -Q(x) v S(n)), v+(y) v %Y))l.

If for every n > 0, the,re is S E { 1, . . . , m} such that 1. for every j E S and every tuple a: IC ~DB Cj (ii), and 2. {V(~i,yi)(Ci(I,~i)V~i(X,~i))li
{V(-fii,yi)(Ci(-f,Xi)

E S} implies
VWi(.f,jji))ll 5

i _ f(S), and from the second, we would obtain f(S) > f(Q); a contradiction. Theorem 4. A set of integrity constraints ZC is acyclic iff for every literal name I in the database schema, T,(Z(Z)Il is syntactically finite. The theorem can be extended to any class of pueries satisfying Definition 5. Example 19. The set of integrity constraints in example 18 is not acyclic. In that case To(Q(x)) is infinite. Example 20. The ICs in example 17 are acyclic. There we

The number n in ca,ses2 and 3 is called a point offiniteness. It is clear that 1 implies 2 and 2 implies 3. In the full version we will show tlhat all these implications are proper. In all these cases, evaluating Tw(Q(Z) gives the same result as evaluating T, (Q(Z) f or some n (in the instance r in case 3). IfTw(Q(-) x 1s semantically finite, sound and complete, then the set of consistent answers to Q is jrst-order dejinable.

74

have Wf’(u)) =

Theorem 6. Let I be a literal name. If for some n,
W&(f)) 3 G+l(Z(f)))

{P(U)> P(u) A (~Q(u) V S(u)), P(u) A (lQ(u) v S(u) A WlQ(v) V T(v>u)))l

is valid, then for all m 2 n, Vi(T,(Z(f)) - T,(Z(x)))

To(Q(u)) = {Q(u),
Q(u) A (d’(u) V S(u)) A Vv(-S(v) V S(u) A b(lQ(w) A (+‘(v) v V T(u, v)); V T(w ~1)) A

Q(u) A (+(u)

W+(v)

lQ(v))

V T(u> ~1) 1

T&S(u)) = {S(u),S(u) A V'(lQ(v) V T(">u))} Tco(T(u,v))= {T(v)) L(+(u)) = t+(u))

is valid. According to Theorem 6, we can detect a point of finiteness by comparing every two consecutive steps wrt logical implication. Although this is undecidable in general, we might try to apply semidecision procedures, for example, automated theorem proving. We have successfully made use of OTTER [ 171in some cases that involve sets of constraints that are neither acyclic nor uniform. Examples include multivalued dependencies, and functional dependencies together with set inclusion dependencies. For multivalued dependencies, Theorem 6 together with Theorem 3 gives completeness of T,(Z(,i!)) where Z(X) is a negative literal. The criterion from Theorem 6 is also applicable to uniform constraints by providing potentially faster termination detection than the proof of Theorem 5.
6.3 Instance based semantical finiteness

TdlQ(u)) = bQ(4)
T,(G(u)) = {+(u),+(u) A (+(u)

v-Q(u))}

Theorem 7. If Q(2) is a domain independent query, then for every database instance r there is an n, such that for all m L n, r b VT(Tn(Q(Z)) - L(Q(X))). Notice that this theorem does not include the case of negative literals, as in the case of theorem 5.
7 Related work

Tw(+-(u, v)) = {~T(u>v), lT(u, v) A (lQ(u) V +(v)), lT(u,v) A (-Q(u) V+(v) A (+(v)

‘.‘lQ(v))))

Corollary 3. For functional dependencies and a query Q(z), T,(Q(z)) is always syntactically finite.
6.2 Semantical finiteness

Definition 15. A constraint C in clausal form is uniform if for every literal Z(2) in it, the set of variables in Z(X) is the same as the set of variables in C - Z(2). A set of constraints is uniform if all the constraints in it are uniform. Examples of uniform constraints include set inclusion dependencies of the form VT(P(X) > Q(2)), e.g., Example 4. Theorem 5. If a set of integrity constraints IC is uniform, then for every literal name 1 in the database schema, T,(Z(z)) is semantically finite. Furthermore, a point of finiteness n can be bounded from above by a function of the number of variables in the query, and the number of predicates (and their arities) in the query and ZC.

Bry [4] was, to our knowledge, the first author to consider the notion of consistent query answer in inconsistent databases. He defined consistent query answers based on provability in minimal logic, without giving, however, a proof procedure or any other computational mechanism for obtaining such answers. He didn’t address the issues of of semantics, soundness or completeness. It has been widely recognized that in database integration the integrated data may be inconsistent with the integrity constraints. A typical (theoretical) solution is to augment the data model to represent disjunctive information. The following example explains the need for a solution of this kind. Example 21. Consider the functional dependency
V’(X,Y,Z)@‘(X,Y) APbiz)

3 Y = z.

If the integrated database contains both P(a, b) and P(a, c), then the functional dependency is violated. Each of P(a, b) and P(a,c) may be coming from a different database that satisfies the dependency. Thus, both facts are replaced by their disjunction P(a, b) V P(a, c) in the integrated database. Now the functional dependency is no longer violated.

75

To solve this kind of problems [l] introduced the notion of flexible relation, a non-1NF relation that contains tuples with sets of non-key values (with such a set standing for one of its elements). This approach is limited to primary key functional dependencies and was subsequently generalized to other key functional1 dependencies [9]. In the same context, [3, 121 proposed to use disjunctive Datalog and [16] tables with OR-objectis. [l] introduced flexible relational algebra to query flexible relations, and [9] - flexible relational calculus (whose subset can be translated to flexible relational algebra). The remain:ing papers did not discuss query language issues, relying on the existing approaches to query disjunctive Datalog or tables with OR-objects. There are several important differences between the above approaches and ours. First, they rely on the construction of a single (disjunctive) instance and the deletion of conflicting tuples. In our approach, the underlying databases are incorporated into the integrated one in toto, without any changes. There is no need for introducing disjunctive information. It would be interesting to compare: the scope and the computational requirements of both approaches. For instance, one should note that the single-instance approach is not incremental: Any changes in the underlying databases require the recomputation of the entire instance. Second, our approach seems to be unique, in the context of database integration, in considering tuple insertions as possible repairs for integrity violations. Therefore, in some cases consistent query answers may be different from query answers obtained from the corresponding single instance. Example 22. Consider the integrity constraint p > q and a fact p. The instance consisting of p alone does not satisfy the integrity constraint. The common solution for removing this violation is to delete p. However, in our approach inserting q is also a possible repair. This has consequences for the inferences about up and Tq. Our approach returns false in both cases, as ,I) (resp. q) is true in a possible repair. Other approaches return true (under CWA) or undefined (under OWA). Our work has connections with research done on belief revision [lo]. In our case, we have an implicit notion of revision that is determined by the set of repairs of the database, and corresponds to revising the database (or a suitable categorical theory describing it) by the set of integrity constraints. Thus, querying the inconsistent database expecting only correct answers corresponds to querying the revised theory without restrictions. It is easy to see that our notion of repair of a relational database is a particular case of the local semantics introduced in [S], restricted to revision performed starting from a single model (the database). From this we obtain that our revision operator satisfies the postulates (Rl) - (R5),(R7), (R8) in [13]. For each given database r, the relation Lr introduced in definition 3 provides the partial order between models that determines the (models of the) revised database as described in [ 131. [8] concentrates on the computation

of the models of the revised theory, i.e. the repairs in our case, whereas we do not compute the repairs, but keep querying the original, non-revised database and pose a modified query. Therefore, we can view our methodology as a way of representing and querying simultaneously all the repairs of the database by means of a new query. Nevertheless, our motivation and starting point is quite different from belief revision. We attempt to take direct advantage of the semlantic information contained in the integrity constraints in order to answer queries, rather than revising the database. Revising the database means repairing all the inconsistencies in it, instead we are interested in the information related to particular queries. For instance, a query referring only to the consistent portion of the database can be answered withLout repairing the database. Reasoning in the presence of inconsistency has been an important research problem in the area of knowledge representation. The goal is to design logical formalisms that limit what can be inferred from an inconsistent set of formulas. One does not want to infer all formulas (as required by the classical two-valued logic). Also, one prefers not to infer a formula together with its negation. The formalisms satisfying the above properties, e.g., [15], are usually propositio:nal. Moreover, they do not distinguish between integrity constraints and database facts. Thus, if the data in the database violates an integrity constraint, the constraint itself can no longer be inferred (which is not acceptable in the database context). Example 23. Assume the integrity constraint is ~(p A q) and the database contains the facts p and q. In the approach of [ 151, p V q can be inferred (minimal change is captured correctly) but p, q and ~(p A q) can no longer be inferred (they are all involved in an inconsistency). Because of the above-mentioned limitations, such methods are not directly applicable to the problem of computing consistent query answers. Deontic logic [ 18, 141, a modal logic with operators capturing permission and obligation, has been used for the specification of integrity constraints. [ 141used the obligation loperator 0 to distinguish integrity constraints that have to hold always from database facts that just happen to hold. [ 181 used deontic operators to describe policies whose violations can then be caught and handled. The issues of possible repairs of constraint violations, their minimality and consis’tent query answers are not addressed. Gertz [ 1 l] described techniques and algorithms for computing repairs of constraint violations. The issue of query answering in the presence of an inconsistency is not addressed in his work.
8 Conclusions and Further Work

This paper represents a first step in the development of a new research area dealing with the theory and applications

76

of consistent query answers in arbitrary, consistent or inconsistent, databases. The theoretical results presented here are preliminary. We have proved a general soundness result but the results about completeness and termination are still partial. Also, one needs to look beyond purely universal constraints to include general inclusion dependencies. In a forthcoming paper we will also describe our methodology for using automated theorem proving, in particular, OTTER, for proving termination. It appears that in order to obtain completeness for disjunctive and existentially quantified queries one needs to move beyond the To operator on queries. Also, the upper bounds on the size of Tw and the lower bounds on the complexity of computing consistent answers for different classes of queries and constraints need to be studied. In [2] it is shown that in the propositional case, SAT is reducible in polynomial time to the problem of deciding if an arbitrary formula evaluated in the propositional database does not give true as a correct answer, that is it becomes false in some repair. From this it follows that this problem is NP-complete. There is an interesting connection to modal logic. Consider the definition 7. We could write r b q Q(q, meaning that Q(?) is true in all repairs of r, the database instances that are “accessible” from r. This is even more evident from example 16, where, in essence, it is shown that Cl&Q(Z) is not logically equivalent to Z~XOQ(Z),which is what usually happens in modal logic.
Acknowledgments

[5] U.S. Chakravarthy, J. Grant, and J. Minker. LogicBased Approach to Semantic Query Optimization. ACM Transactions on Database Systems, 15(2):162207,199O. [6] S. Chaudhuri and U. Dayal. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record, 26, March 1997. [7] J. Chomicki and G. Saake, editors. Logics for Databases and Information Systems. Kluwer Academic Publishers, Boston, 1998. [8] T. Chou and M. Winslett. A Model-Based Belief Revision System. J. Automated Reasoning, 12: 157-208, 1994. [9] Phan Minh Dung. Integrating Data from Possibly Inconsistent Databases. In International Conference on Cooperative Information Systems, Brussels, Belgium, 1996. [lo] P. Gaerdenfors and H. Rott. Belief Revision. In D. M. Gabbay, J. Hogger, C, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 4, pages 35-132. Oxford University Press, 1995. [ 1l] M. Gertz. Diagnosis and Repair of Constraint vialations in Database Systems. PhD thesis, Universitlt Hannover, 1996. [12] P. Godfrey, J. Grant, J. Gryz, and J. Minker. Integrity Constraints: Semantics and Applications. In Chomicki and Saake [7], chapter 9. [13] H. Katsuno and A. Mendelzon. Propositional Knowledge Base Revision and Minimal Change. Artificial Intelligence, 52:263-294,199l. [14] K.L. Kwast. A Deontic Approach to Database Integrity. Annals of Mathematics and ArtiJicial Intelligence, 9:205-238,1993. [ 151 J. Lin. A Semantics for Reasoning Consistently in the Presence of Inconsistency. Art$cial Intelligence, 86( l2):75-95,1996. [16] J. Lin and A. 0. Mendelzon. Merging Databases under Constraints. International Journal of C&operative Information Systems, 7( 1):55-76,1996. [17] ‘W.W. McCune. OTTER 3.0 Reference Manual and Guide. Argonne National Laboratory, Technical Report ANL-9416, 1994. [ 181 J.-J. Meyer, R. Wieringa, and F. Dignum. The Role of Deontic Logic in the Specification of Information Systems. In Chomicki and Saake [7], chapter 4.

This research has been partially supported by FONDECYT Grants (1971304 & 1980945) and NSF Grant (IRI-9632870). Part of this research was done when the second author was on sabbatical at the Technical University of Berlin (CIS Group) with the financial support from DAAD and DIPUC.
References

[l] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. In IEEE International Conference on Data Engineering, 1995. [2] M. Arenas, L. Bertossi, and M. Kifer. APC and Querying Inconsistent Databases. In preparation. [3] C. Baral, S. Kraus, J. Minker, and VS. Subrahmanian. Combining Knowledge Bases Consisting of FirstOrder Theories. Computational Intelligence, 8:45-71,’ 1992. [4] F. Bry. Query Answering in Information Systems with Integrity Constraints. In IFIP WG 11.5 Working Conference on Integrity and Control in Information Systems. Chapman &Hall, 1997.

77

[19] Jean-Marie Nicolas. Logic for Improving Integrity Checking in Relational Data Bases. Actu Informutica, 18:227-253,1982. [20] J. Ullman.
Principles of Database and KnowledgeBase Systems, Vol. I. Computer Science Press, 1988. Proofs of Results

2. For every tuple a, ZC~L~DB -#i(G), since the database instance r& where the relation Pi contains only the tuple d and the other relations are empty, satisfies ZC, but not TPi(E). Proof of Theorem 3: Suppose that r t=, Z(g. Let r’ a repa:r of r, we have that f b Z(Z). By proposition 1 we have that r’ k m(Z(F)), that is f b Z(f) A i;\V(q i=l j=l

Appendix:

Some technical lemmas are stated without proof. Full proofs can be found in the file proof spods99. ps in http://dcc.ing.puc.cl/Nbertossi/. Lemma 1. If r k T,,(Z(h)), where Z(E) is a ground literal, then for every repair r’ of r, it holds r’ b Z(c). Lemma 2. If r b Tcj)($, Zi(&)), where Zi(di) is a ground literal, then for every repair f of r, it holds r’ k A;=, Zi(Ei). Lemma 3. If r b T,(Vy=‘=, Ci(Hi)), with Ci(ni) a conjunction of literals, then for every repair f of r, f k Vzl Ci(Ei). Lemma 4. Let Q(X) a universal query. If r C=To@(?)), for a ground tuple i, then for every repair f of r, f I= Q(?). Lemma 5. Let Q(i!) a domain independent query. If r k To(Q(o), for a ground tuple i, then for every repair f of r,

Zi,j(i,fi,j)

VWi(T,xi)),

(a

We want to prove that for every i and for every sequence of ground tuples ai, ai,r, . . . , ai,mi: r k 3 Zi,j(i,sii,j) j=l VWi(t,ni),

(3

To do this, first we are going to prove that for every i E S and for every sequence of ground tuples ai, ai,t , . . . , ai,,,: r k 3 Zi,j(i,Bi,j) j=l VlJli(i,di),

f

b

Q(t3.

Proof of Theorem I.: Lemmas 4 and 5. Proposition 2. Given a set ZC of integrity constraints, a ground clause Vzr Zi(5)) if ZC Y~B Vzt Zi(t;:) and, for every repair r’ of r, f I= VE1 Zi(t;:), then r k V$Zi(t;:). Proof of Propositioln 2: Assume that r k -Vzl Zi(&). By hypothesis ZC Y~DB \,/zr Zi($), thus there exists an instance of the database r’ su.ch that f l= ZCU (7 VEl Z;(fi)}. Let us consider the set of d,atabaseinstances
R = {r*jr*

This is immediately obtained when r k vi(F,ci). Assume that r k lWi(i,di). We know that vi only mentions built-in predicates, thus for every repair f of r we have that r’ b TWi (t, di) . Therefore, by (6) we conclude that for every repair r’ of r: nti r’ k V Zi,j(i,di,j) j=l VJli(i,hi),

By proposition 2 we conclude (8). Thus we have that r~Z(1)A/\V(i;Zi,i(T,x,,i)VWi(i,~i)), iES j=l

b ZC and A(r,r*) 5 A(r,r’)}.

We know that A(r, r”) is finite, therefore there exists ro E R such that A(r, rg) is minimal. Then, rg is a repair of r. For every 1 < i 5:’m, if Zi(t;:) is p(F) or -p(o, then p(q $ A(r,f ). Using this fact we conclude that p(F) $ A(r,rO), Therefore, r b VEr Zi(t;:) if and only if rg l= VLt Zi(t;:). But we assumed that r 1~~Vzr Zi(&), then rg I= 7VzI Zi(t;:); a contradiction. Proof of Theorem 2: From theorem 3. Proof of Corollary 2: In this case it holds:

but by the second condition in the hypothesis of the theorem we conclude that: r k Z(o A i;,,(;j; i=l j=l

Zi,j(f,&,j)

VWi(i,fi)).

1. For every tuple 5, ZC F~DBPi(E), because the empty database instance (which has only empty base relations) satisfies ZC, but not P(a).

Proof of Theorem 4: (&) Suppose that ZC is acyclic:, then there exists f as in the definition 14. We are going to prove by induction on k that for every literal name I, if f(Z:) = k, then G+I Q(Z)) = Tk+dW) (I) If k = 0. We know that that for every literal na;me I’, f(Z’) > 0. Therefore, every integrity constraint containing -11 is of the form V(-1(X!) V v(y)), where II, only mentions builtin predicates. This is because if there were any other literal I’ in the integrity constraint, we would have f(Z’) < f(Z) = 0. Then fi (Z(z)) = Tz(Z(z)).

78

(II) Suppose that the property is true for every m < k. We know that Tk+z(Z(X)) is of the form: I(n) A i i=l Lemma 6. If T, (Z(2)) is of the form:

z(z) ~v(-fi,Y,)(G(f,-G) A VWi(%jji)),
&i( v Tk+l(h,j(%,j)) VVi(%)), j=l i=l

then T,+t (Z(2)) is of the form: where &i is a sequence of quantifiers over all the variables Si,r , . . . , zi,.,nli,Xi not appearing in 2, and Tk+i (Z(2)) is of the form: l(z) A i f2i( 7 i=l j=l Tk(h,j(fi,j)) vVi(fi)).

I(

~V(~i.yi)(Tl(Ci(l,xi)) i=l Vvi(Z,yi)),

By definition of f, we know that for every literal name Zi,j in the previous formulas, f(Zi,j) < k. Then by induction hypothesis Tk(Z(-fi,j)) = z+l(Zi,j(Xi,j)) (since if Tm(Z’(R)) = Tm+l(Z’(R)), then for every n 2 m, T,(Z’(X)) = T,+l(Z’(f))). (+) Suppose that for every literal name 1, T,(Z(z)) is finite. The for every literal name 1 there exists a first natural number k such that Tk(Z(P)) = Tk+t(Z(E)). Let us define a function f, from the literal names into the natural number, by f(Z) = k (k as before). We can show that this is a well defined function that behaves as in definition 14: since if V(VEt Zi(?i) VW(?)) E ZC, then for every 1 5 s 5 m, Tf(+) (7Zs(&)) is of the form
S-l

Lemma7. If for a ground tupleH, T,(Z(c)) kV(V$=t Zi(n,zj)), then T,+i(Z(d)) b V(V&t Tt(Z>(a,Zj))).

Proof of Theorem 6: Suppose that for a natural number IZ, W’L(@)) 3 Tn+l (z(X))) is a valid sentence. We are going to prove that for every m 2 n, V,$Tm(Z(2)) > Tm+t (Z(X))) is a valid sentence, by induction on m. (I) If m = n, by hypothesis. (II) Suppose that V$T,(Z(i)) > T,+t (Z(X))) is a valid sentence. For every clause VT=, Zj(.?,?j) Vy@Z) in T,+t (Z(2)) and for every ground tuple h we have that

c i=s+l Tf(-ls)-l(zi(~ii))VW(~))Ae(~~),

(9)

where Q is a sequence of quantifiers over all the variables 21, . . . . &, 7, not appearing in zss,and Tf(+)+t (7Zs(2$)) is of the form
S-l i=l Q i=s+l Tf(+)(zi(%)) VV(Y)) A@(-%). (10)

By lemma 7 and considering that w only mentions built-in predicates we have that Tm+i (Z(6)) k V(i& Tt (Zi(G,zj)) V ~(a,.?)), and from this and lemma 6 we can conclude that V2(Tm+l(Z(2)) > Tm+2(Z(2))) is a valid sentence. Proof of Theorem 7: Let Q(2) be a domain independent query and r a database instance. Define A, = {i ( r I=T,(Q(i))}. We know that for every n: A,+1 c A,,, therefore A = {Ai 1i < o} is a family of subsets of Ao. But A0 is finite because Q(2) is a domain independent query. Thus, there exists a minimal element A,,, in A. For this element, it holds that for every k 2 m: A, = Ak, since Ak c A,.

Bydefinition T’(+) off, (4%)) = T~(~z~)+I(~~(%)). Then, by the form of (9) and (lo), we conclude that for every i # s, Tf(+)-l (Zi(zi)) = T,~(Q (Zi(fi)), and then, again by definition Off, f(Zi) < f(lZ,). Proof of Corollary 3: The following stratification function from literals to N can be defined: f( TPi) = 0 and f(Pj) = 1, where Pi, Pj are relation names. Proof of Theorem 5: For uniform constraints the residues do not contain quantifiers. Therefore T,(Z(i)) for every n 2 0 is quantifier-free and contains only the variables that occur in i. There are only finitely many inequivalent formulas with this property, and thus T,(Z(.?)) is finite.

79

Similar Documents

Free Essay

Consistent Hashing

...1. Consistent Hashing Consider the following two scenarios. Describe in each case why consistent hashing is likely to perform better than hashing. Scenario 1: There is a fixed set of cache servers implementing consistent hashing and a population of clients who have incomplete views of the system i.e. each client only knows about a fraction of the servers Scenario 2: There is a set of cache servers that change i.e. nodes come and go. Answer 1: Hashing: Hashing is an easy to implement and quick to evaluate algorithm. Let us consider that the fixed set of cache servers are ‘n’, that is the number of nodes. Let us number the computers 0, 1, 2,…,n-1. According to the hashing algorithm, the key value pair (k, v) will be stored on the cache ‘hash(k) mod n’ where hash() is any function that converts the arbitrary string k to a non-negative integer. Keys are distributed evenly in a cluster for any reasonable number of keys, if the hash function being used in hash(k) mod n is a good hash function. Consistent Hashing: Consistent hashing is a cleverer algorithm compared to hashing. Here, the output range of the hash function is treated as a ring or fixed circular space. The largest hash value wraps around to the smallest hash value to form the ring. Each node in the system is assigned a random value which represents its ‘position’ on the ring. Each data item identified by a key k is assigned to a node by hashing data item’s key to yield a position on the ring. We walk clockwise...

Words: 1166 - Pages: 5

Premium Essay

Essay

...the many different levels, formats, and granularities of organizational information to make decisions * Changing the structures of information systems databases are much more costly and harder to do than by just building it right the first time * Transaction processing system, records sales * Point of sales example with organizational boundaries * Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing * Information Granularity: Extend of details within the information (fine, detailed, coarse and abstract). 2 Primary type of Information: Transactional and Analytical Transactional information (data) = encompasses all of the information (data) contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks (for repetitive decision making) * Airline tickets, sales receipts (Model 6-8 chart) Analytical information – encompasses all organizational information (data), and its primary purpose is to support the performing of managerial analysis tasks (such as: whether the organization should build a new manufacturing plant or hire additional sales personnel) * 1/3 of all organizational content is contained within databases * Explicit knowledge – all the things that can be recorded (procedures, policies, formulas, algorithms) * Tacit knowledge – skills, talent etc ...

Words: 1536 - Pages: 7

Premium Essay

Transaction Management Ch 10

...a multi-user database environment give us a special interest in transaction management and concurrency control? Begin by exploring what a transaction is, what its components are, and why it must be managed carefully even in a single-user database environment. Then explain why a multi-user database environment makes transaction management even more critical. Emphasize the following points: • A transaction represents a real-world event such as the sale of a product. • A transaction must be a logical unit of work. That is, no portion of a transaction stands by itself. For example, the product sale has an effect on inventory and, if it is a credit sale, it has an effect on customer balances. • A transaction must take a database from one consistent state to another. Therefore, all parts of a transaction must be executed or the transaction must be aborted. (A consistent state of the database is one in which all data integrity constraints are satisfied.) All transactions have four properties: Atomicity, Consistency, Isolation, and Durability. (These four properties are also known as the ACID test of transactions.) In addition, multiple transactions must conform to the property of serializability. Table IM10.1 provides a good summary of transaction properties. Table IM10.1 Transaction Properties. |Multi-user | |Single-user | |atomicity: Unless all parts of the executed, the transaction is aborted | |Databases | |Databases...

Words: 4230 - Pages: 17

Free Essay

Acc444 Exam1

...ACC 444 EXAM 1 Chapter 1 * Introduction * System: set of two or more interrelated components that interact to achieve a goal. (systems consist of subsystems that support the larger system) * Goal conflict: when subsystem is inconsistent with the goals of another subsystem or with the system as a whole * Goal congruence: when a subsystem achieves its goals while contributing to the organization’s overall goal * Data: facts that are collected, recorded, stored and processed by an information system (activities, resource, people) * Information: data that have been organized and processed to provide meanings and improve the decision-making process (better decisions quantity and quality of information increase) * Information overload: when limits to the amount of information human mind can absorb and process are passes, resulting in a decline in decision making quality and increase in cost of providing that information * Information technology(IT): information designers use to help decision makers more effectively filter and condense information * Value of information: the benefit produced by the information minus the cost of producing it * Seven characteristics of useful information: relevant, reliable, complete, timely, understandable, verifiable, accessible (RRCTUVA) * Information needs and business process * Business process: a set of related, coordinated and structured activities and tasks that are performed by a person or by...

Words: 1532 - Pages: 7

Premium Essay

File Organization Terms and Concepts

...FILE ORGANIZATION TERMS AND CONCEPTS THE DATA HIERARCHY A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases * A bit represents the smallest unit of data a computer can handle * A group of bits, called a byte, represents a single character, which can be a letter, a number, or another symbol (A,2?,S) * A grouping of characters into a word, a group of words, or a complete number (such as a person’s name or age) is called a field Ex: employee Last name, Customer Account number * A group of related fields, such as the student’s name, the course taken, the date, and the grade, comprises a record; Ex: There will be one record for every one * A group of records of the same type is called a file. Ex: Employee Benefits file, Employee payroll file * Database: A group of related files about a specific entity Ex: HR database PROBLEMS WITH THE TRADITIONAL FILE ENVIRONMENT In most organizations, systems tended to grow independently without a company-wide plan. Accounting, finance, manufacturing, human resources, and sales and marketing all developed their own systems and data files. Figure 6-2 illustrates the traditional approach to information processing * In the company as a whole, this process led to multiple master files created, maintained, and operated by separate divisions or departments. As this process goes on for 5 or 10 years, the organization is saddled...

Words: 3898 - Pages: 16

Premium Essay

Practice Questions for Ibm Db2 10 Certification

...native SQL stored procedures. D. NUMTCB parameter can be a value greater than 1 when a stored procedure invokes DB2 utilities. Answer: B 2.If a single row of the PLAN_TABLE has a 'Y' value in more than one of the sort composite columns, what is indicated.? A. The next sort step will perform two sorts. B. There are multiple sorts in the plan step. C. One sort in the plan step will accomplish two tasks. D. Two sorts are performed on the new table of a star join. Answer: C 3.What IBM provided stored procedure will access DB2 real time statistics tables? A. DSNAEXP B. DSNAIMS C. DSNACCOX D. DSNLEUSR Answer: C 4.The EXPLAIN STMTCACHE ALL statement provides information about SQL tuning. Which information is part of the DSN_STATEMENT_CACHE_TABLE? A. Filter factor information. B. Stage 1 and stage 2 information. C. Number of columns used in an index. D. Number of times an SQL statement is executed. Answer: D 5.Which two of the following DB2 performance features will ignore clustering in favor of faster insert performance? (Choose two.) A. Append B. Inline LOBs C. Member cluster D. Volatile table E. Include columns Answer: A,C 6.When is a merge scan join a well performing access path? A. When the number of qualifying rows of the inner and outer table are both large. B. When the query references at least two dimensions and the STARJOIN subsystem...

Words: 8187 - Pages: 33

Premium Essay

Db Testing

...about Database Testing Presented by: Mary R.Sweeney Exceed Technical Training & Consultation Copyright Sammamish Software Services 2003. All rights reserved. 1 Today’s complex software systems access heterogeneous data from a variety of backend databases. The intricate mix of client-server and Web-enabled database applications are extremely difficult to test productively. Testing at the data access layer is the point at which your application communicates with the database. Tests at this level are vital to improve not only your overall test strategy, but also your product’s quality. In this presentation you’ll find out what you need to know to test the SQL database engine, stored procedures, and data views. Find out how to design effective automated tests that exercise the complete database layer of your applications. You’ll learn about the most common and vexing defects related to SQL databases and the best tools available to support your testing efforts. Copyright Sammamish Software Services 2003. All rights Reserved 1 8/26/2004 The Data Access Layer Testing at the data access layer is the point at which your application communicates with the database. ! In this presentation we’ll discuss why tests at this level are vital to improve not only your overall test strategy, but also your product’s quality ! Copyright Sammamish Software Services 2003. All rights reserved. 2 How to design effective automated tests that exercise the complete database layer...

Words: 5030 - Pages: 21

Premium Essay

Good

...AC14/AT11 Database Management Systems TYPICAL QUESTIONS & ANSWERS PART -I OBJECTIVE TYPE QUESTIONS Each Question carries 2 marks. Choosethe correct or the best alternative in the following: Q.1 Which of the following relational algebra operations do not require the participating tables to be union-compatible? (A) Union (B) Intersection (C) Difference (D) Join Ans: (D) Q.2 Which of the following is not a property of transactions? (A) Atomicity (B) Concurrency (C) Isolation (D) Durability Ans: (B) Q.3 Relational Algebra does not have (A) Selection operator. (C) Aggregation operators. (B) Projection operator. (D) Division operator. Ans: (C ) Q.4 Checkpoints are a part of (A) Recovery measures. (C ) Concurrency measures. (B) Security measures. (D) Authorization measures. Ans: (A) Q.5 Tree structures are used to store data in (A) Network model. (B) Relational model. (C) Hierarchical model. (D) File based system. Ans: (C ) Q.6 The language that requires a user to specify the data to be retrieved without specifying exactly how to get it is (A) Procedural DML. (B) Non-Procedural DML. (C) Procedural DDL. (D) Non-Procedural DDL. Ans: (B) Q.7 Precedence graphs help to find a 1 AC14/AT11 Database Management Systems (A) Serializable schedule. (C) Deadlock free schedule. (B) Recoverable schedule. (D) Cascadeless schedule. Ans: (A) Q.8 The rule that a value of a foreign key must appear...

Words: 20217 - Pages: 81

Premium Essay

Asignment

...Oracle® Database Concepts 10g Release 2 (10.2) B14220-02 October 2005 Oracle Database Concepts, 10g Release 2 (10.2) B14220-02 Copyright © 1993, 2005, Oracle. All rights reserved. Primary Author: Michele Cyran Contributing Author: Paul Lane, JP Polk Contributor: Omar Alonso, Penny Avril, Hermann Baer, Sandeepan Banerjee, Mark Bauer, Bill Bridge, Sandra Cheevers, Carol Colrain, Vira Goorah, Mike Hartstein, John Haydu, Wei Hu, Ramkumar Krishnan, Vasudha Krishnaswamy, Bill Lee, Bryn Llewellyn, Rich Long, Diana Lorentz, Paul Manning, Valarie Moore, Mughees Minhas, Gopal Mulagund, Muthu Olagappan, Jennifer Polk, Kathy Rich, John Russell, Viv Schupmann, Bob Thome, Randy Urbano, Michael Verheij, Ron Weiss, Steve Wertheimer The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited. The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement...

Words: 199783 - Pages: 800

Premium Essay

Study Guide

...® OCA Oracle Database 11g: SQL Fundamentals I Exam Guide (Exam 1Z0-051) ABOUT THE AUTHORS John Watson (Oxford, UK) works for BPLC Management Consultants, teaching and consulting throughout Europe and Africa. He was with Oracle University for several years in South Africa, and before that worked for a number of companies, government departments, and NGOs in England and Europe. He is OCP qualified in both database and Application Server administration. John is the author of several books and numerous articles on technology and has 25 years of experience in IT. Roopesh Ramklass (South Africa), OCP, is an independent Oracle specialist with over 10 years of experience in a wide variety of IT environments. These include software design and development, systems analysis, courseware development, and lecturing. He has worked for Oracle Support and taught at Oracle University in South Africa for several years. Roopesh is experienced in managing and executing IT development projects, including infrastructure systems provisioning, software development, and systems integration. About the Technical Editor Bruce Swart (South Africa) works for 2Cana Solutions and has over 14 years of experience in IT. Whilst maintaining a keen interest for teaching others, he has performed several roles including developer, analyst, team leader, administrator, project manager, consultant, and lecturer. He is OCP qualified in both database and developer roles. He has taught at Oracle University...

Words: 150089 - Pages: 601

Premium Essay

Sql Book

...A GUIDE TO SQL Eighth Edition This page intentionally left blank A G U I D E TO S Q L Eighth Edition Philip J. Pratt Grand Valley State University Mary Z. Last University of Mary Hardin-Baylor Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States A Guide to SQL, Eighth Edition Philip J. Pratt, Mary Z. Last Vice President, Publisher: Jack Calhoun Editor-in-Chief: Alex von Rosenberg Senior Acquisitions Editor: Charles McCormick, Jr. Product Manager: Kate Hennessy Development Editor: Jessica Evans Editorial Assistant: Bryn Lathrop Marketing Director: Brian Joyner Marketing Manager: Bryant Chrzan Marketing Communications Manager: Libby Shipp Marketing Coordinator: Suellen Ruttkay Content Project Manager: Matt Hutchinson Art Director: Stacy Jenkins Shirley, Marissa Falco Cover Designer: Joseph Sherman Cover Image: Getty Images/Taxi/Chris Bell Manufacturing Coordinator: Denise Powers © 2009 Course Technology, Cengage Learning ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher....

Words: 48772 - Pages: 196

Premium Essay

Dw Note for Finance

...knowledge from large amounts of data collected in a modern enterprise Data warehousing, machine learning Acquire theoretical background in lectures and literature studies Obtain practical experience on (industrial) tools in practical exercises Data warehousing: construction of a database with only data analysis purpose • Purpose   Business Intelligence (BI) Machine learning: find patterns automatically in databases 2 •1 Literature • Multidimensional Databases and Data Warehousing, Christian S. Jensen, Torben Bach Pedersen, Christian Thomsen, Morgan & Claypool Publishers, 2010 • Data Warehouse Design: Modern Principles and Methodologies, Golfarelli and Rizzi, McGraw-Hill, 2009 • Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications, Elzbieta Malinowski, Esteban Zimányi, Springer, 2008 • The Data Warehouse Lifecycle Toolkit, Kimball et al., Wiley 1998 • The Data Warehouse Toolkit, 2nd Ed., Kimball and Ross, Wiley, 2002 3 Overview • • • • Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction DW topics    Multidimensional modeling ETL Performance optimization 4 •2 What is Business Intelligence (BI)? • From Encyclopedia of Database Systems: “[BI] refers to a set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decisional process, to be made available to the right persons in the most suitable form.” 5 What is Business Intelligence (BI)...

Words: 8493 - Pages: 34

Premium Essay

Database Management System

...DATABASE S YSTEMS DESIGN, IMPLEMENTATION, AND MANAGEMENT CARLOS CORONEL • STEVEN MORRIS • PETER ROB Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Database Systems: Design, Implementation, and Management, Ninth Edition Carlos Coronel, Steven Morris, and Peter Rob Vice President of Editorial, Business: Jack W. Calhoun Publisher: Joe Sabatino Senior Acquisitions Editor: Charles McCormick, Jr. Senior Product Manager: Kate Mason Development Editor: Deb Kaufmann Editorial Assistant: Nora Heink Senior Marketing Communications Manager: Libby Shipp Marketing Coordinator: Suellen Ruttkay Content Product Manager: Matthew Hutchinson Senior Art Director: Stacy Jenkins Shirley Cover Designer: Itzhack Shelomi Cover Image: iStock Images Media Editor: Chris Valentine Manufacturing Coordinator: Julio Esperas Copyeditor: Andrea Schein Proofreader: Foxxe Editorial Indexer: Elizabeth Cunningham Composition: GEX Publishing Services © 2011 Cengage Learning ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted...

Words: 189848 - Pages: 760

Premium Essay

Essay

...no: MCS-2010-23 June 2010 Principles for Distributed Databases in Telecom Environment Imran Ashraf Amir Shahzed Khokhar School of Computing School of Computing Blekinge Institute of Technology Blekinge Institute of Technology Box 520520 Box SE – 372372 Ronneby SE – 25 25 Ronneby Sweden Sweden This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies. Contact Information: Author(s): Imran Ashraf Address: c/o Gulfam Abbas, Älgbacken 4 LGH 081, 37234 Ronneby, Sweden E-mail: im_qamar@yahoo.com Phone: +46 700746734 Amir Shahzed Khokhar Address: c/o Gulfam Abbas, Älgbacken 4 LGH 081, 37234 Ronneby, Sweden E-mail: amir_ask@yahoo.com Phone: +46 760811926 University advisor(s): Professor Lars Lundbarg School of Computing Blekinge Institute of Technology, Sweden External advisor(s): Magnus Vigerlöf Ericsson AB Address: Ölandsgatan 1, 371 23 Karlskrona Phone: +46 10 7140404 School of Computing Blekinge Institute of Technology Box 520 SE – 372 25 Ronneby Sweden Internet Phone Fax : www.bth.se/com : +46 457 38 50 00 : + 46 457 102 45 2 Abstract Centralized databases are becoming bottleneck for organizations that are physically distributed and access data remotely. Data management is easy in centralized databases. However, it carries high communication cost and most importantly...

Words: 17534 - Pages: 71

Free Essay

Database Administration

...VERSANT Dattabase Fundamenttalls Manuall VERSANT Da abase Fundamen a s Manua June 2003 VERSANT Dattabase Fundamenttalls Manuall VERSANT Da abase Fundamen a s Manua June 2003 VERSANT Database Fundamentals Manual This page is intentionally blank. 2 VERSANT Database Fundamentals Manual Table of Contents Chapter 1: System Description ..............................................................................................................8 Versant Developer Suite 6.0: An Overview..........................................................................................9 VERSANT Features ........................................................................................................................12 Storage Architecture ......................................................................................................................22 Software Structure .........................................................................................................................24 Language Interfaces .......................................................................................................................25 System Usage Notes.......................................................................................................................28 Chapter 2: Objects.............................................................................................................................34 Object Types.................................

Words: 44539 - Pages: 179