Information Systems Lecture Notes
Gábor Bodnár
RISC-Linz, Johannes Kepler University, A-4040 Linz, Austria
email: Gabor.Bodnar@risc.uni-linz.ac.at
www: http://www.risc.uni-linz.ac.at/people/gbodnar
January 23, 2005


Contents

0 Introduction
  0.1 Foreword
  0.2 Information Sources
1 Data Modeling
  1.1 Introduction
  1.2 The Entity-Relationship Model
    1.2.1 Entities, Attributes, Relationships
    1.2.2 Classification of Relationships
    1.2.3 Keys
    1.2.4 Entity-Relationship Diagrams
    1.2.5 Entity-Relationship Design
    1.2.6 Exercises
  1.3 The Relational Model
    1.3.1 Relational Structure
    1.3.2 Relational Algebra
    1.3.3 Functional Dependencies
    1.3.4 Normal forms
    1.3.5 Indexing and Hashing
    1.3.6 Exercises
  1.4 SQL
    1.4.1 Data Definition
    1.4.2 Simple Queries
    1.4.3 Database Modification
    1.4.4 Views and Joins
    1.4.5 Embedded SQL
    1.4.6 Exercises
2 Information Systems On-Line
  2.1 On-Line Databases
    2.1.1 Security Control
    2.1.2 Transaction Management
    2.1.3 Static Archives
    2.1.4 Search Engines
    2.1.5 Exercises
  2.2 In Practice
    2.2.1 Web Servers
    2.2.2 Database Management Systems
    2.2.3 Dynamic Content Creation
3 XML
  3.1 XML basics
    3.1.1 Syntax
    3.1.2 XHTML
  3.2 Namespaces
    3.2.1 Uniform Resource Identifiers
    3.2.2 Namespaces in XML
  3.3 XML Schema
    3.3.1 Schema declaration
    3.3.2 Types, Elements, Attributes
    3.3.3 Exercises
  3.4 XPath
    3.4.1 Data Model
    3.4.2 Location Paths
    3.4.3 Expressions, Core Functions
  3.5 XSL Transformations
    3.5.1 Templates
    3.5.2 Additional features
    3.5.3 Exercises
  3.6 XML Query
    3.6.1 Data Model and Types
    3.6.2 Expressions
    3.6.3 Query Prolog
    3.6.4 Exercises

Chapter 0

Introduction
0.1 Foreword

The intended audience of the course is students of mathematics in the fifth semester of their studies. The goal is to provide a brief but up-to-date introduction to information systems, with special attention to web-based solutions. The topic is treated as a supplementary subject to the studies in mathematics, with no aim of a comprehensive, detailed and far-reaching discussion of any of the areas it deals with. These notes introduce only the essential concepts, providing additional examples for easier understanding.

After the course, the student should have a clear concept of elementary problems and solution techniques in relational data modeling, and should have basic knowledge of relational database manipulation. With additional programming skills (PHP/Perl) the student should be able to write simple web interfaces to relational databases using SQL. The student should also be able to create moderately complex XML documents, validate them using XML Schema, address their parts using XPath and, given a knowledge of HTML, transform them into displayable HTML documents with XSLT.

These lecture notes are built up as follows. In Chapter 1 we present an introduction to the theoretical foundations of relational data modeling. In Section 1.2 we discuss the Entity-Relationship (ER) model, from the basic concepts to the standard ER techniques used for data modeling. In Section 1.3 we continue by applying the results of Section 1.2 to relational database design, now dealing also with the manipulation of tables and with redundancy elimination, i.e. normalization. Section 1.4 covers SQL (a standard language for querying, creating, destroying and manipulating relational databases) and demonstrates its usage via several examples.

In Chapter 2 we briefly discuss more practical issues of on-line databases to "scratch the surface" of the enormous amount of knowledge accumulated in the now ubiquitous web-based information systems with relational database support.
In Section 2.1 we discuss two relevant kinds of on-line information systems: transaction management systems and static archives (which are often filled with transaction data for later analysis). We describe the problems arising in both environments and the rigorous requirements they pose for information systems that intend to succeed in these scenarios. Finally, in Section 2.2 we introduce state-of-the-art free software from which one can build web-based information systems for virtually any purpose.

In Chapter 3 we provide a small survey of a few XML technologies related to hierarchical data description and retrieval. In Section 3.1 we introduce the generic XML framework with the example of XHTML, followed by a discussion of data identification problems and solutions in Section 3.2. Section 3.3 deals with document validation. Section 3.4 provides means to refer to parts of XML documents and to navigate in them, and in Section 3.5 we present an introduction to XML document transformations, e.g. for data presentation using HTML. Section 3.6 addresses the problem of data retrieval in a more general context.

As this document is of a purely introductory nature, I have tried to supply enough references for further study. Most of the information sources are freely available on the Internet.

0.2 Information Sources

In this section we summarize the information sources on which these lecture notes rely. These sources are by no means the only ones that can be used to study the topics discussed in these notes; however, they provide a good introduction to them. For the (relational) data modeling part of the notes a standard reference is [3], which gives an extensive overview of and a thorough introduction to databases. The excellent lecture notes of [14] are also worth consulting. For the part on on-line databases one can also use [3], and in particular for search engines [7]. For the XML part a brief, to-the-point introduction can be found in [2]. The original specifications are, however, all available on-line: [8, 9, 10, 13, 11]. Searching the Internet for further information is strongly recommended: by the nature of the topic, an enormous amount of information is available on-line, waiting to be discovered.

Chapter 1

Data Modeling
Data modeling is an important part of database design. The data model focuses on what data should be stored in the database, exploring the relations between data entities. The goal of the data model is to ensure that all data objects required by the database are accurately represented. It provides the overall logical structure (the "blueprint") of the database. In this chapter we take a look at relational data modeling techniques and SQL, a commonly used data manipulation language.

1.1 Introduction

Databases are now ubiquitous in all areas where information has to be stored and processed. Their importance comes from the value of the information they store. The most important criteria for them can be described by two keywords: reliability (covering many subareas, from robustness to issues related to concurrency and security) and efficiency (covering not only the speed of data manipulation but also flexibility towards new requirements and the degree to which the database supports application software development).

A database management system (DBMS) consists of
• a collection of structured, interrelated and persistent data,
• a set of application programs to access, update and manage the data.

Example 1.1. Let us take the tasks of the administration of a university. It has to maintain files of teachers, students, courses, the relationship between courses and students for each semester, the grades arising from these relationships, and countless other things. The administration has to be able to perform the following tasks: add/remove a student/teacher/course, assign courses to teachers/students each semester, add/modify grades of a student for the courses she took, generate the list of grades of students, generate the list of courses given by a teacher, etc. This amounts to managing large amounts of data with additional constraints (for instance courses can build on each other) and numerous reports to be generated. Moreover, it should not be too difficult to upgrade the system to meet new requirements that come from the real world (if the law changes, a new faculty/institute/branch of studies starts, etc.).


Thinking a bit about this example reveals the major problems we run into when trying to solve the task in an ad-hoc way. Here are the problems a DBMS is commonly expected to cope with:
• Data redundancy and inconsistency.
  – The same information should not appear at several places in the system.
  – Related data should be updated consistently.
• Data integrity.
  – The system should ensure that the data stored in it fulfills certain prescribed constraints.
  – The system should adapt to changes of the constraints.
  – The system should be able to recover from crashes.
• Data access.
  – The system should be able to generate answers to a large class of queries.
  – The system should support efficient data retrieval (e.g. by indexing, hashing).
• Data isolation.
  – Data (elements) of different types and/or magnitudes should be handled by the system (like text documents, numerical data, photos, etc.).
• Concurrency.
  – The system should be able to work with multiple users at the same time.
  – Concurrent modifications should not disturb the consistency of the data.
  – Deadlocks should be avoided.
• Security.
  – Handle access rights for users.
  – Secure remote access to the database.

Data abstraction is the most important tool to treat the above problems conceptually. There are three levels of abstraction (sometimes also referred to as database schemes):
• Physical level: It deals with the storage and retrieval of the data on the disk (data and index file structures). This level is usually hidden from the users by the DBMS.
• Conceptual level: At this level it is decided what data are stored and what the relationships between these data are. This is the level of the database administrator.


• View level: It describes a subset of the data in the database for a particular set of users. The same data can be viewed in many different ways (for instance the teacher wants a list of the students enrolled in his course, while the dean of studies is interested in the number of people attending the course). This is the level of the users of the database.

Data models are collections of conceptual tools to describe data, data relations, data constraints and data semantics. There are object-based, record-based and physical data models.

Object-based models:
• Describe data on the conceptual and view levels.
• Provide flexible structuring capabilities.
• Data constraints can be specified explicitly.
Examples are: Entity-Relationship (ER) model, Object-oriented model, Functional model, etc.

Record-based models:
• Describe data on the conceptual and view levels.
• Used for databases with fixed record structure; the sizes of the fields of the records are usually also fixed.
• There exist query languages for record-based models (e.g. SQL).
Examples are: Relational model, Network model, Hierarchical model, etc.

A snapshot of the data contained in a database at a given point in time is called an instance of the database. In contrast, the database scheme is the overall design of the database (structure, relations, constraints in general). An important concept is that of data independence, which expresses that changes in the database scheme at a given level of abstraction should not affect the schemes at higher levels. Thus we have:
• Physical data independence: Changes at the physical level should not make changes in the data model necessary, and therefore application programs need not be rewritten.
• Logical data independence: Changes at the conceptual level should not require application programs to be rewritten.

Standard terminology in DBMSes:
• Data Definition Language (DDL): Used to describe the database scheme (structure, relations, constraints). The compiled DDL statements are stored in the data directory (the metadata describing the database).
• Data Manipulation Language (DML): Used to pose queries (select) or to modify (insert, update, delete) the database. In nonprocedural DMLs the user specifies only what data is needed, while in procedural DMLs the user also specifies how it should be retrieved.


• Database Manager: The program which provides an interface between the users of the database (which can be application programs) and the data stored in the database on the physical level. This program is responsible for enforcing most of the requirements for a DBMS.
• Database Administrator: The person controlling the DBMS. He defines database schemes, the storage structure and the access methods, provides authorization for data access, and specifies integrity constraints.
• Database Users: Application programmers (who interact with the system via DML calls from their programs) and naive users (who interact with the system via application programs).

The overall structure of a DBMS consists of the following components:
• File manager: Responsible for low-level data storage and retrieval on the disk (this functionality is usually provided by the operating system).
• Database manager: See above.
• Query processor: Translates statements in a query language into low-level instructions the Database manager understands.
• DDL compiler: Converts DDL statements into database metadata stored in the data directory.
• Data files: Store the data themselves.
• Data directory: Stores information about the structure of the database.
• Indices: Accelerate data retrieval from the database.
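The difference between DDL and DML statements can be sketched with a few lines of code. The following is only an illustration, assuming Python's built-in sqlite3 module with an in-memory database as a stand-in for a generic DBMS; the table and column names and the sample row are chosen for this sketch and are not prescribed by the notes. SQL itself is treated in Section 1.4.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for a generic DBMS.
conn = sqlite3.connect(":memory:")

# DDL: describe the database scheme (structure and constraints).
conn.execute(
    """
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )
    """
)

# DML: modify the database (insert) ...
conn.execute(
    "INSERT INTO student (student_id, first_name, last_name) VALUES (?, ?, ?)",
    ("0256098", "Susan", "Becker"),
)

# ... and pose a query (select). The statement is nonprocedural: it states
# what data is needed, not how the data should be retrieved.
rows = conn.execute(
    "SELECT first_name, last_name FROM student WHERE student_id = ?",
    ("0256098",),
).fetchall()

# The metadata produced by the DDL statement is kept by the system itself;
# SQLite exposes it through the sqlite_master table.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```

Here `rows` holds the result of the query and `tables` lists the table names recorded in the metadata, which plays the role of the data directory in this small setting.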

1.2 The Entity-Relationship Model

The ER model views the real world as a set of entities and relationships between them. This modeling technique maps well to the relational model discussed in Section 1.3 (the constructs of the ER model can easily be turned into corresponding tables).

1.2.1 Entities, Attributes, Relationships

An entity is an object with identity, i.e. an object which is distinguishable from other objects. Entities are either concrete or abstract objects (persons, things, documents, events, programs, etc.) about which we intend to collect information. An entity set is the set of all entities of the same type (for instance the set of all students enrolled at the university, or the set of all cars registered in Austria). Entities are described by attributes which are, from the formal point of view, functions assigning to an entity an element from a domain of permitted values.

Example 1.2. Let the entity set be the students of the university and, for the sake of simplicity, let us describe a student only with the attributes first-name, last-name, student-ID, birth-date. The first-name and last-name attributes can take values from the nonempty strings of Latin characters, which will be the domain of these attributes (viewing the attributes as functions, from a strict mathematical point of view what we call here the domain is actually the codomain of the function, but let us not change the well-established terminology of the ER model). The student-ID can be a seven-digit positive number and the birth-date can be a string of the format dd-mm-yyyy with the usual semantics. An entity (a particular student of the university) can then be described as {(first-name, Susan), (last-name, Becker), (student-ID, 0256098), (birth-date, 27-06-1978)}.

Of course, in any kind of information processing the entity appears only via its description by the set of attributes which are relevant from the point of view of the task to be carried out.

A relationship represents an association between two or more entities. A relationship set is the set of relationships of the same type. Recall that a relation over the sets $E_1, \dots, E_r$ is a subset of $E_1 \times \cdots \times E_r$ (in ER terminology this is a relationship set).

Example 1.3. Continuing Example 1.2, let another entity set be the set of courses offered in the current semester at the university. Let the attributes describing a course be course-ID, teacher-ID, course-title. Let a course be described by {(course-ID, 322527), (teacher-ID, 024532), (course-title, Analysis I)}. If the student of the previous example takes this course, there is a relationship between the two entities (the student and the course). If we take the entity sets of all students and all the courses offered, the arising relationship set will describe which student takes which courses this semester.

It can be easily recognized that a relationship can also be taken as an abstract entity, and hence a relationship set can be described just as an entity set.

Example 1.4. Continuing Example 1.3, the relationship set that associates students with the courses they take in a semester can be described via the entity set whose elements are described by the attributes student-ID and course-ID. The relationship between Susan Becker and the Analysis I course would then be described by {(student-ID, 0256098), (course-ID, 322527)}.
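The notions of Examples 1.2 to 1.4 can be written down directly as data. The sketch below is one possible encoding in Python, not the only one: entities become attribute-value mappings, entity sets are represented by their identifying values, and the relationship set is checked to be a subset of the Cartesian product. The second student and the variable names are made up for illustration.

```python
from itertools import product

# Entities as attribute-value mappings (values taken from Examples 1.2-1.4).
susan = {
    "first-name": "Susan",
    "last-name": "Becker",
    "student-ID": "0256098",
    "birth-date": "27-06-1978",
}
analysis = {
    "course-ID": "322527",
    "teacher-ID": "024532",
    "course-title": "Analysis I",
}

# A second, made-up student, only to make the entity set non-trivial.
tom = {
    "first-name": "Tom",
    "last-name": "Example",
    "student-ID": "0300001",
    "birth-date": "01-01-1980",
}

# Entity sets, represented here by the identifying attribute values.
students = {susan["student-ID"], tom["student-ID"]}
courses = {analysis["course-ID"]}

# The relationship set "takes" is a subset of students x courses.
takes = {(susan["student-ID"], analysis["course-ID"])}
assert takes <= set(product(students, courses))

# Viewed as an abstract entity (Example 1.4), each relationship is again
# described by attributes.
attendance = [{"student-ID": s, "course-ID": c} for (s, c) in sorted(takes)]
```

Only one of the two possible (student, course) pairs belongs to `takes`, which reflects that a relationship set is in general a proper subset of the product of the entity sets.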

1.2.2 Classification of Relationships

The degree of a relationship is the number of entity sets it associates with each other. In Example 1.4 the degree of the relationship is two (students, courses). Higher-degree relationships can be decomposed into degree-two ones, so the latter are the most important case.

The (mapping) cardinality of a relationship expresses the number of entities that can be associated via it. The cardinality of a binary relationship between entity sets A and B can be

One-to-One (1:1): An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
One-to-Many (1:N): An entity in A is associated with any number of entities in B. An entity in B is associated with at most one entity in A.
Many-to-Many (M:N): An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
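The three cases can be told apart mechanically for a given set of associated pairs. The function below is a minimal sketch with a hypothetical name (`cardinality`); note that it classifies one particular instance of a relationship set, whereas the cardinality in the ER model is a constraint on all admissible instances, so the result is only the most restrictive class consistent with the data at hand.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship set, given as (a, b) pairs with a in A
    and b in B, as '1:1', '1:N', 'N:1' or 'M:N'."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B-partners of each a
    per_b = Counter(b for _, b in pairs)  # number of A-partners of each b
    a_multi = any(n > 1 for n in per_a.values())
    b_multi = any(n > 1 for n in per_b.values())
    if a_multi and b_multi:
        return "M:N"
    if a_multi:
        return "1:N"  # some a has many partners, every b has at most one
    if b_multi:
        return "N:1"  # the mirrored case
    return "1:1"

# Made-up identifiers: members borrowing books, students attending courses.
borrows = {("m1", "b1"), ("m1", "b2"), ("m2", "b3")}
attends = {("s1", "c1"), ("s1", "c2"), ("s2", "c1")}
kinds = (cardinality(borrows), cardinality(attends))
```

With these sample pairs, `borrows` is classified as One-to-Many and `attends` as Many-to-Many.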


Remark 1.5. Because of symmetry, a Many-to-One relationship between A and B is a One-to-Many relationship between B and A.

Example 1.6. A typical One-to-One relationship would be the one between students and ID cards: every student can have at most one ID card and every ID card can belong to at most one student (if the student lost the card, it should be immediately invalidated, but it might take some time to produce a new card). A typical One-to-Many relationship would be the one between members and books of a library: a member can borrow many books, but a book (which means here a physical copy) can be borrowed by only one person at a time. A typical Many-to-Many relationship would be the one between students and courses: a student can take many courses and a course can be attended by many students.

The existence dependency of a binary relationship expresses that the existence of an entity depends on another, related entity. If in a relationship between entities X and Y the existence of X depends on the existence of Y, we call Y the dominant and X the subordinate entity.

Example 1.7. Let us take as examples the relationships between teachers and courses and between students and courses. The first one has an existence dependency with teachers being the dominant entities and courses the subordinate ones. That is, for each course there must be someone to teach it. There is no existence dependency between students and courses, which means that certain courses are not obligatory for any student, thus there might be nobody taking them in a given semester.

The direction of a binary relationship indicates the originating entity set. The entity set from which a relationship originates is the parent, while the one at which the relationship terminates is the child entity set. In a One-to-One relationship the direction is from the independent entity to the dependent entity. If both entities are independent, the direction is arbitrary. With One-to-Many relationships, the entity occurring once is the parent. The direction of Many-to-Many relationships is arbitrary.

Example 1.8. Let us take the One-to-One part of Example 1.6. The direction should be from students to ID cards, as the existence of a student implies the existence of an ID card (which could be invalidated if the student has lost it).

1.2.3 Keys

Since entities must be distinguishable, we must be able to choose a set of attributes for each entity set so that no two entities in the set are described by the same attribute values. Usually the full set of attributes is more than enough to distinguish entities, and what we try to do is to find small subsets of attributes that still have this property.

A superkey is a nonempty set of attributes which allows us to uniquely identify an entity of the entity set. A superkey which does not have a proper subset which is also a superkey is called a candidate key. A primary key is a candidate key which is chosen by the database administrator to identify entities in an entity set.
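To make the definitions concrete, the following sketch searches for superkeys and candidate keys in a small sample of entities. The function names (`is_superkey`, `candidate_keys`) and the sample students are assumptions made for illustration. Keep in mind that a key is a design decision about all admissible entities; uniqueness in one sample can only refute a proposed key, never prove it.

```python
from itertools import combinations

def is_superkey(entities, attrs):
    """True if no two entities agree on all attributes in attrs."""
    seen = set()
    for e in entities:
        values = tuple(e[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True

def candidate_keys(entities, attributes):
    """All minimal superkeys of the given sample, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Supersets of a key found earlier are superkeys but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(entities, attrs):
                keys.append(attrs)
    return keys

attributes = ("student-ID", "first-name", "last-name", "birth-date")
sample = [
    {"student-ID": "0256098", "first-name": "Susan", "last-name": "Becker",
     "birth-date": "27-06-1978"},
    {"student-ID": "0256099", "first-name": "Susan", "last-name": "Muster",
     "birth-date": "27-06-1978"},
    {"student-ID": "0256100", "first-name": "Hans", "last-name": "Becker",
     "birth-date": "03-02-1979"},
]
keys = candidate_keys(sample, attributes)
```

In this sample several attribute sets turn out to be minimal, but only student-ID is a sensible choice of primary key, since names and birth dates may coincide as soon as more students are added.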


Remark 1.9. There might be more than one candidate key (since set inclusion is only a partial order). Which candidate key becomes the primary key is a matter of choice.

An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set. One that does have a primary key is called a strong entity set. A weak entity set must be part of a One-to-Many relationship in a subordinate role, while the entity set with the dominant role must be a strong one, so that it allows distinction between the entities of the weak set via the relation. A discriminator is a set of attributes in the weak entity set that allows the distinction of its entities via a given relation with a strong entity set on which the weak one is existence dependent. A primary key for a weak entity set is the primary key of the strong entity set to which it is related, together with a discriminator with respect to this relation.

Example 1.10. Take the entity set of all the telephones in Austria, where an entity is described with the attribute phone-number (without any country or area codes). This is a weak entity set because many phones in different areas can have the same number (although the entities, that is the phones, are different). To make the system work we have to introduce the entity set of phone areas, where an entity is described by the attribute area-code. The One-to-Many relation associates with an area each phone in it. Since the areas cover Austria completely, no phones exist without an area, thus there is an existence dependency in which the areas are the dominant and the phones are the subordinate entities. The area-code is a superkey, a candidate key, and (after we declare it so) a primary key of the entity set of phone areas. The phone-number is a discriminator of the entity set of telephones with respect to the primary key we have chosen for the phone areas, via the relation we established between areas and phones. Finally, the area-code together with the phone-number forms a primary key for the entity set of telephones in Austria.

The primary key of a relationship set is formed from the primary keys of the entity sets in the relationship set. A foreign key is a set of attributes for an entity set on the "Many" side of a One-to-Many binary relationship, which identifies the corresponding parent entity for each entity of the set. Foreign keys provide a means to maintain data integrity in databases (called referential integrity). For instance, in the example of Figure 1.1, if we add member-ID to the attributes of the Book entity set (with a special null value for books not borrowed), it becomes a foreign key that identifies the borrower of any book in the library.
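The foreign key of the library example can be tried out in a small experiment. The sketch below assumes Python's built-in sqlite3 module (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`); the table layout follows the Member and Book entity sets, while the sample rows and the helper `try_borrow` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE member (
        member_id  INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        author    TEXT,
        title     TEXT,
        -- Foreign key on the "Many" side; NULL means "not borrowed".
        member_id INTEGER REFERENCES member(member_id)
    );
    """
)

# Made-up sample rows.
conn.execute("INSERT INTO member VALUES (1, 'Monika', 'Beispiel')")
conn.execute("INSERT INTO book VALUES (10, 'Author A', 'Title A', 1)")
conn.execute("INSERT INTO book VALUES (11, 'Author B', 'Title B', NULL)")

def try_borrow(book_id, member_id):
    """Record a loan; return False if referential integrity would be violated."""
    try:
        conn.execute(
            "UPDATE book SET member_id = ? WHERE book_id = ?",
            (member_id, book_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_borrow(11, 1)         # member 1 exists
rejected = try_borrow(11, 99)  # there is no member 99
```

The second call is refused by the database itself, which is exactly the referential integrity guarantee: a book can never point to a borrower that does not exist.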

1.2.4 Entity-Relationship Diagrams

ER diagrams are a graphical way to express the logical structure of a database. There is no standard for ER diagrams; therefore there are many "dialects" appearing in articles, CASE tools, etc. We use the following notation:
• Entity sets are represented by labeled rectangles; for weak entity sets the rectangles have a double border. Entity set labels (names) should be singular nouns.


• Relationship sets are represented by a solid line connecting two entity sets. The name of the relationship is written beside the line. Relationship names should be verbs.
• Attributes, when included, are listed inside the entity set rectangle. Attribute names should be singular nouns. Members of the primary key can be denoted by underlining.
• If the line ends in an arrow, the relationship cardinality is Many for the entity set at that end. Otherwise the cardinality is One.
• Existence dependency is denoted by a bar crossing the line of the relationship set at the dominant entity set.

Figure 1.1 provides an example.
[Figure: entity set Member (member-ID, first-name, last-name), relationship Borrows, entity set Book (book-ID, author, title)]

Figure 1.1: An ER diagram

A database conforming to an ER diagram can be represented by a collection of tables (more on relational databases comes in Section 1.3). To each entity set corresponds a table with as many columns as the entity set has attributes. A row of the table represents an entity and contains the corresponding attribute values. For weak entity sets we also attach the columns of the primary key of the strong entity set on which they depend. Relationship sets can also be described this way. Tables will contain only a subset of all possible rows: the ones which represent entities in the system we describe. As the system changes in time, we may add, delete or modify rows. Table 1.1 shows an example.

area-code   area-name
0732        Linz
0662        Salzburg
0512        Innsbruck
05574       Bregenz
01          Wien
...

phone-number   owner-first-name   owner-last-name   area-code
4554323        Monika             Beispiel          01
243486         Thomas             Muster            0732
875678         Hans               Schmidt           0662
875678         Hans               Schmidt           0732
34452          John               Example           05574
...

Table 1.1: ER diagram to tables
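The two tables of Table 1.1 can be declared so that the database itself enforces the keys of Example 1.10. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, the SQL column names are adapted from the attribute names, the rows are those read from Table 1.1, and the helper `insert_phone` is hypothetical. The weak entity set of phones receives the composite primary key (area_code, phone_number).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check references
conn.executescript(
    """
    CREATE TABLE area (
        area_code TEXT PRIMARY KEY,
        area_name TEXT NOT NULL
    );
    CREATE TABLE phone (
        phone_number     TEXT NOT NULL,   -- the discriminator
        owner_first_name TEXT,
        owner_last_name  TEXT,
        area_code        TEXT NOT NULL REFERENCES area(area_code),
        PRIMARY KEY (area_code, phone_number)
    );
    """
)

areas = [("0732", "Linz"), ("0662", "Salzburg"), ("0512", "Innsbruck"),
         ("05574", "Bregenz"), ("01", "Wien")]
phones = [
    ("4554323", "Monika", "Beispiel", "01"),
    ("243486", "Thomas", "Muster", "0732"),
    ("875678", "Hans", "Schmidt", "0662"),
    ("875678", "Hans", "Schmidt", "0732"),
    ("34452", "John", "Example", "05574"),
]
conn.executemany("INSERT INTO area VALUES (?, ?)", areas)
conn.executemany("INSERT INTO phone VALUES (?, ?, ?, ?)", phones)

def insert_phone(row):
    """Try to add a phone; return False if a key constraint rejects the row."""
    try:
        conn.execute("INSERT INTO phone VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# The same phone number occurs in two areas; the area code tells them apart.
same_number = conn.execute(
    "SELECT area_code FROM phone WHERE phone_number = ? ORDER BY area_code",
    ("875678",),
).fetchall()
```

Calling `insert_phone` with an existing (area code, phone number) pair or with an unknown area code is refused, whereas reusing a phone number in a new area is accepted.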

1.2.5 Entity-Relationship Design

Many-to-Many relationships cannot be taken over directly to the relational model, which is important in practice; therefore such relationships should be resolved to One-to-Many relationships as early in the design phase as possible. The method is to introduce a new entity type for the relation itself and associate the original entity types to this new one via One-to-Many relationships.


Example 1.11. Take the entity sets of students and courses and the Many-to-Many relationship set between them that associates students with the courses they attend. We then introduce a new entity set named Attendance with two attributes, student-ID and course-ID, whose entities express the fact that a certain student attends a given course. We establish a relationship between a student and an entity in Attendance if the student-ID is the same, and we act analogously for courses and entities of Attendance. The ER diagram is shown in Figure 1.2. Please note that the new ER diagram requires that for each Attendance entity a student and a course entity exist.
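The construction of Example 1.11 translates directly into three tables. The following sketch again assumes Python's built-in sqlite3 module; the second student, the second course and the helper functions `courses_of` and `students_of` are made up for illustration. Each Attendance row refers to exactly one student and one course, so the Many-to-Many relationship is replaced by two One-to-Many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE course (
        course_id    TEXT PRIMARY KEY,
        course_title TEXT
    );
    -- The new entity set: one row per (student, course) pair.
    CREATE TABLE attendance (
        student_id TEXT NOT NULL REFERENCES student(student_id),
        course_id  TEXT NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    """
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("0256098", "Susan", "Becker"),
    ("0300001", "Tom", "Example"),      # made-up student
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("322527", "Analysis I"),
    ("322528", "Algebra I"),            # made-up course
])
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("0256098", "322527"), ("0256098", "322528"), ("0300001", "322527"),
])

def courses_of(student_id):
    """Titles of the courses a student attends, in alphabetical order."""
    rows = conn.execute(
        """SELECT c.course_title
           FROM attendance a JOIN course c ON c.course_id = a.course_id
           WHERE a.student_id = ? ORDER BY c.course_title""",
        (student_id,),
    )
    return [title for (title,) in rows]

def students_of(course_id):
    """Last names of the students attending a course, in alphabetical order."""
    rows = conn.execute(
        """SELECT s.last_name
           FROM attendance a JOIN student s ON s.student_id = a.student_id
           WHERE a.course_id = ? ORDER BY s.last_name""",
        (course_id,),
    )
    return [name for (name,) in rows]
```

Both directions of the original relationship remain answerable: `courses_of` follows Attendance from the student side and `students_of` from the course side.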

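[Figure, two ER diagrams. First: entity set Student (student-ID, first-name, last-name), relationship Attends, entity set Course (course-ID, course-title). Second: entity sets Student (student-ID, first-name, last-name), Attendance (student-ID, course-ID) and Course (course-ID, course-title), with relationships Takes and Attended by.]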

Figure 1.2: Resolving Many-to-Many relationships

Complex relationships of degree greater than two can, and should, be resolved to binary One-to-Many relationships using the same technique as for resolving Many-to-Many relationships (see the first two ER diagrams in Figure 1.3). Relationships between relationships should also be described with this technique for a conceptually clearer presentation. That is, one should create new entities for relationship sets, considering them on a higher abstraction level, and establish the relationships between them (see Figure 1.3). This method is called aggregation.

[Figure 1.3, three ER diagrams. First: the entity sets Student (student-ID, last-name, first-name), Course (course-ID, course-title) and Assignment (assignment-ID, difficulty) with the relationships Attends and Works-on. Second: the additional entity sets Attendance (student-ID, course-ID) and Course-Work (student-ID, course-ID, assignment-ID) with the relationships Takes, Attended-by, Does, Assigns and Used-in. Third: the same entity sets with the relationships Takes, Attended-by, Assigned-to and Used-in.]

Figure 1.3: Resolving complex relationships and aggregating

If two or more entity sets share common attributes, we can form their generalization by taking only the common attributes, which can be considered as creating an abstraction type (entity set) that describes the similarities of the original entity sets. The generalized entity set, which is at a higher level of abstraction, is also called the supertype, and the original entity sets, at the lower abstraction level, are the subtypes. Viewing the construction top-down, the subtypes inherit the attributes of the supertype. Subtypes can be mutually exclusive (an entity can be a member of only one of the subtypes) or overlapping (otherwise). This process can be applied repeatedly in both directions, resulting in a generalization hierarchy. Generalization can be expressed by a triangular element of the ER diagram via which the supertype is connected to the subtypes.

Example 1.12. Take the entity set of the vehicles registered in Austria, described by the attributes serial-number, registration-number, type, owner-ID (and perhaps also others). By attaching new attributes that describe important information only for special types of vehicles we can get subtypes for personal cars (adding for instance the attributes PS and color), trucks (adding maximal-weight and type-of-load), buses (adding nr-of-seats and comfort-level), etc.

In practice, the phase of data modeling is preceded by the important phase of requirement analysis. Very briefly, requirement analysis is used to achieve the following goals:
• Determine what data (viewed as elementary objects) is necessary to include in the database.
• Collect the descriptive information about these objects.
• Identify and classify relationships between these objects.
• Identify what transactions will be executed on the database and how they will affect the data objects.
• Identify data integrity rules.
The last two points already concern the functional view of the database.

Then in the data modeling phase, using the ER model, the tasks are the following:
• Identifying entity and relationship sets.
• Drafting the initial ER diagram.
• Refining the ER diagram by resolving complex relationships.
• Adding key attributes to the diagram.
• Adding non-key attributes.
• Constructing generalization hierarchies.
• Adding integrity rules to the model.

The ER modeling method provides a wide variety of possibilities and there is no standardized way to achieve optimal designs. One has to balance between possibilities such as whether to describe a class of objects by an entity set or via a relationship set, whether to allow weak entity sets or prefer strong ones, whether to apply generalization or aggregation or omit them to keep the ER model simpler, and so on.
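The generalization of Example 1.12 maps naturally onto class inheritance. The following is only a sketch, assuming Python dataclasses as the modeling vehicle; the attribute spellings (e.g. vehicle_type for the attribute type) and the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    """Supertype: the attributes shared by all registered vehicles."""
    serial_number: str
    registration_number: str
    vehicle_type: str  # the attribute "type" of Example 1.12, renamed here
    owner_id: str


@dataclass
class PersonalCar(Vehicle):
    """Subtype: inherits the Vehicle attributes and adds its own."""
    ps: int
    color: str


@dataclass
class Truck(Vehicle):
    """Subtype with truck-specific attributes."""
    maximal_weight: float
    type_of_load: str


# Hypothetical sample entity; none of these values come from the notes.
car = PersonalCar("SN-001", "REG-123", "sedan", "owner-42", ps=110, color="red")

# Names of the attributes inherited from the supertype.
inherited = [f.name for f in fields(Vehicle)]
```

Viewed top-down, every PersonalCar is also a Vehicle, which mirrors the statement that subtypes inherit the attributes of the supertype.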

1.2.6 Exercises

Since 1 October 2003 car dealers in the EU can offer models of competing manufacturers in the same exhibition area. A dealer is contracted with certain car manufacturers, who supply him with certain models, which are eventually bought by certain customers. Taking this simplified model of car selling at a car dealer, do the following.

Exercise 1.2.1. Identify the entities and the entity sets they belong to in this model.

Exercise 1.2.2. Which attributes would you choose to describe entities in the entity sets you identified previously? Of course, the sets of attributes can be made quite big if one wants to model reality more and more accurately. Here it is enough to introduce a few attributes without aiming at completeness.

Exercise 1.2.3. Enumerate the relationships between the entities in this model.

Exercise 1.2.4. Sketch a first ER diagram of this model.

Exercise 1.2.5. Resolve complex relationships (relationships with degree > 2 and N:M relationships) and draw the refined ER diagram of the model.

Exercise 1.2.6. Decide which are the most important entities in the model and choose primary keys for them. You might want to introduce new attributes at this step: remember that, for instance, student IDs are introduced artificially to describe students (as important entities) in the corresponding model.

Exercise 1.2.7. Prescribe integrity constraints for the model. (Which entity's existence depends on which other? Are there foreign keys?)

1.3 The Relational Model

The relational model was formally introduced by E. F. Codd in 1970 and became the dominant modeling technique in the IT sector. The relational model represents data in the form of two-dimensional tables, and a relational database is a collection of such tables each having a unique name.

1.3.1 Relational Structure

Let $D_i$ be domains (sets) for attributes (denoted by distinct names, say $A_i$) for $i = 1, \dots, n$. Then an n-ary relation over the $D_i$ is a subset of their cartesian product:

$$R \subseteq D_1 \times \cdots \times D_n.$$

If $R$ is a finite set, it can be described as a table with $m = \#R$ rows and $n$ columns:

$$\begin{array}{ccc} d_{11} & \dots & d_{1n} \\ d_{21} & \dots & d_{2n} \\ \vdots & & \vdots \\ d_{m1} & \dots & d_{mn} \end{array}$$

A row (or tuple) represents that the attribute values appearing in it are in relationship with each other by $R$. The attribute values are arranged in each row so that the $j$th column contains the value from $D_j$.

Remark 1.13. We typeset attribute and table names in typewriter style, with the first letter of table names capitalized.

Example 1.14. Relations represented by tables can be found in Table 1.1 on page 14. In the first separated row we find the attribute names; the tables themselves consist of the data below. The domain of area-code and phone-number is the set of natural numbers, and the domain of the other attributes is the set of strings (of course, with more effort we could also find smaller domains).

From the viewpoint of the modeling process, we consider every attribute value in a table atomic. In other words, the table is flat (the cells have no internal relational structure). A relational database is a collection of tables with unique names.

Remark 1.15. From now on we will use the words “relation” and “table”, just as “row” and “tuple”, synonymously. From the definitions it follows that a table has no two identical rows and that the order of rows is unspecified. Moreover, the order of columns can also be changed as long as each row contains the corresponding attribute values in the same column.

Atomicity is a relative notion. For instance, virtually all relational database managers provide the domain of dates as a built-in type, i.e. as an atomic value. However, if we want, we can split it further into year, month and day components.

We distinguish between the overall design of the database, which is called the database scheme, and a snapshot of the database at a point in time, which is called a database instance. We also call a list of attributes with their corresponding domains a relation scheme.

The notions of superkey, candidate key, primary key and foreign key from Section 1.2 apply to the relational model. In the context of relations, a nonempty set $K$ of attributes is a superkey for $R$ if for any $r, s \in R$, $r \neq s$ implies $r|_K \neq s|_K$ (where $r|_K$ denotes the attribute values of $r$ belonging to attributes in $K$).

Example 1.16. Continuing Example 1.14, let us call the first table Phone-areas and the second Phone-numbers. Then we get a relational database, which can play the role of a simple phone book. We fixed the order of columns (i.e. attributes) in a certain way, but actually any other order would do. There are no two identical rows in any of the tables due to the area-code attribute in Phone-numbers. The set {area-code, area-name} is a superkey in Phone-areas, and {area-code} is a candidate key, which is chosen to be the primary key for Phone-areas. Similarly, in Phone-numbers {area-code, phone-number} is chosen to be the primary key. We can also notice that the area-code attribute appears in both tables. This way the two tables are related to each other in a One-to-Many relationship (area-code is a foreign key in Phone-numbers).

Finally, we can mention that sometimes it is unavoidable to “leave certain cells blank” in a table, simply because information might not be available there. This is done by introducing special null values in the domains of the attributes (we will denote them generally by NULL). However, this makes maintaining the integrity of the database more difficult, since in each transformation (updating and deleting rows, etc.) one has to check the special case of nullity. Moreover, since null values represent missing information, comparisons involving them are always false.
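The superkey condition can be tested mechanically on a given table. The sketch below assumes that rows are represented as Python dictionaries and uses invented rows shaped like Phone-areas (Table 1.1 is not reproduced here, so the values are hypothetical). Note that such a test only inspects one database instance, whereas keys are a property of the relation scheme.

```python
def is_superkey(rows, key_attrs):
    """True if no two distinct rows agree on all attributes in key_attrs."""
    seen = set()
    for row in rows:
        restriction = tuple(row[a] for a in key_attrs)  # r|K in the text
        if restriction in seen:
            return False
        seen.add(restriction)
    return True


def is_candidate_key(rows, key_attrs):
    """True if key_attrs is a superkey and dropping any one attribute breaks it."""
    if not is_superkey(rows, key_attrs):
        return False
    if len(key_attrs) == 1:
        return True  # a one-attribute superkey cannot be reduced further
    return all(
        not is_superkey(rows, [a for a in key_attrs if a != dropped])
        for dropped in key_attrs
    )


# Hypothetical instance; the rows are assumed to be pairwise distinct.
phone_areas = [
    {"area-code": 1, "area-name": "A"},
    {"area-code": 2, "area-name": "B"},
    {"area-code": 3, "area-name": "B"},
]

superkey = is_superkey(phone_areas, ["area-code", "area-name"])
candidate = is_candidate_key(phone_areas, ["area-code"])
```

Removing single attributes suffices for the minimality test because every superset of a superkey is again a superkey.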


It is also commonly required that attributes of primary keys are never null (entity integrity). And for foreign keys it is required that (in a row) either all attribute values in the key are null or they match a row in the table for which the foreign key is the primary key (referential integrity).
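Most database managers can enforce both rules themselves. The following sketch uses the sqlite3 module from the Python standard library; the tables area and contact and all their rows are hypothetical and were chosen so that the foreign key is not part of a primary key (and may therefore be null).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Hypothetical tables: a contact refers to an area, or to none (NULL).
conn.execute("CREATE TABLE area (code TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute(
    "CREATE TABLE contact ("
    " id TEXT PRIMARY KEY NOT NULL,"
    " area_code TEXT REFERENCES area(code))"
)


def accepted(sql):
    """Execute one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


ok_area = accepted("INSERT INTO area VALUES ('A1', 'Area one')")
ok_match = accepted("INSERT INTO contact VALUES ('c1', 'A1')")    # matches a row in area
ok_null = accepted("INSERT INTO contact VALUES ('c2', NULL)")     # null foreign key
bad_ref = accepted("INSERT INTO contact VALUES ('c3', 'ZZ')")     # dangling reference
bad_key = accepted("INSERT INTO area VALUES (NULL, 'Nowhere')")   # null primary key
```

The first three insertions respect entity and referential integrity; the last two violate referential and entity integrity, respectively, and are rejected.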

1.3.2 Relational Algebra

The relational algebra is a (procedural) query language that allows us to execute several kinds of operations on a relational database. The fundamental operations are the following: select (unary), project (unary), rename (unary), cartesian product (binary), set theoretic union (binary), set theoretic difference (binary). The additional operations, which are defined in terms of the fundamental ones, are: set theoretic intersection (binary), natural join (binary), division (binary).
Student
sid      fname    lname
0251563  Werner   Schmidt
0245654  Andrea   Ritter
0245675  Daniela  Schmidt

Attendance
sid      cid
0251563  327456
0251563  327564
0245654  327456

Course
cid     title
327456  Analysis I
327564  Algebra I

Table 1.2: A small database of students and courses

The operations may have additional parameters besides their input arguments; they are put in the subscript of the notation. The definitions of the operations are as follows:

select The operation selects the tuples of the relation that satisfy the predicate which is an additional parameter to it. It is denoted by $\sigma$. For example $\sigma_{\mathtt{lname}=\text{"Schmidt"}}(\mathtt{Student})$ results in the relation (represented in table form)

0251563  Werner   Schmidt
0245675  Daniela  Schmidt
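The select operation can be sketched in a few lines of Python. The representation of a relation as a list of dictionaries and the function name select are assumptions made for this illustration; the rows are those of the Student table in Table 1.2.

```python
# The Student relation of Table 1.2, one dictionary per tuple.
student = [
    {"sid": "0251563", "fname": "Werner", "lname": "Schmidt"},
    {"sid": "0245654", "fname": "Andrea", "lname": "Ritter"},
    {"sid": "0245675", "fname": "Daniela", "lname": "Schmidt"},
]


def select(relation, predicate):
    """Return the tuples of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# The predicate plays the role of the subscript of sigma.
schmidts = select(student, lambda row: row["lname"] == "Schmidt")
```

The result contains the tuples with sid 0251563 and 0245675, in agreement with the table shown above.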

The atomic predicates allowed in the parameter of σ are =, ≠, < ...

3.1 XML Basics

... is the markup. The markup consists of tags, which come in pairs: the former is the starting tag and the latter is the ending tag. The outermost tag is called the root element (in the example it is folder; the actual tags are formed by adding the < and > signs and prepending / for the end-tag). There must be exactly one root element in an XML document, into which all the other tags are nested. The nesting must be a proper one, in the sense that for each non-root element there is a unique element which contains it and which contains no other element that also contains it.

An XML document may also contain a prolog, which is just the text before the root element. The prolog is not part of the structured data; it usually contains compiler directives or the self-identification of the document.

Elements

Elements are the primary structuring units of XML documents. They are marked with start- and end-tags, where the name of the end-tag must match the name of the start-tag with a ’/’ prepended. The element, as a logical unit, contains the start-tag, its content and the end-tag. Between the start- and end-tags, elements can contain any combination of character data and other elements. The content of an element can be
• element, if the element contains only elements (e.g. folder in the previous example),
• character, if it contains only character data (e.g. to in the previous example),
• mixed, if it contains both (e.g. email in the previous example),
• empty, if it contains nothing.
Empty-content elements can be abbreviated by putting a slash before the > sign of their start-tag and omitting the end-tag.

The relationships between the elements can be classified quite naturally.

Child element: an element nested in another one at the first nesting level (e.g. both email elements are children of folder).

Parent element: the reverse of the child relationship (e.g. folder is the parent of both email elements).

Sibling element: elements with the same parent (e.g. the elements from, to, cc, subject inside an email element are siblings).

Descendant element: an element in the transitive closure of the child relationship (e.g. any of the to elements is a descendant of folder).

Ancestor element: an element in the transitive closure of the parent relationship (e.g. folder is an ancestor of every other element, which is not surprising as it is the root element).

Please note that the parent is uniquely defined as a consequence of the proper nesting we required from XML documents.

Names for elements can be chosen by the user according to the following rules.
• Names are case sensitive.
• Names cannot contain spaces.
• Names starting with “xml” (in any case combination) are reserved for standardization.
• Names can only start with letters or with the ’_’, ’:’ characters.
• They can contain alphanumeric characters and the ’_’, ’-’, ’:’, ’.’ characters.
The colon character, although allowed in names, should only be used for namespaces (see Section 3.2).
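The element relationships described above can be explored with the xml.etree.ElementTree module of the Python standard library. The document below is a hypothetical stand-in for the e-mail folder example; its exact markup (including the date attribute and the empty note element) is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the spirit of the folder/email example.
doc = """<folder>
  <email date="20 Aug 2003">
    <from>robert@company.com</from>
    <to>oliver@company.com</to>
    <subject>Meeting</subject>
    Could we meet this week?
  </email>
  <email>
    <from>oliver@company.com</from>
    <to>robert@company.com</to>
    <subject>Re: Meeting</subject>
    OK.
  </email>
  <note/>
</folder>"""

root = ET.fromstring(doc)

children = [child.tag for child in root]                     # first nesting level
descendants = [el.tag for el in root.iter() if el is not root]

# ElementTree stores no parent links, so we build the (unique) parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_to = root.find("email/to")
siblings = [el.tag for el in parent_of[first_to]]            # includes "to" itself
note_is_empty = len(root.find("note")) == 0 and root.find("note").text is None
```

That parent_of is a well-defined mapping reflects the remark that proper nesting makes the parent of each non-root element unique.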

Attributes

As the example at the beginning of the section shows, elements can also contain attributes, which are name-value pairs listed in the start-tags of elements. Here are the rules for inserting attributes into elements.
• The naming rules of elements also apply to attributes.
• Elements can contain zero or more attributes.
• The names of the attributes must be unique within a start-tag.
• Attributes cannot appear in end-tags.
• Attribute values must be enclosed in single or double quotes.

Attributes can be resolved into elements, and character data can be put into attributes (though this might not make much sense in many cases). For instance, in the example we could have written the following for the first email.

20 Aug 2003 robert@company.com oliver@company.com Meeting Could we meet this week to discuss the interface problem in the NTL project? --Rob

Or, for instance, the second email could have been formed as follows.

Re: Meeting On 20 Aug 2003 rob@company.com wrote > Could we meet today to discuss the interface problem > in the NTL project? --Rob OK. What about today at 16:00?

There are no strict rules on when to use attributes and when to use elements, so the decision is a matter of the user's style. As a generic rule, one can put into attributes those data which are not important for most applications (or humans).

Additional XML syntax

Since characters like < and & have special meaning in XML, they cannot appear directly in character data. Whenever such a special character has to be included anyway, one has to substitute it with the corresponding entity reference, which starts with an & and ends with a semicolon. The next table summarizes the entity references which can be used to substitute special characters.

Character   Entity reference
<           &lt;
>           &gt;
"           &quot;
'           &apos;
&           &amp;

In attribute values the same kind of quotation mark as the one used as the delimiter must be replaced with the corresponding entity reference. For example, a value containing a double quote can either be delimited by single quotes, or be delimited by double quotes with the inner quote written as &quot;; both are equally valid attribute settings.

Comments can be included in the XML document anywhere outside other markup with the following syntax.

<!-- text of the comment -->

Here <!-- and --> are the delimiters of the comment and, obviously, the text of the comment should not contain the character sequence of the ending delimiter.

The first line of the example is the XML declaration, which is not required; however, it is recommended to include it. Currently (summer 2003) only version 1.0 of XML is in effect, so the version attribute is fixed. The encoding attribute specifies the character encoding of the document, in the example the usual ASCII one (with "UTF-16" it would be Unicode). The prolog may also include processing instructions that certain applications might interpret. For instance, an xml-stylesheet processing instruction that refers to my.css tells a web browser to apply the my.css stylesheet for formatting the XML document when it displays it.

A CDATA section, delimited by <![CDATA[ and ]]>, can be used, outside other markup, to include text literally (special characters like < and & need not be replaced by entity references in it). In the example the included text is the markup of a small web page (“My first web-page”, “Hello world!”). Obviously, the ]]> character sequence should not appear in the included text.

To summarize the most important requirements for well formed XML documents we recall the following.
• XML elements are marked with start- and end-tags whose names must match.
• There can be only one root element.
• The elements must be properly nested into each other.
• Elements can have attributes to store additional character data.
• We have to obey certain naming rules for elements and attributes.
• Special characters must be replaced by entity references, except inside a CDATA block.
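Some of these requirements can be checked with the XML parser shipped with Python. In the sketch below the helper is_well_formed and all sample documents are assumptions made for illustration; escape is the standard library function that performs the entity substitution for &, < and >.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<a><b></b></a>")       # properly nested
crossed = is_well_formed("<a><b></a></b>")         # tags overlap
two_roots = is_well_formed("<a/><b/>")             # more than one root element
raw_special = is_well_formed("<a>x < y & z</a>")   # unescaped special characters

escaped_text = escape("x < y & z")                 # entity references substituted
escaped_ok = is_well_formed("<a>" + escaped_text + "</a>")

# Inside a CDATA section the special characters may appear literally.
cdata = ET.fromstring("<a><![CDATA[x < y & z]]></a>")
```

Both the escaped and the CDATA variant give back the same character data, x < y & z, once the document has been parsed.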

3.1.2 XHTML

As there are many excellent introductions to XHTML and HTML available on the Internet, it is recommended that the reader gets the information from there. The official (though somewhat dry) specifications can be found at http://www.w3.org/TR/xhtml1/ and http://www.w3.org/TR/html4/.

3.2 Namespaces

Namespaces are useful in solving the following complications. First of all, with their help naming conflicts can be resolved. For instance, suppose that an XML document contains data about publishers and authors, and that both have a describing attribute, name, which appears as an element with tag name in the document. The instances

<name>Springer</name>
<name>Donald E. Knuth</name>

would describe entities of different categories, while this fact would be very difficult for a computer program to recognize. The ad-hoc solution in this example would be to use different tags for authors and publishers. However, in general this might not be so easy; for instance, if the document has to be created by merging an XML document of publishers and one of authors, ad-hoc name changes of tags might be undesirable.

Namespaces are a standard way to solve this problem. In this example, the documents of the publishers and of the authors should have unique namespaces declared in them. These can be thought of as prefixes to be prepended to names in the document. Then merging the two files is no longer a problem, because the different prefixes identify the category to which an element belongs.

The other problem in which namespaces can help is dividing the structured document further into application-specific groups. If the data the applications need from the document are separable, it is advantageous to indicate which data is used by which application by putting the corresponding tag names into different namespaces.

3.2.1 Uniform Resource Identifiers

Before we go into the details of XML namespaces, let us briefly discuss URIs (Uniform Resource Identifiers) and URLs (Uniform Resource Locators), as they will often be used in namespace definitions. URIs are character sequences with restricted syntax that reference things with identity (e.g. mailto:gbodnar@risc.uni-linz.ac.at). URLs are a special subclass of URIs, namely the ones that identify a resource via a representation of its primary access mechanism (e.g. http://www.risc.uni-linz.ac.at/education/courses/).

We distinguish absolute and relative URIs. Absolute URIs refer to the resource independently of the context in which the identifier is used. Relative URIs describe the difference between the current context and the absolute identifier of the resource in a hierarchical namespace.

Example 3.1. Take for instance the URL http://www.risc.uni-linz.ac.at, which is an absolute identifier for the main homepage of RISC. If a link on any webpage on the Internet uses this URL as a pointer, a click on it will take the browser to the homepage of RISC. The absolute URL for the page with the list of RISC related courses in the current semester is http://www.risc.uni-linz.ac.at/education/courses/. Within the RISC home page the link “Education” in the middle column uses the relative URL /education/courses/ (actually the implementation uses only /education/, which is redirected to the above given URL). This describes how to get the target URL from the URL of the current page, namely by appending it to the URL of the home page. If this relative URL appears on another homepage, it will point to the corresponding location within that site. Please note that if you move the cursor over the link, the browser usually displays the resulting absolute URL in the status bar (viewing the page source reveals the URLs actually used).

Determining an absolute URI from a relative one requires a context, which is called the base URI.
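The resolution of a relative URL against a base URI can be tried out with urljoin from the Python standard library. The sketch below takes the RISC URLs from Example 3.1; the base http://www.example.org/some/page.html is a hypothetical second site added for contrast.

```python
from urllib.parse import urljoin

# The home page of Example 3.1 serves as the base URI.
base = "http://www.risc.uni-linz.ac.at/"
absolute = urljoin(base, "/education/courses/")

# The same relative URL on a (hypothetical) other site points into that site.
other_base = "http://www.example.org/some/page.html"
elsewhere = urljoin(other_base, "/education/courses/")

# A relative URL without a leading slash is resolved against the directory of the base.
sibling = urljoin(other_base, "other.html")
```

A relative URL that starts with a slash replaces the whole path of the base, which is why the first two results differ only in the host name.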
The base URI is determined by one of the following methods.
• The base URI may be embedded into the document so that relative URIs in the document have the context explicitly set.
• Otherwise, the base URI can be determined by the retrieval context of the entity (resource) one encapsulation level higher (e.g. the resource is a table which appears in a web page).
• Otherwise, i.e. if the resource is not encapsulated into another entity, the last URI used for retrieving the document which contains the current URI is used as the base URI (e.g. the case of the /education/courses/ example).
• If none of the above conditions apply, the base URI is defined by the context of the application (in this case different applications can interpret the same URI differently).

URIs can have up to four components: scheme, authority, path, query. Each component has a rigorous syntax which it has to obey, but we will not present these rules in full detail here (the interested reader is advised to consult the RFC 2396 document).

The scheme component defines the semantics for the rest of the string of the URI. The scheme component starts with a lower case letter and is separated from the rest of the URI by a colon. Relative URIs are distinguished from absolute ones in that they do not have a scheme component. An example is the http character sequence in URLs, which indicates the scheme for the “hypertext transfer protocol” used on the WWW.

The authority component defines the top hierarchical element in the namespace. It is typically defined by an Internet based server in the form [userinfo@]host[:port]; a few examples follow.

gbodnar@risc.uni-linz.ac.at
secure.access.com:443
193.170.37.129

The path component contains data specific to the given authority and/or scheme, identifying the resource within their context. The hierarchical structure is indicated by ’/’ signs, which separate path segments from each other. In Example 3.1 the relative URL /education/courses/ is actually a path component that also appears in the corresponding absolute URL.

The query component is a string to be interpreted by the resource. It is separated from the path component by a question mark. A natural usage of query components is, for instance, encoding the data filled in by the user in a web form and passing it as the set of input parameters to the program which is supposed to process them. The following example is an input query for the LEO online dictionary.

http://dict.leo.org/?search=namespace&lang=en

In this example the so-called “GET” method is applied for parameter passing. In this method the parameters are listed in the query string as “attribute=value” pairs separated by ampersands.
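The decomposition into components can be reproduced with the urllib.parse module of the Python standard library. The first URL below is the LEO query from the text; the second one is a hypothetical URL assembled from the authority examples above in order to show the userinfo, host and port parts.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://dict.leo.org/?search=namespace&lang=en")
components = (parts.scheme, parts.netloc, parts.path, parts.query)

# The "attribute=value" pairs of the GET method, split at the ampersands.
params = parse_qs(parts.query)

# Hypothetical URL whose authority has the form userinfo@host:port.
auth = urlsplit("https://gbodnar@secure.access.com:443/")
authority = (auth.username, auth.hostname, auth.port)
```

Note that parse_qs maps every parameter name to a list of values, since the same name may occur several times in a query string.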

3.2.2 Namespaces in XML

The most important aspect of namespaces is uniqueness (in order to avoid collisions). So namespaces must be identified with a URI, preferably with one to which the owner of the document has rights.

Example 3.2. For instance, if a company owns the http://big.company.com/ domain name, it is preferable that namespaces start with this URL for XML documents created at this company. So http://big.company.com/product-info could be a URI used in a namespace declaration, where the corresponding namespace would be used in naming elements in XML documents that describe products of the company. Please note that namespaces need not correspond to existing URLs; they are simply strings of characters which are supposedly unique and additionally might carry some self-explanatory information on the purpose of the namespace.

Of course, uniqueness can be guaranteed only if the company owns the domain name and within the company the URIs are maintained centrally, so that no two developers will invent and use the same URI.

Namespaces can be declared within any element of an XML document, using the reserved attribute name xmlns. For instance

Disney Mouse Pad 231069948 3.49 EUR

declares that inside the list element the prefix p (the string between xmlns: and the equals sign) will be used as an alias for the namespace http://big.company.com/product-info. Prefixes simplify the notation in an XML document, as it would be cumbersome and unreadable to apply the long URI each time. Whenever a prefix is given in a namespace declaration, all descendants of the element which belong to the namespace must be named with the prefix prepended and separated by a colon from the tag name (e.g. p:id). This is also called qualifying the name (e.g. id is qualified to be in the namespace with alias p), and the name with a prefix is called a qualified name. Both start- and end-tag names have to be qualified (consistently). Attribute names can also be qualified.

If no prefix is given in a namespace declaration, so that the attribute is just xmlns, it declares the default namespace for the element. In all descendants of the element, unqualified tag names automatically fall into the default namespace. Therefore the list of products example above can also be written as follows.

Disney Mouse Pad 231069948 3.49 EUR

However, there is a difference between the two ways: in the first case the list element is not in the namespace, while in the second it is. Default namespace declarations are available to the element in which the declaration takes place and to its descendants. With a prefixed namespace declaration, the element in which the declaration takes place belongs to the namespace only if its name is qualified with the prefix we define.
So an equivalent qualified version of the previous default namespace declaration example is the following.

Disney Mouse Pad 231069948 3.49 EUR

The section of the XML document to which the namespace declaration applies is called its scope. If one uses a qualified name outside the scope of the corresponding namespace, the qualification is simply ineffective. For instance, in the following XML document the second book element is not in the namespace http://www.mybookstore.com/catalog; the string b:book in that case stands just for itself. (Strictly speaking, the Namespaces in XML recommendation requires every prefix in use to be declared, so namespace-aware parsers report such an undeclared prefix as an error.)

C. J. Date An Introduction to Database Systems G. Brill Codenotes for XML

Remark 3.3. Attribute names are not automatically in the namespace of the including elements. For instance, in the example above the isbn attribute is not a member of the declared namespace.

Prefixes used as aliases for namespaces can be reassigned in an XML document; however, it is not advisable to do so, in order to keep the document as readable as possible.
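The difference between prefixed and default namespace declarations can be observed with a namespace-aware parser, for example xml.etree.ElementTree in Python, which reports every name in the form {namespace-URI}local-name. The two documents below are hypothetical variants of the product list; the element name name and the attribute code are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

NS = "http://big.company.com/product-info"

# Prefixed declaration: only names carrying the prefix p are in the namespace.
prefixed = f"""<list xmlns:p="{NS}">
  <p:product>
    <p:name>Disney Mouse Pad</p:name>
    <p:id>231069948</p:id>
    <p:price>3.49 EUR</p:price>
  </p:product>
</list>"""

# Default declaration: unqualified element names fall into the namespace.
default = f"""<list xmlns="{NS}">
  <product code="x1">
    <name>Disney Mouse Pad</name>
    <id>231069948</id>
    <price>3.49 EUR</price>
  </product>
</list>"""

root_p = ET.fromstring(prefixed)
root_d = ET.fromstring(default)


def parses(text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


undeclared_prefix_ok = parses('<b:book isbn="0"/>')  # prefix b is never declared
```

In the first document the list element stays outside the namespace, in the second it is inside; the unqualified attribute code remains outside the namespace in both styles, in line with Remark 3.3.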

3.3 XML Schema

XML schemas are templates for XML documents. With XML schemas one can specify what structures the modeled XML documents may have and impose restrictions on their contents. XML schemas are XML documents themselves, which must follow strict syntactical rules, defined in the Schema Primer Recommendation, available at http://www.w3.org/TR/xmlschema-0/. This specification can be considered as a schema for schemas.

This concept replaces that of the Document Type Definition (DTD), which was used originally for the same purpose. There are several advantages of schemas over DTDs.
• While DTD syntax was inconsistent with XML, schemas are valid XML documents themselves, so no additional parser is necessary in XML applications.
• XML schemas provide a wide range of data types (integers, dates, strings, etc.) for element content and attribute values.
• Schemas allow the users to define custom data types, and support inheritance between types.
• Schemas also provide powerful features to express element groupings and other properties of elements (uniqueness, substitutability, etc.).

In brief, schemas provide a flexible grammar to describe XML document templates in an XML compliant way. There are also several software products available to validate schema definitions, and to validate XML documents against well defined schemas.

3.3.1 Schema declaration

Schema definitions apply namespaces to reference constructions in the schema grammar and to declare new element names for some target namespace. The root element of an XML schema must always be schema, which should be qualified to be in the namespace http://www.w3.org/2001/XMLSchema. This namespace contains the vocabulary (set of predefined names) for schema definitions.

The schema element can also include the attribute targetNamespace, whose value defines the namespace for the new names (of elements, attributes, types) declared in the schema. This is not a namespace declaration for the schema itself, but it forces the referencing document to use this namespace for the names defined in this schema. If no targetNamespace is defined, the definitions and declarations of the schema will validate elements in the instance document that do not fall into any namespace (i.e. they are unqualified and they are outside the scope of any default namespace declaration). XML documents need not use namespaces, so if we want to write a schema that validates such an XML document, we must omit the target namespace definition.

In the schema element, several additional namespaces (e.g. a default namespace) can be declared. They can be used to qualify names, placing them into the desired namespace. This is particularly useful when user defined types appear in element or attribute declarations (we will discuss this later).

The target namespace applies by default only to global objects. Global elements, attributes and types are declared in elements of the schema which are children of the schema element. The declarations in deeper nesting levels are called local declarations (or, when we talk about the corresponding elements/attributes in the instance document, simply local elements/attributes). Whether local elements/attributes must or must not be qualified to be in the target namespace is controlled by two attributes of the schema element: elementFormDefault and attributeFormDefault. Their values can be "unqualified" (default) or "qualified". In the local element/attribute declarations the values of the elementFormDefault and attributeFormDefault attributes (whose scope is the whole schema) can be overridden using the form attribute.

Example 3.4. To declare an XML schema for the namespace http://big.company.com/product-info with element and attribute qualifications enforced one can write the following.
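A schema element with the properties named in Example 3.4 can be sketched as follows. The markup is a reconstruction from the description in the text (the xsd prefix, the target namespace and the two form attributes set to "qualified"); the attribute order and the absence of further namespace declarations are assumptions. It is wrapped in Python only to confirm that it is well formed and to read the attributes back.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Assumed form of the declaration; an empty schema, without any definitions yet.
schema_text = f"""<xsd:schema xmlns:xsd="{XSD}"
    targetNamespace="http://big.company.com/product-info"
    elementFormDefault="qualified"
    attributeFormDefault="qualified">
</xsd:schema>"""

schema = ET.fromstring(schema_text)

root_name = schema.tag                      # qualified name of the root element
target = schema.get("targetNamespace")
forms = (schema.get("elementFormDefault"), schema.get("attributeFormDefault"))
```

The root element is reported as {http://www.w3.org/2001/XMLSchema}schema, i.e. it is qualified to be in the namespace of the schema definition grammar, while targetNamespace is an ordinary attribute and not a namespace declaration.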

Please note that the http://www.w3.org/2001/XMLSchema namespace is assigned to the xsd prefix, and thus the schema element has to be appropriately qualified to be in the predefined namespace of the schema definition grammar. The last two attributes imply that all local elements and attributes declared in this schema will have to be qualified to be in the given target namespace in an XML document referencing the schema, unless they have the following attribute in their declaration: form="unqualified".

An XML document that references a schema is called an instance of the schema. An instance can only use the vocabulary (set of declared names) defined by the schema. A schema reference in an instance document is typically placed into the root element. To properly reference a schema one has to do three things. First, one has to declare the target namespace of the schema in the instance (except if there is no target namespace defined in the schema), so that it is available for further elements. Secondly, one has to declare the schema instance namespace, which is the predefined namespace http://www.w3.org/2001/XMLSchema-instance, to be able to do the actual referencing. The last action is referencing the schema whose instance this document is; for this the schemaLocation attribute (which belongs to the schema instance namespace) has to be assigned the target namespace followed by the URI of the schema, separated by whitespace (for a schema without a target namespace, the noNamespaceSchemaLocation attribute takes the URI of the schema alone).

Example 3.5. Let us assume that the schema of Example 3.4 has the URI http://big.company.com/schemas/product-info.xsd. Then an XML document that references the schema can start as follows.

The default namespace declaration uses the target namespace of the referenced schema. The xsi prefix will qualify “reserved” names, for instance, to bind the schema to the document by the attribute xsi:schemaLocation.

3.3.2

Types, Elements, Attributes

The main part of an XML schema defines which elements and attributes an instance can have and what types of data they are allowed to contain. First we discuss elements, then attributes and finally types.

The most trivial elements are the simple content ones, which can contain only character data, possibly in some prescribed format, and they cannot contain child elements. This is an example of a simple element declaration where the tag-name that identifies the element must be simple and the content is some text.

3.3. XML SCHEMA


Please note that it is also assumed that the schema definition is as in Example 3.4, so the xsd prefix is bound to the schema definition namespace. The element tag is defined in the xsd namespace and denotes that an element declaration follows. The name attribute gives the name of the element being declared and the type attribute its data type, which is also identified by a qualified name (i.e. string is defined in the meta-schema).

Complex element definitions allow us to specify the internal structure of the declared elements. We will use the following XML schema constructions:

sequence: This requires that the instance contains the elements that appear in the schema in the same order and by default exactly once.

all: This allows the elements in the definition group to appear in any order and by default exactly once. The all element can occur only in global declarations and only as a single element group declarator. The children of an all element should be individual elements (no element groups are allowed).

choice: This allows only one of the elements in the definition group to appear in an instance document inside the complex element.

The following two attributes can appear in local element declarations and control the number of occurrences of the declared element in the instance document:

minOccurs: Specifies the minimal number of occurrences of the declared element in its parent element.

maxOccurs: Specifies the maximal number of occurrences of the declared element in its parent element.

An optional element can be prescribed via the attribute minOccurs="0", and elements that can occur arbitrarily many times can be prescribed via the attribute maxOccurs="unbounded".

Example 3.6. The following definition (schema fragment) prescribes a fixed structure for the product element.

An example instance of this element definition can be taken from Section 3.2 with the slight modification of dropping the “EUR” characters from the price element.

Disney Mouse Pad 231069948 3.49
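To illustrate how sequence interacts with minOccurs and maxOccurs, here is a minimal sketch of a checker written in plain Python. It is not a schema validator: it only compares a list of child element names with a list of (name, minOccurs, maxOccurs) triples, and it assumes that neighbouring entries of the sequence have different names. The element names used (name, id, price, country, note) are hypothetical.

```python
def matches_sequence(children, spec):
    """Check child element names against a sequence declaration.

    spec is a list of (name, min_occurs, max_occurs) triples in schema
    order; max_occurs=None plays the role of maxOccurs="unbounded".
    """
    pos = 0
    for name, min_occurs, max_occurs in spec:
        count = 0
        # Consume as many consecutive children with this name as allowed.
        while (pos < len(children) and children[pos] == name
               and (max_occurs is None or count < max_occurs)):
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    # Every child must have been consumed by some entry of the sequence.
    return pos == len(children)


# Default occurrence constraints: each child exactly once, in order.
PRODUCT = [("name", 1, 1), ("id", 1, 1), ("price", 1, 1)]
# Explicit constraints: one or more countries, then an optional note.
SHIPPING = [("country", 1, None), ("note", 0, 1)]

print(matches_sequence(["name", "id", "price"], PRODUCT))
print(matches_sequence(["id", "name", "price"], PRODUCT))
print(matches_sequence(["country", "country"], SHIPPING))
```

With the defaults (minOccurs and maxOccurs both 1) any reordering or omission is rejected, while the second specification accepts repeated country children and a missing note.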


In the next example each child element must appear exactly once, but the order is arbitrary.

An example instance of this element can be the following.

true Japan

In the following example the occurrence constraints on child elements are prescribed explicitly.

An example instance of this element can be the following.

Germany United Kingdom

With the group element we can define element groups on the global level and we can refer to them in other declarations.

A valid instance is

K. Boehm Goethestr 5, Linz VISA1208549856745634/0508

and another is

K. Boehm Goethestr 5, Linz IBAN57486739485934590000/34660

So far the elements were either simple or complex with only child element content. One can also declare an element with mixed content using the mixed="true" attribute setting in the declaration. For instance, the following declaration allows the element note to contain arbitrarily many emp elements with character data in between.

The following element is then an instance of this schema.

The extra lecture will be tomorrow at 16:00 in the lecture hall HS 13.

The nillable attribute can be used to declare that an instance of the element can have a special attribute nil (from the schema instance namespace) which marks explicitly that the instance has no meaningful value (similarly to the NULL values in RDBMSes). The following example declares that the date element is nillable.

Then the following instance is valid.
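As a small illustration of the nil marker, the following sketch parses two hypothetical date instances with Python's standard library and tests whether the nil attribute from the schema instance namespace is set. The helper name is_nilled is an assumption, and the code does not check that the element was actually declared nillable.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def is_nilled(element):
    """True if the element carries xsi:nil with a true boolean value."""
    # xsd:boolean accepts both "true" and "1" as true values.
    return element.get(f"{{{XSI}}}nil", "false").strip() in ("true", "1")


nilled = ET.fromstring(f'<date xmlns:xsi="{XSI}" xsi:nil="true"/>')
dated = ET.fromstring("<date>2005-01-23</date>")

print(is_nilled(nilled), is_nilled(dated))
```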


Attributes are declared similarly to child elements, with the difference that attributes always have simple content. In an attribute declaration the name and type attributes prescribe the corresponding qualities of the attribute being declared. The use attribute controls whether the declared attribute has to appear, and the default and fixed attributes can set a default or a fixed value for it. The use attribute can have the following values:

required: The declared attribute must be explicitly given every time its element occurs in the instance of the schema.

optional: The declared attribute is optional (this is the default setting).

prohibited: The declared attribute must not appear in any occurrence of the corresponding element and it has no predefined value.

For an optional attribute, the default attribute specifies the value that is assumed when the attribute is not given in the instance, while the fixed attribute prescribes that if the attribute appears, its value has to be the given one.

Attributes can be declared only inside complex type definitions, so if we want to define a simple element with attributes, we have to define it as a complex element with simple content using the simpleContent element. The attribute declarations always come after the child element declarations.

Example 3.7. In the first declaration of Example 3.6 we could define the currency for the price tag as an attribute as follows.

An example instance of this element can be taken from Example 3.6 and now we can (actually we have to) add the currency as an attribute of price.

Disney Mouse Pad 231069948 3.49

Or using Example 3.6, we include a time stamp attribute for the product extension element.

Reusing the corresponding instance from Example 3.6 we can have the following elements.

true false Japan

In local declarations we can refer to global ones via the ref attribute. This just says that the definition of the element should be taken from the referenced one. It makes schema declarations more readable by modularizing the definitions. For instance, we could have moved the definition of the price element in the first declaration of Example 3.7 to the global level, and then in the declaration of product we could have written the following.

... ...

Alternatively, we can define a new type on the global level, say price type, and use it as we use the standard types in the element declarations. In this case, if a target namespace is set, price type will become a member of the target namespace, thus we have to find means to reach it when we refer to it in the element declaration. This can be done by declaring a prefix (alias) bound to the same namespace as the target namespace. The whole schema looks like:


Here is an instance of this schema.

Disney Mouse Pad 231069948 3.49

Think over why certain elements are qualified and others are not.

XML Schema supports modularity by enabling a schema to include other schemas using the include element.

This will include the schema definitions in the file locatable by the given URI into the current schema. It is required that the target namespace of the included and including schemas be the same.

It is also possible to use named types from other schemas having target namespaces other than the target namespace of the current schema. In this case we have to assign an alias to the target namespace of the imported schema and use the import element.


The declarations of the imported schemas are accessible via the aliases of the corresponding namespaces.

XML Schema provides more than forty built-in data types, of which we have already seen a few above (the complete list is available under the URL at the beginning of this section). We also created new types in an anonymous way so that they could be used only at the place of definition. Moreover, we have seen how XML Schema allows us to introduce new named types that can be used like the built-in ones after their definition.

Example 3.8. Here is another example of using a named type. From the first declaration of Example 3.6 we can extract the complex type definition as follows.

And then the declaration of the product element could simply be

The main mechanisms to define new data types are restriction and extension. Built-in data types provide one or more “facets” through which one can create new types by restriction. An example can be the type of strings that match a given regular expression. The following example restricts the strings of the new type to be of the format nnn-nn-nnnn where any digit can stand in place of an n (e.g. 323-90-0982).

The complete list of facets for each built-in type can be found at http://www.w3.org/TR/xmlschema-0/.

For complex data types one can also use extension to add new elements and attributes to an already existing type. This example extends the product type declaration, from Example 3.8, with the elements of product extension.
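The restriction to the format nnn-nn-nnnn mentioned above can be tried out with an ordinary regular expression. The sketch below uses Python's re module rather than a schema processor; since a pattern facet has to match the whole value, fullmatch is used instead of search. The function name satisfies_pattern is an assumption made for this illustration.

```python
import re

# Three digits, two digits and four digits, separated by hyphens.
PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def satisfies_pattern(value):
    """True if the whole string has the form nnn-nn-nnnn."""
    return PATTERN.fullmatch(value) is not None


print(satisfies_pattern("323-90-0982"))
print(satisfies_pattern("323-900-982"))
```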


As in the relational model, we might want to ensure that certain elements or attributes have unique values within some scope in an instance of the XML schema. This can be achieved by using the unique tag in the element declaration, which must come after all child and attribute declarations. The general structure of a unique tag is as follows.

The name attribute is only formal; the xpath attribute of the selector element specifies the set of elements within which the element or attribute specified by the xpath attribute of the field element must be unique. The identification uses the XPath specifications, which we will discuss in Section 3.4. The field specification is relative to the path given in the selector. Attribute names must be prefixed with an @ character.

Example 3.9. The following schema fragment prescribes that in an instance of the schema, within the list element, the product elements have to have a child id with unique value.

In the example we also assumed that the xsd prefix is assigned as in Example 3.4.

If we want to ensure referential integrity between contents of elements, that is, we want certain values of attributes or elements to match values of other attributes or elements in the instances of the schema, we can use the key and keyref tags. The mechanism is analogous to that of the unique tag. The key declarations must always occur at the end of an element declaration (after child element and attribute declarations). Elements or attributes which are declared to be keys must always be present with a unique value (which cannot be nil). Referring element or attribute values must always have a corresponding key value. The construction parallels the notion of foreign keys in RDBMSes.
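The effect of the unique constraint of Example 3.9 can be imitated outside a schema processor. The sketch below, an illustration only, plays the roles of selector and field with two simple ElementTree paths and reports duplicated values; the documents are hypothetical, and attribute fields (the @ notation) are not handled.

```python
import xml.etree.ElementTree as ET


def duplicate_field_values(scope, selector, field):
    """Return the field values that occur more than once in the scope."""
    seen, duplicates = set(), []
    for node in scope.findall(selector):
        value = node.findtext(field)
        if value is None:
            continue  # a missing field does not violate unique
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


GOOD = ET.fromstring(
    "<list><product><id>1</id></product><product><id>2</id></product></list>")
BAD = ET.fromstring(
    "<list><product><id>1</id></product><product><id>1</id></product></list>")

print(duplicate_field_values(GOOD, "product", "id"))
print(duplicate_field_values(BAD, "product", "id"))
```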

Example 3.10. In the following schema fragment we prescribe that the attends document can contain student and course elements with the structure given below. Then we declare that the cid element of the course elements must be a key in the attends elements and that this key is referred to by the cid elements of the student elements. This means that the cid elements in the course elements must be unique in an attends element and whenever a cid value appears in a student element, there must be a corresponding course element with the same value.
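The two conditions of Example 3.10 can be checked by hand on a small document. The following sketch assumes a hypothetical attends instance (the name and title children and the values c1, c2, c9 are made up for the illustration) and tests the key condition on course/cid and the keyref condition on student/cid with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the second cid of the student has no course.
ATTENDS = ET.fromstring("""
<attends>
  <course><cid>c1</cid><title>Databases</title></course>
  <course><cid>c2</cid><title>XML</title></course>
  <student><name>K. Boehm</name><cid>c1</cid><cid>c9</cid></student>
</attends>
""")


def check_key_and_keyref(attends):
    """Return (key_ok, dangling) for the course and student cid values."""
    keys = [course.findtext("cid") for course in attends.findall("course")]
    # A key must be present in every course and must not repeat.
    key_ok = None not in keys and len(keys) == len(set(keys))
    known = set(keys)
    # Every referring value needs a matching key value.
    dangling = [ref.text
                for student in attends.findall("student")
                for ref in student.findall("cid")
                if ref.text not in known]
    return key_ok, dangling


print(check_key_and_keyref(ATTENDS))
```

The course identifiers form a proper key here, but the reference c9 has no matching course, so a schema processor enforcing the keyref would reject this instance.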

3.3.3

Exercises

Exercise 3.3.1. Consider the following schema:


Find the points at which the following XML document violates the prescriptions of the previous schema.

Thomas Keller 34 36.5

Exercise 3.3.2. Modify the XML schema of Exercise 3.3.1 in a way that the example XML document becomes valid with respect to the modified schema.

Exercise 3.3.3. Consider the following XML document, with the intended restrictions that the id attribute of the entry element is mandatory and that the author, title, year elements have to be present, where there can be more than one author of a publication. No namespaces are used.

David Cox Update on Toric Geometry 2000 Thomas Becker Volker Weispfenning Groebner bases Springer 1993

Write an XML Schema for documents of this kind that validates, in particular, this example.

Exercise 3.3.4. Consider the following XML Schema.

Invent an XML document which is valid with respect to this schema.

3.4

XPath

XPath is a part of the XSL (eXtensible Stylesheet Language) family. Its primary purpose is to provide common syntax and functionality to address parts of XML documents. Beside this, it can also be used for basic manipulation of strings, numbers and booleans. XPath operates on the logical structure of an XML document and addresses its parts with a syntax that resembles the path constructions in URLs.

XPath models an XML document as a tree of nodes, which can be imagined as an extension of the tree of elements of the XML document. In this extended tree not only the XML elements, but also their attributes, the character data they may contain, the namespaces and the processing instructions appear as nodes.

Beside the data model, another important concept of the language is the XPath expression. With expressions it is possible to compute objects (like strings, numbers, sets of nodes) from the data of XML documents. The advanced expressions of XPath rely on a library of functions, which is general enough to allow simple text and numerical data manipulation. Expressions are always evaluated in some context, which is described by the context node, context size and context position (think of a list of nodes on which an expression is to be evaluated; the size of the list is the context size and when the expression is evaluated on the ith element of this list, the context position is i), variable bindings and namespace declarations.

XPath is prominently used in XSLT to access and apply simple transformations on data of XML documents, e.g. to convert them to HTML files, which can then be displayed in web browsers. XPath expressions can appear as attribute values in XML documents; in this case the special characters must be substituted by entity references.

3.4.1

Data Model

The nodes in the tree of the XPath data model can be of the following types: root, element, text, attribute, namespace, processing instruction, comment. Each node represents a part of the text of the underlying XML document. The nodes can be ordered by the document order, in which a node precedes another if the first character that belongs to that node comes before the first character of the other node in the XML document. The nodes in the data model do not share children; moreover, every node (except the root node) has exactly one parent node, which is an element node or the root node.

Remark 3.11. Please note that the root node is the first one in the document order, and that an element and its attributes always precede the descendants of the element. Namespace nodes precede the attribute nodes by definition.

Every node has an associated string value which can be computed in the following ways. The string value of the root node is the concatenation of the string values of all its text node descendants in document order.

Every XML element has an element node in the XPath model. The string value of an element node is the concatenation of the string values of all its text node descendants in document order.

An attribute node in the XPath data model has the element node corresponding to the XML element in which the attribute is defined as its parent. However, the attribute node is not considered as a child of its parent element. The string value of an attribute is simply its value as a string (after some normalization specified by the XML standard).

Attributes that declare namespaces do not appear as attribute nodes, but as namespace nodes. Every element gets assigned a set of namespace nodes, one for each namespace declared in the element itself or in ancestors, in whose scope the element is. The element node is the parent of the namespace nodes, but just like attributes, the namespace nodes are not considered as children of the element. The string value of a namespace node is the URI bound to the prefix identifying the namespace.

Remark 3.12. Why is an attribute or a namespace node not a child of its parent? The purpose of this distinction is to preserve, by default, the structure of the modeled XML document (where only elements can be nested). The default relationship is the child-parent and several other derived relationships (e.g. descendant-ancestor). The relationships that belong together in this sense will be collected in an “axis” of path directions. How can then the attributes or namespaces that belong to an element be retrieved? By considering the corresponding relationship, e.g. attribute-parent, which will have its own axis.

Text nodes represent character data in the XML document in a way that text nodes do not have immediate text node siblings (i.e. a text node contains as much character data as it can) and they contain at least one character. The string value of a text node is just the character data contained by it.

Characters inside comments, processing instructions and attribute values are not represented by text nodes. They can be retrieved as string values of the corresponding node types. The string value of a comment node is the comment it contains (without the opening <!-- and closing --> delimiters).

Please see Example 3.14 for a node tree of an XML document.
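The rule for the string value of an element node can be tried out with Python's standard library, whose itertext method walks the character data below an element in document order. This is only a sketch on a hypothetical email fragment; ElementTree is not an XPath data model implementation, but for this purpose the results coincide.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one attribute, two child elements and a
# single space between them.
EMAIL = ET.fromstring(
    '<email date="20 Aug 2003">'
    "<from>robert@company.com</from> <subject>Meeting</subject>"
    "</email>")


def string_value(element):
    """Concatenate all character data below the element in document order."""
    return "".join(element.itertext())


print(repr(string_value(EMAIL)))
print(repr(EMAIL.get("date")))
```

The attribute value does not show up in the string value of the element, in line with the remark that attribute nodes are not children of their parent element.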

3.4.2

Location Paths

Location paths are special expressions for selecting a set of nodes, possibly but not necessarily relative to the context node. A location path consists of location steps composed together from left to right, separated by ’/’ (slashes). The first location step selects a set of nodes relative to the context node. Then each resulting node is used as a context node for the next location step, which results again in a set of nodes, and so on. If the first character of a location path is ’/’, the initial context node will be the root node. In this case the location path is called absolute, otherwise it is relative.

Example 3.13. A first example could be the way one navigates in a hierarchically organized directory structure on a file system (the analogy is so natural that even the syntax is taken over in the abbreviated notation of path expressions). The following location path specification selects all the summary elements which are children of any element in the reports child of the parent node of the context node. The context node can be imagined as the current working directory if we use the file system analogy.

../reports/*/summary


On a UNIX file system this would correspond to selecting all the summary files in any subdirectory of the reports directory of the parent directory. Please note that there is still an important difference between file systems and XML documents: in a file system a directory cannot contain more than one subdirectory or file with the same name, whereas an XML element can have many children with the same name. So while the ../reports specification selects at most one subdirectory on a file system, the same specification selects every reports child of the parent of the context node when considered as a location path for an XML document.

Location steps have the following parts:

axis: Specifies the (in-tree) relationship between the context node and the nodes selected by the location step.

node test: Specifies the node type for the nodes selected by the location step (it is separated by :: from the axis).

predicates: Specify further expressions with boolean value, to refine the selected node set. This part can be omitted.

The following axes are available: child, descendant, parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute, namespace, self, descendant-or-self, ancestor-or-self. Most of the names are self-explanatory, e.g. the ’descendant’ axis contains the descendants of the context node (recall that attribute and namespace nodes are excluded), the ’following-sibling’ axis contains all the siblings of the context node that come after it in the document order. The ’attribute’ axis contains the attributes of the context node. The ’following’ (resp. ’preceding’) axis contains all nodes in the same XML document that come after (resp. before) the context node in the document order (obtained by left to right depth first search) excluding descendants (resp. ancestors) and attribute and namespace nodes. The ’following-sibling’ and ’preceding-sibling’ axes are empty if the context node is an attribute or namespace node, and the ’attribute’ and ’namespace’ axes are empty for non-element nodes.
The ’self’ axis contains the context node itself.

There are three principal node types for axes. If the axis can contain elements, the principal node type is ’element’. For the ’attribute’ and ’namespace’ axes the types are ’attribute’ and ’namespace’ respectively, and for other axes it is ’element’ again. Only those nodes pass a node test which have the same principal node type as the axis preceding the node test.

One can require exact matching of node names by adding the name after the axis specification, separated by ’::’. For example child::x will select all the children of the context node with name x. The node test denoted as * selects all nodes on the given axis that match the principal node type of the axis. For instance child::* selects all element children of the context node (because the principal node type of the ’child’ axis is ’element’). The text() node test selects text data nodes on the given axis.

Example 3.14. Let us consider the example XML document with emails at the beginning of Section 3.1. Figure 3.1 illustrates a part of the node tree of the document (the parsing direction is clockwise/top-down instead of left to right). Namespace nodes are omitted to simplify the figure.

[Figure 3.1: A simplified XPath data model. Node labels in the figure: document root; xml declaration (processing instruction); folder (element); email (element) with date (attribute); from (element) with "robert@company.com" (text data); to (element) with "oliver@company.com" (text data); subject (element) with "Meeting" (text data); "Could we..." (text data); whitespace (spaces and \n) as text data.]

The absolute path

/child::folder/child::email/attribute::*

selects all the date attributes of all the email elements. The first ’/’ selects the root node as the context node, then the first path element selects the child axis and the node test selects the folder child of the root. Then again the child axis is selected with all the email children. The third path element selects the attribute axis and the node test selects all attributes of the context node elements.

The ’ancestor’, ’ancestor-or-self’, ’preceding’ and ’preceding-sibling’ axes are called reverse axes (as they contain nodes which are before the context node in the document order); the others are forward axes. The proximity position of a member of a node set with respect to an axis is its position in the node set considered as an ordered list in the document order for forward axes and in the reverse document order for reverse axes. The counting starts at one.

Predicates filter a node set with respect to an axis in a way that the predicate expression is evaluated for each node in the node set as context node, with the size of the node set as context size and the proximity position of the node with respect to the axis as context position. The new node set, filtered by the predicate, contains the nodes for which this evaluation results in true. Predicate expressions can be attached to the path element by enclosing them in ’[’, ’]’. For instance, in Example 3.14, the location path

/child::folder/child::*[position()=1]/attribute::*

selects only the attribute of the first email element in the XML document.

There is an abbreviated syntax available for location paths; some of the most important elements are listed in Table 3.1.

Abbreviation   Meaning
(none)         by default the child:: prefix can be omitted
*              stands for ‘for all’
@*             selects all attributes of the context node
@name          selects the name attribute of the context node
.              selects the context node
..             selects the parent of the context node
/              selects the root node
//             selects as descendant-or-self

Table 3.1: XPath abbreviations

Example 3.15. The following location path selects the from elements of the email elements for which the date attribute is 21 Aug 2003. We use here the abbreviated notation.

/folder/email[@date = '21 Aug 2003']/from
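The abbreviated paths above can be tried out with Python's xml.etree.ElementTree, which implements a small subset of XPath. The sketch below rests on several assumptions: the folder document is a hypothetical stand-in for the email example of Section 3.1, paths are evaluated relative to the folder element (so the leading /folder step is dropped), and attributes are read through the attrib dictionary because the library has no attribute axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the email document of Section 3.1.
FOLDER = ET.fromstring("""
<folder>
  <email date="20 Aug 2003">
    <from>robert@company.com</from>
    <to>oliver@company.com</to>
    <subject>Meeting</subject>
  </email>
  <email date="21 Aug 2003">
    <from>oliver@company.com</from>
    <to>robert@company.com</to>
    <subject>Re: Meeting</subject>
  </email>
</folder>
""")

# Counterpart of /folder/email[@date = '21 Aug 2003']/from.
senders = [node.text
           for node in FOLDER.findall("email[@date='21 Aug 2003']/from")]

# Counterpart of selecting the attributes of the first email element:
# the position predicate [1] picks the first email child.
first_email_attributes = dict(FOLDER.findall("email[1]")[0].attrib)

# Counterpart of /folder/email/@date, collected element by element.
all_dates = [email.get("date") for email in FOLDER.findall("email")]

print(senders)
print(first_email_attributes)
print(all_dates)
```

A full XPath processor would return attribute nodes directly; here the same information is obtained in two steps, first selecting elements and then reading their attributes.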

3.4.3

Expressions, Core Functions

The simplest expressions are numerical and string literals, variable references and function calls. Numbers mean double precision IEEE floating point numbers, and string literals stand for themselves, with the possibility of using special encodings (like Unicode). Variables can be assigned using other XML technologies, for instance XSLT, and they differ from variables in usual programming languages in that we cannot freely change their value after the assignment. The value the variable x holds can be referenced by $x.

The basic arithmetic operations are available for numbers. But please note that usually the operators should be separated from the names by whitespace. For example x-y evaluates to the node set that contains the children of the context node with this name. On the other hand, x - y will take the first child of the context node with name x and the first one with name y, will attempt to convert their string values to numbers, and evaluates to the difference of these numbers.

More complicated expressions are location paths (discussed in Section 3.4.2) and boolean expressions. The latter are expressions that evaluate to true or false. The basic boolean expressions contain comparisons =, !=, >,

3.5. XSL TRANSFORMATIONS December


The last element declares which month number gets which character name (since at this point of the XML introduction provided by these notes we cannot establish references between elements of different XML documents, we have to add the months element as a child of the source document). The XSLT stylesheet could be the following.

To: Dear , Best regards,

In the date retrieval we can use the concat function to produce the concatenation of the resulting string values. The current function retrieves the current context outside the square brackets in the location path, so that current()/date/@m will refer to the m attribute of the date element of the email element currently processed (and not to an attribute of some descendant of the months element).

3.5.3

Exercises

Exercise 3.5.1. Consider the following XML document.

Sharp Beamer (1024x768) Josef Niel 2003-12-08T13:00:00 2003-12-09T18:00:00
Toshiba Satellite Pro 6100 Josef Niel 2003-12-08T13:00:00 2003-12-09T18:00:00
Toshiba Satellite Pro 6100 Barbara Hill 2003-12-03T10:00:00 2003-12-07T08:00:00

Write an XSLT stylesheet that produces a small XHTML document which contains a table whose rows collect the allocation data of ’Josef Niel’.

Exercise 3.5.2. Take the XML document of Exercise 3.3.3 and write an XSLT stylesheet that arranges the authors of each bibliographic entry in alphabetical order and then also arranges the entries in ascending order by the publication years. The resulting document should have the same structure as the source document (it also has to be valid with respect to the schema that validated the source document).

Exercise 3.5.3. Consider the example of bidding from Exercise 3.3.1, and let now the XML document look as follows.

CD Player Stereo Amplifier Thomas Keller 34 36 Hugo Browning 35 40 Thomas Keller 23

Write an XSLT stylesheet that produces an XHTML document containing a table for each item in the items element. The rows of a table are the bids received for the corresponding item in descending order by the value of the bid. The information displayed in the row should also contain the currency, the name of the person and the timestamp.


3.6

XML Query

The purpose of XML Query (XQuery for short) is to provide a language for extracting data from XML documents. The queries operate on single documents or fixed sets of documents and can select whole documents or subtrees of documents that match conditions on content and structure. The query language is functional (but it also includes universal and existential quantifiers), supports simple and complex data types defined in XML Schema, defines aggregate functions, handles null values and is fully compliant with XML namespaces.

Just as in XSLT, the expressions play the central role in XQuery. The value of an expression is always a sequence, which is a list of zero or more atomic values (as in XML Schema) or nodes (of the given node types, as in XPath).

The xs: prefix is assumed to be bound to the URI http://www.w3.org/2001/XMLSchema, the fn: prefix to the URI http://www.w3.org/2003/11/xpath-functions and the xdt: prefix to the URI http://www.w3.org/2003/11/xpath-datatypes.

3.6.1

Data Model and Types

The data model of XQuery is an extended version of the data model of XPath (also referred to as the XPath 2.0 data model). The root node type is called document-node type (identified by the unique URI of the document, so that collections of document nodes can also be treated). The data model can contain various extra information for each node, e.g. the parent, the children, the in-scope namespaces, an indication whether the node is nilled, the type, etc. This information can be obtained via accessors (interface specifications for functions which should be included in any implementation).

Here we briefly discuss only the type annotation of nodes in the data model. Type information can be assigned to element or attribute nodes either using the post schema validation infoset (PSVI) which, as its name shows, is obtained by validating the document against schema(s), or, if the PSVI cannot provide type information for a node, just using the xdt:untypedAny type for element nodes and the xdt:untypedAtomic type for attribute nodes (see below). For anonymous types (having no names assigned in the schema) the parser assigns internal identifiers.

The typed value of a node can be extracted by applying the fn:data function to the node, which corresponds to the dm:typed-value accessor (the string value, recall the notion from the discussion of XPath, can be obtained by applying the fn:string function). The typed value of a node is computed in the following way:

• For text, document, and namespace nodes, the typed value of the node is the same as its string value, as an instance of the type xdt:untypedAtomic.

• The typed value of a comment or processing instruction node is the same as its string value, as an instance of the type xs:string.

• The typed value of an attribute with type annotation xdt:untypedAtomic is its string value as an instance of this type. With other type annotations it is derived from the string value in a way consistent with the schema validation.

For element nodes the typed value can be computed in the following way:

• If the element has the type xdt:untypedAny or a complex type with mixed content, the typed value is the string value of the node as an instance of xdt:untypedAtomic.

• If the element has a simple type or a complex type with simple content, the typed value is a sequence of zero or more atomic values derived from the string value of the node and its type in a way that is consistent with the schema validation.

• If the node has a complex type with empty content, the typed value is the empty sequence.

• If the node has a complex type with element only complex content, its typed value is undefined.

XQuery is a strongly typed language. It uses the XML Schema data types, which have to be qualified with the predefined xs prefix (e.g. xs:integer), and the following additional data types (with predefined xdt prefix):

xdt:anyAtomicType: an abstract data type of all atomic values,

xdt:untypedAny: a concrete data type for values of element nodes not having any specific dynamic type assigned to them,

xdt:untypedAtomic: a concrete data type for atomic values not having a more specific type assigned to them,

xdt:dayTimeDuration and xdt:yearMonthDuration: both are concrete subtypes of xs:duration.

Expressions are evaluated in two phases, called static analysis and dynamic evaluation. In the static analysis phase, which depends only on the expression and statically available information (information which is available before the actual evaluation of the expression, e.g. in-scope namespaces, defined variables, available functions), a concrete or an abstract data type may be assigned to the expression “in advance”. This phase is also useful to quickly detect static errors in the expression.
In the dynamic evaluation phase, which comes after the static analysis, the value of the expression is computed with respect to the dynamic context (analogous to the concept of ’context’ in XPath, containing additional information on the available documents and node collections, and the current date and time). The type determined in this phase is called the dynamic type of the expression.

Since expressions always have sequence values, to simplify the notation we make no distinction between an atomic value or a node and the singleton sequence containing it. However, when a sequence of values is longer than one, we have to talk about sequence types. The main difference between types and sequence types is the possibility of prescribing occurrence constraints on the appearance of the given type in the sequence. The character ’?’ denotes zero or one, ’*’ denotes zero or more and ’+’ denotes one or more occurrences. The node kinds are denoted by their names with a ’()’ suffix, node() standing for a node of any kind. For the detailed specification we refer to [11]. A few examples are:

• xs:date refers to the built-in Schema type date,
• attribute()? refers to an optional attribute,
• element() refers to any element,
• element(po:shipto, po:address) refers to an element that has the qualified name po:shipto and has the type annotation po:address (or a subtype of that type),
• node()* refers to a sequence of zero or more nodes of any kind.

Type matching is also a part of the expression evaluation process, i.e. a sequence of values with a given sequence type has to be checked against an expected sequence type (e.g. the input argument list of a function). Type matching yields true if the given sequence type is known and matches the known expected sequence type, or if it can be derived by extensions or restrictions from the expected type. In other cases the result is false, except if the expected type is unknown or it is not possible to determine whether the given type can be derived from the expected one, in which case a type error is raised.
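The occurrence indicators described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a sequence is modeled as a Python list, an item type as a predicate function, and the function name matches is hypothetical. Subtype derivation and the type-error case for unknown types are deliberately left out.

```python
def matches(sequence, item_test, occurrence=""):
    """Check a list against an item test plus an occurrence indicator.

    occurrence is "" (exactly one), "?" (zero or one),
    "*" (zero or more) or "+" (one or more).
    """
    n = len(sequence)
    allowed = {"": n == 1, "?": n <= 1, "*": True, "+": n >= 1}
    if occurrence not in allowed:
        raise ValueError(f"unknown occurrence indicator: {occurrence!r}")
    # Both the cardinality and the type of every item must fit.
    return allowed[occurrence] and all(item_test(item) for item in sequence)


def is_integer(item):
    """Stand-in for the item type xs:integer (bool is excluded on purpose)."""
    return isinstance(item, int) and not isinstance(item, bool)
```

With these definitions, matches([1, 2, 3], is_integer, "*") holds, whereas matches([], is_integer, "+") does not, because ’+’ requires at least one item.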

3.6.2 Expressions

XQuery expressions can be split into the following main categories: primary, path, sequence, arithmetic, comparison, logical, constructor, FLWOR, conditional and quantified expressions. Here we discuss them only briefly; for the complete information we refer to [11].

Primary expressions include literals, variable references, context item expressions, constructors, and function calls. A primary expression may also be created by enclosing any expression in parentheses. A few examples of literals:

• "3.14" denotes the string of the characters ’3’, ’.’, ’1’, ’4’,
• 3.14 denotes the decimal value 3.14,
• 314 denotes the integer value 314,
• 314e-2 denotes the double value 3.14,
• "Pratt &amp; Whitney" denotes the string ’Pratt & Whitney’.

Variable references can be constructed by prefixing a (possibly qualified) variable name with a ’$’ sign (e.g. $x). Variables can be declared in the query prolog (a series of declarations, schema and module imports that modify the context for the evaluation of the body of the query) or can be bound by XQuery expressions (like FLWOR and quantified expressions).


Function calls consist of the (possibly qualified) name of the function followed by a parenthesized list of arguments. The core function library of XPath is available, but one can also define new functions in the query prolog. The input parameters undergo type matching, in which certain values can be split into a sequence of atomic values (atomization) and an attempt is made to cast values of type xdt:untypedAtomic to the required types. If the type matching is successful, the formal parameters of the function are bound to the corresponding input values; however, if the type of an input value is a subtype of the prescribed type of the formal parameter, the value keeps its more specific type.

Path expressions are basically known to us from XPath (see location paths in Section 3.4).

Sequence expressions can be formed via the comma operator applied to (possibly sequence) expressions. The result is always a “flat” sequence, so that the elements of the nested sequences are shifted up to the outermost level. For instance:

• (1,(2,3),(),(3,4)) results in (1,2,3,3,4),
• (book) results in the sequence of all book children of the context node.

Sequences can also be combined by the operators union, intersect and except, which perform the corresponding set-theoretic operations (eliminating duplicates from the resulting sequence).

Arithmetic expressions can be built from atomic values that can be cast to an accepted input type of the applied operator, which can be: addition ’+’, subtraction ’-’, multiplication ’*’, division ’div’ (integer division ’idiv’), and modulus ’mod’.

Comparison expressions are of three kinds: value comparisons (eq, ne, lt, le, gt, ge), general comparisons (=, !=, <, <=, >, >=) and node comparisons (is, <<, >>). Value comparisons require atomic types, e.g. book/author eq "Kennedy" is true if and only if the book element node has exactly one author child and its typed value is "Kennedy" as an instance of xs:string or xdt:untypedAtomic.
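The flattening rule and the set operators described above can be mimicked in Python. This is a sketch under assumptions, not XQuery: sequences are modeled as Python lists, and for the set operators each node is represented by a hypothetical integer giving its position in document order, so that sorting the result corresponds to returning nodes in document order.

```python
def flatten(items):
    """Flatten arbitrarily nested lists or tuples into a single flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))   # nested sequences are lifted up
        else:
            flat.append(item)
    return flat


# Assumption: a node is identified by its position in document order.
def union(a, b):
    return sorted(set(a) | set(b))


def intersect(a, b):
    return sorted(set(a) & set(b))


def except_(a, b):
    # Trailing underscore because "except" is a reserved word in Python.
    return sorted(set(a) - set(b))
```

Here flatten([1, [2, 3], [], [3, 4]]) reproduces the first example above: the empty sequence disappears and the duplicate 3 is kept, since only the set operators eliminate duplicates.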
General comparisons can also be applied when the atomization of the operands produces longer sequences. The result of the comparison is true if and only if there is a pair of atomic values from the operands which are in the required relationship (applying the corresponding value comparisons to the atomic values, e.g. eq for =).

In node comparisons the operands must be single nodes or the empty sequence. The is operator tests whether the operands are the same, in the sense of having the same identity (recall that in XML documents an element can have many children with the same name and the same content, each having its own identity). The << and >> operators compare nodes with respect to the document order. For instance,

    /book[@isbn="0-387-94269-6"] is /book[@id="L23432"]

is true if and only if the two path expressions evaluate to the same single node in the document.

Logical expressions can be formed from subexpressions that evaluate to boolean type using the and and or operators.
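The difference between value and general comparisons can be sketched in Python as follows. This is an illustration under assumptions rather than a faithful implementation: operands are modeled as lists of already atomized values, the function names are hypothetical, and the empty sequence returned by a value comparison with an empty operand is modeled as None.

```python
import operator


def value_compare(op, left, right):
    """Model of eq, ne, lt, ...: each operand must be a single atomic value."""
    if not left or not right:
        return None                      # stands for the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs single atomic values")
    return op(left[0], right[0])


def general_compare(op, left, right):
    """Model of =, !=, <, ...: true if some pair of values is in the relation."""
    return any(op(l, r) for l in left for r in right)


# Identity versus equality, analogous to the node comparison "is":
first = {"name": "author", "text": "Kennedy"}
second = dict(first)                     # same content, different identity
same_content = first == second           # True
same_identity = first is second          # False
```

A consequence of the existential semantics is that general_compare(operator.eq, [1, 2], [2, 3]) and general_compare(operator.ne, [1, 2], [2, 3]) are both true, which cannot happen with value comparisons.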


Constructors are provided for every kind of node to create XML structures in a query. There are direct and computed constructors. The direct constructors resemble the corresponding concept in XSLT, and they make it possible to embed dynamically computed values by enclosing the defining expressions in {}. For instance, the following direct element constructor creates a new node with its own identity in the result of the evaluation.

    <p id="{$b/@isbn}">Book:<br/>{string($b/title)}, {string($b/author)} [{string($b/@isbn)}]</p>

If the variable b is bound to the node

    <book isbn="0-812909191-5"><title>Codenotes for XML</title><author>G. Brill</author></book>

the result is the following.

    <p id="0-812909191-5">Book:<br/>Codenotes for XML, G. Brill [0-812909191-5]</p>

The curly braces can be included in the result either by doubling them (“{{” and “}}” represent a ‘{’ and a ‘}’ respectively) or by using the corresponding character references: &#x7b; and &#x7d;. The direct element constructor automatically validates the created element against the available in-scope schemas (depending on the validation mode that is set, a failed validation might result in an error or in assigning the type xdt:untypedAny to the element and its children).

Computed constructors start with a keyword identifying the kind of node to be created (element, attribute, document, text, processing-instruction, comment, namespace). For element, attribute, namespace and processing instruction nodes the keyword is followed by the name (or by an expression in braces yielding a name, which is possible only for non-namespace nodes). Finally the content expression defines the content of the node. Computed constructors can be nested, just as the direct ones. The previous example with computed constructors would look as follows.

    element p {
      attribute id {$b/@isbn},
      "Book:",
      element br {},
      string($b/title), ",",
      string($b/author), ", [",
      string($b/@isbn), "]"
    }

An important use of computed constructors is to be able to assign the name of the created element dynamically.

FLWOR expressions support iteration and binding variables to intermediate results. The acronym was created from the parts of such expressions: for, let, where, order by, return. The for and let clauses in a FLWOR expression generate a sequence of tuples of bound variables, called the tuple stream. The where clause serves to filter the tuple stream, retaining some tuples and discarding others. The order by clause imposes an ordering on the tuple stream. The return clause constructs the result of the FLWOR expression.
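The roles of the five clauses can be mimicked step by step in Python. The sketch below is not XQuery and makes several assumptions: the department and employee data are invented for illustration, tuples of the tuple stream are modeled as Python pairs, and the function name big_depts and the dictionary keys are hypothetical. It also assumes that min_headcount is at least 1, since the average of an empty group is not defined.

```python
from statistics import mean

# Invented sample data standing in for two input documents.
depts = ["D1", "D2", "D3"]
emps = [
    {"deptno": "D1", "salary": 3000},
    {"deptno": "D1", "salary": 5000},
    {"deptno": "D2", "salary": 7000},
    {"deptno": "D2", "salary": 9000},
    {"deptno": "D3", "salary": 4000},
]


def big_depts(depts, emps, min_headcount):
    # for + let: one tuple (d, e) per department, e being its employees.
    stream = [(d, [x for x in emps if x["deptno"] == d]) for d in depts]
    # where: keep only the tuples whose department is large enough.
    stream = [(d, e) for d, e in stream if len(e) >= min_headcount]
    # order by ... descending: sort the surviving tuples by average salary.
    stream.sort(key=lambda t: mean(x["salary"] for x in t[1]), reverse=True)
    # return: construct one result item per surviving tuple.
    return [
        {"deptno": d, "headcount": len(e), "avgsal": mean(x["salary"] for x in e)}
        for d, e in stream
    ]
```

Calling big_depts(depts, emps, 2) drops D3, which has a single employee, and lists D2 before D1 because of its higher average salary.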


The following example (from [11]) of a FLWOR expression includes all of the possible clauses. The for clause iterates over all the departments in an input document, binding the variable $d to each department number in turn. For each binding of $d, the let clause binds variable $e to all the employees in the given department, selected from another input document. The result of the for and let clauses is a tuple stream in which each tuple contains a pair of bindings for $d and $e ($d is bound to a department number and $e is bound to a set of employees in that department). The where clause filters the tuple stream by keeping only those binding-pairs that represent departments having at least ten employees. The order by clause orders the surviving tuples in descending order by the average salary of the employees in the department. The return clause constructs a new big-dept element for each surviving tuple, containing the department number, headcount, and average salary.

    for $d in fn:doc("depts.xml")//deptno
    let $e := fn:doc("emps.xml")//emp[deptno = $d]
    where fn:count($e) >= 10
    order by fn:avg($e/salary) descending
    return
      <big-dept>
        {$d,
         <headcount>{fn:count($e)}</headcount>,
         <avgsal>{fn:avg($e/salary)}</avgsal>}
      </big-dept>

The for clause can contain more than one variable, and then the constructed tuple stream will contain variable bindings for each combination of values for the variables from the Cartesian product of the sequences over which the variables range. The let clause can also contain more than one variable; however, it binds each variable to the whole result without iteration. This example illustrates the difference:

    for $i in (<one/>, <two/>, <three/>)
    return <out>{$i}</out>

results in

    <out><one/></out>
    <out><two/></out>
    <out><three/></out>

while

    let $i := (<one/>, <two/>, <three/>)
    return <out>{$i}</out>

results in

    <out><one/><two/><three/></out>

Each variable bound in a for clause can have an associated positional variable that iterates over the integers from 1 on as the variable iterates over the sequence. The positional variable comes after the ordinary one, separated by the ’at’ keyword. For instance

    for $author at $i in ("G. Brill", "C. J. Date"),
        $book at $j in ("Codenotes for XML", "Database Systems")

generates the following tuple stream.

    ($i = 1, $author = "G. Brill",    $j = 1, $book = "Codenotes for XML")
    ($i = 1, $author = "G. Brill",    $j = 2, $book = "Database Systems")
    ($i = 2, $author = "C. J. Date",  $j = 1, $book = "Codenotes for XML")
    ($i = 2, $author = "C. J. Date",  $j = 2, $book = "Database Systems")

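The tuple stream produced by a for clause with positional variables, and the contrast with a let clause, can be reproduced with a small Python sketch. It is only an analogy under assumptions: tuples are modeled as dictionaries, and the function names for_clause and let_clause are hypothetical.

```python
from itertools import product

authors = ["G. Brill", "C. J. Date"]
books = ["Codenotes for XML", "Database Systems"]


def for_clause(authors, books):
    """One tuple per element of the Cartesian product, with 1-based positions."""
    return [
        {"i": i, "author": a, "j": j, "book": b}
        for (i, a), (j, b) in product(
            enumerate(authors, start=1),   # plays the role of "$author at $i"
            enumerate(books, start=1),     # plays the role of "$book at $j"
        )
    ]


def let_clause(authors):
    """A single tuple that binds the variable to the whole sequence."""
    return [{"author": list(authors)}]
```

The for_clause call yields four tuples in the same order as the tuple stream listed above (the first variable varies slowest), whereas let_clause always yields exactly one tuple.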
The where clause just contains an expression which is evaluated for each tuple in the tuple stream, and which should provide a boolean value. The tuples for which the expression evaluates to false are filtered out. The return clause is then evaluated for each remaining tuple in the filtered tuple stream, taken in the order prescribed by the order by clause (if it is omitted, the order is that of the tuple stream).

Conditional expressions are analogous to the well known ’if-then-else’ constructions in imperative programming languages. An example (from [12]):

    <bib>
    {
      for $b in doc("http://bstore1.example.com/bib.xml")//book
      where fn:count($b/author) > 0
      return
        <book>
          { $b/title }
          { for $a in $b/author[position() <= 2] return $a }
          { if (fn:count($b/author) > 2) then <et-al/> else () }
        </book>
    }
    </bib>

Quantified expressions support the universal (every) and the existential (some) quantifiers. The quantifier keyword is followed by possibly several in-clauses that bind sequences to variables, from which a tuple stream is formed as for the FLWOR expressions. The final part of the expression is a test clause, separated by the satisfies keyword from the rest of the expression, which is evaluated for each tuple in the stream. The value of the existentially quantified (some) expression is true if at least one of the evaluations of the test expression on the tuples from the stream is true (the empty stream yields false). The every expression is true if every evaluation of the test expression yields true (the empty stream yields true).

Examples (from [11]): this expression is true if every part element has a discounted attribute (regardless of the values of these attributes).

    every $part in //part satisfies $part/@discounted

This expression is true if at least one employee element satisfies the given comparison expression.

    some $emp in //employee satisfies ($emp/bonus > 0.25 * $emp/salary)
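The behaviour of the two quantifiers, including the empty-stream cases, corresponds closely to Python's built-in any and all. The sketch below is an analogy with hypothetical function names and invented sample data; elements are modeled as dictionaries, and the presence of an attribute as the presence of a key.

```python
def some(stream, test):
    """Existential quantifier: true if the test holds for at least one tuple."""
    return any(test(t) for t in stream)


def every(stream, test):
    """Universal quantifier: true if the test holds for all tuples."""
    return all(test(t) for t in stream)


# Invented sample data.
parts = [{"discounted": "yes"}, {"discounted": "no"}, {}]
employees = [
    {"bonus": 100, "salary": 1000},
    {"bonus": 400, "salary": 1000},
]

all_discounted = every(parts, lambda p: "discounted" in p)
big_bonus = some(employees, lambda e: e["bonus"] > 0.25 * e["salary"])
```

Here all_discounted is false, because the third part has no discounted key, and big_bonus is true thanks to the second employee. On an empty list some returns false and every returns true, matching the rule stated above.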

3.6.3 Query Prolog

The prolog is a semicolon separated series of declarations (of variables, namespaces, functions, etc.) and imports (of schemas, modules) that create the environment for query processing.


A namespace declaration binds a namespace prefix to a namespace URI, adding the (prefix, URI) pair to the set of in-scope namespaces for the query. Example:

    declare namespace x = "http://www.my-company.com/";

Existing bindings of prefixes to namespace URIs can be overridden by namespace declarations in the prolog, whose prefix bindings can in turn be overridden locally inside the query in an element constructor. Default namespaces can be defined for elements and for functions. Example:

    declare default element namespace "http://example.org/names";
    declare default function namespace "http://example.org/math-functions";

A schema import adds a named schema to the in-scope schema definitions. The following example imports the schema for an XHTML document, specifying both its target namespace and its location, and binding the prefix xhtml to this namespace. Example:

    import schema namespace xhtml="http://www.w3.org/1999/xhtml"
      at "http://example.org/xhtml/xhtml.xsd";

A variable declaration adds a variable binding to the in-scope variables. The value of the variable can be provided by an initializing expression or by the external environment. Examples (from [11]): The following declaration specifies both the type and the value of a variable. This declaration causes the type xs:integer to be associated with variable $x in the static context, and the value 7 to be associated with variable $x in the dynamic context.

    declare variable $x as xs:integer {7};

The following declaration specifies a type but not a value. The keyword external indicates that the value of the variable will be provided by the external environment. At evaluation time, if the variable $x in the dynamic context does not have a value of type xs:integer, a type error is raised.

    declare variable $x as xs:integer external;

A function declaration adds a user defined or external function to the available in-scope functions. User defined functions must include an expression that defines the result in terms of the parameters.
In order to allow main modules to declare functions for local use within the module without defining a new namespace, XQuery predefines the namespace prefix local to the namespace http://www.w3.org/2003/11/xquery-local-functions, and reserves this namespace for use in defining local functions. Example (from [11]): This local function accepts a sequence of valid employee elements (as defined in the in-scope element declarations), summarizes them by department, and returns a sequence of valid dept elements.

    declare function local:summary($emps as element(employee)*)
      as element(dept)*
    {
      for $d in fn:distinct-values($emps/deptno)
      let $e := $emps[deptno = $d]
      return
        <dept>
          <deptno>{$d}</deptno>
          <headcount>{fn:count($e)}</headcount>
          <payroll>{fn:sum($e/salary)}</payroll>
        </dept>
    };

This application of the previous function computes a summary of employees in Denver.

    local:summary(fn:doc("acme_corp.xml")//employee[location = "Denver"])
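The grouping performed by the function above can be paraphrased in Python. This is a sketch, not a translation: employees are modeled as dictionaries with invented sample data, the function name summary and the dictionary keys are hypothetical, and keeping the departments in order of first occurrence is a choice made for this sketch only.

```python
def summary(emps):
    """Summarize employees by department: headcount and total payroll."""
    # Distinct department numbers, kept in order of first occurrence.
    deptnos = list(dict.fromkeys(e["deptno"] for e in emps))
    result = []
    for d in deptnos:
        group = [e for e in emps if e["deptno"] == d]   # role of the let clause
        result.append({
            "deptno": d,
            "headcount": len(group),
            "payroll": sum(e["salary"] for e in group),
        })
    return result


# Invented sample data.
employees = [
    {"deptno": "D1", "salary": 3000, "location": "Denver"},
    {"deptno": "D2", "salary": 7000, "location": "Denver"},
    {"deptno": "D1", "salary": 5000, "location": "Denver"},
    {"deptno": "D2", "salary": 9000, "location": "Boston"},
]

# Counterpart of the predicate [location = "Denver"] in the call above.
denver = summary([e for e in employees if e["location"] == "Denver"])
```

For this data, denver contains one entry for D1 with two employees and a payroll of 8000, and one entry for D2 with a single employee.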

3.6.4 Exercises

Exercise 3.6.1. Solve the exercises of Section 3.5.3, now using XQuery instead of XSLT.


Bibliography
[1] Apache. http://www.apache.org/.
[2] Brill, G. Codenotes for XML. Random House, 2001.
[3] Date, C. J. An Introduction to Database Systems. Addison Wesley, 1995.
[4] Perl. http://www.perl.com/.
[5] PHP. http://www.php.net/.
[6] PostgreSQL. http://www.postgresql.org/.
[7] Search Engine Watch. http://searchenginewatch.com/.
[8] XML. http://www.w3.org/XML/.
[9] XMLSchema. http://www.w3.org/XML/Schema.
[10] XPath. http://www.w3.org/TR/xpath.
[11] XQuery. http://www.w3.org/TR/xquery/.
[12] XQuery Use Cases. http://www.w3.org/TR/xquery-use-cases/.
[13] XSLT. http://www.w3.org/Style/XSL/.
[14] Zaiane, O. Database systems and structures, 1998. Lecture notes (also available on line: http://www.cs.sfu.ca/CC/354/zaiane/material/notes/contents.html).

117

Similar Documents

Premium Essay

Information System

...INFORMATION SYSTEMS Author: RUEL GRAFIA Thesis Statement: Information system has been used pervasively on advanced economies by people and organization that rely most of this activities on mobile and digital technology. I. Introduction A. Definition of Information Systems B. Business Information Systems II. Body A. Components of Computer-based Information Systems B. Individuals’ and Organizations’ activities in the Information System III. Conclusion Organizations and individuals rely on information systems, generally internetbased for conducting much of their personal lives for socializing, study, shopping, electronics banking, and entertainment. By bringing together all of the components of information systems hardware, software, infrastructure, and people a business will have a better chance of adapting with new technology and keeping up with consumer trends to remain profitable and competitive. Thesis Statement: Information system has been used pervasively on advanced economies by people and organization that rely most of this activities on mobile and digital technology. Introduction Information systems play an important role in supporting the organization to conduct its business processes. An information system collects, stores, and disseminates information from an organization’s environment and internal operations to support organizational functions and decision making, communication, coordination, control, analysis, and visualization (Laudon & Laudon, 2012, p. 15)...

Words: 679 - Pages: 3

Premium Essay

Information Systems

...Axia/HCS 483 | Information System Briefing | November,2011 | Information System Briefing Selecting and acquiring information systems is important to this organization. Information systems are designed based on the organization’s needs, thus simplifying patient needs, departmental, and other information. Information systems should also accommodate other departments as well such as radiology, pharmacy, and other various branches of the hospital. An information system is, of course, an expensive technology but can greatly benefit the hospital. With information systems many systems have to be installed databases, storage, identification, and selection. Once it is up the organization still has to think about an offsite back up station with a firewall. It is to much for an organization or investors to take in, making it imperative that the organization receives the system designed to their needs. Selecting and Acquiring When selecting what is needed for the organization one has to look at many different variables. Does the organization have many branches, do they have different specialties, patients, billing, and other various staff. Communication between these is vital to making the information system work. The information also has to be user friendly and easy to obtain by staff. If any of the information is not easily obtainable it will make the organization run slow in all departments or fragment it. That means that patient information could be lost resulting in duplicate...

Words: 1181 - Pages: 5

Free Essay

Information Systems

... * * * * * * * * * * * * * Information System Briefing * * * * * * * * * * * * * * * * * * An information system is a combination of data, processes and information technology that interact to collect, process, store and provide output for an organization (Wager,K 2009). In an health care organization there are two types of information systems: administrative and clinical. In order for an organization to find the best system they must follow the process for selecting and acquiring an information system. * System implementation begins when the organization gains the system and begins to put it in use. There are several stakeholders that are involved in the implementation process. The CFO (chief financial officer) manages the budget and all future expenses. The CEO (chief executive officer) is the leader of the organization and overlooks everything that is done. The implementation team gets everything in order and ready for the implementation of the new system. The vendors job is to find the system that best fits the buyers requirements. The IT department operates and assists with technical support. To start the implementation process an implementation team should be assembled and a system champion must be identified. The system champion will be responsible for leading the team. Then the team will come together and determine...

Words: 701 - Pages: 3

Free Essay

Information System

...Information Systems and Software Applications Software applications and information systems help businesses to manage every area of their business affairs. In a company it is extremely important to have not only an appropriate information system, but also the appropriate software applications that will allow one to conduct daily tasks. It will determine how successful a business can be. Businesses that take the time to invest these types of software can help take an accurate snapshot of where their business has been and where they are going. Usually, the different departments within an organization will have different information systems; however, they may have similar applications or a package of application on a system. Human resources, Finance, and Accounting are three departments in an organization that usually depend on reliable information systems (IS) and must have Software applications that support the duties and help them conduct day-to-day business. In this paper I will discuss these three departments and the system they would benefit from as well as applications that they may require. One of many tasks that the human resources departments has is keeping records of new, current, future, or potential employees. Two information systems compatible with their duties are the office automation system, and the decision support system. The office automation systems would allow human resources employees to support activities for individuals and input data regarding...

Words: 627 - Pages: 3

Premium Essay

Information Systems

...Information Systems Department XBIS 219 August 28, 2010 Every company has different departments and those departments are in charge of different tasks and employees. In order for the departments to do their job efficiently, employees need a software application to help them complete their tasks. This type of software makes employee's jobs easier and helps the company keep track of their success. Two different examples are the a human resource department and accounting department. The human resources is responsible for hiring new employees, also to establish effective policies and procedures. Human resources (HR) can benefit greatly from the use of iCIMS software. This software can do many tasks for the department. iCIMS is designed to help businesses make the hiring process more efficient by using software for screening and storing applicant information, enabling web-based job applications, tracking candidates, monitoring performance after recruitment, and managing post-employment processes. As stated by the iCIMS (2010) website, " iCIMS' Talent Platform streamlines the entire talent lifecycle in one easy-to-use recruitment and HR software application." The accounting department is those in which deal with money paid, received, borrowed, or owed in the company. Good software that this department can use is Accounting Information System, better known as AIS. According to Accounting Information Systems (2010), “The purpose of AIS is to accumulate data and provide decision...

Words: 377 - Pages: 2

Premium Essay

Information System

...Contents Information Systems Proposal Table of Contents EXECUTIVE SUMMARY 2 Information Systems 4 EXECUTIVE SUMMARY The Objective… Explain the different types of information systems available to businesses.   * Processing Payroll   * Point-of-sale Terminal   * Microsoft Office   * Report of sales for individual customers   * Electronic commerce To operate a successful business one must know and understand the information technology aspects, which enhances the daily operations of the business. Small businesses are at a disadvantage and must seek ways to expand and become viable.   “Strategic Information Systems provide a competitive advantage by helping an organization implement its strategic goals and increase its performance and productivity” (R. Kelly Rainer Jr., Casey G. Cegielski, 2011). There are six reasons why information systems are so important for businesses today and they include:   * Operational excellence -   Businesses improve the efficiency of their operations in order to achieve higher profitability.   * New products, services, and business models -   Business models describe how a company produces, delivers, and sells a product or service to create wealth.   * Customer and supplier intimacy - When businesses serve its customers well, the customers usually return and purchase more. This allows businesses to engage its suppliers, which enables the suppliers to provide vital input.   * Improved decision making - Information system made it...

Words: 264 - Pages: 2

Premium Essay

Information System

...y Abstract Information Systems are the systems which are the made by the combination of the various hardware and software that people and organizations use to collect, filter, process, create, and distribute data. There are various types of information systems, for example: transaction processing systems, decision support systems, knowledge management systems, learning management systems, database management systems, and office information systems. Critical to most information systems are information technologies, which are typically designed to enable humans to perform tasks for which the human brain is not well suited, such as: handling large amounts of information, performing complex calculations, and controlling many simultaneous processes.  The domain of study of IS involves the study of theories and practices related to the social and technological phenomena, which determine the development, use, and effects of information systems in organization and society. Acknowledgement The satisfaction that accompanies that the successful completion of any task would be incomplete without the mention of people whose ceaseless cooperation made it possible, whose constant guidance and encouragement crown all efforts with success. We are grateful to all the concern websites and the book written by experienced professionals for the guidance, inspiration and constructive suggestions that helpful us in the preparation of this project. We also thank our colleagues who gave...

Words: 2036 - Pages: 9

Premium Essay

Information Systems

...Information Systems A Proposal To: Jeremy Black February 13, 2013 Contents Welcome 3 Performance Objective 3 Where to Start 3 Understand Information Systems 4 Management information System 4 Office Automation system 4 Supply Chain Management System 4 Electronic Commerce System 4 Executive Dashboard 4 Proposal 5 Thank You 6 [ ] Mr. Black, Thank you for contacting Wizard Tech for your information system needs for your company. We are honored to provide you with information on a variety of systems that are available for your company. This proposal will provide you with information on each system and address any question you may have concerning each system. Should you still have question or concerns please contact our office anytime. The contact information is located on the last page of this proposal. Performance Objective Information recording has change throughout the years through the change in technology. Technology advancements are changing the needs of companies to compete in their respectful markets. Here at Wizard Tech we know that it can be stressful for business owners to choose the correct information system to use for their new business. Our goal is to minimize your stress by walking you through picking your new system, installing it for you, and provide the necessary training needed to operate the new system. Where to Start When starting a new business it can be hard to choose the right information...

Words: 847 - Pages: 4

Premium Essay

Information Systems

...Health Care Information Systems Samantha Pernett November 1st, 2014 HCS/483 Professor Karen Johnson Technology changes and updates all the time and it is very important for organizations to stay up to date so that they are able to provide the best care. In health care this is also very important medicine as well as technology changes so much. This is why we have decided to change out information system. This can be a lengthy process and there are many steps. We will have to pick a system that fits best for our organization as well as transition into the new system by having everyone trained and educated. This will be a long process but it will help our organization be able to provide the best quality of care to our patients. As an organization we have to make sure that we cover all of the basics before making a change to a new information system. The first step is to research and decide which information system is going to fit best with our health care organization. As stated by Joy Hicks (2014) “The considering organization must ask, “What expense are we willing to spend on implementing this new process and what are the benefits from each choice, outsourcing or in house?” (pg 1). There are many things that we have to take into consideration when selecting and information system. We have to think about, cost, time, benefits, training, and education. All of these things are important. We also what to select at system that will be able to include all the information that we need...

Words: 843 - Pages: 4

Free Essay

Information Systems

...company supported this by explaining that a single McDonalds store takes into consideration all the factors such as labour rate, raw materials, electricity, water and land costs in price fixing. Mc Donalds uses a number of information systems in its day to day management of the store. Information Systems Of all the information systems used in Mc Donalds, the most used are the Inventory System, and the Employee Scheduling System. Inventory system is nothing but a modified Library Information System. It is because the library and a particular Mc Donalds store, both act as a single system in which there are many items categorised in sections which keeps coming in and the existing ones keep going out. This intricate similarity made the systems use each other. The Employee Scheduling System is a part of Management Information System. It has an interface and a database to make all other functions of this system easy to run. Explanation Inventory System (Library Information System): A library information system enables the users to track down each and every single product at the facility. The inventory system uses a similar way to store the list of things in the store, and all the details about them to be used whenever necessary. The working of an Inventory system can be explained as: 1. Assistant Manager counts everything in the store on a weekly basis. 2. He stores all these details in the central computer. 3. Manager predicts the next week’s sales based on previous...

Words: 1070 - Pages: 5

Premium Essay

Information System

...Introduction to Information Systems Fundamentals of Information Systems, Sixth Edition Principles and Learning Objectives • The value of information: how it helps decision makers achieve the organization’s goals • Distinguish data from information • Knowing the potential impact of information systems • Identify the basic types of business information systems: who uses them, how they are used, and what kinds of benefits they deliver • To build a successful information system, system users, business managers, and information systems professionals must work together Principles and Learning Objectives • The use of information systems to add value to the organization • Identify some of the strategies employed to lower costs or improve service • Identify the value-added processes in the supply chain • Define the term competitive advantage • IS personnel is a key link • Define the types of roles, functions, and careers available in information systems Why Learn About Information Systems in Organizations? • How might the information system used depend on the various components of a computer-based information system: hardware, software, databases, telecommunications, people, and procedures? • How do computer-based information systems help businesses implement best practices? • Information systems are used in almost every imaginable profession to reach customers around the world • Information systems in an organization...

Words: 3811 - Pages: 16

Premium Essay

Information Systems

...Information Systems and Software Applications XXXXXXX BIS/219 XXXXXX XXXXXXX Information Systems and Software Applications Computer systems have had an amazing impact on the way businesses operate. Technology has advanced so remarkably that those who are not using computers in their business are at a major disadvantage against their competitors (Writing, 2011). Computers allow the application of different types of software that assist businesses in their everyday operations, from maintaining files, monitoring inventory, selling goods, purchasing supplies to paying employees. Some information systems support an entire organization, others only support certain divisions within the organization. Each area supported is referred to as a functional area, a few examples are Finance information system, Marketing information system, Management information system, Accounting information system, and Human Resources information system. The first functional area considered is the world of finance. Finance is extremely critical to the success of an organization; an information system designed to process payroll for the employees would be very beneficial. This type of information system is used in a particular functional area such as finance. An example could be a software application as simple as Quicken or as complex as an application custom designed specifically for this organization. A financial information system is a necessity for an organization to properly and efficiently...

Words: 543 - Pages: 3

Premium Essay

Information System

...1. Compare and contrast the application of information technology (IT) to optimize police departments’ performance to reduce crime versus random patrols of the streets. 2. Describe how COMPSTAT, as an information system (IS), implements the four (4) basic IS functions: (a) Input, (b) Processing, (c) Output, (d) Feedback. 3. Determine how information systems have allowed police departments that implement tools such as COMPSTAT to respond to crime faster. 4. Apply the strengths, weaknesses, opportunities, and threats analysis (SWOT analysis) on behalf of police departments that intend to implement predictive policing. 5. Use at least three (3) quality resources in this assignment. Note: Wikipedia and similar Websites do not qualify as quality resources. You may use the resources above or others of your choosing. Predictive Policing Information Technology, or IT, is the study, design, creation, utilization, support, and management of computer-based information systems, especially software applications and computer hardware. Information technology is not limited solely to computers, but extends to other devices such as mobile phones, PDAs and other handheld devices. The field of IT is quickly moving from compartmentalized computer-focused areas to other forms of mobile technology ("Information Technology," 2011). Over the last decade, computer and telecommunications technologies have developed at a surprising rate. Increased computing...

Words: 253 - Pages: 2

Premium Essay

Information Systems

...0965944 Information Systems A consultancy report of Aalsmeer Flower Auction. Submitted to Ian Durling. Submitted by 0965944. Words 2919. Contents 1.0 Introduction 1.1 Information Technology and Information Systems 1.2 Organisation chart of Aalsmeer Flower Auction 1.3 Business Environment 1.4 SWOT Analysis of Aalsmeer Flower Auction 1.5 PEST Analysis of Aalsmeer Flower Auction 1.6 Porter's Five Forces Analysis 1.7 The Value Chain 1.8 Enterprise Application Architecture 2.0 Design Methodology 3.0 Dimensions of Information Systems 4.0 Conclusion 1. Introduction Aalsmeer Flower Auction (AFA), located in the Netherlands, is the biggest flower auction in the world. It offers global growers, wholesalers and exporters a central place for buying and selling floricultural products, with a range of marketing channels and facilities for growers, buyers and logistics. Every phase of the flower trade is managed in the Netherlands: pricing, packaging, distribution and quality control. Most of the flowers come from the Netherlands, with others from Spain, Israel and Kenya, among others (Boonstra A & Van Dantzig, 06 pg2). This has made AFA a prominent link in the international chain of the flower auction market. New developments in the auction market have threatened the comfortable position of AFA. These include e-networks, the emergence of alternative electronically driven flower markets, and mergers and acquisitions among...

Words: 3700 - Pages: 15

Premium Essay

Information Systems

...1. How do information systems projects get started in organizations? In order for Jim to initiate the project, he must first determine the size, scope, and resource requirements for the project. Information systems projects are started in organizations by first establishing the project initiation team. This activity involves organizing an initial core of project team members to assist in accomplishing the project initiation activities. The project initiation team establishes the project initiation plan. This step defines the activities required to organize the initiation team while it is working to define the scope of the project. In order to have an organized approach or process, you must analyze what’s going on, then design a solution to the problem, and finally monitor and control. 2. How are organizational information systems related to company strategy? How does strategy affect the information systems a company develops and uses? The organizational information system is related to company strategy because it exists to help organizations achieve their goals and objectives. It is also determined by its competitive strategy. Strategy affects the information system because there are information services resources that apply to a strategic business opportunity in such a way that the computer systems have an impact on the organization's products and business operations. 4. What do you think Jim’s next step should be? Jim’s next step should be to create an information team and...

Words: 307 - Pages: 2