S Plus

Copyright © 2001 CANdiensten. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (CANdiensten, Nieuwpoortkade 25, 1055 RX Amsterdam, The Netherlands).

ISBN 90-804652-2-4

Preface
In September 1997 Insightful (formerly known as MathSoft) released S-PLUS 4 for Windows, which added a completely new graphical user interface to the existing S-PLUS programming environment. It allowed users who are not programming-minded to access the advanced visualization techniques and modern analysis methods of S-PLUS. As a result, S-PLUS has gained enormous popularity over the past years among applied statisticians and data analysts.

Over the last three years the content of this book has been used for introductory courses on S-PLUS. This book is aimed at people who are (completely) new to S-PLUS. It covers the graphical user interface of S-PLUS and introduces the underlying S language. The main goal of this book is to get you started, and the best way to do that is to use this book interactively during an S-PLUS session. After reading this book you should be able to import data into the system and do some data manipulation and data cleaning. Furthermore, you should be able to visualize your data, apply statistical functions to your data, enter basic S-PLUS commands and write functions.

The first edition of this book appeared in 1999 and was aimed at users of S-PLUS 2000. This edition is updated for S-PLUS 6 (see Appendix C for a list of new features). Chapters 8 and 9 about the S language have been restructured and extended. Throughout the book more examples are given, and spelling mistakes and other errors have been corrected. Of course, if you have suggestions for improving this book you can contact the author by electronic mail: longhow@splusbook.com

Online Material
To find out the latest news on this book, go to its web page: www.splusbook.com

Most of the data examples use S-PLUS built-in data sets. The example data sets that are not built in can be found on the web page, together with an errata list and answers to the exercises of chapters 8 and 9.

Acknowledgements
I would like to thank Dick Verkerk and Gosé Fischer of CANdiensten for making time and resources available. Without them writing this book would not have been possible. Ian Gander and Peter Hallett of Insightful are acknowledged for supporting my book. Furthermore, the following people are acknowledged for their contribution: Arnold Dekkers and Siem Heisterkamp (National Institute of Public Health and Environment), Ronald Verwer (Netherlands Institute for Brain Research), Ronald Geskus and Maria Prins (Municipal Health Service Amsterdam), P. Dwarshuis (CBS), Barney Campbell (Laputa Computing Ltd), Ruud Koning (University of Groningen), Richard Pugh (Insightful International), Johan Krijnen and Willem de Winter (CANdiensten) and Marjan Heisterkamp. Last but not least, I thank all the enthusiastic people following my S-PLUS courses who gave me valuable comments and suggestions.

Longhow Lam
Amsterdam, November 2001


Contents

1. Introduction
  1.1 Installation
    1.1.1 System requirements
    1.1.2 The S-PLUS network version
    1.1.3 Directory choices
    1.1.4 Excel add-in and SPSS Link
    1.1.5 Acrobat Reader
    1.1.6 S-PLUS-Libraries
  1.2 Starting S-PLUS
  1.3 Other products based on S and S-PLUS
    1.3.1 S-PLUS for UNIX and Linux
    1.3.2 The S-PLUS standard edition
    1.3.3 S-PLUS modules
    1.3.4 StatServer and S-PLUS analytical Server

2. Help and documentation
  2.1 Online documentation
  2.2 Online help information
  2.3 Books
  2.4 Resources on the web

3. Importing data
  3.1 Import
  3.2 Import options
    3.2.1 Column and row names, data block location
    3.2.2 Delimiters
    3.2.3 Importing characters
  3.3 Filtering data
    3.3.1 Filter expressions
    3.3.2 Block reads and writes

4. Managing objects and projects
  4.1 Introduction
  4.2 The working database, a Chapter and the SearchPath
    4.2.1 Selecting a working database to work with
    4.2.2 Starting S-PLUS in a new Chapter
    4.2.3 The SearchPath
    4.2.4 Masked objects
  4.3 The Object Explorer
    4.3.1 Folder filters
    4.3.2 Creating new folders
    4.3.3 Object Explorer pages
    4.3.4 Right pane settings
    4.3.5 Saving an Object Explorer
    4.3.6 Searching for objects in the Object Explorer

5. Data in S-PLUS
  5.1 Data windows
    5.1.1 Hot spots in data windows
  5.2 Data types and data structures
    5.2.1 Data types
    5.2.2 Data structures
  5.3 Data entry in data windows
    5.3.1 Typing in data
    5.3.2 Changing data types
    5.3.3 Copying and pasting data
    5.3.4 Importing data
    5.3.5 Generating data
    5.3.6 Moving data
  5.4 Manipulating data
    5.4.1 Merging data frames
    5.4.2 Recode values
    5.4.3 Splitting data
    5.4.4 Create categories
    5.4.5 Stack and unstack
    5.4.6 Creating subsets
    5.4.7 Selecting cases by name
    5.4.8 Creating missing values
  5.5 Using Excel data manipulation features
    5.5.1 Opening a new Excel sheet in S-PLUS
    5.5.2 Linking the Excel data to S-PLUS
    5.5.3 Opening existing Excel worksheets
  5.6 Exporting data
    5.6.1 Export options and Filter

6. Creating graphics
  6.1 Introduction
  6.2 Two-dimensional graphs
    6.2.1 Plot buttons
    6.2.2 Scatter plot with linear fit
    6.2.3 Interactive graphics
    6.2.4 A graph as object
  6.3 Further possibilities with S-PLUS graphs
    6.3.1 Adding objects
    6.3.2 Adding titles and texts
    6.3.3 Adding symbols
    6.3.4 Identifying points
    6.3.5 Scaling Axes
    6.3.6 Setting tickmarks and labels on the axes
  6.4 Multiple graphs
    6.4.1 Add an extra plot to an existing one
    6.4.2 Inserting a second Y axis
    6.4.3 Multiple graphs on one graphsheet
    6.4.4 Multi page Graphsheets
  6.5 Trellis graphics
    6.5.1 Introduction
    6.5.2 Further examples
  6.6 Three-dimensional graphs
    6.6.1 Surface plot
    6.6.2 2D graphs in a 3D graph
    6.6.3 A 3D scatter plot with text
  6.7 Exporting graphs
    6.7.1 S-PLUS graphsheet format
    6.7.2 Other formats
    6.7.3 Copy and Paste
    6.7.4 The PowerPoint presentation Wizard
  6.8 Graphing Excel data
    6.8.1 Drag and drop
    6.8.2 The S-PLUS add-in
    6.8.3 Excel Active Document Support

7. Statistics using the GUI
  7.1 Introduction
  7.2 An overview of the Statistics menu in S-PLUS
  7.3 Basic statistics
    7.3.1 Summary statistics
    7.3.2 Correlations
  7.4 Comparing samples
    7.4.1 One sample
    7.4.2 Two or more samples
  7.5 Statistical models
    7.5.1 Linear regression
    7.5.2 Survival analysis
    7.5.3 Linear mixed effects models
    7.5.4 Nonlinear Regression

8. The basics of the S programming language
  8.1 Introduction
  8.2 The Commands window and script file
  8.3 Data objects
    8.3.1 Data types
    8.3.2 Data Structures
  8.4 Data manipulation
    8.4.1 Vector subscripts
    8.4.2 Matrix subscripts
    8.4.3 Data frames
    8.4.4 Other useful functions to manipulate data frames
    8.4.5 Attributes
    8.4.6 Character manipulation
    8.4.7 Creating factors from continuous data
  8.5 Graphical functions
    8.5.1 Traditional graphics vs. editable object oriented graphics
    8.5.2 Traditional graphics
    8.5.3 Graphical parameters
    8.5.4 Trellis graphics

9. Further possibilities with the S language
  9.1 Statistics and the S language
    9.1.1 Basic Statistics
    9.1.2 Formula objects
    9.1.3 Linear regression models
    9.1.4 Factor (categorical) variables as regression variables
    9.1.5 Cox proportional hazards models
    9.1.6 Nonlinear regression
  9.2 Writing functions
    9.2.1 Introduction
    9.2.2 Function arguments and return value
    9.2.3 Control Flow
  9.3 Efficient calculations
    9.3.1 Vectorized calculations
    9.3.2 The apply and outer functions
  9.4 Generating scripts and S-PLUS code
    9.4.1 Introduction
    9.4.2 Creating the same plot for different data sets
    9.4.3 Importing files from a directory
  9.5 Block reads and writes
    9.5.1 Minimum and maximum
    9.5.2 Scatter plot

10. Modifying the graphical user interface (GUI)
  10.1 Introduction
  10.2 Creating new menus
  10.3 Creating new toolbars
  10.4 Creating new dialogs
  10.5 Viewing existing dialogs in the Object Explorer
  10.6 S-PLUS functions for extending the user interface
  10.7 Deploying and removing customized user interfaces
    10.7.1 Deploying
    10.7.2 Removing

Appendix A Plot toolbars
  2D plots
    Pie plot
    Grouped Bar Charts
    Box plots
  3D plots
  Annotations

Appendix B Preferences
Appendix C What’s new in S-PLUS 6 for Windows
Appendix D Exercises
Index


Typographical conventions
In this book we refer to names of S-PLUS software objects, such as menus and dialogs. References to menus and submenus are given in bold font and submenus within menus are indicated by a little arrowhead like:

• Go to the menu File ► Import Data ► From File

References to dialogs, tabs of dialogs and fields of dialogs are put between quotes, as in:

• Type a name in the ‘Save As’ field from the ‘Linear Regression’ dialog.
• Go to the ‘Options’ tab to ....
• Check the checkbox ‘Print Results’ to ...

S-PLUS commands, names of functions and output of S-PLUS routines are given in typewriter font like: cars.fit

> is larger than; >= is larger than or equal to; & and; | or; ! not.

The variable names in the filter expression must match column names in the data file. Examples of filter expressions:

ozone > 2.4
ozone > 1.3 & ozone < 3.2
Sex == "Male"
Sex == "Female" & Bloodtype != "A"

Note: When a data file does not have column names, S-PLUS will generate default column names (such as Col1, Col2, Col3, etc.). These names must be used in the filter expressions.
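To make the meaning of the filter expressions concrete, here is a minimal sketch of the row selection they describe. It is written in Python purely as an illustration and is not S-PLUS code; the helper `keep_rows` and the sample values are hypothetical, and only the column names (ozone, Sex, Bloodtype) come from the examples in the text.

```python
# Illustrative rows: the column names follow the examples in the text,
# the values are made up for this sketch.
rows = [
    {"ozone": 1.0, "Sex": "Male", "Bloodtype": "A"},
    {"ozone": 2.0, "Sex": "Female", "Bloodtype": "B"},
    {"ozone": 3.0, "Sex": "Female", "Bloodtype": "A"},
    {"ozone": 3.5, "Sex": "Male", "Bloodtype": "O"},
]


def keep_rows(rows, predicate):
    """Return only the rows for which the filter predicate is true."""
    return [row for row in rows if predicate(row)]


# Counterpart of the filter expression: ozone > 2.4
high_ozone = keep_rows(rows, lambda r: r["ozone"] > 2.4)

# Counterpart of: ozone > 1.3 & ozone < 3.2
mid_ozone = keep_rows(rows, lambda r: r["ozone"] > 1.3 and r["ozone"] < 3.2)

# Counterpart of: Sex == "Female" & Bloodtype != "A"
female_not_a = keep_rows(
    rows, lambda r: r["Sex"] == "Female" and r["Bloodtype"] != "A"
)

print(len(high_ozone), len(mid_ozone), len(female_not_a))
```

In each case a row is imported only if the whole expression evaluates to true for that row, which is why the column names in the expression have to exist in the file.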

3.3.2 Block reads and writes
A new feature in S-PLUS 6 is the ability to process data sequentially, using block reads and writes. Instead of importing all the rows of a data file at once, it is possible to read the first 10,000 rows, process these, then read the next 10,000, process these, and so on. This requires some knowledge of the S language, since the block reads and writes are not integrated in the menu system. We will give some examples in section 9.5.
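The idea of block-wise processing can be sketched independently of S-PLUS. The following Python example is only an illustration of the technique, not the S-PLUS functions used in section 9.5: it reads a delimited file a fixed number of rows at a time and keeps a running minimum and maximum, so the full data set never has to be in memory. The function names `read_blocks` and `running_min_max`, the column name and the sample data are assumptions made for this sketch.

```python
import csv
import io
from itertools import islice


def read_blocks(lines, block_size):
    """Yield successive lists of at most block_size parsed rows."""
    reader = csv.DictReader(lines)
    while True:
        block = list(islice(reader, block_size))
        if not block:
            break
        yield block


def running_min_max(lines, column, block_size=10000):
    """Compute the minimum and maximum of one column, block by block."""
    lowest = highest = None
    for block in read_blocks(lines, block_size):
        values = [float(row[column]) for row in block]
        block_min, block_max = min(values), max(values)
        # Combine the summary of this block with the summary so far.
        lowest = block_min if lowest is None else min(lowest, block_min)
        highest = block_max if highest is None else max(highest, block_max)
    return lowest, highest


# A small in-memory "file" stands in for a large data file on disk.
sample = io.StringIO("ozone\n2.5\n1.3\n4.1\n3.2\n0.9\n")
result = running_min_max(sample, "ozone", block_size=2)
print(result)
```

Only statistics that can be updated from one block to the next (such as a minimum, maximum, count or sum) fit this pattern directly; the block size of 2 is used here just to force several blocks on a tiny example.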


CHAPTER 4
4. Managing objects and projects
Before starting to create fancy graphs and to calculate all sorts of statistics, it is important to get a feeling for the way S-PLUS manages objects. It is easy to create objects; it is even easier to get lost!

4.1 Introduction
The S-PLUS environment is object-oriented. Everything in S-PLUS is an object of a certain class, having certain properties. There are many object classes, each with its own property set. The class of objects that a beginning user will probably encounter first is that of data frames. Other classes of objects include functions, matrices and statistical output objects. In fact, everything you do in S-PLUS boils down to creating or manipulating an object.

The Object Explorer is an easy-to-use, yet powerful tool for managing, creating and manipulating the different objects in S-PLUS. It consists of two panes: the left pane contains folders and the right pane shows the properties of the object selected in the left pane. It is important that you familiarize yourself with the Object Explorer as soon as you start using S-PLUS, since it is a very useful part of the graphical user interface. To open the Object Explorer, click on the Object Explorer button on the main toolbar.

Before learning about all possibilities of the Object Explorer in Section 4.3, you should know the meaning of a Chapter, the working database and the SearchPath. Let’s discuss these first.

4.2 The working database, a Chapter and the SearchPath
Most objects you create or modify during an S-PLUS session are automatically stored as S-PLUS objects in a so-called working database (a directory with files) on disk. Such objects include data frames you create, functions you write or output objects from statistical models (see Chapter 7). Objects that do not fall into this category are graphsheets, report windows, script files and interface objects. These objects are usually stored “manually” by the user and must be stored outside the working database.

A working database has the name “.Data”. During the installation process of S-PLUS, the user is asked to specify a working database. The default working database is something like:

C:\Program Files\splus6\users\username\.Data

The working database is part of the so-called Chapter (or project directory), which in this example is:

C:\Program Files\splus6\users\username

This directory also contains a “.Prefs” directory, where ‘personal’ preferences are stored. The Chapter may also contain other files and directories, which are not discussed in this book. After the installation of S-PLUS it is always possible to create and work with new working databases, as we will see in the next section.

Objects created or manipulated in an S-PLUS session will be saved on disk (in the working database) and will be available when you restart S-PLUS with the same working database. Of course, if you delete an object in S-PLUS it will be removed from the working database and is lost. Outside S-PLUS, you can use Windows Explorer to look at the working database. This folder may also contain files with odd names such as “_10”, “_12”, etc., as you can see in the next figure.

Most of these files correspond with an object in an S-PLUS session. The name mapping between the files in the “.Data” directory and the objects in S-PLUS can be found in a text file named “__nonfi”, which has the following structure:

“sam” “_10” “Sam” “_11” “Test.Data” “_13”

The first name is the name used in S-PLUS, the next one is the corresponding file name in the working database, etc.

You are strongly advised not to touch the files in the S-PLUS working database. Deleting or changing one or more files in the working database may result in ‘unwanted’ effects! It is also not recommended to manually put (your own) files in the “.Data” directory (using Windows Explorer). The “.Data” directory should only contain files that S-PLUS placed there!
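The pairing of object names and file names can be illustrated with a small sketch. It is written in Python rather than S-PLUS and works on a copied string, never on the real file. It assumes that the entries are alternating double-quoted tokens written with plain quotes; the function name parse_name_map is hypothetical.

```python
import re

def parse_name_map(text):
    """Return {S-PLUS object name: file name} from alternating quoted entries."""
    # Collect every double-quoted token, in order of appearance.
    tokens = re.findall(r'"([^"]*)"', text)
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/file pairs, got an odd number of entries")
    # Even positions hold object names, odd positions the matching file names.
    return dict(zip(tokens[0::2], tokens[1::2]))

# The three pairs listed in the text, written with plain double quotes (assumption).
sample = '"sam" "_10" "Sam" "_11" "Test.Data" "_13"'
name_map = parse_name_map(sample)
print(name_map["Test.Data"])
```

Note that the objects sam and Sam end up in two different files, in line with the case sensitivity of object names in S-PLUS.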

4.2.1 Selecting a working database to work with
When you start S-PLUS from the Windows Start menu, you will be asked which directory you want to use as a Chapter (project directory).


Select the Chapter (project directory) and click OK. S-PLUS will start and the objects you create during the session will be stored in the working database of the selected Chapter.

4.2.2 Starting S-PLUS in a new Chapter
To organize your S-PLUS work it is recommended to use different working databases for different projects. If you are working on project A, you don’t want to have all the (irrelevant) objects of project B in the same directory. However, you may want to use some objects of project B while you are working on project A. We will explain in the next chapter how to do that.

Suppose you start on a new data analysis project and you find it convenient to work in a new S-PLUS Chapter. All you have to do is to create an empty Chapter (a directory somewhere on disk). For example:

D:\SPLUSprojects\mortgages

Start S-PLUS. Select the newly created Chapter in the ‘Open S-PLUS Project’ dialog that appears, and click OK. S-PLUS will automatically create a working database (a .Data subdirectory) in this Chapter and all the objects created in this session will be stored in this working database.

Now suppose you are told to work on a totally different data analysis project as well. To structure your work you create a new Chapter, for example:

D:\SPLUSprojects\savingsaccounts

Start S-PLUS and select the new project directory.

4.2.3 The SearchPath
The SearchPath is a list of directories (databases) in S-PLUS. When you start an S-PLUS session, by default the SearchPath consists of your working database and S-PLUS system databases. You can see the list of directories in the Object Explorer (see next figure).


The order of the directories in the SearchPath is important; the first directory is always used as the working database. This means that importing a data file or writing a function will result in the creation of objects that are placed in the first directory. For example, the Exenvirn data set we have imported in Chapter 3 has become an object in the first database. To see all the objects in the selected database, select the first database in the left pane of the Object Explorer.

The other directories (databases) contain existing S-PLUS functions and built-in data sets. For example, look at what happens when you fit a linear regression model. S-PLUS searches for the specific fitting function, first in database 1, then in database 2, and so on, until the specific function (the function lm in this case) is found. The SearchPath is not a static list. New databases can be attached to the SearchPath and existing databases can be detached.

Changing the working database
Using the SearchPath, you can easily change your working database without having to close and restart S-PLUS. Suppose that in your current session you want to use the Chapter D:\SPLUSprojects\mortgages as your working database. Proceed as follows:
• Go to the menu File ► Chapters ► New Working Chapter.
• You will see the following dialog.
• Browse to the Chapter (directory) you wish to attach, that is, the directory that contains .Data as a subdirectory.
• Type a label for the Chapter. This will be the label shown in the SearchPath.
• Click OK to update the SearchPath.

The SearchPath now contains the new working database. This means that if you import a data set, the resulting object will be stored in the new working database. The old working database is not listed in the SearchPath anymore and you can no longer make use of objects in that database.

Attaching a new directory (database)
Instead of replacing your old working database by a new working database, you can also add a new database to the SearchPath. Suppose that in your current session you want to make use of the objects in the Chapter D:\SPLUSprojects\mortgages, but you want to keep working in your current working database. Proceed as follows:
• Go to the menu File ► Chapters ► New Working Chapter.
• You will get the following dialog.
• Select the directory and enter a label for the database.
• Select the position you want to give the database in the SearchPath.
• Click OK.

Note: When you select “Position 1”, the new database will be attached on position 1 and becomes the working database. The old working database is moved to position 2.

To detach a database from the SearchPath go to the menu File ► Chapters ► Detach Chapter. You will get a list of databases that can be detached. Select a database to detach and click OK.

Note: If you only have your working database and the S-PLUS system databases, you cannot detach a database.
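The effect of attaching and detaching on the order of the SearchPath can be mimicked with an ordinary list. The sketch below is Python, not S-PLUS, and only models the ordering rule described above. The function names and the database labels (such as system1) are hypothetical.

```python
def attach(search_path, database, position):
    """Return a new search path with `database` inserted at a 1-based position."""
    if position not in range(1, len(search_path) + 2):
        raise ValueError("position out of range")
    new_path = list(search_path)
    new_path.insert(position - 1, database)
    return new_path

def detach(search_path, database):
    """Return a new search path without `database`."""
    new_path = list(search_path)
    new_path.remove(database)
    return new_path

def working_database(search_path):
    """The first entry of the search path acts as the working database."""
    return search_path[0]

# Hypothetical labels: one working database followed by two system databases.
path = ["current", "system1", "system2"]

at_two = attach(path, "mortgages", 2)   # current stays the working database
at_one = attach(path, "mortgages", 1)   # mortgages becomes the working database
print(at_two)
print(at_one)
print(detach(at_one, "mortgages"))
```

Attaching at position 1 shifts every existing database one place down, which is why the old working database ends up on position 2.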

4.2.4 Masked objects
In an S-PLUS database you cannot have two objects with the same name (but as the names are case sensitive, you can have objects test and Test). You can, however, have an object test in one database and another object test in a different database attached to S-PLUS. The object test in the database that is attached higher in the SearchPath masks the object test in the database attached lower. The masked object will show a red X painted through its icon, as shown in the next figure, where the object test in databaseA is masked by the object test in databaseB.
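The lookup order along the SearchPath, and the masking that follows from it, can be sketched as follows. This is a Python illustration of the rule, not S-PLUS code. Each database is modelled as a label plus a dictionary of objects, and the stored values are placeholders.

```python
def find_object(name, search_path):
    """Return (database label, value) for the first database that holds `name`."""
    for label, objects in search_path:
        if name in objects:
            return label, objects[name]
    raise KeyError(name)

def masked_in(name, search_path):
    """Return the labels of the databases whose copy of `name` is masked."""
    holders = [label for label, objects in search_path if name in objects]
    # Only the first holder is visible; all later copies are masked.
    return holders[1:]

# databaseB is attached higher in the search path than databaseA.
search_path = [
    ("databaseB", {"test": "copy in B", "Test": "another object"}),
    ("databaseA", {"test": "copy in A"}),
]

print(find_object("test", search_path))
print(masked_in("test", search_path))
print(masked_in("Test", search_path))
```

Because names are case sensitive, Test is a different object from test and is not masked by it.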

4.3 The Object Explorer
The Object Explorer in S-PLUS consists of two panes (see the next figure). The left pane consists of folders with references to objects and the right pane displays the properties of a selected object in the left pane. Which objects are displayed, depends on the filter settings of that folder. Each folder needs to be set separately. The default Object Explorer has five folders (Data, Graphs, Reports, Scripts and SearchPath) with specific settings. If a folder in the Object Explorer has a plus sign on the left side, you can expand that folder by clicking the plus sign. For example, if a folder contains data frames, you can expand the folder and select a specific data frame in the folder. In the right pane you will see the columns of the selected data frame.


To change the name of an object or folder, select the object and click it once. This activates the edit mode in which you can change the name.

4.3.1 Folder filters
Once you start working with S-PLUS you will get more and more objects in your working database. One way to organize and manage those objects is to use folders with specific folder filters. By setting a filter on a folder you can specify in which databases to look for objects and which classes of objects to be displayed in the folder. Let’s illustrate this with the folder ‘Data’, a default folder in the S-PLUS Object Explorer. Right clicking it results in a context menu in which you choose ‘Advanced’. The following dialog appears:

This dialog has the following fields:
• Object Creation: Select the default object type for the folder. Selecting one object type doesn’t mean that other object types won’t be displayed. It merely specifies the icon of the folder.
• Documents: Select the type of ‘Document objects’ (graph objects, report objects or script objects) to be displayed in the folder. These ‘Document objects’ will only be displayed in the Data folder when they are selected.
• Interface objects: Select the type of ‘Interface objects’ (menus, toolbars, etc.) to be displayed in the folder.
• Database Filter: Select the database(s) in which you want the folder to look for objects. The directories listed here are exactly the directories of the SearchPath. Usually you select the first directory of the list, which is the working database (working Chapter).
• Classes: Select the class of objects to be displayed in the folder.
• Include Derived Classes: Check this box to display objects from derived classes of the class selected in the ‘Classes’ field. For example, an object of the class ‘design’ is derived from a data frame. If this checkbox is checked and a data frame has been chosen in the field ‘Classes’, objects of the class "design" will also be displayed.
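The combined effect of the ‘Database Filter’, ‘Classes’ and ‘Include Derived Classes’ fields can be pictured with a small Python sketch. It is not S-PLUS code. The only class relation taken from the text is that ‘design’ is derived from a data frame; the object names, the database labels and the class name lm are made up for the illustration.

```python
# Child class -> parent class. Only the design/data.frame pair comes from the text.
PARENT = {"design": "data.frame"}

def is_derived_from(cls, base):
    """True if `cls` equals `base` or descends from it via PARENT."""
    while cls is not None:
        if cls == base:
            return True
        cls = PARENT.get(cls)
    return False

def folder_contents(objects, databases, classes, include_derived):
    """Return the names of the objects that pass the folder filter."""
    shown = []
    for name, database, cls in objects:
        if database not in databases:
            continue  # rejected by the database filter
        if include_derived:
            accepted = any(is_derived_from(cls, base) for base in classes)
        else:
            accepted = cls in classes
        if accepted:
            shown.append(name)
    return shown

# Hypothetical objects: (name, database label, class).
objects = [
    ("cars", "working", "data.frame"),
    ("experiment", "working", "design"),
    ("fit1", "working", "lm"),
    ("old.cars", "projectB", "data.frame"),
]

with_derived = folder_contents(objects, {"working"}, {"data.frame"}, True)
without_derived = folder_contents(objects, {"working"}, {"data.frame"}, False)
print(with_derived)
print(without_derived)
```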

4.3.2 Creating new folders
We have seen that you can have different working databases for different projects. If the number of objects within a project becomes too large, you can use different folders in the Object Explorer to manage objects within a project. For example, you can have one folder that only shows data frames, a second folder that only shows matrices and a third folder that only shows output objects from statistical routines.

To create a new folder, right click somewhere in the white space of the left pane. Select ‘Insert Folder’ from the context menu that appears. A new folder will appear with the name ‘Folder1’, which can be changed into any name you like. By default the filter of the new folder has not been set. Nothing will be displayed until you change the filter settings as described above.

As an example we will insert a folder that filters on list objects only.
• Right click in the left pane and select ‘Insert Folder’.
• A new folder appears with the name ‘Folder1’, which can be changed to ‘List’ for example.
• Right click on the folder icon and select ‘Advanced’. The following dialog appears.
• Select List in the ‘Object Creation’ field. This is not really necessary, but it gives the folder a special icon so that you can recognize it immediately as a folder for list objects.
• Don’t select anything in the ‘Documents’ and ‘Interface Objects’ fields.
• Select a database from the ‘Databases’ field. To select the working database check the box ‘Search Working Chapter Only’.
• Select List in the ‘Classes’ field.
• Click OK.

You now have the following folders in the left pane:

Note: Even if you set the folder filter, the folder will probably be empty, as you have not yet created any list objects.

Note that the icon of the list folder has changed and that ‘Insert list’ is one of the options in the folder’s context menu. As you may remember, the Data folder by default also displays list objects. To ensure that only your newly created folder displays list objects, right click on the Data folder, select ‘Advanced’ and deselect List in the ‘Classes’ field of the Folder dialog. As we will see in Chapters 7 and 9, list objects are usually returned by statistical modeling functions.

To create a subfolder in a folder, right click the folder icon and select ‘Insert Folder’. A new subfolder is created in the existing folder. It has no filter yet: you will need to set it before you can see anything at all. There is no relation between the parent folder filtering and the subfolder filtering.

4.3.3 Object Explorer pages
An Object Explorer can have multiple pages. The default Object Explorer has only one page. The number of the pages is indicated by tabs, which are located at the bottom of the left pane.

To create more Object Explorer pages:
• Right click somewhere in the white area of the left pane.
• Select ‘Create Explorer Page’ from the context menu. The dialog ‘Explorer Page [2]’ appears.
• Enter a name for the Explorer page.
• Add a tool tip text (optional). This text appears when you place the pointer over the page tab in the Object Explorer. You can even define your own icon for the explorer page tab.

To switch between pages, click on the tabs of the pages. The new page is empty, but you can insert folders as described above.

4.3.4 Right pane settings
The right pane shows the contents of the folder (or object) selected in the left pane. There are various ways to represent the contents.
• Right click somewhere in the white space of the right pane.
• Choose ‘Right pane’ from the context menu.
• The dialog ‘Object Explorer [1]’ appears. Experiment with it!

4.3.5 Saving an Object Explorer
If you are satisfied with the Object Explorer you have created, you can save it for later use. To use it as default Object Explorer the next time you start S-PLUS, proceed as follows:
• Right click somewhere in the white area of the right pane.
• Select ‘Save Object Explorer as Default’ from the context menu.

The file corresponding with the default Object Explorer is “Default.sbf”. It can be found in the “.Prefs” directory of the S-PLUS project directory. You can also save the Object Explorer under a different name in another location.
• Make sure the Object Explorer is the active window.
• Go to the menu File► Save As and select a location and name for the Object Explorer.

To open an Object Explorer, go to the menu File► Open. Select ‘Object Explorer’ in the field ‘Files of Type’ of the ‘Open’ dialog.

4.3.6 Searching for objects in the Object Explorer
To look for objects in all databases of the SearchPath use the ‘Find Objects’ button. This button, represented by a little pair of binoculars, can be found on the Object Explorer toolbar. This toolbar is visible when the Object Explorer is the active window.

Suppose we want to find all objects that have a name starting with the letters ca. Click on the ‘Find Objects’ button to display the following dialog:

• Type the search pattern ca* in the ‘Pattern’ field. Note that the search pattern is case sensitive.
• Check the ‘Regular Expression’ checkbox if the search pattern is a so-called regular expression. In our example it is not a regular expression.
• Type the name of a folder in the ‘Folder’ field. This folder, which will be created if it doesn’t exist, is used to store the found objects.
• The ‘Container’ field is used to indicate whether the folder is a main folder of an Explorer Page or a subfolder of an existing folder.
• Click OK.

If S-PLUS finds objects matching your search pattern, a folder with the found objects is created in the Object Explorer. Our example results in the following folder.
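The difference between a plain wildcard pattern and a regular expression is easy to overlook, so a short Python sketch may help. It is not S-PLUS code and it assumes two things that the text does not confirm: that the plain pattern behaves like a shell-style wildcard, and that a regular expression may match anywhere in the name. The object names are made up.

```python
import fnmatch
import re

def find_objects(names, pattern, regular_expression=False):
    """Return the names that match `pattern`, case sensitively."""
    if regular_expression:
        compiled = re.compile(pattern)
        # Assumption: a regular expression may match anywhere in the name.
        return [name for name in names if compiled.search(name)]
    # Wildcard mode: '*' stands for any run of characters.
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

# Hypothetical object names.
names = ["car.data", "cats", "Cars", "scale", "c"]

wildcard_hits = find_objects(names, "ca*")
regex_hits = find_objects(names, "^ca", regular_expression=True)
misread_hits = find_objects(names, "ca*", regular_expression=True)
print(wildcard_hits)
print(regex_hits)
print(misread_hits)
```

Read as a regular expression, ca* means "a c followed by zero or more a's", so in this sketch it also matches scale and c. That is why the checkbox should stay unchecked for a pattern such as ca*.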


CHAPTER 5
5. Data in S-PLUS
For those readers who haven’t bought this book yet: the data in this chapter is priceless or at least equals the price of this book. Whichever is smaller!

5.1 Data Windows
To manipulate and insert data in S-PLUS you can use data windows. A data window has the appearance of a spreadsheet. However, a data window has no cell linking mechanisms and is more column-oriented. Excel spreadsheets can be used as data window in S-PLUS (for more information see section 5.5). This chapter describes how to use S-PLUS data windows. You can have multiple data windows open in one S-PLUS session. The following figure illustrates the different aspects of a data window.

[Figure: a data window, with labels for the data window name, column number, column names, active cell, data, row names and row number]

To open a new empty data window, go to the Data ►Select Data menu. In the dialog that appears, choose ‘New Data’. To open an existing data window, choose ‘Existing data’ and select the existing data from the pull down menu. Or alternatively, double click on a data frame object in the Object Explorer.

5.1.1 Hot spots in data windows
In a data window you can select cells, rows and columns and you can define row and column names.

Selecting a cell:
• Click on a cell to select it.

Going to a cell:
• Press F5 to display the dialog window ‘Go To Cell’.
• Type the column and row number of the cell.

Selecting a block of cells:
• Select the left top cell of the block you want to select.
• Hold down the Shift key and select the right bottom cell of the block.
• The selected cells are highlighted.

Selecting the whole data window:
• Click on the grey cell above the row numbers and left of the column numbers.
• The whole data window is highlighted.

Selecting columns:
From a data window you can select one or more columns to use for plotting purposes or for statistical data analysis.
1. Selecting one column:
• Click on the number of the column you want to select. All cells will be highlighted.
2. Selecting adjacent columns:
• Click on the number of the column you want to select.
• Hold down the left mouse button and drag to the right or left to select the columns next to the first selected column.
3. Selecting non-adjacent columns:
• Click on the number of the column you want to select.
• Hold down the Ctrl key and select another column.
• You can select any number of columns.
• The order in which you select the columns is important for data plotting routines.

Selecting rows:
Selecting rows is similar to selecting columns. Instead of clicking on column numbers you click on row numbers.

Dragging blocks of cells, columns or rows:
You can drag columns or rows in a data window. It is used to create trellis graphics (Chapter 6) or to move columns or rows to a different location (possibly in a different data window).
• Select a block of cells, column or row.
• Click on a cell in the selected block and hold down the left mouse button.
• The mouse cursor will change, and you can drag the block, column or row to a different location in the same data window, to another data window, to a script file or to a graph.

Column names, row names and column descriptions:
In a data window you can add column names and row names.
• Double click on the column name, the grey cell under the column number.
• Type a name for the column.
To attach a column description proceed as follows:
• Right click on a cell of the desired column to display the context menu.
• In this menu select ‘Properties’.
• Type a description for the column in the field 'Description'.
The column description will appear as a tip text when you hover over the column name.

Data window appearance
To change the appearance of a data window, select the entire data window and right click on any cell. Select ‘Properties’ in the context menu that appears. In the following dialog you can change the appearance of a data window.

5.2 Data types and data structures
5.2.1 Data types
Before visualizing or analyzing data, you need to put your data in some kind of structure and you need to know what type of data you have. The two most important data types in S-PLUS for data analysis are ‘double’ and ‘factor’. The type ‘timeDate’ is convenient to represent a date or time. There are more types, which we will discuss in Chapter 8. At times you must be very precise in S-PLUS. Routines often require a certain type of data.

double
The double data type is used to represent continuous variables. Examples of continuous data are a variable ‘Weight’, which can take values such as 50.56 or 78.986, and a variable ‘Length’, which can take values such as 1.78 or 1.88.

factor
A factor data type is used to represent categorical data. Factor data in S-PLUS consists of levels, the individual categories of a categorical variable. For example, the variable ‘Sex’ is a categorical variable with the two levels “male” and “female”. Another example would be the variable ‘Income’ with three levels, “Low”, “Average” and “High”. In S-PLUS it is not necessary to code the levels of a factor variable with numbers like 1, 2, 3 etc. As long as you tell S-PLUS that a column is of type factor, it will recognize and treat the words ‘male’ and ‘female’ as separate categories.

timeDate
Data of type timeDate represent a date and possibly a time within a day. For example, 1/30/2001 represents January the 30th of 2001, and 1/30/2001 18:34:45 represents January the 30th of 2001, 6 o’clock PM, 34 minutes and 45 seconds.

Missing data
The symbol NA in S-PLUS is used to represent missing data. It is not really a separate data type; it could be a missing double or a missing factor.

5.2.2 Data structures
S-PLUS has a number of different data structures, the most important of which are data frames and lists. We will discuss data structures in more detail in Chapter 8.
• A data frame is a very useful structure for data analysis. It consists of rows and columns. A row usually corresponds with an observation and a column usually corresponds with a variable. The columns of a data frame can be of different data types, for example one column representing ‘Sex’ and another column representing ‘Age’.
• A list is a collection of other structures (objects). Many statistical routines in S-PLUS return a list. Such a list is usually a collection of parameter estimates, predicted values and other results.

5.3 Data entry in data windows
To create a new data frame, go to the menu File►New. Choose ‘Data Set’ to display an empty data window with the name SDFn. You can change this name in the Object Explorer. To view an existing data frame in a data window, double click on the data frame object in the Object Explorer, or go to the Data►Select Data menu.

5.3.1 Typing in data
The simplest way to enter or modify data is by typing in data. In a data window go to an empty cell and type your data or go to a cell and modify its contents. In a large data window, use the F5 key to go to a specific location in the data window. If you start typing data in a new column, the first characters you type determine the data type of the column.
• Typing numbers only (and possibly a decimal point) will result in a column of type double.
• Typing letters or other non-numeric characters will result in a column of type factor.
• Typing dates (e.g. 1/27/1997) will result in a column of type timeDate.
To check the type of a column, hover over the column number and a tool tip text appears with the data type of the column. Once the type of a column is set, you cannot type data of another type into its cells. When you try to do this, S-PLUS will generate an NA (not available) in the cell.

Typing in factors
When you are modifying cells of a factor column, you can easily change a cell’s factor level into another level. To see a list of levels that are already known to S-PLUS, double click on a cell of a factor column and click on the little triangle that appears on the right hand side of the cell.

Typing in timeDates
By default the date format is that of the regional settings on your system. If, for example, your regional settings are 'Dutch (Standard)', you enter 12/09/1973 for the 12th of September 1973. If you don’t want to follow the regional settings on your system, go to the ‘Startup’ tab of the ‘General Settings’ dialog (accessible through the ‘Options’ menu) and deselect the field ‘Use regional options for input (output)’. Please note:
• If the regional setting is 'English (US)', S-PLUS will read 13/1/2000 as the 13th of January, and 44/1/2000 will generate an NA.
• If you type only two digits (xx) for the year, it is interpreted as 20xx if xx is smaller than 30 and as 19xx otherwise.
By default dates are displayed in the form that is specified in the regional settings on your system. You can change the display of a timeDate column. Right click on a cell of a timeDate column and select ‘Properties’ in the menu. You will get the following timeDate dialog.


You can choose another format by selecting one of the formats in the ‘Date format’ field or by typing your own format string into the ‘Date format’ field. For example, dd-MM-yy is the common date format in the Netherlands. Note, however, that even when you change the display to dd-MM-yy, you still have to follow the conventions of entering dates as mentioned before. So you would enter 11/23/2000 to display 23-11-00 if your system uses English (US) as regional settings! You can also type in dd-MMM-yy (or dd-MMMM-yyyy) in the Date format field. S-PLUS will display the date as 23-Nov-00 (or 23-November-2000).

5.3.2 Changing data types
You can change the data type of a column. Right click on a cell of the column you want to change. In the context menu that appears choose ‘Change Data Type’. The following dialog will appear.

Select a new data type for the column.

Changing factor columns
Changing a factor column into a double or integer column will change the levels of the factor column into numbers. S-PLUS uses an alphabetical order. So a factor with levels “high”, “low” and “average” will become a double column with 1 for “average”, 2 for “high” and 3 for “low”.

Changing double to factor
Changing a double column into a factor column will create a level for each unique value in the double column. This may not be the result you want. Use ‘Create Categories’ in the Data menu to split a column variable into different categories.

Changing double to timeDate
If numbers are changed into timeDate, a zero represents 1/1/1960, a one represents 1/2/1960 and so on. If you change 1.5 to a timeDate, the result is 1/2/1960 12:00:00. See for example the following figure, where the double column V3 is transformed to timeDate column V3.1.
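Two of these conversion rules lend themselves to a quick check in code. The sketch below is Python, not S-PLUS. It reproduces the alphabetical numbering of factor levels and the day count from 1/1/1960. Both function names are hypothetical, and the sorting is only claimed to match for all-lowercase levels such as those in the example.

```python
from datetime import datetime, timedelta

def factor_to_codes(values):
    """Number the levels 1, 2, 3, ... in alphabetical order and recode the values."""
    levels = sorted(set(values))
    code = {level: position + 1 for position, level in enumerate(levels)}
    return [code[value] for value in values]

# Day zero of the numeric date scale described in the text.
ORIGIN = datetime(1960, 1, 1)

def number_to_datetime(x):
    """Interpret x as the number of days (fractions allowed) since 1/1/1960."""
    return ORIGIN + timedelta(days=x)

codes = factor_to_codes(["high", "low", "average", "high"])
print(codes)
print(number_to_datetime(0))
print(number_to_datetime(1.5))
```

The fractional part of the number becomes the time within the day, so 1.5 lands at noon on the second day.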


5.3.3 Copying and pasting data
You can copy data from other sources and paste it into a data window. For example:
• Select and copy a column in an Excel sheet (“Ctrl-C”).
• Go to a cell in an S-PLUS data window and paste it (“Ctrl-V”).

5.3.4 Importing data
In Chapter 3 we described how to import data files into S-PLUS data frames.

5.3.5 Generating data
S-PLUS offers a number of routines to fill a column with:
• Regular sequences
• Random numbers
• Values calculated from other columns (transformations)

Regular sequences
Let’s create four sequences in an empty data window to illustrate the use of the fill routine in S-PLUS.
• Create a new data window.
• Go to the menu Data ►Fill to display the ‘Fill Numeric Columns’ dialog (see the figure below).

1. The sequence 1,2,..,75
• Select the name of the data window in the field ‘Data Set’.
• Type the name of the column, for example: Col1.
• Type 75 in the ‘Length’ field.
• Select ‘Sequence’ in the ‘Content’ field.
• Do not change the value ‘1’ of the fields ‘Start’, ‘Increment’ and ‘Replications’.
• Click on ‘OK’ or ‘Apply’ to perform the operation.

2. The sequence of 75 times a ‘2’
• Select the name of the data window in the field ‘Data Set’.
• Type the name of the column, for example: Col2.
• Type 1 in the ‘Length’ field.
• Select ‘Sequence’ in the ‘Content’ field.
• In the field ‘Start’ type ‘2’, do not change the value ‘1’ of the field ‘Increment’ and type ‘75’ in the ‘Replications’ field.
• Click on ‘OK’ or ‘Apply’ to perform the operation.

3. The sequence 1,1,1,2,2,2,..,25,25,25
• Select the name of the data window in the field ‘Data Set’.
• Type the name of the column, for example: Col3.
• Type ‘25’ in the ‘Length’ field.
• Select ‘Grouped Sequence’ in the ‘Content’ field.
• Do not change the value ‘1’ in the fields 'Start’ and ‘Increment’ and type ‘3’ in the ‘Replications’ field.
• Click on ‘OK’ or ‘Apply’ to perform the operation.

4. The sequence 5,10,..,375
• Select the name of the data window in the field ‘Data Set’.
• Type the name of the column, for example: Col4.
• Type ‘75’ in the 'Length' field.
• Select ‘Sequence’ in the ‘Content’ field.
• In the field 'Start’ type ‘5’, in the field ‘Increment’ type ‘5’ and do not change value ‘1’ in the ‘Replications’ field.
• Click on ‘OK’ or ‘Apply’ to perform the operation.

The steps above result in the following data window:

Note 1: Use the checkbox ‘Append to Existing Values’ to append the new data values to an existing column. The name of the existing column must be specified in the 'Columns' field.

Note 2: You can generate the same column more than once in a data frame. To generate more columns, type the column names separated by commas in the ‘Columns’ field. If, for example, you type V1, V2, and V3 in the ‘Columns’ field, the data window will have three columns.
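How the fields ‘Start’, ‘Increment’, ‘Length’ and ‘Replications’ work together can be summarized in a few lines of Python. This is not S-PLUS code but a model of the four examples above. It assumes that ‘Sequence’ repeats the whole sequence and that ‘Grouped Sequence’ repeats each value in turn, which is consistent with the examples but is not stated in so many words. The function name fill is hypothetical.

```python
def fill(length, start=1, increment=1, replications=1, grouped=False):
    """Build a fill column from the dialog settings."""
    base = [start + step * increment for step in range(length)]
    if grouped:
        # Grouped Sequence: every value is repeated before moving to the next.
        return [value for value in base for _ in range(replications)]
    # Sequence (assumption): the whole sequence is repeated.
    return base * replications

col1 = fill(75)                                 # 1, 2, .., 75
col2 = fill(1, start=2, replications=75)        # 75 times a 2
col3 = fill(25, replications=3, grouped=True)   # 1,1,1,2,2,2,..,25,25,25
col4 = fill(75, start=5, increment=5)           # 5, 10, .., 375

print(col1[:3], col1[-1])
print(col2[:3], len(col2))
print(col3[:6], col3[-3:])
print(col4[:3], col4[-1])
```

All four columns have 75 values, so they fit side by side in one data window.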


Random numbers
Go to the menu Data►Random Numbers.
• Select a data frame in the ‘Data Set’ field.
• Type the name of the new column with random numbers in the field ‘Target Column’.
• Specify the number of random numbers in the field ‘Sample Size’.
• Select a probability distribution from which the random numbers will be drawn.
• Specify the distribution-specific parameters.

Note 1: If you insert a column of random numbers in a data frame and the sample size is smaller than the number of rows in that data frame, S-PLUS will generate random numbers for the remaining cells of the column.

Note 2: If you insert a column of random numbers in a data frame and the sample size is larger than the number of rows in that data frame, S-PLUS will generate NA’s in the other columns. The result is that all columns have the same length.

Transformations
Suppose you have a data frame SDF1 with columns Col1 and Col2 and you want to create a column Col3 that is the sum of the columns Col1 and Col2. Then proceed as follows:
• Go to the menu Insert► Column. The dialog 'Insert Columns' appears.
• In the field 'Name(s)' type the name of the new column. In this example Col3.
• In the field 'Fill Expression' type the expression Col1 + Col2.
• Select the data frame SDF1 in the ‘Data Set’ field.
• Click on OK to insert Col3 into SDF1.

Other examples of fill expressions:
• log(V1)                  The natural logarithm of column V1.
• V1^2                     The square of column V1.
• V1/V2                    V1 divided by V2.
• sin(V1/log(V2) + 3)      Just a nice formula.

Note: Columns resulting from transformations of other columns in a data frame are not linked to those columns. This means that changing a column does not update other columns related to that column.
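The absence of a link between a transformed column and its source columns is what distinguishes a data window from a spreadsheet. The Python sketch below (not S-PLUS code) models a data frame as a dictionary of columns to show this. The function name insert_column and the numbers are made up for the illustration.

```python
def insert_column(frame, name, expression):
    """Evaluate `expression` once per row and store the plain results."""
    n_rows = len(next(iter(frame.values())))
    rows = [{column: values[i] for column, values in frame.items()}
            for i in range(n_rows)]
    # Only the computed values are stored; the expression itself is not kept.
    frame[name] = [expression(row) for row in rows]

# Hypothetical data frame SDF1 with made-up values.
sdf1 = {"Col1": [1.0, 2.0, 3.0], "Col2": [10.0, 20.0, 30.0]}
insert_column(sdf1, "Col3", lambda row: row["Col1"] + row["Col2"])
col3_before = list(sdf1["Col3"])

# Changing Col1 afterwards leaves Col3 as it was.
sdf1["Col1"][0] = 100.0
col3_after = list(sdf1["Col3"])

print(col3_before)
print(col3_after)
```

To bring Col3 up to date after such a change, the transformation has to be carried out again.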

5.3.6 Moving data
By means of drag and drop you can move or copy (pieces of) data from one place to another place in S-PLUS. You can move data within data windows, between data windows, within the Object Explorer and between a data window and Object Explorer. The following examples will demonstrate this.

Example 1
Suppose that in the following data frame 'cars' you want to move the 'Price' column to the third position.
• Select the 'Price' column by clicking on the column number.
• Click on a cell and hold the mouse button.
• Drag the selected column to the third column. Release the mouse button.

Example 2
Suppose you want to move the Price column of this same data frame ‘cars’ to another (new) data frame. First you open the other data frame in a data window, then you select the Price column and drag it to the other data window. The 'Price' column is copied into the other data frame; it is not deleted from the cars data frame. Moreover, you can drag the column to a folder with data frames in the Object Explorer. Releasing the mouse button on a data frame will cause the 'Price' column to be copied into that specific data frame.

Example 3
Using the Object Explorer you can copy one or more columns from one data frame to another. Suppose you have a column V1 in the data frame ‘SDF1’ which you want to include in the data frame ‘SDF2’. Follow the steps below:
• In the left pane of the Object Explorer, expand the folder with the data frame ‘SDF1’ and expand the folder with the data frame ‘SDF2’. This may be the same folder.
• Select the data frame ‘SDF1’. The different columns of ‘SDF1’ appear in the right pane.
• Drag the column V1 from the right pane to the icon of the data frame ‘SDF2’ in the left pane.


The column V1 will be placed in the data frame ‘SDF2’ and it will also remain in data frame ‘SDF1’. If the column V1 is longer than the columns in ‘SDF2’, the columns in SDF2 will be filled with NA’s to match the length of V1. On the other hand, if V1 is shorter than the columns in SDF2, V1 will be filled with NA’s to match the length of the columns in SDF2. Use the Shift or Ctrl key to select multiple columns in a data frame.
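The padding rule described above can be sketched outside S-PLUS. The following is a minimal illustration in Python, not S-PLUS code: the function name add_column is hypothetical, a data frame is modelled as a dictionary of lists, and None stands in for NA.

```python
def add_column(frame, name, values):
    """Return a copy of frame with a new column added.

    All columns are padded with None (standing in for NA) so that
    every column ends up with the same number of rows.
    """
    columns = {key: list(col) for key, col in frame.items()}
    columns[name] = list(values)
    n_rows = max(len(col) for col in columns.values())
    for col in columns.values():
        col.extend([None] * (n_rows - len(col)))
    return columns


# Hypothetical data: SDF2 has one column of three rows.
sdf2 = {"V2": [10, 20, 30]}

# V1 longer than the existing columns: the existing columns are padded.
longer = add_column(sdf2, "V1", [1, 2, 3, 4, 5])

# V1 shorter than the existing columns: V1 itself is padded.
shorter = add_column(sdf2, "V1", [1, 2])
```

In both cases the original frame is left unchanged, which mirrors the fact that the column is copied rather than moved.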

5.4 Manipulating data
The Data menu in S-PLUS contains a number of routines to manipulate data frames. These routines enable you to merge data frames, recode values in data frames, create a subset from a data frame and much more. We will give some examples in this section.

5.4.1 Merging data frames
Use the merge routine in S-PLUS to merge two data frames that have one or more columns in common. Suppose you have the data frames ‘SDF1’ and ‘SDF2’:

These data frames have the ‘Name’ and ‘Age’ columns in common. To merge them into a new data frame containing all the columns, go to the menu Data►Merge to display the following dialog:

• Select the first data frame in the field 'Data Set 1'. Click on the little triangle to display a list of data frames.
• Select the second data frame in the field ‘Data Set 2’.
• Select ‘All Common Cols’ in the ‘Match by’ field.
• Type the name of the new data frame in the ‘Save In’ field, for example, ‘last.merge’.

A new data frame ‘last.merge’ will be created with the following data:

You can also use the merge routine to put extra columns in a data frame from a larger “reference” data frame. Suppose you have the data frame ‘mycities’ with columns Place and MyRating, and the reference data frame ‘cities’ with four columns.

To add the values of the Npeople, Continent and Language columns of the data frame ‘cities’ to the matching rows of the data frame ‘mycities’, using the column Place to match, proceed as follows:
• Select ‘mycities’ as data set 1 and ‘cities’ as data set 2.
• Select ‘Specified Cols’ in the ‘Row Matching’ group and then select the ‘Place’ column from ‘mycities’ and the ‘Location’ column from ‘cities’.
• Check the checkbox ‘Include Non-matched Rows in data Set 1’. This will include all cases in data set 1 that could not be found in data set 2. In our example this is London.
• Type ‘mycities’ in the ‘Save In’ field.

You will get the following result.
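The logic of this reference merge can be sketched outside S-PLUS. The snippet below is an illustration in Python, not S-PLUS code: the function merge_reference and all data values (ratings, population figures) are invented for the example, and None stands in for NA.

```python
def merge_reference(left, right, left_key, right_key, keep_unmatched=True):
    """Add the columns of `right` to the rows of `left` that match on the key.

    Rows of `left` without a match are kept (with None in the extra
    columns) when keep_unmatched is True, and dropped otherwise.
    """
    lookup = {row[right_key]: row for row in right}
    extra_cols = [col for col in right[0] if col != right_key]
    merged = []
    for row in left:
        match = lookup.get(row[left_key])
        if match is None and not keep_unmatched:
            continue
        new_row = dict(row)
        for col in extra_cols:
            new_row[col] = match[col] if match is not None else None
        merged.append(new_row)
    return merged


# Invented example data; the Npeople values are placeholders, not real figures.
mycities = [
    {"Place": "Amsterdam", "MyRating": 8},
    {"Place": "London", "MyRating": 7},
    {"Place": "Paris", "MyRating": 9},
]
cities = [
    {"Location": "Amsterdam", "Npeople": 1.0, "Continent": "Europe", "Language": "Dutch"},
    {"Location": "Paris", "Npeople": 2.0, "Continent": "Europe", "Language": "French"},
]

result = merge_reference(mycities, cities, "Place", "Location")
matched_only = merge_reference(mycities, cities, "Place", "Location", keep_unmatched=False)
```

With keep_unmatched=True the London row survives with missing values in the added columns, which corresponds to checking ‘Include Non-matched Rows in data Set 1’.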

5.4.2 Recode values
Suppose you have a double column ‘Age’ filled with the numbers 1, 2 or 3. The number 1 means people younger than 26, the number 2 means people younger than 65 and the number 3 means people older than 65. To change the column ‘Age’ into a factor column with more descriptive factor levels, proceed as follows:
• First change the column to a factor column.
• Go to the menu Data►Recode and select the data frame and column of interest in the ‘Recode’ dialog that appears.
• Type ‘1’ in the field 'Current Value' and ‘Teenagers’ in the field 'New Value'.
• Click on OK.
• Likewise change the number 2 into ‘Mid Age’ and the number 3 into ‘Old’.

There is a faster way if you want to change two or more levels simultaneously.
• Go to the data window and select the factor column.
• Right click and select ‘Properties’ in the context menu to display the following dialog:

• In the field ‘Factor Levels’ change 1 into Teenagers, 2 into Mid Age and 3 into Old.
• Click OK.
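Recoding amounts to a lookup from old values to new level names. A minimal sketch in Python (not S-PLUS code; the sample Age values are invented):

```python
# Mapping from the numeric codes to the descriptive factor levels.
labels = {1: "Teenagers", 2: "Mid Age", 3: "Old"}

# Invented sample column of codes.
age = [1, 3, 2, 2, 1]

# Recode every value in one pass, as the 'Factor Levels' dialog does.
age_factor = [labels[code] for code in age]

# The factor levels, in the order of the original codes.
levels = [labels[code] for code in sorted(labels)]
```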

5.4.3 Splitting data
With the S-PLUS split routine you can split one data frame into several different data frames. The splitting is based on a split column. If the split column is a factor column, S-PLUS will create a new data frame for each level of the factor column. Go to the menu Data►Split to display the following dialog:

• Select the data set to split.
• Select the columns that the resulting data frames should have. By default all columns are selected.
• Select the splitting variable.
• Select the type of results. The resulting data sets can be separate data frames or list components.
• Type a name for the result, for example “last.split”. If your data frames are separate, the names of the data frames are of the form: ‘last.split.level1’, ‘last.split.level2’.

You can also split a data frame based on a continuous variable (a column of type double). The number of resulting data frames depends on the values you enter in the fields 'Maximum Unique Numeric values' (MUNV) and 'Number of Bins for Numeric variable' (NBNV) of the split dialog. If the number of unique values in the double column is smaller than or equal to the value in the field ‘MUNV’, then the number of unique values determines the number of resulting data frames. Otherwise the field ‘NBNV’ gives the number of resulting frames.
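The two rules above (one data frame per factor level, and the MUNV/NBNV rule for a double column) can be sketched as follows. This is an illustration in Python, not S-PLUS code; the function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def split_rows(rows, split_col):
    """Group rows by the value found in the split column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_col]].append(row)
    return dict(groups)


def n_result_frames(values, munv, nbnv):
    """Number of resulting data frames when splitting on a double column.

    munv: 'Maximum Unique Numeric values', nbnv: 'Number of Bins for
    Numeric variable', as described in the text.
    """
    n_unique = len(set(values))
    return n_unique if n_unique <= munv else nbnv


# Invented sample data with a factor-like column 'Type'.
rows = [
    {"Type": "Small", "Price": 10},
    {"Type": "Large", "Price": 30},
    {"Type": "Small", "Price": 12},
]

# Name each piece after the result name plus the level.
frames = {
    "last.split." + str(level): group
    for level, group in split_rows(rows, "Type").items()
}
```

For example, n_result_frames([1, 2, 2, 3], munv=10, nbnv=5) gives 3 because there are only three unique values, whereas a column with 100 distinct values would be binned into 5 frames.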

5.4.4 Create categories
Using the ‘Create Categories’ routine in the Data menu, you can create factor columns from double columns. Suppose you have a data frame with column ‘V1’ consisting of random numbers between 0 and 10. You want to create a factor column with column name ‘V2’ with 10 levels, level 1 representing the numbers between 0 and 1, level 2 representing the numbers between 1 and 2 and so on. Select Data►Create Categories to display the following dialog:

• Select the data frame, ‘SDF1’ in this case.
• Select the column V1 in the ‘Source Column’ field.
• Type V2 in the ‘Target Column’ field.
• Choose ‘Cut Points’ as the interval type.
• Specify the cut points as a comma-separated list 0,1,2,3,4,5,6,7,8,9,10.

The next figure shows the resulting data window of the ‘Create Categories’ action. By default S-PLUS uses ‘0+thru 1’, ‘1+thru 2’, etc. as the level names of the new factor column.

With the ‘Create Categories’ routine you can also combine several factor levels of one column into a single factor level. In the above example, suppose you want to combine the factor levels ‘0+thru 1’ and ‘1+thru 2’ of column V2 into the level ‘0+thru 2’.
• Go to the ‘Create Categories’ dialog.
• Select data frame ‘SDF1’ and column ‘V2’.
• Type a name for the new target column, for example ‘V3’.

When you select a factor column as a source column in the ‘Create Categories’ dialog, you will see the different levels of the column in the field ‘Factor Column’ of the dialog. The number between square brackets before the level name indicates the number of that level in the column.
• Select the levels ‘0+thru 1’ and ‘1+thru 2’.
• Type the new name for the ‘Combined Level’ (in this example ‘0+thru 2’).
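Both uses of ‘Create Categories’ can be sketched outside S-PLUS. The Python snippet below is an illustration only, not S-PLUS code: the function name create_categories and the sample values are hypothetical, and it assumes that each interval excludes its lower cut point and includes its upper one, which the level name ‘0+thru 1’ suggests. Values outside the cut points become None, standing in for NA.

```python
from bisect import bisect_left


def create_categories(values, cut_points):
    """Assign each value to the interval between two adjacent cut points.

    Assumption: intervals are open on the left and closed on the right,
    so with cut points 0 and 1 the value 1 falls in '0+thru 1' and the
    value 0 falls outside every interval.
    """
    labels = []
    for value in values:
        i = bisect_left(cut_points, value)
        if i == 0 or i == len(cut_points):
            labels.append(None)
        else:
            labels.append(f"{cut_points[i - 1]}+thru {cut_points[i]}")
    return labels


# Invented sample values for column V1.
v1 = [0.5, 1.0, 1.5, 9.99, 10.0]
v2 = create_categories(v1, list(range(11)))

# Combining two levels into one is a renaming of those levels.
combine = {"0+thru 1": "0+thru 2", "1+thru 2": "0+thru 2"}
v3 = [combine.get(level, level) for level in v2]
```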

5.4.5 Stack and unstack
The stack routine is used to stack separate columns into one column and the unstack routine splits a column into separate columns of specified length. Suppose you have the following ‘SDF1’ data frame with sales figures:

You want to create a new data frame ‘SDF2’ with three columns. The first one is called ‘Sales’ and has the contents of columns ‘Q1’, ‘Q2’, ‘Q3’ and ‘Q4’. The second is a column called ‘Quarter’ which indicates the original quarter and the third one is the column with the corresponding Sales Representatives. Proceed as follows:
• Go to the menu Data►Restructure►Stack.
• Select the data frame ‘SDF1’ in the ‘Data Set’ field.
• Select the columns Q1,..,Q4 in the ‘Stack Columns’ field.
• Select the column ‘SalesRep’ in the ‘Replicate Cols’ field.
• Type the name of the new data set ‘SDF2’ in ‘Data Set’.
• Type the name of the ‘Stack Column’: ‘Sales’.
• Type the name of the ‘Replicate Cols’ and ‘Group Column’.

The resulting data window is given below. Data frames resulting from a stack routine are usually easy to display as so-called Trellis graphics (Chapter 6). For example, the data has now been restructured in such a way that S-PLUS plotting routines (Chapter 6) can be used to put the sales figures into a histogram, and the spread can be examined by sales person and quarter.
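The restructuring performed by the stack routine can be sketched as follows. This is an illustration in Python, not S-PLUS code: the function name stack, the sales figures and the representative names are invented, and the row order of the result (all of Q1 first, then Q2, and so on) is an assumption.

```python
def stack(rows, stack_cols, replicate_cols, stack_name, group_name):
    """Stack several columns into one, recording the source column name.

    stack_cols are combined into the column stack_name; group_name holds
    the name of the column each value came from; replicate_cols are
    repeated alongside every stacked value.
    """
    stacked = []
    for col in stack_cols:
        for row in rows:
            new_row = {stack_name: row[col], group_name: col}
            for rep in replicate_cols:
                new_row[rep] = row[rep]
            stacked.append(new_row)
    return stacked


# Invented sales figures for two representatives.
sdf1 = [
    {"SalesRep": "A", "Q1": 10, "Q2": 12, "Q3": 9, "Q4": 14},
    {"SalesRep": "B", "Q1": 7, "Q2": 8, "Q3": 11, "Q4": 6},
]

sdf2 = stack(sdf1, ["Q1", "Q2", "Q3", "Q4"], ["SalesRep"], "Sales", "Quarter")
```

Each original row contributes one row per quarter, so two representatives with four quarters give eight rows in the stacked result.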

The unstack routine allows you to break up columns. Suppose you have a data frame ‘SDF1’ with three columns; two data columns and one grouping column:

You want to break up the columns ‘V1’, ‘V2’ and ‘V3’ according to the value in ‘V3’. Go to the menu Data ►Restructure ►Unstack.

• Select SDF1 in the field ‘Data Set’.
• Choose ‘’ if you want to break up all columns.
• Select the type of grouping. In this example we want to break up by column.
• Select the group column ‘V3’.
• Type the name of the new data set, ‘SDF2’ for example.

The type of grouping can also be ‘Number of Rows’ in which case you can type a number in the field ‘Number of Rows’. S-PLUS will break up the columns in such a way that each new column has the number of rows you specified.

5.4.6 Creating subsets
With the subset routine you can extract a specific part of a data frame and put that in a new data frame. Let’s give a few examples. Go to the menu Data ► Subset. You will get the following subset dialog:

In the subset dialog you need to:
• Select the data frame from which you want to create the subset.
• Select the columns you want to have in your resulting subset. By default all columns are selected.
• Type a subset expression in the field ‘Subset Rows with’.
• Select the type of result you want.

The subset expressions you need to enter are expressions that contain column name(s) and one or more logical operators:

&     and
|     or
==    is equal to
!=    is not equal to

The comparison operators >, < and >= can be used as well.

Some examples:
Age > 18
Age > 18 & Age < 26

Note that in the last example you must use the & operator; 18 < Age < 26 does not work.

When your subset involves levels of a factor variable you must use double quotes. Some examples:
Type == "Small"
Type == "Small" | Type == "Van"
Type != "Large"

If you want to extract all rows from the original data set, but only a few columns, you leave the ‘Subset Rows with’ field blank and select the columns you want in the ‘Columns in Subset’ field.

Some more examples
Age != min(Age) & Age != max(Age)
Select all cases except the ones with the smallest Age and largest Age. If the minimum or maximum is not unique, this expression removes every row that attains it.
Income < quantile(Income,0.95)
Select all cases except the ones having an income in the top 5%.

The result
The result of the subset dialog can be a new data frame with the subset. Check the radio button ‘Data Set’ and type a name in the ‘Save In’ field. The result can also be a logical column containing the same number of rows as the data frame you took a subset from. Each row of the logical column contains a T or F indicating whether or not the corresponding row satisfies the subset expression.
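The two kinds of result (a subset, or a logical column of the same length as the data frame) can be sketched outside S-PLUS. The snippet below is an illustration in Python, not S-PLUS code, and the sample rows are invented. Each condition is written out separately and joined with a logical 'and' or 'or', just as the subset expressions join conditions with & and |.

```python
# Invented sample data.
rows = [
    {"Age": 17, "Type": "Small"},
    {"Age": 22, "Type": "Van"},
    {"Age": 25, "Type": "Large"},
    {"Age": 40, "Type": "Small"},
]

# Logical column: one True/False per row, same length as the data.
in_range = [row["Age"] > 18 and row["Age"] < 26 for row in rows]

# Subset: keep only the rows where the logical column is True.
subset = [row for row, keep in zip(rows, in_range) if keep]

# Factor levels are compared as quoted strings.
small_or_van = [row for row in rows if row["Type"] == "Small" or row["Type"] == "Van"]

# Drop the rows holding the smallest and the largest Age.
ages = [row["Age"] for row in rows]
not_extreme = [row for row in rows if row["Age"] != min(ages) and row["Age"] != max(ages)]
```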
