
Using the attach( ) command

Attach( ) is Usually a Lousy Idea

Many books that you might consult on R include the attach() command in their code. I used it for several years and found that it really is a bad idea. So I use it only rarely in my material, and when I do I feel bad. But I need to explain why, because it seems like such a nice simple thing to do.

When you read in a file the way I did in the example page that may have sent you here, the variables come in a sort of envelope called a data frame. So you have a data frame named "data1", inside of which are the variables "ID", "Score", "dv", and "Group". You are going to want to get at those variables, but there are good ways and bad ways of doing so. The most common way in introductory books on R is to use "attach(data1)". This essentially makes a copy of every variable in data1 and puts those copies in a place where you can address them by name. You can say "mean(dv)", for example, and get the mean of the dependent variable. That's nice, but very dangerous if you do a lot of fiddling, as I do, to get the whole program to finally run the way you want.
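Because attach() works on copies, the attached variables can quietly drift away from the data frame itself. Here is a minimal sketch of that; the values in data1 are made up for illustration and are not the data from the example page.

```r
# Hypothetical stand-in for the data1 data frame described above
data1 <- data.frame(ID = 1:3, dv = c(2, 4, 6))

attach(data1)                # copies the columns onto the search path
before <- mean(dv)           # 4, taken from the attached copy

# Now change the data frame itself
data1$dv <- data1$dv * 10

frame_after    <- mean(data1$dv)  # 40, the data frame was updated
attached_after <- mean(dv)        # still 4, the attached copy was not

detach(data1)                # clean up the search path
```

The plain name dv and data1$dv no longer agree, and nothing warns you about it.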

In the R code that you will find at the end of this document, you will see me read some data, attach the data frame, and get the mean of the variable named "dv". BUT, lots of my problems have a variable named "dv" because it stands for "dependent variable". And lots of data frames get named "data" because that seems like such an obvious choice. In that code I went on and read in the next file that I wanted to work with. It also had "data" as a data frame and "dv" as a variable. So I attached that and went on to get its mean. And in both cases I got the correct means. And then, being a good responsible person, I detached that data frame with "detach(data)". Great!! Of course, I forgot to do that the first time.

That example is a bit foreshortened, because you would probably do a bunch of other stuff with each problem while you were at it, which just gives you more time to get forgetful and careless.

But now I have other problems to work on, so I put in some more code and ask for the mean of a variable named "dv". But what happens? I get the mean from the first problem. Where did that come from? I don't want that mean!!

If you go back and look at what looks like an error message in the printout, you will see that it is not an error message. It says that the following objects are "masked" from data. It does not say that the new dv replaced the earlier dv. It says that it masked it. In other words, it has temporarily hidden the earlier version of dv. But when we later use detach(data), we are detaching the second set, and that allows the original dv to bounce back again and confuse us completely. And the more you work with a set of data, the more likely you are to end up with some variable that is no longer the variable you want. And it is so hard to figure out why you can't get the right answer.
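The sequence just described can be reproduced in a few lines. This is only a sketch with invented numbers, not the code at the end of this document, but the names "data" and "dv" are reused on purpose.

```r
# First problem (hypothetical values)
data <- data.frame(dv = c(1, 2, 3))
attach(data)
m_first <- mean(dv)          # 2

# ... forgot to detach(data) here ...

# Second problem reuses the same names (hypothetical values)
data <- data.frame(dv = c(10, 20, 30))
attach(data)                 # R notes that dv is masked from the earlier data
m_second <- mean(dv)         # 20, the newer dv hides the older one

detach(data)                 # removes only the most recent attachment

# Later on, asking for dv again
m_later <- mean(dv)          # 2, the first problem's dv has bounced back

detach(data)                 # now the first attachment is gone as well
```

Printing search() at any point shows how many copies of data are still sitting on the search path.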

You might think that you can easily get out of this problem by using "detach()". But don't be fooled. If you are as careless a typist as I am, you may have to run your code four or five times until you get it right. And every time you run it you invoke the attach(data1) function. So you think that you can make everything right by using "detach(data1)". BUT NO!! When you detach data1, you only detach the most recent attachment, not the previous ones. So they are still there. You may have to enter detach(data1) several times to clear everything out.

But there is a way around this. If you are using RStudio, go to RStudio/Preferences and check the box that tells it to always restart windows that were open when you quit. (You only have to do that once.) Now go to Session/Restart R and run that. You won't lose anything important, but you will have erased all of the old stuff, not just what you erase with rm(list = ls()). When you issue this restart command, R will close and then immediately reopen with your code ready to run.
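If you would rather see the pile-up than take my word for it, here is a small sketch. The contents of data1 are made up, and the loop simply stands in for rerunning the same script three times.

```r
data1 <- data.frame(dv = c(4, 5, 6))     # hypothetical data

# Stand-in for running a script containing attach(data1) three times
for (i in 1:3) attach(data1)

n_attached <- sum(search() == "data1")   # 3 copies on the search path

detach(data1)                            # removes only the newest copy
n_after_one <- sum(search() == "data1")  # 2 copies are still there

# Keep detaching until no copy is left
while ("data1" %in% search()) {
  detach("data1", character.only = TRUE)
}
n_final <- sum(search() == "data1")      # 0
```

Note that rm(list = ls()) would delete the data frame data1 itself but would leave the attached copies where they are, which is why the loop (or a restart) is needed.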

So what do we do???

Other than restarting, there are a couple of ways around this problem. If we have a data frame named d1, for example, with a variable called "Score", we do not have to attach d1; we can add its name to the variable's name. In other words, we can say something like "mean(d1$Score)", and R knows to go into the data frame named d1 and get what we need. Very clever.
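As a quick illustration, with made-up scores in a data frame named d1:

```r
# Hypothetical data frame with the variable names used in the text
d1 <- data.frame(ID = 1:4, Score = c(12, 15, 9, 14))

m_score <- mean(d1$Score)    # 12.5, no attach() needed

# A second data frame can have its own Score without any confusion
d2 <- data.frame(ID = 1:3, Score = c(100, 110, 120))
m_score2 <- mean(d2$Score)   # 110
```

Each call says exactly which data frame it is reading from, so nothing can be masked.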

Alternatively, you can look up the commands "with()" and "within()". They allow you to specify the data frame in which the enclosed commands will be evaluated. Even better, many functions, especially in some of the slightly more advanced packages, will allow you to add "data = d1" when you invoke the function. That is my preference, but it will not work for all functions.
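Here is a rough sketch of those alternatives using only base R. The data are invented, and lm() and aggregate() are just two examples of functions that accept "data =".

```r
# Hypothetical data frame
d1 <- data.frame(
  Score = c(10, 12, 14, 20, 22, 24),
  Group = factor(c("a", "a", "a", "b", "b", "b"))
)

# with(): evaluate an expression using the variables inside d1
m_all    <- with(d1, mean(Score))                 # 17
by_group <- with(d1, tapply(Score, Group, mean))  # a = 12, b = 22

# within(): like with(), but returns a modified copy of the data frame
d1 <- within(d1, Centered <- Score - mean(Score))

# data = argument: the function looks up the variables in d1 itself
fit    <- lm(Score ~ Group, data = d1)
g_mean <- aggregate(Score ~ Group, data = d1, FUN = mean)
```

In every case the data frame is named in the call, so the result cannot depend on what happens to be attached.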

But you are going to say that is too much typing. Yes, but it saves a lot of head-scratching. And for this book, that is what I am going to do. But to reduce typing, I will generally give data frames short names like d1, d2, d3, etc. I won't reuse one of those names within the same set of problems, so that will help. I may use a longer name occasionally, but not as a rule.

You won't guess how much time I spent on the stupid problem of what the book should do with "attach()". This is the best that I can do.

dch
