3 Variable Lists
A common technique in previous versions of Stata was to define a global containing a list of variables to be used later in the document. For example, you might see something like this at the top of a Do file:
global predictors x1 x2 x3 x4
then further down the document something like
regress y $predictors
logit z $predictors
Stata 16 formalized this concept with the addition of the vl (variable list) command. It works similarly to the use of globals: lists of variables are defined, then later referenced via the $name syntax. However, using vl has the benefits of improved organization, customizations unique to variable lists, error checking, and overall convenience.
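As a rough side-by-side sketch of the two approaches, the snippet below uses the auto data set that ships with Stata. The list names size_global and size_vl are made up for illustration, and it assumes Stata 16 or later and a fresh session.

```stata
* Load example data bundled with Stata
sysuse auto, clear

* Old approach: a plain global macro; Stata stores the text as typed
global size_global weight length trunk
regress mpg $size_global

* vl approach: initialize vl, then create a user variable list
vl set, nonotes
vl create size_vl = (weight length trunk)
regress mpg $size_vl
```

Both regress calls see the same three predictors. One practical difference: a misspelled variable name inside vl create should be rejected when the list is created, whereas the global happily stores whatever text it is given and the typo only surfaces when a later command uses it.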
3.1 Initialization of Variable Lists
To begin using variable lists, vl set must be run.
. sysuse auto
(1978 automobile data)
. vl set
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vlcategorical | 2 categorical variables
$vlcontinuous | 2 continuous variables
$vluncertain | 7 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
Notes
1. Review contents of vlcategorical and vlcontinuous to ensure they are
correct. Type vl list vlcategorical and type vl list vlcontinuous.
2. If there are any variables in vluncertain, you can reallocate them
to vlcategorical, vlcontinuous, or vlother. Type
vl list vluncertain.
3. Use vl move to move variables among classifications. For example,
type vl move (x50 x80) vlcontinuous to move variables x50 and x80 to
the continuous classification.
4. vlnames are global macros. Type the vlname without the leading
dollar sign ($) when using vl commands. Example: vlcategorical not
$vlcategorical. Type the dollar sign with other Stata commands to
get a varlist.
This produces a surprisingly large amount of output. When you initialize the use of variable lists, Stata automatically creates four variable lists, called the “System variable lists”. Every numeric variable in the current data set is automatically placed into one of these four lists:
vlcategorical: Variables which Stata thinks are categorical. These generally have to be non-negative, integer-valued variables with fewer than 10 unique values.
vlcontinuous: Variables which Stata thinks are continuous. These generally are variables which have negative values, have non-integer values, or are non-negative integers with more than 100 unique values.
vluncertain: Variables which Stata cannot confidently classify as continuous or categorical. These generally are non-negative, integer-valued variables with between 10 and 100 unique values.
vlother: Any numeric variables that aren’t really useful: either all missing or constant.
There is a potential fifth system variable list, vldummy, which is created when the dummy option is passed. Unsurprisingly, this moves variables containing only the values 0 and 1 out of vlcategorical and into this list.
The “Notes” given below the output are generic; they appear regardless of how well Stata was able to categorize the variables. They can be suppressed with the nonotes option to vl set.
The two thresholds given above, 10 and 100, can be adjusted by the categorical and uncertain options. For example,
vl set, categorical(20) uncertain(50)
Running vl set on a data set that has already been vl-set results in an error unless the clear option is given, which regenerates the lists.
. vl set, dummy nonotes
one or more already classified variables specified
You requested that variables be added to vl's system classifications, but
you specified 11 variables that were already classified.
r(110);
. vl set, dummy nonotes clear
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 2 continuous variables
$vluncertain | 7 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
In the above, we changed our minds and wanted to include the vldummy list, but since we’d already vl-set, we had to clear the existing set.
3.2 Viewing lists
When initializing the variable lists, we’re treated to a nice table of all defined lists. We can replay it via
. vl dir
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 2 continuous variables
$vluncertain | 7 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
To see the actual contents of the variable lists, we’ll need to use vl list.
. vl list
----------------------------------------------------
Variable | Macro Values Levels
-------------+--------------------------------------
foreign | $vldummy 0 and 1 2
rep78 | $vlcategorical integers >=0 5
headroom | $vlcontinuous noninteger
gear_ratio | $vlcontinuous noninteger
price | $vluncertain integers >=0 74
mpg | $vluncertain integers >=0 21
trunk | $vluncertain integers >=0 18
weight | $vluncertain integers >=0 64
length | $vluncertain integers >=0 47
turn | $vluncertain integers >=0 18
displacement | $vluncertain integers >=0 31
----------------------------------------------------
The output has one row for each variable in each variable list it belongs to. We haven’t used this yet, but variables can be in multiple lists.
We can list only specific lists:
. vl list vlcategorical
------------------------------------------------
Variable | Macro Values Levels
---------+--------------------------------------
rep78 | $vlcategorical integers >=0 5
------------------------------------------------
or specific variables:
. vl list (turn weight)
------------------------------------------------
Variable | Macro Values Levels
---------+--------------------------------------
turn | $vluncertain integers >=0 18
weight | $vluncertain integers >=0 64
------------------------------------------------
If “turn” were in multiple variable lists, each would appear as a row in this output.
There’s a bit of odd notation which can be used to sort the output by variable name, which makes it easier to identify variables which appear in multiple lists.
. vl list (_all), sort
----------------------------------------------------
Variable | Macro Values Levels
-------------+--------------------------------------
displacement | $vluncertain integers >=0 31
foreign | $vldummy 0 and 1 2
gear_ratio | $vlcontinuous noninteger
headroom | $vlcontinuous noninteger
length | $vluncertain integers >=0 47
mpg | $vluncertain integers >=0 21
price | $vluncertain integers >=0 74
rep78 | $vlcategorical integers >=0 5
trunk | $vluncertain integers >=0 18
turn | $vluncertain integers >=0 18
weight | $vluncertain integers >=0 64
----------------------------------------------------
The (_all) tells Stata to report on all variables, and sorting (when you specify at least one variable) orders by variable name rather than variable list name.
This will also list any numeric variables which are not found in any list.
3.2.1 Moving variables in system lists
After initializing the variable lists, if you plan on using the system lists, you may need to move variables around (e.g. classifying the vluncertain variables into their proper lists). This can be done via vl move which has the syntax
vl move (<variables to move>) <destination list>
For example, all the variables in vluncertain are actually continuous:
. vl list vluncertain
----------------------------------------------------
Variable | Macro Values Levels
-------------+--------------------------------------
price | $vluncertain integers >=0 74
mpg | $vluncertain integers >=0 21
trunk | $vluncertain integers >=0 18
weight | $vluncertain integers >=0 64
length | $vluncertain integers >=0 47
turn | $vluncertain integers >=0 18
displacement | $vluncertain integers >=0 31
----------------------------------------------------
. vl move (price mpg trunk weight length turn displacement) vlcontinuous
note: 7 variables specified and 7 variables moved.
------------------------------
Macro # Added/Removed
------------------------------
$vldummy 0
$vlcategorical 0
$vlcontinuous 7
$vluncertain -7
$vlother 0
------------------------------
. vl dir
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 9 continuous variables
$vluncertain | 0 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
Alternatively, since we’re moving all variables in vluncertain, we can make our first use of a variable list as a macro!
. vl set, dummy nonotes clear
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 2 continuous variables
$vluncertain | 7 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
. vl move ($vluncertain) vlcontinuous
note: 7 variables specified and 7 variables moved.
------------------------------
Macro # Added/Removed
------------------------------
$vldummy 0
$vlcategorical 0
$vlcontinuous 7
$vluncertain -7
$vlother 0
------------------------------
Note that variable lists are essentially just global macros, so they can be referred to via $name. However, the $ is only used when we want to actually use the variable list as a macro; in this case, we wanted to expand vluncertain into its list of variables. When we’re referring to a variable list by name in the vl commands, we do not use the $.
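The rule can be seen in a minimal sketch, again assuming the auto data set and a fresh session: the same system list is named without the $ inside a vl command and expanded with the $ in an ordinary command.

```stata
* Load example data and initialize the system variable lists
sysuse auto, clear
vl set, dummy nonotes

* Inside a vl command: refer to the list by name, no dollar sign
vl list vlcontinuous

* In any other command: expand the macro with a dollar sign
summarize $vlcontinuous
```

Typing vl list $vlcontinuous instead would hand vl the expanded variable names rather than the list name, which is not what the vl syntax expects.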
3.3 User Variable Lists
In addition to the System variable lists, you can define your own User variable lists, which I imagine will be used far more often. These are easy to create with vl create:
. vl create mylist1 = (weight mpg)
note: $mylist1 initialized with 2 variables.
. vl create mylist2 = (weight length trunk)
note: $mylist2 initialized with 3 variables.
. vl dir, user
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
User |
$mylist1 | 2 variables
$mylist2 | 3 variables
-------------------------------------------------------------------------------
. vl list, user
------------------------------------------------
Variable | Macro Values Levels
---------+--------------------------------------
weight | $mylist1 integers >=0 64
mpg | $mylist1 integers >=0 21
weight | $mylist2 integers >=0 64
length | $mylist2 integers >=0 47
trunk | $mylist2 integers >=0 18
------------------------------------------------
Note the addition of the user option to vl list and vl dir to show only User variable lists and suppress the System variable lists. We can also demonstrate the odd sorting syntax here:
. vl list (_all), sort user
----------------------------------------------------
Variable | Macro Values Levels
-------------+--------------------------------------
displacement | not in vluser 31
foreign | not in vluser 2
gear_ratio | not in vluser
headroom | not in vluser
length | $mylist2 integers >=0 47
mpg | $mylist1 integers >=0 21
price | not in vluser 74
rep78 | not in vluser 5
trunk | $mylist2 integers >=0 18
turn | not in vluser 18
weight | $mylist1 integers >=0 64
weight | $mylist2 integers >=0 64
----------------------------------------------------
You can specify the variables in all the usual shortcut ways (ranges and wildcards):
vl create mylist = (x1-x100 z*)
We can add labels to variable lists:
. vl label mylist1 "Related to gas consumption"
. vl label mylist2 "Related to size"
. vl dir, user
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
User |
$mylist1 | 2 Related to gas consumption
$mylist2 | 3 Related to size
-------------------------------------------------------------------------------
3.3.1 Modifying User Variable Lists
First, note that with User Variable Lists, the vl move command does not work. It only works with system variable lists.
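We can create new user variable lists which build off old lists with vl create. To add a new variable, use +; to remove one, use - (in the second example below, turn is not in mylist2, so mylist4 simply ends up with the same three variables):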
. vl create mylist3 = mylist2 + (gear_ratio)
note: $mylist3 initialized with 4 variables.
. vl list, user
--------------------------------------------------
Variable | Macro Values Levels
-----------+--------------------------------------
weight | $mylist1 integers >=0 64
mpg | $mylist1 integers >=0 21
weight | $mylist2 integers >=0 64
length | $mylist2 integers >=0 47
trunk | $mylist2 integers >=0 18
weight | $mylist3 integers >=0 64
length | $mylist3 integers >=0 47
trunk | $mylist3 integers >=0 18
gear_ratio | $mylist3 noninteger
--------------------------------------------------
. vl create mylist4 = mylist2 - (turn)
note: $mylist4 initialized with 3 variables.
. vl list, user
--------------------------------------------------
Variable | Macro Values Levels
-----------+--------------------------------------
weight | $mylist1 integers >=0 64
mpg | $mylist1 integers >=0 21
weight | $mylist2 integers >=0 64
length | $mylist2 integers >=0 47
trunk | $mylist2 integers >=0 18
weight | $mylist3 integers >=0 64
length | $mylist3 integers >=0 47
trunk | $mylist3 integers >=0 18
gear_ratio | $mylist3 noninteger
weight | $mylist4 integers >=0 64
length | $mylist4 integers >=0 47
trunk | $mylist4 integers >=0 18
--------------------------------------------------
Instead of adding (or removing) single variables at a time, we can add or remove whole lists. In keeping with the comment above, you do not use $ here to refer to the list.
. vl create mylist5 = mylist2 - mylist1
note: $mylist5 initialized with 2 variables.
. vl list mylist5
------------------------------------------------
Variable | Macro Values Levels
---------+--------------------------------------
length | $mylist5 integers >=0 47
trunk | $mylist5 integers >=0 18
------------------------------------------------
However, if we simply want to modify an existing list, a better approach is the vl modify command. vl create and vl modify are similar to generate and replace; the former creates a new variable list while the latter changes an existing variable list, but the syntax to the right of the = is the same.
. vl modify mylist3 = mylist3 + (headroom)
note: 1 variable added to $mylist3.
. vl modify mylist3 = mylist3 - (weight)
note: 1 variable removed from $mylist3.
3.4 Dropping variable lists
Variable lists can be dropped via vl drop:
. vl dir, user
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
User |
$mylist1 | 2 Related to gas consumption
$mylist2 | 3 Related to size
$mylist3 | 4 variables
$mylist4 | 3 variables
$mylist5 | 2 variables
-------------------------------------------------------------------------------
. vl drop mylist4 mylist5
. vl dir, user
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
User |
$mylist1 | 2 Related to gas consumption
$mylist2 | 3 Related to size
$mylist3 | 4 variables
-------------------------------------------------------------------------------
System lists cannot be dropped; if you run vl drop vlcontinuous, it just removes all the variables from it.
3.5 Using Variable Lists
To be explicit, we can use variable lists in any command which would take the variables in that list. For example,
. describe $mylist3
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
length int %8.0g Length (in.)
trunk int %8.0g Trunk space (cu. ft.)
gear_ratio float %6.2f Gear ratio
headroom float %6.1f Headroom (in.)
. describe $vlcategorical
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
rep78 int %8.0g Repair record 1978
We can also use them in a modeling setting.
. regress mpg $mylist3
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(4, 69) = 30.77
Model | 1565.65298 4 391.413244 Prob > F = 0.0000
Residual | 877.806484 69 12.7218331 R-squared = 0.6408
-------------+---------------------------------- Adj R-squared = 0.6199
Total | 2443.45946 73 33.4720474 Root MSE = 3.5668
------------------------------------------------------------------------------
mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
length | -.1837962 .0327629 -5.61 0.000 -.2491564 -.1184361
trunk | -.0103867 .1627025 -0.06 0.949 -.3349693 .3141959
gear_ratio | 1.526952 1.27546 1.20 0.235 -1.017521 4.071426
headroom | .0136375 .6602514 0.02 0.984 -1.303528 1.330803
_cons | 51.33708 8.300888 6.18 0.000 34.77727 67.8969
------------------------------------------------------------------------------
However, we’ll run into an issue here: how do we specify categorical variables or interactions? The vl substitute command creates “factor-variable lists” that can include factor variable indicators (i.), continuous variable indicators (c.), and interactions (# or ##). (The name “factor-variable list” is slightly misleading; you could create a “factor-variable list” that includes no actual factors, for example, if you wanted to interact two continuous variables.)
Creating a factor-variable list via vl substitute can be done by specifying variables or variable lists.
. vl substitute sublist1 = mpg mylist3
. display "$sublist1"
mpg length trunk gear_ratio headroom
. vl dir
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 9 continuous variables
$vluncertain | 0 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
User |
$mylist1 | 2 Related to gas consumption
$mylist2 | 3 Related to size
$mylist3 | 4 variables
$sublist1 | factor-variable list
-------------------------------------------------------------------------------
Note the use of display "$listname" instead of vl list. Factor-variable lists are not just lists of variables; they can also include the features above, so they must be displayed rather than listed. Note that in the vl dir output, “sublist1” has no number of variables listed, making it stand apart.
We can make this more interesting by actually including continuous/factor indicators and/or interactions.
. vl substitute sublist2 = c.mylist1##i.vldummy
. display "$sublist2"
weight mpg i.foreign i.foreign#c.weight i.foreign#c.mpg
Note the need to specify that mylist1 is continuous (with c.). This follows the normal convention that Stata assumes predictors in a model are continuous by default, unless they’re involved in an interaction, in which case it assumes they are factors by default.
. regress price $sublist2
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(5, 68) = 16.82
Model | 351163805 5 70232760.9 Prob > F = 0.0000
Residual | 283901591 68 4175023.4 R-squared = 0.5530
-------------+---------------------------------- Adj R-squared = 0.5201
Total | 635065396 73 8699525.97 Root MSE = 2043.3
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weight | 4.415037 .8529259 5.18 0.000 2.71305 6.117024
mpg | 237.691 125.0383 1.90 0.062 -11.81907 487.201
|
foreign |
Foreign | 8219.603 7265.713 1.13 0.262 -6278.902 22718.11
|
foreign#|
c.weight |
Foreign | .7408054 1.647504 0.45 0.654 -2.546738 4.028348
|
foreign#|
c.mpg |
Foreign | -257.4683 155.426 -1.66 0.102 -567.616 52.67938
|
_cons | -13285.44 5149.648 -2.58 0.012 -23561.41 -3009.481
------------------------------------------------------------------------------
3.5.1 Updating factor-variable Lists
Factor-variable lists cannot be directly modified.
. display "$sublist1"
mpg length trunk gear_ratio headroom
. vl modify sublist1 = sublist1 - mpg
sublist1 not allowed
vlusernames containing factor variables not allowed in this context
r(198);
However, if you create a factor-variable list using only other variable lists, then when those lists get updated, so does the factor-variable list, once it is rebuilt!
. vl create continuous = (turn trunk)
note: $continuous initialized with 2 variables.
. vl create categorical = (rep78 foreign)
note: $categorical initialized with 2 variables.
. vl substitute predictors = c.continuous##i.categorical
. display "$predictors"
turn trunk i.rep78 i.foreign i.rep78#c.turn i.foreign#c.turn i.rep78#c.trunk i.
> foreign#c.trunk
. vl modify continuous = continuous - (trunk)
note: 1 variable removed from $continuous.
. quiet vl rebuild
. display "$predictors"
turn i.rep78 i.foreign i.rep78#c.turn i.foreign#c.turn
Note the call to vl rebuild. Among other things, it will re-generate the factor-variable lists. (It produces a vl dir output without an option to suppress it, hence the use of quiet.)
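One of those "other things", as I understand the Stata documentation, is restoring the $name macros after a vl-set data set has been saved and loaded again: the list definitions travel with the data, but global macros do not survive a new session. A minimal sketch, assuming the auto data set, a made-up list name size, and a temporary file standing in for a real saved data set:

```stata
* Build a user list and save the data (a tempfile stands in for a real file)
sysuse auto, clear
vl set, nonotes
vl create size = (weight length trunk)
tempfile autovl
save `autovl'

* Later, typically in a new session where $size would be empty:
use `autovl', clear
quiet vl rebuild       // re-create the vl macros from what was saved with the data
display "$size"
```

Run within a single session, the macro never disappears, so the effect of vl rebuild here is only visible when the use step happens in a fresh Stata session.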
3.6 Stored Statistics
You may have noticed that certain characteristics of the variable are reported.
. vl list mylist3
--------------------------------------------------
Variable | Macro Values Levels
-----------+--------------------------------------
headroom | $mylist3 noninteger
trunk | $mylist3 integers >=0 18
length | $mylist3 integers >=0 47
gear_ratio | $mylist3 noninteger
--------------------------------------------------
This reports some characteristics of the variables (integer, whether it’s non-negative) and the number of unique values. We can also see some other statistics:
. vl list mylist3, min max obs
-------------------------------------------------------------------------------
Variable | Macro Values Levels Min Max Obs
---------+---------------------------------------------------------------------
headroom | $mylist3 noninteger 1.5 5 74
trunk | $mylist3 integers >=0 18 5 23 74
length | $mylist3 integers >=0 47 142 233 74
gear_r~o | $mylist3 noninteger 2.19 3.89 74
-------------------------------------------------------------------------------
This is similar to codebook except faster; these characteristics are saved at the time the variable list is created or modified and are not updated automatically when the data change.
. drop if weight < 3000
(35 observations deleted)
. summarize weight
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
weight | 39 3653.846 423.5788 3170 4840
. vl list (weight), min max obs
-------------------------------------------------------------------------------
Variable | Macro Values Levels Min Max Obs
---------+---------------------------------------------------------------------
weight | $vlcontinuous integers >=0 64 1760 4840 74
weight | $mylist1 integers >=0 64 1760 4840 74
weight | $mylist2 integers >=0 64 1760 4840 74
-------------------------------------------------------------------------------
To re-generate these stored statistics, we call vl set again, with the update option.
. vl set, update
-------------------------------------------------------------------------------
| Macro's contents
|------------------------------------------------------------
Macro | # Vars Description
------------------+------------------------------------------------------------
System |
$vldummy | 1 0/1 variable
$vlcategorical | 1 categorical variable
$vlcontinuous | 9 continuous variables
$vluncertain | 0 perhaps continuous, perhaps categorical variables
$vlother | 0 all missing or constant variables
-------------------------------------------------------------------------------
. vl list (weight), min max obs
-------------------------------------------------------------------------------
Variable | Macro Values Levels Min Max Obs
---------+---------------------------------------------------------------------
weight | $vlcontinuous integers >=0 34 3170 4840 39
weight | $mylist1 integers >=0 34 3170 4840 39
weight | $mylist2 integers >=0 34 3170 4840 39
-------------------------------------------------------------------------------
When the update option is passed, variable lists are not affected; only the stored statistics are updated.