3 Variable Lists
A common technique in previous versions of Stata was to define a global containing a list of variables to be used later in the document. For example, you might see something like this at the top of a Do file:
global predictors x1 x2 x3 x4
then further down the document something like
regress y $predictors
logit z $predictors
In Stata 16, Stata formalized this concept with the addition of the vl command (variable list). It works similarly to the use of globals: lists of variables are defined, then later referenced via the $name syntax. However, using vl has the benefits of improved organization, customizations unique to variable lists, error checking, and overall convenience.
3.1 Initialization of Variable Lists
To begin using variable lists, vl set must be run.
. sysuse auto
(1978 automobile data)

. vl set

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vlcategorical  |       2   categorical variables
  $vlcontinuous   |       2   continuous variables
  $vluncertain    |       7   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------

Notes

      1. Review contents of vlcategorical and vlcontinuous to ensure they are
         correct.  Type vl list vlcategorical and type vl list vlcontinuous.

      2. If there are any variables in vluncertain, you can reallocate them
         to vlcategorical, vlcontinuous, or vlother.  Type
         vl list vluncertain.

      3. Use vl move to move variables among classifications.  For example,
         type vl move (x50 x80) vlcontinuous to move variables x50 and x80 to
         the continuous classification.

      4. vlnames are global macros.  Type the vlname without the leading
         dollar sign ($) when using vl commands.  Example: vlcategorical not
         $vlcategorical.  Type the dollar sign with other Stata commands to
         get a varlist.
This produces a surprisingly large amount of output. When you initialize the use of variable lists, Stata will automatically create four variable lists, called the “System variable lists”. Every numeric variable in the current data set is automatically placed into one of these four lists:
vlcategorical: Variables which Stata thinks are categorical. These generally have to be non-negative, integer-valued variables with fewer than 10 unique values.
vlcontinuous: Variables which Stata thinks are continuous. These generally are variables which have negative values, have non-integer values, or are non-negative integers with more than 100 unique values.
vluncertain: Variables which Stata is unsure whether they are continuous or categorical. These generally are non-negative, integer-valued variables with between 10 and 100 unique values.
vlother: Any numeric variables that aren’t really useful - either all missing or constant variables.
There is a potential fifth system variable list, vldummy, which is created when the dummy option is passed. Unsurprisingly, this takes variables containing only the values 0 and 1 out of vlcategorical and puts them into this list.
The “Notes” given below the output are generic; they appear regardless of how well Stata was able to categorize the variables. They can be suppressed with the nonotes option to vl set.
The two thresholds given above, 10 and 100, can be adjusted by the categorical and uncertain options. For example,
vl set, categorical(20) uncertain(50)
Running vl set on an already vl-set data set will result in an error, unless the clear option is given, which will re-generate the lists.
. vl set, dummy nonotes
one or more already classified variables specified
    You requested that variables be added to vl's system classifications, but
    you specified 11 variables that were already classified.
r(110);

. vl set, dummy nonotes clear

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       2   continuous variables
  $vluncertain    |       7   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------
In the above, we changed our minds and wanted to include the vldummy list, but since we’d already vl-set, we had to clear the existing set.
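As a rough sketch of how these options combine, the following re-runs the classification with custom thresholds while keeping the dummy list. It uses only the options described above; the threshold values are arbitrary choices for illustration, and no output is shown because the resulting classification depends on the data.

```stata
* Classify the auto data with custom thresholds (20 and 50 are arbitrary)
sysuse auto, clear
vl set, categorical(20) uncertain(50) dummy nonotes

* The data set is now vl-set, so a second call needs the clear option;
* without it, Stata stops with error r(110) as shown above
vl set, categorical(10) uncertain(100) dummy nonotes clear

* Replay the summary table of the regenerated system lists
vl dir
```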
3.2 Viewing lists
When initializing the variable lists, we’re treated to a nice table of all defined lists. We can replay it via
. vl dir

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       2   continuous variables
  $vluncertain    |       7   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------
To see the actual contents of the variable lists, we’ll need to use vl list.
. vl list

----------------------------------------------------
    Variable | Macro           Values         Levels
-------------+--------------------------------------
     foreign | $vldummy        0 and 1             2
       rep78 | $vlcategorical  integers >=0        5
    headroom | $vlcontinuous   noninteger
  gear_ratio | $vlcontinuous   noninteger
       price | $vluncertain    integers >=0       74
         mpg | $vluncertain    integers >=0       21
       trunk | $vluncertain    integers >=0       18
      weight | $vluncertain    integers >=0       64
      length | $vluncertain    integers >=0       47
        turn | $vluncertain    integers >=0       18
displacement | $vluncertain    integers >=0       31
----------------------------------------------------
This output produces one row for each variable in each variable list it is in. We haven’t used this yet, but variables can be in multiple lists.
We can list only specific lists:
. vl list vlcategorical

------------------------------------------------
Variable | Macro           Values         Levels
---------+--------------------------------------
   rep78 | $vlcategorical  integers >=0        5
------------------------------------------------
or specific variables
. vl list (turn weight)

------------------------------------------------
Variable | Macro           Values         Levels
---------+--------------------------------------
    turn | $vluncertain    integers >=0       18
  weight | $vluncertain    integers >=0       64
------------------------------------------------
If “turn” was in multiple variable lists, each would appear as a row in this output.
There’s a bit of odd notation which can be used to sort the output by variable name, which makes it easier to identify variables which appear in multiple lists.
. vl list (_all), sort

----------------------------------------------------
    Variable | Macro           Values         Levels
-------------+--------------------------------------
displacement | $vluncertain    integers >=0       31
     foreign | $vldummy        0 and 1             2
  gear_ratio | $vlcontinuous   noninteger
    headroom | $vlcontinuous   noninteger
      length | $vluncertain    integers >=0       47
         mpg | $vluncertain    integers >=0       21
       price | $vluncertain    integers >=0       74
       rep78 | $vlcategorical  integers >=0        5
       trunk | $vluncertain    integers >=0       18
        turn | $vluncertain    integers >=0       18
      weight | $vluncertain    integers >=0       64
----------------------------------------------------
The (_all) tells Stata to report on all variables, and sorting (when you specify at least one variable) orders by variable name rather than variable list name.
This will also list any numeric variables which are not found in any list.
3.2.1 Moving variables in system lists
After initializing the variable lists, if you plan on using the system lists, you may need to move variables around (e.g. classifying the vluncertain variables into their proper lists). This can be done via vl move, which has the syntax
vl move (<variables to move>) <destination list>
For example, all the variables in vluncertain are actually continuous:
. vl list vluncertain

----------------------------------------------------
    Variable | Macro           Values         Levels
-------------+--------------------------------------
       price | $vluncertain    integers >=0       74
         mpg | $vluncertain    integers >=0       21
       trunk | $vluncertain    integers >=0       18
      weight | $vluncertain    integers >=0       64
      length | $vluncertain    integers >=0       47
        turn | $vluncertain    integers >=0       18
displacement | $vluncertain    integers >=0       31
----------------------------------------------------

. vl move (price mpg trunk weight length turn displacement) vlcontinuous
note: 7 variables specified and 7 variables moved.

------------------------------
Macro          # Added/Removed
------------------------------
$vldummy                     0
$vlcategorical               0
$vlcontinuous                7
$vluncertain                -7
$vlother                     0
------------------------------

. vl dir

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       9   continuous variables
  $vluncertain    |       0   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------
Alternatively, since we’re moving all variables in vluncertain, we can see our first use of the variable list!
. vl set, dummy nonotes clear

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       2   continuous variables
  $vluncertain    |       7   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------

. vl move ($vluncertain) vlcontinuous
note: 7 variables specified and 7 variables moved.

------------------------------
Macro          # Added/Removed
------------------------------
$vldummy                     0
$vlcategorical               0
$vlcontinuous                7
$vluncertain                -7
$vlother                     0
------------------------------
Note that variable lists are essentially just global macros, so they can be referred to via $name. However, the $ is used only when we want to expand the variable list as a macro; in this case, we wanted to expand vluncertain into its list of variables. When we refer to a variable list by name in the vl commands, we do not use the $.
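The dollar-sign convention can be sketched side by side as follows. This is a minimal illustration that uses only commands already shown in this chapter, plus summarize as an example of an ordinary Stata command that accepts a varlist.

```stata
sysuse auto, clear
vl set, dummy nonotes

* Inside vl commands, name the list without the dollar sign
vl list vluncertain

* In other Stata commands, the dollar sign expands the list into its variables
summarize $vluncertain

* Both forms in one command: expand one list, refer to another by name
vl move ($vluncertain) vlcontinuous
```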
3.3 User Variable Lists
In addition to the System variable lists, you can define your own User variable lists, which I imagine will be used far more often. These are easy to create with vl create:
. vl create mylist1 = (weight mpg)
note: $mylist1 initialized with 2 variables.

. vl create mylist2 = (weight length trunk)
note: $mylist2 initialized with 3 variables.

. vl dir, user

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
User              |
  $mylist1        |       2   variables
  $mylist2        |       3   variables
-------------------------------------------------------------------------------

. vl list, user

------------------------------------------------
Variable | Macro           Values         Levels
---------+--------------------------------------
  weight | $mylist1        integers >=0       64
     mpg | $mylist1        integers >=0       21
  weight | $mylist2        integers >=0       64
  length | $mylist2        integers >=0       47
   trunk | $mylist2        integers >=0       18
------------------------------------------------
Note the addition of the user option to vl list and vl dir to show only User variable lists and suppress the System variable lists. We can also demonstrate the odd sorting syntax here:
. vl list (_all), sort user

----------------------------------------------------
    Variable | Macro           Values         Levels
-------------+--------------------------------------
displacement | not in vluser                      31
     foreign | not in vluser                       2
  gear_ratio | not in vluser
    headroom | not in vluser
      length | $mylist2        integers >=0       47
         mpg | $mylist1        integers >=0       21
       price | not in vluser                      74
       rep78 | not in vluser                       5
       trunk | $mylist2        integers >=0       18
        turn | not in vluser                      18
      weight | $mylist1        integers >=0       64
      weight | $mylist2        integers >=0       64
----------------------------------------------------
You can specify the variables in a list using all the usual varlist shortcuts:
vl create mylist = (x1-x100 z*)
We can add labels to variable lists:
. vl label mylist1 "Related to gas consumption"

. vl label mylist2 "Related to size"

. vl dir, user

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
User              |
  $mylist1        |       2   Related to gas consumption
  $mylist2        |       3   Related to size
-------------------------------------------------------------------------------
3.3.1 Modifying User Variable Lists
First, note that with User variable lists, the vl move command does not work. It only works with System variable lists.
We can create new user variable lists which build off old lists with vl create. To add a new variable:
. vl create mylist3 = mylist2 + (gear_ratio)
note: $mylist3 initialized with 4 variables.

. vl list, user

--------------------------------------------------
  Variable | Macro           Values         Levels
-----------+--------------------------------------
    weight | $mylist1        integers >=0       64
       mpg | $mylist1        integers >=0       21
    weight | $mylist2        integers >=0       64
    length | $mylist2        integers >=0       47
     trunk | $mylist2        integers >=0       18
    weight | $mylist3        integers >=0       64
    length | $mylist3        integers >=0       47
     trunk | $mylist3        integers >=0       18
gear_ratio | $mylist3        noninteger
--------------------------------------------------
. vl create mylist4 = mylist2 - (turn)
note: $mylist4 initialized with 3 variables.

. vl list, user

--------------------------------------------------
  Variable | Macro           Values         Levels
-----------+--------------------------------------
    weight | $mylist1        integers >=0       64
       mpg | $mylist1        integers >=0       21
    weight | $mylist2        integers >=0       64
    length | $mylist2        integers >=0       47
     trunk | $mylist2        integers >=0       18
    weight | $mylist3        integers >=0       64
    length | $mylist3        integers >=0       47
     trunk | $mylist3        integers >=0       18
gear_ratio | $mylist3        noninteger
    weight | $mylist4        integers >=0       64
    length | $mylist4        integers >=0       47
     trunk | $mylist4        integers >=0       18
--------------------------------------------------
Instead of adding (or removing) a single variable at a time, we can add or remove whole lists. In keeping with the comment above, you do not use $ here to refer to the list.
. vl create mylist5 = mylist2 - mylist1
note: $mylist5 initialized with 2 variables.

. vl list mylist5

------------------------------------------------
Variable | Macro           Values         Levels
---------+--------------------------------------
  length | $mylist5        integers >=0       47
   trunk | $mylist5        integers >=0       18
------------------------------------------------
However, if we simply want to modify an existing list, a better approach is the vl modify command. vl create and vl modify are similar to generate and replace; the former creates a new variable list while the latter changes an existing variable list, but the syntax to the right of the = is the same.
. vl modify mylist3 = mylist3 + (headroom)
note: 1 variable added to $mylist3.

. vl modify mylist3 = mylist3 - (weight)
note: 1 variable removed from $mylist3.
3.4 Dropping variable list
Variable lists can be dropped via vl drop
. vl dir, user

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
User              |
  $mylist1        |       2   Related to gas consumption
  $mylist2        |       3   Related to size
  $mylist3        |       4   variables
  $mylist4        |       3   variables
  $mylist5        |       2   variables
-------------------------------------------------------------------------------

. vl drop mylist4 mylist5

. vl dir, user

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
User              |
  $mylist1        |       2   Related to gas consumption
  $mylist2        |       3   Related to size
  $mylist3        |       4   variables
-------------------------------------------------------------------------------
System lists cannot be dropped; if you run vl drop vlcontinuous, it just removes all the variables from that list.
3.5 Using Variable Lists
To be explicit, we can use variable lists in any command which would take the variables in that list. For example,
. describe $mylist3

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
length          int     %8.0g                 Length (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
gear_ratio      float   %6.2f                 Gear ratio
headroom        float   %6.1f                 Headroom (in.)

. describe $vlcategorical

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
rep78           int     %8.0g                 Repair record 1978
We can also use them in a modeling setting.
. regress mpg $mylist3

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(4, 69)        =     30.77
       Model |  1565.65298         4  391.413244   Prob > F        =    0.0000
    Residual |  877.806484        69  12.7218331   R-squared       =    0.6408
-------------+----------------------------------   Adj R-squared   =    0.6199
       Total |  2443.45946        73  33.4720474   Root MSE        =    3.5668

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      length |  -.1837962   .0327629    -5.61   0.000    -.2491564   -.1184361
       trunk |  -.0103867   .1627025    -0.06   0.949    -.3349693    .3141959
  gear_ratio |   1.526952    1.27546     1.20   0.235    -1.017521    4.071426
    headroom |   .0136375   .6602514     0.02   0.984    -1.303528    1.330803
       _cons |   51.33708   8.300888     6.18   0.000     34.77727     67.8969
------------------------------------------------------------------------------
However, we’ll run into an issue here: how do we specify categorical variables or interactions? The vl substitute command creates “factor-variable lists” that can include factor variable indicators (i.), continuous variable indicators (c.), and interactions (# or ##). (The name “factor-variable list” is slightly misleading; you could create a “factor-variable list” that includes no actual factors, for example, if you wanted to interact two continuous variables.)
Creating a factor-variable list via vl substitute can be done by specifying variables or variable lists.
. vl substitute sublist1 = mpg mylist3

. display "$sublist1"
mpg length trunk gear_ratio headroom

. vl dir

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       9   continuous variables
  $vluncertain    |       0   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
User              |
  $mylist1        |       2   Related to gas consumption
  $mylist2        |       3   Related to size
  $mylist3        |       4   variables
  $sublist1       |           factor-variable list
-------------------------------------------------------------------------------
Note the use of display "$listname" instead of vl list. Factor-variable lists are not just lists of variables; they can also include the features above, so they must be displayed. Note that in the vl dir output, “sublist1” has no number of variables listed, making it stand apart.
We can make this more interesting by actually including continuous/factor indicators and/or interactions.
. vl substitute sublist2 = c.mylist1##i.vldummy

. display "$sublist2"
weight mpg i.foreign i.foreign#c.weight i.foreign#c.mpg
Note the need to specify that mylist1 is continuous (with c.). This follows the normal convention that Stata assumes predictors in a model are continuous by default, unless they’re involved in an interaction, in which case it assumes they are factors by default.
. regress price $sublist2

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(5, 68)        =     16.82
       Model |   351163805         5  70232760.9   Prob > F        =    0.0000
    Residual |   283901591        68   4175023.4   R-squared       =    0.5530
-------------+----------------------------------   Adj R-squared   =    0.5201
       Total |   635065396        73  8699525.97   Root MSE        =    2043.3

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      weight |   4.415037   .8529259     5.18   0.000      2.71305    6.117024
         mpg |    237.691   125.0383     1.90   0.062    -11.81907     487.201
             |
     foreign |
    Foreign  |   8219.603   7265.713     1.13   0.262    -6278.902    22718.11
             |
     foreign#|
    c.weight |
    Foreign  |   .7408054   1.647504     0.45   0.654    -2.546738    4.028348
             |
     foreign#|
       c.mpg |
    Foreign  |  -257.4683    155.426    -1.66   0.102     -567.616    52.67938
             |
       _cons |  -13285.44   5149.648    -2.58   0.012    -23561.41   -3009.481
------------------------------------------------------------------------------
3.5.1 Updating factor-variable Lists
Factor-variable lists cannot be directly modified.
. display "$sublist1"
mpg length trunk gear_ratio headroom

. vl modify sublist1 = sublist1 - mpg
sublist1 not allowed
    vlusernames containing factor variables not allowed in this context
r(198);
However, if you create a factor-variable list using only other variable lists, then whenever those lists are updated, the factor-variable list can be updated to match!
. vl create continuous = (turn trunk)
note: $continuous initialized with 2 variables.

. vl create categorical = (rep78 foreign)
note: $categorical initialized with 2 variables.

. vl substitute predictors = c.continuous##i.categorical

. display "$predictors"
turn trunk i.rep78 i.foreign i.rep78#c.turn i.foreign#c.turn i.rep78#c.trunk i.
> foreign#c.trunk

. vl modify continuous = continuous - (trunk)
note: 1 variable removed from $continuous.

. quiet vl rebuild

. display "$predictors"
turn i.rep78 i.foreign i.rep78#c.turn i.foreign#c.turn
Note the call to vl rebuild. Among other things, it will re-generate the factor-variable lists. (It produces a vl dir output without an option to suppress it, hence the use of quiet.)
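Put together in a do-file, the pattern might look like the following sketch. It reuses the list names from the example above and only commands already introduced; the choice of price as the outcome and weight as the added predictor is purely illustrative.

```stata
sysuse auto, clear
vl set, dummy nonotes

* Building blocks: two ordinary user variable lists
vl create continuous = (turn trunk)
vl create categorical = (rep78 foreign)

* Factor-variable list defined only in terms of other lists
vl substitute predictors = c.continuous##i.categorical
regress price $predictors

* Change a building block, then rebuild so $predictors reflects the change
vl modify continuous = continuous + (weight)
quietly vl rebuild
regress price $predictors
```

Defining the model specification once and editing only the underlying lists keeps every later command that uses $predictors in sync.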
3.6 Stored Statistics
You may have noticed that certain characteristics of the variable are reported.
. vl list mylist3

--------------------------------------------------
  Variable | Macro           Values         Levels
-----------+--------------------------------------
  headroom | $mylist3        noninteger
     trunk | $mylist3        integers >=0       18
    length | $mylist3        integers >=0       47
gear_ratio | $mylist3        noninteger
--------------------------------------------------
This reports some characteristics of the variables (integer, whether it’s non-negative) and the number of unique values. We can also see some other statistics:
. vl list mylist3, min max obs

-------------------------------------------------------------------------------
Variable | Macro           Values         Levels       Min       Max        Obs
---------+---------------------------------------------------------------------
headroom | $mylist3        noninteger                  1.5         5         74
   trunk | $mylist3        integers >=0       18         5        23         74
  length | $mylist3        integers >=0       47       142       233         74
gear_r~o | $mylist3        noninteger                 2.19      3.89         74
-------------------------------------------------------------------------------
This is similar to codebook, except faster; these characteristics are saved at the time the variable list is created or modified and are not updated automatically when the data change.
. drop if weight < 3000
(35 observations deleted)

. summarize weight

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         39    3653.846    423.5788       3170       4840

. vl list (weight), min max obs

-------------------------------------------------------------------------------
Variable | Macro           Values         Levels       Min       Max        Obs
---------+---------------------------------------------------------------------
  weight | $vlcontinuous   integers >=0       64      1760      4840         74
  weight | $mylist1        integers >=0       64      1760      4840         74
  weight | $mylist2        integers >=0       64      1760      4840         74
-------------------------------------------------------------------------------
To re-generate these stored statistics, we call vl set again, with the update option.
. vl set, update

-------------------------------------------------------------------------------
                  |                      Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |       1   0/1 variable
  $vlcategorical  |       1   categorical variable
  $vlcontinuous   |       9   continuous variables
  $vluncertain    |       0   perhaps continuous, perhaps categorical variables
  $vlother        |       0   all missing or constant variables
-------------------------------------------------------------------------------

. vl list (weight), min max obs

-------------------------------------------------------------------------------
Variable | Macro           Values         Levels       Min       Max        Obs
---------+---------------------------------------------------------------------
  weight | $vlcontinuous   integers >=0       34      3170      4840         39
  weight | $mylist1        integers >=0       34      3170      4840         39
  weight | $mylist2        integers >=0       34      3170      4840         39
-------------------------------------------------------------------------------
When the update option is passed, variable lists are not affected; only the stored statistics are updated.