Check that x is a character vector with unique, syntactically valid names
that do not consist of only dots or of two dots followed by a number, and do
not suggest they were adjusted or automatically created.
Value
TRUE or FALSE, indicating if x is a character vector that consists of
unique, syntactically valid names that do not consist of only dots or of two
dots followed by a number, and do not suggest they were adjusted or
automatically created.
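The conditions listed above can be approximated in base R. The sketch below uses a hypothetical helper, all_names_sketch(), which is not part of this package and is not its implementation. It checks the type, uniqueness, syntactic validity (via make.names(), so in the current locale) and names that consist of only dots or of two dots followed by digits. It does not look for suspicious names or emit the warnings that all_names() produces. Treating NA as an invalid name is an assumption of the sketch.

```r
# Hypothetical sketch, not the package's implementation: it omits the
# suspicious-name patterns and returns FALSE without any warning.
all_names_sketch <- function(x) {
  is.character(x) &&
    !anyNA(x) &&                             # assumption: NA is never a valid name
    anyDuplicated(x) == 0L &&                # names must be unique
    all(make.names(x) == x) &&               # unchanged by make.names(): valid here
    !any(grepl("^(\\.+|\\.\\.[0-9]+)$", x))  # only dots, or two dots + digits
}

all_names_sketch(c("a", "b1a"))       # TRUE
all_names_sketch(c("a", "b1a", "a"))  # FALSE: duplicated name
all_names_sketch(c("a", ".."))        # FALSE: only dots
all_names_sketch(13)                  # FALSE: not a character vector
```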
Details
Duplicated or syntactically invalid names are not allowed by
all_names() because R functions are not guaranteed to handle such names
correctly. For example, not all operations on data frames
will preserve duplicated column names, and operations involving syntactically
invalid names might, by definition, give undocumented results.
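A small base-R illustration of how duplicated column names can be partly ignored or silently changed; the column values are arbitrary.

```r
# check.names = FALSE stops data.frame() from repairing the duplicated names
df <- data.frame(a = 1, a = 2, check.names = FALSE)
names(df)              # "a" "a"
df$a                   # only the first column named "a" is returned
#> [1] 1
# Passing the data frame through data.frame() again silently renames a column
names(data.frame(df))  # "a" "a.1"
```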
Syntactically valid names only consist of letters, numbers,
dots and underscores; start with a letter, or with a dot not followed by a
number; and are not reserved words such as for or any of the NAs. The
definition of letter depends on the current locale. A
conservative check for names that are syntactically valid in all locales
would only allow unaccented Latin letters (besides digits, dots and
underscores), but that is not enforced by all_names().
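A minimal sketch of such a conservative check, assuming base R only. The helper is_valid_ascii_name() is hypothetical and not part of this package: it combines make.names(), whose result depends on the current locale, with a regular expression (using perl = TRUE so that character ranges are code-point ranges) that restricts names to unaccented Latin letters, digits, dots and underscores. Like make.names(), it does not reject names that consist of only dots.

```r
# Hypothetical helper, not exported by this package.
is_valid_ascii_name <- function(x) {
  # make.names() leaves names unchanged if they are valid in the current locale
  valid_here <- make.names(x) == x
  # restrict characters to unaccented Latin letters, digits, dots, underscores
  ascii_only <- grepl("^[A-Za-z0-9._]+$", x, perl = TRUE)
  valid_here & ascii_only
}

is_valid_ascii_name(c("a", "b_1", ".x", "1a", "for", "caf\u00e9"))
#> [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
```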
Names that consist of only dots, or of two dots followed by a number, are
not allowed by all_names() (nor by vctrs::vec_as_names()): they are listed
as reserved words, even though make.names() does not recognise them as
syntactically invalid.
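The base-R snippet below illustrates this: make.names() leaves names such as .. and ..23 unchanged, so they need a separate check. The regular expression is one possible way to detect them and is an assumption of this sketch, not necessarily the pattern used by all_names().

```r
dot_names <- c("..", "....", "..23", ".a", "..b")
# make.names() leaves all of these names unchanged
identical(make.names(dot_names), dot_names)
#> [1] TRUE
# Names of only dots, or of two dots followed by digits
grepl("^(\\.+|\\.\\.[0-9]+)$", dot_names)
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```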
Suspicious names are not allowed by all_names(). A suspicious name contains
a pattern that suggests it originally was syntactically invalid and has been
adjusted into a syntactically valid name, or has been adjusted to make names
unique. Such adjustments usually occur silently, for example
when data are read into R, which is problematic because it can then no longer
be assumed that the original column names are present. The identification of
suspicious names is partly based on the assumption that names originally did
not contain dots, see the first item in the list below.
all_names() tries to recognise adjustments made by make.names(), which
is used by data.frame(), read.csv(), and
data.table::fread(x, header = TRUE, check.names = TRUE); and adjustments
made by vctrs::vec_as_names(x, repair = "universal"), which is used
throughout the tidyverse:
adjustments to replace invalid characters (i.e., characters that are not a letter, number, dot or underscore): make.names() and vctrs::vec_as_names(x, repair = "universal") replace such characters with a dot. Their identification is based on the assumption that names originally did not contain dots. This is good practice (unfortunately not strictly followed in base R, e.g., in data.frame()) because it prevents names containing a dot from being confused with methods for classed objects.
adjustments to make duplicated names unique: make.names(x, unique = TRUE) appends a dot followed by a number; vctrs::vec_as_names(x, repair = "universal") appends three dots followed by a number. It is not checked whether a complete sequence of suspicious names is present, e.g., a.2 will be flagged as suspicious even if a and a.1 are absent.
adjustments to make reserved words valid: make.names() appends a dot; vctrs::vec_as_names(x, repair = "universal") prepends a dot.
adjustments to make names that did not start with a letter, nor with a dot not followed by a number, syntactically valid: make.names() prepends X; vctrs::vec_as_names(x, repair = "universal") prepends one or more dots.
adjustments to name unnamed columns: as.data.frame() uses pattern V1, V2, V3 if a matrix without column names is converted to a data frame, and read.csv(..., header = FALSE) uses the same pattern for data without column names; read.csv(..., header = TRUE) uses pattern X, X.1, X.2 for empty column names.
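Several of the base-R adjustments listed above can be reproduced directly; vctrs is left out so that the snippet has no dependencies. The inputs are arbitrary examples.

```r
# make.names(): invalid character, reserved word, leading digit, empty name,
# and a duplicated name made unique
make.names(c("a b", "for", "1x", "", "a", "a"), unique = TRUE)
# returns "a.b" "for." "X1x" "X" "a" "a.1"

# read.csv() with a header row of empty names
names(read.csv(text = ",,\n1,2,3"))
# returns "X" "X.1" "X.2"

# read.csv(header = FALSE), and as.data.frame() on a matrix without column names
names(read.csv(text = "1,2,3", header = FALSE))
# returns "V1" "V2" "V3"
names(as.data.frame(matrix(1:4, nrow = 2)))
# returns "V1" "V2"
```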
Names containing underscores (_) are allowed by all_names() by default,
because underscores do not make names syntactically invalid. Setting
allow_underscores to FALSE disallows such names, which is useful if several
names will later be concatenated with underscores as separators, for example
to create an ID tag.
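A minimal base-R sketch of why underscores in names can make concatenated ID tags ambiguous; the names site_a, b, site and a_b are hypothetical.

```r
# Two different pairs of names produce the same ID tag
id_1 <- paste(c("site_a", "b"), collapse = "_")
id_2 <- paste(c("site", "a_b"), collapse = "_")
id_1 == id_2
#> [1] TRUE
# Checking the names for the separator beforehand flags the problem
any(grepl("_", c("site_a", "b", "site", "a_b"), fixed = TRUE))
#> [1] TRUE
```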
Programming notes
The patterns used to identify suspicious names are created using regular expressions with the following elements:
anchors that require a pattern to start at the beginning of a string (^) or to reach the end of a string ($);
characters that should be present: a dot (\\. or, if fixed is TRUE, .), an underscore (_), any digit ([0-9]), digits one to nine ([1-9]), and the characters V or X;
quantifiers indicating presence: present zero or more times (*); present one or more times (+).
Multiple patterns can be combined using |, the regular-expression alternation
operator, which acts as a logical OR.
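A minimal sketch of how such elements can be combined with | into a single regular expression. The patterns and the helper looks_adjusted() are hypothetical illustrations, not the patterns used by all_names(). Like all_names(), they flag any name ending in a dot followed by digits, without checking whether the rest of the sequence is present.

```r
# Hypothetical patterns for illustration only
suspicious_patterns <- c(
  "\\.[0-9]+$",     # unique suffix: make.unique() "a.1"; also vctrs "a...2"
  "\\.$",           # make.names() appends a dot to reserved words: "for."
  "^X([0-9.]|$)",   # make.names() prepends X: "X", "X1x", "X.2"
  "^V[1-9][0-9]*$"  # unnamed columns: "V1", "V2", ...
)
# Wrap each pattern in parentheses and join them with the alternation operator
combined <- paste0("(", suspicious_patterns, ")", collapse = "|")
looks_adjusted <- function(x, pattern = combined) grepl(pattern, x)

looks_adjusted(c("a", "b1a", "e.2", "for.", "X", "X3", "V3", "Xylophone"))
#> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
```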
See also
On the syntactic validity of names: section Details of make.names(),
section Names and Identifiers of Quotes, and the R FAQ about valid names.
names() to get or set object names; janitor::make_clean_names() to adjust
names, e.g., through adjusting case and transliterating non-ASCII characters.
The vignettes about design choices and about type coercion.
Other collections of checks on type and length:
all_characters(),
is_logical(),
is_natural(),
is_number(),
is_zerolength()
Examples
all_names(x = c("a", "b1a")) # TRUE
#> [1] TRUE
all_names(x = c("a", "b1a", "a")) # FALSE: duplicated name
#> Warning: Names are duplicated: 'a'.
#> Use 'x <- make.names(x, unique = TRUE)' to create unique, syntactically valid names!
#> [1] FALSE
invalid_names <- c("a", "ab#cd", "", "for", "..", "..23")
# Syntactically invalid names: the character '#' makes names invalid,
# '""' is an empty name, 'for', '..', and '..23' are reserved words.
all_names(x = invalid_names) # FALSE
#> Warning: Names are syntactically invalid: 'ab#cd', 'for', '""' (i.e., an empty string); and consist of only dots, which is a reserved word: '..'; and consist of two dots followed by digits, which is a reserved word: '..23'.
#> Use 'x <- make.names(x, unique = TRUE)' to create unique, syntactically valid names
#> (it does not recognise names that consist of only dots, or two dots followed by digits)!
#> [1] FALSE
# Names that have been made valid are suspicious
# (but make.names() does not adjust ".." or "..23"):
all_names(x = make.names(invalid_names)) # FALSE
#> Warning: Names consist of only dots, which is a reserved word: '..'; and consist of two dots followed by digits, which is a reserved word: '..23'; and are suspicious: 'ab.cd', 'X', 'for.'
#> [1] FALSE
# FALSE: suspicious names
all_names(x = c("e.2", "a.1b", ".TRUE", "..22c", "a...2",
"V3", "X.2", "X0...11", "X0.3", "X3"))
#> Warning: Names are suspicious: 'e.2', 'a.1b', '.TRUE', '..22c', 'a...2', 'V3', 'X.2', 'X0...11', 'X0.3', 'X3'
#> [1] FALSE
all_names(x = "abc_def", allow_underscores = FALSE) # FALSE: underscores
#> Warning: Names contain underscores (which are not allowed if 'allow_underscores' is FALSE):
#> 'abc_def'.
#> Use 'x <- make.names(x, unique = TRUE, allow_ = FALSE)' to create unique,
#> syntactically valid names without underscores!
#> [1] FALSE
all_names(x = "abc_def", allow_underscores = TRUE) # TRUE
#> [1] TRUE
# names() and colnames() return NULL for an object without
# (column) names; passing that result to all_names():
all_names(x = names(1:3)) # FALSE
#> Warning: 'x' is NULL: did you use names() or colnames() on an object without
#> (column) names to all_names()?
#> [1] FALSE
all_names(13) # FALSE: 'x' is not a character vector
#> Warning: Input to 'x' is not a character vector: 13
#> [1] FALSE