Check that x is a character vector with unique, syntactically valid names that do not consist of only dots or of two dots followed by a number, and do not suggest they were adjusted or automatically created.

Usage

all_names(x, allow_underscores = TRUE)

Arguments

x

Vector of names to test.

allow_underscores

TRUE or FALSE: allow underscores?

Value

TRUE or FALSE, indicating if x is a character vector that consists of unique, syntactically valid names that do not consist of only dots or of two dots followed by a number, and do not suggest they were adjusted or automatically created.

Details

Duplicated or syntactically invalid names are not allowed by all_names() because R functions are not guaranteed to handle such names correctly. For example, not all operations on data frames will preserve duplicated column names, and operations involving syntactically invalid names might, by definition, give undocumented results.
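As a small illustration of the point about data frames, the base R sketch below builds the same two-column data frame with and without name checking. The object names kept and repaired are made up for this example.

```r
# Duplicated names survive only if name checking is switched off.
kept <- names(data.frame(a = 1, a = 2, check.names = FALSE))

# With the default check.names = TRUE the second 'a' is silently renamed.
repaired <- names(data.frame(a = 1, a = 2))

print(kept)     # "a" "a"
print(repaired) # "a" "a.1"
```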

Syntactically valid names consist only of letters, digits, dots and underscores; start with a letter, or with a dot not followed by a digit; and are not reserved words such as for or any of the NA constants (e.g., NA and NA_integer_). The definition of a letter depends on the current locale. A conservative check for names that are syntactically valid in all locales would only allow digits and unaccented Latin letters, but that is not enforced by all_names().
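A common base R idiom for testing these rules is to compare a name with the result of make.names(): a name that comes back unchanged is syntactically valid. The helper is_syntactic() below is hypothetical and only a sketch; it is not claimed to be how all_names() works internally, and it does not catch names consisting of only dots or of two dots followed by a number.

```r
# Hypothetical helper: TRUE if make.names() leaves the name unchanged.
is_syntactic <- function(x) x == make.names(x)

candidates <- c("a", ".a", ".1a", "1a", "for", "NA", "a b")
result <- is_syntactic(candidates)

# Show each candidate next to the verdict.
print(data.frame(name = candidates, valid = result))
# Only "a" and ".a" are valid: ".1a" starts with a dot followed by a digit,
# "1a" starts with a digit, "for" and "NA" are reserved words,
# and "a b" contains a space.
```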

Names that consist of only dots, or consist of two dots followed by a number, are not allowed by all_names() (nor by vctrs::vec_as_names()): they are listed as reserved words even though they are not recognised as syntactically invalid by make.names().

Suspicious names are not allowed by all_names(). A suspicious name contains a pattern that suggests it originally was syntactically invalid and has been adjusted into a syntactically valid name, or has been adjusted to make names unique. Such adjustments usually occur silently, for example when data is read into R, which is problematic because it cannot reliably be assumed the original column names are present. The identification of suspicious names is partly based on the assumption that names originally did not contain dots, see the first item in the list below.

all_names() tries to recognise adjustments made by make.names(), which is used by data.frame(), read.csv(), and data.table::fread(x, header = TRUE, check.names = TRUE); and adjustments made by vctrs::vec_as_names(x, repair = "universal"), which is used throughout the tidyverse:

  • adjustments to replace invalid characters (i.e., characters that are not a letter, number, dot or underscore): make.names() and vctrs::vec_as_names(x, repair = "universal") replace such characters with a dot. Their identification is based on the assumption that names originally did not contain dots. Avoiding dots in names is good practice (unfortunately not strictly followed in base R, e.g., in data.frame()) because it prevents names containing a dot from being confused with methods used on classed objects.

  • adjustments to make duplicated names unique: make.names(x, unique = TRUE) appends a dot followed by a number; vctrs::vec_as_names(x, repair = "universal") appends three dots followed by a number. It is not checked if a complete sequence of suspicious names is present, e.g., a.2 will be flagged as suspicious even if a and a.1 are absent.

  • adjustments to make reserved words valid: make.names() appends a dot; vctrs::vec_as_names(x, repair = "universal") prepends a dot.

  • adjustments to make names that did not start with a letter, nor with a dot not followed by a number, syntactically valid: make.names() prepends X; vctrs::vec_as_names(x, repair = "universal") prepends one or more dots.

  • adjustments to name unnamed columns: as.data.frame() uses pattern V1, V2, V3 if a matrix without column names is converted to a data frame (data.frame() uses pattern X1, X2, X3 instead), and read.csv(..., header = FALSE) uses pattern V1, V2, V3 for data without column names; read.csv(..., header = TRUE) uses pattern X, X.1, X.2.
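The base R adjustments listed above can be reproduced directly. The sketch below uses only make.names(), as.data.frame() and data.frame(); the input names and the object names repaired and m are made up for illustration, and the vctrs variants are left out to avoid a dependency.

```r
# make.names() replaces '#' with a dot, appends a dot to the reserved word
# 'for', prepends 'X' to a name starting with a digit, and appends '.1' to
# the second 'a' to make it unique.
repaired <- make.names(c("ab#cd", "for", "1a", "a", "a"), unique = TRUE)
print(repaired) # "ab.cd" "for." "X1a" "a" "a.1"

# Automatically created column names for a matrix without column names.
m <- matrix(1:6, nrow = 2)
print(names(as.data.frame(m))) # "V1" "V2" "V3"
print(names(data.frame(m)))    # "X1" "X2" "X3"
```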

Names containing underscores (_) are allowed by all_names() by default because underscores do not make names syntactically invalid. Setting allow_underscores to FALSE rejects such names, which is useful if the underscore is reserved as a separator, for example if several names will be concatenated, separated by underscores, to create an ID-tag.
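To see why underscores inside names can be a problem for ID-tags, consider the sketch below. The names "a_b", "c", "a" and "b_c" are hypothetical; the point is only that two different pairs of names produce the same tag, and that the tag can no longer be split back into its parts.

```r
# Two different pairs of names are concatenated with '_' as separator.
id1 <- paste("a_b", "c", sep = "_")
id2 <- paste("a", "b_c", sep = "_")

print(identical(id1, id2)) # TRUE: both tags are "a_b_c"

# Splitting the tag yields three parts instead of the original two names.
parts <- strsplit(id1, "_", fixed = TRUE)[[1]]
print(parts) # "a" "b" "c"
```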

Programming notes

The patterns used to identify suspicious names are created using regular expressions with the following elements:

  • require a pattern to start at the beginning of a string (^) or reach the end of a string ($);

  • specify characters that should be present: a dot (\\. or, if fixed is TRUE, .), an underscore (_), any digit ([0-9]), digits one to nine ([1-9]), or the characters V or X;

  • indicate presence: present zero or more times (*); present one or more times (+).

Multiple patterns can be combined using |, the regular-expression operator for alternation (logical OR).
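The sketch below combines these elements with grepl() from base R. The patterns are illustrative assumptions written for this example, not the patterns actually used inside all_names(), and they cover only a few of the cases described in Details.

```r
x <- c("a", "..", "..23", "e.2", "a...2", "V3", "X.2", "abc_def")

# Only dots: anchored at both ends, a dot present one or more times.
only_dots <- grepl("^\\.+$", x)

# Two dots followed by a number.
two_dots_number <- grepl("^\\.\\.[0-9]+$", x)

# Three alternatives combined with '|': V1, V2, ...; a bare X; X.1, X.2, ...
auto_named <- grepl("^V[1-9][0-9]*$|^X$|^X\\.[1-9][0-9]*$", x)

# With fixed = TRUE the pattern is a literal string, so '.' needs no escape.
has_dot <- grepl(".", x, fixed = TRUE)
has_underscore <- grepl("_", x, fixed = TRUE)

print(x[only_dots])       # ".."
print(x[two_dots_number]) # "..23"
print(x[auto_named])      # "V3" "X.2"
print(x[has_dot])         # ".." "..23" "e.2" "a...2" "X.2"
print(x[has_underscore])  # "abc_def"
```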

See also

For the syntactic validity of names, see section Details of make.names(), section Names and Identifiers of Quotes, and the R FAQ about valid names.

names() to get or set object names; janitor::make_clean_names() to adjust names, e.g., through adjusting case and transliterating non-ASCII characters.

The vignettes about design choices and about type coercion.

Other collections of checks on type and length: all_characters(), is_logical(), is_natural(), is_number(), is_zerolength()

Examples

all_names(x = c("a", "b1a")) # TRUE
#> [1] TRUE
all_names(x = c("a", "b1a", "a")) # FALSE: duplicated name
#> Warning: Names are duplicated: 'a'.
#> Use 'x <- make.names(x, unique = TRUE)' to create unique, syntactically valid names!
#> [1] FALSE

invalid_names <- c("a", "ab#cd", "", "for", "..", "..23")
# Syntactically invalid names: the character '#' makes names invalid,
# '""' is an empty name, 'for', '..', and '..23' are reserved words.
all_names(x = invalid_names) # FALSE
#> Warning: Names are syntactically invalid: 'ab#cd', 'for', '""' (i.e., an empty string); and consist of only dots, which is a reserved word: '..'; and consist of two dots followed by digits, which is a reserved word: '..23'.
#> Use 'x <- make.names(x, unique = TRUE)' to create unique, syntactically valid names
#> (it does not recognise names that consist of only dots, or two dots followed by digits)!
#> [1] FALSE

# Names that have been made valid are suspicious
# (but make.names() does not adjust ".." or "..23"):
all_names(x = make.names(invalid_names)) # FALSE
#> Warning: Names consist of only dots, which is a reserved word: '..'; and consist of two dots followed by digits, which is a reserved word: '..23'; and are suspicious: 'ab.cd', 'X', 'for.'
#> [1] FALSE

# FALSE: suspicious names
all_names(x = c("e.2", "a.1b", ".TRUE", "..22c", "a...2",
                "V3", "X.2", "X0...11", "X0.3", "X3"))
#> Warning: Names are suspicious: 'e.2', 'a.1b', '.TRUE', '..22c', 'a...2', 'V3', 'X.2', 'X0...11', 'X0.3', 'X3'
#> [1] FALSE

all_names(x = "abc_def", allow_underscores = FALSE) # FALSE: underscores
#> Warning: Names contain underscores (which are not allowed if 'allow_underscores' is FALSE):
#> 'abc_def'.
#> Use 'x <- make.names(x, unique = TRUE, allow_ = FALSE)' to create unique,
#> syntactically valid names without underscores!
#> [1] FALSE
all_names(x = "abc_def", allow_underscores = TRUE) # TRUE
#> [1] TRUE

# pass names() or colnames() used on an object
# without (column) names to all_names():
all_names(x = names(1:3)) # FALSE
#> Warning: 'x' is NULL: did you use names() or colnames() on an object without
#> (column) names to all_names()?
#> [1] FALSE

all_names(13) # FALSE: 'x' is not a character vector
#> Warning: Input to 'x' is not a character vector: 13
#> [1] FALSE