[R] Regex: Combining sub/grepl with ifelse

Boris Steipe boris.steipe at utoronto.ca
Sun Oct 11 15:23:26 CEST 2015


You are the domain expert, but it would seem to me that "-NEGRO" is a part of the ID because it uniquely specifies the product.

From the perspective of expressing your business logic in code, dropping this part of the string should have a separate line in the code, and a comment. Dropping the "-NEGRO" part from the token is not part of identifying the tokens. And it's not part of recognizing which token is the ID. 

The regex to identify the unwanted part is "-[A-Z]{2,}$": a literal hyphen, followed by 2 or more uppercase letters, up to the end of the token. 
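
As a quick check of that pattern, here is a small sketch. The tokens are ones that appear in this thread; the variable names are only for illustration.

```r
# Tokens as they might come out of strsplit(); values taken from this thread
tokens <- c("UN50JU6500-NEGRO", "TC-L42AS610", "KDL-40R354B", "24MT47A")

# Drop a trailing hyphen that is followed by two or more uppercase letters.
# A hyphen followed by a digit, or by a single letter and then a digit,
# is left alone because the pattern does not match there.
cleaned <- sub("-[A-Z]{2,}$", "", tokens)
print(cleaned)
```

Only the first token changes. Note that a genuine ID ending in a hyphen plus two or more letters would be shortened as well, so this rule rests on the assumption that such IDs do not occur.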

I think it's great that you are introducing code that handles the case that no ID is found. Whenever you have no complete control over your input data, it's important to anticipate variations. But I would also consider the case that two tokens match the description of an ID (perhaps an "O" is mistyped as a zero). Finally, your code is correct, but you are repeating the regular expression. That is a potential source of error whenever anyone updates the regex - it violates the DRY principle: Don't Repeat Yourself. For this you have three options: either change the logic of your code to use grep only once, or assign the regex to a variable e.g. IDregEx <- "[A-Z][0-9]", or assign the result of grep() to an intermediate value. Since we want to add checks, I'll go with the last option.

# minimal working example with test cases
mwe <- structure(list(id = c(NA, NA, NA, NA, NA, NA), marca = c("LG", 
"LG", "PANASONIC", "SONY", "LG", "LG"), producto = c("LG LED FULL HD SMART TV 42''42LF5850 - PLATEADO", 
"LG - 24MT47A + MONITOR TV 24\" PUERTOS HDMI, USB, AV - NEG...", 
"TELEVISIÓN PANASONIC TC-L40SV7L-NEGRO LED FULL-HD 40''", "SONY - TELEVISOR LED SMART TV FULL HD 40'' KDL-40R555C - ...", 
"LG -TV LED HG 32\" ", "LG TV LED SMART HD 32'' - 32LF585B (LIKE 32LF550B)"
)), .Names = c("id", "marca", "producto"), row.names = c(90L, 
106L, 126L, 133L, 77L, 88L), class = "data.frame")

new <- mwe
for (i in seq_len(nrow(mwe))) {
    v <- unlist(strsplit(mwe$producto[i], "[^A-Z0-9-]+")) # isolate tokens
    g <- grep("[A-Z][0-9]", v) # indices of tokens that look like an ID
    if (length(g) == 0) {  # no token looks like an ID
        ID <- NA
        warning(paste("No ID in row ",
                      i,
                      ": >>",
                      mwe$producto[i],
                      "<<",
                      sep=""))
    } else if (length(g) > 1) { # more than one token looks like an ID
        ID <- NA
        warning(paste("More than one ID in row ",
                      i,
                      ": >>",
                      mwe$producto[i],
                      "<<",
                      sep=""))
    } else {
        ID <- sub("-[A-Z]{2,}$", "", v[g]) # drop trailing qualifier, if any
    }
    new$id[i] <- ID
}
new
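
The second option (keeping the regexes in variables) can be sketched in the same spirit. This is only an illustration: extract_id() and its argument names are hypothetical, the test strings are shortened from the test cases above, and it returns NA without a warning when zero or several tokens look like an ID.

```r
# Sketch only: extract_id() and its argument names are hypothetical helpers.
# Each regex is defined exactly once, as a default argument.
extract_id <- function(x,
                       token_sep  = "[^A-Z0-9-]+",  # characters that cannot occur in an ID
                       id_regex   = "[A-Z][0-9]",   # a letter directly followed by a digit
                       tail_regex = "-[A-Z]{2,}$") { # trailing qualifier such as "-NEGRO"
    v <- unlist(strsplit(x, token_sep))        # isolate tokens
    hits <- grep(id_regex, v, value = TRUE)    # tokens that look like an ID
    if (length(hits) != 1) {
        return(NA_character_)                  # no ID, or an ambiguous row
    }
    sub(tail_regex, "", hits)                  # drop trailing qualifier, if any
}

productos <- c("PANASONIC TC-L40SV7L-NEGRO LED FULL-HD 40''",
               "LG -TV LED HG 32\" ",
               "LG TV LED SMART HD 32'' - 32LF585B (LIKE 32LF550B)")
ids <- vapply(productos, extract_id, character(1), USE.NAMES = FALSE)
print(ids)
```

With the regexes as default arguments, a change to the ID rule happens in one place, and the function can be applied to a whole column with vapply() instead of an explicit loop.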


Cheers,
Boris






On Oct 11, 2015, at 1:07 AM, Omar André Gonzáles Díaz <oma.gonzales at gmail.com> wrote:

> Hi  Boris,
> 
> I've modified the for loop a little to catch the IDs (if there are any) and otherwise to put NAs. This is for another data set.
> 
> 
> 
> for (i in 1:nrow(linio.tv)) {
>         
>         v <- unlist(strsplit(linio.tv$producto[i], "[^A-Z0-9-]+")) # isolate tokens
>         
>         if(any(grep("[A-Z][0-9]", v))) {
>                 
>                 linio.tv$id[i] <- v[grep("[A-Z][0-9]", v)]  
>                 
>         }  
>         
>         else {
>                 linio.tv$id[i] <- NA
>         }
> }
> 
> 
> I get these warning messages; nevertheless, the IDs column gets the correct values:
> 
> Warning messages:
> 1: In linio.tv$id[i] <- v[grep("[A-Z][0-9]", v)] :
>   number of items to replace is not a multiple of replacement length
> 2: In linio.tv$id[i] <- v[grep("[A-Z][0-9]", v)] :
>   number of items to replace is not a multiple of replacement length
> 
> 
> The problem:
> 
> There are entries where the grep part is not specific enough. 
> 
> Like this one: "UN50JU6500-NEGRO". It satisfies the rule in:
> 
> linio.tv$id[i] <- v[grep("[A-Z][0-9]", v)]  , but the code is not supposed to take "UN50JU6500-NEGRO" entirely, only this part: "UN50JU6500".
> 
> 
> I've noticed this rule: the IDs can have at most 1 letter after the "-". If a token contains more than 1, that part should not be considered.
> 
> "TC-L42AS610"
> 
> Also IDs can start with numbers: 1,2, or 3.
> 
> "KDL-40R354B"
> 
> 
> 
> 
> Could you clarify for me whether this is something that can be done within R?  I'm trying to figure this out, but without any good result. 
> 
> I could clean it with "sub()" (there is only one entry giving me trouble) but the idea is not to have "technical debt" for the future.
> 
> 
> 
> 
> This is the new data set, I'm talking about:
> 
> 
> 
> 
> 
> 
> linio.tv <- structure(list(id = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), marca = c("LG", "SAMSUNG", 
> "SAMSUNG", "SAMSUNG", "LG", "LG", "LG", "LG", "LG", "LG", "LG", 
> "SAMSUNG", "LG", "LG", "SAMSUNG", "LG", "LG", "LG", "LG", "SAMSUNG", 
> "LG", "LG", "LG", "SONY", "SAMSUNG", "LG", "LG", "SAMSUNG", "SONY", 
> "SAMSUNG", "LG", "LG", "LG", "IMACO", "SAMSUNG", "LG", "SAMSUNG", 
> "SAMSUNG", "LG", "HAIER", "LG", "SONY", "SAMSUNG", "LG", "LG", 
> "LG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SONY", "HISENSE", "LG", 
> "SAMSUNG", "LG", "SAMSUNG", "LG", "SAMSUNG", "SAMSUNG", "CONTINENTAL", 
> "LG", "IMACO", "AOC", "AOC", "SAMSUNG", "LG", "SONY", "LG", "LG", 
> "SONY", "SAMSUNG", "SAMSUNG", "PANASONIC", "LG", "SAMSUNG", "NEX", 
> "IMACO", "LG", "LG", "CONTINENTAL", "SONY", "LG", "LG", "SAMSUNG", 
> "LG", "LG", "LG", "LG", "LG", "SAMSUNG", "LG", "LG", "SAMSUNG", 
> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "AOC", "LG", "LG", 
> "AOC", "LG", "SAMSUNG", "LG", "SAMSUNG", "SAMSUNG", "LG", "LG", 
> "SAMSUNG", "SAMSUNG", "SONY", "LG", "SAMSUNG", "SAMSUNG", "LG", 
> "SAMSUNG", "LG", "SAMSUNG", "LG", "SAMSUNG", "LG", "SAMSUNG", 
> "SAMSUNG", "SAMSUNG", "SAMSUNG", "LG", "PANASONIC", "PANASONIC", 
> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SONY", 
> "LG", "LG", "PANASONIC", "AOC", "SAMSUNG", "LG", "SAMSUNG", "LG", 
> "SAMSUNG", "LG", "LG", "LG", "PANASONIC", "PANASONIC", "LG", 
> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", 
> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "LG", "SAMSUNG", 
> "LG", "LG", "SAMSUNG", "LG"), producto = c("COMBO SMART - LG TV LED 4K ULTRA HD 43'' - 43UF6750 + GOO...", 
> "SAMSUNG TV LED SMART HD 32'' UN32J4300 - NEGRO", "SAMSUNG TV LED 3D SMART FULL HD 48'' - 48J6400", 
> "SAMSUNG TV LED 3D SMART FULL HD 55'' - 55J6400", "LG TV SMART LED HD 32\" 32LF585B - BLANCO", 
> "LG TV SLIM ULTRA HD 3D WEBOS 2.0 49'' 49UF8500 - PLATEADO", 
> "LG TV SMART WEBOS 2.0 FULL HD 43\" 43LF5900 -NEGRO", "LG TV LED HD 32\" - 32LF550B", 
> "LG TV LED SMART FULL HD 43'' 43LF6350 - NEGRO", "LG TV  LED SMART HD 32\" - 32LF585B", 
> "LG GAME TV LED FULL HD 49\" - 49LF5410", "SAMSUNG TV LED  FULL HD 60'' - UN60FH6003", 
> "LG TV SMART WEBOS 2.0 FULL HD 49\" - 49LF6350", "LG TV LED  FULL HD 43'' - 43LF5410", 
> "SAMSUNG TV SMART FULL HD CURVO 40'' TIZEN -  UN40J6500", "LG TV SMART WEBOS 2.0 ULTRA HD  4K 43\" - 43UF6400", 
> "LG TV SLIM LED CINEMA 3D FULL HD 42'' 42LB6200 INCLUYE 02...", 
> "LG GAMETV  LED FULL HD 43\" - 43LF5400", "LG GAME TV LED FULL HD 49\" - 49LF5410", 
> "TELEVISOR SAMSUNG UN40J5500 SMART TV LED FULL HD 40''-PLA...", 
> "LG SMART  4K ULTRA HD 55\" - 55UB8200", "LG -TV LED SMART WEBOS 2.0 FULL HD 55\" - 55LF6350", 
> "LG - GAME TV LED  FULL HD 43'' - 43LF5410", "SONY - TV LED SMART HD 32'' - 32R505C", 
> "SAMSUNG TV LED 3D SMART FULL HD 40'' - UN40H6400", "LG SMART TV 32\" HD WEBOS 2.0  32LF595B", 
> "LG - TV LED WEBOS 3D SMART ULTRA HD CURVO  55'' 55UG8700 ...", 
> "SAMSUNG TV LED FULL HD 40\" UN40JH5005 - NEGRO", "SONY TV LED FULL HD 40'' - KDL-40R354B", 
> "SAMSUNG TV LED SMART FULL HD 40'' TIZEN UN40J5500 - PLATEADO", 
> "LG TV LED FULL HD 42'' ULTRA SLIM 42LY340C - NEGRO", "LG TV SMART LED FULL HD 43\" - 43LF6350", 
> "LG TV LED CURVO 55\" SMART ULTRA HD 4K CINEMA 3D - 55UC9700", 
> "IMACO - TV LED HD 24´´ - LED24HD", "TELEVISOR SAMSUNG UN32J4300 SMART TV LED HD 32''-NEGRO", 
> "LG TV 3D SMART LED ULTRA HD 65\" - 65UF8500", "SAMSUNG - TV LED SMART 3D 65\" FULL HD SERIE 8 INTERACTIVO...", 
> "SAMSUNG - TV LED HD 32\"  32JH4005 - NEGRO", "LG TV 55\" SMART ULTRA HD 4K CINEMA 3D  55UB8500", 
> "HAIER TV LED HD LIVE GREEN 24'' - 24B8000", "LG TV LED FULL HD 47'' - 47LB5610", 
> "SONY TV LED FULL HD 32'' - KDL-32R304B", "SAMSUNG TV LED SERIE 5 FULL HD 39” - 39FH5005", 
> "LG - TV SAMT SLIM ULTRA HD 4K WEBOS 2.0 55'' 55UF7700 - P...", 
> "LG TV LED CURVO 55\" SMART ULTRA HD 4K CINEMA 3D - 55UC9700", 
> "LG TV MONITOR LED HD 23.6” - 24MT47A", "SAMSUNG - MONITOR LED 32\" MD32C - NEGRO", 
> "TELEVISOR SAMSUNG UN40J6400 SMART TV LED 3D FULL HD 40''-...", 
> "SAMSUNG LED SMART FULL HD 48'' - UN48J6500", "SONY - TV LED SMART FULL HD 40'' - 40R555C", 
> "TELEVISOR HISENSE LED 40\" 40K221W SMART TV LED FULL HD", "LG TV MONITOR LED HD 23.6” - 24MT47A", 
> "TELEVISOR SAMSUNG UN48J6400 SMART TV LED 3D FULL HD 48''-...", 
> "LG 49UB8500 LED 49\" SMART 3D 4K", "SAMSUNG TV LED 3D SMART FULL HD 40'' TIZEN UN40J6400 - NEGRO", 
> "LG TV LED FULL HD 42\" - 42LY340C", "SAMSUNG TV LED HD 32'' UN32J4000 - NEGRO", 
> "TELEVISOR SAMSUNG UN48J5500 SMART TV LED FULL HD 48''-PLA...", 
> "CONTINENTAL - TV LED 15.6\" CELED95935, INCLUYE RACK", "LG TV LED 4K ULTRA HD 43\" - 43UF6750", 
> "IMACO - TV LED HD 16´´ - LED16HD", "AOC - TELEVISOR HD 32\" LE32W454F- NEGRO", 
> "AOC TV LED HD 20\" - LE20A1140", "SAMSUNG TV LED HD 32'' - UN32J4000", 
> "LG - TV MONITOR 27.5” - 28MT47B", "SONY TV LED FULL HD 40'' - KDL-40R354B", 
> "LG TV MONITOR LED HD 23.6” - 24MT47A", "LG GAME TV LED FULL HD 49\" - 49LF5410", 
> "SONY TV LED FULL HD 40'' - KDL-40R354B", "SAMSUNG - UN48J6400 LED FULL HD 48\"SMART TIZEN 3D 2015 - ...", 
> "SAMSUNG - MONITOR FULL HD 40\" MD40C - NEGRO", "PANASONIC LED SMART FULL HD 50\" - TC-50AS600", 
> "LG - 43LF5410 LED 43\" FULL HD GAME  - SILVER", "SAMSUNG - TELEVISOR LED HD 32\" UN32JH4005 - NEGRO", 
> "NEX TV LED SMART HD 32\" USB WIFI INCORPORADO - LED3208SMR", 
> "IMACO - TV LED HD 19´´ - LED19HD", "LG -TV LED HG 32\" - 32LF550B", 
> "LG - TELEVISOR LED 32\" HD 32LF550B", "CONTINENTAL - TV LED 19\" CELED99935,  INCLUYE RACK", 
> "SONY TV LED FULL HD 40'' - KDL-40R354B", "LG - TELEVISOR LED 32\" HD SMART TV 32LF585B - BLANCO", 
> "MONITOR TV LG 24MT47A LED HD 23.6”-PLATEADO", "TELEVISOR SAMSUNG UN32J4300 SMART TV LED HD 32''-NEGRO", 
> "LG GAME TV LED FULL HD 49\" - 49LF5400", "LG - TELEVISOR LED 32\" HD SMART TV 32LF585B – BLANCO", 
> "LG - TELEVISOR LED 32\" HD 32LF550B", "LG TV LED HD 32'' - 32LF550B", 
> "LG TV LED SMART HD 32'' - 32LF585B", "SAMSUNG - TELEVISOR LED FULL HD 40\" UN40JH5005 – NEGRO", 
> "LG LED FULL HD SMART TV 42''42LF5850 - PLATEADO", "LG TV LED WEBOS 3D SMART ULTRA HD 49'' - 49UF8500", 
> "SAMSUNG - TV LED SMART FULL HD 40” UN40H5500 - NEGRO", "SAMSUNG TV LED SMART HD 32'' UN32J4300 - NEGRO", 
> "SAMSUNG TV LED ALTA DEFINICIÓN DTV USB 32\" - 32JH4005", "SAMSUNG - TELEVISOR LED FULL HD 40\" UN40JH5005 - NEGRO", 
> "SAMSUNG TV LED SMART TIZEN 3D QUADCORE40\" - UN40J6400", "AOC TV LED HD 32\" - LE32W454F +RACK FIJO", 
> "LG TV LED FULL HD 43'' - 43LF5410", "LG - TV LED WEBOS 3D SMART FULL HD 55'' - 55LF6500", 
> "AOC 32\" LE32W454F  HD DIGITAL LED TV + HOME THEATRE F1200U", 
> "LG TV LED WEBOS 3D SMART ULTRA HD 49'' - 49UF8500", "SAMSUNG TV LED ALTA DEFINICIÓN DTV USB 32\" - 32JH4005", 
> "LG - 42LF6400 LED FULL HD 42'' SMART WEBOS 3D - SILVER", "TELEVISOR SAMSUNG UN48J5300 SMART TV LED FULL HD 48''-NEGRO", 
> "SAMSUNG UN40JH5005 LED FULL HD 40\"  - NEGRO GLOSS", "LG - 24MT47A + MONITOR TV 24\" PUERTOS HDMI, USB, AV - NEG...", 
> "LG TV LED SMART 4K ULTRA HD 55\" - 55UB8200", "SAMSUNG - 55J6400 LED 55\" SMART TIZEN 3D - BLACK", 
> "SAMSUNG TV CURVED SMART ULTRA HD 48'' TIZEN UN48JU6700 - ...", 
> "TELEVISIÓN SONY KDL-32R505C LED 32\"-NEGRO", "LG TV LED CINEMA 3D 4K SMART ULTRA HD 49'' + 02 LENTES 3D...", 
> "SAMSUNG - 55J6400 LED 55\" SMART TIZEN 3D - BLACK", "SAMSUNG - 40J5500 LED 40\" SMART QUADCORE / BLUETOOTH* - S...", 
> "LG TV LED WEBOS 3D SMART ULTRA HD 49'' - 49UF8500", "SAMSUNG TV LED SMART FULL HD 40'' TIZEN UN40J5500 - PLATEADO", 
> "LG - TELEVISOR LED 42\" FULL HD SMART TV 42LF5850 – PLAT...", 
> "TELEVISIÓN SAMSUNG UN48J5500 LED SMART TV 48\"-PLATEADO", "LG - TELEVISOR LED 42\" FULL HD SMART TV 42LF5850 - PLATEADO", 
> "TELEVISOR SAMSUNG  UN55JU6700 LED UHD 4K SMART 55'' - PLA...", 
> "LG - TV LED WEBOS 3D SMART SUPER ULTRA HD 55'' - 55UF9500", 
> "TELEVISOR SAMSUNG UN50JU6500  UHD 4K SMART 50'' - PLATEADO", 
> "SAMSUNG - TELEVISOR LED HD 40\" SMART UN40J5500 - NEGRO", "TELEVISOR SAMSUNG UN48J6500 CURVO  FULL HD SMART 48'' - P...", 
> "SAMSUNG - TELEVISOR LED HD 32\" SMART UN32J4300 - NEGRO", "LG TV LED CINEMA 3D 4K SMART ULTRA HD 55'' 55UB8500 - NEGRO", 
> "TELEVISIÓN PANASONIC TC-L40SV7L LED FULL-HD 40''-NEGRO", "PANASONIC TV LED 42´´ FULL HD TC-L42E6L - NEGRO.", 
> "TELEVISOR SAMSUNG UN 40JH5005 LED FULL HD", "SAMSUNG - TV LED SMART CURVO 3D ULTRA HD 65” UN65HU9000...", 
> "SAMSUNG - UN48J5300 LED FULL HD SMART 2015 - BLACK", "SAMSUNG TV LED SMART FULL HD 50'' TIZEN UN50J5500 - PLATEADO", 
> "SAMSUNG - TV SMART 3D FULL HD 60” UN60H7100 - NEGRO", "SONY - TELEVISOR LED SMART TV FULL HD 40'' KDL-40R555C - ...", 
> "LG TV 47\" LED FULL HD - 47LY340C", "LG TV UHD 4K 65UB9800 SMART 3D LED TV C/WEBOS 65' LENTES 3D", 
> "PANASONIC - TELEVISOR TC-L42AS610 LED SMART FULL HD 42”...", 
> "AOC - TELEVISOR LED 32\" - LE32W454F", "SAMSUNG TV LED 32¨ - UN32FH4005G", 
> "LG TV SMART LED FULL HD 42\" - 42LF5850", "SAMSUNG TV LED 3D SMART FULL HD 40'' TIZEN UN40J6400 - NEGRO", 
> "LG TV SMART  LED FULL HD 42\" - 42LF5850", "SAMSUNG TV LED HD 32'' UN32JH4005 - NEGRO", 
> "LG TV PLASMA 2014 60\" FULL HD 1080P - 60PB5600", "LG TV LED CINEMA 3D SMART FULL HD 55'' 55LB7050 - PLATEADO", 
> "LG TV LED SMART FULL HD 43'' 43LF6350 - NEGRO", "PANASONIC PUERTO USB LED 40\" - TC-L40SV7L", 
> "PANASONIC LED SMART FULL HD 42\" - TC-L42AS610", "LG TV SMART  LED FULL HD 49\" - 49LF6350", 
> "SAMSUNG TV LED SMART ULTRA HD 50'' TIZEN UN50JU6500-NEGRO", 
> "SAMSUNG TV LED SMART ULTRA HD 50'' TIZEN UN50JU6500 - NEGRO", 
> "SAMSUNG TV SMART FULL HD CURVO 48'' TIZEN UN48J6500", "SAMSUNG TV  SMART ULTRA HD 4K  65'' - UN65JU6500", 
> "SAMSUNG UN48J5500 LED 48\" - PLATEADO", "SAMSUNG LED 32\" CONEXIÓN WIFI - UN32J4300", 
> "SAMSUNG LED SMART 40'' CONEXIÓN WI-FI DIRECT - UN40J5500", "SAMSUNG LED SMART ULTRA HD 55\" - TVUN55JU6700", 
> "SAMSUNG TV CURVED 3D SMART ULTRA HD 65'' TIZEN UN65JU7500...", 
> "SAMSUNG TELEVISOR  HG32NB460GF, 32\" LED, HD, 1366 X 768", "LG -TV SMART LED FULL HD 55\" - 55LF6350", 
> "SAMSUNG TV LED SMART 3D 48\" - UN48H6400", "LG LED ULTRAHD 4K 49\" SMART 3D - 49UB8300", 
> "LG - TV LED SMART HD 32'' 32LF585B - PLATEADO", "SAMSUNG - TV LED FULL HD 40\" UN40JH5005  - NEGRO GLOSS", 
> "LG - TV LED FULL HD 43'' 43LF5410 - PLATEADO"), precio.antes = c(2599L, 
> 1299L, 2899L, 3999L, 1199L, 4499L, 1999L, 1099L, 2299L, 1299L, 
> 2499L, 3999L, 2199L, 1899L, 2299L, 2299L, 1799L, 1499L, 2299L, 
> 1999L, 3999L, 3499L, 1549L, 1299L, 2299L, 2299L, 6999L, 1499L, 
> 1499L, 1899L, 1499L, 2099L, 6999L, 599L, 1299L, 9999L, 8999L, 
> 999L, 5999L, 599L, 2299L, 1299L, 1499L, 4999L, 6999L, 899L, 2299L, 
> 2499L, 3299L, 1799L, 1399L, 899L, 2499L, 4199L, 2299L, 1499L, 
> 1099L, 2499L, 399L, 2499L, 399L, 999L, 599L, 999L, 899L, 1499L, 
> 699L, 2299L, 1399L, 2499L, 2999L, 2499L, 1599L, 1149L, 999L, 
> 499L, 1089L, 1099L, 499L, 1499L, 1399L, 799L, 1299L, 2499L, 1399L, 
> 1259L, 1299L, 1299L, 1599L, 1999L, 3999L, 1999L, 1199L, 999L, 
> 1599L, 2299L, 999L, 1499L, 3699L, 1199L, 3899L, 1099L, 2299L, 
> 2499L, 1399L, 729L, 4199L, 3599L, 4999L, 1399L, 3999L, 4999L, 
> 2199L, 4499L, 2299L, 1699L, 2779L, 1699L, 5799L, 8999L, 3699L, 
> 2099L, 3299L, 1299L, 5900L, 1799L, 1799L, 1399L, 14999L, 2499L, 
> 2799L, 6299L, 1799L, 2417L, 9500L, 1799L, 799L, 999L, 1999L, 
> 2499L, 1899L, 999L, 2299L, 3699L, 2199L, 1699L, 1999L, 2499L, 
> 3499L, 3899L, 2999L, 7999L, 2299L, 1299L, 2099L, 5799L, 9999L, 
> 1110L, 3399L, 2799L, 3899L, 1299L, 1399L, 1499L), precio.nuevo = c(1799L, 
> 999L, 2299L, 3299L, 999L, 3199L, 1499L, 849L, 1399L, 979L, 1795L, 
> 2999L, 1899L, 1299L, 1699L, 1599L, 1499L, 1299L, 1699L, 1449L, 
> 3699L, 2499L, 1199L, 999L, 1499L, 899L, 4999L, 1199L, 1199L, 
> 1389L, 1299L, 1699L, 4899L, 549L, 999L, 7499L, 6700L, 849L, 4299L, 
> 549L, 1499L, 899L, 1299L, 3599L, 5354L, 538L, 1959L, 1599L, 2999L, 
> 1367L, 1099L, 589L, 2449L, 3199L, 1529L, 1229L, 839L, 1779L, 
> 329L, 1799L, 389L, 719L, 489L, 849L, 799L, 1185L, 599L, 1609L, 
> 1299L, 2179L, 2839L, 1999L, 1599L, 899L, 799L, 449L, 880L, 899L, 
> 429L, 1275L, 1199L, 589L, 999L, 1749L, 1199L, 1099L, 899L, 989L, 
> 1399L, 1999L, 2999L, 1599L, 999L, 819L, 1299L, 2299L, 789L, 1299L, 
> 3199L, 977L, 3089L, 849L, 1719L, 1799L, 1399L, 569L, 3979L, 3299L, 
> 3369L, 1093L, 3389L, 3289L, 1419L, 3429L, 1405L, 1499L, 1899L, 
> 1499L, 5199L, 6999L, 3199L, 1599L, 2999L, 1099L, 5089L, 1459L, 
> 1499L, 1289L, 12999L, 1739L, 2255L, 5879L, 1499L, 1929L, 8499L, 
> 1649L, 799L, 899L, 1659L, 1749L, 1609L, 831L, 2089L, 3659L, 1769L, 
> 1499L, 1599L, 2176L, 2749L, 2889L, 2899L, 5599L, 1899L, 1099L, 
> 1899L, 5199L, 8589L, 990L, 3169L, 2199L, 3899L, 949L, 1099L, 
> 1199L), dif.precios = c(800L, 300L, 600L, 700L, 200L, 1300L, 
> 500L, 250L, 900L, 320L, 704L, 1000L, 300L, 600L, 600L, 700L, 
> 300L, 200L, 600L, 550L, 300L, 1000L, 350L, 300L, 800L, 1400L, 
> 2000L, 300L, 300L, 510L, 200L, 400L, 2100L, 50L, 300L, 2500L, 
> 2299L, 150L, 1700L, 50L, 800L, 400L, 200L, 1400L, 1645L, 361L, 
> 340L, 900L, 300L, 432L, 300L, 310L, 50L, 1000L, 770L, 270L, 260L, 
> 720L, 70L, 700L, 10L, 280L, 110L, 150L, 100L, 314L, 100L, 690L, 
> 100L, 320L, 160L, 500L, 0L, 250L, 200L, 50L, 209L, 200L, 70L, 
> 224L, 200L, 210L, 300L, 750L, 200L, 160L, 400L, 310L, 200L, 0L, 
> 1000L, 400L, 200L, 180L, 300L, 0L, 210L, 200L, 500L, 222L, 810L, 
> 250L, 580L, 700L, 0L, 160L, 220L, 300L, 1630L, 306L, 610L, 1710L, 
> 780L, 1070L, 894L, 200L, 880L, 200L, 600L, 2000L, 500L, 500L, 
> 300L, 200L, 811L, 340L, 300L, 110L, 2000L, 760L, 544L, 420L, 
> 300L, 488L, 1001L, 150L, 0L, 100L, 340L, 750L, 290L, 168L, 210L, 
> 40L, 430L, 200L, 400L, 323L, 750L, 1010L, 100L, 2400L, 400L, 
> 200L, 200L, 600L, 1410L, 120L, 230L, 600L, 0L, 350L, 300L, 300L
> ), dif.porcentual = c(30.78, 23.09, 20.7, 17.5, 16.68, 28.9, 
> 25.01, 22.75, 39.15, 24.63, 28.17, 25.01, 13.64, 31.6, 26.1, 
> 30.45, 16.68, 13.34, 26.1, 27.51, 7.5, 28.58, 22.6, 23.09, 34.8, 
> 60.9, 28.58, 20.01, 20.01, 26.86, 13.34, 19.06, 30, 8.35, 23.09, 
> 25, 25.55, 15.02, 28.34, 8.35, 34.8, 30.79, 13.34, 28.01, 23.5, 
> 40.16, 14.79, 36.01, 9.09, 24.01, 21.44, 34.48, 2, 23.82, 33.49, 
> 18.01, 23.66, 28.81, 17.54, 28.01, 2.51, 28.03, 18.36, 15.02, 
> 11.12, 20.95, 14.31, 30.01, 7.15, 12.81, 5.34, 20.01, 0, 21.76, 
> 20.02, 10.02, 19.19, 18.2, 14.03, 14.94, 14.3, 26.28, 23.09, 
> 30.01, 14.3, 12.71, 30.79, 23.86, 12.51, 0, 25.01, 20.01, 16.68, 
> 18.02, 18.76, 0, 21.02, 13.34, 13.52, 18.52, 20.77, 22.75, 25.23, 
> 28.01, 0, 21.95, 5.24, 8.34, 32.61, 21.87, 15.25, 34.21, 35.47, 
> 23.78, 38.89, 11.77, 31.67, 11.77, 10.35, 22.22, 13.52, 23.82, 
> 9.09, 15.4, 13.75, 18.9, 16.68, 7.86, 13.33, 30.41, 19.44, 6.67, 
> 16.68, 20.19, 10.54, 8.34, 0, 10.01, 17.01, 30.01, 15.27, 16.82, 
> 9.13, 1.08, 19.55, 11.77, 20.01, 12.93, 21.43, 25.9, 3.33, 30, 
> 17.4, 15.4, 9.53, 10.35, 14.1, 10.81, 6.77, 21.44, 0, 26.94, 
> 21.44, 20.01), pulgadas = c("43", "32", "48", "55", "32", "49", 
> "43", "32", "43", "32", "49", "60", "49", "43", "40", "43", "42", 
> "43", "49", "40", "55", "55", "43", "32", "40", "32", "55", "40", 
> "40", "40", "42", "43", "55", "24", "32", "65", "65", "32", "55", 
> "24", "47", "32", "39", "55", "55", "6", "32", "40", "48", "40", 
> "40", "6", "48", "49", "40", "42", "32", "48", "6", "43", "16", 
> "32", "20", "32", "5", "40", "6", "49", "40", "48", "40", "50", 
> "43", "32", "32", "19", "32", "32", "19", "40", "32", "6", "32", 
> "49", "32", "32", "32", "32", "40", "42", "49", "40", "32", "32", 
> "40", "40", "32", "43", "55", "32", "49", "32", "42", "48", "40", 
> "24", "55", "55", "48", "32", "49", "55", "40", "49", "40", "42", 
> "48", "42", "55", "55", "50", "40", "48", "32", "55", "40", "42", 
> "NA", "65", "NA", "50", "60", "40", "47", "65", "42", "32", "32", 
> "42", "40", "42", "32", "60", "55", "43", "40", "42", "49", "50", 
> "50", "48", "65", "48", "32", "40", "55", "65", "32", "55", "48", 
> "49", "32", "40", "43"), rangos = c("S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.500 - S/.1500", 
> "S/.2500 - S/.3500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.2500 - S/.3500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.3500 - S/.4500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "> S/.4,500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "> S/.4,500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "> S/.4,500", "> S/.4,500", "S/.500 - S/.1500", "S/.3500 - S/.4500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "S/.3500 - S/.4500", "> S/.4,500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.2500 - S/.3500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "< S/.500", "S/.1500 - S/.2500", 
> "< S/.500", "S/.500 - S/.1500", "< S/.500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "S/.2500 - S/.3500", 
> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "< S/.500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "< S/.500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.2500 - S/.3500", 
> "S/.500 - S/.1500", "S/.2500 - S/.3500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.3500 - S/.4500", "S/.2500 - S/.3500", 
> "S/.2500 - S/.3500", "S/.500 - S/.1500", "S/.2500 - S/.3500", 
> "S/.2500 - S/.3500", "S/.500 - S/.1500", "S/.2500 - S/.3500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "> S/.4,500", "> S/.4,500", "S/.2500 - S/.3500", 
> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.500 - S/.1500", 
> "> S/.4,500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", 
> "> S/.4,500", "S/.1500 - S/.2500", "S/.1500 - S/.2500", "> S/.4,500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "> S/.4,500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.500 - S/.1500", 
> "S/.1500 - S/.2500", "S/.3500 - S/.4500", "S/.1500 - S/.2500", 
> "S/.500 - S/.1500", "S/.1500 - S/.2500", "S/.1500 - S/.2500", 
> "S/.2500 - S/.3500", "S/.2500 - S/.3500", "S/.2500 - S/.3500", 
> "> S/.4,500", "S/.1500 - S/.2500", "S/.500 - S/.1500", "S/.1500 - S/.2500", 
> "> S/.4,500", "> S/.4,500", "S/.500 - S/.1500", "S/.2500 - S/.3500", 
> "S/.1500 - S/.2500", "S/.3500 - S/.4500", "S/.500 - S/.1500", 
> "S/.500 - S/.1500", "S/.500 - S/.1500")), .Names = c("id", "marca", 
> "producto", "precio.antes", "precio.nuevo", "dif.precios", "dif.porcentual", 
> "pulgadas", "rangos"), class = "data.frame", row.names = c(NA, 
> -164L))
> 
> 
> 
> 
> 
> 
> 
> 
> 2015-10-10 11:55 GMT-05:00 Omar André Gonzáles Díaz <oma.gonzales at gmail.com>:
> Thank you very much to both of you. This information is very enlightening to me.
> 
> Cheers.
> 
> 
> 2015-10-10 1:11 GMT-05:00 Boris Steipe <boris.steipe at utoronto.ca>:
> David answered most of this. Just two short notes inline.
> 
> 
> 
> 
> On Oct 10, 2015, at 12:38 AM, Omar André Gonzáles Díaz <oma.gonzales at gmail.com> wrote:
> 
> > David, Boris, I'm so thankful for your help. Both approaches are very good. I got this solved with David's help.
> >
> > I find Boris's for loop very interesting, and I need a little help understanding the regex part of it.
> >
> > - The strsplit function: strsplit(ripley.tv$producto[i], "[^A-Z0-9-]+")
> >
> > My understanding of this: split every row at a sequence of any number or letter or "-" that appears at least once (+ operator).
> >
> > 1.- What does the "^" symbol mean? If you remove it, only blanks appear.
> > 2.- Why is the "+" needed after the closing "]"?
> >
> > 3.- How does this:  ripley.tv$id[i] <- v[grep("[A-Z][0-9]", v)]
> >      also identify the IDs where "-" is present? Here the regex does not include the "-".
> 
> Yes. I am not matching the entire token here. Note there is no "+": The two character-class expressions match exactly one uppercase letter immediately followed by exactly one digit. If this is found in a token, grep() reports that token as a match. It doesn't matter what else the token contains - the first regex already took care of removing everything that's not needed. grep() returns the index of the matching token (grepl() would return a vector of FALSEs and a single TRUE); either one goes inside the square brackets and selects the token from v.
> 
> 
> 
> > Also, I notice that David used the "-" at the beginning of the matching: [-A-Z0-9], without the "^" (starts with) at the beginning.
> 
> This can be very confusing about regular expressions: the same character can mean different things depending on where it is found. Between two characters in a character class expression, the hyphen means "range". Elsewhere it is a literal hyphen. David put his at the beginning, I had it at the end (in the first regex). Another tricky character is "?" which can mean 0 or 1 matches, or turn "greedy" matching off...
> 
> Online regex testers are invaluable to develop a regex - one I frequently use is regexpal.com
> 
> Cheers,
> B.
> 
> 
> >
> > I would appreciate a response from you, gentlemen.
> >
> > Thanks again.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2015-10-09 18:32 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
> >
> > On Oct 9, 2015, at 4:21 PM, Boris Steipe wrote:
> >
> > > I think you are going in the wrong direction here and this is a classic example of what we mean by "technical debt" of code. Rather than telling your regular expression what you are looking for, you are handling special cases with redundant code. This is ugly, brittle and impossible to maintain.
> > >
> > > Respect to you that you have recognized this.
> > >
> > >
> > > The solution is rather simple:
> > >
> > > A) Isolate tokens. Your IDs contain only a limited set of characters. Split your strings along the characters that are not found in IDs to isolate candidate tokens, place them into a vector.
> > >
> > > B) Evaluate your tokens: as far as I can see IDs all contain letters AND numbers. This is a unique characteristic. Thus it is sufficient to grep for a letter/number pair in a token to identify it as an ID.
> > >
> > > Should you ever find a need to accommodate differently formed IDs, there are only two, well defined places with clearly delegated roles where changes might be needed.
> > >
> > > Here is the code:
> > >
> > > for (i in 1:nrow(ripley.tv)) {
> > >       v <- unlist(strsplit(ripley.tv$producto[i], "[^A-Z0-9-]+")) # isolate tokens
> > >       ripley.tv$id[i] <- v[grep("[A-Z][0-9]", v)]  # identify IDs and store
> > > }
> >
> > That logic actually simplifies the regex strategy as well:
> >
> >  sub("(.*[ \n])([-A-Z0-9]{6,12})(.*)", "\\2",
> >  ripley.tv$producto,
> >  ignore.case = T)
> >
> >
> > Almost succeeds, with a few all-character words, but if you require one number in the middle you get full results:
> >
> >  sub("(.*[ \n])([-A-Z0-9]{3,6}[0-9][-A-Z0-9]{2,6})(.*)", "\\2",
> >  ripley.tv$producto,
> >  ignore.case = T)
> >
> >  [1] "48J6400"     "40J5300"     "TC-40CS600L" "LE28F6600"   "LE40K5000N"
> >  [6] "LE32B7000"   "LE32K5000N"  "LE55B8000"   "LE40B8000"   "LE24B8000"
> > [11] "TC-42AS610"  "LE50K5000N"  "40JU6500"    "48JU6500"    "50JU6500"
> > [16] "55JS9000"    "55JU6500"    "55JU6700"    "55JU7500"    "65JS9000"
> > [21] "65JU6500"    "65JU7500"    "75JU6500"    "40LF6350"    "42LF6400"
> > [26] "42LF6450"    "49LF6450"    "LF6400"      "43UF6750"    "49UF6750"
> > [31] "UF6900"      "49UF7700"    "49UF8500"    "55UF7700"    "65UF7700"
> > [36] "55UF8500"    "TC-55CX640W" "TC-50CX640W" "70UF7700"    "UG8700"
> > [41] "LF6350"      "KDL-50FA95C" "KDL50W805C"  "KDL-40R354B" "40J5500"
> > [46] "50J5500"     "32JH4005"    "50J5300"     "48J5300"     "40J6400"
> > [51] "KDL-32R505C" "KDL-40R555C" "55J6400"     "40JH5005"    "43LF5410"
> > [56] "32LF585B"    "49LF5900"    "KDL-65W855C" "UN48J6500"   "LE40F1551"
> > [61] "TC-32AS600L" "KDL-32R304B" "55EC9300"    "LE32W454F"   "58UF8300"
> > [66] "KDL-55W805C" "XBR-49X835C" "XBR-55X855C" "XBR-65X905C" "XBR-75X945C"
> > [71] "XBR-55X905C" "LC60UE30U"   "LC70UE30U"   "LC80UE30U"   "48J5500"
> > [76] "79UG8800"    "65UF9500"    "65UF8500"    "55UF9500"    "32J4300"
> > [81] "KDL-48R555C" "55UG8700"    "60UF8500"    "55LF6500"    "32LF550B"
> > [86] "47LB5610"    "TC-50AS600L" "XBR-55X855B" "LC70SQ17U"   "XBR-79X905B"
> > [91] "TC-40A400L"  "XBR-70X855B" "55HU8700"    "LE40D3142"   "TC-42AS650L"
> > [96] "LC70LE660"   "LE58D3140"
> >
> > >
> > >
> > >
> > > Cheers,
> > > Boris
> > >
> > >
> > >
> > > On Oct 9, 2015, at 5:48 PM, Omar André Gonzáles Díaz <oma.gonzales at gmail.com> wrote:
> > >
> > >>>>> ripley.tv <- structure(list(id = c(NA, NA, NA, NA, NA, NA, NA, NA,
> > >>> NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > >>>>> NA, NA, NA, NA, NA, NA, NA), marca = c("SAMSUNG", "SAMSUNG",
> > >>>>> "PANASONIC", "HAIER", "HAIER", "HAIER", "HAIER", "HAIER", "HAIER",
> > >>>>> "HAIER", "PANASONIC", "HAIER", "SAMSUNG", "SAMSUNG", "SAMSUNG",
> > >>>>> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG",
> > >>>>> "SAMSUNG", "SAMSUNG", "LG", "LG", "LG", "LG", "LG", "LG", "LG",
> > >>>>> "LG", "LG", "LG", "LG", "LG", "LG", "PANASONIC", "PANASONIC",
> > >>>>> "LG", "LG", "LG", "SONY", "SONY", "SONY", "SAMSUNG", "SAMSUNG",
> > >>>>> "SAMSUNG", "SAMSUNG", "SAMSUNG", "SAMSUNG", "SONY", "SONY", "SAMSUNG",
> > >>>>> "SAMSUNG", "LG", "LG", "LG", "SONY", "SAMSUNG", "AOC", "PANASONIC",
> > >>>>> "SONY", "LG", "AOC", "LG", "SONY", "SONY", "SONY", "SONY", "SONY",
> > >>>>> "SONY", "SHARP", "SHARP", "SHARP", "SAMSUNG", "LG", "LG", "LG",
> > >>>>> "LG", "SAMSUNG", "SONY", "LG", "LG", "LG", "LG", "LG", "PANASONIC",
> > >>>>> "SONY", "SHARP", "SONY", "PANASONIC", "SONY", "SAMSUNG", "AOC",
> > >>>>> "PANASONIC", "SHARP", "AOC"), producto = c("SMART TV LED FHD 48\" 3D
> > >>>>> 48J6400",
> > >>>>> "SMART TV LED FHD 40\" 40J5300", "TV LED FULL HD 40'' TC-40CS600L",
> > >>>>> "TELEVISOR LED LE28F6600 28\"", "SMART TV 40\" HD LE40K5000N",
> > >>>>> "TV LED HD 32'' LE32B7000", "SMART TV  32'' LE32K5000N", "TV LED FHD
> > >>> 55\" -
> > >>>>> LE55B8000",
> > >>>>> "TV LED LE40B8000 FULL HD 40\"", "TV LE24B8000 LED HD 24\" - NEGRO",
> > >>>>> "TV LED FULL HD 42'' TC-42AS610", "TELEVISOR LED LE50K5000N 50\"",
> > >>>>> "SMART TV LED UHD 40\" 40JU6500", "SMART TV ULTRA HD 48'' 48JU6500",
> > >>>>> "SMART TV 50JU6500 LED UHD 50\" - NEGRO", "SMART TV ULTRA HD 55'' 3D
> > >>>>> 55JS9000",
> > >>>>> "SMART TV LED UHD 55\" 55JU6500", "SMART TV ULTRA HD 55'' 55JU6700",
> > >>>>> "SMART TV CURVO 55JU7500 LED UHD 55\" 3D - NEGRO", "SMART TV ULTRA HD
> > >>> 65''
> > >>>>> 3D 65JS9000",
> > >>>>> "SMART TV 65JU6500 LED UHD 65\"", "SMART TV ULTRA HD 65'' 65JU7500",
> > >>>>> "SMART TV LED UHD 75\" 75JU6500", "SMART TV WEB OS 40\" FULL HD
> > >>> 40LF6350",
> > >>>>> "SMART TV 3D 42\" FULL HD 42LF6400", "TV LED 42\" FULL HD CINEMA 3D
> > >>>>> 42LF6450",
> > >>>>> "TV LED 49\" FULL HD CINEMA 3D 49LF6450", "SMART TV LF6400 49\" FULL HD
> > >>>>> 3D",
> > >>>>> "TV 43UF6750 43\" ULTRA HD 4K", "TV 49\" ULTRA HD 4K 49UF6750",
> > >>>>> "TV LED 49\" ULTRA HD SMART UF6900", "SMART TV 49UF7700 49\" ULTRA HD
> > >>> 4K",
> > >>>>> "SMART TV 49UF8500 49\" ULTRA HD 4K 3D", "TV LED 55\" CINEMA 3D SMART
> > >>> TV
> > >>>>> 55UF7700",
> > >>>>> "SMART TV 65UF7700 65\" ULTRA HD 4K", "SMART TV 55UF8500 55\" ULTRA HD
> > >>> 4K
> > >>>>> 3D",
> > >>>>> "TV LED 55\" ULTRA HD 4K SMART TC-55CX640W", "TV LED 50\" ULTRA HD 4K
> > >>> SMART
> > >>>>> TC-50CX640W",
> > >>>>> "SMART TV 70UF7700 3D ULTRA HD 70\"", "TV LED CURVO 65\" ULTRA HD 4K
> > >>> CINEMA
> > >>>>> SMART UG8700",
> > >>>>> "TV LED 60\" FULL HD SMART LF6350", "SMART TV KDL-50FA95C 50\" FULL HD
> > >>> 3D",
> > >>>>> "SMART TV KDL50W805C 50\" FULL HD 3D", "TV LED 40\" FULL HD
> > >>> KDL-40R354B",
> > >>>>> "SMART TV LED FULL HD 40'' 40J5500", "SMART TV LED FULL HD 50''
> > >>> 50J5500",
> > >>>>> "TV LED HD 32'' 32JH4005", "SMART TV LED FULL HD 50\" 50J5300",
> > >>>>> "SMART TV LED 48\" FULL HD 48J5300", "SMART TV FULL HD 40'' 3D
> > >>> 40J6400",
> > >>>>> "TV LED 32\" HD SMART KDL-32R505C", "TV LED 40\" SMART FULL HD
> > >>> KDL-40R555C
> > >>>>> - NEGRO",
> > >>>>> "SMART TV LED FHD 55\" 3D 55J6400", "TV 40JH5005 LED FHD 40\" - NEGRO",
> > >>>>> "TV 43\" FULL HD 43LF5410", "SMART TV 32LF585B LED HD 32\" - BLANCO",
> > >>>>> "TV LED 49\" FULL HD SMART 49LF5900", "SMART TV 65\" FULL HD 3D
> > >>>>> KDL-65W855C",
> > >>>>> "SMART TV LED FHD 48\" UN48J6500", "TV LED 40\" FULL HD LE40F1551",
> > >>>>> "TV LED 32'' SMART HD TC-32AS600L", "TV LED 32'' HD KDL-32R304B",
> > >>>>> "TV OLED 55\" SMART 3D FULL HD 55EC9300 PLATEADO", "TV LED HD 32''
> > >>>>> LE32W454F",
> > >>>>> "TV LED 58\" ULTRA HD SMART 58UF8300", "TV LED 55\" FULL HD SMART 3D
> > >>>>> KDL-55W805C",
> > >>>>> "TV LED 49\" ULTRA HD 4K XBR-49X835C", "TV LED 55\" ULTRA HD 4K
> > >>>>> XBR-55X855C",
> > >>>>> "TV LED ULTRA DELGADO 55\" ULTRA HD 4K XBR-65X905C", "TV LED 75\"
> > >>> ULTRA HD
> > >>>>> 4K 3D XBR-75X945C",
> > >>>>> "TV LED ULTRA DELGADO 55\" ULTRA HD 4K XBR-55X905C", "SMART TV LED 60''
> > >>>>> ULTRA HD 4K LC60UE30U",
> > >>>>> "SMART TV LED 70'' ULTRA HD 4K LC70UE30U", "SMART TV LED 80'' ULTRA HD
> > >>> 4K
> > >>>>> LC80UE30U",
> > >>>>> "SMART TV LED FULL HD 48'' 48J5500", "SMART TV CURVO 79UG8800 79\"
> > >>> ULTRA HD
> > >>>>> 4K 3D",
> > >>>>> "SMART TV 65UF9500 65\" ULTRA HD 4K 3D", "SMART TV 65UF8500 65\" ULTRA
> > >>> HD
> > >>>>> 4K 3D",
> > >>>>> "SMART TV 55UF9500 55\" ULTRA HD 4K 3D", "SMART TV LED HD 32\"
> > >>> 32J4300",
> > >>>>> "TV LED 48\" SMART FULL HD KDL-48R555C - NEGRO", "SMART TV 55UG8700
> > >>> 55\"
> > >>>>> ULTRA HD 4K 3D",
> > >>>>> "SMART TV 60UF8500 60\" ULTRA HD 4K 3D", "SMART TV 55LF6500 55\" FULL
> > >>> HD
> > >>>>> 3D",
> > >>>>> "TV 32LF550B 32\" HD", "TV LED 47\" FULL HD 47LB5610", "TV LED FULL HD
> > >>> 50''
> > >>>>> TC-50AS600L",
> > >>>>> "TV SMART LED 55\" UHD 3D XBR-55X855B", "TV LED FULL HD 4K LC70SQ17U
> > >>> 70''",
> > >>>>> "TV LED SMART UHD 79\" XBR-79X905B", "TV LED FULL HD 40'' TC-40A400L",
> > >>>>> "TV LED SMART UHD 70\" XBR-70X855B", "SMART TV UHD 55'' 3D CURVO
> > >>> 55HU8700",
> > >>>>> "TV FULL HD LE40D3142 40\" - NEGRO", "TELEVISOR LED 42\" TC-42AS650L",
> > >>>>> "SMART TV LCD FHD 70\" LC70LE660", "TV LED FULL HD 58'' LE58D3140"
> > >>>>> ), pulgadas = c(48L, 40L, 40L, 28L, 40L, 32L, 32L, 55L, 40L,
> > >>>>> 24L, 42L, 50L, 40L, 48L, 50L, 55L, 55L, 55L, 55L, 65L, 65L, 65L,
> > >>>>> 75L, 40L, 42L, 42L, 49L, 49L, 43L, 49L, 49L, 49L, 49L, 55L, 65L,
> > >>>>> 55L, 55L, 50L, 70L, 65L, 60L, 50L, 50L, 40L, 40L, 50L, 32L, 50L,
> > >>>>> 48L, 40L, 32L, 40L, 55L, 40L, 43L, 32L, 49L, 65L, 48L, 40L, 32L,
> > >>>>> 32L, 55L, 32L, 58L, 55L, 49L, 55L, 55L, 75L, 55L, 60L, 70L, 80L,
> > >>>>> 48L, 79L, 65L, 65L, 55L, 32L, 48L, 55L, 60L, 55L, 32L, 47L, 50L,
> > >>>>> 55L, 70L, 79L, 40L, 70L, 55L, 40L, 42L, 70L, 58L), precio.antes =
> > >>> c(2799L,
> > >>>>> 1799L, 1699L, 599L, 1299L, 699L, 999L, 1999L, 999L, 499L, 1899L,
> > >>>>> 1799L, 2499L, 3999L, 3699L, 10999L, 4299L, 5499L, 6999L, 14999L,
> > >>>>> 8999L, 9999L, 14599L, 1999L, 2299L, 2299L, 2899L, 2999L, 2299L,
> > >>>>> 23992L, 3599L, 3799L, 4799L, 4999L, 8499L, 5999L, 4999L, 3999L,
> > >>>>> 11999L, 10999L, 4399L, 4499L, 3799L, 1399L, 2299L, 2799L, 999L,
> > >>>>> 2199L, 2299L, 2299L, 1299L, 1699L, 3499L, 1399L, 1549L, 1299L,
> > >>>>> 2399L, 6499L, 2999L, 999L, 1249L, 999L, 14999L, 799L, 5999L,
> > >>>>> 4499L, 4999L, 6499L, 12999L, 24999L, 8999L, 5999L, 7599L, 14999L,
> > >>>>> 2499L, 29999L, 13999L, 9999L, 9699L, 1299L, 2399L, 6999L, 7999L,
> > >>>>> 3699L, 999L, 1899L, 2999L, 7999L, 8499L, 24999L, 1399L, 13999L,
> > >>>>> 8499L, 999L, 2599L, 5799L, 2399L), precio.nuevo = c(2299, 1399,
> > >>>>> 1299, 549, 1099, 629, 799, 1699, 849, 439, 1499, 1549, 1759.2,
> > >>>>> 2099.3, 2309.3, 7699.3, 2799.3, 3639.3, 4899.3, 10499.3, 5109.3,
> > >>>>> 6999.3, 10219.3, 1399, 1599, 1599, 2199, 2199, 1299, 23992, 2299,
> > >>>>> 2299, 2899, 2999, 5999, 3899, 4999, 3999, 8999, 6999, 4099, 3999,
> > >>>>> 3499, 1299, 1799, 2399, 799, 2199, 1799, 1999, 1199, 1599, 2999,
> > >>>>> 1199, 1399, 1099, 1999, 5999, 2799, 999, 1199, 949, 7999, 799,
> > >>>>> 5299, 4299, 3999, 5999, 11999, 23999, 7999, 5699, 7599, 14499,
> > >>>>> 2399, 29999, 11999, 8999, 7499, 1099, 2199, 6599, 7099, 3599,
> > >>>>> 899, 1599, 2199, 4999, 6499, 19999, 1399, 9999, 5999, 999, 2599,
> > >>>>> 5699, 2399), dif.precios = c(500, 400, 400, 50, 200, 70, 200,
> > >>>>> 300, 150, 60, 400, 250, 739.8, 1899.7, 1389.7, 3299.7, 1499.7,
> > >>>>> 1859.7, 2099.7, 4499.7, 3889.7, 2999.7, 4379.7, 600, 700, 700,
> > >>>>> 700, 800, 1000, 0, 1300, 1500, 1900, 2000, 2500, 2100, 0, 0,
> > >>>>> 3000, 4000, 300, 500, 300, 100, 500, 400, 200, 0, 500, 300, 100,
> > >>>>> 100, 500, 200, 150, 200, 400, 500, 200, 0, 50, 50, 7000, 0, 700,
> > >>>>> 200, 1000, 500, 1000, 1000, 1000, 300, 0, 500, 100, 0, 2000,
> > >>>>> 1000, 2200, 200, 200, 400, 900, 100, 100, 300, 800, 3000, 2000,
> > >>>>> 5000, 0, 4000, 2500, 0, 0, 100, 0), dif.porcentual = c(17.86,
> > >>>>> 22.23, 23.54, 8.35, 15.4, 10.01, 20.02, 15.01, 15.02, 12.02,
> > >>>>> 21.06, 13.9, 29.6, 47.5, 37.57, 30, 34.88, 33.82, 30, 30, 43.22,
> > >>>>> 30, 30, 30.02, 30.45, 30.45, 24.15, 26.68, 43.5, 0, 36.12, 39.48,
> > >>>>> 39.59, 40.01, 29.42, 35.01, 0, 0, 25, 36.37, 6.82, 11.11, 7.9,
> > >>>>> 7.15, 21.75, 14.29, 20.02, 0, 21.75, 13.05, 7.7, 5.89, 14.29,
> > >>>>> 14.3, 9.68, 15.4, 16.67, 7.69, 6.67, 0, 4, 5.01, 46.67, 0, 11.67,
> > >>>>> 4.45, 20, 7.69, 7.69, 4, 11.11, 5, 0, 3.33, 4, 0, 14.29, 10,
> > >>>>> 22.68, 15.4, 8.34, 5.72, 11.25, 2.7, 10.01, 15.8, 26.68, 37.5,
> > >>>>> 23.53, 20, 0, 28.57, 29.42, 0, 0, 1.72, 0), rangos = c("S/.1500 -
> > >>> S/.2500",
> > >>>>> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.500 -
> > >>>>> S/.1500",
> > >>>>> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500",
> > >>>>> "S/.500 - S/.1500", "< S/.500", "S/.500 - S/.1500", "S/.1500 -
> > >>> S/.2500",
> > >>>>> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.1500 - S/.2500",
> > >>>>> "> S/.4,500", "S/.2500 - S/.3500", "S/.3500 - S/.4500", "> S/.4,500",
> > >>>>> "> S/.4,500", "> S/.4,500", "> S/.4,500", "> S/.4,500", "S/.500 -
> > >>> S/.1500",
> > >>>>> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.1500 - S/.2500",
> > >>>>> "S/.1500 - S/.2500", "S/.500 - S/.1500", "> S/.4,500", "S/.1500 -
> > >>> S/.2500",
> > >>>>> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.2500 - S/.3500",
> > >>>>> "> S/.4,500", "S/.3500 - S/.4500", "> S/.4,500", "S/.3500 - S/.4500",
> > >>>>> "> S/.4,500", "> S/.4,500", "S/.3500 - S/.4500", "S/.3500 - S/.4500",
> > >>>>> "S/.2500 - S/.3500", "S/.500 - S/.1500", "S/.1500 - S/.2500",
> > >>>>> "S/.1500 - S/.2500", "S/.500 - S/.1500", "S/.1500 - S/.2500",
> > >>>>> "S/.1500 - S/.2500", "S/.1500 - S/.2500", "S/.500 - S/.1500",
> > >>>>> "S/.1500 - S/.2500", "S/.2500 - S/.3500", "S/.500 - S/.1500",
> > >>>>> "S/.500 - S/.1500", "S/.500 - S/.1500", "S/.1500 - S/.2500",
> > >>>>> "> S/.4,500", "S/.2500 - S/.3500", "S/.500 - S/.1500", "S/.500 -
> > >>> S/.1500",
> > >>>>> "S/.500 - S/.1500", "> S/.4,500", "S/.500 - S/.1500", "> S/.4,500",
> > >>>>> "S/.3500 - S/.4500", "S/.3500 - S/.4500", "> S/.4,500", "> S/.4,500",
> > >>>>> "> S/.4,500", "> S/.4,500", "> S/.4,500", "> S/.4,500", "> S/.4,500",
> > >>>>> "S/.1500 - S/.2500", "> S/.4,500", "> S/.4,500", "> S/.4,500",
> > >>>>> "> S/.4,500", "S/.500 - S/.1500", "S/.1500 - S/.2500", "> S/.4,500",
> > >>>>> "> S/.4,500", "S/.3500 - S/.4500", "S/.500 - S/.1500", "S/.1500 -
> > >>> S/.2500",
> > >>>>> "S/.1500 - S/.2500", "> S/.4,500", "> S/.4,500", "> S/.4,500",
> > >>>>> "S/.500 - S/.1500", "> S/.4,500", "> S/.4,500", "S/.500 - S/.1500",
> > >>>>> "S/.2500 - S/.3500", "> S/.4,500", "S/.1500 - S/.2500")), .Names =
> > >>> c("id",
> > >>>>> "marca", "producto", "pulgadas", "precio.antes", "precio.nuevo",
> > >>>>> "dif.precios", "dif.porcentual", "rangos"), class = "data.frame",
> > >>> row.names
> > >>>>> = c(NA,
> > >>>>> -97L))
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
> 
> 
> 


