統計分析:使用 R 入門 / R / 邏輯運算
外觀
< 統計分析:使用 R 入門 | R
當 訪問向量元素時,我們看到了如何使用一個簡單的涉及小於號 (
<) 的邏輯表示式來生成一個邏輯向量,然後可以用來選擇小於某個值的元素。這種型別的邏輯運算非常有用。除了 < 之外,還有其他一些比較運算子。以下是完整集合(有關更多詳細資訊,請參閱 ?Comparison)<(小於)和<=(小於或等於)>(大於)和>=(大於或等於)==(等於[1])和!=(不等於)
透過使用和、或和非組合邏輯向量,可以獲得更大的靈活性。例如,我們可能想找出哪些美國州的面積小於 10 000 或大於 100 000 平方英里,或者找出哪些州的面積大於 100 000 平方英里並且名稱較短。下面的程式碼展示瞭如何使用以下 R 符號來做到這一點
&(“和”)|(“或”)!(“非”)
當使用邏輯向量時,以下函式特別有用,如下所示
which()識別邏輯向量中哪些元素為TRUEsum()可用於給出邏輯向量中為TRUE的元素數量。這是因為sum()強制將其輸入轉換為數字,如果 TRUE 和 FALSE 被轉換為數字,它們將分別取值為 1 和 0。ifelse()根據邏輯向量中的每個元素是 TRUE 還是 FALSE 返回不同的值。具體來說,像ifelse(aLogicalVector, vectorT, vectorF)這樣的命令會接受aLogicalVector,並對每個為TRUE的元素返回vectorT中的對應元素,對每個為FALSE的元素返回vectorF中的對應元素。額外說明的是,如果vectorT或vectorF比aLogicalVector短,它們將透過重複擴充套件到正確的長度。
### In these examples, we'll reuse the American states data, especially the state names
### To remind yourself of them, you might want to look at the vector "state.names"
nchar(state.name) # nchar() returns the number of characters in strings of text ...
nchar(state.name) <= 6 #so this indicates which states have names of 6 letters or fewer
ShortName <- nchar(state.name) <= 6 #store this logical vector for future use
sum(ShortName) #With a logical vector, sum() tells us how many are TRUE (11 here)
which(ShortName) #These are the positions of the 11 elements which have short names
state.name[ShortName] #Use the index operator [] on the original vector to get the names
state.abb[ShortName] #Or even on other vectors (e.g. the 2 letter state abbreviations)
isSmall <- state.area < 10000 #Store a logical vector indicating states <10000 sq. miles
isHuge <- state.area > 100000 #And another for states >100000 square miles in area
sum(isSmall) #there are 8 "small" states
sum(isHuge) #coincidentally, there are also 8 "huge" states
state.name[isSmall | isHuge] # | means OR. So these are states which are small OR huge
state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name
state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name
### Examples of ifelse() ###
ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones
# (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb)
### Many functions in R increase input vectors to the correct size by duplication ###
ifelse(ShortName, state.name, "tooBIG") #A silly example: the 3rd argument is duplicated
size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args
size #might be useful as an indicator variable?
ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example> ### In these examples, we'll reuse the American states data, especially the state names > ### To remind yourself of them, you might want to look at the vector "state.names" > > nchar(state.name) # nchar() returns the number of characters in strings of text ... [1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 [30] 10 10 8 14 12 4 8 6 12 12 14 12 9 5 4 7 8 10 13 9 7 > nchar(state.name) <= 6 #so this indicates which states have names of 6 letters or fewer [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE [15] TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE [29] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE [43] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE > ShortName <- nchar(state.name) <= 6 #store this logical vector for future use > sum(ShortName) #With a logical vector, sum() tells us how many are TRUE (11 here) [1] 11 > which(ShortName) #These are the positions of the 11 elements which have short names [1] 2 11 12 15 16 19 28 35 37 43 44 > state.name[ShortName] #Use the index operator [] on the original vector to get the names [1] "Alaska" "Hawaii" "Idaho" "Iowa" "Kansas" "Maine" "Nevada" "Ohio" "Oregon" [10] "Texas" "Utah" > state.abb[ShortName] #Or even on other vectors (e.g. the 2 letter state abbreviations) [1] "AK" "HI" "ID" "IA" "KS" "ME" "NV" "OH" "OR" "TX" "UT" > > isSmall <- state.area < 10000 #Store a logical vector indicating states <10000 sq. miles > isHuge <- state.area > 100000 #And another for states >100000 square miles in area > sum(isSmall) #there are 8 "small" states [1] 8 > sum(isHuge) #coincidentally, there are also 8 "huge" states [1] 8 > > state.name[isSmall | isHuge] # | means OR. So these are states which are small OR huge [1] "Alaska" "Arizona" "California" "Colorado" "Connecticut" [6] "Delaware" "Hawaii" "Massachusetts" "Montana" "Nevada" [11] "New Hampshire" "New Jersey" "New Mexico" "Rhode Island" "Texas" [16] "Vermont" > state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name [1] "Alaska" "Nevada" "Texas" > state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name [1] "Arizona" "California" "Colorado" "Montana" "New Mexico" > > ### Examples of ifelse() ### > > ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones [1] "AL" "Alaska" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" [10] "GA" "Hawaii" "Idaho" "IL" "IN" "Iowa" "Kansas" "KY" "LA" [19] "Maine" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" [28] "Nevada" "NH" "NJ" "NM" "NY" "NC" "ND" "Ohio" "OK" [37] "Oregon" "PA" "RI" "SC" "SD" "TN" "Texas" "Utah" "VT" [46] "VA" "WA" "WV" "WI" "WY" > # (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb) > > ### Many functions in R increase input vectors to the correct size by duplication ### > ifelse(ShortName, state.name, "tooBIG") #A silly example: the 3rd argument is duplicated [1] "tooBIG" "Alaska" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" [10] "tooBIG" "Hawaii" "Idaho" "tooBIG" "tooBIG" "Iowa" "Kansas" "tooBIG" "tooBIG" [19] "Maine" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" [28] "Nevada" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Ohio" "tooBIG" [37] "Oregon" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Texas" "Utah" "tooBIG" [46] "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" > size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args > size #might be useful as an indicator variable? [1] "large" "large" "large" "large" "large" "large" "small" "small" "large" "large" [11] "small" "large" "large" "large" "large" "large" "large" "large" "large" "large" [21] "small" "large" "large" "large" "large" "large" "large" "large" "small" "small" [31] "large" "large" "large" "large" "large" "large" "large" "large" "small" "large" [41] "large" "large" "large" "large" "small" "large" "large" "large" "large" "large" > ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example [1] "medium" "huge" "huge" "medium" "huge" "huge" "small" "small" "medium" [10] "medium" "small" "medium" "medium" "medium" "medium" "medium" "medium" "medium" [19] "medium" "medium" "small" "medium" "medium" "medium" "medium" "huge" "medium" [28] "huge" "small" "small" "huge" "medium" "medium" "medium" "medium" "medium" [37] "medium" "medium" "small" "medium" "medium" "medium" "huge" "medium" "small" [46] "medium" "medium" "medium" "medium" "medium"
如果你做過任何計算機程式設計,你可能更習慣在“if”語句的上下文中處理邏輯。雖然 R 也擁有一個
if() 語句,但在處理向量時,它並不那麼有用。例如,以下 R 表示式if(aVariable == 0) then print("zero") else print("not zero")
期望 aVariable 是一個單一數字:如果這個數字為 0,則輸出“zero”,如果它不是零則輸出“not zero”[2]。如果 aVariable 是一個包含 2 個或多個值的向量,則只有第一個元素有效:其他所有元素都被忽略[3]。也存在一些邏輯運算子,它們只考慮向量的第一個元素:這些是 &&(用於 AND)和 ||(用於 OR)[4]。