In statistics, data is the starting point for any analysis. It is therefore essential to know how to master operations such as importing and exporting data, changing types, identifying individuals with missing values, concatenating factor levels, etc. These different notions are therefore presented in this sheet, which aims to be both concise and satisfactory in practice. We focus on the dplyr package, an up-to-date tool to manipulate data.
2. dplyr package
dplyr is a powerful R-package to transform and summarize tabular data with rows and columns. We can find information here or in the cheatsheet.
We can manipulate data with classical matrix operation. For instance, we can obtain Longitude and Latitude for swimming pool with Longitude more than 153 with
piscines[piscines$Longitude>153,c("Longitude","Latitude")]
dplyr offers a more friendly syntax
library(dplyr)
select(piscines,Longitude,Latitude) %>% filter(Longitude>153)
Code is more efficient and easier to read.
dplyr contains a grammar with the following verbs:
- select() select columns (variables)
- filter() filter rows (individuals)
- arrange() re-order or arrange rows
- mutate() create new columns (new variables)
- summarise() summarise values (compute statistics summaries)
- group_by() allows for group operations in the “split-apply-combine” concept
2.1 select()
verb
It allows to select variables (columns):
select(df, VAR1, VAR2, ...)
For instance
coord <- select(piscines, Latitude, Longitude)
head(piscines, n=2)
head(coord, n=2)
We can use helper functions (begins_with, end_with, contains, matches) for more complex selections based on the name of the variables
coord <- select(piscines, ends_with("tude"))
head(coord, n=2)
2.2 mutate()
verb
It allows to create new variables in the dataset:
mutate(df, NEW.VAR = expression(VAR1, VAR2, ...))
For instance
df <- mutate(piscines, phrase=paste("Swimming pool", Name, "is located at the address", Address))
select(df,phrase)
We can create many variables
mutate(piscines,
phrase = paste("Swimming pool", Name, "is located at the address", Address),
unused = Longitude + Latitude
)
2.3 filter()
verb
It allows to select individuals (rows):
filter(df, TEST)
For instance
p1 <- filter(piscines, Longitude>153.02)
select(p1,Longitude)
or (we select swimming pool without “Pool” in the name)
df <- filter(piscines, !grepl("Pool", Name))
select(df,Name)
or
p2 <- filter(piscines, Longitude>153.02 | Latitude < -27.488)
p2 <- select(p2, Longitude, Latitude)
p2
We use slice to select individuals with index:
piscines %>% slice(5:8)
2.4 arrange()
verb
It allows to sort a dataset according to a variable:
arrange(df, VAR) #increasing sort
or
arrange(df, desc(VAR)) #decreasing sort
For instance
arrange(piscines, Longitude)
or
arrange(piscines, desc(Longitude))
2.5 summarise()
verb
More complex… It allows to define new datasets from the original dataset. New dataset often includes statistical summaries for the original dataset
mean()
;
median()
;
IQR()
;
var()
.
For instance
summarise(piscines,
mean_long = mean(Longitude),
med_lat = median(Latitude),
min_lat = min(Latitude),
sum_long = sum(Longitude)
)
You can also look at summarise_all, summarise_at (help(summarise_all)
). dplyr also proposes the following functions (very useful in statistics):
n()
: number of lines (individuals of the dataset)
n_distinct()
: number of distinct elements of a vector
fisrt()
and last()
: first and last element of a vector
summarise(piscines,n())
summarise(piscines,last(Longitude))
2.6 Operations on the verbs
We can of course concatenate verbs. For instance, we obtain the names of the swimming pool with the higher Longitude with:
p1 <- arrange(piscines,desc(Longitude)) #decreasing sort
summarise(p1,first(Name)) #extract the first
The pipe operator %>% makes the code more readable:
piscines %>% arrange(desc(Longitude)) %>% summarise(first(Name))
2.7 Group data with ‘Group_by’
group_by allows to apply operation for group of data. For instance, we want to compute mean longitudes for swimming pools of High and Low latitude (it does not make sense!). We first add a variable lat_disc which allows to discern high and low latitudes.
lat_mean <- summarise(piscines,mean(Latitude))
pisc1 <- mutate(piscines,lat_dis=factor(Latitude>as.numeric(lat_mean)))
levels(pisc1$lat_dis) <- c("Low","High")
We can now compute with group_by the mean longitudes for the 2 groups
group_by(pisc1,lat_dis) %>% summarise(mean_long=mean(Longitude))
Exercise 3
We consider the iris dataset
data(iris)
Answer to the following questions with dplyr
Select the variables Petal.Width and Species
Select individuals from Versicolor and Virginica groups (you can use symbol | for condition or)
Calculate the number of setosa with summarise
Calculate the mean of Petal Width for the versicolor specie
Add in the dataset the variable Sum_Petal which contains the sum of Petal.Width and Sepal.Width
Calcuate the mean and variance of Sepal Length for each group of Species (use group_by)
Exercise 4 (optional)
We consider the hflights dataset which contains informations about flights departing from Houston airports IAH (George Bush Intercontinental) and HOU (Houston Hobby):
library(hflights)
hflights <- tbl_df(hflights)
Variable Unique Carrier
provides a code which identify the carrier:
lut1 <- c("AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
"DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways",
"WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier",
"FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa")
We can also specify the variable CancellationCode
as follows:
lut2 <- c("A" = "carrier", "B" = "weather", "C" = "FFA", "D" = "security", "E" = "not cancelled")
- Modify the table
hflights
to better explain variables Unique Carrier
et CancellationCode
.
hflights$UniqueCarrier <- lut1[hflights$UniqueCarrier]
hflights$CancellationCode[hflights$CancellationCode==""] <- "Z"
hflights$CancellationCode <- lut2[hflights$CancellationCode]
Use several different ways to select variables from Origin to Cancelled
Select variables DepTime
, ArrTime
, ActualElapsedTime
, AirTime
, ArrDelay
and DepDelay
. Use helper function contains().
Add a variable ActualGroundTime which corresponds to ActualElapsedTime minus AirTime
Add a variable AverageSpeed (=Distance/AirTime
) and order the dataset according to this variable
Select flights to JFK
Calculate the number of fligths to JFK
- Create a summary of
hflights
which contains:
n
: the total number of flights;
n_dest
: the total number of destinations;
n_carrier
: the total numbers of carriers.
- Create a summary of flights from the carrier American which contains
- the total number of flights ;
- the total number of cancelled flights ;
- the mean of the variable
ArrDelay
(be carefull with the NA
).
- Calculate for each carrier
- the total number of flights
- the mean of the variable AirTime
Order the carriers according to the mean of departure delays
LS0tCnRpdGxlOiAiTWFuaXB1bGF0aW5nIGRhdGEiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCkluIHN0YXRpc3RpY3MsIGRhdGEgaXMgdGhlIHN0YXJ0aW5nIHBvaW50IGZvciBhbnkgYW5hbHlzaXMuIEl0IGlzIHRoZXJlZm9yZSBlc3NlbnRpYWwgdG8ga25vdyBob3cgdG8gbWFzdGVyIG9wZXJhdGlvbnMgc3VjaCBhcyBpbXBvcnRpbmcgYW5kIGV4cG9ydGluZyBkYXRhLCBjaGFuZ2luZyB0eXBlcywgaWRlbnRpZnlpbmcgaW5kaXZpZHVhbHMgd2l0aCBtaXNzaW5nIHZhbHVlcywgY29uY2F0ZW5hdGluZyBmYWN0b3IgbGV2ZWxzLCBldGMuIFRoZXNlIGRpZmZlcmVudCBub3Rpb25zIGFyZSB0aGVyZWZvcmUgcHJlc2VudGVkIGluIHRoaXMgc2hlZXQsIHdoaWNoIGFpbXMgdG8gYmUgYm90aCBjb25jaXNlIGFuZCBzYXRpc2ZhY3RvcnkgaW4gcHJhY3RpY2UuIFdlIGZvY3VzIG9uIHRoZSAqKmRwbHlyKiogcGFja2FnZSwgYW4gdXAtdG8tZGF0ZSB0b29sIHRvIG1hbmlwdWxhdGUgZGF0YS4gCgoKIyAxLiBSZWFkaW5nIGRhdGEgZnJvbSBmaWxlcwoKRnVuY3Rpb25zICoqcmVhZC50YWJsZSoqIGFuZCAqKnJlYWQuY3N2KiogYWxsb3dzIHRvIGltcG9ydCBkYXRhIGZyb20gKi50eHQqIG9yICouY3N2KiBmaWxlcy4KCmBgYHtyfQpwYXRoIDwtIGZpbGUucGF0aCgiLi4vREFUQSIsICJwaXNjaW5lcy5jc3YiKSAjZmlyc3Q6IGRpcmVjdG9yeSwgc2Vjb25kOiBmaWxlCnBpc2NpbmVzIDwtIHJlYWQuY3N2KHBhdGgpCmNsYXNzKHBpc2NpbmVzKQpzdW1tYXJ5KHBpc2NpbmVzKQpgYGAKVGhlIGFyZSBtYW55IGltcG9ydGFudCBvcHRpb25zIGluICoqcmVhZC5jc3YqKjoKCiogKipzZXAqKjogdGhlIGZpZWxkIHNlcGFyYXRpb24gY2hhcmFjdGVyCiogKipkZWMqKjogdGhlIGNoYXJhY3RlciB1c2VkIGZvciBkZWNpbWFsIHBvaW50cwoqICoqaGVhZGVyKio6IGEgbG9naWNhbCB2YWx1ZSBpbmRpY2F0aW5nIHdoZXRoZXIgdGhlIGZpbGUgY29udGFpbnMgdGhlIG5hbWVzIG9mIHRoZSB2YXJpYWJsZXMgYXMgaXRzIGZpcnN0IGxpbmUKKiAqKnJvdy5uYW1lcyoqOiBhIHZlY3RvciBvZiByb3cgbmFtZXMgKHRvIGlkZW50aWZ5IGluZGl2dWFscyBpZiBuZWVkZWQpCiogKipuYS5zdHJpbmdzKio6IGEgY2hhcmFjdGVyIHZlY3RvciBvZiBzdHJpbmdzIHdoaWNoIGFyZSB0byBiZSBpbnRlcnByZXRlZCBhcyBOQSB2YWx1ZXMuCiogLi4uCgojIyMgRXhlcmNpc2UgMQoKMS4gSW1wb3J0IHRoZSBkYXRhc2V0ICpteWRhdGEuY3N2KgoKCjIuIEltcG9ydCAqKmNvcnJlY3RseSoqIHRoZSBkYXRhc2V0ICpteWRhdGEuY3N2KiAodXNlICpzZXAqLCAqZGVjKiBhbmQgKnJvdy5uYW1lcyopCgoKMy4gSW1wb3J0IHRoZSBkYXRhc2V0ICpteWRhdGEyLmNzdioKCgo0LiBUaGlzIGRhdGFzZXQgY29udGFpbnMgbWlzc2luZyBkYXRhIChjb2xsZWN0ZWQgd2l0aCBhIGRvdCkuIFdlIG5lZWQgdG8gdXNlICoqbmEuc3RyaW5ncyoqIHRvIHRha2UgaW50byBhY2NvdW50IG9mIHRoaXMgc2l0dWF0aW9uLiBJbXBvcnQgY29ycmVjdGx5IHRoaXMgZGF0YXNldC4KCgo1LiBDYWxsIHRoZSBsZXZlbHMgb2YgKnNleCo6ICoqd29tZW4qKiBhbmQgKiptYW4qKiAodXNlICoqbGV2ZWxzKiopCgoKCioqcmVhZHIqKiBwYWNrYWdlIG1ha2VzIGl0IGVhc3kgdG8gcmVhZCBtYW55IHR5cGVzIG9mIHJlY3Rhbmd1bGFyIGRhdGEsIGluY2x1ZGluZyBjc3YsIHRzdiBhbmQgZml4ZWQgd2lkdGggZmlsZXMuIEl0IHVzZXMgKnJlYWRfY3N2KiBhbmQgKnJlYWRfdGFibGUqIGluc3RlYWQgb2YgKnJlYWQuY3N2KiBhbmQgKnJlYWQudGFibGUqLiBXZSBjYW4gYWxzbyB1c2UgdGhpcyBwYWNrYWdlIHdpdGggdGhlIG1lbnUgKipJbXBvcnQgRGF0YXNldCoqIG9mIGBSc3R1ZGlvYC4KCiMjIyBFeGVyY2lzZSAyClJlYWQgdGhlICpwaXNjaW5lLmNzdiogZmlsZSB3aXRoICoqcmVhZHIqKiAodXNlICoqcmVhZF9jc3YqKiBvciBjbGljayBvbiAqKkltcG9ydCBEYXRhc2V0KiopCgoKCiMgMi4gZHBseXIgcGFja2FnZQoKZHBseXIgaXMgYSBwb3dlcmZ1bCBSLXBhY2thZ2UgdG8gdHJhbnNmb3JtIGFuZCBzdW1tYXJpemUgdGFidWxhciBkYXRhIHdpdGggcm93cyBhbmQgY29sdW1ucy4gV2UgY2FuIGZpbmQgaW5mb3JtYXRpb24gW2hlcmVdKGh0dHBzOi8vc3BhcmsucnN0dWRpby5jb20vZHBseXIuaHRtbCkgb3IgaW4gdGhlIFtjaGVhdHNoZWV0XShodHRwczovL3d3dy5yc3R1ZGlvLmNvbS93cC1jb250ZW50L3VwbG9hZHMvMjAxNS8wMi9kYXRhLXdyYW5nbGluZy1jaGVhdHNoZWV0LnBkZikuCgpXZSBjYW4gbWFuaXB1bGF0ZSBkYXRhIHdpdGggY2xhc3NpY2FsIG1hdHJpeCBvcGVyYXRpb24uIEZvciBpbnN0YW5jZSwgd2UgY2FuIG9idGFpbiAqKkxvbmdpdHVkZSoqIGFuZCAqKkxhdGl0dWRlKiogZm9yIHN3aW1taW5nIHBvb2wgd2l0aCAqTG9uZ2l0dWRlKiBtb3JlIHRoYW4gMTUzIHdpdGgKYGBge3J9CnBpc2NpbmVzW3Bpc2NpbmVzJExvbmdpdHVkZT4xNTMsYygiTG9uZ2l0dWRlIiwiTGF0aXR1ZGUiKV0KYGBgCgoqKmRwbHlyKiogb2ZmZXJzIGEgbW9yZSBmcmllbmRseSBzeW50YXgKYGBge3J9CmxpYnJhcnkoZHBseXIpCnNlbGVjdChwaXNjaW5lcyxMb25naXR1ZGUsTGF0aXR1ZGUpICU+JSBmaWx0ZXIoTG9uZ2l0dWRlPjE1MykKYGBgCgpDb2RlIGlzIG1vcmUgZWZmaWNpZW50IGFuZCBlYXNpZXIgdG8gcmVhZC4KCioqZHBseXIqKiBjb250YWlucyBhIGdyYW1tYXIgd2l0aCB0aGUgZm9sbG93aW5nIHZlcmJzOgoKKiBzZWxlY3QoKQlzZWxlY3QgY29sdW1ucyAodmFyaWFibGVzKQoqIGZpbHRlcigpCWZpbHRlciByb3dzIChpbmRpdmlkdWFscykKKiBhcnJhbmdlKCkJcmUtb3JkZXIgb3IgYXJyYW5nZSByb3dzCiogbXV0YXRlKCkJY3JlYXRlIG5ldyBjb2x1bW5zIChuZXcgdmFyaWFibGVzKQoqIHN1bW1hcmlzZSgpCXN1bW1hcmlzZSB2YWx1ZXMgKGNvbXB1dGUgc3RhdGlzdGljcyBzdW1tYXJpZXMpCiogZ3JvdXBfYnkoKQlhbGxvd3MgZm9yIGdyb3VwIG9wZXJhdGlvbnMgaW4gdGhlIOKAnHNwbGl0LWFwcGx5LWNvbWJpbmXigJ0gY29uY2VwdAoKIyMgMi4xIGBzZWxlY3QoKWAgdmVyYgoKSXQgYWxsb3dzIHRvIHNlbGVjdCB2YXJpYWJsZXMgKGNvbHVtbnMpOgpgYGB7ciwgZXZhbD1GQUxTRSwgaW5jbHVkZT1UUlVFfQpzZWxlY3QoZGYsIFZBUjEsIFZBUjIsIC4uLikKYGBgCgpGb3IgaW5zdGFuY2UKYGBge3J9CmNvb3JkIDwtIHNlbGVjdChwaXNjaW5lcywgTGF0aXR1ZGUsIExvbmdpdHVkZSkKaGVhZChwaXNjaW5lcywgbj0yKQpoZWFkKGNvb3JkLCBuPTIpCmBgYAoKV2UgY2FuIHVzZSAqKmhlbHBlciBmdW5jdGlvbnMqKiAoKipiZWdpbnNfd2l0aCoqLCAqKmVuZF93aXRoKiosICoqY29udGFpbnMqKiwgKiptYXRjaGVzKiopIGZvciBtb3JlIGNvbXBsZXggc2VsZWN0aW9ucyBiYXNlZCBvbiB0aGUgbmFtZSBvZiB0aGUgdmFyaWFibGVzCgpgYGB7cn0KY29vcmQgPC0gc2VsZWN0KHBpc2NpbmVzLCBlbmRzX3dpdGgoInR1ZGUiKSkKaGVhZChjb29yZCwgbj0yKQpgYGAKCiMjIDIuMiBgbXV0YXRlKClgIHZlcmIKCkl0IGFsbG93cyB0byBjcmVhdGUgbmV3IHZhcmlhYmxlcyBpbiB0aGUgZGF0YXNldDoKCmBgYHtyLCBldmFsPUZBTFNFLCBpbmNsdWRlPVRSVUV9Cm11dGF0ZShkZiwgTkVXLlZBUiA9IGV4cHJlc3Npb24oVkFSMSwgVkFSMiwgLi4uKSkKYGBgCkZvciBpbnN0YW5jZQpgYGB7cn0KZGYgPC0gbXV0YXRlKHBpc2NpbmVzLCBwaHJhc2U9cGFzdGUoIlN3aW1taW5nIHBvb2wiLCBOYW1lLCAiaXMgbG9jYXRlZCBhdCB0aGUgYWRkcmVzcyIsIEFkZHJlc3MpKQpzZWxlY3QoZGYscGhyYXNlKQpgYGAKV2UgY2FuIGNyZWF0ZSBtYW55IHZhcmlhYmxlcwpgYGB7cn0KbXV0YXRlKHBpc2NpbmVzLAogICAgICAgcGhyYXNlID0gcGFzdGUoIlN3aW1taW5nIHBvb2wiLCBOYW1lLCAiaXMgbG9jYXRlZCBhdCB0aGUgYWRkcmVzcyIsIEFkZHJlc3MpLAogICAgICAgdW51c2VkID0gTG9uZ2l0dWRlICsgTGF0aXR1ZGUKKQpgYGAKCgojIyAyLjMgYGZpbHRlcigpYCB2ZXJiCgpJdCBhbGxvd3MgdG8gc2VsZWN0IGluZGl2aWR1YWxzIChyb3dzKToKYGBge3IsIGV2YWw9RkFMU0UsIGluY2x1ZGU9VFJVRX0KZmlsdGVyKGRmLCBURVNUKQpgYGAKCkZvciBpbnN0YW5jZQpgYGB7cn0KcDEgPC0gZmlsdGVyKHBpc2NpbmVzLCBMb25naXR1ZGU+MTUzLjAyKQpzZWxlY3QocDEsTG9uZ2l0dWRlKQpgYGAKb3IgKHdlIHNlbGVjdCBzd2ltbWluZyBwb29sIHdpdGhvdXQgIlBvb2wiIGluIHRoZSBuYW1lKQpgYGB7cn0KZGYgPC0gZmlsdGVyKHBpc2NpbmVzLCAhZ3JlcGwoIlBvb2wiLCBOYW1lKSkKc2VsZWN0KGRmLE5hbWUpCmBgYApvcgoKYGBge3J9CnAyIDwtIGZpbHRlcihwaXNjaW5lcywgTG9uZ2l0dWRlPjE1My4wMiB8IExhdGl0dWRlIDwgLTI3LjQ4OCkKcDIgPC0gc2VsZWN0KHAyLCBMb25naXR1ZGUsIExhdGl0dWRlKQpwMgpgYGAKCldlIHVzZSAqKnNsaWNlKiogdG8gc2VsZWN0IGluZGl2aWR1YWxzIHdpdGggaW5kZXg6CmBgYHtyfQpwaXNjaW5lcyAlPiUgc2xpY2UoNTo4KQpgYGAKCgojIyAyLjQgYGFycmFuZ2UoKWAgdmVyYgoKSXQgYWxsb3dzIHRvIHNvcnQgYSBkYXRhc2V0IGFjY29yZGluZyB0byBhIHZhcmlhYmxlOgpgYGB7ciwgZXZhbD1GQUxTRSwgaW5jbHVkZT1UUlVFfQphcnJhbmdlKGRmLCBWQVIpICNpbmNyZWFzaW5nIHNvcnQKYGBgCm9yCmBgYHtyLCBldmFsPUZBTFNFLCBpbmNsdWRlPVRSVUV9CmFycmFuZ2UoZGYsIGRlc2MoVkFSKSkgI2RlY3JlYXNpbmcgc29ydApgYGAKRm9yIGluc3RhbmNlCmBgYHtyfQphcnJhbmdlKHBpc2NpbmVzLCBMb25naXR1ZGUpCmBgYApvcgpgYGB7cn0KYXJyYW5nZShwaXNjaW5lcywgZGVzYyhMb25naXR1ZGUpKQpgYGAKCiMjICAyLjUgYHN1bW1hcmlzZSgpYCB2ZXJiCgpNb3JlIGNvbXBsZXguLi4gSXQgYWxsb3dzIHRvIGRlZmluZSBuZXcgZGF0YXNldHMgZnJvbSB0aGUgb3JpZ2luYWwgZGF0YXNldC4gTmV3IGRhdGFzZXQgb2Z0ZW4gaW5jbHVkZXMgc3RhdGlzdGljYWwgc3VtbWFyaWVzIGZvciB0aGUgb3JpZ2luYWwgZGF0YXNldAoKMS4gYG1lYW4oKWAgOwoxLiBgbWVkaWFuKClgIDsKMS4gYElRUigpYCA7CjEuIGB2YXIoKWAuCgpGb3IgaW5zdGFuY2UKYGBge3J9CnN1bW1hcmlzZShwaXNjaW5lcywKICAgICAgICAgIG1lYW5fbG9uZyA9IG1lYW4oTG9uZ2l0dWRlKSwKICAgICAgICAgIG1lZF9sYXQgPSBtZWRpYW4oTGF0aXR1ZGUpLAogICAgICAgICAgbWluX2xhdCA9IG1pbihMYXRpdHVkZSksCiAgICAgICAgICBzdW1fbG9uZyA9IHN1bShMb25naXR1ZGUpCikKYGBgCgpZb3UgY2FuIGFsc28gbG9vayBhdCAqKnN1bW1hcmlzZV9hbGwqKiwgKipzdW1tYXJpc2VfYXQqKiAoYGhlbHAoc3VtbWFyaXNlX2FsbClgKS4gKipkcGx5cioqIGFsc28gcHJvcG9zZXMgdGhlIGZvbGxvd2luZyBmdW5jdGlvbnMgKHZlcnkgdXNlZnVsIGluIHN0YXRpc3RpY3MpOgoKMS4gYG4oKWA6IG51bWJlciBvZiBsaW5lcyAoaW5kaXZpZHVhbHMgb2YgdGhlIGRhdGFzZXQpCjEuIGBuX2Rpc3RpbmN0KClgOiBudW1iZXIgb2YgZGlzdGluY3QgZWxlbWVudHMgb2YgYSB2ZWN0b3IKMS4gYGZpc3J0KClgIGFuZCBgbGFzdCgpYDogZmlyc3QgYW5kIGxhc3QgZWxlbWVudCBvZiBhIHZlY3RvcgoKYGBge3J9CnN1bW1hcmlzZShwaXNjaW5lcyxuKCkpCnN1bW1hcmlzZShwaXNjaW5lcyxsYXN0KExvbmdpdHVkZSkpCmBgYAoKCgojIyAyLjYgT3BlcmF0aW9ucyBvbiB0aGUgdmVyYnMKCldlIGNhbiBvZiBjb3Vyc2UgY29uY2F0ZW5hdGUgdmVyYnMuIEZvciBpbnN0YW5jZSwgd2Ugb2J0YWluIHRoZSBuYW1lcyBvZiB0aGUgc3dpbW1pbmcgcG9vbCB3aXRoIHRoZSBoaWdoZXIgTG9uZ2l0dWRlIHdpdGg6CmBgYHtyfQpwMSA8LSBhcnJhbmdlKHBpc2NpbmVzLGRlc2MoTG9uZ2l0dWRlKSkgI2RlY3JlYXNpbmcgc29ydApzdW1tYXJpc2UocDEsZmlyc3QoTmFtZSkpICNleHRyYWN0IHRoZSBmaXJzdApgYGAKClRoZSAqKnBpcGUqKiBvcGVyYXRvciAqKiU+JSoqIG1ha2VzIHRoZSBjb2RlIG1vcmUgcmVhZGFibGU6CmBgYHtyfQpwaXNjaW5lcyAlPiUgYXJyYW5nZShkZXNjKExvbmdpdHVkZSkpICU+JSBzdW1tYXJpc2UoZmlyc3QoTmFtZSkpCmBgYAoKIyMgMi43IEdyb3VwIGRhdGEgd2l0aCAnR3JvdXBfYnknIAoKKipncm91cF9ieSoqIGFsbG93cyB0byBhcHBseSBvcGVyYXRpb24gZm9yIGdyb3VwIG9mIGRhdGEuIEZvciBpbnN0YW5jZSwgd2Ugd2FudCB0byBjb21wdXRlIG1lYW4gbG9uZ2l0dWRlcyBmb3Igc3dpbW1pbmcgcG9vbHMgb2YgSGlnaCBhbmQgTG93IGxhdGl0dWRlIChpdCBkb2VzIG5vdCBtYWtlIHNlbnNlISkuIFdlIGZpcnN0IGFkZCBhIHZhcmlhYmxlICoqbGF0X2Rpc2MqKiB3aGljaCBhbGxvd3MgdG8gZGlzY2VybiBoaWdoIGFuZCBsb3cgbGF0aXR1ZGVzLgoKYGBge3J9CmxhdF9tZWFuIDwtIHN1bW1hcmlzZShwaXNjaW5lcyxtZWFuKExhdGl0dWRlKSkKcGlzYzEgPC0gbXV0YXRlKHBpc2NpbmVzLGxhdF9kaXM9ZmFjdG9yKExhdGl0dWRlPmFzLm51bWVyaWMobGF0X21lYW4pKSkKbGV2ZWxzKHBpc2MxJGxhdF9kaXMpIDwtIGMoIkxvdyIsIkhpZ2giKQpgYGAKCldlIGNhbiBub3cgY29tcHV0ZSB3aXRoICoqZ3JvdXBfYnkqKiB0aGUgbWVhbiBsb25naXR1ZGVzIGZvciB0aGUgMiBncm91cHMKCmBgYHtyfQpncm91cF9ieShwaXNjMSxsYXRfZGlzKSAlPiUgc3VtbWFyaXNlKG1lYW5fbG9uZz1tZWFuKExvbmdpdHVkZSkpCmBgYAoKCiMjIyBFeGVyY2lzZSAzCgpXZSBjb25zaWRlciB0aGUgaXJpcyBkYXRhc2V0CmBgYHtyfQpkYXRhKGlyaXMpCmBgYAoKQW5zd2VyIHRvIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zIHdpdGggKipkcGx5cioqCgoxLiBTZWxlY3QgdGhlIHZhcmlhYmxlcyAqKlBldGFsLldpZHRoKiogYW5kICoqU3BlY2llcyoqCgoyLiBTZWxlY3QgaW5kaXZpZHVhbHMgZnJvbSAqKlZlcnNpY29sb3IqKiBhbmQgKipWaXJnaW5pY2EqKiBncm91cHMgKHlvdSBjYW4gdXNlIHN5bWJvbCB8IGZvciBjb25kaXRpb24gKipvcioqKQoKMy4gQ2FsY3VsYXRlIHRoZSBudW1iZXIgb2YgKipzZXRvc2EqKiAgd2l0aCAqKnN1bW1hcmlzZSoqCgo0LiBDYWxjdWxhdGUgdGhlICoqbWVhbioqIG9mICpQZXRhbCBXaWR0aCogZm9yIHRoZSB2ZXJzaWNvbG9yIHNwZWNpZQoKNS4gQWRkIGluIHRoZSBkYXRhc2V0IHRoZSB2YXJpYWJsZSAqKlN1bV9QZXRhbCoqIHdoaWNoIGNvbnRhaW5zIHRoZSBzdW0gb2YgKipQZXRhbC5XaWR0aCoqIGFuZCAqKlNlcGFsLldpZHRoKioKCjYuIENhbGN1YXRlIHRoZSBtZWFuIGFuZCB2YXJpYW5jZSBvZiAqKlNlcGFsIExlbmd0aCoqIGZvciBlYWNoIGdyb3VwIG9mICoqU3BlY2llcyoqICh1c2UgKipncm91cF9ieSoqKQoKCiMjIyBFeGVyY2lzZSA0IChvcHRpb25hbCkKCldlIGNvbnNpZGVyIHRoZSAqKmhmbGlnaHRzKiogZGF0YXNldCB3aGljaCBjb250YWlucyBpbmZvcm1hdGlvbnMgYWJvdXQgZmxpZ2h0cyBkZXBhcnRpbmcgZnJvbSBIb3VzdG9uIGFpcnBvcnRzIElBSCAoR2VvcmdlIEJ1c2ggSW50ZXJjb250aW5lbnRhbCkgYW5kIEhPVSAoSG91c3RvbiBIb2JieSk6CgpgYGB7cn0KbGlicmFyeShoZmxpZ2h0cykKaGZsaWdodHMgPC0gdGJsX2RmKGhmbGlnaHRzKQpgYGAKClZhcmlhYmxlIGBVbmlxdWUgQ2FycmllcmAgcHJvdmlkZXMgYSBjb2RlIHdoaWNoIGlkZW50aWZ5IHRoZSBjYXJyaWVyOgpgYGB7cn0KbHV0MSA8LSBjKCJBQSIgPSAiQW1lcmljYW4iLCAiQVMiID0gIkFsYXNrYSIsICJCNiIgPSAiSmV0Qmx1ZSIsICJDTyIgPSAiQ29udGluZW50YWwiLCAKICAgICAgICAgIkRMIiA9ICJEZWx0YSIsICJPTyIgPSAiU2t5V2VzdCIsICJVQSIgPSAiVW5pdGVkIiwgIlVTIiA9ICJVU19BaXJ3YXlzIiwgCiAgICAgICAgICJXTiIgPSAiU291dGh3ZXN0IiwgIkVWIiA9ICJBdGxhbnRpY19Tb3V0aGVhc3QiLCAiRjkiID0gIkZyb250aWVyIiwgCiAgICAgICAgICJGTCIgPSAiQWlyVHJhbiIsICJNUSIgPSAiQW1lcmljYW5fRWFnbGUiLCAiWEUiID0gIkV4cHJlc3NKZXQiLCAiWVYiID0gIk1lc2EiKQpgYGAKV2UgY2FuIGFsc28gc3BlY2lmeSB0aGUgdmFyaWFibGUgYENhbmNlbGxhdGlvbkNvZGVgIGFzIGZvbGxvd3M6CmBgYHtyfQpsdXQyIDwtIGMoIkEiID0gImNhcnJpZXIiLCAiQiIgPSAid2VhdGhlciIsICJDIiA9ICJGRkEiLCAiRCIgPSAic2VjdXJpdHkiLCAiRSIgPSAibm90IGNhbmNlbGxlZCIpCmBgYAoKMS4gTW9kaWZ5IHRoZSB0YWJsZSBgaGZsaWdodHNgIHRvIGJldHRlciBleHBsYWluIHZhcmlhYmxlcyBgVW5pcXVlIENhcnJpZXJgIGV0IGBDYW5jZWxsYXRpb25Db2RlYC4KYGBge3J9CmhmbGlnaHRzJFVuaXF1ZUNhcnJpZXIgPC0gbHV0MVtoZmxpZ2h0cyRVbmlxdWVDYXJyaWVyXQpoZmxpZ2h0cyRDYW5jZWxsYXRpb25Db2RlW2hmbGlnaHRzJENhbmNlbGxhdGlvbkNvZGU9PSIiXSA8LSAiWiIKaGZsaWdodHMkQ2FuY2VsbGF0aW9uQ29kZSA8LSBsdXQyW2hmbGlnaHRzJENhbmNlbGxhdGlvbkNvZGVdCmBgYAoKMi4gVXNlIHNldmVyYWwgZGlmZmVyZW50IHdheXMgdG8gc2VsZWN0IHZhcmlhYmxlcyBmcm9tICoqT3JpZ2luKiogdG8gKipDYW5jZWxsZWQqKgoKCjMuIFNlbGVjdCB2YXJpYWJsZXMgYERlcFRpbWVgLCBgQXJyVGltZWAsIGBBY3R1YWxFbGFwc2VkVGltZWAsIGBBaXJUaW1lYCwgYEFyckRlbGF5YCBhbmQgYERlcERlbGF5YC4gVXNlICpoZWxwZXIgZnVuY3Rpb24qICoqY29udGFpbnMoKSoqLgoKCgo0LiBBZGQgYSB2YXJpYWJsZSAqKkFjdHVhbEdyb3VuZFRpbWUqKiB3aGljaCBjb3JyZXNwb25kcyB0byAqQWN0dWFsRWxhcHNlZFRpbWUqIG1pbnVzICpBaXJUaW1lKgoKCjUuIEFkZCBhIHZhcmlhYmxlICoqQXZlcmFnZVNwZWVkKiogKGA9RGlzdGFuY2UvQWlyVGltZWApIGFuZCBvcmRlciB0aGUgZGF0YXNldCBhY2NvcmRpbmcgdG8gdGhpcyB2YXJpYWJsZQoKCjYuIFNlbGVjdCBmbGlnaHRzIHRvIEpGSwoKCgo3LiBDYWxjdWxhdGUgdGhlIG51bWJlciBvZiBmbGlndGhzIHRvIEpGSwoKCjguIENyZWF0ZSBhIHN1bW1hcnkgb2YgYGhmbGlnaHRzYCB3aGljaCBjb250YWluczoKICAgICsgYG5gIDogdGhlIHRvdGFsIG51bWJlciBvZiBmbGlnaHRzOwogICAgKyBgbl9kZXN0YDogdGhlIHRvdGFsIG51bWJlciBvZiBkZXN0aW5hdGlvbnM7CiAgICArIGBuX2NhcnJpZXJgIDogdGhlIHRvdGFsIG51bWJlcnMgb2YgY2FycmllcnMuCiAgICAKCjkuIENyZWF0ZSBhIHN1bW1hcnkgb2YgZmxpZ2h0cyBmcm9tIHRoZSBjYXJyaWVyICoqQW1lcmljYW4qKiB3aGljaCBjb250YWlucwogICAgKyB0aGUgdG90YWwgbnVtYmVyIG9mIGZsaWdodHMgOwogICAgKyB0aGUgdG90YWwgbnVtYmVyIG9mIGNhbmNlbGxlZCBmbGlnaHRzIDsKICAgICsgdGhlIG1lYW4gb2YgdGhlIHZhcmlhYmxlIGBBcnJEZWxheWAgKGJlIGNhcmVmdWxsIHdpdGggdGhlIGBOQWApLgogICAgCgoxMC4gQ2FsY3VsYXRlIGZvciBlYWNoIGNhcnJpZXIgCiAgICArIHRoZSB0b3RhbCBudW1iZXIgb2YgZmxpZ2h0cwogICAgKyB0aGUgbWVhbiBvZiB0aGUgdmFyaWFibGUgKipBaXJUaW1lKioKICAgIAoKMTEuIE9yZGVyIHRoZSBjYXJyaWVycyBhY2NvcmRpbmcgdG8gdGhlIG1lYW4gb2YgZGVwYXJ0dXJlIGRlbGF5cyAKCgoKCgo=