A Simple Tutorial on Plotting in R

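I didn't get a spot in the math course, so I'm doing some statistics instead.

The statistics professor is seriously impressive: Stanford undergrad plus an Oxford PhD.

Most of this comes from my class notes and labs.

All plots use the ggplot2 library.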

Basic

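Some fairly basic operations. First install the ggthemes library; after each plot, append + theme_XXX() to apply a theme. You can also set a default theme with theme_set(theme_XXX()). The rest of this post uses theme_set(theme_economist_white())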
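To show the two ways of applying a theme side by side, here is a minimal sketch. It deliberately uses themes that ship with ggplot2 (theme_minimal() and theme_bw()) instead of ggthemes so that it runs without extra packages, and the data frame `toy` is made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data, only here so the plots have something to draw
toy <- data.frame(x = 1:10, y = (1:10)^2)

# Option 1: add a theme to one plot only
p1 <- ggplot(toy, aes(x, y)) + geom_point() + theme_minimal()
print(p1)

# Option 2: set a session-wide default; theme_set() returns the previous theme
old <- theme_set(theme_bw())
p2 <- ggplot(toy, aes(x, y)) + geom_point()
print(p2)        # rendered with theme_bw(), the current default
theme_set(old)   # restore the previous default
```

Note that the default theme is looked up when the plot is drawn, not when it is created, which is why p2 is printed before the old theme is restored.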

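Next, there is a rather clunky library for combining plots, ggpubr: ggarrange(figure1, figure2, nrow=2, ncol=1). Why clunky? Because the x-axis and y-axis ticks of the combined plots don't line up. A better method comes later in this post.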

WEEK1

Basic operations. I suggest Googling or reading the documentation; anyone with a bit of programming background will be fine.

WEEK2

Mainly covered the histogram and the boxplot (sorry, I honestly don't know the Chinese names for these).

Histogram

Basic

ggplot(titanic_survival, aes(x = age)) + geom_histogram()

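Inside aes() goes the x-axis variable, which is a column of the data frame passed as the first argument.

Later refinements can go inside the ggplot() parentheses, where they adjust the whole figure, or in the later part, where they only change the histogram. (This matters because more plot types can be added afterwards and shown in the same coordinate system, as covered later in this post.)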
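To make that distinction concrete, here is a minimal sketch. The data frame `toy` is made up and merely stands in for titanic_survival. The mapping given in ggplot() is inherited by every layer, while an argument given to one geom affects that layer alone.

```r
library(ggplot2)

# Hypothetical stand-in for titanic_survival: 200 made-up ages
set.seed(1)
toy <- data.frame(age = sample(1:70, 200, replace = TRUE))

# aes(x = age) is set once in ggplot(), so both layers below inherit it
p <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "white", color = "blue") +  # styles the bars only
  geom_freqpoly(binwidth = 5, color = "red")                      # styles the line only
print(p)
```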

(This uses the basic Titanic data.)

A few directions for improvement (in no particular order):

Binwidth

This is the width of each bar, measured in units of the x-axis.

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5)

Label and Title

This one is fairly simple: xlab sets the x-axis label, ylab the y-axis label, and ggtitle the title.

To center the title: + theme(plot.title = element_text(hjust = 0.5))

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5))

Color

A plot without color is painful to look at. There are two parameters, fill and color: fill is the interior of the bars, color is the outline.

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5))

Mean Line

Adding a vertical line at the mean makes the plot easier to read. Use geom_vline(aes(xintercept = xx))

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
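One caveat with the mean line: mean() returns NA if the column has any missing values, and then there is no position at which to draw the line. A minimal sketch of the usual remedy, na.rm = TRUE, using a made-up data frame `toy` rather than the real data:

```r
library(ggplot2)

# Hypothetical data with two missing ages
toy <- data.frame(age = c(22, 38, 26, NA, 35, 54, 2, NA, 27, 14))

mean(toy$age)                            # NA, because of the missing values
age_mean <- mean(toy$age, na.rm = TRUE)  # mean of the non-missing ages

p <- ggplot(toy, aes(x = age)) +
  # na.rm = TRUE here just silences the warning about dropped rows
  geom_histogram(binwidth = 5, color = "blue", fill = "white", na.rm = TRUE) +
  geom_vline(xintercept = age_mean, color = "yellow")
print(p)
```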

Density Plot

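This one is a matter of taste; I find it very intuitive when the data is roughly normally distributed. Note that quite a few things have to change...

Here is a fairly normal-looking one from my homework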
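The density code itself is not shown in this post, so the following is only a sketch of one common way to do it, not necessarily what was used in class. It assumes a made-up data frame `toy` and ggplot2 3.3.0 or newer (for after_stat()). The main changes relative to the plain histogram are mapping y to after_stat(density), so that bars and curve share a scale, and adding geom_density().

```r
library(ggplot2)

# Hypothetical, roughly normal data
set.seed(42)
toy <- data.frame(age = rnorm(500, mean = 30, sd = 10))

p <- ggplot(toy, aes(x = age)) +
  # y = after_stat(density) rescales the bars so their total area is 1
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 color = "blue", fill = "white") +
  # kernel density estimate drawn on the same scale
  geom_density(color = "red", fill = "red", alpha = 0.2) +
  xlab("Age") + ylab("Density")
print(p)
```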

That's about it for now; I'll add more as I learn it.

update1 Outliers

First we add an outlier to the data and draw a histogram

ggplot(titanic_survival2, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

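It's actually very simple, just one call: geom_text(aes(label = ifelse(age > 100, as.character(v1), '')), y = 8). My data import went slightly wrong here: v1 holds the names, and it came in as a factor, so it has to be converted with as.character(). y is the vertical position of the label. If all else fails, just copy this and adjust the condition.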

ggplot(titanic_survival2, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow") + geom_text(aes(label = ifelse(age > 100, as.character(v1),'')), y = 8)
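For anyone without the course data, here is a self-contained sketch of the same labeling trick. Everything in `toy` (the names, the ages, the column name `name`) is made up; it only mimics the situation above, with a factor column of names and one absurd age.

```r
library(ggplot2)

# Hypothetical data: one absurd age plays the role of the outlier
toy <- data.frame(
  name = c("Ann", "Bob", "Cid", "Dee", "Eve", "Methuselah"),
  age  = c(22, 25, 31, 38, 47, 150),
  stringsAsFactors = TRUE   # mimic the factor column described above
)

p <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5, color = "blue", fill = "white") +
  # label only the rows with age > 100; y fixes the height of the label
  geom_text(aes(label = ifelse(age > 100, as.character(name), "")), y = 1.2)
print(p)
```

The as.character() call matters because ifelse() applied to a factor would otherwise return its integer codes rather than the names.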

Sure enough, the almighty dp got away unscathed

update2 Gradient Color

I learned a pretty flashy way to draw with gradient colors. Let's start from here

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

First we cut age into 100 intervals and map the fill color to the interval, which produces the gradient. geom_histogram(binwidth = 5, aes(fill = cut(age, 100)))

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, aes(fill = cut(age, 100))) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

Uh... ggplot added a legend entry for every single color. Turn that off with show.legend = F

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, aes(fill = cut(age, 100)), show.legend = F) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

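A bit blindingly colorful, isn't it? Let's tone it down

1. Add alpha = x to change the transparency. Note that x is a fraction between 0 and 1 (0 is fully transparent, 1 is fully opaque). I also found it best to put alpha before the fill mapping; putting it after sometimes gave me odd errors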

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

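2. Restrict the range of colors

The full rainbow feels a bit frivolous; limiting it to roughly two hues looks better. Use scale_fill_discrete(h = c(x, y)), where h is the range of hues. Just try different values for x and y; I don't know of a better way.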

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + scale_fill_discrete(h = c(200, 380)) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

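3. Change chroma and luminance

scale_fill_discrete(h = c(200, 380), c = 120, l = 70): c is the chroma (color intensity) and l is the luminance (lightness). Personally I can't see much difference, but then I have no art background.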

ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + scale_fill_discrete(h = c(200, 380), c = 120, l = 70) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")

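Here are two from my homework that I think look good

I compressed the images a bit to keep the homework document under 50 MB. Marking outliers this way works really well.

Boxplot

I won't go into as much detail as above, so I suggest reading in order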

Basic

Boxplots are generally used for comparison, so I'll go straight to two variables.

ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot()

Color

Same as above: use fill

ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue"))

Label and Title

Same as above

ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue")) + xlab("Gender") + ylab("Age") + ggtitle("Boxplot") + theme(plot.title = element_text(hjust = 0.5))

Outliers

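I find this feature quite nice: outlier.colour and outlier.shape set the color and shape of the outlier points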

ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue"), outlier.colour="red", outlier.shape=8) + xlab("Gender") + ylab("Age") + ggtitle("Boxplot") + theme(plot.title = element_text(hjust = 0.5))

Scatterplot

For this one, of course, we use the done-to-death auto.mpg dataset, hahaha

Basic

Use geom_point()

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point()

Add a title and so on

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point() + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5))

Color Size Shape

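These three work almost identically: color is the color, size is the size, shape is the shape. So I'll only cover color

Here is how color inside aes() differs from color outside it. Simply put, outside aes() everything gets one single color; inside aes() the value is looked up as a variable in the data and the points are colored by it, and if no such variable exists a new one is created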
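The difference is easiest to see with three small plots. This is a minimal sketch on a made-up data frame `toy` that only imitates a few columns of auto.mpg; the last plot shows the "creates a new one" case, where a constant inside aes() is treated as a one-level variable rather than as a literal color.

```r
library(ggplot2)

# Hypothetical stand-in for auto.mpg
toy <- data.frame(
  weight    = c(2100, 2300, 2900, 3200, 3900, 4300),
  mpg       = c(33, 30, 24, 21, 16, 14),
  cylinders = c(4, 4, 6, 6, 8, 8)
)

# Outside aes(): one fixed color for every point, no legend
p_fixed <- ggplot(toy, aes(x = weight, y = mpg)) +
  geom_point(color = "blue")

# Inside aes(): color is mapped from a column, and a legend appears
p_mapped <- ggplot(toy, aes(x = weight, y = mpg)) +
  geom_point(aes(color = factor(cylinders)))

# Inside aes() with a constant: "blue" becomes a one-level variable and the
# points get the first color of the default palette, not actual blue
p_const <- ggplot(toy, aes(x = weight, y = mpg)) +
  geom_point(aes(color = "blue"))
```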

Also, the coloring here is ggplot's default palette, which looks rather odd (ugly)

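ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5))

Now there are two problems, the legend and the colors. Let's solve them one at a time

For the legend I looked through a lot of material and found an odd-looking statement, so I'll just copy it = =

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_colour_discrete(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), labels = c("Three", "Four", '5', '6', '8'))

It should be fairly easy to follow: breaks holds the original factor levels and labels holds the text shown for each of them. The later labels don't need changing, but I kept them for demonstration

For the colors, a quick Google search shows the command is almost the same one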

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green'))

(It seems to have gotten even uglier...)

To summarize: scale_color_manual(name = ..., breaks = ..., values = ..., labels = ...), where
name is the legend title, e.g. "Cylinders";
breaks are the factor levels, e.g. c('3', '4', '5', '6', '8'), and the legend shows them in this order;
values are the colors, e.g. c("purple", "red", 'yellow', 'blue', 'green'), in the same order as breaks;
labels are the names shown for each legend entry, e.g. c("Three", "Four", '5', '6', '8'), in the same order as breaks.
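The post never uses all four arguments in a single call, so here is a minimal sketch that does. The data frame `toy` is made up and has only three cylinder values, so breaks, values and labels each have three entries.

```r
library(ggplot2)

# Hypothetical stand-in for auto.mpg with three cylinder counts
toy <- data.frame(
  weight    = c(2100, 2300, 2900, 3200, 3900, 4300),
  mpg       = c(33, 30, 24, 21, 16, 14),
  cylinders = c(4, 4, 6, 6, 8, 8)
)

p <- ggplot(toy, aes(x = weight, y = mpg)) +
  geom_point(aes(color = factor(cylinders))) +
  scale_color_manual(
    name   = "Cylinders",                  # legend title
    breaks = c("4", "6", "8"),             # factor levels, in legend order
    values = c("purple", "red", "blue"),   # one color per break
    labels = c("Four", "Six", "Eight")     # one legend label per break
  )
print(p)
```

If the positional matching feels fragile, values can also be given as a named vector such as c("4" = "purple", "6" = "red", "8" = "blue"), in which case colors are matched by name.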

Smooth

This is the built-in curve fitting, geom_smooth()

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange")

A few of the more useful parameters: method = lm/glm/gam/loess. loess is locally weighted polynomial regression and lm is a linear model; the others are rarely needed. If method is not given, geom_smooth() picks one automatically: loess for fewer than 1,000 observations, gam otherwise

There is also se = T/F, which controls whether the gray band (the confidence interval around the fit) is drawn. I won't demonstrate it here

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange", method = lm)
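For completeness, here is a minimal sketch of the se switch, which the post mentions but does not show. The data frame `toy` is simulated (a made-up linear trend plus noise), and formula = y ~ x is spelled out only to avoid the informational message geom_smooth() prints otherwise.

```r
library(ggplot2)

# Hypothetical data: a downward linear trend with some noise
set.seed(7)
toy <- data.frame(weight = seq(1800, 4800, length.out = 40))
toy$mpg <- 46 - 0.0075 * toy$weight + rnorm(40, sd = 2)

p <- ggplot(toy, aes(x = weight, y = mpg)) +
  geom_point() +
  # se = FALSE drops the gray confidence band and keeps only the fitted line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "orange")
print(p)
```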

Wrap

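This is actually a very powerful feature; it just doesn't show well on this plot. facet_wrap(~XXX, ncol = 3) splits the plot into one panel per value of XXX, arranged in ncol columns. It's hard to describe, so just have a look

ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange", method = lm) + facet_wrap(~cylinders, ncol = 3)

Emmm... let's look at a more normal one, from one of my homework assignments

So this only works well when the groups have data of similar magnitude; if not, fall back to ggpubr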
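By default all facet panels share the same axes, which is why groups of very different magnitude end up squashed. One thing that may be worth trying before switching to ggpubr is facet_wrap()'s scales argument. This is not from the course notes, just a minimal sketch on a made-up data frame `toy`:

```r
library(ggplot2)

# Hypothetical data: two groups whose values differ by orders of magnitude
toy <- data.frame(
  group = rep(c("small", "large"), each = 5),
  x     = c(1:5, seq(100, 500, by = 100)),
  y     = c(2, 4, 5, 4, 6, 900, 1500, 2100, 3200, 3900)
)

# Shared axes (the default): the "small" panel collapses into a corner
p_shared <- ggplot(toy, aes(x, y)) + geom_point() +
  facet_wrap(~group, ncol = 2)

# scales = "free": each panel gets its own x and y range
p_free <- ggplot(toy, aes(x, y)) + geom_point() +
  facet_wrap(~group, ncol = 2, scales = "free")
print(p_free)
```

The trade-off is that free scales make the panels harder to compare directly, since the same position no longer means the same value; scales = "free_x" or "free_y" frees only one axis.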

