Creating pretty heatmaps from RNA Sequencing normalised count data using pheatmap

  • kayleentay97
  • Aug 19, 2021
  • 4 min read

I have struggled with making heatmaps for a while, but I finally came up with code (by combining several tutorial sites) that creates a very pretty heatmap! Initially when using pheatmap, I wondered why a package whose name stands for "pretty heatmap" could give me such non-pretty heatmaps. But I soon learnt how to customize the code to best fit my research question and give me very pretty heatmaps. Here are some of my tips and tricks, as well as little points you have to look out for (but nobody mentions, because they assume everyone is an R expert).


Count data needs to be normalized before it can be plotted as a heatmap. If you are unsure about how to get your normalized count data, check out differential expression packages such as Cufflinks, DESeq2, limma-voom and edgeR. This Bioconductor guide, which has been really helpful for me, shows you how to analyze un-normalised count data using DESeq2 to give you normalised count data and differential expression data (http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#count-matrix-input).


# we start by reading our count matrix file into RStudio; pheatmap needs the data as a numeric matrix (or a data frame that can be converted to one)

  • read.delim: Reads the file as a delimited table. Without it, R cannot split each line into columns and will lump every row into a single string. read.table works too.

  • header=TRUE: Indicates that the first line of your file contains the column names

  • sep="\t": Indicates that columns are separated by tabs (for example, a tab-delimited text file exported from Excel). If they are separated by commas, use sep=","

  • row.names="genes": Similar to header, this indicates that your file has row names. genes is the title of the column holding my row names (it must appear in the count matrix file).
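The options above can be sketched as follows. This is a minimal example under assumptions: the toy table, its values, and the names toy_file and counts_df are made up so the snippet runs on its own; swap in the path to your own count matrix file.

```r
# Write a tiny, made-up tab-delimited count table to a temporary file
# (a stand-in for your own normalised count file).
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "genes\tN1\tN2\tD1\tD2",
  "geneA\t10.5\t11.2\t30.1\t28.7",
  "geneB\t50.3\t48.9\t12.4\t15.0",
  "geneC\t5.1\t6.3\t5.8\t6.0"
), toy_file)

# header = TRUE       : the first line holds the column names
# sep = "\t"          : columns are separated by tabs
# row.names = "genes" : use the column titled "genes" as row names
counts_df <- read.delim(toy_file, header = TRUE, sep = "\t", row.names = "genes")

# Inspect the result: one numeric column per sample, genes as row names
str(counts_df)
```

If str() shows a single character column instead of one column per sample, the separator is probably wrong for your file.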

# as pheatmap expects a matrix, we need to convert our data frame into one

  • as.matrix: Converts the data frame into a matrix
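A short sketch of the conversion, assuming a data frame that contains only numeric sample columns. The toy values and the names counts_df and counts_mat are illustrative, not taken from my real data.

```r
# Toy data frame standing in for the table read from file (values are made up).
counts_df <- data.frame(
  N1 = c(10.5, 50.3, 5.1),
  D1 = c(30.1, 12.4, 5.8),
  row.names = c("geneA", "geneB", "geneC")
)

# Convert to a matrix; row and column names are carried over.
counts_mat <- as.matrix(counts_df)

# Sanity check: the matrix is numeric only if every column was numeric.
is.numeric(counts_mat)
```

If is.numeric() returns FALSE, a text column (for example a gene description) has slipped into the table and should be dropped before plotting.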

# if the differences in your count data are not obvious in your heatmap, you can choose to plot z-scores instead. A z-score is calculated by subtracting the mean from each value and dividing by the standard deviation (SD), typically for each gene (row) separately, so it helps make the differences in your heatmap more pronounced! Just apply the function to your data.
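A minimal sketch of the z-score step, assuming it is applied row by row (per gene). The function name cal_z_score and the random toy matrix are illustrative.

```r
# Made-up matrix: 4 genes (rows) x 5 samples (columns).
set.seed(1)
counts_mat <- matrix(rexp(20, rate = 0.1), nrow = 4,
                     dimnames = list(paste0("gene", 1:4), paste0("S", 1:5)))

# z-score: subtract the mean, then divide by the standard deviation.
cal_z_score <- function(x) (x - mean(x)) / sd(x)

# apply() over rows returns the results as columns, so transpose back
# to get genes in rows and samples in columns again.
counts_z <- t(apply(counts_mat, 1, cal_z_score))

# Each gene now has mean ~0 and SD 1 across samples.
round(rowMeans(counts_z), 10)
```

pheatmap also has a scale = "row" argument that scales rows for you; computing the z-scores yourself keeps the scaled matrix available for the clustering steps that follow.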

# now we create a dendrogram to identify which columns are similar to each other; this shows you which samples are related. This is optional! pheatmap will still create a dendrogram automatically when making the heatmap. I am doing this extra step so I can indicate how I would like to cut my columns, and it also gives you a nice dendrogram output. Alternatively, you can specify the cuts directly in pheatmap. We are using the dendextend package.

  • install.packages('dendextend'): Installs the dendextend package from CRAN

  • method="average": hclust offers several linkage methods for hierarchical clustering (for example "complete", which is its default, "average" and "ward.D2"); average linkage is a common choice and the one I used
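The clustering step can be sketched with base R alone. Some assumptions: the toy matrix and the name my_hclust_gene are made up, and this sketch clusters the rows (genes) because the later steps annotate rows; use dist(t(counts_mat)) to cluster the samples instead.

```r
# Made-up matrix: 8 genes (rows) x 5 samples (columns).
set.seed(1)
counts_mat <- matrix(rnorm(40), nrow = 8,
                     dimnames = list(paste0("gene", 1:8), paste0("S", 1:5)))

# Euclidean distances between rows, then average-linkage clustering.
my_hclust_gene <- hclust(dist(counts_mat), method = "average")

# Draw the dendrogram into a PDF (a temporary path is used here;
# replace it with a file name of your choice).
out_pdf <- tempfile(fileext = ".pdf")
pdf(out_pdf)
plot(as.dendrogram(my_hclust_gene), horiz = TRUE)
invisible(dev.off())
```

hclust, dist and as.dendrogram all come from the stats package that ships with R; dendextend adds further dendrogram helpers on top of these.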

# this next step prepares for my cutree step, where I split the dendrogram into its 2 most different clusters.

  • k=2: The number of clusters the dendrogram should be cut into (cutree also accepts a vector of several values)
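A minimal sketch of the cut, using stats::cutree on an hclust object. The toy data are made up with two deliberately well-separated groups of genes, and my_gene_clusters is an illustrative name.

```r
# Two clearly separated groups of made-up genes (4 genes each, 5 samples).
set.seed(1)
counts_mat <- rbind(matrix(rnorm(20, mean = 0), nrow = 4),
                    matrix(rnorm(20, mean = 10), nrow = 4))
rownames(counts_mat) <- paste0("gene", 1:8)

my_hclust_gene <- hclust(dist(counts_mat), method = "average")

# k = 2: cut the tree into two clusters; the result is a named integer
# vector giving the cluster number of each gene.
my_gene_clusters <- cutree(my_hclust_gene, k = 2)

# How many genes fall into each cluster
table(my_gene_clusters)
```

The cluster numbers themselves carry no biological meaning; they only say which genes were grouped together.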

# as I spotted a biological difference between the two clusters that were separated, I wanted to annotate that difference on the rows of my heatmap.

  • yes="downregulated",no="upregulated": You can change downregulated/upregulated to labels that represent your own biological question!
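A sketch of how the cluster numbers might be turned into row labels with ifelse. The cluster assignments below are hypothetical stand-ins for the output of cutree, and the names my_gene_clusters and my_gene_col are illustrative.

```r
# Hypothetical cluster assignments, shaped like the output of cutree(..., k = 2).
my_gene_clusters <- c(geneA = 1, geneB = 2, geneC = 1, geneD = 2)

# Label cluster 1 as "downregulated" and everything else as "upregulated".
# Row names must be the gene names so the labels can be matched to heatmap rows.
my_gene_col <- data.frame(
  cluster = ifelse(test = my_gene_clusters == 1,
                   yes = "downregulated", no = "upregulated"),
  row.names = names(my_gene_clusters)
)

my_gene_col
```

Which cluster number corresponds to which biological group depends on your data, so check a few known genes before settling on the labels.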

# now that we have our row annotations settled, we need the column (aka your sample) annotations! You can choose to group your technical replicates together, separate your disease vs normal lines, etc. In this case, my first 6 sample columns (not counting the gene-name column, which we already told R holds the row names) are Normal, whereas the next 9 are Disease. Next, we need to tell R that the categories in my_sample_col describe the column names in our data!
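A minimal sketch of the sample annotation, assuming 6 Normal samples followed by 9 Disease samples as described above. The toy matrix and its sample names (N1..N6, D1..D9) are made up.

```r
# Made-up matrix with 15 sample columns: 6 Normal, then 9 Disease.
set.seed(1)
counts_mat <- matrix(rnorm(45), nrow = 3,
                     dimnames = list(paste0("gene", 1:3),
                                     c(paste0("N", 1:6), paste0("D", 1:9))))

# One row per sample: "Normal" repeated 6 times, "Disease" repeated 9 times.
my_sample_col <- data.frame(sample = rep(c("Normal", "Disease"), c(6, 9)))

# Tie each category to the matching column of the count matrix.
row.names(my_sample_col) <- colnames(counts_mat)

head(my_sample_col)
```

The order matters here: the categories are assigned by position, so the columns of the matrix must already be in the Normal-then-Disease order.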

# here comes the exciting part!! We are going to start making our heatmap. You will need the pheatmap and RColorBrewer packages, which can be installed from CRAN with install.packages.

# colors, colors and more colors. pheatmap's default colors are, sorry for being too blunt, very ugly. I love a navy, white and red color palette, and RColorBrewer offers many more, so you can choose anything you want! The world is your oyster (and this is why I love R so much: everything is customizable). Refer to https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html to select your colours
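One way to build such a palette is sketched below. It assumes the navy, white and red gradient is made with colorRampPalette from base R's grDevices; the name crp matches the color=crp(50) argument used later, while my_colours and crp_brewer are illustrative.

```r
# colorRampPalette() returns a function that interpolates between the
# given colours; calling it with n returns n hex colour codes.
crp <- colorRampPalette(c("navy", "white", "red"))
my_colours <- crp(50)

# First few shades, starting at navy
head(my_colours, 3)

# Optional alternative (only runs if RColorBrewer is installed):
# stretch the 11-colour diverging "RdBu" palette, reversed so that
# blue is low and red is high.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  crp_brewer <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 11, name = "RdBu")))
}
```

A diverging palette with a neutral midpoint like this suits z-scores, where values sit on both sides of zero.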

# colors again! remember the clusters that we made earlier? now it's time to color them. The codes starting with # are hex color codes, which work everywhere; I love the user-friendly interface of https://www.color-hex.com/ for picking them. So tell pheatmap what colors you want for your normal and disease samples, and for your row clusters.
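A sketch of the annotation colours as a named list. The hex codes are arbitrary picks for illustration, and the list name my_colour is an assumption; the element names (sample, cluster) and category names must match the annotation data frames you built.

```r
# One element per annotation column; inside each, one hex colour per category.
my_colour <- list(
  sample  = c(Normal = "#5977ff", Disease = "#f74747"),
  cluster = c(downregulated = "#82ed82", upregulated = "#9e82ed")
)

# Check that every colour is a valid hex code before plotting.
all(grepl("^#[0-9A-Fa-f]{6}$", unlist(my_colour)))
```

This list is what pheatmap's annotation_colors argument expects.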

# this is my favourite part, the pheatmap creation itself! It's insanely customizable. Below is the code I customized for my own heatmap; feel free to get creative and explore. Check out the usage at https://www.rdocumentation.org/packages/pheatmap/versions/1.0.12/topics/pheatmap. I will describe a few of the arguments that I felt were instrumental in making my heatmap look oh so pretty

  • annotation_names_col = FALSE: I didn't like the annotation track names (sample and cluster) hanging around next to the colored bars, so I removed them with this argument (annotation_names_row does the same for the row annotation)

  • border_color="NA": For the longest time, I was wondering why my heatmap looked blocky, like a child's lego board. It took me a while to discover that just by removing the cell borders, it went from baby to elegant!! (The pheatmap documentation writes this as border_color=NA, without quotes.)

  • cutree_rows=2: As mentioned earlier during my dendrogram explanation, cutting the rows and columns into clusters allows your reader to better compartmentalize the differences in your heatmap and understand your story better!

  • angle_col=45: No reason, I just thought a tilted column name looks nicer. pheatmap only accepts a few preset angles (0, 45, 90, 270 and 315).

  • color=crp(50): crp is my color palette function, and the value indicates how many 'shades' you would like to split your data range into. 50 is a good number for me.
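Putting the pieces together, the call might look like the sketch below. It is not my exact code: the toy z-score matrix, the hex colours, clustering_method and filename are assumptions added so the example runs on its own, and the pheatmap call is skipped if the package is not installed.

```r
# Made-up z-score-like matrix: 12 genes x 15 samples (6 Normal, 9 Disease).
set.seed(1)
counts_z <- matrix(rnorm(12 * 15), nrow = 12,
                   dimnames = list(paste0("gene", 1:12),
                                   c(paste0("N", 1:6), paste0("D", 1:9))))
counts_z[1:6, 7:15]  <- counts_z[1:6, 7:15] - 3   # toy "down in Disease" genes
counts_z[7:12, 7:15] <- counts_z[7:12, 7:15] + 3  # toy "up in Disease" genes

# Column (sample) and row (gene cluster) annotations.
my_sample_col <- data.frame(sample = rep(c("Normal", "Disease"), c(6, 9)),
                            row.names = colnames(counts_z))
gene_clusters <- cutree(hclust(dist(counts_z), method = "average"), k = 2)
my_gene_col <- data.frame(
  cluster = ifelse(gene_clusters == 1, "downregulated", "upregulated"),
  row.names = rownames(counts_z)
)

# Annotation colours (arbitrary hex codes) and the heatmap palette.
my_colour <- list(
  sample  = c(Normal = "#5977ff", Disease = "#f74747"),
  cluster = c(downregulated = "#82ed82", upregulated = "#9e82ed")
)
crp <- colorRampPalette(c("navy", "white", "red"))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  out_file <- tempfile(fileext = ".pdf")   # assumption: save to a temporary PDF
  pheatmap::pheatmap(
    counts_z,
    annotation_col = my_sample_col,
    annotation_row = my_gene_col,
    annotation_colors = my_colour,
    annotation_names_col = FALSE,   # hide the annotation track name
    border_color = NA,              # no cell borders
    cutree_rows = 2,                # split rows into 2 clusters
    angle_col = 45,                 # tilt column labels
    color = crp(50),                # 50 shades from navy to red
    clustering_method = "average",  # assumption: match the hclust call above
    filename = out_file
  )
}
```

Drop the filename argument to draw the heatmap in the RStudio plot pane instead of saving it.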

# here is my heatmap! As the results have not been published yet, I have covered the sample and gene names, but hopefully this short tutorial will be helpful to those creating their very first heatmap! :)



 
 
 


©2021 by Rookie Coding with Kai Yi. Proudly created with Wix.com
