This web page was produced as an assignment for a course on Statistical Analysis on Microarray Data at Pomona College.

Normalizing (etc.) the data:

The arrays contained 668 probes spotted in duplicate: 328 known Human miRNAs, 113 Mouse miRNAs, 45 Rat miRNAs, 154 predicted Human miRNAs, and 28 control probes.

The authors used GenePix to measure the fluorescence.

“miRNA arrays were normalized and data was uploaded to the Stanford Microarray Database (http://genome-www5.stanford.edu/). To limit the measurement errors, only miRNA spots with a ratio of signal over background of at least 2.5 in either Cy3 or Cy5 channels were included. Further miRNA spots were filtered based on those where expression levels differed by at least fourfold in at least three arrays. Finally miRNA spots with >80% good data were selected. A total of 87 miRNAs passed the filtering criteria and were used for further analysis.”
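To make the three quoted filtering steps concrete, here is a minimal Python sketch. It is neither the authors' code nor my R code. The function names and the example intensities are hypothetical, and the reading of "differed by at least fourfold" as a fourfold difference between the Cy5 and Cy3 signals on the same array is my assumption, since the quote does not say what the difference is measured against.

```python
import math

MIN_SIGNAL_TO_BACKGROUND = 2.5   # "ratio of signal over background of at least 2.5"
MIN_FOLD = 4.0                   # "at least fourfold"
MIN_ARRAYS_WITH_FOLD = 3         # "in at least three arrays"
MIN_GOOD_FRACTION = 0.8          # ">80% good data"


def spot_is_good(cy3, cy3_bg, cy5, cy5_bg):
    """Keep a spot if either channel is at least 2.5 times its background."""
    return (cy3 >= MIN_SIGNAL_TO_BACKGROUND * cy3_bg
            or cy5 >= MIN_SIGNAL_TO_BACKGROUND * cy5_bg)


def is_fourfold(cy3, cy5):
    """ASSUMPTION: 'fourfold' is read as |log2(Cy5 / Cy3)| >= log2(4)."""
    if cy3 <= 0 or cy5 <= 0:
        return False  # a log ratio is undefined for non-positive signals
    return abs(math.log2(cy5 / cy3)) >= math.log2(MIN_FOLD)


def mirna_passes(spots):
    """spots: one (cy3, cy3_bg, cy5, cy5_bg) tuple per array for a single miRNA."""
    good = [s for s in spots if spot_is_good(*s)]
    n_fourfold = sum(is_fourfold(s[0], s[2]) for s in good)
    if n_fourfold < MIN_ARRAYS_WITH_FOLD:
        return False
    # ASSUMPTION: "good data" means the fraction of arrays whose spot passed
    # the signal-over-background check.
    return len(good) / len(spots) > MIN_GOOD_FRACTION


# Hypothetical intensities for one miRNA across five arrays (not real data).
example = [
    (100, 20, 500, 20),
    (100, 20, 600, 20),
    (800, 20, 100, 20),
    (100, 20, 120, 20),
    (100, 20, 110, 20),
]
print(mirna_passes(example))
```

In this made-up example every spot clears the background check and three of the five arrays show at least a fourfold difference, so the miRNA would be kept.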

With that quote from their paper in mind, review my page of interesting information about the data. The authors kept a spot as long as at least one of the two channels had a signal-over-background ratio of at least 2.5. Look at dataset 71814: the Green signal is clearly more than 2.5 times the Green background, but the Red channel looks almost useless.
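A small Python sketch shows why a spot like this worries me. The intensities below are hypothetical (they are not taken from dataset 71814), and using the background-subtracted log ratio is my assumption about how such a spot would be used downstream.

```python
import math

THRESHOLD = 2.5  # signal-over-background cutoff quoted from the paper


def log2_ratio(green, green_bg, red, red_bg):
    """log2 of background-subtracted Red over Green (both differences must be positive)."""
    return math.log2((red - red_bg) / (green - green_bg))


# Hypothetical intensities: strong Green, Red barely above its background.
green, green_bg = 5000.0, 200.0
red, red_bg = 150.0, 140.0

green_ok = green >= THRESHOLD * green_bg
red_ok = red >= THRESHOLD * red_bg
kept = green_ok or red_ok  # the "either channel" rule from the quote

base = log2_ratio(green, green_bg, red, red_bg)
nudged = log2_ratio(green, green_bg, red + 10.0, red_bg)
print(kept, red_ok, round(nudged - base, 3))
```

The spot is kept because Green passes, even though Red fails the check. Raising Red by only 10 intensity units doubles its background-subtracted signal, which shifts the log ratio by a full unit, so the ratio for such a spot is driven mostly by noise in the Red channel.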

Here is my R code, which filters and normalizes the data in a different way. Presumably the authors are significantly more familiar with microarrays than I am, and their filtering method makes sense for their data. Because I am trying to learn how to analyze their data (not to publish it), my filtering methods are based on my notions, as a statistician, of what nice or reliable data looks like.

This website was designed by Austen Head.

Email: austen [dot] head [at] pomona [dot] edu

Pomona Math Department