This web page was produced as an assignment for a course on Statistical Analysis on Microarray Data at Pomona College.

Interesting things about the dataset:

1) In my description of the dataset, under the problems section, I list the slides that I think turned out poorly. Below are the seven that I thought were particularly bad.

*71062 – lots of red noise in background

*71073 – only green and yellow, no red

*71056 – lots of bright red noise in background

*71814 – background nearly saturated red

*71813 – lots of red noise in background

*71817 – all green and yellow except background red noise

*71068 – bright red background

Some of these images are so distorted that I am amazed anyone can extract useful information from them. All of these slides date from 2006 and 2007, so their low quality is surprising. The spots are also much farther apart than in other datasets we have seen.

2) The chart below has one row per array, with columns representing (from left to right):

number of Red spots brighter than the Red background

number of Red spots dimmer than the Red background

array number

experiment number (the 71xxx numbers)

number of Green spots brighter than Green background

number of Green spots dimmer than Green background

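for (i in 1:34) {
  print(
    c(
      sum(full.dat$R[, i] > full.dat$Rb[, i]),  # Red spots brighter than Red background
      sum(full.dat$R[, i] < full.dat$Rb[, i]),  # Red spots dimmer than Red background
      i,                                        # array number
      dat.targets$Expt[i],                      # experiment number (71xxx)
      sum(full.dat$G[, i] > full.dat$Gb[, i]),  # Green spots brighter than Green background
      sum(full.dat$G[, i] < full.dat$Gb[, i])   # Green spots dimmer than Green background
    )
  )
}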

[1] 908 628 1 71056 1456 74
[1] 1349 182 2 71057 1527 4
[1] 1388 131 3 71059 1504 18
[1] 1337 191 4 71061 1469 47
[1] 653 873 5 71062 1534 0
[1] 1328 203 6 71063 1531 1
[1] 764 765 7 71068 1513 13
[1] 1442 82 8 71072 1532 0
[1] 1502 21 9 71073 1532 0
[1] 1511 23 10 71074 1533 1
[1] 1437 46 11 71653 1505 11
[1] 1399 75 12 71654 1521 2
[1] 1501 15 13 71655 1527 1
[1] 1484 21 14 71656 1525 1
[1] 1468 20 15 71657 1525 2
[1] 1505 6 16 71658 1521 0
[1] 1500 17 17 71808 1503 3
[1] 1517 13 18 71813 1510 7
[1] 1473 60 19 71814 1508 18
[1] 1475 41 20 71815 1498 9
[1] 1479 17 21 71816 1515 2
[1] 1491 12 22 71817 1475 37
[1] 1456 39 23 71818 1520 0
[1] 1422 58 24 71819 1505 5
[1] 1443 16 25 71820 1513 2
[1] 1421 27 26 71821 1512 1
[1] 1456 22 27 71822 1506 0
[1] 1451 52 28 71823 1508 4
[1] 1503 20 29 71824 1487 13
[1] 1359 95 30 76166 1420 29
[1] 1439 50 31 76167 1425 24
[1] 1384 99 32 76168 1435 21
[1] 1471 21 33 76169 1419 21
[1] 1478 35 34 76170 1418 28

For arrays 1, 5, and 7 (experiments 71056, 71062, and 71068), far more Red spots are dimmer than their Red background than on any other array: 628, 873, and 765 spots respectively, compared with at most 203 on every other array. This is obvious from the slide images; all three experiments appear in the list of bad slides above. NOTE: many of the "worse" data points had already been thrown out when my R code read in this data.
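The per-array loop above can also be written without a loop, because base R's `colSums()` applied to a logical matrix counts the TRUE entries in each column. The sketch below assumes `R`, `Rb`, `G`, and `Gb` are spot-by-array matrices like `full.dat$R`, `full.dat$Rb`, `full.dat$G`, and `full.dat$Gb` used above. The helper names `background_counts` and `flag_dim_red`, and the 25% cutoff used to flag arrays, are hypothetical choices for illustration, not part of the original analysis.

```r
# Hypothetical helper: for every array (column) at once, count the spots that
# are brighter or dimmer than their local background in each channel.
background_counts <- function(R, Rb, G, Gb, expt) {
  data.frame(
    red_above   = colSums(R > Rb),
    red_below   = colSums(R < Rb),
    array       = seq_len(ncol(R)),
    expt        = expt,
    green_above = colSums(G > Gb),
    green_below = colSums(G < Gb)
  )
}

# Hypothetical helper: return the arrays on which more than `cutoff` of the
# non-tied spots are dimmer than the Red background.
# The default cutoff of 0.25 is an assumption chosen only for illustration.
flag_dim_red <- function(counts, cutoff = 0.25) {
  frac <- counts$red_below / (counts$red_above + counts$red_below)
  counts$array[frac > cutoff]
}

# Tiny made-up example (4 spots x 2 arrays), not real data.
R  <- matrix(c(500, 600, 100, 700,  50, 60, 400, 30), ncol = 2)
Rb <- matrix(200, nrow = 4, ncol = 2)
G  <- matrix(300, nrow = 4, ncol = 2)
Gb <- matrix(100, nrow = 4, ncol = 2)
counts <- background_counts(R, Rb, G, Gb, expt = c(1001, 1002))
print(counts)
print(flag_dim_red(counts))
```

On the course data the call would be `background_counts(full.dat$R, full.dat$Rb, full.dat$G, full.dat$Gb, dat.targets$Expt)`. Applied to the counts in the chart above, a 25% cutoff singles out exactly arrays 1, 5, and 7, since every other array has under 14% of its spots below the Red background.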

3) The chart below compares the Red and Green intensities of each spot. Its columns represent (from left to right):

Array number

Experiment number

number of spots brighter in Red than in Green

number of spots brighter in Green than in Red

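for (i in 1:34) {
  print(
    c(
      i,                                     # array number
      dat.targets$Expt[i],                   # experiment number (71xxx)
      sum(full.dat$R[, i] > full.dat$G[, i]), # spots brighter in Red than Green
      sum(full.dat$R[, i] < full.dat$G[, i])  # spots brighter in Green than Red
    )
  )
}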

[1] 1 71056 1313 221
[1] 2 71057 1362 173
[1] 3 71059 234 1296
[1] 4 71061 581 942
[1] 5 71062 981 547
[1] 6 71063 1480 56
[1] 7 71068 1440 96
[1] 8 71072 1380 152
[1] 9 71073 1073 453
[1] 10 71074 1071 451
[1] 11 71653 1003 518
[1] 12 71654 896 627
[1] 13 71655 887 635
[1] 14 71656 873 650
[1] 15 71657 453 1065
[1] 16 71658 663 852
[1] 17 71808 298 1230
[1] 18 71813 117 1417
[1] 19 71814 242 1291
[1] 20 71815 108 1423
[1] 21 71816 341 1187
[1] 22 71817 102 1432
[1] 23 71818 425 1107
[1] 24 71819 649 868
[1] 25 71820 292 1240
[1] 26 71821 687 832
[1] 27 71822 421 1109
[1] 28 71823 501 1017
[1] 29 71824 366 1161
[1] 30 76166 227 1302
[1] 31 76167 480 1031
[1] 32 76168 406 1105
[1] 33 76169 744 754
[1] 34 76170 590 917

If the two channels were balanced, you would expect roughly equal values in the third and fourth columns: assuming most genes are not differentially expressed, about half of the spots should come out brighter in each channel. Instead, arrays 6 and 7 (experiments 71063 and 71068) have far more spots brighter in Red than in Green, and arrays 20 and 22 (experiments 71815 and 71817) have far more spots brighter in Green than in Red. In fact, the arrays on which more than 90% of spots are brighter in one color are arrays 6, 7, 8, 18, 20, and 22. NOTE: many of the "worse" data points had already been thrown out when my R code read in this data.
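The 90% criterion is a simple ratio: for each array, the larger of the two counts divided by the number of spots that differ between channels (spots with exactly equal Red and Green intensity are left out). A minimal sketch, assuming `R` and `G` are spot-by-array intensity matrices such as `full.dat$R` and `full.dat$G`; the function names `dominant_share` and `one_colour_arrays` are hypothetical.

```r
# Hypothetical helper: share of non-tied spots that are brighter in whichever
# channel "wins" more often on that array (ties, R == G, are in neither count).
dominant_share <- function(red_wins, green_wins) {
  pmax(red_wins, green_wins) / (red_wins + green_wins)
}

# Hypothetical helper: indices of arrays (columns) on which more than `cutoff`
# of the non-tied spots are brighter in the same channel.
one_colour_arrays <- function(R, G, cutoff = 0.9) {
  share <- dominant_share(colSums(R > G), colSums(R < G))
  which(share > cutoff)
}

# Array 8 (experiment 71072) from the chart: 1380 vs 152, just over 90%.
print(dominant_share(1380, 152))
# Array 2 (experiment 71057): 1362 vs 173, just under 90%.
print(dominant_share(1362, 173))

# Tiny made-up matrices (4 spots x 3 arrays), not real data:
# array 1 is all Red, array 2 is all Green, array 3 is balanced.
R_toy <- matrix(c(10, 20, 30, 40,  10, 10, 10, 10,  5, 50,  5, 50), ncol = 3)
G_toy <- matrix(c( 1,  2,  3,  4,  20, 20, 20, 20, 50,  5, 50,  5), ncol = 3)
print(one_colour_arrays(R_toy, G_toy))
```

On the course data the call would be `one_colour_arrays(full.dat$R, full.dat$G)`. Applying `dominant_share()` to the third and fourth columns of the chart above gives exactly arrays 6, 7, 8, 18, 20, and 22; array 8 only just clears the cutoff, so the list is sensitive to the choice of 90%.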

This website was designed by Austen Head.

Email: austen [dot] head [at] pomona [dot] edu

Pomona Math Department