This web page was produced as an assignment for a course on
Statistical Analysis on Microarray Data at
Interesting things about the dataset:
1) In the problems section of my dataset description, I list the slides that I think turned out poorly. Below are the 7 that I thought were particularly bad. (If the links do not work correctly, go here and select “display data”; the links below should then work.)
*71062 – lots of red noise in background
*71073 – only green and yellow, no red
*71056 – lots of bright red noise in background
*71814 – background nearly saturated red
*71813 – lots of red noise in background
*71817 – all green and yellow except background red noise
*71068 – bright red background
Some of these images are so noisy or distorted that I am amazed anyone can get useful information from them. All of these slides are from 2006 and 2007, so it is surprising that the quality is so low. The spots are also spaced much farther apart than in the other datasets we have seen.
2) Below I have made a chart with columns representing (from left to right):
number of Red spots brighter than the Red background
number of Red spots dimmer than the Red background
array number
experiment number (the 71xxx numbers)
number of Green spots brighter than Green background
number of Green spots dimmer than Green background
> for(i in 1:34){
+   print(
+     c(
+       sum(full.dat$R[,i]>full.dat$Rb[,i]),
+       sum(full.dat$R[,i]<full.dat$Rb[,i]),
+       i,
+       dat.targets$Expt[i],
+       sum(full.dat$G[,i]>full.dat$Gb[,i]),
+       sum(full.dat$G[,i]<full.dat$Gb[,i])
+     )
+   )
+ }
[1]   908   628     1 71056  1456    74
[1]  1349   182     2 71057  1527     4
[1]  1388   131     3 71059  1504    18
[1]  1337   191     4 71061  1469    47
[1]   653   873     5 71062  1534     0
[1]  1328   203     6 71063  1531     1
[1]   764   765     7 71068  1513    13
[1]  1442    82     8 71072  1532     0
[1]  1502    21     9 71073  1532     0
[1]  1511    23    10 71074  1533     1
[1]  1437    46    11 71653  1505    11
[1]  1399    75    12 71654  1521     2
[1]  1501    15    13 71655  1527     1
[1]  1484    21    14 71656  1525     1
[1]  1468    20    15 71657  1525     2
[1]  1505     6    16 71658  1521     0
[1]  1500    17    17 71808  1503     3
[1]  1517    13    18 71813  1510     7
[1]  1473    60    19 71814  1508    18
[1]  1475    41    20 71815  1498     9
[1]  1479    17    21 71816  1515     2
[1]  1491    12    22 71817  1475    37
[1]  1456    39    23 71818  1520     0
[1]  1422    58    24 71819  1505     5
[1]  1443    16    25 71820  1513     2
[1]  1421    27    26 71821  1512     1
[1]  1456    22    27 71822  1506     0
[1]  1451    52    28 71823  1508     4
[1]  1503    20    29 71824  1487    13
[1]  1359    95    30 76166  1420    29
[1]  1439    50    31 76167  1425    24
[1]  1384    99    32 76168  1435    21
[1]  1471    21    33 76169  1419    21
[1]  1478    35    34 76170  1418    28
For arrays 1, 5, and 7 (experiments 71056, 71062, 71068), a very large number of red spots are dimmer than the red background: 628, 873, and 765 spots respectively. For arrays 5 and 7 that is more than the number of red spots brighter than the background. The problem is obvious if you click on the links to those experiments. Note: my R code had already thrown out a lot of “worse” data before these counts were made.
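The same per-array counts can be computed without an explicit loop. The sketch below assumes `full.dat` is a list of spot-by-array matrices named `R`, `Rb`, `G`, and `Gb`, as the loop above suggests. Since the real data are not reproduced here, it uses a small made-up stand-in called `toy.dat` (4 spots, 3 arrays). The 50% cutoff for flagging an array is also an assumption chosen for illustration, not a threshold from the analysis.

```r
# Hypothetical stand-in for full.dat: 4 spots (rows) x 3 arrays (columns).
# All intensities are invented for illustration; matrix() fills by column.
toy.dat <- list(
  R  = matrix(c(500, 120, 300,  90,
                 80,  60, 400,  70,
                900, 800, 750, 100), nrow = 4),
  Rb = matrix(100, nrow = 4, ncol = 3),
  G  = matrix(c(200, 300,  50, 400,
                150, 250, 350, 450,
                 20,  30, 500, 600), nrow = 4),
  Gb = matrix(100, nrow = 4, ncol = 3)
)

# colSums() on a logical matrix counts the TRUE values in each column,
# which gives one count per array.
counts <- data.frame(
  array    = seq_len(ncol(toy.dat$R)),
  R_bright = colSums(toy.dat$R > toy.dat$Rb),
  R_dim    = colSums(toy.dat$R < toy.dat$Rb),
  G_bright = colSums(toy.dat$G > toy.dat$Gb),
  G_dim    = colSums(toy.dat$G < toy.dat$Gb)
)

# Fraction of red spots below background, out of all spots on the array.
# Spots exactly equal to background count as neither bright nor dim.
counts$R_dim_frac <- counts$R_dim / nrow(toy.dat$R)

# Assumed cutoff: flag arrays where at least half the red spots are dim.
flagged <- counts$array[counts$R_dim_frac >= 0.5]

print(counts)
print(flagged)
```

Swapping `toy.dat` for the real `full.dat` should give the same six count columns as the chart above, just in a different column order and collected in one data frame.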
3) Below is another chart, this one comparing the Red and Green intensities to each other. Columns represent:
Array number
Experiment number
Red spots brighter than Green spots
Green spots brighter than Red spots
> for(i in 1:34){
+   print(
+     c(
+       i,
+       dat.targets$Expt[i],
+       sum(full.dat$R[,i]>full.dat$G[,i]),
+       sum(full.dat$R[,i]<full.dat$G[,i])
+     )
+   )
+ }
[1]     1 71056  1313   221
[1]     2 71057  1362   173
[1]     3 71059   234  1296
[1]     4 71061   581   942
[1]     5 71062   981   547
[1]     6 71063  1480    56
[1]     7 71068  1440    96
[1]     8 71072  1380   152
[1]     9 71073  1073   453
[1]    10 71074  1071   451
[1]    11 71653  1003   518
[1]    12 71654   896   627
[1]    13 71655   887   635
[1]    14 71656   873   650
[1]    15 71657   453  1065
[1]    16 71658   663   852
[1]    17 71808   298  1230
[1]    18 71813   117  1417
[1]    19 71814   242  1291
[1]    20 71815   108  1423
[1]    21 71816   341  1187
[1]    22 71817   102  1432
[1]    23 71818   425  1107
[1]    24 71819   649   868
[1]    25 71820   292  1240
[1]    26 71821   687   832
[1]    27 71822   421  1109
[1]    28 71823   501  1017
[1]    29 71824   366  1161
[1]    30 76166   227  1302
[1]    31 76167   480  1031
[1]    32 76168   406  1105
[1]    33 76169   744   754
[1]    34 76170   590   917
You would expect to see approximately equal values in the third and fourth columns, since that would mean red is brighter than green about as often as green is brighter than red. But arrays 6 and 7 (experiments 71063 and 71068) have many more spots where Red is brighter than Green, and arrays 20 and 22 (experiments 71815 and 71817) have many more spots where Green is brighter than Red. In fact, the arrays with more than 90% of spots favoring one color are arrays 6, 7, 8, 18, 20, and 22. Note: my R code had already thrown out a lot of “worse” data before these counts were made.
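The 90% claim can be checked directly from the counts printed above. The sketch below copies the third and fourth columns of that chart into two vectors (the names `red_gt_green` and `green_gt_red` are mine) and computes the share of spots where Red is brighter. It assumes the percentage is taken over spots where the two channels differ, so ties are left out of the denominator.

```r
# Third and fourth columns of the chart above, in array order 1..34.
red_gt_green <- c(1313, 1362,  234,  581,  981, 1480, 1440, 1380, 1073, 1071,
                  1003,  896,  887,  873,  453,  663,  298,  117,  242,  108,
                   341,  102,  425,  649,  292,  687,  421,  501,  366,  227,
                   480,  406,  744,  590)
green_gt_red <- c( 221,  173, 1296,  942,  547,   56,   96,  152,  453,  451,
                   518,  627,  635,  650, 1065,  852, 1230, 1417, 1291, 1423,
                  1187, 1432, 1107,  868, 1240,  832, 1109, 1017, 1161, 1302,
                  1031, 1105,  754,  917)

# Share of spots where Red is brighter, among spots where the channels differ
# (assumption: ties are excluded from the denominator).
red_frac <- red_gt_green / (red_gt_green + green_gt_red)

# Arrays where more than 90% of spots favor a single color.
mostly_red   <- which(red_frac > 0.9)
mostly_green <- which(red_frac < 0.1)

print(mostly_red)
print(mostly_green)
```

Under that assumption, `mostly_red` contains arrays 6, 7, and 8 and `mostly_green` contains arrays 18, 20, and 22, which matches the list in the text. Array 8 only just clears the cutoff (1380 of 1532 spots), so a different denominator could drop it.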
This website was designed by Austen Head.
Email: austen [dot] head [at]