This web page was produced as an assignment for a course on Statistical Analysis on Microarray Data at Pomona College.

 

Interesting things about the dataset:

 

1) In my description of the dataset, under the Problems section, I list the slides that I think turned out poorly. Below are the 7 that I thought were particularly bad.

*71062 – lots of red noise in background

*71073 – only green and yellow, no red

*71056 – lots of bright red noise in background

*71814 – background nearly saturated red

*71813 – lots of red noise in background

*71817 – all green and yellow except background red noise

*71068 – bright red background

Some of these images are so messed up or distorted that I am amazed anyone can get useful information from them. All of these slides are from 2006 and 2007, so it is surprising that the quality is so low. Also, the spots are much farther apart than in the other datasets we've seen.

 

 

2) Below is a chart with one row per array. From left to right, the columns represent:

number of Red spots brighter than the Red background

number of Red spots dimmer than the Red background

array number

experiment number (the 71xxx numbers)

number of Green spots brighter than Green background

number of Green spots dimmer than Green background

 

 

> # one row per array: R > Rb, R < Rb, array, experiment, G > Gb, G < Gb
> for(i in 1:34){
+      print(
+        c(
+          sum(full.dat$R[,i]>full.dat$Rb[,i]),
+          sum(full.dat$R[,i]<full.dat$Rb[,i]),
+          i,
+          dat.targets$Expt[i],
+          sum(full.dat$G[,i]>full.dat$Gb[,i]),
+          sum(full.dat$G[,i]<full.dat$Gb[,i])
+         )
+      )
+ }
[1]   908   628     1 71056  1456    74
[1]  1349   182     2 71057  1527     4
[1]  1388   131     3 71059  1504    18
[1]  1337   191     4 71061  1469    47
[1]   653   873     5 71062  1534     0
[1]  1328   203     6 71063  1531     1
[1]   764   765     7 71068  1513    13
[1]  1442    82     8 71072  1532     0
[1]  1502    21     9 71073  1532     0
[1]  1511    23    10 71074  1533     1
[1]  1437    46    11 71653  1505    11
[1]  1399    75    12 71654  1521     2
[1]  1501    15    13 71655  1527     1
[1]  1484    21    14 71656  1525     1
[1]  1468    20    15 71657  1525     2
[1]  1505     6    16 71658  1521     0
[1]  1500    17    17 71808  1503     3
[1]  1517    13    18 71813  1510     7
[1]  1473    60    19 71814  1508    18
[1]  1475    41    20 71815  1498     9
[1]  1479    17    21 71816  1515     2
[1]  1491    12    22 71817  1475    37
[1]  1456    39    23 71818  1520     0
[1]  1422    58    24 71819  1505     5
[1]  1443    16    25 71820  1513     2
[1]  1421    27    26 71821  1512     1
[1]  1456    22    27 71822  1506     0
[1]  1451    52    28 71823  1508     4
[1]  1503    20    29 71824  1487    13
[1]  1359    95    30 76166  1420    29
[1]  1439    50    31 76167  1425    24
[1]  1384    99    32 76168  1435    21
[1]  1471    21    33 76169  1419    21
[1]  1478    35    34 76170  1418    28

> 
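The same counts can be produced without an explicit loop. This is a minimal sketch, assuming (as the `$R`, `$Rb`, `$G`, `$Gb` fields suggest, in the style of a limma RGList) that each of these is a spots-by-arrays matrix. Comparing two matrices elementwise gives a logical matrix, and `colSums()` counts the TRUE values in each column, i.e. per array. The tiny matrices and experiment IDs below are hypothetical stand-ins, not values from this dataset.

```r
# Hypothetical toy data: 4 spots (rows) x 3 arrays (columns).
# In the real analysis these would be full.dat$R, full.dat$Rb, full.dat$G, full.dat$Gb.
R  <- matrix(c(500,  80, 300, 120,
                60,  70, 400,  90,
               900, 850,  30,  40), nrow = 4)
Rb <- matrix(100, nrow = 4, ncol = 3)
G  <- matrix(c(200, 150,  90, 300,
               250, 260, 270, 280,
                50, 500, 600, 700), nrow = 4)
Gb <- matrix(100, nrow = 4, ncol = 3)
expt <- c(1001, 1002, 1003)  # hypothetical experiment IDs

# Elementwise comparison gives a logical matrix; colSums() counts TRUEs per array.
# Spots where foreground equals background fall in neither count.
counts <- data.frame(
  R.bright = colSums(R > Rb),
  R.dim    = colSums(R < Rb),
  array    = seq_len(ncol(R)),
  expt     = expt,
  G.bright = colSums(G > Gb),
  G.dim    = colSums(G < Gb)
)
print(counts)
```

With the real objects the call would presumably look like `colSums(full.dat$R > full.dat$Rb)`, provided the matrices contain no missing values (otherwise add `na.rm = TRUE`).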

 

For arrays 1, 5, and 7 (experiments 71056, 71062, and 71068), a very large number of red spots are dimmer than their red background: roughly 41% to 57% of the compared spots, versus about 13% or less on every other array. This is obvious from looking at those slides. NOTE: a lot of the “worse” data had already been thrown out by my R code before these counts were made.
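One way to turn this observation into a rule is to compute, for each array, the fraction of spots that are dimmer than their red background and flag the arrays above a cutoff. The sketch below uses the Red counts copied from the chart above. The cutoff of 0.25 is a hypothetical, arbitrary choice for illustration, not part of the original analysis. Spots whose foreground exactly equals the background appear in neither count, so the fraction is taken over the compared spots only.

```r
# Red-channel counts copied from the chart above (arrays 1 to 34):
# spots with R > Rb and spots with R < Rb.
r.bright <- c( 908, 1349, 1388, 1337,  653, 1328,  764, 1442, 1502, 1511,
              1437, 1399, 1501, 1484, 1468, 1505, 1500, 1517, 1473, 1475,
              1479, 1491, 1456, 1422, 1443, 1421, 1456, 1451, 1503, 1359,
              1439, 1384, 1471, 1478)
r.dim    <- c( 628,  182,  131,  191,  873,  203,  765,   82,   21,   23,
                46,   75,   15,   21,   20,    6,   17,   13,   60,   41,
                17,   12,   39,   58,   16,   27,   22,   52,   20,   95,
                50,   99,   21,   35)

# Fraction of compared spots (ties excluded) that are dimmer than background.
frac.dim <- r.dim / (r.bright + r.dim)

# Hypothetical cutoff, chosen only for illustration.
cutoff <- 0.25
flagged <- which(frac.dim > cutoff)

print(flagged)
print(round(frac.dim[flagged], 3))
```

With this cutoff the flagged arrays are 1, 5, and 7; the next-highest array (6) sits at about 0.13, so any cutoff between roughly 0.14 and 0.40 picks out the same three.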

 

 

3) Below is another chart, this one comparing the Red and Green intensities of the spots with each other. The columns represent:

Array number

Experiment number

number of spots with Red brighter than Green

number of spots with Green brighter than Red

 

> # one row per array: array, experiment, R > G, R < G
> for(i in 1:34){
+      print(
+        c(
+          i,
+          dat.targets$Expt[i],
+          sum(full.dat$R[,i]>full.dat$G[,i]),
+          sum(full.dat$R[,i]<full.dat$G[,i])
+         )
+      )
+ }
[1]     1 71056  1313   221
[1]     2 71057  1362   173
[1]     3 71059   234  1296
[1]     4 71061   581   942
[1]     5 71062   981   547
[1]     6 71063  1480    56
[1]     7 71068  1440    96
[1]     8 71072  1380   152
[1]     9 71073  1073   453
[1]    10 71074  1071   451
[1]    11 71653  1003   518
[1]    12 71654   896   627
[1]    13 71655   887   635
[1]    14 71656   873   650
[1]    15 71657   453  1065
[1]    16 71658   663   852
[1]    17 71808   298  1230
[1]    18 71813   117  1417
[1]    19 71814   242  1291
[1]    20 71815   108  1423
[1]    21 71816   341  1187
[1]    22 71817   102  1432
[1]    23 71818   425  1107
[1]    24 71819   649   868
[1]    25 71820   292  1240
[1]    26 71821   687   832
[1]    27 71822   421  1109
[1]    28 71823   501  1017
[1]    29 71824   366  1161
[1]    30 76166   227  1302
[1]    31 76167   480  1031
[1]    32 76168   406  1105
[1]    33 76169   744   754
[1]    34 76170   590   917

> 

 

If the two channels were balanced, you would expect roughly equal values in the third and fourth columns, i.e. about as many spots brighter in Red as brighter in Green. But arrays 6 and 7 (experiments 71063 and 71068) have many more spots brighter in Red than in Green, and arrays 20 and 22 (experiments 71815 and 71817) have many more spots brighter in Green than in Red. In fact, the arrays on which more than 90% of the spots favor one color are arrays 6, 7, 8, 18, 20, and 22. NOTE: a lot of the “worse” data had already been thrown out by my R code before these counts were made.
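The 90% check can be reproduced directly from the counts in the chart above. This sketch computes, for each array, the share of compared spots that fall on the more common side (Red brighter or Green brighter); spots where R equals G are in neither column and so are excluded, just as in the chart.

```r
# Counts copied from the chart above (arrays 1 to 34):
# spots with R > G and spots with R < G.
r.gt.g <- c(1313, 1362,  234,  581,  981, 1480, 1440, 1380, 1073, 1071,
            1003,  896,  887,  873,  453,  663,  298,  117,  242,  108,
             341,  102,  425,  649,  292,  687,  421,  501,  366,  227,
             480,  406,  744,  590)
g.gt.r <- c( 221,  173, 1296,  942,  547,   56,   96,  152,  453,  451,
             518,  627,  635,  650, 1065,  852, 1230, 1417, 1291, 1423,
            1187, 1432, 1107,  868, 1240,  832, 1109, 1017, 1161, 1302,
            1031, 1105,  754,  917)

# Share of compared spots belonging to the more common color, and which color it is.
dominant <- pmax(r.gt.g, g.gt.r) / (r.gt.g + g.gt.r)
color    <- ifelse(r.gt.g > g.gt.r, "Red", "Green")

# Arrays where more than 90% of the compared spots favor one color.
lopsided <- which(dominant > 0.9)
print(data.frame(array   = lopsided,
                 color   = color[lopsided],
                 percent = round(100 * dominant[lopsided], 1)))
```

This returns arrays 6, 7, and 8 (mostly Red) and 18, 20, and 22 (mostly Green). Array 8 only just clears the threshold, at about 90.1%.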

 

 

 

 

 

 

This website was designed by Austen Head.

Email:     austen [dot] head [at] pomona [dot] edu

Pomona Math Department