This web page was produced as an assignment for a course on Statistical Analysis on Microarray Data at Pomona College.

NOTE THAT BEFORE BEGINNING THIS SECTION I HAD TO CHANGE THE “toReadIn.txt” FILE.

I have also changed the data info part to include the change.

The only change is in the last two columns, which I titled “Cy3” and “Cy5” so that I could use the limma function modelMatrix() directly and save myself some additional code.

In this section I am trying to determine which miRNA most strongly distinguish between specific types of tissue. It might also be interesting to try to distinguish between healthy and unhealthy tissue, but I decided not to do this because one thing that I found interesting in the study is that the researchers realized that two of the samples they used had originally been misdiagnosed: an ARMS that was originally misdiagnosed as an ERMS (array 12) and a PRMS that was misdiagnosed as a GIST (array 22). For array 12, the original researchers had already corrected the misdiagnosis in the data that was posted online, so my arrays are correctly diagnosed.

There are 1536 spots on each chip. Each miRNA is printed twice per chip, and the placement is NOT random: the second print is in the same sector, at the same X Grid Coordinate within sector (Xcoord), with a Y Grid Coordinate within sector (Ycoord) that differs by exactly 3. So if two spots are in the same sector, have the same Xcoord, and have Ycoords that differ by 3, they are actually the same miRNA.
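The pairing rule above can be sketched as a small check. This is a Python illustration rather than part of the R analysis; the `Spot` type and the function name are hypothetical, and the coordinates are copied from rows of the ARMS vs ERMS table further down this page.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    spot: int    # spot number on the chip
    sector: int  # sector the spot is printed in
    x: int       # X Grid Coordinate within sector
    y: int       # Y Grid Coordinate within sector


def is_duplicate_pair(p: Spot, q: Spot) -> bool:
    """True when two spots satisfy the layout rule: same sector, same X, Y differs by 3."""
    return p.sector == q.sector and p.x == q.x and abs(p.y - q.y) == 3


# Coordinates taken from the per-spot ARMS vs ERMS table on this page.
spot_884 = Spot(884, sector=19, x=4, y=3)
spot_908 = Spot(908, sector=19, x=4, y=6)
spot_197 = Spot(197, sector=5, x=5, y=1)

print(is_duplicate_pair(spot_884, spot_908))  # True
print(is_duplicate_pair(spot_884, spot_197))  # False
```

Applied to the top-ten tables, a check like this recovers the pairs that are marked with the same letter.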

################################

To distinguish ARMS and ERMS we can look at each of the 1536 spots individually and do a t-test on each one. There are 3 ARMS arrays (11, 12, 14) and 1 ERMS array (13).

So for the t-tests on spot j (where j=1,2,…,1536) we have the hypotheses:

H0: true mean expression of the miRNA at spot j for ERMS – true mean expression of the miRNA at spot j for ARMS = 0

Ha: true mean expression of the miRNA at spot j for ERMS – true mean expression of the miRNA at spot j for ARMS ≠ 0

Since we are performing 1536 t-tests, we do not want a high alpha like 0.10, because even if every null hypothesis were true we would still expect about 154 spots (0.10 × 1536 ≈ 153.6) to come out significant by chance. So instead we use alpha=0.01 and look at the adjusted p-values of the 1536 t-tests, which gives 9 significant spots. Because each miRNA is printed twice, these 9 spots represent fewer than 9 distinct miRNA. This is still a very small number, but with so few ARMS and ERMS arrays it should not be too surprising. Below are the t-tests for the ten spots with the highest “log odds” (B values).
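The arithmetic behind this paragraph can be sketched in a few lines. This is a Python illustration, not the R code used for the analysis. It assumes the adjusted p-values come from the Benjamini-Hochberg method, which is the default in limma's topTable() but is not stated on this page. The function names and the four example p-values are made up for illustration.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of significant tests when every null hypothesis is true."""
    return n_tests * alpha


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices from smallest p to largest
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# 1536 tests at an unadjusted alpha of 0.10
print(f"{expected_false_positives(1536, 0.10):.1f}")

# Hypothetical p-values, only to show the adjustment at work
example = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bh_adjust(example)])
```

The adjustment multiplies each p-value by the number of tests divided by its rank, so a spot has to clear a much stricter bar than its raw p-value suggests.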

NOTE: the data that the researchers gathered from the arrays were generated by standardized software which automatically labeled the miRNA as “genes” instead of miRNA. Hence, below in the table, “gene” is actually miRNA.

Spot	Gene.Symbol	Gene.Name	Sequence.Type	X.Grid.Coordinate within.sector.	Y.Grid.Coordinate within.sector.	Sector	logFC	AveExpr	t	P.Value	adj.P.Val	B
884	NA	NA	ss_oligo	4	3	19	-3.40824	7.358119	-5.90269	3.13E-06	0.002443	4.582617	a
908	NA	NA	ss_oligo	4	6	19	-3.43729	7.312997	-5.80691	4.02E-06	0.002443	4.352503	a
197	NA	NA	ss_oligo	5	1	5	5.487415	8.443424	5.49623	9.04E-06	0.003554	3.60051	b
1156	NA	NA	ss_oligo	4	1	25	2.353148	8.462062	5.354102	1.17E-05	0.003554	3.351133	c
545	NA	NA	ss_oligo	1	3	12	4.429081	7.394094	5.238089	1.60E-05	0.003881	3.061908	d
194	NA	NA	ss_oligo	2	1	5	3.226374	6.517701	5.043222	3.31E-05	0.006495	2.40603
569	NA	NA	ss_oligo	1	6	12	4.278728	7.391754	4.921453	3.74E-05	0.006495	2.270308	d
1415	NA	NA	ss_oligo	7	3	30	-1.74033	6.74557	-4.72978	6.82E-05	0.009734	1.72199
221	NA	NA	ss_oligo	5	4	5	4.931103	8.429364	4.708998	7.20E-05	0.009734	1.670885	b
1180	NA	NA	ss_oligo	4	4	25	2.371478	8.385921	4.580213	9.36E-05	0.011175	1.416722	c

As I mentioned earlier, these microarrays have duplicate spots. We can see that the two spots with the highest B values are actually the same miRNA (marked a in the last column, which I added). The same is true of the third and ninth spots (b), the fourth and tenth (c), and the fifth and seventh (d). That suggests that the miRNA with both of their spots significant are useful for distinguishing between ERMS and ARMS tumors.

Unfortunately none of the rows in the data has a name or symbol. The researchers put that identifying information in a column named “Reporter Name”, which does not show up when using the standard functions in the limma package of R. I tried to redo my line of code

full.dat<-read.maimages(dat.targets$Name,source="smd",wt.fun=dat.filter)

so that it would include the reporter name, but doing that without messing up later code turned out to be much more difficult than I would have guessed. It might be easier to simply cross-reference the spot numbers (in the table above) with the reporter names from one of the datasets. Every spot on each chip is either “ss_oligo” or “unknown”, so knowing that a spot is an “ss_oligo” is not very useful.

For the miRNA that I marked a, b, and d, both spots are statistically significant at the 0.01 level and both have log odds (B) greater than 1. Their log fold changes are also large: every one of these spots has a logFC above 3 in absolute value, and since limma reports logFC on a log2 scale, that is more than an 8-fold difference between the ERMS and ARMS arrays (so there is probably enough of a difference between the means of the a, b, and d miRNA to be biologically significant). So for a, b, and d we reject the null hypothesis H0 and we also believe that they are biologically significant miRNA. This suggests that the a, b, and d miRNA can help discriminate between ERMS and ARMS tumor tissue types.
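To make the two scales concrete, here is a short Python sketch (an illustration, not part of the R analysis). It assumes the table comes from limma's topTable(), where logFC is a log2 fold change and B is a natural-log odds of differential expression. The function names are hypothetical, and the numbers are one spot from each of the a, b, and d pairs in the table above.

```python
import math


def posterior_probability(b: float) -> float:
    """Convert a natural-log odds (B) into a probability of differential expression."""
    return 1.0 / (1.0 + math.exp(-b))


def fold_change(log_fc: float) -> float:
    """Convert a log2 fold change into an unsigned fold change."""
    return 2.0 ** abs(log_fc)


# (label, logFC, B) for one spot of each pair, copied from the ARMS vs ERMS table
rows = [
    ("a", -3.40824, 4.582617),
    ("b", 5.487415, 3.60051),
    ("d", 4.429081, 3.061908),
]

for label, log_fc, b in rows:
    print(f"{label}: about {fold_change(log_fc):.1f}-fold, "
          f"probability about {posterior_probability(b):.2f}")
```

On these scales B and logFC answer different questions: B says how confident the model is that a spot differs at all, while logFC says how large the difference is.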

Volcano Plot of the spots (recall 2 spots per miRNA) comparing ARMS and ERMS:

A volcano plot gives a visualization of the important information from the table above: it plots the log odds (B) against the log fold change (logFC) for every spot on the microarray. The 10 highest points (on the Y axis) are the spots with the 10 highest B values, which are listed in the table above. I labeled the spots that I said above are significant. For example, look at the two spots labeled “a” in the table: the logFC column gives the X coordinate (about -3.4 for both) and the B column gives the Y coordinate (about 4.4 and 4.6). We expect to see very few points with a high log odds value where the log fold change is nearly 0, because a log fold change near 0 means there is little difference in expression between the two groups, so we expect the log odds to be low there.

################################

Doing nearly the same thing with PRMS and GIST, with similar t-test hypotheses (replace ERMS with PRMS and ARMS with GIST) and the same procedure, we get a table of the spots with the top ten log odds values (B values). There are 8 GIST arrays (1, 2, 5, 6, 21, 23, 24, 25) and 2 PRMS arrays (15, 16).

Spot	Gene.Symbol	Gene.Name	Sequence.Type	X.Grid.Coordinate within.sector.	Y.Grid.Coordinate within.sector.	Sector	logFC	AveExpr	t	P.Value	adj.P.Val	B
1137	NA	NA	ss_oligo	1	5	24	4.753428	8.845317	8.033913	1.22E-08	1.48E-05	9.815962	a
1113	NA	NA	ss_oligo	1	2	24	4.61066	8.841657	7.427744	5.37E-08	3.26E-05	8.43079	a
607	NA	NA	ss_oligo	7	4	13	-5.65734	11.39965	-5.92134	2.58E-06	0.000881	4.777492	b
583	NA	NA	ss_oligo	7	1	13	-5.57109	11.5787	-5.8774	2.90E-06	0.000881	4.667249	b
737	NA	NA	ss_oligo	1	3	16	-4.27701	7.475112	-5.46218	8.75E-06	0.001401	3.61841	c
761	NA	NA	ss_oligo	1	6	16	-4.37404	7.32241	-5.45809	8.85E-06	0.001401	3.608045	c
1277	NA	NA	ss_oligo	5	4	27	4.746537	12.11954	5.446828	9.12E-06	0.001401	3.579437	d
1253	NA	NA	ss_oligo	5	1	27	4.702163	12.03123	5.442931	9.22E-06	0.001401	3.569541	d
993	NA	NA	ss_oligo	1	5	21	-4.63833	11.54595	-5.09194	2.36E-05	0.003193	2.675746
545	NA	NA	ss_oligo	1	3	12	-2.84123	7.394094	-4.9079	3.88E-05	0.004422	2.206044

As with the comparison between ERMS and ARMS, we see that there are 4 miRNA for which both spots have very high log odds (marked a through d above). More of the B values are high in this comparison between GIST and PRMS than in the comparison between ERMS and ARMS. This is not too surprising, since there are many more arrays in the GIST and PRMS comparison (10) than in the ERMS and ARMS comparison (4).

For the miRNA represented by the spots that I marked a, b, c, and d, we can reject H0. Their log fold changes are also large (above 4 in absolute value for every spot, which on a log2 scale is more than a 16-fold difference), so the difference is probably biologically significant. There are probably other miRNA that are both statistically and biologically significant between PRMS and GIST, but I am only showing the top ten B values out of the 1536 spots. This suggests that the a, b, c, and d miRNA can help discriminate between the PRMS and GIST tumor tissue types.

Volcano Plot of the spots (recall 2 spots per miRNA) comparing PRMS and GIST:

I again marked a, b, c, and d on the above plot, corresponding to the miRNA mentioned above. The volcano plot shows that the two spots for each of a, b, c, and d are very close together, which is what we expect since they are the same miRNA and there are enough samples for the estimates to be relatively reliable. These also look like the most significant points, which indicates that these are probably good miRNA to help distinguish PRMS and GIST tumors.

##############################################################################################

MODIFIED APRIL 11

I have since redone this section to combine the two spots for each miRNA. I am not going to do the analysis again; I am mainly doing this to compare limma to SAM in the following section.

ARMS vs ERMS

> dat.topTable1
    ID                 logFC         t      P.Value   adj.P.Val           B
580                 2.362313  5.215328 1.722599e-05 0.006261658  2.98417074
281 ambi_miR_13237  4.353904  5.155586 2.021610e-05 0.006261658  2.83704244
101 hsa_miR_380_5p  5.209259  5.098591 2.612653e-05 0.006261658  2.61102860
255                 5.164235  4.496471 1.184716e-04 0.016822294  1.21026627
699                -6.471218 -4.541632 1.341436e-04 0.016822294  1.12733516
452 ambi_miR_10766 -3.422761 -4.433075 1.403808e-04 0.016822294  1.05417508
551 ambi_miR_10394  2.983464  4.303697 1.983493e-04 0.017607661  0.73631674
719                -1.760304 -4.264754 2.200562e-04 0.017607661  0.64085980
513                 4.889119  4.264165 2.204019e-04 0.017607661  0.63941722
185 hsa_miR_202    -5.242502 -4.195862 2.643691e-04 0.019008139  0.47228601
360 ambi_miR_10133  3.128503  4.136507 3.095481e-04 0.020233193  0.32738866
605 hsa_miR_383    -1.160560 -4.077705 3.834236e-04 0.022973467  0.14020218
206                 2.585193  3.971231 4.794928e-04 0.026519640 -0.07413443
541 hsa_miR_495     3.173087  3.903489 5.732123e-04 0.029187625 -0.23773842
561                -3.416367 -3.880511 6.089212e-04 0.029187625 -0.29308924
 61                -1.628324 -3.737273 8.861101e-04 0.039819575 -0.63630603
755                -4.862694 -3.724741 9.578497e-04 0.040511409 -0.69883851
628 hsa_miR_505     3.205249  3.595408 1.280996e-03 0.051168668 -0.97272369
438                 2.480831  3.436091 1.929671e-03 0.073022826 -1.34562404
608 hsa_miR_525     2.303134  3.416070 2.184385e-03 0.078528649 -1.44196244
 57 hsa_miR_515_3p  1.578324  3.339619 2.466984e-03 0.084464818 -1.56854578
738 hsa_miR_155     1.124161  3.256397 3.243679e-03 0.106009330 -1.80089788
179 hsa_miR_519d    2.313546  3.183628 3.653858e-03 0.112321685 -1.92378538
627 hsa_miR_10b     3.316390  3.173309 3.749263e-03 0.112321685 -1.94703911
729                 2.046726  3.174092 3.968903e-03 0.114145644 -1.98347916
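The page does not say how the two spots were combined. One quick consistency check, sketched here in Python rather than R, is to average the per-spot logFC values for the a, b, c, and d pairs from the earlier ARMS vs ERMS table and compare them with rows of dat.topTable1. The matching of letters to rows (a with row 452, b with row 101, c with row 580, d with row 281) is inferred from the numbers and is not stated on the page. The variable and function names are hypothetical.

```python
# logFC of the two duplicate spots for each marked pair (per-spot ARMS vs ERMS table)
spot_pairs = {
    "a": (-3.40824, -3.43729),
    "b": (5.487415, 4.931103),
    "c": (2.353148, 2.371478),
    "d": (4.429081, 4.278728),
}

# logFC of the rows in dat.topTable1 that appear to correspond to each pair
# (assumption: the row matching is inferred from the values)
combined = {"a": -3.422761, "b": 5.209259, "c": 2.362313, "d": 4.353904}


def mean_log_fc(pair):
    """Plain average of the duplicate-spot log fold changes."""
    return sum(pair) / len(pair)


for label, pair in spot_pairs.items():
    close = abs(mean_log_fc(pair) - combined[label]) < 1e-5
    print(label, round(mean_log_fc(pair), 6), combined[label], close)
```

The averages agree with the combined table to within rounding, which is consistent with the duplicate spots having been averaged, though it does not prove that this was the method used.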

PRMS vs GIST

> dat.topTable2
    ID                 logFC         t      P.Value    adj.P.Val          B
561                 4.682044  7.767644 2.409615e-08 1.732513e-05  9.1487042
295                -5.614216 -5.892516 2.840995e-06 1.021338e-03  4.6865056
377 hsa_miR_133b   -4.325524 -5.452137 9.145727e-06 1.703807e-03  3.5837442
629 hsa_miR_517a    4.724350  5.438734 9.478757e-06 1.703807e-03  3.5499855
489                -4.537370 -4.995085 3.109170e-05 4.470987e-03  2.4285694
281 ambi_miR_13237 -2.768114 -4.787540 5.427438e-05 6.503880e-03  1.9028109
337 CONTROL_30     -4.356353 -4.493485 1.194229e-04 1.226644e-02  1.1593276
306 hsa_miR_410    -3.908199 -4.116466 3.264630e-04 2.934086e-02  0.2136053
602                -1.359085 -3.769650 8.142714e-04 5.949799e-02 -0.6420940
185 hsa_miR_202     3.219419  3.763478 8.275103e-04 5.949799e-02 -0.6571495
198 hsa_miR_522     2.717738  3.687496 1.008807e-03 6.593931e-02 -0.8419334
370 hsa_miR_128a    2.718155  3.364578 2.315514e-03 1.387379e-01 -1.6134855
274                -1.057676 -3.332621 2.511140e-03 1.388854e-01 -1.6884511
586                -1.351366 -3.099706 4.902866e-03 2.478820e-01 -2.2584592
359 ambi_miR_3121   2.380840  3.043620 5.171391e-03 2.478820e-01 -2.3527113
392 hsa_miR_199b    1.166790  3.021171 5.914288e-03 2.531927e-01 -2.4302649
731                -3.455706 -3.027163 6.428850e-03 2.531927e-01 -2.4993519
 91 ambi_miR_9630  -0.926332 -2.956845 6.396282e-03 2.531927e-01 -2.5468041
699                 3.043015  2.947477 7.042913e-03 2.531927e-01 -2.5733034
437 hsa_miR_148a   -1.749784 -2.936506 6.720900e-03 2.531927e-01 -2.5919020
120 ambi_miR_11143 -3.151130 -2.827866 1.010229e-02 3.301613e-01 -2.8742912
588                -1.567868 -2.807455 9.173775e-03 3.140926e-01 -2.8743584
657 hsa_miR_377    -2.486712 -2.725752 1.113972e-02 3.482373e-01 -3.0497097
259 ambi_miR_11576  1.762352  2.686120 1.222980e-02 3.576332e-01 -3.1337435
492                -2.842451 -2.683084 1.253223e-02 3.576332e-01 -3.1496578

This website was designed by Austen Head.

Email: austen [dot] head [at] pomona [dot] edu

Pomona Math Department