GCAT 2004 DATA Workshop with MAGIC Tool and SAM

Assigned Reading to be Completed Prior to Workshop


The GCAT 2004 DATA workshops will focus on using DNA microarrays to analyze gene expression.  If you do not know anything about DNA microarrays, please begin by viewing the Flash animation of how an expression microarray experiment is done, found at this web site:  http://www.bio.davidson.edu/courses/genomics/chip/chip.html.  This animation is a good tool to assign to classes that are going to use or interpret microarrays, so trying it yourself is a good idea even if you are already familiar with microarrays.


The purpose of this reading assignment is to familiarize you with many of the terms and concepts commonly used in microarray papers. If possible, read the papers and answer the questions in this handout. If you are not able to do this in advance, we recommend that you do it during the workshop itself.  Additional readings will be assigned at the workshop.


Butte, A.  The use and analysis of microarray data.  Nature Reviews Drug Discovery 1:951-960 (2002).

DeRisi, J., Iyer, V., and Brown, P. O.  Exploring the metabolic and genetic control of gene expression on a global scale.  Science 278:680-686 (1997).

Tilstone, C.  Vital statistics (News Feature).  Nature 424:610-612 (2003).


Preparation and Hybridization of Microarrays

  1. What are the two major types of DNA printed onto microarrays?  Explain how they differ and how they impact what kinds of sequences can base pair with the printed DNA.



  2. A major company that prints commercial microarrays is Affymetrix.  Refer to the explanation of Affymetrix array printing in the article by Butte, and then explain why Affymetrix microarrays should be less prone to false positive results.




  3. In the DeRisi et al. article, examine the pictures in Figure 1, Figure 2, and the graph in Figure 5.
    a. Explain why some of the dots were red, some yellow, and some green.




    b. Explain why the same small sections of each of the various microarrays are compared in Figure 2.


    c. Explain how the arrays pictured could be used to obtain the data graphed in Figure 5.  In other words, what needs to happen in order to take the image you see in Figures 1 and 2 and convert it into the relative intensity represented by ‘Fold induction---fold repression’ plotted in Figure 5?  (Hint: refer to the Flash animation of microarrays at the site given above if you are unclear about what happens between the image and the quantitative data.)





    d. Look at the expression patterns of the genes that are plotted in Figure 5.  Each of the graphs B-F shows a group of genes that have the same ‘pattern of expression’.  In words, describe each individual graph’s pattern of expression.  For example, Figure 5B shows a group of genes that is constant in expression until 19 hours and then increases dramatically.





    e. Note that the individual genes represented in graph 5C (and some of the other graphs as well) are not behaving absolutely identically.  Try to define two subgroups of the genes in graph 5C.  This process illustrates the type of judgment calls that must be made in sorting genes into expression groups.





    f. In an expression file from multiple experiments such as the files we will be analyzing, each microarray is represented by only one column of data and each row contains data on one gene from multiple arrays.  Look at Table 1 in DeRisi et al. and figure out which of the columns in that table is output from an array.  Explain how it relates to what has been measured from the array.
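
The file layout described in the question above (one column per microarray, one row per gene) can be pictured with a small sketch.  The gene names, array labels, and numbers below are invented purely for illustration; they are not taken from DeRisi et al. or from the workshop files, and the helper functions `column` and `row` are hypothetical.

```python
# Hypothetical expression file: one row per gene, one column per microarray.
# All names and values are made up for illustration only.
ARRAYS = ["array_1", "array_2", "array_3"]

EXPRESSION = {
    "gene_A": [1.0, 1.1, 4.0],
    "gene_B": [1.0, 0.5, 0.25],
    "gene_C": [1.0, 1.0, 1.0],
}


def column(array_name):
    """Return every gene's value from a single microarray (one column)."""
    idx = ARRAYS.index(array_name)
    return {gene: values[idx] for gene, values in EXPRESSION.items()}


def row(gene_name):
    """Return one gene's values across all microarrays (one row)."""
    return dict(zip(ARRAYS, EXPRESSION[gene_name]))


if __name__ == "__main__":
    print(column("array_3"))  # all genes, one array
    print(row("gene_B"))      # one gene, all arrays
```

Reading down a column gives everything measured on one array; reading across a row follows a single gene through all of the experiments.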




  4. In order to compare two or more different microarrays, the investigators will usually normalize the data from each one.  Define “normalize”.






  5. What are some of the known sources of noise in DNA microarray experiments, as cited in both the Butte and Tilstone papers?






6.  What is the difference between supervised and unsupervised analysis?


7.  List the different supervised and unsupervised methods described in the Butte paper.







8.  List four aspects you need to consider when measuring gene expression, according to Butte.




9.  List two major caveats to measurements of gene expression (things to be careful to avoid) according to Butte:




Unsupervised Analysis

10.  Referring to ‘Unsupervised Analysis’ in Butte, define these terms:

a. Feature Determination –




b. Cluster Determination –



c. Network Determination –




d. Dissimilarity – Can you reword the paper’s definition of dissimilarity without using a form of the word “similarity”?




e. Clustering –



11.  Name some of the types of clustering that are being used, as cited in Butte and in Tilstone.



12.  Using the Tilstone article, identify statistical methods, other than those involved in grouping genes with similar expression patterns, that have been used to examine microarray data.


13.  From the Tilstone article, why is it considered so important to replicate microarrays?




14.  From Tilstone and Butte, what is a false positive in a set of microarray data?  Are many of these expected?




15.  What does it mean to get a ‘significant result’ from a microarray experimental analysis?