1 Laboratory of Molecular Genetics, Department of Cell Biology, Institute for Molecular and Cellular Regulation, Gunma University, Gunma, Japan
2 Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Corporation (JST), Kawaguchi, Japan
3 Department of Diabetes and Endocrinology, Division of Molecule and Structure, Gifu University School of Medicine, Gifu, Japan
4 Laboratory of Peptide and Protein Research, Department of Molecular Physiology, Institute for Molecular and Cellular Regulation, Gunma University, Gunma, Japan
(Requests for offprints should be addressed to Yukio Horikawa, Department of Diabetes and Endocrinology, Gifu University School of Medicine, 1-1 Yanagido, Gifu-city, Gifu 500-1194, Japan; Email: yhorikaw{at}cc.gifu-u.ac.jp)
| Introduction |
|---|
Recent genome projects have shown that the human and mouse genomes contain a similar number of genes, 30 000–40 000, including ~27 000 protein-encoding transcripts with strong corroborating evidence and ~10 000 computationally derived genes with weak supporting evidence (International Human Genome Sequencing Consortium 2001, Venter et al. 2001, Mouse Genome Sequencing Consortium 2002). Comparison with the transcriptome revealed that almost all of the human genes known to be expressed have orthologues in the mouse genome. The remaining putatively novel genes were detected using computer algorithms for transcript prediction. To estimate the accuracy of such new gene detection, the gene annotations produced by the two human genome efforts were previously compared (Hogenesch et al. 2001). Surprisingly, although the two projects predicted a similar total number of genes, there was little agreement between the novel genes they predicted, suggesting that a significant fraction of tissue-restricted transcripts of novel genes remains undiscovered, possibly owing to limitations of the computational prediction methods.
As expression analysis of genes in multiple organisms becomes a major focus in the new era of biology, functional genomics will rely largely on the vast collections of partial cDNA sequences from various tissues that are deposited as expressed sequence tags (ESTs) in the public databases and have proven enormously valuable. The Endocrine Pancreas Consortium has recently constructed human and mouse cDNA libraries from endocrine pancreas under various conditions and generated over 100 000 ESTs (Bernal-Mizrachi et al. 2003). We have also collected ~20 000 ESTs from normal human pancreatic islets and islet tumors, resulting in the identification of ~3000 new genes expressed in the islets (Takeda et al. 1993, Jin et al. 2003). Such systematic sequencing efforts complement each other and should improve various methodologies, including DNA microarray technology (Scearce et al. 2002), for monitoring differential gene expression in normal and disease states. In addition, the laboratory rat is an indispensable model organism for human diseases, providing a useful tool in experimental medicine and drug discovery. As various spontaneously diabetic rats, such as the GK and OLETF rats, and the experimental streptozotocin-induced diabetic rat are widely used in pancreatic islet studies, it is important to establish an additional source of rat expressed sequences. However, although ~26 million ESTs have so far been deposited in the database (dbEST release 031105), approximately 40% of them from human and mouse, only 2.6% are from rat, and none are from pancreatic islets other than those deposited in the present study. In this study, toward elucidation of the entire transcriptome of rat pancreatic islets, we constructed two cDNA libraries, one from normal rat pancreatic islets and the other from less-differentiated RINm5F tumor cells (Gazdar et al. 1980, Philippe et al. 1987), and performed a large-scale collection of ESTs.
Since many genes are up- or down-regulated under different conditions, a collection of ESTs from these distinct cDNA sources should cover a wider spectrum of expressed genes more effectively, generating a larger pool of non-redundant sequences. In addition, since the insulin content of the less-differentiated RINm5F cells used in this study was previously found to be much lower than that of normal islets (Kayo et al. 1997), direct comparison of the expression profiles of the two cDNA libraries should facilitate the identification of genes involved in insulin synthesis and secretion as well as in β-cell differentiation and tumorigenesis.
| Materials and methods |
|---|
Pancreatic islets were prepared from male Wistar rats by collagenase digestion as described previously (Ma et al. 1996). Briefly, under pentobarbital anesthesia, the pancreas was distended by injection of 10 ml Hanks' solution containing 0.3 mg/ml collagenase (type XI, Sigma-Aldrich, St Louis, MO, USA). Islets were separated on a four-layer Ficoll (Amersham, Piscataway, NJ, USA) density gradient (27%, 23%, 20.5%, and 11% Ficoll dissolved in Hanks' solution). After centrifugation at 450 g for 15 min, pancreatic islets were concentrated at the interface between the 11% and 20.5% Ficoll layers. Islets were then hand-picked under a stereomicroscope. The purity of the islets was estimated to be ~99% by counting the cells immunoreactive to insulin and glucagon antibodies after trypsin treatment of a fraction of the collected islets. The high purity could also be confirmed by the low frequency of cDNAs for major exocrine molecules, such as α-amylase, among all the islet ESTs identified, as described below.
Large-scale cDNA sequencing
Two unidirectional cDNA libraries were constructed in the Uni-ZAP XR vector (Stratagene, La Jolla, CA, USA) using mRNAs from normal rat pancreatic islets and RINm5F cells. A large set of plasmid DNAs for sequencing was prepared as described (Takeda et al. 1993, Jin et al. 2003). Briefly, the cDNA libraries were excised en masse from the λ phage into phagemid particles using the ExAssist helper phage system (Stratagene) and subsequently introduced into E. coli SOLR cells (Stratagene) for conversion to plasmid form. Plasmid DNAs were extracted from E. coli colonies randomly selected from LB-ampicillin plates using the Biomek 2000 mini-prep system (Beckman, Fullerton, CA, USA). Single-pass sequencing from the 3'-end of the inserts was performed using a BigDye Terminator Cycle Sequencing FS Ready Reaction Kit and a DNA Sequencer model 3700 (Applied Biosystems, Foster City, CA, USA). Vector sequences were removed using Assembly LIGN software (Oxford Molecular Group PLC, Oxford, UK), and sequence quality was assessed using PE Sequencing Analysis 3.3 software (Applied Biosystems).
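The read-acceptance criteria reported under Results and discussion (sequences longer than 200 bp with less than 1% ambiguous bases) can be sketched as a simple filter. This is a hypothetical Python illustration applied to already vector-trimmed reads, not the commercial software used in this study; treating every character other than A, C, G, or T as ambiguous is an assumption, and the function and read names are invented for illustration.

```python
def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not an unambiguous A, C, G or T."""
    if not seq:
        return 1.0  # treat an empty read as entirely ambiguous
    seq = seq.upper()
    n_ambiguous = sum(1 for base in seq if base not in "ACGT")
    return n_ambiguous / len(seq)


def passes_qc(seq: str, min_length: int = 200, max_ambiguous: float = 0.01) -> bool:
    """Keep a vector-trimmed read if it is longer than min_length bp and
    fewer than max_ambiguous (as a fraction) of its bases are ambiguous."""
    return len(seq) > min_length and ambiguous_fraction(seq) < max_ambiguous


# Hypothetical reads illustrating each outcome
reads = {
    "clean_240bp": "ACGT" * 60,
    "too_short": "ACGT" * 40,        # 160 bp
    "noisy_300bp": "ACGTN" * 60,     # 20% N
    "borderline": "A" * 199 + "NN",  # 201 bp, 2 N (about 0.995%)
}
kept = [name for name, seq in reads.items() if passes_qc(seq)]
```

Here only the clean read and the borderline read survive; the strict inequalities mirror the wording "longer than" and "less than".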
Database analysis of rat pancreatic islet and RINm5F ESTs
We compared a total of ~40 000 sequences from rat pancreatic islets and RINm5F cells with non-redundant nucleotide and peptide sequences extracted in silico from databases at the National Center for Biotechnology Information (NCBI). Before the comparisons, interspersed repetitive sequences such as LINEs were masked and removed from the pool using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html). Sequences sharing a stretch of nucleotide identity were assembled into contigs using the LaboServer system (World fusion, Tokyo, Japan). Representative sequences from each contig were then subjected to BLASTN analysis for nucleotide-level homology against a merged database using the Kiroku program (World fusion). If a query sequence shared more than 95% nucleotide identity with any sequence in the database and had an alignment score of more than 400, the two were grouped together. Clones without a significant match in the nucleotide database were re-sequenced from the other end, and these sequences were compared with those in the NCBI peptide database using the BLASTX program (Altschul et al. 1997), which conceptually translates the query sequence in all six reading frames. ESTs identical or highly similar to functionally annotated genes were first classified into seven major categories according to the general functions of the encoded proteins and then further classified into subcategories according to their specific functions.
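The grouping rule above (more than 95% identity and a score above 400) can be illustrated with a minimal Python sketch that filters pairwise hits and merges sequences by single linkage using a union-find structure. This is not the LaboServer or Kiroku implementation: the `Hit` record, the function names, and the identifiers in the example are hypothetical, treating grouping as transitive is an assumption, and whether "score" means a raw or bit score is left to the caller.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    """One pairwise BLASTN-style alignment (hypothetical record layout)."""
    query: str
    subject: str
    identity: float  # percent identity over the aligned region
    score: float     # alignment score; raw vs. bit score is an assumption


def passes_threshold(hit: Hit, min_identity: float = 95.0, min_score: float = 400.0) -> bool:
    """Grouping rule from the text: identity over 95% and score over 400."""
    return hit.identity > min_identity and hit.score > min_score


class UnionFind:
    """Disjoint-set structure used here for single-linkage grouping."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_sequences(ids: Iterable[str], hits: Iterable[Hit]) -> List[Set[str]]:
    """Merge sequences linked by any qualifying hit; unmatched ids stay singletons."""
    uf = UnionFind()
    for seq_id in ids:
        uf.find(seq_id)  # register every query so that singletons are reported
    for hit in hits:
        if passes_threshold(hit):
            uf.union(hit.query, hit.subject)
    groups: Dict[str, Set[str]] = {}
    for seq_id in list(uf.parent):
        groups.setdefault(uf.find(seq_id), set()).add(seq_id)
    return list(groups.values())


# Hypothetical example: "ref_A" and "ref_B" stand in for database entries
ids = ["est1", "est2", "est3", "est4"]
hits = [
    Hit("est1", "ref_A", 98.0, 650),
    Hit("est2", "ref_A", 96.5, 420),
    Hit("est3", "ref_A", 94.0, 900),  # fails the identity cut-off
    Hit("est4", "ref_B", 99.0, 300),  # fails the score cut-off
]
groups = group_sequences(ids, hits)
```

In this toy input, est1 and est2 join ref_A in one group, while est3 and est4 remain singletons because each fails one of the two cut-offs.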
Semi-quantitative RNA expression analysis
To ascertain the mRNA expression levels of ESTs from normal rat islets and RINm5F cells, real-time quantitative reverse transcriptase (RT)-PCR was carried out using an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems). Total RNA was extracted from pooled islets isolated from normal rats and from RINm5F cells using an RNeasy Mini Preparation Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. TaqMan primers and probes were designed using Primer Express software (Applied Biosystems). TaqMan reactions were performed in a volume of 20 µl using components supplied in a TaqMan PCR reagent kit. Each reaction contained 10 µl TaqMan Universal Master Mix, 900 nM of each amplification primer, and 250 nM of the corresponding TaqMan probe. Each sample was run for an initial 2 min at 50 °C and 10 min at 95 °C, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Amplification data were collected by the 7900HT Sequence Detector and analyzed using Sequence Detection System software. Relative RNA levels were determined from the threshold cycle (Ct), the cycle at which fluorescence is first detected above background; Ct is inversely related to the logarithm of the starting RNA amount.
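Because Ct falls by about one cycle for each doubling of starting template, relative levels are commonly expressed as powers of two. The Python sketch below shows the widely used comparative Ct (2^-ΔΔCt) calculation, assuming ~100% amplification efficiency and normalization to a reference gene. The text does not state which reference gene or calculation was used, so both are assumptions, and the Ct values in the example are hypothetical.

```python
def relative_quantity(ct: float, efficiency: float = 1.0) -> float:
    """Starting template relative to a hypothetical sample with Ct = 0,
    assuming product grows by a factor (1 + efficiency) per cycle."""
    return (1.0 + efficiency) ** (-ct)


def fold_change_ddct(ct_target_a: float, ct_ref_a: float,
                     ct_target_b: float, ct_ref_b: float,
                     efficiency: float = 1.0) -> float:
    """Target level in sample A relative to sample B, each normalized to a
    reference gene (comparative Ct, i.e. 2^-ddCt when efficiency = 1.0)."""
    delta_ct_a = ct_target_a - ct_ref_a
    delta_ct_b = ct_target_b - ct_ref_b
    return (1.0 + efficiency) ** (-(delta_ct_a - delta_ct_b))


# Hypothetical Ct values: islets (A) versus RINm5F cells (B)
fold = fold_change_ddct(ct_target_a=20.0, ct_ref_a=15.0,
                        ct_target_b=25.0, ct_ref_b=15.0)
```

With these hypothetical values the normalized target reaches threshold 5 cycles earlier in islets, corresponding to a 2^5 = 32-fold higher level than in RINm5F cells.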
In situ hybridization
ESTs showing marked differences in frequency between islets and RINm5F cells were subjected to analysis of mRNA distribution in the pancreas by in situ hybridization (ISH). Paraffin-embedded blocks and sections of normal rat pancreas for ISH were obtained from GENOSTAFF, Inc. (Tokyo, Japan). The pancreases of 8-week-old male Wistar rats (CREA Japan, Inc., Tokyo, Japan) were dissected after perfusion, fixed with Tissue Fixative (GENOSTAFF, Cat. No. STF-01), and embedded in paraffin using proprietary procedures. ISH was performed with the Ventana HX system (Ventana Medical Systems, Inc., Tucson, AZ, USA). Entire EST inserts were amplified by PCR using ExTaq (TaKaRa, Kyoto, Japan) in a 50 µl reaction mixture with M13 forward and reverse primers. Amplification was performed as follows: initial denaturation at 94 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 60 °C for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. The quality and quantity of the purified PCR product were confirmed by 1.2% agarose gel electrophoresis. Antisense and sense RNA probes were labeled using the T7/T3 digoxigenin RNA labeling kit (Roche Diagnostics, Indianapolis, IN, USA) according to the manufacturer's instructions. Sections were pre-treated and hybridized with a Ventana RiboMap kit (Ventana Medical Systems) on the automated Ventana HX System Discovery. Hybrids were detected with a digoxigenin nucleic acid detection kit (Boehringer Mannheim, Germany) following the manufacturer's instructions. Sections were then dehydrated through an ethanol series (80, 90, and 100% ethanol, 1 min each) and washed for 1 min in xylene before mounting in Malinol mounting medium (Muto Pure Chemicals Ltd, Tokyo, Japan).
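As a worked check of the insert-amplification program above, the nominal hold time can be computed by encoding each step as data. This hypothetical Python sketch sums hold times only; it ignores ramping between temperatures and instrument overhead, so a real run will take longer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One hold in a thermocycler program."""
    temp_c: float
    seconds: int


def hold_time_minutes(initial: List[Step], cycle: List[Step],
                      n_cycles: int, final: List[Step]) -> float:
    """Sum of hold times only; ramping between temperatures is ignored."""
    total = sum(step.seconds for step in initial)
    total += n_cycles * sum(step.seconds for step in cycle)
    total += sum(step.seconds for step in final)
    return total / 60


# Insert-amplification program from the text
initial = [Step(94, 180)]                           # initial denaturation
cycle = [Step(94, 30), Step(60, 30), Step(72, 60)]  # denature, anneal, extend
final = [Step(72, 600)]                             # final extension
minutes = hold_time_minutes(initial, cycle, 35, final)
```

The holds add up to 3 + 35 × 2 + 10 = 83 min.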
| Results and discussion |
|---|
Large-scale collection of ESTs from rat pancreatic islet and RINm5F cells
A total of 40 710 clones randomly selected from the two cDNA libraries were partially sequenced from the 3'-end. Sequences longer than 200 bp containing less than 1% ambiguous bases were subjected to BLASTN database search. Contaminating sequences, namely genomic repetitive sequences (1967 clones) and mitochondrial DNA (4633 clones), were removed from the pool, resulting in a collection of 34 110 ESTs comprising 22 310 known and 11 800 unknown sequences. Our previous study of 1000 ESTs from human pancreatic islets, whose purity was estimated to be ~90% by microscopic examination and protein analysis, identified 13, 12, 6, and 9 ESTs for the major exocrine genes α-amylase, elastase, pancreatic lipase, and trypsinogen respectively (Takeda et al. 1993). Since only 2, 8, 2, and 5 clones for these exocrine genes respectively were found among the ~20 000 islet EST sequences of this study, contamination by exocrine cells appears to be negligible (~0.1%). This estimate of purity is consistent with the immunostaining-based estimate described in Materials and methods. Because large-scale sequencing of randomly isolated clones generates high redundancy, clustering analysis was performed to assemble the sequences into non-redundant sequence groups. A total of 6030 and 6260 independent groups were obtained from pancreatic islets and RINm5F cells respectively, and the pattern of redundancy was similar between the two sources (Tables 1 and 2). Together, 10 406 non-redundant sequences, comprising 4859 clusters and 5547 singletons, were obtained, representing 4078 known genes and 6328 unknown genes. Since only 1896 distinct genes (18%) were shared between the normal islets and RINm5F cells, large-scale sequencing of two distinct cDNA sources was quite effective in identifying a larger number of non-redundant sequences. Studies of the number of different mRNA sequences in a cell suggest that a typical higher eukaryotic cell synthesizes 10 000 to 20 000 different proteins (Alberts et al. 1994), so this approach covered at least 50% of the possible protein-coding genes. As in similar large-scale cDNA sequencing studies of other tissues, about 50% of the clones obtained were derived from genes that are not functionally annotated. These unknown clones were re-sequenced from the 5'-end, and the 5512 clones sequenced successfully were also subjected to database search. As a result, 1404 sequences representing 502 distinct transcripts showed perfect identity to known genes, indicating that the 3'-end sequences of these clones are not contained in the cDNA sequences deposited in the nucleotide databases. These ESTs were then reassigned to the known group. All representative EST sequences obtained from each cluster were deposited in the public database and are freely available to all researchers (DDBJ accession Nos BP464981–BP504629).
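The counts quoted in this section can be cross-checked with a few lines of arithmetic. In this Python sketch the percentage for the present study uses the approximate figure of 20 000 islet ESTs, and the coverage figure treats every non-redundant sequence as a distinct gene and uses the upper estimate of 20 000 proteins per cell; both are simplifying assumptions.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Clone accounting: remove repetitive (1967) and mitochondrial (4633) clones
total_clones = 40_710
retained = total_clones - 1_967 - 4_633
assert retained == 22_310 + 11_800 == 34_110  # known + unknown ESTs

# Non-redundant set: clusters + singletons equals known + unknown genes
non_redundant = 4_859 + 5_547
assert non_redundant == 4_078 + 6_328 == 10_406

# Exocrine marker clones (alpha-amylase, elastase, lipase, trypsinogen)
previous_pct = percent(13 + 12 + 6 + 9, 1_000)  # earlier human islet library
current_pct = percent(2 + 8 + 2 + 5, 20_000)    # ~20 000 islet ESTs, this study

shared_pct = percent(1_896, non_redundant)      # genes found in both libraries
coverage_pct = percent(non_redundant, 20_000)   # vs. upper estimate of 20 000 proteins

print(f"retained ESTs: {retained}")
print(f"exocrine clones: {previous_pct:.1f}% before, {current_pct:.3f}% now")
print(f"shared genes: {shared_pct:.1f}%; coverage: {coverage_pct:.0f}%")
```

The exocrine fraction drops from 4% in the earlier human library to below 0.1% here, and 10 406 sequences against 20 000 possible proteins gives the ~50% coverage quoted above.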
The ESTs showing identity or high similarity to known genes were classified into seven major categories on the basis of the putative general functions of the encoded proteins, as described previously (categories: cell division, cell signaling/communication, cell structure/motility, cell/organism defense, gene/protein expression, metabolism, and unclassified). In total, 3951 of the 4078 known genes were represented in the classified data set (online supplement). The largest category was gene/protein expression (26.4%), followed by cell signaling/communication (19.0%), metabolism (16.8%), cell structure/motility (7.7%), cell/organism defense (7.5%), and cell division (5.6%). ESTs lacking sufficient information to be classified made up the remainder, unclassified (16.9%). To further analyze the molecular complexity, each major category was subdivided according to the putative specific functions of the proteins (Table 3; also see online supplement). For example, the largest category, gene/protein expression, was subdivided into eight subgroups. Of these, the transcription factor subgroup contained the largest number of non-redundant genes (416 genes represented by 1209 ESTs). These transcription factors include PDX-1, BETA2/NeuroD, HNF-4α, Nkx-2.2, Nkx-6.1, and Isl-1, all of which are important for pancreatic development and islet-specific functions; the first three are the causal genes for the monogenic forms of diabetes MODY4, MODY6, and MODY1 respectively (Fajans et al. 2001). The other transcription factor genes are also plausible candidates for diabetogenes or for genes responsible for β-cell-specific functions.
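The two-level classification (major category, then subcategory) and counts of the form "416 genes represented by 1209 ESTs" can be tallied from per-EST annotations. This Python sketch is illustrative only: the record layout, the function names, and the gene and subcategory names in the example are hypothetical placeholders, not data from this study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (gene_id, major_category, subcategory) per EST


def tally_subcategories(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(category, subcategory): (distinct genes, ESTs)}."""
    genes: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    ests: Counter = Counter()
    for gene_id, category, subcategory in records:
        key = (category, subcategory)
        genes[key].add(gene_id)
        ests[key] += 1
    return {key: (len(genes[key]), ests[key]) for key in ests}


def category_shares(records: Iterable[Record]) -> Dict[str, float]:
    """Percentage of distinct genes falling in each major category."""
    category_of: Dict[str, str] = {}
    for gene_id, category, _ in records:
        category_of[gene_id] = category
    counts = Counter(category_of.values())
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical per-EST annotations (placeholder names, not study data)
records = [
    ("gene_a", "gene/protein expression", "transcription factor"),
    ("gene_a", "gene/protein expression", "transcription factor"),
    ("gene_b", "gene/protein expression", "transcription factor"),
    ("gene_c", "metabolism", "subcategory_x"),
]
by_subcategory = tally_subcategories(records)
shares = category_shares(records)
```

Counting distinct genes separately from ESTs keeps highly redundant transcripts from inflating a subcategory's share.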
Characterization of differentially expressed genes
The immunoreactive insulin (IRI) content of normal rat β-cells has been reported to be ~8000 pmol/10⁶ cells, whereas the RINm5F cell line used in this study contains a very low level of IRI (0.43 pmol/10⁶ cells) and far fewer secretory granules (Kayo et al. 1997). Accordingly, the expression levels of genes involved in insulin synthesis and secretion in RINm5F cells should differ markedly from those in normal β-cells. In addition, since RINm5F cells were derived from a radiation-induced tumor and are less well differentiated (Gazdar et al. 1980, Philippe et al. 1987, Kayo et al. 1997), the expression levels of genes involved in cell differentiation and tumorigenesis may also be altered. The relative frequencies of ESTs have been shown to reflect the average expression levels of the corresponding mRNAs in the tissues examined (Lee et al. 1995). As pancreatic islet cells are mostly β-cells, the expression profiles of insulin-related genes in the two cDNA sources (of the same size) can be compared to identify differentially expressed genes. The EST frequencies of most housekeeping genes were similar in the two cDNA libraries, suggesting that such a comparison of EST frequencies is reasonable. Differences of more than 2-fold in frequency between the two libraries were found for 204 genes (among genes with more than 10 ESTs in the library with the higher frequency). The direction of change in mRNA level inferred from EST frequencies agreed closely with that from the TaqMan semi-quantitative analysis, although the magnitudes of change were not correlated. Representative results for some of the ESTs (those with more than 15 ESTs in the library with the higher frequency) are shown in Fig. 1. Previously, a similar comparative analysis was performed using ~6000 ESTs from PC-12 cells under two different conditions (Lee et al. 1995). That study found the ratio of EST frequencies between the two cDNA sources to be correlated with Northern blot analysis, except for low-frequency ESTs. Thus, genes of interest, including those expressed at moderate or higher levels, should also be examined by semi-quantitative analysis before further study.
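The frequency comparison described above can be sketched as follows: normalize each gene's EST count by library size, compute the ratio, and keep genes with more than a 2-fold difference whose higher raw count exceeds 10. This is a hypothetical Python illustration. The 0.5 pseudocount used to avoid division by zero for genes absent from one library, and this reading of the count cut-off, are assumptions, and the gene names and counts in the example are hypothetical.

```python
from typing import Dict


def compare_libraries(counts_a: Dict[str, int], counts_b: Dict[str, int],
                      size_a: int, size_b: int, min_fold: float = 2.0,
                      min_higher_count: int = 10,
                      pseudocount: float = 0.5) -> Dict[str, float]:
    """Return {gene: frequency ratio A/B} for genes differing more than
    min_fold between libraries, considering only genes whose higher raw
    EST count exceeds min_higher_count."""
    flagged: Dict[str, float] = {}
    for gene in sorted(set(counts_a) | set(counts_b)):
        a, b = counts_a.get(gene, 0), counts_b.get(gene, 0)
        if max(a, b) <= min_higher_count:
            continue  # too few ESTs for a meaningful frequency comparison
        # Normalize by library size; the pseudocount avoids division by zero
        ratio = ((a + pseudocount) / size_a) / ((b + pseudocount) / size_b)
        if ratio > min_fold or ratio < 1.0 / min_fold:
            flagged[gene] = ratio
    return flagged


# Hypothetical EST counts for two libraries of 20 000 ESTs each
islet = {"gene_1": 120, "gene_2": 40, "gene_3": 4}
rinm5f = {"gene_1": 10, "gene_2": 38, "gene_4": 25}
differential = compare_libraries(islet, rinm5f, 20_000, 20_000)
```

In this toy input, gene_1 (higher in islets) and gene_4 (detected only in RINm5F) are flagged; gene_2 is similar in both libraries, and gene_3 has too few ESTs to compare. As the text notes, such count-based calls should be confirmed by semi-quantitative RT-PCR.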
In this study, we describe a collection of ESTs from 40 710 rat pancreatic islet-related cDNA clones, representing 10 406 different transcripts. This is the first report of a systematic collection of rat expressed genes from pancreatic islets and a β-cell line. Since DNA microarray technology relies largely on the rapid growth of EST databases, these newly identified expressed genes should facilitate the analysis of differential gene expression in pancreatic islets under various conditions. At present, the PanChip microarray, prepared using 3400 cDNA sequences from mouse whole pancreas, is the only tissue-specific microarray available for islet studies (Scearce et al. 2002). Accordingly, the establishment of islet-specific DNA microarrays for the rat should be especially important for analyzing the transcriptome of diabetic rats such as GK and OLETF, and this work is presently under way in our laboratory. Another advantage of a large-scale collection of EST clones is that the cDNA fragments obtained can be used as hybridization probes for Northern blotting or in situ hybridization to analyze the size and number of alternatively spliced transcripts and their local tissue distribution. In situ hybridization can also address the possibility that a small fraction of the ESTs obtained derive from contaminating cell types such as endothelial or blood cells. Indeed, our preliminary trial of non-isotopic in situ hybridization using rat ESTs proved quite effective for analyzing mRNA expression in pancreatic islets, and a large-scale in situ hybridization study of rat islet mRNAs is also in progress in our laboratory.
Functional analysis of the wide spectrum of islet-specific genes, and of genes highly or weakly expressed in islets, identified by this approach may clarify the molecular mechanisms underlying islet cell differentiation, tumorigenesis, and the pathogenesis of diabetes, and may lead to new therapies for improving and regenerating β-cell function through manipulation of gene expression and gene products.
In addition, as sequencing of the Brown Norway rat genome has recently been completed (Rat Genome Sequencing Project Consortium 2004), the results of this study should help annotate the genes actually expressed from the rat genome and provide further insight into the evolution of mammalian genes involved in the tissue specificity of the endocrine pancreas.
| References |
|---|
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W & Lipman DJ 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 3389–3402.
Bernal-Mizrachi E, Cras-Meneur C, Ohsugi M & Permutt MA 2003 Gene expression profiling in islet biology and diabetes research. Diabetes/Metabolism Research and Reviews 19 32–42.
Bonner-Weir S & Sharma A 2002 Pancreatic stem cells. Journal of Pathology 197 519–526.
Bradshaw AD, Graves DC, Motamed K & Sage EH 2003 SPARC-null mice exhibit increased adiposity without significant differences in overall body weight. PNAS 100 6045–6050.
Claesson L, Larhammar D, Rask L & Peterson PA 1983 cDNA clone for the human invariant gamma chain of class II histocompatibility antigens and its implications for the protein structure. PNAS 80 7395–7399.
Dor Y, Brown J, Martinez OI & Melton DA 2004 Adult pancreatic β-cells are formed by self-duplication rather than stem-cell differentiation. Nature 429 41–46.
Edlund H 2002 Pancreatic organogenesis: developmental mechanisms and implications for therapy. Nature Reviews Genetics 3 524–532.
Fajans SS, Bell GI & Polonsky KS 2001 Molecular mechanisms and clinical pathophysiology of maturity-onset diabetes of the young. New England Journal of Medicine 345 971–980.
Gazdar AF, Chick WL, Oie HK, Sims HL, King DL, Weir GC & Lauris V 1980 Continuous, clonal, insulin- and somatostatin-secreting cell lines established from a transplantable rat islet cell tumor. PNAS 77 3519–3523.
Hirayama I, Tamemoto H, Yokota H, Kubo SK, Wang J, Kuwano H, Nagamachi Y, Takeuchi T & Izumi T 1999 Insulin receptor-related receptor is expressed in pancreatic beta-cells and stimulates tyrosine phosphorylation of insulin receptor substrate-1 and -2. Diabetes 48 1237–1244.
Hogenesch JB, Ching KA, Batalov S, Su AI, Walker JR, Zhou Y, Kay SA, Schultz PG & Cooke MP 2001 A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell 106 413–415.
International Human Genome Sequencing Consortium 2001 Initial sequencing and analysis of the human genome. Nature 409 860–921.
Jin L, Wang H, Narita T, Kikuno R, Ohara O, Shihara N, Nishigori T, Horikawa Y & Takeda J 2003 Expression profile of mRNAs from human pancreatic islet tumors. Journal of Molecular Endocrinology 31 519–528.
Kayo T, Sawada Y, Suda M, Konda Y, Izumi T, Tanaka S, Shibata H & Takeuchi T 1997 Proprotein-processing endoprotease furin controls growth of pancreatic β-cells. Diabetes 46 1296–1304.
Kitamura T, Kido Y, Nef S, Merenmies J, Parada LF & Accili D 2001 Preserved pancreatic β-cell development and function in mice lacking the insulin receptor-related receptor. Molecular and Cellular Biology 21 5624–5630.
Lee NH, Weinstock KG, Kirkness EF, Earle-Hughes JA, Fuldner RA, Marmaros S, Glodek A, Gocayne JD, Adams MD, Kerlavage AR, Fraser CM & Venter JC 1995 Comparative expressed-sequence-tag analysis of differential gene expression profiles in PC-12 cells before and after nerve growth factor treatment. PNAS 92 8303–8307.
Ma H-T, Kato M & Tatemoto K 1996 Effects of pancreastatin and somatostatin on secretagogues-induced rise in intracellular free calcium in single rat pancreatic islet cells. Regulatory Peptides 61 143–148.
Mouse Genome Sequencing Consortium 2002 Initial sequencing and comparative analysis of the mouse genome. Nature 420 520–562.
Philippe J, Chick WL & Habener JF 1987 Multipotential phenotypic expression of genes encoding peptide hormones in rat insulinoma cell lines. Journal of Clinical Investigation 79 351–358.
Rat Genome Sequencing Project Consortium 2004 Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428 493–521.
Scearce LM, Brestelli JE, McWeeney SK, Lee CS, Mazzarelli J, Pinney DF, Pizarro A, Stoeckert CJ Jr, Clifton SW, Permutt MA, Brown J, Melton DA & Kaestner KH 2002 Functional genomics of the endocrine pancreas. The pancreas clone set and PanChip, new resources for diabetes research. Diabetes 51 1997–2004.
Scholler N, Fu N, Yang Y, Ye Z, Goodman GE, Hellstrom KE & Hellstrom I 1999 Soluble member(s) of the mesothelin/megakaryocyte potentiating factor family are detectable in sera from patients with ovarian carcinoma. PNAS 96 11531–11536.
Takeda J, Yano H, Eng S & Bell GI 1993 A molecular inventory of human pancreatic islets: sequence analysis of 1000 cDNA clones. Human Molecular Genetics 2 1793–1798.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. 2001 The sequence of the human genome. Science 291 1304–1351.
Received 17 March 2005
Accepted 29 April 2005
Made available online as an Accepted Preprint 16 May 2005