Academia.eduAcademia.edu
© Kamla-Raj 2008 Int J Hum Genet, 8(1-2): 97-118 (2008) Genetic Imprints of Pleistocene Origin of Indian Populations: A Comprehensive Phylogeographic Sketch of Indian Y-Chromosomes R. Trivedi1, Sanghamitra Sahoo1, 2, Anamika Singh1, G. Hima Bindu1, Jheelam Banerjee1, Manuj Tandon1, Sonali Gaikwad1, Revathi Rajkumar1, T. Sitalaximi1, Richa Ashma1, G.B.N. Chainy3 and V. K. Kashyap*1,2 1. National DNA Analysis Centre, Central Forensic Science Laboratory, 30, Gorachand Road, Kolkata 700 014, West Bengal, India 2. National Institute of Biologicals, A-32, Sector 62, Institutional Area, Noida 201307, Uttar Pradesh, India 3. PG Department of Biotechnology,Vani Vihar, Utkal University, Bhubaneswar 751 004, Orissa, India KEYWORDS Population genetics, people of india, linguistic groups, migration ABSTRACT Paleoanthropological evidence indicates that modern humans reached South Asia in one of the first dispersals out of Africa, which were later followed by migrations from different parts of the world. The variation of 20 microsatellite and 38 binary polymorphisms on the non-recombining part of the uniparental, hapliod Y-chromosome was examined in 1434 male individual of 87 different populations of India to investigate various hypothesis of migration and peopling of South Asia Sub-continent. This study revealed a total of 24 paternal lineages, of which haplogroups H, R1a1, O2a and R2 portrayed for approximately 70% of the Indian Y-Chromosomes. The high NRY diversity value (0.893) and coalescence age of approx. 45-50 KYA for H and C haplogroups signified an early settlement of the subcontinent by modern humans. Haplogroup frequency and AMOVA results provide similar evidence in support of a common Pleistocene origin of Indian populations, with partial influence of Indo-European gene pool on the Indian society. The differential Y-chromosome and mt DNA pattern in the two Austric speakers of India signaled that an earlier male–mediated exodus from South East Asia largely involved the Austro-Asiatic tribes, while the Tibeto-Burman males migrated with females through two different routes; one from Burma most likely brought the Naga-Kuki-Chin language and O3e Y-chromosomes and the other from Himalayas, which carried the YAP lineages into northern regions of subcontinent. Based on distribution of Y-chromosome haplogroups (H, C, O2a, and R2) and deep coalescing time depths for these paternal lineages, we propose that the present day Dravidian speaking populations of South India are the descendants of earliest Pleistocene settlers while Austro-Asiatic speakers came from SE Asia in a later migration event. INTRODUCTION The origins of modern humans in South Asia have been obscure. Archeological and paleoanthropological evidences are few and fragmentary. Human remains dating back to the Late Pleistocene provide limited but conclusive evidence for early human occupation in the Indian subcontinent (Deraniyagala 1992; Kennedy 2000; James et al. 2005). A number of artefacts of Middle and Upper Paleolithic cultures in Narmada Valley and the remains of Acheulian culture have been extensively found through out South Asia. Mesolithic microliths and evidences of Neolithic settlements found in diverse parts of the * Corresponding Author: Dr. V K Kashyap, Director National Institute of Biologicals, A-32, Sector 62, Institutional Area, NOIDA 201307, India. Telephone: +91-120-2400027, Fax: +91-120-2403014 E-mail: vkk2k@hotmail.com subcontinent also testify towards occupation of India by early humans (Misra 2001). Most of the prevailing genetic records further corroborate with the hypothesis that Homo sapiens colonized South Asia as a part of an early southern dispersal from Africa (Quintana-Murci et al. 1999; Cann 2001; Macaulay et al. 2005). This paper examines the current genetic diversity of Indian Y chromosomes in context to place the genetic origin/(s) and time of settlement of the earliest human populations in India. The present-day populations of India belong to 4635 endogamous communities (Singh 1998) and speak as many as 350 living languages (ethnologue), which fall under the four major supra-language families, i.e., Indo-European, Dravidian, Sino-Tibetan and Austric. The nature of extensive diversity among varied groups reported with 54 classical markers showed a typically north-south geographic division of 98 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. populations and placed Indians closer to European populations than either with eastAsians or Africans in the genetic distance trees (Cavalli- Sforza et al. 1994). A number of studies based on mt DNA, Y-chromosome and other nuclear DNA markers have invariably supported these observations. Numerous surveys of genetic variation have generally portrayed the differences between caste and tribes, and the extent of gene flow among ranked caste clusters. Most of the studies conclude that maternal gene pool of Indian populations are proto-Asian in origin with limited west-Eurasian admixture. While the Ychromo-somes of the caste populations were found to be more similar to Europeans than Asians; with greater west-Eurasian admixture in castes of higher rank (Bamshad et al. 2001), recent studies provide congruent evidence against any major influx of Indo-European speakers into the Indian gene pool and have ascertained a late Pleistocene South Asian origin for majority of Indian populations (Sahoo et al. 2006; Sengupta 2006). These new findings are consistent with archeobotanical evidences (Fuller 2003) and linguistic data (Renfrew 1989) which suggest a recent common root for Elamite and Dravidic languages. It is hypothesized that the same prehistoric gene pool of southern Asian Pleistocene coastal settlers from Africa provided inocula for both Indian castes and tribes, and subsequent diversification of the gene pools was probably due to the genetic imprints laid down by later migrants, such as Huns, Greeks, Kushans, Moghuls, and others (Kivisild et al. 2003). However, much speculation remains about which of the population groups are amongst the earliest settlers of the Indian subcontinent. While the Austro-Asiatic tribes have been presumed to be descendants of the early modern humans based on nucleotide diversity of mitochondrial M haplogroup (Roychoudhary et al. 2001; Basu et al. 2003), analysis of the Indian Y chromosomes undertaken in this study depicts a different scenario. In this study, we have assessed a total of 1434 unrelated male individuals belonging to 87 different Indian populations, of which 936 Ychromosomes have been previously analysed for 38 Y-SNP markers (Sahoo et al. 2006). Additional 216 samples are included in the present analysis, while Y-chromosomal haplogroup data for 282 additional samples from seven other Indian populations were collated from the literature (Kivisild et al. 2003; Cordaux et al. 2004). The present study is based on simultaneous analysis of 38 SNP and 20-STR markers on the Ychromosome to provide the age estimates and describe their phylogeographic distribution. Apart from determining antiquity of various populations groups in South Asia, we also discuss the genetic structure and peopling of the subcontinent in light of present molecular evidences. SUBJECTS AND METHODS Populations Analyzed A total of 1152 unrelated male individuals belonging to 80 different populations were analyzed in the present study. Samples include populations from various linguistic families (IndoEuropean, Austro-Asiatic, Dravidian and TibetoBurman) and sixteen geographical areas of India. Blood samples were collected with informed consent using a protocol approved by the ethical committee of CFSL, Kolkata. DNA was extracted using standard protocols (Sambrook 1989) from peripheral blood lymphocytes. Information concerning their geographic origin, linguistic and socio-ethnic affiliation for each population is given in table 1. Additional data on 282 samples from seven Indian populations (Punjab, Konkanstha Brahmin, Koya, Yerava, Mullukunan, Kuruchian, Koraga) and from 76 populations of Western Europe (51), Russia (281), Middle East (102), Caucasus (122), Central Asia (584), Siberia (66), North East Asia (334), South East Asia (552), Oceania (225), Pakistan (691) and Sri Lanka (39) were collated from literature and included in the genetic distance analysis. Markers Analyzed 38 binary polymorphisms included in the present analysis have been previously described (Sahoo et al. 2006). Analysis of 20 Y-STRs (twelve tetranucleotide repeats DYS19, DYS385a/b, DYS 389I/II, DYS390, DYS391, DYS393, DYS460, H4, DYS437 and DYS439 and three trinucleotide repeats, DYS392, DYS388 and DYS426, two dinucleotide YCAIIa/b, two pentanucleotide repeat loci, DYS438 and DYS447 and a hexa-repeat nucleotide DYS448) was carried out in the same DNA samples by using an in-house standardized protocol using primers described elsewhere (Butler et al. 2002). Y-STR haplotypes were 99 EARLIEST SETTLERS OF INDIAN SUBCONTINENT constructed in a sequential order of loci keeping an ascending numerical order for the minimal haplotype to facilitate Y-chromosome comparisons with other world populations. Statistical Analyses Several population genetic parameters, including mean haplogroup and haplotype diversity and their standard errors, mean number of pair wise differences (MPD), pairwise FST values for haplogroups and associated p values were calculated using ARLEQUIN ver.2.0 software package (Schneider et al. 2000). To test for differences in the proportions, χ 2- test for significance was employed. Apportionment of genetic differences among various socio-ethnic, geographic and linguistic groups at different levels of hierarchical subdivisions; between individuals within populations, between populations within groups and between groups of populations were calculated using analysis of molecular variance (AMOVA) (Excoffier et al. 1992). To examine the factor/(s) responsible for genetic differentiation of Y-chromosomes, AMOVA was done both on binary markers as well as with Y-STRs within the lineages. Significance levels of the genetic variance components as well as ÖST values were estimated by using 10000 iterations. Median-joining network algorithm of haplogroup associated haplotypes (Bandelt et al., 1999; Forster et al., 2000) was performed using the software NETWORK 4.1.0.8 version (Life Sciences and Engineering Technology Solutions Web site), with epsilon value set to zero. For network calculation, seven Y-STR (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) loci were used, where weightage to each locus was given according to the estimated variance. Y-STR loci with highest variance was given the lowest weights. To estimate the time to the most recent common ancestor (TMRCA), we calculated the ages to STR variation within the correponding haplogroup observed in the Indian populations using the average square difference (ASD) method. We used the same seven Y-STRs as those used in Network analysis and and a generation time of 25 years and mutation rate of 6.9 X 10-4 as described by Zhivotovsky et al. (2004). Neighbor-joining tree based on FST values of 87 Indian populations were used to illustrate the genetic affinity between the studied groups using MEGA 3.0. Genetic relationship between populations of India and other parts of the world was estimated based on pairwise genetic distances (FST values) calculated from haplogroup frequencies. Multidimensional scaling (MDS) analysis of pairwise FST values was performed using XL STAT pro 7.5 to decipher the genetic affinities of populations. The Indian populations were pooled into their regional boundaries for comparison of genetic similarity with world populations, to obtain a better resolution in the MDS plot. RESULTS Approximately 1152 individuals belonging to 80 extant human populations from 16 geographical regions of India were analysed with 38 Y-SNPs and 20 Y-STR markers to evaluate the possibility that Austro-Asiatic speakers are the earliest settlers of the Indian subcontinent. 24 different paternal lineages were observed, out of which, haplogroups H-M69, R1a1-M17, R2-M124 and O2a-M95 together account for 69% of the paternal diversity in South Asia. Another 20.9% of the genetic variation in Indian males is described by haplogroups L-M11, J2-M172, O3eM134, K2-M70, F-M89 and C-RPS4Y711, while the presence of other haplogroups- R1b3-M269 and G-M201 could be attributed to recent admixture with Europeans. Haplogroups and Extent of Y-Chromosome Diversity in Indians The Y-SNPs used in this study were based on previous reports of polymorphisms in Eurasian and Oceanic populations. Overall haplogroup diversity among Indians was relatively high when compared to European or East Asian populations. Indian populations depicted diversity values from 0.133 to a high of 0.914, with Austro-Asiatic and Tibeto-Burman tribes generally showing reduced diversity (Table 1). Twenty-five Dravidian populations showed a higher mean haplogroup diversity (0.723± 0.083) compared to IndoEuropean speakers (0.684± 0.079) represented by thirty endogamous groups. South Indian groups; Andhra Brahmins, Kallar, Raju, Chenchu and Lambadi displayed high lineage diversity values, while populations of North India typically demonstrated lower mean haplogroup diversity Code State Region Language Social Status Hierarchy Haplogroup Diversity ANDHRABRAHMIN CHENCHU KAMMA CHAUDHARY KAPPU NAIDU KOMATI LAMBADI NAIKPOD GOND RAJU REDDY YERUKULA ADI PASI BIHARBRAHMIN BHUMIHAR RAJPUT KAYASTHA YADAV KURMI BANIYA GUJ PATEL HP RAJPUT OROAN HO BHUMIJ KHARIA MUNDA BIRHOR SANTHAL IYENGAR LINGAYAT GOWDA BHOVI CHRISTIAN MUSLIM KURUVA DESASTH BRAHMIN CHITPAVAN BRAHMIN MARATHA DHANGAR PAWARA KATKARI MADIA GOND ANB CHU KMC KPN KOM LMD NPG RAJ RDY YER ADI BBH BHU BRJ BKY YAV KUI BAN PAT HRJ ORO HO BHJ KRA MUN BIR SAN IYN LYN GOW BHV CHR MUS KUR DSB CHB M AT DGR PWR KTK MGD ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ANDHRA PRADESH ARUNACHAL PRADESH BIHAR BIHAR BIHAR BIHAR BIHAR BIHAR BIHAR GUJRAT HP JHARKHAND JHARKHAND JHARKHAND JHARKHAND JHARKHAND JHARKHAND JHARKHAND KARNATAKA KARNATAKA KARNATAKA KARNATAKA KARNATAKA KARNATAKA KARNATAKA MAHARASTRA MAHARASTRA MAHARASTRA MAHARASTRA MAHARASTRA MAHARASTRA MAHARASTRA South South South South South South South South South South North-East East East East East East East East West North East East East East East East East South South South South South South South West West West West West West West Dravidian Dravidian Dravidian Dravidian Dravidian Indo-European Dravidian Dravidian Dravidian Dravidian Tibeto-Burman Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Dravidian Austro-Asiatic Austro-Asiatic Austro-Asiatic Austro-Asiatic Austro-Asiatic Austro-Asiatic Dravidian Dravidian Dravidian Dravidian Dravidian Dravidian Dravidian Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Dravidian CASTE TRIBE CASTE CASTE CASTE TRIBE TRIBE CASTE CASTE TRIBE TRIBE CASTE CASTE CASTE CASTE CASTE CASTE CASTE CASTE CASTE TRIBE TRIBE TRIBE TRIBE TRIBE TRIBE TRIBE CASTE CASTE CASTE CASTE CASTE CASTE TRIBE CASTE CASTE CASTE CASTE TRIBE TRIBE TRIBE Upper 0.8538 0.8474 0.6433 0.5737 0.5000 0.8789 0.7421 0.8596 0.8022 0.6316 0.3556 0.4510 0.6368 0.4394 0.7363 0.6786 0.8590 0.7273 0.7778 0.9143 0.4091 0.0000 0.4571 0.5333 0.1429 0.1333 0.0000 0.7485 0.8333 0.9524 0.7524 0.8333 0.8333 0.7436 0.8421 0.9048 0.8000 0.8167 0.7417 0.8246 0.6264 Lower Lower Lower Middle Middle Upper Middle Upper Middle Lower Lower Lower Middle Upper Upper Upper Lower Lower Lower Lower Upper Upper Middle Lower ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± 0.0504 0.0437 0.1078 0.1213 0.1222 0.0432 0.0584 0.0393 0.0687 0.0875 0.1591 0.1174 0.1151 0.1581 0.0748 0.1220 0.0633 0.0679 0.1100 0.0425 0.1333 0.0000 0.1406 0.1801 0.1188 0.1123 0.0000 0.0610 0.0691 0.0955 0.0918 0.0597 0.2224 0.0909 0.0657 0.0405 0.0681 0.0571 0.1053 0.0648 0.1098 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 Population 100 Table 1: Description of the Indian populations included in this study Code State Region Language Social Status 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 MAHADEO KOLI MARA HMAR LAI LUSEI KUKI MANIPURI MUSLIM ORIYABRAHMIN KARAN KHANDAYAT GOPE PAROJA JUANG SAORA NEPALI MKL MRA HMR LAI LUS KUK MMS OBH KRN KDY GPE PRJ JUN SAR NEP MAHARASTRA MIZORAM MIZORAM MIZORAM MIZORAM MIZORAM MIZORAM ORISSA ORISSA ORISSA ORISSA ORISSA ORISSA ORISSA SIKKIM West North-East North-East North-East North-East North-East North-East East East East East East East East North-East TRIBE TRIBE TRIBE TRIBE TRIBE TRIBE CASTE CASTE CASTE CASTE CASTE TRIBE TRIBE TRIBE CASTE 57 58 59 60 61 62 63 64 65 66 67 68 BHUTIA CHAKKLIAR KALLAR VANNIYAR PALLAR GOUNDER IRULAR KANYAKUBJ BRAHMIN UP JAT UP THAKUR KHATRI BHOKSHA BHT CHK KAL VAN PAL GOU IRU KKB UPJ UPT KHT BKS SIKKIM TAMIL NADU TAMIL NADU TAMIL NADU TAMIL NADU TAMIL NADU TAMIL NADU UTTAR PRADESH UTTAR PRADESH UTTAR PRADESH UTTAR PRADESH UTTAR PRADESH North-East South South South South South South North North North North North 69 70 UP KURMI THARU UPK THR UTTAR PRADESH UTTAR PRADESH North North 71 JAUNSARI JUS UTTAR PRADESH North 72 73 74 75 76 77 78 79 80 MAHISHIYA NAMASUDRA BAURI MAHELI KARMALI KORA LODHA EZHAVA HINDU NAIR MSY NMS BAU MHL KRM KOR LOD EZH NAR WEST BENGAL WEST BENGAL WEST BENGAL WEST BENGAL WEST BENGAL WEST BENGAL WEST BENGAL KERALA KERALA East East East East East East East South South Indo-European Tibeto-Burman Tibeto-Burman Tibeto-Burman Tibeto-Burman Tibeto-Burman Tibeto-Burman Indo-European Indo-European Indo-European Indo-European Dravidian Austro-Asiatic Austro-Asiatic Tibeto-Burman/ Indo-European Tibeto-Burman Dravidian Dravidian Dravidian Dravidian Dravidian Dravidian Indo-European Indo-European Indo-European Indo-European Tibeto-Burman/ Indo-European Indo-European Tibeto-Burman/ Indo-European Tibeto-Burman/ Indo-European Indo-European Indo-European Indo-European Austro-Asiatic Austro-Asiatic Indo-European Austro-Asiatic Dravidian Dravidian TRIBE CASTE CASTE CASTE CASTE CASTE TRIBE CASTE CASTE CASTE CASTE TRIBE CASTE TRIBE Hierarchy Lower Upper Middle Middle Lower Upper Lower Middle Middle Lower Upper Upper Upper Upper Middle Lower TRIBE CASTE CASTE CASTE TRIBE TRIBE TRIBE TRIBE CASTE CASTE Middle Lower Lower Lower Upper Haplogroup Diversity 0.7636 0.5333 0.2789 0.5455 0.4094 0.6818 0.8333 0.8043 0.6471 0.7564 0.8333 0.8667 0.0000 0.6784 0.9048 ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± 0.0833 0.0515 0.1235 0.0615 0.1002 0.0910 0.0980 0.0697 0.0953 0.0974 0.0720 0.0483 0.0000 0.0884 0.1033 0.5000 0.7912 0.8788 0.8333 0.7000 0.7124 0.7576 0.6909 0.0000 0.5357 0.2857 0.6222 ± ± ± ± ± ± ± ± ± ± ± ± 0.2652 0.0673 0.0751 0.0597 0.0896 0.0650 0.1221 0.1276 0.0000 0.1232 0.1964 0.1383 0.0000 ± 0.0000 0.8000 ± 0.1721 0.7333 ± 0.1552 0.8684 0.9000 0.6526 0.8211 0.2924 0.7579 0.8421 0.8056 0.0000 0.0489 0.0355 0.0648 0.0586 0.1274 0.0495 0.0595 0.0889 0.0000 ± ± ± ± ± ± ± ± ± 101 Population EARLIEST SETTLERS OF INDIAN SUBCONTINENT Table 1: Contd.... 102 Table 2a: Comprehensive haplogroup frequency data among linguistic, geographic and social categories of India Sample Size Sample Size TOTAL INDIA 1152 Language INDO-EUROPEAN 518 DRAVIDIAN 393 AUSTRO-ASIATIC 140 TIBETO-BURMAN 101 Geography NORTH 180 WEST 135 EAST 357 NORTH-EAST 108 SOUTH 372 Social Hierarchy UPPER CASTE 211 MIDDLE CASTE 175 LOWER CASTE 261 TRIBES 505 C D F* G H* H1 H2 J2* K* K2 L L1 M N O* 0.014 0.004 0.030 0.001 0.069 0.159 0.002 0.051 0.038 0.031 0.045 0.010 0.000 0.000 0.003 0.012 0.020 0.014 0.000 0.004 0.000 0.000 0.030 0.027 0.048 0.014 0.000 0.002 0.000 0.000 0.000 0.079 0.089 0.021 0.010 0.183 0.209 0.043 0.000 0.002 0.003 0.000 0.000 0.058 0.056 0.050 0.000 0.029 0.041 0.043 0.069 0.033 0.043 0.014 0.000 0.035 0.084 0.007 0.000 0.002 0.028 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.007 0.020 0.000 0.037 0.011 0.000 0.019 0.006 0.000 0.000 0.037 0.000 0.011 0.007 0.048 0.000 0.040 0.006 0.000 0.000 0.000 0.000 0.106 0.081 0.056 0.009 0.078 0.139 0.356 0.106 0.000 0.194 0.000 0.007 0.000 0.000 0.003 0.078 0.081 0.036 0.000 0.056 0.000 0.000 0.053 0.083 0.043 0.000 0.000 0.048 0.000 0.051 0.017 0.096 0.020 0.000 0.078 0.000 0.000 0.003 0.000 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.003 0.019 0.000 0.009 0.006 0.008 0.022 0.005 0.000 0.000 0.008 0.019 0.051 0.046 0.020 0.000 0.000 0.000 0.002 0.043 0.040 0.107 0.071 0.185 0.171 0.169 0.139 0.005 0.000 0.000 0.002 0.100 0.097 0.031 0.026 0.024 0.040 0.050 0.038 0.000 0.017 0.046 0.042 0.095 0.034 0.054 0.024 0.019 0.023 0.000 0.008 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.006 O2a O2a1 O3 O3e P* R* R1 R1a R1a1 R1b3 R2 0.149 0.001 0.001 0.026 0.027 0.010 0.011 0.002 0.175 0.005 0.135 0.010 0.023 0.729 0.554 0.000 0.000 0.000 0.010 0.002 0.000 0.000 0.000 0.006 0.000 0.000 0.267 0.039 0.008 0.036 0.030 0.021 0.000 0.000 0.000 0.008 0.023 0.000 0.000 0.004 0.000 0.000 0.000 0.297 0.117 0.007 0.010 0.012 0.000 0.000 0.000 0.137 0.209 0.014 0.000 0.000 0.000 0.325 0.519 0.000 0.000 0.000 0.000 0.009 0.000 0.006 0.000 0.000 0.000 0.000 0.017 0.000 0.000 0.250 0.000 0.000 0.030 0.045 0.046 0.016 0.011 0.022 0.014 0.000 0.003 0.000 0.000 0.006 0.009 0.027 0.006 0.000 0.003 0.000 0.000 0.483 0.193 0.104 0.019 0.134 0.006 0.000 0.000 0.000 0.013 0.111 0.089 0.120 0.000 0.215 0.000 0.000 0.004 0.339 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.002 0.000 0.000 0.004 0.057 0.019 0.029 0.023 0.032 0.014 0.011 0.004 0.010 0.005 0.029 0.023 0.002 0.005 0.000 0.000 0.002 0.360 0.263 0.157 0.077 0.005 0.000 0.000 0.010 0.090 0.189 0.276 0.061 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. TOTAL INDIA 1152 Language INDO-EUROPEAN 518 DRAVIDIAN 393 AUSTRO-ASIATIC 140 TIBETO-BURMAN 101 Geography NORTH 180 WEST 135 EAST 357 NORTH-EAST 108 SOUTH 372 Social Hierarchy UPPER CASTE 211 MIDDLE CASTE 175 LOWER CASTE 261 TRIBES 505 Tribe Dravidian Caste AA DR TB IE Upper No. of populations Sample size. 11 179 8 126 7 92 8 108 5 72 C D F* G H* H1 H2 J2* K* K2 L L1 O* O2a O2a1 O3 O3e P* R* R1 R1a R1a1 R1b3 R2 1.1 5.6 3.3 2.2 3.2 2.8 7.3 12.7 22.2 3.9 3.4 2.2 0.6 3.2 7.1 11.1 4.8 3.2 0.6 57.0 4.3 2.8 4.6 2.2 59.8 1.1 7.1 28.3 5.6 1.7 0.8 0.6 0.6 11.9 10.6 7.1 1.9 0.9 1.9 0.9 13.9 26.9 0.9 1.9 1.1 Middle 4 58 Lower 10 137 Total (in %) 0.7 Indo-European Caste Upper Middle Lower Caste_DR Caste _IE 9 132 8 117 8 115 19 25 1.5 0.9 0.9 0.4 1.1 2.8 10.3 5.1 1.5 2.6 4.3 5.6 2.7 5.6 31.9 1.4 8.3 4.2 5.2 10.3 8.8 18.2 3.8 12.1 3.4 20.5 13.0 16.5 6.6 16.2 15.5 1.7 5.2 5.2 6.9 2.2 2.2 11.4 6.8 5.1 9.5 6.8 0.8 2.6 4.3 6.1 10.4 0.9 7.1 20.2 0.4 6.7 2.6 1.1 10.1 2.6 1.7 0.7 1.5 2.3 1.7 0.9 0.7 5.2 4.4 3.4 1.7 1.7 34.2 23.5 11.6 17.1 17.4 27.3 2.2 1.6 0.5 0.3 36.0 0.3 14.0 15.3 4.2 7.7 3.6 3.3 3.6 0.3 4.6 0.9 2.8 4.6 1.9 0.9 20.4 4.6 2.8 15.3 10.3 10.2 11.1 22.4 38.0 0.8 48.5 0.8 8.3 3.4 EARLIEST SETTLERS OF INDIAN SUBCONTINENT Table 2b: Comparative haplogroup frequency data in socio-ethnic groups of India Table 3: Y-Chromosome microsatellite diversity within the twelve major lineages found in India No of Males No of Haplotypes Lineage diversity MPD Av. gene diversity F H O3 O2a L K K2 R1a1 C J2 R2 29 27 0.995± 0.010 14.19 ± 6.55 0.709 ± 0.364 24 24 1.000 ± 0.012 13.91± 6.47 0.695 ± 0.360 221 206 0.999 ± 0.000 12.66± 5.73 0.633± 0.317 26 24 0.993 ± 0.012 7.37 ± 3.56 0.368 ± 0.198 162 144 0.997 ± 0.001 11.55± 5.26 0.577± 0.291 55 53 0.998 ± 0.003 13.13 ± 6.00 0.691 ± 0.350 36 36 1.000 ± 0.006 12.93 ± 5.96 0.718 ± 0.368 36 36 1.000 ± 0.006 14.18 ± 6.51 0.709 ± 0.361 191 188 0.999 ± 0.000 12.05± 5.47 0.602 ± 0.302 16 16 1.000 ± 0.022 13.45± 6.38 0.707 ± 0.376 52 51 0.999 ± 0.004 14.04 ± 6.40 0.702 ± 0.355 136 131 0.999± 0.001 13.04± 5.90 0.652± 0.327 2 Non Non Non 10 2 2 Non 10 Non 2 Non Non Non Non Non 3 Non Non Non 1 Non 2 1 103 Haplotype sharing Within Between P R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. 104 (0.569 ± 0.104). The distribution of the lineages and Y-STR diversity within the haplogroups are described in detail. the Indians. The Indo-European speakers demonstrated a significantly higher proportion of this lineage as compared to populations belonging to Dravidian linguistic family (29.7% vs. 11.7%; χ2= 7.82, p<0.05) (Table 2a). With the exception of Lodha, Nepali and Bhutia, all other Austro-Asiatic and Tibeto-Burman speakers lack this haplogroup in their Y-chromsomes. While the Indo-European and Dravidian caste group depict significant variation (χ2= 12.5, p<0.01), the tribal groups are more akin. Distribution of M17 lineage also showed a decreasing geographic cline along the latitude; its frequency was highest (approx. 50%) among the populations of Bihar and Uttar Pradesh, where almost 60% of the Upper and Middle caste groups harbored R1a1 Y–chromosomes. Out of the 191 males that carried R1a1 haplogroup, 188 unique 20-YSTR haplotypes were observed (h=0.999). While no haplotype was shared between populations, intrapopulation variation was observed within Jat, Bhoksha and Yerukula and the mean pairwise difference between all the Y-STRs was found to be high (12.05) (Table 3). The median–joining network analysis, however, revealed that populations of neighbouring area shared few of the haplotypes (Fig. 2b). Passarino et al 2002 reported two region specific allele pattern associated within M17 among Europeans; DYS19=15 and YCA IIa,b=19,21 was specific to the R1a1s in Western Europe, while Eastern European R1a1s typically harbored allele16 for DYS19 and 19,23 for YCA IIa,b. In our dataset, although, allele 15 and 16 at DYS19 were the two most common alleles with significant difference in their frequency (χ2= 4.66, p<0.05), they did not reveal any specific geographical, socioethnic or linguistic pattern in their distribution. The TMRCA of those individuals harboring R1a1 Y-chromosome is estimated around ~32 KYA (Table 4). Haplogroup H-M69 Majority of males analyzed from different geographic regions of India (23%) carried the M69C haplotype, which is additionally defined by M52C mutation. Distribution of Haplogroup H showed a north-south gradient (24.4% to 27.4%), however geographically its total frequency was highest (44.4%) in populations of western India (Table 2a). Among the 23% carrying H lineage in their Y-chromosomes, most of them were representatives of south India (27.4%), speaking Dravidian languages (30.0%). In socioethnic groups, the frequency was 27.6% in lower caste groups, while in the tribal groups it accounted for 21.2% of their paternal variation. However, the pattern of distribution did not vary statistically between the Dravidian and IndoEuropean speaking tribes or caste cluster (Table 2b). 206 distinct 20-Y-STR haplotype profiles deciphered out of 221 individuals carried a mean pairwise difference of 12.66 (Table 3), where none of the hapotypes were shared between groups. In the median–joining network analysis with 7-YSTRs associated with M69C/ M52C lineage branch, majority of Y-chromosome STR haplotypes are connected by one-or two-step mutation events (Fig. 2a). Kora, an Indo-European speaking tribal group from eastern India, branched out of the network with more than three mutation steps. The Y-STR based coalescence time of haplogroup H1 chromosomes was estimated to be ~ 43,556 years (Table 4). Haplogroup R1a1-M17 Haplogroup R1a1-M17 characterizes 17.5% of Table 4: Y-Chromosome haplogroup variances and TMRCA estimated on seven Y-STR loci Locus -wise variance DYS19 DYS3891 DYS3892 DYS390 DYS 391 DYS392 DYS393 Average Age estimates in years R1a1 H C O2a R2 0.549 0.793 0.960 1.173 1.081 1.009 0.710 0.896 32,015.31 0.497 0.760 0.872 2.246 2.533 0.708 0.921 1.220 43,556.12 1.183 0.783 1.400 1.983 1.762 1.662 0.917 1.384 49,438.78 0.293 0.383 0.620 2.314 0.791 1.620 0.995 1.002 35,795.92 0.770 0.649 1.211 1.375 0.966 1.749 1.051 1.110 39,647.96 105 EARLIEST SETTLERS OF INDIAN SUBCONTINENT Arabian Sea Indian Ocean Fig. 1. Y-Chromosome haplogroups and their frequency distribution in different regional populations of India Haplogroup O2a-M95 Haplogroup M95, which forms the major South-East Asian male lineage, (Su et al. 2000, Karafet et al 2001) accounts for 15% of the Ychromosome variation in India. It is however, localized to the eastern part of the subcontinent, restricted among the Austro-Asiatic speakers (72.9%) and Tibeto-Burman speaking tribes (56.4%) of NE India (Table 2a). Although this haplogroup was also detected in Indo-European and Dravidian speakers (3.3% in total), its presence in them could be sufficiently attributed to admixture from Austro-Asiatic speaking neighbors living in close vicinity. While this lineage is completely fixed in Juang, Ho and Santhal, it is observed that the frequency in Tibeto-Burman tribes varied from 25% in Kuki to 80% in Hmar. Surprisingly however, none of the Himalayish branch of Tibeto-Burman speakers; Nepali, Bhutia, Tharu, Jaunsari and Bhoksha harbor this haplogroup (Fig. 1). Although the Ychromosomes were rather similar (FST =0.03, p<0.05), none of the Y-STRs were shared between groups (Table 3). A comparison of Indian M-95 YSTR haplotypes with populations of SE Asia including Java, Borneo, Taiwan and Malay revealed that the Austro-Asiatic speakers of Indian subcontinent showed closer affinity to the SE Asians than their Tibeto-Burman speaking neighbors (FCT =0.43 vs 4.15, respectively) (data not shown). To further investigate the relationships between O2a Y-chromosome in the Austro-Asiatic and Tibeto-Burman speakers, a median-joining network of 27 discrete haplotypes of 151 individuals was constructed (Fig. 2c). This network exhibited two distinct clusters of haplotypes with considerable haplotype sharing between the two linguistic families; however the Austro-Asiatic speakers depicted more diverse 106 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. haplotypes compared to the Tibeto-Burmans. The TMRCA of all M95T chromosomes was estimated to be ~35,795 years (Table 4). Haplogroup R2-M124 Our analysis revealed that haplogroup R2 characterizes 13.5% of the Indian Y-chromosomes and its frequency among Dravidian speakers was comparable to that of haplogroup H (20.9%) and significantly different from Indo-European and Austro-Asiatic speakers (χ2= 16.2, d=3, p<0.05). While the distribution across various geographic regions was almost uniform, significant differentiation was observed along the social groups (χ2= 18.7, d=3, p<0.05); a decreasing gradient was discernible as one moved up the caste hierarchy (Table 2a). Although tribes contributed only 7.4% of the total R2 lineage, it was proportionately distributed between the Austro-Asiatic and Dravidian tribes (Table 2b). Extensive analysis of its distribution between north and south Indian populations showed that while there was marginal difference among middle and lower caste groups of north India (17.1 and 17.4 % respectively), a clear gradient was observed among south Indians, where the frequency declined by more than one-half from lower to upper caste groups. Analysis of 20-Y-STRs within the R2 lineage revealed that three haplotypes were shared; one between Kamma Chaudhary and Kappu Naidu, both lower caste Dravidian speakers from Andhra Pradesh and two within Karmali and Pallar populations. Network analysis (Fig. 2d) depicted that a large number haplotypes were shared between populations of south India, while the populations of eastern India harbored more discrete Y-STR haplotypes. The TMRCA for M124T was estimated to be ~39,647 years (Table 4). Haplogroup L-M11 The overall frequency of haplogroup L-M11 in the Indian populations was estimated to be 5.6%, while sporadic occurrence of this lineage has earlier been described among Indo-European speakers of Caucasus, Middle East, Europe and a maximum of 4.3% in Central Asia (Semino et al. 2000; Wells et al. 2001). Dravidian speaking populations harbored a significantly higher percentage of L haplogroup compared to the Indo-European speakers, 11.2 and 3.7% respectively (χ2= 3.77, d=1, p=0.05). While frequencies were rather comparable in the lower caste groups of north and south, middle and upper caste populations of south India demonstrated relatively higher frequencies than northern caste groups (Table 2a). The L-network and high MPD (13.13) revealed results in congruence with AMOVA suggesting that no clear geographic, linguistic or social pattern could be discerned among the Y-STR haplotypes (data not shown). Haplogroup J2-M172 Haplogroup J2-M172 is the major lineage of Middle East/Mediterranean and its frequency decreases into Europe. Among the studied Indian populations, M172G exhibited a total frequency of 5.1%, where it was uniformly distributed among the three major linguistic families. Except for the Tibeto-Burman speaking tribes of northeast India, where this lineage was totally absent, no specific cline could be deciphered among the other seventy-three mainland populations. In the social categories, upper and middle caste populations harbor a significantly higher percentage of J2 lineage (~ 10%) as compared to ~3% in lower caste populations and tribal groups (χ2= 7.74, d=3, p=0.05). While the distribution was proportionate among upper caste groups of north and south India, the difference was more discrete (6.8% vs 15.5%) among middle caste groups of these regions (Table 2b). The Near Eastern populations harboring M172 Y-chromosomes are characterized by a very high frequency of DYS388 alleles with e”15 repeats, while more than 70% of the males examined under this study displayed alleles with repeat motifs d”14. Network analysis showed a large number of divergent haplotypes even with 7-YSTRs and only two reticulations. Lodha, the only Austro-Asiatic tribe that harbored J2 lineage also displayed very diverse haplotype profiles. The genetic structure estimated with AMOVA showed that although the extent of Y-STR differentiation in the populations carrying this lineage was approximately 3%, geography, language or position in the social hierarchy could not statistically delineate the studied Indian populations. Because this marker is associated with the spread of agriculture, we further estimated variance in the agricultural groups and observed a marginal difference of 1% in Y-chromosomes of landowner and labourer communities harboring the J2 lineage (data not shown). 107 EARLIEST SETTLERS OF INDIAN SUBCONTINENT Haplogroup O3-M122 and O3e-M134 The M122 haplogroup and its sub-lineage M134 are among the predominant and widespread lineages of East Asia (Su et al. 2000; Karafet et al. 2001; Shi et al. 2005). In the Indian males, it was detectable at frequencies less than 3% and was largely restricted among the Tibeto-Burman speakers of North-East. It was sporadically present among tribal groups of north India, particularly Tharu (Fig. 1), probably due to recent admixture with neighboring Tibeto-Burman speakers of Nepal and China. A clear delineation along the language family was observed in its distribution; where it was completely absent among the Tani speakers (Adi Pasi), while the Naga-Kuki-Chin branch of Tibeto-Burman speakers contributed the entire 26.7% of O3 lineage. The mean pairwise difference between Y-STRs and the lineage diversity was low at 7.37 and 0.993, respectively, compared to its sister clade O2a. Haplogroup K2-M70 Haplogroup K2 occurs on a M9G background and is reported to occur in populations of Near East and Europe (Underhill et al. 2000). In our study, it was found only in the eastern and southern regions of the country, adding to an overall frequency of 3.1%. Although it was present in the three major linguistic families, the statistical difference in its distribution was insignificant. However, its distribution depicted an inverse relation as one moved up the social ladder, with the upper caste populations completely lacking the M70 lineage in their Y-chromosomes (Table 2a). This lineage was predominant amongst the lower caste groups of east (Bauri) and tribal groups of south, particularly Yerukula, contributing approx. 60% of the total K2 chromosomes. Within the lineage, 36 distinct Y-STR haplotypes, with a very high mean pairwise difference (14.18) between them, depicted the absence of population structure due to language, geography or ethnicity. Haplogroup C- M130 (RPS4Y711) The RPS4Y711T forms the second major cluster in Asia and Australo-Melanesia and has reportedly spread into North America (Wells et al. 2001; Underhill et al. 2000; Karafet et al. 1999; Underhill et al. 2001). In the present study, this lineage was found spread all along the coastal belt in populations of Maharastra, Tamilnadu, Andhra Pradesh, Orissa and West Bengal, at an average frequency of 1.4%, and noticeably absent in the populations of North and North-East. Although this lineage was present in high frequency in tribes compared to caste groups, its distribution in them was not statistically significant (χ2= 2.25, d=1, p>0.05). We observed 16 discrete Y-STR haplotypes, with mean pairwise difference between haplotypes of 13.45 (Table 3). Although the total number of individuals carrying RPS4Y711T was too small (n=16) to make accurate evolutionary inferences about its origin within South Asia, the TMRCA of Indian RPS4Y711T individuals was estimated to be ~ 49,438 years (Table 4). Other Haplogroups Observed in Indians Haplogroup F, which is major and the most paraphyletic subcluster of M168 lineages was ubiquitous in its distribution along geographic, linguistic and socio-ethnic boundaries of India and was observed in approximately 3% of the studied males (Table 2a). Haplogroup D, a monophyletic branch of M168 lineage, defined by an Alu insertion and M174C mutation, on the other hand, was restricted in Bhutia and Tharu tribal groups. Its presence among them is most likely due to gene flow from Tibet, where this haplogroup has earlier been reported. Major haplogroups K*, P*, R*, R1, R1a contribute approximately 2-3% of the total Indian Ychromosomes and there was no difference in its distribution pattern among castes or tribes or among different geographic regions. Although, we detected a few European –specific haplogroups G and R1b3 in Indians, none of our studied samples showed the presence of haplogroup K3M147, N-M231 or I-M170, which are the other highly predominant haplogroups of Europe. Genetic structure of the Indian populations Analysis of molecular variance revealed that the extent of genetic differentiation was high among Indians; percent variation among different groups added up to 27.11%, suggesting that gene pool of India males was highly structured. To identify factor/(s) responsible for this compartmentalization of Y-chromosomes, population 108 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. Fig. 2a. Median-Joining network of H haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East) Fig. 2b. Median-Joining network of R1a1 haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East) EARLIEST SETTLERS OF INDIAN SUBCONTINENT 109 Fig. 2c. Median-Joining network of O2a haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the AustroAsiatic (Black) and Tibeto-Burman (White) linguistic families of India Fig. 2d. Median-Joining network of R2 haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East) R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. 110 genetic structure was analyzed on haplogroup frequency data in a hierarchical mode: within populations, among populations and among groups of populations, pooled according to geography, linguistic family and their socio-ethnic position (Table 5). The amount of genetic variation among five major geographical regions was lesser than the percent due to variation among populations within regions (Öct= 0.096 and Ösc= 0.211, respectively). Further analysis revealed that almost 14.57% of among group variation was due to regional boundaries which also defined their linguistic affinities (language sub-families). The increase in “among group variation” and “among populations within groups” was non-significant when only four major linguistic families were used as a criterion for grouping Indian populations. High Fct and Fsc values suggest that although significant structuring occurs within the populations of India, they could not be partitioned either geographically or linguistically. Apportionment of populations into two broad socio-ethnic group; caste and tribes depicted only 8% difference between them, which further decreased to 6.63% when the caste populations were further resolved into upper, middle and lower groups. Among themselves, the caste populations were not very different, harboring only 1.6% variation among them. ships among India populations, pairwise FST distances were estimated on the Y-haplogroup frequencies. On the whole, populations clustered according to their Y-chromosome lineages. Two distinct clusters of Indo-European and Dravidian speakers were discernible in the NJ tree on 87 Indian populations, where except for a few deviations most of the populations clustered within their linguistic family. Austro-Asiatic and Tibeto-Burman speakers harboring O2a Ychromosome lineage described a separate cluster, while Tharu grouped with Mara, Lai and Kuki carrying O3e lineage, and formed a branch distant from other Tibeto-Burman tribes (Fig. 3). MDS plot also substantiated the genetic proximity of Austro-Asiatic and Tibeto-Burman speakers to the populations of South East Asia. Most of the other Indian populations were closer to the IndoEuropean speakers of Central Asia and Eastern Europe (Russia and Siberia) but distant from populations of Western Europe, while populations of Middle East and Caucasus region formed a separate cluster in the MDS plot. Populations of Uttar Pradesh, Bihar and Punjab were moderately distant from other IndoEuropean speakers, while those of Pakistan remained between Indians, Central Asia and Russia (Fig. 4). DISCUSSION Genetic Relationships among the Indians and with World Populations To investigate the extent of genetic relation- This comprehensive study of Y-chromosome diversity within India aims to identify evolutionary events (founder effects, gene flow and Table 5: Genetic Differentiation in Indians at different levels of hierarchy based on Y-SNP Data Within Population No. of Groups Total Geography Regional Language Social Castes a CS vs TR$ b c UP vs MD vs LW# 5 14 4 4 2 4 4 3 % 72.89 71.22 71.95 69.16 69.88 69.79 71.49 82.29 83.73 Among Population Within Groups FST* % 0.271 0.287 0.28 0.308 0.301 0.302 0.285 0.177 0.162 19.09 13.48 15.53 17.17 21.73 21.89 16.21 14.65 FSC* 0.211 0.157 0.183 0.197 0.237 0.234 0.164 0.148 Among Groups % 27.11 9.69 14.57 15.31 12.96 8.48 6.63 1.51 1.61 *All values are statistically significant at p<0.05 $ CS: Castes: TR: Tribes # UP: Upper castes; MD: Middle castes; LW: Lower Castes a: Includes Karmali and Maheli under Austro-Asiatic; Tharu under Tibeto-Burman language family b: Includes Upper, Middle, Lower Castes and Tribes c: Includes Upper, Middle, Lower Castes and excludes Austro-Asiatic and Tibeto-Burman Tribes FCT* 0.096 0.145 0.153 0.129 0.084 0.066 0.015 0.016 111 EARLIEST SETTLERS OF INDIAN SUBCONTINENT Indo-Europeans Tibeto-Burmans Dravidians Austro-Asiatics/ Tibeto-Burmans Fig. 3. Genetic relationship among populations of India based on F ST distances estimated on Y-Haplogroup frequencies 112 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. Fig. 4. Genetic relationship between populations of India and world estimated from Y-Chromosome haplogroup frequencies represented in MDS plot genetic drift) and factors (geographical, linguistic and cultural barriers) that might have produced a high degree (27%) of genetic differentiation among the Indian patrilines. Here we also evaluate some of the suggested theories of occupation of Indian subcontinent by modern humans and population histories, in the light of current molecular genetic evidences. Phylogeography of Indian Y-Chromosomes India is a relict area, which is likely to have served as an incubator during the early dispersal of modern humans out of East Africa (QuintanaMurci et al. 1999; Cann 2001) and a treasure-house of ancient population genetic signatures in its gene pool. This is reflected in the 24 different haplogroups which were observed in the present Y-chromosome analysis of 1434 Indian males. Overall haplogroup diversity among Indian populations was relatively high (0.893) in contrast to other European or East Asian populations, but was closer to that of Central Asia. This pattern of high NRY diversity (Y-SNP and Y-STR) indicates an early settlement of the Indian subcontinent by anatomically modern humans. Four haplogroups; H= 23%; R1a1=17.5%; O2a=15% and R2=13.5%, form major paternal lineage of Indians and together account for ~70% of their Y-chromosomes. Being largely restricted to the Indian subcontinent, haplogroup H is assumed to be associated with the eastward expansion of M89 Y-chromosomes from the Leventine corridor, which also carried the two late Pleistocene mt DNA haplogroups, U2 and U7 into India. Although the M69 Y-chromosomes are particularly predominant among the Dravidian speakers of south India, its fairly uniform distribution across different regions and socioethnic groups of India suggests deep time depth for these lineage clusters. Based on the predominance of M17 lineage among diverse linguistic families (Indo-European, Altaic, Uralic and Caucasian) and geographic regions (Central Asia, Europe, Caucasus, Middle East), (Wells et al. 2001; Underhill et al. 2000; Karafet et al. 1999; Rosser et al. 2000; Nebel et al. 2000) it has been associated with the Kurgan culture, domestication of horses and spread of Indo-European languages, all which supposedly EARLIEST SETTLERS OF INDIAN SUBCONTINENT originated in southern Russia/Ukraine and subsequently extended to Europe, Central Asia around 3000 B.C (Wells et al. 2001). Its presence in India has been linked to the “Aryan” migration and subsequent spread of Indo-European languages, appearance of iron and Painted Grey Ware culture in North West frontier (Cavalli-Sforza et al. 1994). However, antiquity and geographic origin of this lineage still remains contentious. Our study reveals that this lineage is present in a significant proportion among the Indian IndoEuropean speakers, and is proportionately high among upper caste groups (Table 2a). The dispersion of this lineage into the southern tribal groups (Kivisild et al. 2003, Cordaux et al. 2004) and the fact that it is proportionately distributed between the Dravidian and Indo-European tribal groups provides significant evidence against any major influx of Indo-European speakers that could have drastically changed the Indian male gene pool (Sahoo et al. 2006; Sengupta et al. 2006). The high average STR variance (0.896) and TMRCA supports a rapid population growth and expansion of M17 Y-chromosomes, which contributed M17 lineages both to Central Asian nomads and South Asian tribes much before the Indo-European introgression into India. Another sub-lineage of M173, R1b3-M269 is present at appreciable frequencies 14.5% in Turkey (Cinnioglu et al. 2004) and at considerable frequency in Europe (Cruciani et al. 2002), while it is detected at relatively low frequency (1.9%) in India, substantiating a recent and limited admixture with west Europeans. The observed high frequency of R2 Ychromosomes in Indians, which is equivalent to that of haplogroup H among Dravidian speakers, corroborates previous reports suggesting its Indian origin (Cordaux et al. 2004). The deep coalescence time for R2 lineages, dating back to Late Pleistocene, supports its indigenous origin. Outside India, it is found in Iran and Central Asia (3.3%) and among Roma Gypsies of Europe, known to have historical evidence of their migration from India (Wells et al. 2001). Within India, while it is predominant in both eastern and southern regions, its distribution pattern is rather patchy in east (Sahoo et al. 2006). It is most likely that genetic drift or bottleneck has reduced the paternal diversity of Karmali, which contributes 28% of the eastern R2 lineages. This population although considered to be Austro-Asiatic speaker, does not present any evidence of O2a Y- 113 chromosome lineage, portraying a distinctly different history. On an average, the patterns of NRY haplogroup variation of Indians reflect that populations of the subcontinent are not very distinct from each and probably have share a few common paternal ancestors. The lineage diversity was small for Austro-Asiatic and Tibeto-Burman speakers and most of them harbored single lineage, indicating a founder paternal source for these endogamous groups, which are confined to the eastern and north-eastern regions of India. Haplogroup diversities were rather high for populations of south India (average of 0.740), giving concordant evidences of a relative early settlement, growth and expansion of populations living in southern India. Traces of Ancient Migration of Modern Humans Recent studies provide substantial evidences in favor of the southern route hypothesis for the dispersal of modern human ~ 60-75 kya from the horn of Africa along the tropical coast of Indian Ocean to reach insular South East Asia and Oceania (Cann 2001; Stringer 2000). A strong Ychromosome support to this model is the distribution of haplogroup C lineages in Asia (Kivisild et al. 2003). Australo-Melanesia and North America (Karafet et al. 1999). Although present in low frequencies in Indian subcontinent, (Bamshad et al. 2001; Sengupta et al. 2006; Kivisild et al. 2003; Cordaux 2004; Wells 2006; Ramana et al. 2001) it is largely distributed along the coastal regions, with a few patchy occurrences in Punjab. However, the persistence of M130 lineages mostly among the south Indians, Pakistan (Qamar et al. 2002) and Sri Lanka (Kivisild et al. 2003) provides indirect evidence in support of the southern route of migration by early modern humans. We have previously suggested that lack of haplogroup C sub-lineages (M217, M38 and M8) is indicative of the indigenous origin of most Indian populations and argues against the theory of Aryan migration from Central Asia (Sahoo et al. 2006). The present analysis showed none of the deletions (DYS390.1 or DYS 390.3) associated with Australian or Polynesian C* chromosomes, contesting the claims of link between India and Australian aboriginals (Redd et al. 2002). . The age estimate of approximately 49KYA years in Indian samples indicates a probable Indian origin of this lineage. However, until further analysis 114 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. and age estimates in other world populations are known, it cannot be conclusively proven if RPS4YT mutation arose in India or arrived with the earliest migrants after it arose somewhere in west Asia, from where it was finally lost or diluted during the Upper Paleolithic expansion of modern humans (Underhill et al. 2001). Genesis of Caste Structure and Influence of Migrations on the Indian Gene Pool The main feature of Indian society is that it is highly structured by social factors such as caste system, in which birth determines the position in the society, mode of subsistence (occupation), and choice of marriage partners. However, genesis of caste system in India is ambiguous, since many of the caste groups are known to have tribal origins (Kosambi 1964). Further the migration of Greeks, Huns, Arabs, Chinese, Turks, Persians, Portuguese and others have made understanding the nature of population structure more complex. mt DNA analysis from different geographic region and social status showed that maternal haplogroups in India are derived from a limited number of founder lineages of M and N clades supporting a common proto-Asian ancestry with limited gene flow from later migrants (Kivisild et al. 2003; Basu et al. 2003). Our study reveals that there is virtually no genetic difference in the Y-chromosomes between the caste groups and tribes (Table 5). Whatever minor difference is present is largely due to haplogroup O2a, contributed exclusively by the Austro-Asiatic and Tibeto-Burman tribes. Our present analysis (AMOVA and haplogroup frequency distribution in populations excluding the Austro-Asiatic tribes of Jharkhand and Orissa and Tibeto-Burman tribes from Northeast) provides congruent evidence in support to the hypothesis that populations in India largely derive their gene pool from the common Pleistocene settlers. High frequency of J2 and R1a1 lineages mirror a greater influence of Indo-European migrants on upper caste populations of Gangetic plains compared to the peninsular southern regions. However, these skewed frequencies also suggest that the indigenous populations received limited external gene flow from Europe, Central and West Asia. This is also supported by mt DNA haplogroups that depict Indian-specific lineages with a limited contribution from both west and east Eurasian populations (Metspalu et al. 2004). A similar trend of J2 and R1a1 among caste populations would probably provide a simplistic assumption that agriculture was brought along with caste system by the Indo-European speakers as a result of demic diffusion of early farmers from southwestern Iran, Fertile Cresent and Anatolia (QuintanaMurci et al. 2001; Cordaux et al. 2004). However, the absence of other Neolithic markers of early farmers, M35 and M201; that are prevalent in Europe, Anatolia, South Caucasus and Iran (Semino et al. 2000; Underhill et al. 2001) among Indians, in addition to the frequency of M172 in southern and western India and its persistence in south India and tribal groups (Table 2a) questions the validity of this hypothesis. Agriculture in India probably arose as two independent events; one that was a consequence of earliest migration that brought the Dravidian speakers and another much later through spread of rice cultivators from SE Asia (Fuller 2003; Diamond et al. 2003). Insights into Origin of Austro-Asiatic and Tibeto-Burman speakers The origin of two language families, AustroAsiatic and Tibeto-Burman in India is of particular interest and has received considerable attention (Basu et al. 2003; Cordaux et al. 2004). In the present study, analysis of eleven Austro-Asiatic and seven Tibeto-Burman tribes from the eastern and north-eastern region of India, establishes that the male gene pool of these groups are distinctly different from other mainland tribal populations (Table 2a). An overall low Y-STR haplotype diversity and complete fixation of O2a in some of the aboriginal tribes (Ho, Santhal, Juang, Birhor and Munda) suggests that these tribes probably experienced a major demographic event, such as common founder effect followed by a bottleneck that greatly reduced the Y-chromosome diversity in the Austro-Asiatic tribes of eastern India (Table 1). In contrast to Austro-Asiatic tribes, the TibetoBurman tribes harbor both O2a and O3e lineages in their Y-chromosomes. Interestingly, the two branches of the Tibeto-Burman language; Himalayish and Naga-Kuki-Chin, could be distinctly identified from their Y-chromosomes. The former depicts influence of Tibetan gene pool, marked by the presence of Haplogroup D lineages in Bhutia and Tharu (Sahoo et al. 2006), while the other linguistic branch harbors O3e lineages. The predominance of O haplogroup and its sub- 115 EARLIEST SETTLERS OF INDIAN SUBCONTINENT lineages in populations of East Asia suggest a SE Asian origin of Indian Austro-Asiatic and Tibeto-Burman speakers. We hypothesize that the Tibeto-Burman speakers came as a number of migratory events, while the Austro-Asiatic tribes probably arrived in India as a single event. The two groups probably migrated into India at different time period is evident from the absence of O3e lineages among Austro-Asiatic speakers, which probably are the earliest immigrants of the two. Presence of an Austro-Asiatic speaking tribe, Khasi, among the Tibeto-Burman speaking neighbors in the northeast corroborates this assumption. While the Tibeto-Burman speakers brought in a number of East Asian maternal lineages (A, B5b, F1b, M8c, M8z) (Metspalu et al. 2004), absence of these lineages in AustroAsiatic tribes (Thangaraj et al. 2005; Sahoo 2006a) portrays two different scenarios. First, the earlier exodus from South East Asia was probably a major male–mediated migration into India, or that the female gene pool of the migrating East Asians is completely lost among the Austro-Asiatic tribes. Additional confirmation to this hypothesis is provided with evidences of agricultural expansions from their homelands in China, at different times and over different geographic ranges. Austro-Asiatics are presumed to have spread west and south from southern China into the Indian subcontinent and Malay Peninsula and brought rice cultivation with them (Higham 2003; Bellwood 2004). The genetic evidence revealed in this study is consistent with anthropological records (Guha 1935), which suggests that SinoTibetans dispersed from the Yellow River and came into India through two different routes; one from Burma probably brought the Naga-Kuki-Chin language and O3e Y-chromosomes and the other from Himalayas, which carried the YAP lineages into northern regions of subcontinent. Age of Human Occupation in IndiaAustro-Asiatic or Dravidians as First Settlers? Based on socio-cultural and linguistic evidences (Thapar 1995; Pattanayak 1998) and results based on mt DNA HVSI nucleotide diversity and highest frequencies of mitochondrial M haplogroup (Roychoudhary et al. 2001; Basu et al. 2003), it was asserted that Austro-Asiatic tribes are the earliest settlers in India. The present comprehensive Y-chromosome analysis, which includes populations of all linguistic and socio- ethnic affiliations, however, suggests people of south India as the original settlers of the subcontinent. The total lineage diversity and distribution of Indian-specific Y-chromosome haplogroups (H, L, C, R1a1 and R2) in different geographical and socio-linguistic layers of the Indian populations provides substantial support in favor of this hypothesis. This theory also gathers adequate evidence from presence of the coastal marker, RPS4Y, in the south Indian tribes, who probably represent remnants of the modern human migration out of Africa that took the southern route to Australia. Any possibility that Austro-Asiatic speakers could have dispersed from India is also eliminated based on the differential distribution of O2a Y-chromosomes in southern China and India and the complete absence of East-Asian specific mt DNA lineages in Austro-Asiatic and Dravidian speakers of India. mt DNA haplogroups of Indian Austro-Asiatic speakers are instead, probably a sub-group of their Dravidian neighbors (unpublished data, Kashyap et al.). Recent archeological and linguistic evidences corroborate a Neolithic expansion of Austro-Asiatic languages from Yangtze River basin (Higham 2003) and our present study supports an east-west clinal expansion of AustroAsiatic males from South East Asia, which was not associated with any female gene flow. Further, deeper coalescence age for the Y-chromosome haplogroups C, H, R2 compared to O2a is consistent with hypothesis that Austro-Asiatic speakers cannot be considered as the earliest settlers of South Asia. CONCLUSIONS We find that genetic variation in India is characterized by a high Y-chromosome diversity, which is reflected by a greater correspondence with linguistic groups of India. Our results demonstrate India as a hotspot both as an important source and recipient of major Ychromosome lineages of the world. Haplogroup distribution and AMOVA results provide tandem evidence in support a common Pleistocene origin of Indian populations, which was subsequently followed by migrations of Austro-Asiatic speaking tribal males from SE Asia. The TibetoBurman populations were later migrants who took two different routes and carried both male and female lineages specific to East Asia. Based on deep coalescence age estimates of H, R2 and C Y- R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. 116 chromosome lineages, their diversity and distribution pattern, our data suggests an early Pleistocene settlement of South Asia by Dravidian speaking south Indian populations; the Austro-Asiatic speakers migrated much later from SE Asia and probably contributed only paternal lineages while amalgamating with the aboriginal populations of the region. ACKNOWLEDGEMENTS We express our appreciation to all the original donors who made this study possible. This study was made possible through facilities provided at CFSL, Kolkata. We acknowledge all researchers whose valuable data was used for this study. The SS, AS, JB, MT, SG, RR, RA are grateful to the Directorate of Forensic Sciences, MHA for the Senior Research Fellowship. GHB and TS are recipients of Senior Research Fellowship from CSIR, India. This research was supported by a financial grant to CFSL, Kolkata under the Xth Five Year Plan of the Govt. of India. Electronic –Database Information URLs for the data mentioned in this article are as follows: XL STAT pro 7.5, http://www.xlstat.com Network 4.1, http://www.fluxus-engineering. com http://www.ethnologue.com REFERENCES Bamshad M, Kivisild T, Watkins WS, Dixon ME, Ricker CE, Rao BB, Naidu JM, Prasad BV, Reddy PG, Rasanayagam A, Papiha SS, Villems R, Redd AJ, Hammer MF, Nguyen SV, Carroll ML, Batzer MA, Jorde LB 2001. Genetic evidence on the origins of Indian caste populations. Genome Res 11: 994-1004 Bandelt HJ, Forster P, Rohl A 1999. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol, 16: 37-48 Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, Roychoudhury S, Majumder PP 2003. Ethnic India: A genomic view, with special reference to peopling and structure. Genome Res, 13: 2277-2290. Bellwood P 2004. Tracking the spreads of farming beyond the fertile Crescent: Europe and Asia. In: First Farmers: The Origins of Agricultural Societies. pp 87 Butler JM, Schoske R, Vallone PM, Kline MC, Redd AJ, Hammer MF 2002. A novel multiplex for simultaneous amplification of 20 Y chromosome STR markers. Forensic Sci Int, 129: 10-24. Cann RL 2001. Genetic clues to the dispersal of human populations: Retracing the past from the present. Science, 291: 1742-1748 Cavalli-Sforza LL, Menozzi P, Piazza A 1994. The History and Geography of Human Genes. Princeton University Press, Princeton pp 208-213 Cinnioglu C, King R, Kivisild T, Kalfoglu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, Underhill PA 2004. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet, 114: 127148. Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, Stoneking M 2004. Independent origins of Indian caste and tribal paternal lineages. Curr Biol, 14: 231-235. Cordaux R, Deepa E, Vishwanathan H, Stoneking M 2004. Genetic evidence for the demic diffusion of agriculture to India. Science, 304: 1125. Cordaux R, Weiss G, Saha N, Stoneking, M 2004. The northeast Indian passageway: a barrier or corridor for human migrations? Mol Biol Evol, 21: 15251533 Cruciani F, Santolamazza P, Shen P, Macaulay V, Moral P, Olckers A, Modiano D, Holmes S, Destro-Bisol G, Coia V, Wallace DC, Oefner PJ, Torroni A, CavalliSforza LL, Scozzari R, Underhill PA 2002. A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Ychromosome haplotypes. Am J Hum Genet, 70: 1197-1214. Deraniyagala SU 1992. The Prehistory of Sri Lanka: An Ecological Perspective. Colombo: Department of The Archeological Survey, Government of Sri Lanka Diamond J, Bellwood P 2003. Farmers and their languages: The first expansions. Science, 300: 597-602. Excoffier L, Smouse PE, Quattro JM 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131: 479-491. Forster P, Rohl A, Lunnemann P, Brinkmann C, Zerjal T, Tyler-Smith C, Brinkmann B 2000. A short tandem repeat-based phylogeny for the human Y chromosome. Am J Hum Genet, 67: 182-196 Fuller D 2003. An agricultural perspective on Dravidian historical linguistics: archaeological crop packages, livestock and Dravidian crop vocabulary. In: P Bellwood, C Renfrew (Eds.): Examining the Farming/ Language Dispersal Hypothesis. McDonald Institute for Archaeological Research, Cambridge. pp. 191213. Guha BS 1935. The racial affinities of the people of India. In: Census of India, 1931, Part III-Ethno-graphical Higham C 2003. Languages and farming dispersals: Austro-Asiatic languages and rice cultivation. In: P Bellwood, C Renfrew (Eds.): Examining the Farming/ Language Dispersal Hypothesis. McDonald Institute for Archaeological Research, Cambridge James HVA, Petraglia MD 2005. Modern Human origins and the evolution of behavior in the later Pleistocene Record of South Asia. Curr Anthropol, 46 Supp: S3S27 Karafet T, Xu L, Du R, Wang W, Feng S, Wells RS, Redd AJ, Zegura SL, Hammer MF 2001. Paternal EARLIEST SETTLERS OF INDIAN SUBCONTINENT Population History of East Asia: Sources, Patterns and Microevolutionary Processes. Am J Hum Genet, 69: 615-628 Karafet TM, Zegura SL, Posukh O, Osipova L, Bergen A, Long J, Goldman D, Klitz W, Harihara S, de Knijff P, Wiebe V, Griffiths RC, Templeton AR, Hammer MF 1999. Ancestral Asian source(s) of new world Ychromosome founder haplotypes. Am J Hum Genet 64: 817-831 Kennedy K 2000. God, Apes and Fossil Men: Paleoanthropology in South Asia. Ann Arbor: University of Michigan Press Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik J, Metspalu E, Adojaan M, Tolk HV, Stepanov V, Golge M, Usanga E, Papiha SS, Cinnioglu C, King R, Cavalli-Sforza L, Underhill PA, Villems R 2003. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet, 72: 313-332 Kosambi DD 1964. The Culture and Civilization of Ancient India in Historical Outline, New Delhi: Vikas Publishing House Pvt. Ltd. Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, Taha A, Shaari NK, Raja JM, Ismail P, Zainuddin Z, Goodwin W, Bulbeck D, Bandelt HJ, Oppenheimer S, Torroni A, Richards M 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science, 308: 1034-1036 Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, Kaldma K, Serk P, Karmin M, Behar DM, Gilbert MT, Endicott P, Mastana S, Papiha SS, Skorecki K, Torroni A, Villems R 2004. Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet, 5: 26 Misra VN 2001 Prehistoric human colonization of India. J Biosci, 26: 491-531 Nebel A, Filon D, Weiss DA, Weale M, Faerman M, Oppenheim A, Thomas MG 2000. High-resolution Y chromosome haplotypes of Israeli and Palestinian Arabs reveal geographic substructure and substantial overlap with haplotypes of Jews. Hum Genet, 107: 630-641 Passarino G, Cavalleri GL, Lin AA, Cavalli-Sforza LL, Borresen-Dale AL, Underhill PA 2002. Different genetic components in the Norwegian population revealed by the analysis of mtDNA and Y chromosome polymorphisms. Eur J Hum Genet, 10: 521-529 Pattanayak DP 1998. The language heritage of India. In: Balasubramanian and NA Rao (Eds.): The Indian Human Heritage. Hyderabad. pp: 95-99 Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T, Tyler-Smith C, Mehdi SQ 2002. Y-chromosomal DNA variation in Pakistan. Am J Hum Genet, 70: 1107-1124 Quintana-Murci L, Krausz C, Zerjal T, Sayar SH, Hammer MF, Mehdi SQ, Ayub Q, Qamar R, Mohyuddin A, Radhakrishna U, Jobling MA, Tyler-Smith C, McElreavey K 2001. Y-chromosome lineages trace diffusion of people and languages in southwestern Asia. Am J Hum Genet, 68: 537-542. Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS 1999. Genetic evidence of an early exit of Homo sapiens 117 sapiens from Africa through eastern Africa. Nat Genet, 23: 437-441 Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill P, Chakraborty R 2001. Y-chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and the migrant Siddi populations of Andhra Pradesh, South India. Eur J Hum Genet, 9: 695700. Redd AJ, Roberts-Thomson J, Karafet T, Bamshad M, Jorde LB, Naidu JM, Walsh B, Hammer MF 2002. Gene flow from the Indian subcontinent to Australia: Evidence from the Y chromosome. Curr Biol, 12: 673-677. Renfrew C 1989. The origins of Indo-European languages. Sci Am, 261: 82-90. Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W, et al. 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am J Hum Genet, 67: 1526-1543. Roychoudhary S, Roy S, Basu A, Banerjee R, Vishwanathan H, Usha Rani MV, Sil SK, Mitra M, Majumder PP 2001. Genomic structures and population histories of linguistically distinct tribal groups of India. Hum Genet, 109: 339-50. Sahoo S and Kashyap VK 2006. Phylogeography of mitochondrial DNA and Y-Chromosome haplogroups reveal asymmetric gene flow in populations of Eastern India. Am J Phys Anthropol, (in press). Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaximi T, Gaikwad S, Trivedi R, Endicott P, Kivisild T, Metspalu M, Villems R, Kashyap VK 2006. A prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Proc Natl Acad Sci USA, 103: 843-848 Sambrook J, Fritsch EF, Maniatis T 1989. Molecular Cloning. A Laboratory Manual. 2nd Ed. CSHL Press, Cold Spring Harbor, NY Schneider S, Roessli D, Excoffier L 2000. ARLEQUIN ver 2.0.a software for Population Genetics Data Analysis. Geneva: Genetics and Biometry Laboratory, University of Geneva. Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiae M, Mika A, Mika B, Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA. 2000. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science, 290: 1155-1159. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, CavalliSforza LL, Majumder PP, Underhill PA 2006. Polarity and temporality of high-resolution Ychromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of central Asian pastoralists. Am J Hum Genet, 78: 202-221. Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, Su B 2005. Ychromosome evidence of southern origin of the East Asian-specific haplogroup O3-M122. Am J Hum Genet, 77: 408-419 Singh, KS 1998. India’s Communities. National Series. People of India. New Delhi: Oxford University Press. 118 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH ET AL. Stringer C 2000. Coasting out of Africa. Nature, 405: 24-25 Su B, Xiao C, Deka R, Seielstad MT, Kangwanpong D, Xiao J, Lu D, Underhill P, Cavalli-Sforza L, Chakraborty R, Jin L 2000. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum Genet, 107: 582-590 Thangaraj K, Sridhar V, Kivisild T, Reddy AG, Chaubey G, Singh VK, Kaur S, Agarawal P, Rai A, Gupta J, Mallick CB, Kumar N, Velavan TP, Suganthan R, Udaykumar D, Kumar R, Mishra R, Khan A, Annapurna C, Singh L 2005. Different population histories of the Mundari- and Mon-Khmer-speaking Austro-Asiatic tribes inferred from the mtDNA 9-bp deletion/ insertion polymorphism in Indian populations. Hum Genet, 116: 507-517 Thapar R 1995 The first millennium B.C. in the northern India. In: R. Thaper (Ed.): Recent Perspective of Early Indian History. Bombay. pp. 80-141 Underhill P, Passarino G, Lin AA, Shen P, Mirazón Lahr M, Foley RA, Oefner PJ, Cavalli-Sforza LL 2001. The phylogeography of the Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet, 65: 43-62 Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonne-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ, Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL, Oefner PJ 2000. Y chromosome sequence variation and the history of human populations. Nat Genet, 26: 358-361 Wells RS, Yuldasheva N, Ruzibakiev R, Underhill P, Evseeva I, Blue-Smith J, Jin L, et al. 2001. The Eurasian heartland: A continental perspective on Ychromosome diversity. Proc Natl Acad Sci USA, 98: 10244-10249 Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, DestroBisol G, Spedini G, Chambers G., Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, Kalaydjieva L 2004. The Effective Mutation Rate at Y Chromosome Short Tandem Repeats, with Application to Human Population-Divergence Time. Am J Hum Genet, 74: 50-61.