Citizenship Amendment Act (CAA/CAB) in Numbers

Abhijeet Pokhriyal
Dec 17, 2019

What were the reasons that led to this unprecedented act?

The goal of this research was to stay unbiased and to better understand how things got to where they are today with the government's new act, and how it might impact people.

My approach was to get numbers from different sources on refugees, visas, undocumented immigration, etc., and then try to reason about what is going on.

I am going to list the key findings right up here so that people who are probably too biased, or don't have the capacity to reason, can save some time. I appreciate constructive criticism. If I have misinterpreted or missed something, or if there is another line of reasoning worth looking into, I am open to suggestions.

Key findings:

  1. It's difficult for me to debate the morality and ethics of the decision. If you believe the decision is wrong in principle, the discussion can end here, and I support you. But I am not taking that route of reasoning here; I am going by the numbers.
  2. Excluding one particular religion made no logical sense, even for a particular area like the North East. I could see no reason why the law can't accommodate people on a case-by-case basis.
  3. What's even weirder is that the protesters don't seem to understand the impact. They seem to have clearly overestimated the potential of the act. From the numbers below you will see that, in reality, India is not a hot destination for refugees anyway. We are not that great. Just FYI, people looking to start a new life feel way more comfortable in countries with a similar culture and interests.
  4. The timing of this whole thing seems out of place. Well, sort of. The number of refugees/immigrants in India has stayed pretty much the same over the past two decades. Bringing in such a controversial law can only be attributed to political motivation.
  5. From the North East's perspective it's just bizarre. From what I understood, they fear demographics changing because of an influx of legal/illegal immigrants from surrounding countries. Not that it's great, but in its current form the act excludes Muslims. That takes a major chunk of migrants from Bangladesh and Myanmar out of the picture. There are not many Hindus/Jains migrating anyway. Even if all 8 million Hindus from Bangladesh migrated to India, that would still be only around 6% of the combined population of West Bengal and Assam. Also, it's not the Indian government but the international community that's paying for most of the expenses. What are people worried about?
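Here is a quick back-of-the-envelope check of the 6% figure in point 5. It assumes populations of roughly 91.3 million for West Bengal and 31.2 million for Assam, from the 2011 Census. The article does not say which population figures it used, so treat the exact percentage as approximate.

```r
# Rough check of the "around 6%" claim in point 5.
# Assumption: 2011 Census populations, rounded to 0.1 million.
west_bengal_pop <- 91.3e6
assam_pop       <- 31.2e6
hindus_bd       <- 8e6  # the article's figure for Hindus in Bangladesh

# Share of the combined West Bengal + Assam population
share <- hindus_bd / (west_bengal_pop + assam_pop)
cat(sprintf("Share of combined population: %.1f%%\n", 100 * share))
```

With these inputs the share comes out at about 6.5%, in line with "around 6%". Using later population figures would make it smaller.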

So now let's start with the analysis.

The first source of data was UNHCR, a somewhat credible source, right?

If you are not interested in the code, you can skip the grayed-out sections.

Loading Data on Refugees from UNHCR.

This dataset covers the last couple of decades. It includes both the countries migrants came from and the countries they migrated to. The idea is to look at the numbers and try to understand what really triggered the recent developments.

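library(dplyr)    # filter(), group_by(), summarise(), select(), %>%
library(janitor)  # clean_names()
seekers <- read.csv("./UNdata_Export_20191216_191045051.csv")
seekers <- seekers %>% clean_names()
# Countries of interest; names must match the data exactly for %in% to find them
coi <- c("Pakistan", "Bangladesh", "Afghanistan", "Sri Lanka", "China")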

Above I defined some countries of interest as coi. Afghanistan, Bangladesh and Pakistan are the countries named in the CAA. Sri Lanka and China (Tibet) are included for comparison.
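One thing worth guarding against here is that `%in%` matches names exactly. A country spelled differently from the UN data (for example "Bangalesh" instead of "Bangladesh") is dropped silently instead of raising an error. Below is a minimal base R check. The `origins` vector is a made-up stand-in for `unique(seekers$origin)`, not the real data.

```r
# Hypothetical stand-in for unique(seekers$origin); not the real data.
origins   <- c("Afghanistan", "Bangladesh", "China", "Myanmar", "Pakistan", "Sri Lanka")
coi_check <- c("Pakistan", "Bangalesh", "Afghanistan", "Sri Lanka", "China")

# Names in the filter vector that never occur in the data
unmatched <- setdiff(coi_check, origins)
if (length(unmatched) > 0) {
  message("No rows will match: ", paste(unmatched, collapse = ", "))
}
```

Running the same `setdiff()` against the real `unique(seekers$origin)` before filtering would flag any name that needs fixing.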

Below I am just wrangling the data to make the column names shorter and more accessible.

cols <- colnames(seekers)
cols[1] <- "residence"
cols[2] <- "origin"
cols[6] <- "total_refugee_like"
cols[7] <- "total_refugee_like_assisted_unhcr"
colnames(seekers) <- cols

Let's take a look at the data itself.

library(knitr)       # kable()
library(kableExtra)  # kable_styling()
kable(head(seekers)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Columns available

Data Head

Next, I filter the data to rows where residence = India, that is, people seeking refuge in India.

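forindia <- seekers %>% filter(residence == "India")
# Row-wise max of the refugee columns; all-NA rows get NA rather than -Inf
forindia$maxcount <- apply(forindia %>% select(contains("refugee")), 1,
  function(r) if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE))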
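Why take the maximum across the refugee columns? Presumably because the columns overlap. For instance, the UNHCR-assisted count is part of the broader total, so adding them would double count. The sketch below shows the row-wise max on a toy data frame with made-up values. It includes the edge case where every column is missing. There, plain `max(r, na.rm = TRUE)` returns `-Inf` with a warning, so the helper returns `NA` instead.

```r
# Toy data with two overlapping refugee-count columns (made-up values).
df <- data.frame(
  total_refugee_like                = c(100, NA, NA),
  total_refugee_like_assisted_unhcr = c(80,  50, NA)
)

# Row-wise maximum that yields NA (not -Inf) when a row has no values.
row_max <- function(r) {
  if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE)
}

df$maxcount <- apply(df, 1, row_max)
print(df$maxcount)  # 100 50 NA
```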

Plotting the overall refugee count for India over time

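library(ggplot2)
# `theme` and `lightlinecol` are assumed to be the author's own styling objects, defined elsewhere (not shown)
yoyref <- forindia %>% group_by(residence, year) %>% summarize(totals = sum(maxcount, na.rm = TRUE))
ggplot(data = yoyref, aes(x = year, y = totals)) + geom_line(color = lightlinecol) +
  theme +
  scale_y_continuous(labels = scales::comma)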

We see that the numbers shot up in the early 1990s (reasons below), but over the past few decades things have been pretty constant. There has been no major influx of refugees.

Let's break the counts down by origin country.

library(directlabels)  # geom_dl()
ggplot(data = forindia, aes(x = year, y = maxcount, color = origin, label = origin)) +
  geom_line() +
  # custom theme first, so it cannot override the legend setting that follows
  theme +
  theme(legend.position = "none") +
  geom_dl(aes(label = origin), method = list(dl.trans(x = x - 1, y = y + 0.3), "last.points", cex = 0.8)) +
  geom_dl(aes(label = origin), method = list(dl.trans(x = x - 0.2), "first.points", cex = 0.8))

Key takeaways from the above chart

  • We can see that Sri Lanka, China, Myanmar and Afghanistan pop out as the major contributors.
  • Sri Lankan refugees peaked in the early 1990s, when the LTTE crisis hit, and that's when most of the influx happened.
  • The Tibetan refugee data seems to start around the same time. I wonder whether that is just missing data or whether some important incident happened then as well. Brief background: Wikipedia
  • For Bangladesh, the numbers held constant at around 53K, and then the data is missing after 2000.
  • It would have been interesting to see Bangladesh's trend since the 2000s, because it's at the core of the ongoing issue. Unfortunately the data is not available. This itself might be indicative of other problems we are facing with Bangladesh. Later in the article I try to get data from another source, and some key points emerge for Bangladesh in particular.

Another question that comes to mind: where are people from the COIs actually taking refuge? Is it just India that is impacted?

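coiseekers <- seekers %>% filter(origin %in% coi)
# maxcount was only computed for forindia above, so compute it here too
coiseekers$maxcount <- apply(coiseekers %>% select(contains("refugee")), 1,
  function(r) if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE))
coiseekerstotals <- coiseekers %>% group_by(origin, residence) %>%
  summarise(totals = sum(maxcount, na.rm = TRUE)) %>%
  filter(totals > 200000) %>% as.data.frame()
# `color` is assumed to be the author's own palette vector, defined elsewhere (not shown)
ggparallel::ggparallel(data = coiseekerstotals,
  c("origin", "residence"),
  weight = "totals", label.size = 8, text.angle = 0, text.offset = 0, label = TRUE) +
  scale_color_manual(values = sample(color, 34)) +
  theme + theme(legend.position = "none")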

In the above code I have kept only origin-residence pairs whose refugee counts, summed over all years, exceed 0.2 million (2 lakh). That is just to better assess the impact of mass migrations and not focus on the smaller numbers.
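The aggregation-plus-threshold step can also be sketched in base R. The data frame below is entirely hypothetical, with one row per origin, residence and year. Keep one thing in mind: if each yearly figure counts the refugees present that year, then summing across years gives refugee-years rather than a headcount. The 200,000 cutoff therefore works as a relative filter, not as a number of people.

```r
# Hypothetical long-format data: one row per origin/residence/year.
d <- data.frame(
  origin    = c("A", "A", "A", "B"),
  residence = c("X", "X", "Y", "X"),
  year      = c(2000, 2001, 2000, 2000),
  maxcount  = c(150000, 120000, 5000, 90000)
)

# Sum over years for each origin/residence pair
totals <- aggregate(maxcount ~ origin + residence, data = d, FUN = sum)
names(totals)[names(totals) == "maxcount"] <- "totals"

# Keep only pairs above 0.2 million (2 lakh)
big <- totals[totals$totals > 200000, ]
print(big)
```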

It seems that, of the COIs, Afghanistan is the worst impacted (by the Taliban), and very large numbers of people have been seeking refuge in the neighboring countries of Pakistan and Iran. Wikipedia: Afghan Refugees

Compared to Pakistan and Iran, India is taking in very small numbers from Afghanistan (a Muslim-majority country). Let's zoom in on the impact on India alone.

What we see is roughly what was expected: Pakistan doesn't even show up on the chart. People from Pakistan don't seem to be interested in taking refuge in India. That sounds reasonable, and the same argument applies as for Afghanistan. But the government seems to think that persecuted minorities are coming from all over the subcontinent.

It also becomes clearer that India hasn't taken in that many refugees from Bangladesh either, at least according to the official numbers. Compared to Tibet and Sri Lanka, which are predominantly Buddhist, India doesn't have that many migrants from Bangladesh, another Muslim-majority country.

Official numbers from UNHCR show similar results.

In these UNHCR figures two new names pop up: Myanmar and Somalia. So I expanded the list of COIs to include these two countries as well.

We see that for Somalia, refugees (not just Christians) moved to neighboring countries like Kenya and Ethiopia. For Myanmar, it was Bangladesh and Thailand. It therefore makes some sense to have location as a parameter in the CAA; it's just natural and convenient.

Myanmar's case is somewhat more interesting. It is a predominantly Buddhist country, and recent migration has mostly been of persecuted Muslims, in what we have come to know as the Rohingya refugee crisis.

Above we see that in the 10-year period from 2007 to 2016, the number of migrants from Myanmar into India went from around 2K to 15K. That's a whopping 758% increase. There was a sharp rise in 2012, and the count peaked around 2015. This is corroborated by the table below.
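For reference, percentage change is 100 × (new - old) / old. With the rounded endpoints quoted above (2K and 15K) it works out to 650%. The new count as a percentage of the old one is 750%. The quoted 758% is closer to the second measure, so it may describe the ratio rather than the increase, or it may come from unrounded yearly counts. The small helpers below make both calculations explicit.

```r
# Percentage increase from old to new
pct_change <- function(old, new) 100 * (new - old) / old

# New value expressed as a percentage of the old one
pct_of_old <- function(old, new) 100 * new / old

pct_change(2000, 15000)  # 650, using the rounded figures from the text
pct_of_old(2000, 15000)  # 750
```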

But again, it's not India that's worst impacted by the influx.

People from Myanmar have mostly settled in Bangladesh, Thailand and Malaysia, countries in their own neighborhood.

This puzzles me, because it makes either stance on the act much harder to understand. People are not even willing to take refuge in India (especially Muslims). The number of refugees has stayed pretty constant over the past few decades. Other countries like Australia and the USA, or UNHCR, are paying for the refugees. So what's the big deal? Why such a riot?

North East Story and Role of Bangladesh

Before we even go there, here is an amazing read by Elena Dabova.

Key Takeaways

  1. One of the reasons for such situation is that the social, economic and other troubles of Muslim population in India are not a result of the oppressive politics of the Indian state. Vast inequalities between the elites and the most of the populations were present in precolonial Indian feudalism, deepened during British rule, and remained the same all along before and after the creation of the independent Indian state in 1948
  2. One of the goals of terrorist actions is promotion of separatism through twisted demographic and immigration politics. For example, by 1993, at least 15 million Bangladeshi Muslims migrated to India illegally, outnumbering Hindu refugees by 3:1. They moved mainly to Assam, West Bengal, Bihar, Tripura and other North East India states.
  3. The infiltration [of illegal immigrants] has invalidated the communal logic of the Muslim league which led to the creation of Pakistan and the Pakistan’s thesis that Muslims could only protect themselves and flourish by being separated from Hindus in their own pure land.
  4. Most politicians though believe that the Bangladeshi Muslims are entering India not voluntarily (Manchanda, 2010; Rai, 1993). Instead they are pushed out by poverty and demographic suffocation in Bangladesh.
  5. The demographic threat to India from international immigration per se is overblown. The proportion of immigrant population to nonimmigrant population in India is about the same or even smaller compared to other countries, especially compared to the United States in the 20th century.

To assess this threat of illegal immigration, I felt it was reasonable to look into the visas issued by India.

Data is from

library(tidyr)  # gather()
visa <- read.csv("./VISA_Details_2010-2013-oct.csv")
visa <- visa %>% clean_names()
visa <- visa %>% gather(key = "type", value = "counts", -country, -mission, -visa_issue_date)
# Issue dates are parsed as day-month-two-digit-year
visa$visa_issue_date_d <- visa$visa_issue_date %>% strptime(format = "%d-%m-%y") %>% as.Date()
visa$visa_issue_date_year <- visa$visa_issue_date_d %>% format("%Y")
visa$visa_issue_date_month <- visa$visa_issue_date_d %>% format("%m")
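`strptime()` returns `NA` for any string that doesn't match the format instead of failing, so it is worth counting the `NA`s after parsing. The sketch below assumes `visa_issue_date` strings look like `05-03-12` (day-month-two-digit year, as the `%d-%m-%y` format implies). The sample values are made up. With `%y`, R maps 00 to 68 to the 2000s and 69 to 99 to the 1900s, which is fine for 2010 to 2013 data.

```r
# Made-up sample strings in the assumed day-month-year ("%d-%m-%y") format;
# the last one is deliberately in a different format and will not parse.
raw <- c("05-03-12", "28-11-10", "2012/03/05")

dates  <- as.Date(strptime(raw, format = "%d-%m-%y"))
years  <- format(dates, "%Y")
months <- format(dates, "%m")

# Count strings that failed to parse
n_bad <- sum(is.na(dates))
cat("Unparsed dates:", n_bad, "\n")
```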

The first step was to look at the different visa categories and how many visas India issues in each.

Business visas make up quite a big proportion, but it's clearly the tourist visa that's issued the most.

But to whom are they being issued ?

ggplot(data= visa %>% group_by(country ,type) %>% summarise(totals = sum(counts))
, aes(x = country %>% reorder(totals) , y = totals, fill = type)) +
geom_bar(stat="identity") +
theme +
theme(axis.text.y = element_text(angle=0), legend.position="top" )

It’s not that clear just yet. Let me zoom in for you.

Well, well. It's Bangladesh that seems to have the most people looking for a tourist visa, followed by the UK and the USA. Having seen this, I would expect Bangladeshis to make up most of the foreign tourists who visit India, right? Let's pull more data in.

Tourist data from

tourists <- read.csv("./InternationToursits2001_2010.csv")
tourists <- tourists %>% clean_names()

Let's take a look at the number of foreign tourists visiting India over the years.

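library(stringr)  # str_remove()
touristsperyear <- tourists %>% gather(key = "year", value = "numberoftourists", -name_of_countries)
# Year columns arrive as names like "x2001"; strip the leading "x" and keep the year as an integer
touristsperyear$year <- touristsperyear$year %>% str_remove("^x") %>% as.integer()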
ggplot(data= touristsperyear
, aes(x = year , y= numberoftourists, color=name_of_countries)) +
geom_line() +
geom_dl(aes(label = name_of_countries), method = list(dl.trans(x = x - 1 ,y = y + 0.3), "last.points", cex = 0.8)) +
geom_dl(aes(label = name_of_countries), method = list(dl.trans(x = x - 0.2), "first.points", cex = 0.8)) + theme
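Reshaping from wide (one column per year) to long format is what `gather()` does above. For readers without the tidyverse, below is a base R equivalent using `stack()`, run on a made-up two-country table. It also converts the `x2001`-style column names to integer years. That avoids a quirk of `strptime("2001", format = "%Y")`, which fills in the current month and day.

```r
# Made-up wide table shaped like the tourist CSV after clean_names():
# one row per country, one "x<year>" column per year.
wide <- data.frame(
  name_of_countries = c("Country A", "Country B"),
  x2001 = c(10, 20),
  x2002 = c(15, 25)
)

# stack() turns the year columns into (values, ind) pairs, column by column
year_cols <- wide[, -1]
long <- data.frame(
  name_of_countries = rep(wide$name_of_countries, times = ncol(year_cols)),
  stack(year_cols)
)
names(long)[names(long) == "values"] <- "numberoftourists"

# "x2001" -> 2001L
long$year <- as.integer(sub("^x", "", as.character(long$ind)))
long$ind <- NULL
print(long)
```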

Now this seems odd. Even though Bangladesh received the most visas over the years, comparatively few Bangladeshis show up among the tourists. My likely guess is that people are moving in and then overstaying their welcome. It's one of the symptoms behind the whole “illegal immigrant” line of reasoning. But the numbers are still insignificant. We might have more data, including data on unaccounted migration, and the actual number of illegal immigrants might be many times these numbers. Even then, it still seems more like politics than a real concern. National security? Again, I would defer that to Elena's article.
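One caveat when comparing the two sources: going by the file names, the visa data covers 2010 to October 2013 and the tourist data covers 2001 to 2010. Only 2010 overlaps. A like-for-like comparison would restrict both datasets to that year and look at arrivals per tourist visa issued. Below is a minimal sketch with made-up numbers. The column names are hypothetical, not the real CSV headers.

```r
# Made-up per-country totals for the single overlapping year (2010).
visas    <- data.frame(country = c("A", "B"), tourist_visas = c(400000, 300000))
arrivals <- data.frame(country = c("A", "B"), arrivals = c(100000, 280000))

# Join on country and compute arrivals per tourist visa issued
cmp <- merge(visas, arrivals, by = "country")
cmp$arrivals_per_visa <- cmp$arrivals / cmp$tourist_visas
print(cmp[order(cmp$arrivals_per_visa), ])
```

A ratio well below 1 would flag a country worth a closer look.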



Abhijeet Pokhriyal

School of Data Science @ University of North Carolina — Charlotte