Week 10.2: On the dynamics of username change behavior on Twitter
Summary
This lesson examines username change behavior on Twitter, based on a study of 8.7 million users tracked for two months. The study found that 73% of users modified their profile attributes and about 10% changed their usernames. In a smaller sample of 10,000 users monitored more closely, 20% of the users triggered 85% of the username changes. The motivations behind these changes include gaining more space for text under Twitter's character limit, participating in trending events, gaining or losing anonymity, adjusting to real-life changes, and malicious intentions like username squatting. The analysis found only a weak correlation between a user's popularity or activity and the frequency of username changes. Twitter's API restrictions also constrained data collection, so the researchers monitored a random sample drawn from the users who had changed their usernames.
Highlights
20% of users cause 85% of username changes, driven by factors like space gain and trend participation.
Some users change usernames to gain anonymity, while others may do it for malicious purposes like squatting.
Figure 3 in the lesson shows the distribution of users who change usernames frequently versus rarely.
Challenges in the study include Twitter's API limits, which restricted more extensive data collection.
The findings could help improve online privacy and identify behavior patterns related to social media identity changes.
Key Takeaways
73% of Twitter users change their profile attributes, while 10% change their usernames.
Username change behavior often follows the Pareto principle, where 20% of users make 85% of changes.
Reasons for changing usernames range from gaining space to participating in trends or evading identification.
Username squatting is a common issue on Twitter, akin to domain squatting.
There is a weak correlation between a user's popularity or tweet activity and the frequency of username changes.
Overview
In this session, we explore the intriguing world of Twitter username dynamics, a topic dissected through a meticulous study of user behavior on the platform. Over two months, researchers tracked 8.7 million Twitter users to observe how often usernames and profile attributes were altered and the reasons behind such changes.
The study highlighted that a significant chunk of users altered their usernames for reasons ranging from pragmatic needs, such as gaining more tweet space, to more complex motives like engaging with trending topics, hiding their identity, or making a fresh start. Interestingly, 20% of the users accounted for a majority of the username changes, showcasing a pattern similar to the famed Pareto principle.
Despite these revelations, the study faced certain constraints, primarily due to Twitter's API limitations, which capped the data collected at 10,000 users for frequent observation. This restriction paved the way for future research avenues, suggesting that scaling up the dataset could provide more comprehensive insights into username change behaviors across different user demographics.
Chapters
00:00 - 00:30: Introduction to the course and paper topic The chapter provides an introduction to the course titled 'Privacy and Security in Online Social Media' offered on NPTEL. It sets the stage for studying various patterns related to privacy and security in online social media. The focus is on continuing the analysis and understanding of different patterns and behaviors observed on social media platforms.
00:30 - 01:00: Brief overview of the paper This chapter provides a brief overview of a paper titled 'On the dynamics of username changing behavior on Twitter'. It touches upon the topic of username changes by Twitter users, exploring the reasons behind these changes, the frequency with which they occur, and other related dynamics.
01:00 - 01:30: Paper's research and findings The chapter titled 'Paper's research and findings' delves into the individuals responsible for changes and the benefits derived from altering user handles. The paper is highlighted as intriguing with significant implications. The abstract indicates that prior research demonstrates considerable findings, setting the stage for detailed exploration in the following sections.
01:30 - 03:00: Research methodology and dataset The chapter discusses the methodology and dataset used in a study of Twitter user behavior, specifically focusing on changes in usernames over time. Researchers examined 8.7 million Twitter users over a two-month period. The data collection process is noted to be complex, but it provides a high-level conclusion that only a few users frequently change or favor certain usernames.
03:00 - 04:00: Analysis of username change behavior The chapter titled 'Analysis of username change behavior' examines the patterns and reasons behind why users change their usernames. It concludes that a small group of people change their usernames multiple times, while a slightly larger group changes them infrequently. The chapter also investigates the motivations behind these changes.
04:00 - 05:30: Correlations with popularity and activity The chapter titled 'Correlations with popularity and activity' discusses the use and growth of Twitter. It covers how Twitter is utilized by users, the types of data shared on the platform, and mentions an abstract which summarizes the main points of a paper related to these topics.
05:30 - 06:00: Survey and user reactions The chapter titled "Survey and user reactions" discusses findings from a dataset involving 8.7 million Twitter users over a two-month period. It highlights that 73.21% of these users changed their profile attributes and assigned new values during this time. Additionally, around 10% of users changed their usernames.
06:00 - 07:30: In-depth analysis of username change reasons The chapter provides an in-depth analysis of the reasons why users change their usernames. It reports that approximately 73 percent of the users alter their profile attributes. A graphical representation shows different attributes on the x-axis and the percentage of users who made changes on the y-axis, with colors indicating the frequency of changes.
07:30 - 09:00: Patterns of username use and conclusions This chapter examines the patterns of username changes among Twitter users. It reveals that approximately 73.21% of 8.7 million users change their attributes, whereas only about 10% of users change their username. The chapter concludes by summarizing the contributions of the paper, though specific contributions are not detailed in the transcript.
Week 10.2: On the dynamics of username change behavior on Twitter Transcription
00:00 - 00:30 Welcome back to the course Privacy and Security
in Online Social Media on NPTEL. So, what we will do now is continue the
pattern that we have been following: studying how different kinds
of patterns can be analyzed on social media.
00:30 - 01:00 I am going to look at this paper called 'On
the dynamics of username changing behavior on Twitter'. I think we have mentioned this topic briefly
in the past: how users actually change their usernames, why they change them,
how frequently the changes happen,
01:00 - 01:30 who the people changing them are,
and what the benefits of changing the user handle are; that is what we are going to
look at. This is an interesting paper which has some
interesting implications also. So, in the abstract, the authors say
that past studies show that a substantial
01:30 - 02:00 section of Twitter users change their username
over time. The authors look at 8.7 million users
on Twitter for a duration of two months. The data collection is slightly interesting
and complicated also; we will look at it as we progress. So, the high-level conclusion from the study
is that a few users favor a username by repeatedly
02:00 - 02:30 choosing it multiple times. So, essentially what the paper concludes
is that there is a small set of people who change their handles many times and there
is a slightly larger set of people who change them very few times, and the paper
also looks at the reasons why people actually
02:30 - 03:00 change their usernames. As I have said before, the abstract only summarizes
what is in the paper; then, in the introduction, the authors talk about the growth of Twitter,
in terms of how users are actually using it and what kind of data is being pushed
onto Twitter, so that is what is being discussed
03:00 - 03:30 here. So, one of the conclusions
that the authors have is that, in their dataset of 8.7 million Twitter users tracked for two
months, 73.21 percent of users changed their profile attributes and assigned
new values, and about 10 percent of users changed
03:30 - 04:00 their usernames in total. So, this is basically showing you that about
73 percent of the users change their profile attributes and assign new values. So, you can see here that the
x-axis is the different attributes, and the y-axis is the percentage of users who have changed them. And the color here indicates the
number of times that the changes were made.
04:00 - 04:30 Two means the user changed the value twice,
and then three, four, and five changes. 73.21 percent of the 8.7 million users change
their attributes on Twitter and just about 10 percent of users change their username. So, now let us look at the contributions of
the paper.
04:30 - 05:00 So, there are three contributions in the paper. 20 percent of users trigger 85 percent of
username changes; again, this is the same Pareto principle that we have seen in the past, or
a power-law pattern, that is, 20 percent of
05:00 - 05:30 users trigger 85 percent of username changes. These users were observed to change 5 times or more. Username changing behavior follows a Pareto
principle; 10 percent of username changes occur after an hour of the earlier username change. I think the username changing pattern is also
interesting because there are multiple reasons
05:30 - 06:00 why people change it. People change because
they want to get some space. For example, if my account when
I started was Ponnurangam Kumaraguru, which is pretty long, and if I change the account
to Ponguru, then when users tag me or mention me in their posts, they would
actually get more space to write the content.
06:00 - 06:30 And all this is happening just because there
is a space constraint on Twitter. Whereas on Facebook, if you see, there is a lot
more space for the content, and therefore Facebook actually allows you to change your
username only once. Twitter allows you to change it as many
times as you like; that is the reason why this problem is actually appearing.
06:30 - 07:00 65 percent of users choose a new username
unrelated to the old name, while 35 percent reused an old one some time later. You will also see a table later where
there is a small set of people who actually collude, that is, in a group, they would
use the same name, and different users will
07:00 - 07:30 start using the same handle within the group. The reasons to change username include benign
reasons like space gain, suiting a trending event, gaining or losing anonymity, adjusting to real-life
events, and avoiding boredom, and malicious intentions like obscured username promotion and username
squatting.
07:30 - 08:00 I will just tell you quickly what they are
and then, as we go into the paper, we can look at them in detail. Space gain: as I said, Ponnurangam Kumaraguru
to Ponguru. Suit a trending event: for some event that
is going on, let us take IPL cricket or football, I would change my handle to look very similar
to it, and therefore I will get more attention. Gain or lose anonymity: I create an account
Ponguru, which is Ponnurangam Kumaraguru, which
08:00 - 08:30 is probably very identifiable, whereas if
I have an account saying a guy from Chennai, the anonymity is pretty high. Adjust to real-life events: things are
changing in my life. So, let us say earlier I was a grad student;
I could have 'graduate' in my user handle,
08:30 - 09:00 but now that I am a professor, I could
use 'professor' in my user handle. Avoid boredom: it is boring since I have been
using Ponguru for a long time. Malicious intent is obscured username promotion: I could
change my user handle to one which is very similar to somebody
who is popular and actually get my handle
09:00 - 09:30 promoted. And username squatting: I could register an account called
Amitabh Bachchan now and keep it with me; whenever Amitabh Bachchan actually wants to create an
account, he would have to take the account from me. This username squatting is actually a pretty
popular problem in terms of usernames also.
09:30 - 10:00 There was an incident even in India: when the
current central government wanted to have an account, there was an issue with the PMO India handle. So, squatting a handle like ponguru
so that somebody else cannot use it is a problem, and this is a traditional problem with
domains in general also. Somebody could squat a URL like pmoindia.in
or pmoindia.com or apple.com, and they could
10:00 - 10:30 actually make others pay for it. I think there was an experience
with housing.com: when they wanted the domain, there was squatting of that domain
name.
10:30 - 11:00 So, now let us look at actually the data set
collection. So, as I said before, in the related work
there is mention of three different areas that
this paper touches. The first is evolving user behavior, that is, how users are changing their
behavior online. The second one is profile linking, which
we have seen in this course before in terms
11:00 - 11:30 of connecting two user handles and
finding out whether they are the same user, and the third is identifying malicious users. All these three areas
come into this research work, so the authors mention this related
work. Data collection: in terms of the
total data that was collected, the authors created a large seed data set, tracked
the seed set for two months every fortnight,
11:30 - 12:00 found users who changed usernames more often
than others, filtered these users, and tracked their profiles every 15 minutes. Essentially, what the authors did was take
the large data set and track it every fortnight. And for a small user set
from this larger data set, they were
12:00 - 12:30 tracking it every fifteen minutes. We will explain later why
the authors took this approach. So, the seed users are those who participated in
the 17 local and global events during April
12:30 - 13:00 1, 2013 to September 3, 2013. So, essentially there has to be
some way of collecting the users. So, one approach that they took is this: for events between
April and September 2013, the handles of all the people who posted about these global events
were taken, and the data for these 8.7 million users were collected,
that is, those users' handles.
13:00 - 13:30 Seed tracking: from now on, the 8.7 million users are
the seed users. The authors tracked the 8.7 million users for any username changes
by querying them every fortnight within a period of October 2013 and November 2013. So, for the 8.7 million users, every fourteen days
they go and check whether the users have actually changed
13:30 - 14:00 their handle. By comparing two consecutive scans, old and
new usernames of a user were recorded; that is, if fourteen days ago my account
was Ponguru, and today my account is ponnurangam dot kumaraguru, both of them are captured. Twitter usernames are case-insensitive; therefore,
any case changes were not counted
14:00 - 14:30 as username changes. We found that 853,827 users of the 8.7 million
users which are about 10 percent, changed their usernames at least once during a small
observation period of 2 months. In these 2 months, 10 percent of these 8.7
million users changed their user handle which is actually pretty large, 10 percent of users
changing their user handles.
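The scan comparison just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' code; the function name, the dictionary layout (user id mapped to username), and the sample handles are all assumptions made for the example.

```python
def detect_username_changes(previous_scan, current_scan):
    """Return {user_id: (old_username, new_username)} for users whose handle changed.

    Both arguments map a stable user id to the username seen in that scan
    (a hypothetical layout chosen for this sketch). The comparison is
    case-insensitive, so a change of letter case alone is not counted.
    """
    changes = {}
    for user_id, old_name in previous_scan.items():
        new_name = current_scan.get(user_id)
        if new_name is None:
            continue  # user not present in the later scan
        if old_name.lower() != new_name.lower():
            changes[user_id] = (old_name, new_name)
    return changes


# Two toy scans taken a fortnight apart (made-up handles).
scan_1 = {101: "Ponguru", 102: "grad_student", 103: "someone"}
scan_2 = {101: "PonnurangamK", 102: "GRAD_STUDENT", 103: "someone"}
changed = detect_username_changes(scan_1, scan_2)

# Share of seed users with at least one change, from the figures quoted in the lecture.
share_changed = 853_827 / 8_700_000
```

Here only user 101 is reported, because user 102 changed nothing but letter case. The last line reproduces the "about 10 percent" figure from the counts quoted above.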
14:30 - 15:00 So, now, we cannot actually
query all 8.7 million users for very frequent data collection. So, the authors decided to sample;
tracking users who do not participate in such behavior had little value, that is, the
users who do not change their usernames. The authors therefore
15:00 - 15:30 filtered 711,609 users who changed
their usernames at least once and randomly sampled 10,000 users to monitor them at
short intervals. The idea is to find people who are changing
their usernames and from them create a small sample to collect data,
and the big reason
15:30 - 16:00 why the authors chose to
collect a smaller data set of only 10,000 is that making so many API calls to
Twitter would be impossible. That is what the authors say here, if you look
at it: quicker scans would need 1,462 application authentication tokens.
16:00 - 16:30 And therefore, it is going to be
hard to do that. Now we have seen the different types
of seeds that the authors used. Here is the table that gives you
the details: fortnight scan, October 16, 2013 to November 26, 2013,
8 million users; 15-minute scan, November
16:30 - 17:00 22 to January 22 of 2015, 10,000 users. Out of
the 10,000 users, 4,198 users changed their usernames at least once in 14 months,
constituting 14,880 username changes; about 20 percent of users changed 5 times or more,
17:00 - 17:30 triggering around 12,648, that is, 85 percent, of user
name changes. And we will see the figures also. One user changed her username 113 times
in fourteen months which on manual inspection turned out to be an inorganic user with half
completed tweets, tweets with the same text, and frequent posts in short durations. So, it
is essentially saying that this is not necessarily
17:30 - 18:00 a legitimate or human user. So, the conclusion that you want to remember
is that of the 10,000 users, 4,198 changed their username at least once in 14 months. So, the reference here is to figure 3. Let us go look at figure 3. So, here is figure 3, which actually shows
user distribution for frequency of changing
18:00 - 18:30 usernames: 20 percent of the users frequently
changed their usernames and 80 percent of the users changed rarely. So, again, like last week's paper that we
saw, it is a percentage of users, where you can see that the first part until
about 10 or 12 is actually very short.
18:30 - 19:00 So, there the inset is giving you
a more detailed view of the data: frequency of username change versus number of users. So, one user changed it
113 times in 14 months.
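The 20 percent versus 85 percent observation can be checked with a small calculation over per-user change counts. The sketch below is an illustration only: the function name and the toy counts are hypothetical, and the only figures taken from the lecture are 12,648 and 14,880.

```python
def share_of_changes_by_top_users(change_counts, top_fraction=0.2):
    """Fraction of all username changes made by the most active users.

    change_counts: number of username changes per user.
    top_fraction: share of users, ranked by activity, to treat as the "top" group.
    """
    counts = sorted(change_counts, reverse=True)
    if not counts or sum(counts) == 0:
        return 0.0
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)


# Toy data: 10 users, 100 changes in total (made-up numbers).
toy_counts = [40, 25, 5, 5, 5, 5, 5, 4, 3, 3]
toy_share = share_of_changes_by_top_users(toy_counts)  # top 2 users: 65 of 100 changes

# Figures quoted in the lecture: 12,648 of 14,880 changes came from the frequent changers.
reported_share = 12_648 / 14_880
```

With the quoted figures the ratio works out to 0.85, which is where the "85 percent" in the lecture comes from.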
19:00 - 19:30 Around 20 percent of the username changes
were triggered within a day of the previous username change. Observe a Pareto distribution with 20 percent
of the users frequently changing usernames in short intervals, and 80 percent of the
users changing rarely after long durations. So, this is in figure 2(a); that
is the distribution that you want to
19:30 - 20:00 look at. So, again there is an inset here to show
the number of days for a username change; this is 0 to 600, whereas this is just showing
0 to 1. And this is the frequency
of username changes, right, this is the
20:00 - 20:30 percentage of username changes. So, (a) is
giving you that; normalized longest common subsequence length, we will see all of these, and
position of change relative to usernames. So, now look at the usernames themselves. Specifically looking only at the
usernames, we see popularity versus
20:30 - 21:00 frequency of change. We measure popularity of 4,198 users using
followers, that is in-degree, you know what in-degree is, and plot it against the frequency
of username change, that is, the number of followers that I have versus the number of times
my username changes. This will give interesting results,
like whether popular users who have
21:00 - 21:30 a lot more followers are changing
their usernames more frequently than people who have fewer followers. To find the correlation between the two, the authors
removed everybody who had more than one million followers, and those with too few, that
is, less than one follower, because there is no sense in keeping both these types of
users; they will
21:30 - 22:00 basically skew the analysis that we are looking
at. We observe that username change frequency
is weakly yet positively correlated with the in-degree of the user; a significant
positive correlation would imply that the higher the popularity, the higher the frequency of change;
however, a weak correlation does not guarantee
22:00 - 22:30 the same. That is, in this case the
authors only found a weak correlation. So, there may be a chance that the number
of followers, the in-degree, is affecting the username changes, but it also may not
be. Figure 4(a): if you see here, this shows the
number of followers from 1 to 10 to the power
22:30 - 23:00 of 6, against the frequency of username changes. So, this is not really meaningful; if there
had been a strong positive correlation, we would have seen all the data like this, that
is, as the number of followers increases, the number of times the username changes increases, and
then it could be a linear graph. Whereas this graph is showing you that it
is not really positively correlated. Also, if you look at another metric, which
is percentage of tweets posted versus frequency
23:00 - 23:30 of username change, weak correlation implies
that popularity and activity have little impact on the choice to change a username. The second graph here, figure 4(b),
shows the frequency of username change with the users' activity.
23:30 - 24:00 To find correlation between the two, again
users with more than ten thousand tweets or less than one tweet were removed; we observed a weak and
positive correlation between the two, same as with the number of followers; so we have
a weak correlation between these two, the number of tweets posted and the frequency of username
change. So, that gives you a sense.
24:00 - 24:30 So, let us go over all the analysis that you
have seen until now: it basically says that 20 percent of the users changed five times or more,
that there are people who have changed 113 times, and that there is a weak relationship between
popularity, or the number of posts somebody makes, and username changes; that is what
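As a rough illustration of this kind of analysis, the sketch below filters out users outside the follower range mentioned in the lecture and then computes a correlation coefficient. The lecture does not say which correlation measure the authors used; Pearson correlation on the log of the follower count is an assumption made here, and all names and data points are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def filter_by_followers(users, low=1, high=1_000_000):
    """Keep (followers, change_count) pairs with low <= followers <= high."""
    return [(f, c) for f, c in users if low <= f <= high]


# Toy (followers, number of username changes) pairs; made-up values.
users = [(0, 3), (10, 1), (100, 2), (1000, 1), (10_000, 3), (5_000_000, 9)]
kept = filter_by_followers(users)

# Log scale for followers, since the figure's axis runs from 1 to 10^6 (an assumption).
log_followers = [math.log10(f) for f, _ in kept]
change_counts = [c for _, c in kept]
r = pearson(log_followers, change_counts)
```

A value of r close to 0 would indicate a weak relationship and a value close to 1 a strong positive one; the toy data here is not meant to reproduce the paper's result.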
we have learnt until now. So, for studying the reasons why
people change their names, Paridhi, the first
24:30 - 25:00 author of the paper, tried doing some
interesting things. She created a survey with some questions
asking why users change their usernames. And she posted tweets tagging the users who
had changed their usernames.
25:00 - 25:30 Interestingly, we got both very positive
reactions and very negative reactions. There were people who said: why are you
tracking us, who are you, why are you studying the username changes
that I have made, why are you asking me all these questions?
25:30 - 26:00 Some users reacted by giving
the reasons why they changed their usernames and explained
things. Here are some reasons that we mentioned in the
abstract for why people change their usernames: space
gain, of course; Ponnurangam Kumaraguru versus Ponguru would help them get more
text into the posts.
26:00 - 26:30 Let us also look at figure 5, which
shows the length difference between the names. So, if you look at it, space
gain is on the x-axis and the y-axis is old
26:30 - 27:00 username length, which is to find out,
if I moved from ponnurangam kumaraguru to Ponguru versus Ponguru to ponnurangam
kumaraguru, what is happening and why users are doing that. So, the authors calculated the length difference
between the new and the old names of users, and separately represented users with old
names shorter than and longer than the median
27:00 - 27:30 length of eleven; that was because
the data itself showed that the median length of a user handle was 11. The authors observed that 75 percent of long usernames
moved to shorter or same-length new usernames. 75.19 percent of long usernames
moved to shorter or same-length usernames.
27:30 - 28:00 While 60.87 percent of short usernames picked
longer new usernames. So, a similar percentage of people are
flipping from small to big, and big to small. In other words, most users with old usernames
shorter than 11 tend to add characters to their usernames, while most users with old usernames longer
than 11 prefer to remove characters from their
28:00 - 28:30 new usernames. Old username length greater
than 11 is shown in red here. So, basically, for space gain: if
I reduce my user handle size, I am getting more space. If I am increasing the characters,
I am losing space; that is what is positive
28:30 - 29:00 and negative here. Old usernames
shorter than 11, where, if you see here, old usernames shorter than 7 tend to add characters, so that
is what is here; this is space gain and this is negative space gain.
29:00 - 29:30 So, blue is old usernames shorter than
eleven characters. They tend to add more characters, so their space gain
is negative; they are losing space. Old usernames longer than eleven
tend to remove characters, so their space gain is positive; they
are gaining space. That would help you to understand what kind
of username changes are happening on Twitter.
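The space-gain measure discussed here is simply the old username length minus the new username length. The sketch below illustrates it and the split around the median length of 11; the function names and the example handles are hypothetical, chosen only to mirror the examples used in the lecture.

```python
MEDIAN_LENGTH = 11  # median username length quoted in the lecture


def space_gain(old_username, new_username):
    """Characters freed by the change: positive means the new handle is shorter."""
    return len(old_username) - len(new_username)


def share_not_longer(changes):
    """Share of (old, new) pairs where the new username is shorter or the same length."""
    if not changes:
        return 0.0
    return sum(1 for old, new in changes if space_gain(old, new) >= 0) / len(changes)


# Made-up (old, new) pairs modelled on the lecture's examples.
changes = [
    ("ponnurangamkumaraguru", "ponguru"),    # long -> short, positive space gain
    ("ponguru", "prof_from_chennai"),        # short -> long, negative space gain
    ("graduate_student_xyz", "prof_xyz"),    # long -> short
]

long_old = [c for c in changes if len(c[0]) > MEDIAN_LENGTH]
short_old = [c for c in changes if len(c[0]) < MEDIAN_LENGTH]

long_share = share_not_longer(long_old)    # analogous to the 75.19 percent figure
short_share = share_not_longer(short_old)
```

On real data, long_share would correspond to the 75.19 percent reported for long usernames, and 1 - short_share to the 60.87 percent of short usernames that picked longer ones.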
29:30 - 30:00 This is in terms of the 10,000 users that the authors
are analyzing. Maintain multiple accounts: a few users exchange usernames
between their multiple accounts. That is, I maintain three accounts and I
keep changing the usernames between these three accounts. Some users in the data set
changed usernames to reverse the identifiability
30:00 - 30:30 of the users, either to make them personal
or to make them anonymous. So, for example, I could have the username
Ponguru, which is probably identifiable, and I move from Ponguru to professor from Chennai,
and that would make it anonymous, compared to professor from Chennai to Ponguru, which
will make it more identifiable. So, just take a look at this table; this is
something I mentioned earlier, but I will
30:30 - 31:00 explain what happened in the data
set now. So, you should look at the first column which
is the id, the unique id that the users have on Twitter; we have put x so that
you cannot identify the users for now. In scan 1, Peshawar underscore sms is with the first user; in scan 2, Peshawar
underscore sms went to the next user, in row 2.
31:00 - 31:30 And in scan 3, Peshawar underscore sms went
to user 3. So, it is the same group, the Sajan group, and given
that the authors were tracking every 15 minutes, we could
find out that at every scan the user handle was with a different user from the same
group.
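One way to spot this kind of handle sharing in scan data is to record, for each username, the set of user ids that held it across scans. The sketch below is an illustration under assumed data structures (a list of scans, each a dictionary from user id to username), not the authors' method; the ids and the placeholder handles are made up.

```python
from collections import defaultdict


def find_shared_usernames(scans):
    """Return {username: sorted user ids} for usernames held by more than one id.

    scans: list of {user_id: username} dictionaries in time order
    (a hypothetical layout for this sketch). Usernames are compared
    case-insensitively, as in the lecture.
    """
    holders = defaultdict(set)
    for scan in scans:
        for user_id, username in scan.items():
            holders[username.lower()].add(user_id)
    return {name: sorted(ids) for name, ids in holders.items() if len(ids) > 1}


# Three toy scans in which one handle moves between three accounts.
scans = [
    {1: "peshawar_sms", 2: "handle_b", 3: "handle_c"},
    {1: "handle_a", 2: "peshawar_sms", 3: "handle_c"},
    {1: "handle_a", 2: "handle_b", 3: "peshawar_sms"},
]
shared = find_shared_usernames(scans)
```

In this toy data only peshawar_sms is flagged, since it appears under three different ids. Whether the accounts belong to one person or to several colluding users cannot be told from the scans alone.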
31:30 - 32:00 It could be the case that all these handles
are managed by the same person; that is a possibility, but what we found
was this: usernames within a group have been shared, and different
accounts start using the same user handle. So, the last column is the date of observation,
which is showing you that we captured data
32:00 - 32:30 in different snapshots. So, looking at the reasons for username
changes: adjust to events, that is, one user said that they were
associated with an event, the event finished, and the user started associating with another
event, and so the handle was changed,
32:30 - 33:00 for example, pwifanclub to ForceIndia. So, this is the explanation that I gave with
table two, which is that the authors found that a few users collaboratively pick the
same username at different timestamps, and I have already walked you through the table,
which gives you a sense of how
33:00 - 33:30 the handles have been managed. Username squatting: username squatting is against the
Twitter rules, but users create user handles and keep them, so that they can
monetize them when necessary, when other users want to have these
handles. So, that ends the paper. So, this is essentially a paper which talks
about how frequently the users change
33:30 - 34:00 their handles, why they change them,
what kind of patterns there are, and who is changing the handle:
people who are popular versus people who are not popular, people who post a lot more
text versus people who post less text; that is the kind of analysis in this paper.
34:00 - 34:30 This paper can be actually very useful in
terms of analyzing and making some inferences on username changes. Also, here is a table which gives a
few reasons for username changes. Privacy: for privacy, since it has my initials
and part of my full name.
34:30 - 35:00 So, simply, people have given the
reasons why they change their usernames: privacy, privacy and abuse, link all accounts,
use real name, use easier, shorter username, and, reading the text in column one, violates
wiki policy, violates wiki policy, violates wiki policy for religious reasons.
35:00 - 35:30 So, these are not from user handles on
Twitter. The authors got a chance to look at the
username changes on Wikipedia, and these are the reasons that people had mentioned;
even on Wikipedia you can change your handle. Of course, all work of this
kind has some limitations. So, here, due to the Twitter API restrictions,
only ten thousand users' data was
35:30 - 36:00 collected and analyzed for the fifteen-minute
scan; that is one of the biggest limitations of the study. And there are many directions in which people could
take this kind of work; one direction
is extending the data
36:00 - 36:30 set of the analysis itself; studying a
much larger data set may give
results which are generalizable to a larger audience. With that I will stop this paper; I will see
you soon.