I recently finished my Master’s thesis, titled “Social Privacy on Facebook: A cross-sectional survey analyzing awareness among university students in the Netherlands”. It took six months of intensive work to complete.
This thesis set out to measure, with a quantitative approach, the extent to which student users of Facebook are aware of social privacy. In brief, social privacy awareness covers the kind and amount of information users disclose, the strategies they employ, their concerns about other users’ gazes, and how they make that information accessible to others. The questionnaire items were derived from the theoretical framework of the thesis and were aimed at students engaged with social media, exclusively Facebook. University students in the Netherlands were the central subjects of the research. The study illustrated that Facebook users seem to disclose less information about themselves and consistently use technological privacy tools, but these variables should not be considered on a scale.
Overall, it was a great experience to conduct research in a field with such practical implications for social privacy on Facebook, and the project has certainly deepened my knowledge of quantitative research methods.
This post summarizes the results section of the thesis.
The survey method was chosen in order to explore general patterns of social privacy awareness on Facebook. A cross-sectional survey was carried out, with university students in the Netherlands who have, or have had, a Facebook account as the unit of analysis.
The survey was conducted in May 2017. A total of 176 students started the survey and 169 respondents from different backgrounds finished it (N = 169), a response rate of 96%. No information was available about the students who declined to participate. Because the survey was administered on the Web, it may be biased against students who are not frequently online or who do not visit the online groups and pages where it was shared. The number of respondents was sufficient for statistical analysis at a 95% confidence level; the width of the resulting confidence intervals depends on both the sample size and the variance of the measurement.
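As a rough illustration of the numbers above, the following Python sketch recomputes the 96% figure and a worst-case margin of error at 95% confidence. It is not the thesis code (the analysis was done in R), and it assumes simple random sampling, which a web survey does not guarantee.

```python
import math

started = 176    # students who started the survey (from the text)
completed = 169  # students who finished it (N = 169)

completion_rate = completed / started

# Worst-case margin of error for a proportion (p = 0.5) at 95% confidence.
# Assumption: simple random sampling; the actual sample was a web survey.
z = 1.96
margin = z * math.sqrt(0.5 * 0.5 / completed)

print(f"completion rate: {completion_rate:.1%}")
print(f"margin of error: +/- {margin:.1%}")
```

With 169 respondents, a reported percentage therefore carries an uncertainty of several percentage points in either direction under these assumptions.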
The survey results were analyzed statistically with the R programming language on the OS X operating system. The visualizations of the descriptive and inferential results were created in R with the ggplot2 package.
Of the 176 students who started the survey, 169 finished it, a 96% response rate. The demographics of the students who finished the survey are presented in the table below. Although the survey aimed to represent all genders, more females (58.52%) than males (40.34%) participated; the category Other (1.14%) was removed from further analysis because it was too small to be representative. Most of the respondents were in the 19 to 25 age group (81.14%). Students were then asked for their level of education, with the following options: (1) 1st year (17.61%); (2) 2nd year (17.05%); (3) 3rd year (22.16%); (4) Masters (35.80%); (5) Other (7.39%). Masters students (35.80%) and 3rd year students (22.16%) were the largest groups and together account for more than half of the respondents.
The figure on the left below is a waffle chart of the eight most common nationalities among the respondents. The most common nationality was the Netherlands (46.83%), which covers almost half of the respondents, followed by Turkey (12.70%), Germany (8.73%), China (7.94%), Italy (7.94%), Greece (5.56%), Poland (5.56%), and France (4.76%). The figure on the right below is a waffle chart of respondents’ academic disciplines. Although the survey was meant to reach students from all departments, the majority of the students (85.23%) study in a social sciences-related department, and only a minority (13.64%) study in a natural sciences-related department.
The following table shows the intensity of respondents’ Facebook use alongside the sample demographics. The year of joining Facebook shows when users registered on the platform: most respondents joined in 2008 (20.22%), 2009 (19.1%), or 2010 (15.00%). For the average number of hours per day spent checking the news feed, most respondents reported 1-3 (50.39%) or 0-1 (38.25%). For how often they go on Facebook, the majority reported more than twice a day (81.35%), while nobody chose “never”. For the average amount of content shared per day (e.g., status updates or photos added on Facebook), most respondents reported 1-3 (50.39%) or 0-1 (38.25%). Surprisingly, the largest group of respondents reported having more than 500 Friends (43.38%), with the shares of the smaller categories declining gradually. Lastly, the level of connection with Friends describes, in degrees, how strongly respondents feel connected to different kinds of Friends on Facebook. 43.27% of respondents said they are mostly connected with Acquaintances at a Moderate level; 44.44% said they are mostly connected with Close friends at a Low level; and only 21.87% reported having no Distant friends. The last group, People Only Met on Facebook, is the most interesting: responses were polarized rather than moderate, with 41.52% of respondents answering “High” and 46.78% answering “None.”
The subsequent table shows the amount and kinds of personal information disclosed on Facebook by gender of the respondents. The most remarkable items were as follows. Full name (M = 1.24, SD = 0.50) is generally shared by both males (83.33%) and females (76.70%). Date of birth (M = 1.40, SD = 0.71) is, like full name, commonly shared completely and accurately, with high sharing rates for both males (77.27%) and females (71.84%). E-mail address (M = 2.52, SD = 0.80) was not shared by 63.64% of males and 76.70% of females. Telephone number (M = 2.82, SD = 0.53) was shared completely and accurately by 15.15% of males but only 1.94% of females. Home address (M = 2.94, SD = 0.31) was the least shared item, with only 1.52% of males and 1.94% of females sharing it. Political views (M = 2.64, SD = 0.67) were rarely shared: only 13.64% of males and 9.71% of females shared this information completely and accurately. Religion (M = 2.74, SD = 0.62) was another rarely shared topic; 78.79% of males and 86.41% of females did not share it. Photos of you (M = 1.37, SD = 0.56) was mostly shared by both males (66.67%) and females (67.96%), and only 27.27% of males and 29.13% of females reported that they share this information but that it is not complete or accurate. Opinions about job, school, and family (M = 2.65, SD = 0.69) were not shared by 72.73% of males and 82.52% of females.
In general, respondents chose not to share information completely and accurately, with a few exceptions such as full name. In particular, sensitive information such as telephone number and home address was not shared completely and accurately. Overall, females tended to withhold personal information more often than males.
The next figure shows a stacked bar graph of the sharing status of each of the twenty-two items. Abbreviations are used in the graph to save space: NS means “I don’t share this information”, S-BNC/I means “I share this information but it is not complete or accurate”, and S means “I share this information completely and accurately.” The most remarkable items were as follows: Full name (77.27%), Date of birth (71.59%), and Hometown or City (69.32%) show high levels of complete and accurate sharing. School or employment (62.50%) and Photos of you (65.34%) show moderate levels of complete and accurate sharing. E-mail address (69.89%), Telephone number (86.93%), Home address (92.61%), Political views (73.30%), and Religion (81.25%) were mostly not shared.
The technological privacy tools questions were meant to show which privacy tools or settings respondents use and to what extent. Table 4.4 shows the number and percentage of respondents’ answers to the different technological privacy tools questions. The most remarkable items were as follows: 37.72% of users always send private messages instead of posting on a Friend’s wall, and 31.74% do this most of the time. Half of the respondents (51.50%) reported that they have never gone offline on Facebook chat. The majority of the respondents (70.06%) said that they do not provide fake or inaccurate information to restrict other users. Nevertheless, almost half of the respondents (48.50%) are never worried about being embarrassed by wrong information others may post about them on Facebook, and nearly half (49.70%) are never concerned that others will see their profile.
The awareness of social privacy questions were meant to show the extent of social privacy awareness in terms of agreement or disagreement. Table 4.5 shows the number and percentage of respondents’ answers to the different social privacy awareness questions. Neutral is expressed as “Neither agree nor disagree.” The answers were widely dispersed. Even so, the most remarkable findings are that only 4.12% of respondents reported that they worry about others scrutinizing their profile, while 30.00% neither agreed nor disagreed, and that 43.53% of respondents said they are uncomfortable with the level of exposure their Facebook content might bring.
The following figure is a diverging bar plot of the standard scores (z-scores) of the awareness of social privacy items. Each item is plotted according to whether its value lies above or below the average. The most prominent items were “not comfortable with the level of exposure”, which was above the average (z-score = 1.62), and “worrying about others scrutinizing my profile”, which was below the average (z-score = -1.24).
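For readers unfamiliar with standard scores, here is a minimal Python sketch of how item values can be turned into z-scores and labeled as above or below the average. The item values are hypothetical, and the choice of the population standard deviation is an assumption; the thesis (done in R) may have used the sample standard deviation.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, not the sample SD
    return [(x - mu) / sigma for x in values]

# Hypothetical mean agreement scores for five questionnaire items.
item_means = [2.0, 3.0, 4.0, 3.0, 3.0]
scores = z_scores(item_means)

# Positive z-scores sit above the average, negative ones below it.
for item, z in zip("ABCDE", scores):
    side = "above" if z > 0 else "below" if z < 0 else "at"
    print(item, round(z, 2), side, "the average")
```

A diverging bar plot then simply draws the positive and negative z-scores in opposite directions from zero.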
The next figure is a scatter plot in which dots further to the right represent respondents who were more aware of social privacy. The standard deviation of social privacy awareness was used as a measure of dispersion, showing how the data spread out around the mean. The Awareness of Social Privacy score was calculated on a common scale as the sum of the construct’s variables, as that is easier to interpret in the scatter plot. In the chart, the majority of dots accumulated in the upper-middle and left-middle areas, which means that most respondents showed a lower level of awareness.
Negative SNS experiences reports the percentage of respondents who had heard about or experienced negative situations. In total, 74.3% of respondents said they had heard about some negative situation (59.15% of respondents were in the 19-25 age group and had heard about one), while the other 25.7% had not. In total, 84.6% of respondents said they had not experienced any negative situation (68.90% of respondents were in the 19-25 age group and had not experienced one), while 15.6% had. Overall, more people had heard about negative situations caused by the disclosure of personal information online than had experienced them.
To draw a conclusion about the relationship between the demographic control variables and the dependent variable, social privacy awareness, Pearson’s chi-square test was conducted to evaluate how likely an observed difference between the groups was to arise by chance. Pearson’s r was calculated to compare the frequency of heard and experienced negative situations across the demographic variables. No significant relationship was found between social privacy awareness and gender [ ]. A significant relationship was found between social privacy awareness and both age [ ] and education level [ ].
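The chi-square test mentioned above can be sketched in a few lines of Python. This is an illustration, not the thesis code (which was written in R), and the contingency table below is hypothetical: it is not taken from the survey data.

```python
def chi_square(table):
    """Pearson's chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = gender, columns = lower / higher awareness.
table = [[30, 36],
         [50, 53]]
stat, dof = chi_square(table)

# 3.841 is the critical value for 1 degree of freedom at the .05 level.
print(f"chi-square = {stat:.3f}, df = {dof}, significant: {stat > 3.841}")
```

The statistic grows as the observed counts move away from the counts expected under independence, which is why a large value indicates a relationship between the two variables.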
Validity and Reliability
Reliability and validity are two essential factors that demonstrate the rigor of the research process and the credibility of research findings. It is not enough for the results of a study to be significant; the research process must be rigorous as well. In survey analysis, reliability refers to the consistency and dependability that allow the results of an analysis to recur under similar conditions. Statistical tests are usually used to measure reliability. The internal consistency of the questionnaire items can be assessed with a test such as Cronbach’s alpha (α), which uses the correlations between items to capture how coherent a scale is. Cronbach’s alpha (α) typically falls between 0 and 1, and values higher than .70 are conventionally accepted as reliable.
To assure high-quality results for the survey, the reliability and validity of the measurement scales were estimated. To examine internal consistency reliability, that is, the extent to which the items of an instrument are related to one another, Cronbach’s alpha (α) was computed separately for the items in each construct.
The table below shows the item-total statistics of Intensity of Facebook Use, which consist of the correlations between each item and the construct’s total score. The 1st, 3rd, and 4th variables in the Intensity of Facebook Use construct correlated negatively before they were reversed. After these items were reversed, the items with low correlations (6th, 7th, 8th, and 9th) were removed from the construct where they lowered the alpha score. Even so, Cronbach’s alpha (α) for the total scale only reached .482, which does not make the construct reliable enough. The construct was therefore dropped, and its variables were analyzed independently.
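The two steps described above, reversing negatively worded items and computing Cronbach’s alpha, can be sketched as follows. This is an illustrative Python version with hypothetical responses, not the R code used in the thesis; the helper names `reverse_code` and `cronbach_alpha` are made up for this example.

```python
from statistics import variance

def reverse_code(values, low=1, high=5):
    """Reverse a Likert item so that high scores become low and vice versa."""
    return [low + high - v for v in values]

def cronbach_alpha(items):
    """items: list of item columns, each holding the scores of the same
    respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical answers of five respondents to three items; the third item
# is negatively worded, so it is reversed before computing alpha.
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 5, 5]
item3 = reverse_code([5, 4, 3, 1, 1])

alpha = cronbach_alpha([item1, item2, item3])
print(f"alpha = {alpha:.3f}, above the .70 threshold: {alpha > 0.70}")
```

Alpha rises when the items vary together, so forgetting to reverse a negatively worded item lowers it, which is why the reversal comes first.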
Validity was discussed in terms of measurement validity, that is, how well the conceptual definition of a construct and its empirical indicators fit together. As shown in the Operationalization section, the questions measured the concepts they were intended to measure, because the survey questions were mainly drawn from previous research, which provided valid indicators for the questionnaire. The validity of the questionnaire could therefore be tested with factor analysis. To that end, a principal component factor analysis was performed to check the correlations between the variables and their underlying structure.
A principal component analysis (PCA) with orthogonal rotation (varimax with Kaiser Normalization) was applied to all the items of the constructs. The rotation maximizes the variance of the squared loadings within each factor. First, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity were computed to see whether it makes sense to run a factor analysis on these variables. The KMO, which measures sampling adequacy, indicated that the variables were sufficiently related (KMO = .766) to continue the analysis. Bartlett’s test of sphericity was significant, indicating that the correlation matrix differed from an identity matrix, so the items were correlated enough for factor analysis. The number of principal components was assessed with Kaiser’s criterion, which retains components with eigenvalues greater than 1, and with a parallel analysis run with 100 simulations. Initially, only 6 components had eigenvalues greater than 1, which was the number suggested by the Kaiser-Harris criterion.
The table below shows the rotated factor loadings based on the correlation matrix. To keep the table compact, the item names were shortened to a letter for the construct plus a unique variable number: I (Personal Information Disclosure), T (Technological Privacy Tools), and A (Awareness of Social Privacy).
A component was considered solid when five or more items loaded on it at .50 or higher, so factor loadings over .50 were used to build the constructs. When an item loaded above .50 on more than one component, it was assigned to the component with the higher loading. The 1st, 2nd, 3rd, and 4th components showed five or more factor loadings higher than .50. Awareness of Social Privacy and Technological Privacy Tools were reconstructed from the factor loadings of the 1st and 3rd components respectively. Variables with loadings below .50 were removed from the constructs. Personal Information Disclosure split into two components, which were reconstructed from the 2nd and 4th components. “Basic information disclosure” was built from the items loading on the 2nd component, such as “Family members” and “Partner’s name”; “Appearance information disclosure” was built from the items loading on the 4th component, such as “Photos of you” and “Places you visit.” The KMO value for the new constructs was .798, which was adequate for factor analysis. Four components with eigenvalues greater than 1 were extracted. Table 1B and Table 2B in Appendix B show the reliability and validity of the new constructs. The factor analysis thus changed the constructs and their measures to: Awareness of Social Privacy, Technological Privacy Tools, Basic Personal Information Disclosure, and Appearance Personal Information Disclosure.
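The assignment rule described above (keep loadings of .50 or more, and let the higher loading win when an item loads on several components) can be sketched in Python. The loadings and the function name `assign_items` are hypothetical, and comparing absolute values is an assumption of this sketch; the actual loadings are in the thesis table.

```python
def assign_items(loadings, threshold=0.50):
    """Map each item to the component on which it loads highest, provided
    that loading reaches the threshold; other items are dropped."""
    constructs = {}
    for item, row in loadings.items():
        # Assumption: negative loadings are compared by absolute value.
        best = max(range(len(row)), key=lambda c: abs(row[c]))
        if abs(row[best]) >= threshold:
            constructs.setdefault(best + 1, []).append(item)
    return constructs

# Hypothetical rotated loadings: item -> loadings on components 1 to 3.
loadings = {
    "A1": [0.72, 0.10, 0.05],
    "A2": [0.55, 0.52, 0.08],  # above .50 on two components; the higher wins
    "T1": [0.12, 0.08, 0.64],
    "I1": [0.31, 0.44, 0.20],  # below .50 everywhere, so it is removed
}
print(assign_items(loadings))
```

The result maps component numbers to item names, and a component would then count as solid only if at least five items end up assigned to it.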
Subsequently, a correlation matrix was computed to give an initial overview of the relations between the variables, with statistical significance checked through Pearson’s correlation coefficients. The table below displays the correlation matrix of the independent and dependent variables, along with the significance of each Pearson’s correlation coefficient.
A set of multiple linear regression (OLS) analyses was performed to test the hypotheses proposed in the Theoretical Framework, mainly to explore the relation between the dependent variable, social privacy awareness, and the proposed independent variables. In addition, the demographic data (gender, age, education level) were used as control variables in the regression models in order to see whether the predictors are significant and whether they explain unique variance. The nationality variable was not taken into consideration because of its high variety, and the department variable was not included because it was collected as an open-ended question. The categorical gender variable was recoded as a dummy variable (0 = Male and 1 = Female) for the subsequent regression analyses. “Other” was omitted because of its low count in the sample. The regression analyses used social privacy awareness together with the independent variables that the previous analyses had established as reliable and valid.
Two regression models were used to examine the relation between personal information disclosure and social privacy awareness (H1). The first model tested the effect of basic personal information disclosure on social privacy awareness. The second model tested the effect of appearance personal information disclosure on social privacy awareness.
Before the regression analysis, the regression diagnostics were checked. The image below shows the diagnostic plots for the regression of social privacy awareness on basic personal information disclosure. First, the Residuals vs Fitted plot shows no distinctive pattern, which suggests that the linearity assumption was not violated. Second, the Normal Q-Q plot suggests that the normality assumption was met, as the points lie close to a straight line. Third, the Scale-Location plot implies that homoscedasticity was not met, as the points are not spread equally along the horizontal line. Fourth, the Residuals vs Leverage plot shows some influential cases outside the Cook’s distance line. Multicollinearity was also checked (VIF = 1.02) and was not a problem. However, the Shapiro-Wilk test on the residuals found that they did deviate from normality (p < .001).
The regression coefficients and standard errors of the first and second models are reported in the following table. Neither model of social privacy awareness as the dependent variable was significant: not the model with basic information disclosure as the independent variable, F(4, 164) = 1.218, p = 0.30, nor the model with appearance personal information disclosure, F(4, 164) = 1.218, p = 0.30, each with age, gender, and education level as control variables.
A two-way factorial ANOVA was performed to examine how social privacy awareness varies with the different measures of intensity of Facebook use. The table below shows the results with the level of social privacy awareness as the dependent variable; only the frequency of going on Facebook was statistically significant, F(4, 169) = 3.706, p = 0.006. However, the significance value from a two-way analysis of variance does not tell us where this effect occurs. Since the frequency of going on Facebook has six levels, finding out which conditions differ significantly from which others requires a post-hoc test that compares each condition with all other conditions. A post-hoc comparison using the Tukey HSD (Tukey’s Honest Significant Difference) test was conducted (with a confidence level of 0.95) on all possible family-wise contrasts and multiple comparisons of means to see the differences between the means of the specified variables. The dependent variable, awareness of social privacy, was mean-centered and standardized. Since Tukey HSD requires categorical variables, the continuous dependent variable, awareness of social privacy, was divided into four ranges: 1 to 2, 2 to 3, 3 to 4, and 4 to 5 (on a five-point Likert scale ranging from strongly agree to strongly disagree). Tukey HSD showed that the “3 to 4” level of social privacy awareness and going on Facebook “once a day” differed significantly at p < .05.
The following figure shows notched box plots of the sample mean estimates of social privacy awareness by negative SNS experience. First, a significant difference in social privacy awareness was found between respondents who had and had not heard about a negative experience. On average, “Yes” scored higher (M = 1.95) than “No” (M = 1.70) in the heard group (1st plot), t(86) = -2.39, p < .001. This result suggests that hearing about a negative experience is related to social privacy awareness: users who have heard about negative experiences are more aware of social privacy. Second, a significant difference in social privacy awareness was found between respondents who had and had not experienced a negative situation. On average, “Yes” scored significantly higher (M = 2.29) than “No” (M = 1.81) in the experienced group (2nd plot), t(33) = -3.03, p = .004. This implies that experiencing a negative situation is related to social privacy awareness: users who have had negative experiences are more aware of social privacy.
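To make the group comparison concrete, here is a small Python sketch of a two-sample t statistic. The post does not say which variant of the t-test was used; this sketch assumes Welch’s unequal-variance version, and the scores below are hypothetical rather than survey data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical awareness scores: "No" (had not heard) versus "Yes" (had heard).
not_heard = [1.5, 1.8, 1.6, 1.9, 1.7]
heard = [2.0, 1.9, 2.2, 1.8, 2.1]

t, df = welch_t(not_heard, heard)
print(f"t({df:.0f}) = {t:.2f}")
```

Because the “No” group is passed first and has the lower mean, the statistic is negative, which matches the sign of the t values reported above.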
This is a condensed version of the results. If you are interested in more detail, you can access the full version of the thesis here.