Over the past months I have gradually shifted the way I think about data and information. Through this course I have discovered how important it is to look at the bigger picture: the influence of my own decisions or those of the curator, but also the impact of subjectivity. Most importantly, this course has made me reflect on my own approach to interpretation in various areas of my life. While reading articles in the news or in research, I often think about what is not there rather than what is: what the curator chose to leave out, and what their reason for doing so might be. At the start of the semester we were confronted with the fact that data are shaped by the subjective decisions of the curator, as well as by the constraints of infrastructure [1]. Through curation, therefore, data are always biased in some way, which directly influences the knowledge produced from them and our understanding of that knowledge [2]. However, acknowledging that the process of curation inherently creates bias in the data was not meant to make us think that all data are futile; rather, it was meant to help us develop a more critical approach to how we understand them.
This exhibition features works from throughout the semester, showing my process of curation: gathering data, categorizing and archiving it, and finally visualizing datasets.
By digitizing my physical collection of CDs I learned that the process of archiving physical objects is not neutral; it was greatly shaped both by the objects themselves and by the infrastructure I chose to use [3]. Using a spreadsheet confined my collection to rows and columns, essentially forcing me to break the CDs down into quantifiable and somewhat restricted variables. During this process I, as a curator, had to make decisions that greatly influenced the end result. I also realized how difficult it can be to determine which variables are important to record if my datafied collection is to be functionally equivalent to an actual archive. I therefore had to do some research to find lesser-known variables to add and to learn what their functions were.
[1] Kitchin, Rob. 2022. “Introducing Data.” In The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, 1–19. SAGE Publications Ltd.
[2] Ackoff, R. L. 1989. “From Data to Wisdom.” Journal of Applied Systems Analysis 16: 3–9.
[3] Dourish, Paul. 2017. “Spreadsheets and Spreadsheet Events in Organizational Life.” In The Stuff of Bits: An Essay on the Materialities of Information, 81–104. Cambridge, [Massachusetts]: The MIT Press.
[4] Ford, Heather, and Andrew Iliadis. 2023. “Wikidata as Semantic Infrastructure: Knowledge Representation, Data Labor, and Truth in a More-Than-Technical Project.” Social Media + Society 9 (3). https://doi.org/10.1177/20563051231195552
[5] Bowker, Geoffrey C., and Susan Leigh Star. 2000. “Why Classifications Matter.” In Sorting Things Out: Classification and Its Consequences, 319–326. Cambridge, Massachusetts; London, England: The MIT Press.
[6] Drucker, Johanna. 2011. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 5 (1). http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html.
This project showed me that the proper translation of historical experiences, especially those of non-Western origin, is crucial for making that knowledge accessible to a wider audience. The way I translated this data from Dekoloniale to Wikidata was, again, greatly shaped by my own decision-making process, but also by the infrastructure and ontology that the two platforms provided [1, 4]. There is no real way to remove this bias; however, we should also not aim for perfection when categorizing data, as this too creates bias [5].
The workshop related to this project also made me realize how important it is to create digital archives of historical (and, more generally, all) data, since times are changing and we are in the digital age. Archiving this data is a means of preserving history that could otherwise simply be erased, which is a problem well beyond the topics we researched.
I visualized a dataset containing variables about social media usage and overall well-being across gender. I showed general trends and patterns between the variables in order to present the data in an objective way.
Displaying the data using a more critical, data-feminist approach was also important. As a curator you need to organize and categorize data, and any method used for this highlights some viewpoints while suppressing others [6]. Showing the gender distribution of the dataset underscores something that is seldom discussed in data curation: the individual experiences that go unrepresented because of categorization. Data are visualized so that the audience has an easier time interpreting them, but visualization can also be used to hide or emphasize things, depending on what the curator decides.
- What variables are important to include?
- Should I confine my variables to fit into the provided infrastructure?
- Are the CD covers important as independent variables?
- Is it worth having the series titles as they appear in their original language?
- Should I exclude the year of release, since it is often not available, or is it important to show that this information is often missing?
- Would IFPI codes be deemed important in an actual disc archive?
- Why is including an image of the CD important to the data?
- Should I do additional research and add the length of the series as a variable?
- Is there a specific order the variables should appear in to make the data easily readable?
- Since the dubbed language is always Polish, should I include it as its own variable, or is it something that should be mentioned in the critical report?
- What is the best way to take notes from Dekoloniale?
- Which of Hu Lanqi's experiences are more important than others?
- How are my own (and my group's) decisions shaping what we include in Hu Lanqi's page?
- Why is it important to add another photo of Hu Lanqi to her Wikidata page? What does it add?
- Is there anything I am leaving out because of the way Wikidata is structured?
- What is the best approach to translate my notes from Dekoloniale onto Wikidata?
- Is there anything I would do differently if Wikidata were structured differently?
- Should I look for additional information to include online?
- Is there anything on Hu Lanqi's Wikidata page that should include additional information?
- Which dataset resonates with me the most? What is something that interests me but is also good for analyzing?
- What medium should be used for visualizing this data?
- What is the best way to show the data in an objective way?
- Which variables from the data should be used to show patterns?
- How are my decisions shaping the way this data is being represented?
- What is not being shown in the data?
- How does my interpretation of what isn't shown influence my visualizations?
- Is the visualization of what isn't represented in the data appropriately reflected upon in the report?
- How much do the colors of the visualizations influence how the audience interprets them?