Data Curious

Hiding the Magic

The hidden cost of ChatGPT, an Information Design Journal, and the MusicCaps dataset

Ben
Jan 30, 2023
Hi,

In preparation for this newsletter, I broke a small personal rule: I logged back into Twitter. Historically, Twitter was my main information feed for finding interesting links for Data Curious. After the Musk takeover, I reached a breaking point and jumped ship to Mastodon (the vis.social server). I’m grateful that a small but not insignificant number of people I admire from my Twitter-sphere did the same.

That being said, Mastodon hasn’t quite rivaled the level of discovery achieved on Twitter. On the bright side, it’s not (as of now) a soul-sucking portal designed to encourage outrage and dunking on people. So pros and cons.

But every once in a while (like yesterday), I’ll pop my head back in to see if there’s anything in the data + tech world that might be inspiring. I’m not sure if I regret it yet or not. One thing that really stood out as different: the AI hype is everywhere.

I don’t typically write about AI. It’s not my field of interest or expertise. And I’m not going to offer much editorial on it in this letter (only a bit). But I will say that I recently finished reading Atlas of AI by Kate Crawford, which helped me understand the wider scope of where we are now and how we got here. Highly recommend it for a critical reading of machine learning technologies (i.e. AI, minus the marketing hype).

That being said, I will be featuring some AI-related content here (but not at all for promotional purposes…quite the opposite, in fact).

If you have any other recommendations for critical analysis of AI, let me know in the comments below or by replying to this email.


Read

Image from the TIME story linked below. Caption reads: ‘This image was generated by OpenAI's image-generation software, Dall-E 2. The prompt was: "A seemingly endless view of African workers at desks in front of computer screens in a printmaking style." TIME does not typically use AI-generated art to illustrate its stories, but chose to in this instance in order to draw attention to the power of OpenAI's technology and shed light on the labor that makes it possible.’

OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic

Ok, I know I said that I wouldn’t write much about AI…but this story came across my feed last week and the timing felt eerily apt. I had just been reading in Atlas of AI how AI is, fundamentally, a technology of extraction. Extracting data in the form of scraping huge amounts of information from the web. Extracting valuable minerals from the Earth to fuel the huge amount of computing power needed to run the models. And extracting human labor (as demonstrated in this investigative TIME piece) by outsourcing content moderation and data labeling at the lowest possible cost.

Users of ChatGPT see exactly what OpenAI wants them to see: the instant magic of a machine answering their questions reasonably. But as the piece shows, much of the work is not magic: it’s human exploitation.


Explore (but mostly read)

Online cover image from the Information Design Journal (Issue 1, Volume 27)

Information Design Journal (Issue 1, Volume 27) is now Open Access

I was unfamiliar with this publication until last week, when I saw someone announce that the first issue of this year was just made Open Access (all articles available online). In my experience, this is rare for academic-style journals. From a quick skim, I’m eager to have a closer look, especially with titles such as “A dynamic topography for visualizing time and space in fictional literary texts”.

Information+ @InfoPlusConf
We are very happy to announce our special issue of Information Design Journal with 10 contributions from #infoplus2021 is now fully published as #OpenAccess: doi.org/10.1075/idj.27… A big thank you again to our wonderful authors and reviewers! #DataVis #InformationDesign
10:05 PM ∙ Jan 20, 2023

Data

Tero Parviainen @teropa
The results are very impressive, but the dataset seems like the most immediately advantageous aspect here, given they've no plans to release the model. "5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians"
Aran Komatsuzaki @arankomatsuzaki
MusicLM: Generating Music From Text Presents MusicLM, a model for generating high-fidelity music from text. MusicLM generates music at 24 kHz that remains consistent over several minutes. proj: https://t.co/8vzBONkPe3 abs: https://t.co/vzW01q7VpH data: https://t.co/LERn2mZMtO https://t.co/u4L4ui0RwU
8:35 AM ∙ Jan 27, 2023

MusicLM: Generating Music From Text (Dataset)

Admittedly, I was impressed by MusicLM when I first heard it. MusicLM is the newest text-to-music generative model on the block from Google Research. But even more interesting to me was (as mentioned by Tero in the tweet above) the fact that the authors released a dataset alongside the paper: over 5 thousand music examples with text labels attached. Could be some interesting source material to experiment with later, for both machine learning and visualization purposes. The MusicCaps dataset is on Kaggle here and the research paper as a PDF is here.
