Seeking the Soul of Open Education in the Era of Gen AI

An intense debate has opened up on the Creative Commons Open Education email list. It extends discussions that have been brewing for some time over whether Open Education practitioners should support or oppose Large Language Model developers scraping web publications, without attribution or explicit permission, as training data for Gen AI.

This week the debate heated up following the announcement of a webinar featuring a presentation by David Wiley:

The University of Regina's OEP Program invites you to a special online presentation by Dr. David Wiley. Dr. Wiley is widely recognized as one of the founders of and key thinkers surrounding the open movement in education.

Date: Thursday September 19, 2024

Abstract:

For over 25 years, the primary goal of the open education movement has been increasing access to educational opportunity. And from the beginning of the movement the primary tactic for accomplishing this goal has been creating and sharing OER. However, using generative AI is a demonstrably more powerful and effective way to increase access to educational opportunity. Consequently, if we are to remain true to our overall goal, we must begin shifting our focus from OER to generative AI.

There was near-instant pushback on the list. Heather Ross wrote:

I’m really troubled by so many in the open movement seeing GenAI as a natural fit with OER.

OER aligns with several of the UN SDGs and is being used to integrate sustainability into curriculum, teaching about how all disciplines are tied to the SDGs. GenAI is an environmental nightmare. OER is being used to integrate EDI and Indigenization into curriculum. GenAI, programmed by those of dominant groups, often fails to represent or misrepresents members of marginalized communities.

Taking what isn’t yours to create something new without giving credit, having permission, or considering the impact on others isn’t innovation or acting in the spirit of open. It’s colonization. OER has always called for recognition of the work’s creators and contributors and gratitude for their willingness to share it openly. Any gratitude toward GenAI-created work that was taught on copyrighted works against the copyright holder’s permission will ring hollow.

During my comprehensive exam, a committee member asked me what the difference between OER and Napster was. At the time, that was easy to answer. Most OER was created by authors who willingly released their work with an open license. Napster was the sharing of music without the artist’s permission. If I were asked that question now, it would be a lot harder to answer.

And Dave Wiley came back to say:

It feels like we spent the second full decade of the OER movement, from 2008 - 2018, running non-stop workshops about copyright and the Creative Commons licenses. We had to spend ten years that way because there are certain fundamentals about copyright and licensing that a person has to understand before they can participate in the OER movement in a way that goes beyond reusing content created by others.

The same is true for generative AI. People who want to participate as something more than reusers of generative AI tools created by others will need at least some proficiency in prompt engineering, retrieval augmented generation, fine-tuning, and other topics. I agree that smaller models running locally is where this all needs to go eventually, which means additional understanding will be needed in techniques like quantizing, pruning, and distilling the knowledge of larger models into smaller ones so these models can fit (and run) on edge devices like consumer laptops and phones. 

There are strong analogs between the revise and remix potentials created by openly licensed content and the revise and remix potentials created by openly licensed model weights. And the overall educational potential is far greater for open weights than open content. But without some baseline understanding of how generative AI works it will be difficult to participate (productively) in these kinds of conversations. It looks like we might have another decade of dry, technical, arcane professional development workshops ahead of us. :)

This is some of the territory I'm going to cover in the talk in a couple of weeks.
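Wiley's mention of quantization (compressing model weights so they can fit and run on consumer devices such as laptops and phones) can be sketched in a few lines. The following is an illustrative toy example of symmetric int8 quantization, not anything taken from the discussion itself:

```python
import random

def quantize_int8(weights):
    """Symmetric post-training quantization: map floats to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # one scale for the whole tensor
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]

weights = [random.uniform(-1, 1) for _ in range(1000)]
q, scale = quantize_int8(weights)

# Each quantized value fits in 1 byte instead of 4 (float32): a ~4x size reduction.
err = max(abs(a - b) for a, b in zip(dequantize(q, scale), weights))
print(err <= scale / 2)  # rounding error is at most half a quantization step
```

Real toolchains quantize per-channel and calibrate on sample data, but the principle is the same: each weight is stored in one byte instead of four, roughly quartering memory use at the cost of a small, bounded rounding error.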

Stephen Downes weighed in with a post on his blog entitled “What is the Soul of Open Education?”

I've had my disagreements with Wiley over the years but we are in agreement on this point. Now what it means to say "increase access to educational opportunity" may be another point of contention; creating startups and making money isn't my idea of progress. But we agree on the potential of AI…

If it takes (AI) a fraction of the resources it used to take to create a useful and usable OER, even if it has to be corrected for misrepresentation, then there is far more opportunity for people in under-represented groups to create resources where they see themselves reflected in the materials being used in learning. AI-assisted transcription and translation, resource recommendation, community formation and more can also help members of marginalized groups.

There were many more contributions and I am sure we have only seen the start of this debate. But it seems a very important one for the future of Open Education and for Open Education practitioners wrestling with AI.

More to follow.

The Creative Commons Open Education Platform is a space for open education advocates and practitioners to identify, plan and coordinate multi-national open education content, practices and policy activities to foster better sharing of knowledge.

This platform is open to all interested people working in open education.

You can join the email list at cc-openedu [at] googlegroups [dot] com

Definition of Open Source AI

Clarote & AI4Media / Better Images of AI / Power/Profit / CC-BY 4.0

There is growing interest in using and developing Open Source Software approaches to Generative AI for teaching and learning in education, and there is an explosion of models claiming to be Open Source (see, for example, Hugging Face). But Gen AI is a new form of software, and there have been difficulties in agreeing on a definition. This week the Open Source Initiative released a draft definition.

In the preamble they explain why it is important.

Open Source has demonstrated that massive benefits accrue to everyone when you remove the barriers to learning, using, sharing and improving software systems. These benefits are the result of using licenses that adhere to the Open Source Definition. The benefits can be summarized as autonomy, transparency, frictionless reuse, and collaborative improvement.

Everyone needs these benefits in AI. We need essential freedoms to enable users to build and deploy AI systems that are reliable and transparent.

The following text is taken from their website.

What is Open Source AI

When we refer to a “system,” we are speaking both broadly about a fully functional structure and its discrete structural elements. To be considered Open Source, the requirements are the same, whether applied to a system, a model, weights and parameters, or other structural elements.

An Open Source AI is an AI system made available under terms and in a way that grant the freedoms[1] to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.

These freedoms apply both to a fully functional system and to discrete elements of a system. A precondition to exercising these freedoms is to have access to the preferred form to make modifications to the system.

The preferred form of making modifications to a machine-learning system is:

  • Data information: Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data. Data information shall be made available with licenses that comply with the Open Source Definition.
    • For example, if used, this would include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics, how the data was obtained and selected, the labeling procedures and data cleaning methodologies.
  • Code: The source code used to train and run the system, made available with OSI-approved licenses.
    • For example, if used, this would include code used for pre-processing data, code used for training, validation and testing, supporting libraries like tokenizers and hyperparameters search code, inference code, and model architecture.
  • Weights: The model weights and parameters, made available under OSI-approved terms[2].
    • For example, this might include checkpoints from key intermediate stages of training as well as the final optimizer state.

For machine learning systems,

  • An AI model consists of the model architecture, model parameters (including weights) and inference code for running the model.
  • AI weights are the set of learned parameters that overlay the model architecture to produce an output from a given input.

The preferred form to make modifications to machine learning systems also applies to these individual components. “Open Source models” and “Open Source weights” must include the data information and code used to derive those parameters.
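The three components above can be read as a simple checklist: a release qualifies only if data information, code and weights are all available. A minimal, hypothetical sketch (the class and field names here are our own shorthand, not the OSI's):

```python
from dataclasses import dataclass

@dataclass
class AIRelease:
    """Hypothetical record of what an AI system release makes available."""
    data_information: bool  # provenance, scope, labeling and cleaning methodology
    code: bool              # pre-processing, training, testing and inference code
    weights: bool           # model parameters under OSI-approved terms

    def preferred_form_complete(self) -> bool:
        # Under the draft definition, all three components are required.
        return self.data_information and self.code and self.weights

# An "open weights" release without data information or code does not qualify.
weights_only = AIRelease(data_information=False, code=False, weights=True)
print(weights_only.preferred_form_complete())  # False
```

The sketch makes the point of the final sentence above: open weights alone do not satisfy the preferred form to make modifications.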

Of course this is only a draft and there will be disagreements. A particularly tricky issue is whether Large Language Models should be allowed to be trained on data scraped from the web without permission or attribution.

AI Governance

Open consultation on regulatory approaches for AI 

Following extensive expert consultations and discussions with parliamentarians, UNESCO have released a consultation paper in English on AI governance for public comment.

UNESCO encourages stakeholders, including parliamentarians, legal experts, AI governance experts and the public, to review and provide feedback on the different regulatory approaches for AI. You can read the consultation paper here.

The Consultation Paper on AI Regulation is part of a broader effort by UNESCO, Inter-Parliamentary Union and Internet Governance Forum’s Parliamentary Track to engage parliamentarians globally and enhance their capacities in evidence-based policy making for AI.

The Paper has been developed through:

  • A literature review on AI regulation in different parts of the world.
  • A discussion on “The impact of AI on democracy, human rights and the rule of law” with parliamentarians from around the world at the IPU Assembly in Geneva, 23-27 March 2024.
  • A capacity-building workshop co-designed and co-facilitated by UNESCO on 25 March 2024 at the IPU in Geneva, and three webinars on the subject organized by IPU, UNESCO and the Internet Governance Forum (IGF) for parliamentarians to inform the development of the discussion paper.
  • A discussion with Members of Parliament at the Regional Summit of Parliamentarians on Artificial Intelligence in Latin America, held in Buenos Aires on 13 and 14 June 2024.

The deadline for comments is 19 September 2024.

LLMs are a cultural technology

Yutong Liu & Kingston School of Art / Better Images of AI / Exploring AI / CC-BY 4.0

John Naughton, writing in the Guardian, says:

Assessment in the humanities in the time of LLMs requires, "if not a change of heart, two changes of mindset.

The first is an acceptance that LLMs – as the distinguished Berkeley psychologist Alison Gopnik puts it – are “cultural technologies”, like writing, print, libraries and internet search. In other words, they are tools for human augmentation, not replacement.

Second, and more importantly perhaps, is a need to reinforce in students’ minds the importance of writing as a process."