EleutherAI is a recently formed collective of volunteer researchers, engineers, and developers focused on open-source AI research. The organization uses the GPT-Neo and GPT-NeoX codebases to train massive language models that it plans to release under open licenses. “Open source data benefits researchers because scientists have more free resources to use to train models and complete research,” Edward Cui, the CEO of AI company Graviti, told Lifewire in an email interview. His company is not involved in EleutherAI. “We know that scores of AI projects were held up by a general lack of high-quality data from real use cases, so it is vital to establish guidance that ensures data quality, with the help of the participating community.”
This Is the Way
The beginnings of EleutherAI were humble. Last year, an independent AI researcher named Connor Leahy posted the following message on a Discord server: “Hey guys lets [sic] give OpenAI a run for their money like the good ol’ days.” And so, the group was formed. It now has hundreds of contributors who post their code on the online software repository GitHub.
Efforts to open-source AI aren’t new. In fact, Airbnb’s Airflow workflow management platform and Lyft’s data discovery engine are the outcomes of using open-source tools to enable data teams to do better work with data, pointed out Ali Rehman, project manager for software company CloudiTwins, in an email interview with Lifewire. “Just as the open-source revolution has led to a transformation of software development, so too has it been driving the development and democratization of data science and artificial intelligence,” Rehman said. “Open source has become a critical enabler of enterprise data science solutions, with the majority of data scientists using open-source tools.”
Opening the Door
Developing AI in the open could help make the potentially game-changing technology less prone to biases and errors, some observers argue. AI research now primarily happens in the open, with nearly all companies, research labs, and universities presenting their results immediately in scholarly publications, Kush Varshney, an AI researcher at IBM, told Lifewire in an email interview.
“This open community is essential, as it provides enhanced levels of checks and balances to ensure AI is being researched, created, deployed, and applied responsibly,” Varshney added. “This is especially critical in situations where these systems can influence the lives of our most vulnerable members of society. This openness applies not only to general machine learning and deep learning algorithms but also to elements of trustworthy AI.”
Rehman said that one of the critical differences between proprietary and open-source software is flexibility and customization. Proprietary AI research, he said, will have issues with security, updates, and optimizations. “This is because the open-source community-based approach gets valuable input from thousands of industry experts that identify potential security vulnerabilities which are then remediated more quickly,” Rehman added. “The consensus of the community means that quality is guaranteed and new opportunities are more easily identified.”
Another issue is that proprietary AI research will not be interoperable, meaning that it cannot work with various data formats and will likely lead to vendor lock-in, which prevents companies from testing and trying the software before committing to a solution, Rehman said.
But not every aspect of AI research needs to be open-source, Chris Kent, the CEO of the medical AI company Reveal Surgical, told Lifewire in an email interview. “It’s important to protect the economic incentives that drive the commercial development of key applications of AI,” he said.
However, research into AI needs a robust open-source component, Kent said. He added that open source helps build trust and makes use of datasets that are not, or should not be, controlled by single institutions or companies. “An open-source approach is the best way to identify and compensate for underlying bias that may exist in training sets and will lead to more holistic, creative, and reliable applications of AI,” Kent said.