
OpenAI Whistleblower Disgusted That His Job Was To Collect Copyrighted Data For Training Its Models.

Image Source: Photo by Andrew Neel: https://www.pexels.com/photo/computer-monitor-with-openai-website-loading-screen-15863000/


A researcher who used to work at OpenAI is claiming that the company broke the law by using copyrighted materials to train its AI models. The whistleblower also says that OpenAI’s whole way of doing business could totally shake up the internet as we know it.

Suchir Balaji, 25, worked at OpenAI for four years. But he got so freaked out by what the company was doing that he quit!

He is basically saying that now that ChatGPT is making big bucks, OpenAI can’t just grab stuff from the internet without permission. It’s not “fair use” anymore, he says.

Of course, OpenAI is fighting back, saying they’re totally in the clear. Things are getting messy because even the New York Times is suing them over this whole copyright thing!

“If you believe what I believe,” Balaji told the NYT, “You have to just leave the company.”

Balaji’s warnings, which he outlined in a post on his personal website, add to the ever-growing controversy around the AI industry’s collection and use of copyrighted material to train AI models, a practice largely conducted without comprehensive government regulation and outside of the public eye.

“Given that AI is evolving so quickly,” intellectual property lawyer Bradley Hulbert told the NYT, “it is time for Congress to step in.”

So, picture this: It’s 2020, and Balaji, fresh out of college maybe, lands this cool job at OpenAI. He’s basically part of this team whose job it is to scour the web and gather all kinds of stuff to feed these AI models. Back then, OpenAI was still playing the whole “we’re just researchers” card, so nobody was really paying attention to where they were getting all this data from. Copyright? Meh, not a big deal… yet!

“With a research project, you can, generally speaking, train on any data,” Balaji told the NYT. “That was the mindset at the time.”

But then, boom! ChatGPT explodes onto the scene in 2022, and everything changes. Suddenly, this thing isn’t just some nerdy research project anymore.

It’s making real money, generating content, and even ripping off people’s work! Balaji starts to realize that this whole thing is kinda shady. He’s seeing how ChatGPT is basically stealing ideas and putting people’s jobs at risk. It’s like, ‘Wait a minute, this isn’t what I signed up for!’

“This is not a sustainable model,” Balaji told the NYT, “for the internet ecosystem as a whole.”

Now, OpenAI is singing a different tune. They’ve totally ditched their whole “we’re just a non-profit” act and are all about the Benjamins. They are saying, “Hey, we’re just using stuff that’s already out there, and it’s totally legal!” They even try to make it sound patriotic by saying that it’s “critical for US competitiveness.”

OpenAI Exposes Musk’s For-Profit Push In Fiery Rebuttal; The Drama Continues!

Source of image: Photo by Andrew Neel: https://www.pexels.com/photo/openai-text-on-tv-screen-15863044/


The ongoing dispute between OpenAI and Elon Musk has taken a new turn. OpenAI has released a series of emails on its website suggesting that Musk himself had previously advocated for a for-profit structure for the startup.

This revelation is huge given how critical Musk has been of OpenAI’s subsequent transition from a non-profit to a for-profit entity, a transition that also led to a lawsuit involving Microsoft.

In a Saturday blog post, OpenAI asserted that Musk not only desired a for-profit model but also proposed a specific organizational structure. Supporting this claim, OpenAI shared documentation indicating that Musk instructed his wealth manager, Jared Birchall, to register “Open Artificial Intelligence Technologies, Inc.” as the for-profit arm of OpenAI.

OpenAI isn’t holding back in their latest response to Elon Musk’s legal actions. In a recent blog post, they pointed out that this is Musk’s fourth try in under a year to change his story about what happened. They basically said, “His own words and actions tell the real story.”

They went on to say that back in 2017, Musk didn’t just want OpenAI to be for-profit; he actually set up a for-profit structure himself. But when he couldn’t get majority ownership and total control, he walked out, telling them they were doomed to fail.

Now they argue that since OpenAI is a leading AI lab and Musk is running a rival AI company, he is trying to use the courts to stop them from achieving their goals.

In a separate legal filing, OpenAI also pushed back against Musk’s attempt to block their move to a for-profit model. They argued that what Musk is asking for would seriously hurt OpenAI’s business, decision-making and mission to create safe and beneficial AI, all while benefiting Musk and his own company.

OpenAI also claimed that Musk wanted a majority stake in the for-profit arm of the company. According to the AI startup, Musk said that he did not care about the money but instead wanted to accumulate $80 billion in wealth in order to build a city on Mars.

Elon Musk wanted to accumulate wealth to build a city on Mars, claims OpenAI.

Research Shows AI Systems Are Highly Susceptible To Data Poisoning With Minimal Misinformation

Photo by Lukas: https://www.pexels.com/photo/pie-graph-illustration-669621/


It is widely known that large language models (LLMs), the technology behind popular chatbots like ChatGPT, can be surprisingly unreliable. Even the most advanced LLMs have a tendency to misrepresent facts, often with unsettling confidence.

This unreliability becomes particularly dangerous when dealing with medical information, as people’s health could be at stake.

Researchers at New York University have discovered a disturbing vulnerability: adding even a tiny amount of deliberately false information (a mere 0.001%) to an LLM’s training data can cause the entire system to spread inaccuracies.

Their research, published in Nature Medicine and reported by Ars Technica, also revealed that these corrupted LLMs perform just as well on standard tests designed for medical LLMs as those trained on accurate data. This alarming finding suggests that current testing methods may not be sufficient to detect these serious risks.

The researchers emphasize the urgent need for improved data tracking and greater transparency in LLM development, especially within the healthcare sector, where misinformation can have life-threatening consequences for patients.

In one experiment, the researchers introduced AI-generated medical misinformation into “The Pile,” a commonly used LLM training dataset that includes reputable medical sources like PubMed. They were able to create 150,000 fabricated medical articles in just 24 hours, demonstrating how easily and cheaply these systems can be compromised. The researchers point out that malicious actors can effectively “poison” an LLM simply by disseminating false information online.
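To make the mechanics concrete, here is a minimal Python sketch of how fabricated documents could be slipped into a training corpus at a tiny, fixed rate. The function name, arguments, and sampling logic are illustrative assumptions for this article, not the researchers’ actual code or The Pile’s tooling.

```python
import random

def poison_corpus(clean_docs, fake_docs, poison_rate=0.00001, seed=0):
    """Illustrative sketch: swap a tiny fraction of a corpus for fabricated documents.

    clean_docs  -- genuine training documents (list of strings)
    fake_docs   -- fabricated documents to inject
    poison_rate -- fraction of the corpus to replace (0.00001 = 0.001%)
    """
    rng = random.Random(seed)
    corpus = list(clean_docs)
    n_poison = min(len(fake_docs), max(1, int(len(corpus) * poison_rate)))
    # Overwrite a few randomly chosen positions with fabricated documents,
    # leaving the overall corpus size, and casual inspection, unchanged.
    for idx, fake in zip(rng.sample(range(len(corpus)), n_poison), fake_docs):
        corpus[idx] = fake
    return corpus
```

The point of the sketch is that nothing here touches the model itself; the attack lives entirely in the publicly gathered data.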

This research highlights significant dangers associated with using AI tools, particularly in healthcare. This is not a hypothetical problem; last year, the New York Times reported that MyChart, an AI platform used by doctors to respond to patient inquiries, frequently generated inaccurate information about patients’ medical conditions.

The unreliability of LLMs, especially in the medical field, is a serious and pressing concern. The researchers strongly advise AI developers and healthcare providers to acknowledge this vulnerability when developing medical LLMs. They caution against using these models for diagnosis or treatment until stronger safeguards are implemented and more thorough security research is conducted to ensure their reliability in critical healthcare settings.

The study found that replacing just one million out of 100 billion training units (0.001%) with vaccine misinformation produced a 4.8% increase in harmful content generated by the LLM. This was achieved by adding approximately 2,000 fake articles (around 1,500 pages), which cost a mere $5 to generate.
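For a quick sanity check on those figures, the poisoned fraction and the per-article cost work out as follows; this is a back-of-the-envelope calculation based on the numbers above, not code from the study.

```python
# Poisoned fraction: 1 million out of 100 billion training units.
poisoned_units = 1_000_000
total_units = 100_000_000_000
print(f"{poisoned_units / total_units:.3%}")  # 0.001%

# Roughly 2,000 fabricated articles generated for about $5 in total.
articles, total_cost = 2_000, 5.00
print(f"${total_cost / articles:.4f} per article")  # about $0.0025 each
```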

Crucially, unlike traditional hacking attempts that target data theft or direct control of the AI, this “data poisoning” method does not require direct access to the model’s internal workings, making it a particularly insidious threat.