>>/177889/, >>/177890/, >>/177891/, >>/177892/, >>/177893/, >>/177894/, >>/177895/, >>/177896/, >>/177897/, >>/177898/, >>/177899/, >>/177900/, >>/177901/, >>/177902/, >>/177903/, >>/177904/, >>/177905/, >>/177906/, >>/177907/, >>/177908/, >>/177909/, >>/177910/, >>/177911/, >>/177912/, >>/177913/, >>/177914/, >>/177915/, >>/177916/, >>/177917/, >>/177918/, >>/177919/, >>/177920/, >>/177921/, >>/177922/, >>/177923/, >>/177924/, >>/177925/, >>/177926/, >>/177927/, >>/177928/, >>/177929/, >>/177930/, >>/177931/, >>/177932/, >>/177933/, >>/177934/, >>/177935/, >>/177936/, >>/177937/
Robert W Malone, MD @RWMaloneMD - Really important study. Buyer (or author) beware. AI “hallucinations” are a major problem.
Quote:
Abdul Șhakoor @abxxai
BREAKING: Someone just tested 35 AI models across 172 billion tokens of real document questions.
The hallucination numbers should end the "just give it the documents" argument forever.
Here is what the data actually showed.
The best model in the entire study, under perfect conditions, fabricated answers 1.19% of the time. That sounds small until you realize that is the ceiling. The absolute best case. Under optimal settings that almost no real deployment uses.
Typical top models sit at 5 to 7% fabrication on document Q&A. Not on questions from memory. Not on abstract reasoning. On questions where the answer is sitting right there in the document in front of it.
The median across all 35 models tested was around 25%.
One in four answers fabricated, even with the source material provided.
Then they tested what happens when you extend the context window. Every company selling 128K and 200K context as the hallucination solution needs to read this part carefully.
At 200K context length, every single model in the study exceeded 10% hallucination. The rate nearly tripled compared to optimal shorter contexts.
The longer the window people want, the worse the fabrication gets. The exact feature being sold as the fix is making the problem significantly worse.
There is one more finding that does not get talked about enough.
Grounding skill and anti-fabrication skill are completely separate capabilities in these models.
A model that is excellent at finding relevant information in a document is not necessarily good at avoiding making things up. They are measuring two different things that do not reliably correlate. You cannot assume a model that retrieves well also fabricates less.
172 billion tokens. 35 models. The conclusion is the same across all of them.
Handing an LLM the actual document does not solve hallucination. It just changes the shape of it.
https://x.com/RWMaloneMD/status/2031730228758773783
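As a quick illustration of the arithmetic in the thread above: a fabrication rate is just fabricated answers divided by total answers (so "one in four" is 25%), and the "median across all 35 models" is the middle value of the per-model rates. Here is a minimal Python sketch using made-up per-model numbers; these are NOT the study's actual figures.

```python
from statistics import median

def fabrication_rate(fabricated: int, total: int) -> float:
    """Fraction of answers that were fabricated."""
    return fabricated / total

# Hypothetical per-model fabrication rates (fractions), for illustration only.
rates = [0.0119, 0.05, 0.07, 0.12, 0.25, 0.31]

print(f"best model:  {min(rates):.2%}")
print(f"median rate: {median(rates):.2%}")
print(f"one in four: {fabrication_rate(1, 4):.0%}")
```

The point of the sketch is only that "best case" and "median" can diverge sharply: the minimum of a set of per-model rates says nothing about its middle.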
Robert W Malone, MD @RWMaloneMD - The FDA’s New “Transparency” Database
You can't find what you don't look for.
Adverse Event Monitoring System (AEMS)
On March 11, the U.S. Food and Drug Administration announced what it describes as the largest technical overhaul in its history: a unified safety database called the FDA Adverse Event Monitoring System (AEMS). According to the agency, the platform will merge multiple legacy reporting systems into one searchable dashboard covering drugs, vaccines, cosmetics, animal products, and more.
Commissioner Marty Makary called the project a step toward “radical transparency.” The FDA says the new system will modernize the analysis of adverse event reports while saving taxpayers about $120 million over five years.
About six million reports per year, which were previously scattered across seven databases, will eventually appear in one place.
That is the official explanation.
Read more:
https://www.malone.news/p/the-fdas-new-transparency-database
https://x.com/RWMaloneMD/status/2031860685789892650
robyn @RRR0BYN - Fags did this. Buccal fat removal plus a bob will age even the most beautiful woman on earth by 10 years instantly. Stop taking beauty advice from gays!
Quote:
Pamela @PamelaBies
One of the biggest mistakes a woman can make is to remove her buccal fat.
https://x.com/RRR0BYN/status/2031596174894047432