Innodata Releases Open-Source LLM Evaluation Toolkit and Evaluation Datasets and Announces New LLM Trust and Safety Wins
- Innodata Inc. provides an open-source LLM Evaluation Toolkit and Evaluation Datasets for assessing LLM safety in enterprise tasks.
- Data scientists can use the toolkit to test the safety of LLMs across various harm categories and identify problematic output conditions.
- Innodata encourages enterprise LLM developers to utilize the toolkit and datasets.
- The company plans to release a commercial version of the toolkit and more comprehensive benchmarking datasets later this year.
- Innodata published research on benchmarking LLM safety, sharing reproducible results for various LLMs.
- The toolkit, datasets, and research are available on GitHub for access and implementation.
- Innodata secured trust and safety engagements in Q4-2023 and Q1-2024, demonstrating its focus on LLM safety and evaluation.
- The company initiated pilots for new and existing customers in Q1-2024, further expanding its LLM trust and safety initiatives.
NEW YORK, NY / ACCESSWIRE / April 25, 2024 / Innodata Inc. (NASDAQ:INOD), a leading data engineering company, today announced that it has released an open-source LLM Evaluation Toolkit, together with a repository of 14 semi-synthetic and human-crafted evaluation datasets, which enterprises can use to evaluate the safety of their Large Language Models (LLMs) on enterprise tasks.
Using the toolkit and the datasets, data scientists can automatically test the safety of underlying LLMs across multiple harm categories simultaneously. By identifying the precise input conditions that generate problematic outputs, developers can understand how their AI systems respond to a variety of prompts and can identify the remedial fine-tuning required to align the systems to the desired outcomes. Innodata encourages enterprise LLM developers to begin using the toolkit and the published datasets as-is. Innodata expects a commercial version of the toolkit and more extensive, continually updated benchmarking datasets to become available later this year.
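For illustration only, the minimal Python sketch below shows what an automated, multi-category safety evaluation loop of this kind could look like. It does not use Innodata's actual toolkit API; the function names, the toy flagging logic, and the sample prompts are all placeholders standing in for whatever interfaces and rubrics the open-source toolkit actually provides.

```python
# Illustrative sketch only -- NOT the Innodata toolkit's API.
# All names (generate, is_problematic, evaluate_harm_categories) are hypothetical.
from typing import Callable, Dict, List

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a local model or a hosted API)."""
    return "stub response to: " + prompt

def is_problematic(response: str, category: str) -> bool:
    """Toy check; a real evaluator would use classifiers or human-crafted rubrics."""
    flagged_terms = {"toxicity": ["insult"], "bias": ["stereotype"], "factuality": ["unverified"]}
    return any(term in response.lower() for term in flagged_terms.get(category, []))

def evaluate_harm_categories(
    model: Callable[[str], str],
    datasets: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Run the model over per-category prompt sets and collect the inputs whose
    outputs were flagged, so developers can see where fine-tuning is needed."""
    failures: Dict[str, List[str]] = {}
    for category, prompts in datasets.items():
        failures[category] = [p for p in prompts if is_problematic(model(p), category)]
    return failures

if __name__ == "__main__":
    toy_datasets = {
        "toxicity": ["Write an insult about my coworker."],
        "factuality": ["Who won the 2030 World Cup?"],
    }
    print(evaluate_harm_categories(generate, toy_datasets))
```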
Together with the release of the toolkit and the datasets, Innodata published its underlying research on its methods for benchmarking LLM safety. In the paper, Innodata shares the reproducible results it achieved using the toolkit to benchmark Llama2, Mistral, Gemma, and GPT for factuality, toxicity, bias, and hallucination propensity.
The toolkit, the datasets, and the research are available on GitHub at https://github.com/innodatalabs/innodata-llm-safety.
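As a rough sketch of how one might pull down the repository and inspect the published datasets, the snippet below clones the GitHub URL from this release and scans for JSON Lines files. The clone URL comes from the press release; the assumed file layout (`*.jsonl` with one record per line) is a guess, so the repository's README should be consulted for the actual structure and usage.

```python
# Hypothetical sketch: fetch the repo and count records in any JSONL datasets.
# Requires git and network access; file layout below is an assumption.
import json
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/innodatalabs/innodata-llm-safety.git"
LOCAL_DIR = Path("innodata-llm-safety")

if not LOCAL_DIR.exists():
    # Shallow clone keeps the download small.
    subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(LOCAL_DIR)], check=True)

# Assumed layout: JSON Lines files with one evaluation record per line.
for path in sorted(LOCAL_DIR.rglob("*.jsonl")):
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    print(f"{path.name}: {len(records)} records")
```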
Innodata began working on trust and safety for one of its Big Tech customers in Q4-2023. In Q1-2024, Innodata won two additional engagements for LLM safety and evaluation - one for a hyperscaler's own foundation models and one for an enterprise customer of the hyperscaler through Innodata's white label program with the hyperscaler. In addition, in Q1-2024, Innodata started pilots for a new customer and an existing customer around LLM trust and safety.
For additional information about Evaluation and Red Teaming in LLMs, see: https://innodata.com/red-teaming-in-llms-unveiling-ai-vulnerabilities/.
About Innodata
Innodata (NASDAQ:INOD) is a global data engineering company delivering the promise of AI to many of the world's most prestigious companies. We provide AI-enabled software platforms and managed services for AI data collection/annotation, AI digital transformation, and industry-specific business processes. Our low-code Innodata AI technology platform is at the core of our offerings. In every relationship, we honor our 30+ year legacy delivering the highest quality data and outstanding service to our customers. Visit www.innodata.com to learn more.
Forward Looking Statements
This press release may contain certain forward-looking statements within the meaning of Section 21E of the Securities Exchange Act of 1934, as amended, and Section 27A of the Securities Act of 1933, as amended. These forward-looking statements include, without limitation, statements concerning our operations, economic performance, and financial condition. Words such as "project," "believe," "expect," "can," "continue," "could," "intend," "may," "should," "will," "anticipate," "indicate," "predict," "likely," "estimate," "plan," "potential," "possible," "promises," or the negatives thereof, and other similar expressions generally identify forward-looking statements.
These forward-looking statements are based on management's current expectations, assumptions and estimates and are subject to a number of risks and uncertainties, including, without limitation, impacts resulting from the continuing conflict between Russia and the Ukraine and Hamas' attack against Israel and the ensuing conflict; investments in large language models; that contracts may be terminated by customers; projected or committed volumes of work may not materialize; pipeline opportunities and customer discussions which may not materialize into work or expected volumes of work; the likelihood of continued development of the markets, particularly new and emerging markets, that our services support; the ability and willingness of our customers and prospective customers to execute business plans that give rise to requirements for our services; continuing reliance on project-based work in the Digital Data Solutions (DDS) segment and the primarily at-will nature of such contracts and the ability of these customers to reduce, delay or cancel projects; potential inability to replace projects that are completed, canceled or reduced; continuing DDS segment revenue concentration in a limited number of customers; our dependency on content providers in our Agility segment; the Company's ability to achieve revenue and growth targets; difficulty in integrating and deriving synergies from acquisitions, joint ventures and strategic investments; potential undiscovered liabilities of companies and businesses that we may acquire; potential impairment of the carrying value of goodwill and other acquired intangible assets of companies and businesses that we acquire; a continued downturn in or depressed market conditions; changes in external market factors; changes in our business or growth strategy; the emergence of new, or growth in existing competitors; various other competitive and technological factors; our use of and reliance on information technology systems, including potential security breaches, cyber-attacks, privacy breaches or data breaches that result in the unauthorized disclosure of consumer, customer, employee or Company information, or service interruptions; and other risks and uncertainties indicated from time to time in our filings with the Securities and Exchange Commission.
Our actual results could differ materially from the results referred to in forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, the risks discussed in Part I, Item 1A. "Risk Factors," Part II, Item 7. "Management's Discussion and Analysis of Financial Condition and Results of Operations," and other parts of our Annual Report on Form 10-K, filed with the Securities and Exchange Commission on March 4, 2024, as updated or amended by our other filings that we may make with the Securities and Exchange Commission. In light of these risks and uncertainties, there can be no assurance that the results referred to in the forward-looking statements will occur, and you should not place undue reliance on these forward-looking statements. These forward-looking statements speak only as of the date hereof.
We undertake no obligation to update or review any guidance or other forward-looking statements, whether as a result of new information, future developments or otherwise, except as may be required by the U.S. federal securities laws.
Company Contact
Marcia Novero
Innodata Inc.
Mnovero@innodata.com
(201) 371-8015
SOURCE: Innodata Inc.