Friday, October 04, 2024

Authors' concern over publishers selling their research to AI developers






Publishers are gradually moving towards deals with tech companies including Microsoft, Google, OpenAI, Apple and Meta. Under these deals, publishers are paid for content that is used to feed and train large language models (LLMs) or other generative AI models. Academic papers, with their high information content, are valuable for these LLMs. Lucy Lu Wang, who co-created S2ORC, a data set based on 81.1 million academic papers, says "Training models on a large body of scientific information also gives them a much better ability to reason about scientific topics."[1] In July, Taylor & Francis made a deal with Microsoft selling access to its authors’ work, and Routledge, Wiley, Sage, Cambridge University Press and Oxford University Press are moving in the same direction.

Publishers probably view these deals as an alternative to having their data harvested by these firms without an agreement. According to a Sage spokesperson, “We believe that a preferable route is to offer clear licensing routes to our content that protect rights and include payment for the use of content by the LLM that we can pass on to authors and societies.” [2] Informa, the parent company of Taylor & Francis and Routledge, has recently revealed that it expects to make $75m from AI deals, and Wiley expects to make $44m from two AI partnerships. The Independent Publishers Guild’s Autumn Conference, one of the biggest events of the UK publishing year, was held on September 17, 2024 in London, UK, and the most discussed topic was AI and licensing.

 
 Possible use of content by AI companies         

According to the article "Cambridge University Press now asking authors whether they want to license their publications for LLMs," published in Leiter Reports, a philosophy blog, [3] CUP described the possible uses of authors' content by AI developers.

If your work is part of a generative AI licensing agreement, it could be used for: 
 
  • Training and testing the foundational models that are then used to create, for example, personal assistant and chatbot tools or discoverability summaries.
  • As part of banks of authoritative content that are used, on a perpetual basis, to check and verify the accuracy of information provided by AI tools. 

  

 Benefits of this licensing to authors            


  1. Publishers can monetize their archives and content, as AI companies pay to use it to train their LLMs.
  2. It may improve the quality and accuracy of tools that are increasingly going to be used in everyday life.
  3. There may also be opportunities for your content to have greater visibility and impact if it is properly cited and attributed by AI tools.


 Authors' concerns          


In all of these deals, authors' rights have been ignored, and authors are concerned that their work is being fed to LLMs without their being informed or remunerated. The Society of Authors (SoA), which has more than 12,000 members, has written a letter to tech companies stating that its members do not consent to the use of their work in the development of artificial intelligence (AI) systems.

The letter by the SoA Policy Team (August 2024) states:

“Our members have instructed us to put you on express notice that they do not authorise or otherwise grant permission for the use of any of their copyright-protected works in relation to, without limitation, the training; development; or operation of AI models (including the generation of Infringing Works), by large language models or other generative AI models, unless they have first specifically agreed licensing arrangements for the use of their work.” It warns that this “continues to cause great harm to creators’ livelihoods and jeopardizes the future of the profession, which in turn threatens our creative industries and our cultural capital”. [4]

The Creative Rights Alliance, which represents over 500,000 creators, also wrote a similar letter to tech companies in August 2024. [5] Beyond these letters, several other cases support the view that tech firms should not use copyright-protected works without permission or compensation, and that these firms should seek licenses and create transparency for rights holders. Authors are also angry with publishers such as Taylor & Francis, which struck a $10m deal with Microsoft selling access to authors' research. [6] Sage confirmed that it will pay royalties on any licensing income to authors, editors and societies according to their contracts. [7] Authors who publish with Cambridge University Press (CUP) are more relaxed, as CUP is carefully considering how best to license content to generative AI providers and has created a set of principles to guide its decision-making. [8]

These focus on:
  • author attribution
  • the creation of formal licensing arrangements to govern content
  • obtaining permissions from rights holders
  • obtaining fair remuneration for the use of content.

CUP’s "opt-in" approach [9] involves asking all authors and rights-holders for their consent to be part of a generative AI licensing agreement before licensing their content to providers of generative AI technologies.
See also: The Bookseller - News - Sage confirms it is in talks to license content to AI firms [10]
 

Overall, the evolving landscape of publishing in relation to generative AI presents both opportunities and challenges for authors and publishers alike. With the licensing of academic content between publishers and tech companies such as Microsoft, Google and OpenAI, there is clear potential for monetization and enhanced visibility for authors' works. However, the concerns surrounding authors' rights and compensation cannot be ignored. Many authors are angry about existing practices, feeling that their contributions are exploited without proper recognition or remuneration.

Organizations such as the Society of Authors and the Creative Rights Alliance are advocating for transparent licensing agreements that respect authors’ rights and ensure fair compensation. Meanwhile, publishers like Cambridge University Press are adopting an "opt-in" approach, prioritizing author consent and establishing principles for ethical licensing.

As the discussions around AI and copyright continue to evolve, it is very important for all stakeholders (authors, publishers, and tech companies) to collaborate in creating a framework that protects the rights and prestige of authors. Finding a balance between the advantages of AI in making research more accessible and the need to respect authors' work is essential for the future of publishing in the age of artificial intelligence.


 References


  1. Has your paper been used to train an AI model? Almost certainly (nature.com)
  2. https://www.thebookseller.com/news/sage-confirms-it-is-in-talks-to-license-content-to-ai-firms
  3. https://leiterreports.typepad.com/blog/2024/05/cambridge-university-press-now-asking-authors-whether-they-want-to-license-their-publications-for-ll.html
  4. The Society of Authors writes to tech companies asserting members’ rights around uses of their works by generative AI - The Society of Authors
  5. https://www.thebookseller.com/news/creators-demand-immediate-change-from-companies-developing-ai-after-unlawful-use-of-content
  6. https://www.thebookseller.com/news/academic-authors-shocked-after-taylor--francis-sells-access-to-their-research-to-microsoft-ai
  7. https://www.thebookseller.com/news/sage-confirms-it-is-in-talks-to-license-content-to-ai-firms
  8. The Bookseller - News - IPG 2024 Autumn Conference dominated by AI and licensing discussions
  9. https://infogram.com/1p9g1kvndzqkrkt7523yd02wk3b3grmm9mw?live&utm_campaign=LLM+Comms&utm_medium=bitly&utm_source=Email
  10. https://www.thebookseller.com/news/sage-confirms-it-is-in-talks-to-license-content-to-ai-firms
  11. Open-access expansion threatens academic publishing industry (insidehighered.com)
  12. https://www.thebookseller.com/news/wiley-set-to-earn-44m-from-ai-rights-deals-confirms-no-opt-out-for-authors
  13. https://www.thebookseller.com/news/society-of-authors-writes-to-ai-firms-demanding-appropriate-remuneration-and-consent-for-authors
  14. https://www.thebookseller.com/news/anthropic-sued-by-us-authors-over-use-of-pirated-books-to-train-ai-chatbot
  15. https://www.thebookseller.com/news/taylor-francis-set-to-make-58m-from-ai-in-2024-as-it-reveals-second-partnership
  16. https://www.nature.com/articles/d41586-024-02599-9



Thursday, August 15, 2024

How about the ability to visualize journal article with 3D, AR and VR technologies?




The fundamental structure of any research article remains a simple document composed of text and printable figures. Printable media have limitations in representing scientific communication: they constrain complex scientific data into static 2D figures, hindering our ability to exchange complex and extensive information effectively. Using digital supplementary material to include digital media with articles is a common way to modernize them. Unfortunately, recent metrics indicate these materials are accessed by as few as 0.04% of readers. [1] [2]

Now the whole scenario is changing: the way content is created, consumed and interacted with has changed drastically in the digital world. The publishing industry is being revolutionized by cutting-edge technologies such as Augmented Reality (AR), Virtual Reality (VR) and 3D. The adoption of smartphones and the emergence of native browser integration of the Web Graphics Library (WebGL) are now part of modern scientific communication. These bring an immersive, captivating experience for readers and improve reader engagement. In the sciences, medical sciences and other disciplines, authors who include 3D models such as molecular structures and tissue illustrations as part of their manuscript submission will have the opportunity to turn them into interactive AR-viewable objects. Before looking at how these technologies are being used in publishing, it is important to know the basic concepts of AR and VR.

Saturday, June 22, 2024

Principles for Library Ownership of Digital Books by Library Futures

Sunday, May 26, 2024

Quiz on modern trends in library and information science (Series 10)

Friday, May 17, 2024

Don't share sensitive information to ChatGPT/Google Gemini

Don't share sensitive information with ChatGPT/Google Gemini: chats may be reviewed and used to train their models.

ChatGPT, like many AI models, operates on the principle of learning from interactions. While this enhances its ability to generate relevant and helpful responses, it also raises concerns about privacy and data security. Users must remain vigilant about the information they share, especially when it pertains to sensitive or confidential matters.

Key Considerations for Protecting Privacy in Chat Interactions

    Mindful Sharing

    Before sharing sensitive information in a chat, consider whether it's necessary and appropriate. Avoid divulging personal details, financial information, or other sensitive data unless absolutely essential.


    Encryption and Security

    Opt for platforms that prioritize encryption and robust security measures. Look for end-to-end encryption, which ensures that only the sender and recipient can access the messages, minimizing the risk of interception.


    Awareness of AI Presence

    When interacting with AI models like ChatGPT, be mindful that the conversations may be recorded and used for training purposes. Avoid sharing confidential information or anything you wouldn't want to be stored or analyzed.


    Clear Communication

    Clearly communicate any boundaries or limitations regarding the information you're comfortable sharing. If you're unsure about the security of a chat platform or AI model, err on the side of caution and refrain from sharing sensitive data.


    Regular Monitoring

    Periodically review your chat history and delete any messages containing sensitive information. This reduces the likelihood of unintended exposure or data breaches.


    Stay Informed

    Stay updated on the latest developments in chat platform security and privacy policies. Be cautious of any changes that may impact the confidentiality of your conversations.


    By adopting these practices, users can mitigate the risks associated with sharing sensitive information in chat interactions. While chat platforms offer convenience and connectivity, safeguarding privacy should always remain a top priority.

    Remember, what you share in a chat conversation could have long-lasting implications. By exercising caution and mindfulness, you can protect your privacy and ensure a safer online experience for yourself and others.

Monday, May 06, 2024

Top Plagiarism Checker Tools

Explore the pinnacle of plagiarism detection with our curated list of top-notch plagiarism checker tools. Uncover the best software equipped to safeguard your academic or professional integrity. From comprehensive analysis to user-friendly interfaces, these tools offer unparalleled accuracy and efficiency in ensuring originality and authenticity in your work.

Turnitin

Turnitin is a widely used originality-checking tool for detecting plagiarism. It is essentially a software program that compares submitted work to a massive database of online sources, academic papers, and student work. This helps educators identify instances where a student might have copied content without proper citation.

Grammarly

Grammarly offers a plagiarism checker as part of its suite of writing tools. It helps identify potential instances of plagiarism in your writing. The free version of Grammarly provides a basic plagiarism check indicating the presence or absence of plagiarism. Upgrading to Grammarly Premium offers a more detailed report.

Originality.ai

Originality.ai is a tool specifically designed to address AI-generated content and plagiarism in the context of online publishing. It's particularly useful for those working with content creation and publishing where AI-generated content and authenticity are growing concerns.

Copyleaks

Copyleaks is a plagiarism detection tool similar to Turnitin. It's an online platform that provides plagiarism detection services for educational institutions, businesses, and individuals. Copyleaks scans submitted content, such as academic papers, articles, websites, and more, to identify similarities with existing online content. It offers features like real-time scanning, batch scanning, and integration with LMS and CMS.

iThenticate

iThenticate is a plagiarism detection software developed by Turnitin, the same company behind the Turnitin plagiarism detection service. iThenticate is primarily targeted towards researchers, authors, publishers, and institutions involved in scholarly publishing. It allows users to submit documents, such as academic papers, manuscripts, grant proposals, and other scholarly works, to check for similarity with existing content in its extensive database.

DrillBit

DrillBit Plagiarism is plagiarism detection software designed for academic institutions and instructors. It works by scanning student submissions for similarities with existing sources, including online content and academic databases. Similar to other plagiarism detection tools like Turnitin and Copyleaks, DrillBit Plagiarism helps educators ensure academic integrity by identifying instances of plagiarism in student work.

PlagScan

PlagScan is web-based plagiarism detection software specifically designed to identify instances of copied content, used primarily in academic and professional settings. Similar to other plagiarism detection tools, PlagScan allows users to upload documents and scans them for similarities with other texts available online and in its own database. It's often used by educators, researchers, publishers, and businesses to ensure the originality of written work and to prevent plagiarism.

Cross Plag

"Cross Plag" most likely refers to the software Crossplag, a plagiarism checker with a specific focus on multilingual detection. If you need a plagiarism checker that can handle content in multiple languages, Crossplag is a strong contender. However, if you don't require cross-lingual features and prioritize features like AI content detection or comprehensive analysis, other options like Originality.ai or Copyleaks might be better suited for your needs.

Unicheck

According to its website, Unicheck is a plagiarism checker tool designed to help educators and students detect plagiarism, and is used by over 1 million users in over 90 countries. Unicheck offers a variety of features including the ability to check for plagiarism across 91 billion current and archived webpages.

Quetext

Quetext is a plagiarism checker and AI content detector tool that utilizes deep search technology to identify plagiarism and AI-generated content, resolve writing issues, and build citations. It helps students, teachers, and content writers alike ensure the originality of their work.

Copyscape

Copyscape is a plagiarism checker tool that allows users to search for copies of their content on the web. Copyscape offers free and premium versions.

Duplichecker

Duplichecker can detect plagiarism from billions of websites and also check for paraphrased content. You can upload your documents or copy and paste text into the plagiarism checker. The plagiarism checker will then provide a report that shows the percentage of original content and the percentage of plagiarized content. DupliChecker.com also offers a variety of other SEO and content tools, such as a paraphrasing tool, a grammar checker, and a backlink checker.

Monday, April 29, 2024

Tortured phrases: common behavior of language models






  The use of large language models (LLMs) in preparing academic content has taken hold of academic research writing, blogs, and more all over the world. There are many reasons to support this, but to what extent is it useful, and to what extent is it degrading the quality of research papers? This question is under discussion and leaves us somewhat perplexed about the fair use of these LLMs. Tortured phrases found in academic papers are evidence that machine-generated text has been used in research papers and articles.

 What are Tortured phrases?       

Guillaume Cabanac, Cyril Labbé and Alexander Magazinov (2021) introduced the concept of 'tortured phrases' in their paper "Tortured phrases: A dubious writing style emerging in science" (arxiv.org), defining them as unexpected, weird phrases in lieu of established ones, such as ‘counterfeit consciousness’ instead of ‘artificial intelligence’. [1]
Words in the original language often have multiple meanings, and a word's meaning also changes depending on the word or words it is paired with. Depending on the context, some pairings are appropriate and some are not. Humans who know the language understand this easily, but software does not always recognize the difference and may not choose a corresponding word with the intended meaning.

Consider the following terms 👇, which you have probably heard or read somewhere and are familiar with.

“artificial intelligence,”
“big data,” and
“random value.”

👉 But what if they are taken to mean

“counterfeit consciousness,”
“colossal information,” 
and “irregular esteem”?

 Few Tortured Phrases        

 

General Phrase            | Tortured Phrase
Artificial intelligence   | Counterfeit consciousness
Big data                  | Colossal information
Random value              | Irregular esteem
Deep neural network       | Profound neural organization
Signal to noise           | Flag to commotion
Remaining energy          | Leftover vitality
Cloud computing           | Haze figuring
Linear prediction         | Straight expectation
Naive Bayes               | Gullible Bayes
Random forest             | Irregular woodland
Smart home                | Savvy home


These weird phrases have been found in a number of journals. Many of the affected papers (about 500) were concentrated in special issues of the journal Microprocessors and Microsystems between 2018 and 2021. [2]
By January 2022, Cabanac, Labbé, and Magazinov had found nearly 3,200 papers containing tortured phrases, even in reputable, peer-reviewed journals. [3]
Further investigation found that such phrases are the outcome of automated translation or paraphrasing. [4]

 Problematic Paper Screener       

The Problematic Paper Screener is a software tool that tracks papers containing tortured phrases. It was developed by a team of computer scientists led by Cabanac, Labbé, and Magazinov. [5]
According to Yateendra Joshi, the practice of depending entirely on LLMs for writing academic papers is unethical and erodes public confidence in the academic publishing industry, and the authors of such publications may be pressured to retract them. Researchers ought to make an effort to write more effectively or enlist the aid of reliable editing and translation services. [6]
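
As a rough illustration of the phrase-tracking idea described above, the sketch below scans a piece of text for a handful of the tortured phrases listed in this post. It is a minimal example and not the Problematic Paper Screener's actual implementation: the dictionary `TORTURED_PHRASES` and the function `find_tortured_phrases` are hypothetical names invented for this example, and a real screener would presumably rely on a far larger phrase list and full-text search across many papers.

```python
import re

# Hypothetical lookup built from the phrase pairs listed in this post:
# tortured phrase -> established phrase it probably replaces.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "colossal information": "big data",
    "irregular esteem": "random value",
    "profound neural organization": "deep neural network",
    "flag to commotion": "signal to noise",
    "leftover vitality": "remaining energy",
    "haze figuring": "cloud computing",
    "straight expectation": "linear prediction",
    "gullible bayes": "naive Bayes",
    "irregular woodland": "random forest",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, probable intended phrase, position) tuples."""
    hits = []
    lowered = text.lower()  # match case-insensitively
    for tortured, established in TORTURED_PHRASES.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(tortured) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((tortured, established, match.start()))
    # Report hits in the order they appear in the text.
    return sorted(hits, key=lambda hit: hit[2])


sample = ("We apply counterfeit consciousness and a profound neural "
          "organization to colossal information.")
for tortured, established, position in find_tortured_phrases(sample):
    print(f"{position}: '{tortured}' (probably '{established}')")
```

A fixed lookup like this only catches phrases that are already known, so a flagged paper would still need a human reader to judge it.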


We can take advantage of LLMs and use them in writing, but with care and an understanding that computer models have their limitations, while human creativity and analytical power know no boundaries. Use them wisely, and if language is a barrier to writing, the help of language experts and tools can be beneficial. This will not only make your research writing sound but eventually build trust in research publications that are made with the help of LLMs.


References: 


1. Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A dubious writing style emerging in science. arXiv preprint (arxiv.org).

2. Else, Holly. ‘Tortured phrases’ give away fabricated research papers. Nature 596, 328-329 (2021). doi: https://doi.org/10.1038/d41586-021-02134-0

3. Cabanac, G., Labbé, C., & Magazinov, A. (January 13, 2022). “Bosom peril” is not “breast cancer”: How weird computer-generated phrases help researchers find scientific publishing fraud. Bulletin of the Atomic Scientists (thebulletin.org).

4. Joshi, Yateendra (April 21, 2022). Tortured phrases: What they are, how they are detected, and how to avoid them. Editage Insights (editage.com).

5. Cabanac, G., Labbé, C., & Magazinov, A. (2022). The ‘Problematic Paper Screener’ automatically selects suspect publications for post-publication (re)assessment. Presented at WCRI 2022: 7th World Conference on Research Integrity. arXiv preprint. https://doi.org/10.48550/arXiv.2210.04895

6. Joshi, Yateendra (April 21, 2022). Tortured phrases: What they are, how they are detected, and how to avoid them. Editage Insights (editage.com).


Thursday, April 18, 2024

What is Doxing? (also spelled "doxxing")

The word "Doxing" (also spelled "Doxxing") is derived from the term "dropping dox," or "documents.".......

Wednesday, March 13, 2024

Aakashganga a galaxy of opportunities: open access portal on Indian scholarship

 




This initiative is part of the broader open access movement, which advocates for the free dissemination of scholarly work online, without barriers such as subscription fees or paywalls. By making research articles openly accessible, the Aakashganga portal promotes transparency, collaboration, and innovation in academic research within India.

Thursday, February 22, 2024

OA.mg Unveiled: The Modern Academic's Gateway to Research

 

In the vast ocean of academic research, finding the right paper can feel like searching for a needle in a haystack. That's where OA.mg comes into play, revolutionizing the way we access, review, and interact with academic literature. The Open Access movement has given momentum to an increasing number of open-access journals, and OA.mg focuses on that.

Friday, February 09, 2024

Elsevier Scopus AI: cutting-edge AI for improved scholarly research





      In the age of AI, Elsevier, a global leader in information and analytics, has introduced its new intuitive and intelligent AI-powered search tool, Scopus AI. It represents a significant advancement in the field of academic research through the integration of generative AI. This AI-powered research tool has been designed to enhance the capabilities of researchers and academic institutions by providing fast and accurate summaries, insights into research, and fostering collaboration for societal impact. The platform draws on the extensive Scopus database, featuring over 29,200 peer-reviewed journals from more than 7,000 publishers worldwide. Scopus AI also uses natural language processing: users can now enter their question, statement, or hypothesis in natural language, without having to worry about matching certain keywords or Boolean operators.