
Google AI Researchers Introduce a New Whale Bioacoustics Model that can Identify Eight Distinct Species, Including Multiple Calls for Two of Those Species



Whales produce a wide range of vocalizations, from very low to very high frequencies, that vary by species and location, making it difficult to develop models that automatically classify calls from multiple species. By analyzing whale vocalizations, researchers can estimate population sizes, track changes over time, and inform conservation strategies, including protected-area designation and mitigation measures. Effective monitoring is essential for conservation, but the complexity of whale calls, especially from elusive species, and the sheer volume of underwater audio data complicate efforts to track their populations.

Current methods for animal species identification through sound are more advanced for birds than for whales, as models like Google Perch can classify thousands of bird vocalizations. However, similar multi-species classification models for whales are more challenging to develop due to the diversity in whale vocalizations and a lack of comprehensive data for certain species. Previous efforts have focused on specific species like humpback whales, with earlier models developed by Google Research in partnership with NOAA and other organizations. These models helped classify humpback calls and identified new locations of whale activity.

To address the limitations of previous models, Google researchers developed a new whale bioacoustics model capable of classifying vocalizations from eight distinct species, including the mysterious “Biotwang” sound attributed to the Bryde’s whale. The new model expands on earlier efforts by classifying multiple species and vocalization types, and it is designed for large-scale application to long-term passive acoustic recordings.

The proposed whale bioacoustics model processes audio data by converting it into spectrogram images for each 5-second window of sound. The front-end of the model uses mel-scaled frequency axes and log amplitude compression. It then classifies these spectrograms into one of 12 classes, corresponding to eight whale species and several specific vocalization types. To ensure accurate classifications and minimize false positives, the model was trained not just on positive examples but also on negative and background noise data. The model’s performance, as measured by metrics such as the area under the receiver operating characteristic curve (AUC), showed strong discriminative abilities, particularly for species like Minke and Bryde’s whales.
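As a rough illustration of this kind of front-end, the sketch below converts a single 5-second audio window into a mel-scaled, log-compressed spectrogram using the librosa library. The sample rate, FFT size, and number of mel bands here are illustrative assumptions, not the published model’s actual configuration, and the input file name is a placeholder.

```python
# Minimal sketch of a mel-spectrogram front-end for a 5-second window.
# The parameter values below are assumptions for illustration only.
import numpy as np
import librosa

SAMPLE_RATE = 24_000      # assumed sample rate for the recordings
WINDOW_SECONDS = 5        # the 5-second analysis window described in the article
N_MELS = 128              # assumed number of mel frequency bands


def window_to_log_mel(audio_window: np.ndarray) -> np.ndarray:
    """Convert one 5-second waveform window into a log-mel spectrogram image."""
    mel = librosa.feature.melspectrogram(
        y=audio_window,
        sr=SAMPLE_RATE,
        n_fft=1024,
        hop_length=256,
        n_mels=N_MELS,
    )
    # Log amplitude compression, as described for the model's front-end.
    return librosa.power_to_db(mel, ref=np.max)


# Example: build the spectrogram "image" for one window of a recording.
# "recording.wav" is a placeholder path.
waveform, _ = librosa.load("recording.wav", sr=SAMPLE_RATE, duration=WINDOW_SECONDS)
spectrogram = window_to_log_mel(waveform)
print(spectrogram.shape)  # (n_mels, time_frames): the input to the 12-class classifier
```

The resulting spectrogram would then be passed to a classifier that outputs scores over the 12 classes (eight species plus several specific vocalization types); that classifier is not shown here.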

Along with the classification task, the model helped researchers discover new insights about species’ movements, including differences between central and western Pacific Bryde’s whale populations. By labeling over 200,000 hours of underwater recordings, the model also uncovered the seasonal migration patterns of some species. The model is now publicly available via Kaggle for further use in whale conservation and research efforts.
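To give a sense of how such a classifier can be applied at this scale, the hypothetical sketch below slices a long passive acoustic recording into consecutive 5-second windows and scores each one. The function classify_window is a stand-in for the released model’s inference call, not its real API, and all parameter values are assumptions.

```python
# Hypothetical sketch of scanning a long recording with a 12-class whale-call
# classifier. classify_window is a placeholder, not the released model's API.
import numpy as np
import librosa

SAMPLE_RATE = 24_000      # assumed
WINDOW_SECONDS = 5


def classify_window(window: np.ndarray) -> np.ndarray:
    """Stand-in for the released classifier: returns 12 per-class scores.

    This stub returns zeros; replace it with the actual model's inference call.
    """
    return np.zeros(12)


def scan_recording(path: str):
    """Yield (start_time_in_seconds, class_scores) for every 5-second window."""
    audio, _ = librosa.load(path, sr=SAMPLE_RATE)
    samples_per_window = SAMPLE_RATE * WINDOW_SECONDS
    n_windows = len(audio) // samples_per_window
    for i in range(n_windows):
        window = audio[i * samples_per_window:(i + 1) * samples_per_window]
        yield i * WINDOW_SECONDS, classify_window(window)


# Example usage over one placeholder file; in practice this loop would run
# over many terabytes of long-term deployments.
for start_time, scores in scan_recording("deployment_001.wav"):
    top_class = int(np.argmax(scores))
    print(f"t={start_time}s -> class {top_class}")
```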

In conclusion, Google’s new whale bioacoustics model is a significant advancement in the field, addressing the challenge of multi-species classification with a model that not only recognizes eight species but also provides detailed insights into their ecology. This model is a crucial tool in marine biology research, offering scalable and accurate underwater audio data classification and furthering our understanding of whale populations, especially for elusive species like Bryde’s whales.


Check out the Paper and Blog. All credit for this research goes to the researchers of this project.



Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast and has a keen interest in the scope of software and data science applications. She is always reading about the developments in different fields of AI and ML.




