Integrating Neural Systems for Visual Perception: The Role of Ventral Temporal Cortex (VTC) and Medial Temporal Cortex (MTC) in Rapid and Complex Object Recognition
Human and primate visual perception unfolds across multiple timescales: some visual attributes can be identified in under 200ms, a feat supported by the ventral temporal cortex (VTC). More complex visual inferences, however, such as recognizing novel objects, require additional time and multiple glances, with the high-acuity fovea and frequent gaze shifts helping to compose object representations. While much is understood about rapid visual processing, far less is known about how the brain integrates sequences of visual inputs. The medial temporal cortex (MTC), particularly the perirhinal cortex (PRC), may support this process, enabling visual inferences beyond VTC capabilities by integrating sequential visual inputs.

Stanford researchers evaluated the MTC’s role in object perception by comparing human visual performance to recordings from macaque VTC. Humans and VTC perform comparably at brief viewing times (<200ms), but with extended viewing, human performance significantly surpasses what VTC responses can support. The MTC appears to drive this improvement: humans with MTC lesions perform only as well as VTC models. Eye-tracking experiments further revealed that humans use sequential gaze patterns to make complex visual inferences. Together, these findings suggest that the MTC integrates visuospatial sequences into compositional representations, enhancing object perception beyond VTC capabilities.

To estimate the performance VTC responses can support, the researchers used a dataset of object images rendered in different orientations and settings. They implemented a cross-validation strategy in which each trial featured two images of a typical object and one outlier, presented in randomized configurations. Neural responses recorded from high-level visual areas were used to train a linear classifier to detect the odd object. This process was repeated many times, and results were averaged to yield a performance score for distinguishing each pair of objects.
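The snippet below is a minimal sketch of such a cross-validated odd-one-out readout, run on synthetic data. The population size, noise level, logistic-regression decoder, and decision rule are illustrative assumptions, not the study's exact pipeline.

```python
# Minimal sketch of a cross-validated odd-one-out readout (synthetic data).
# Population size, noise level, and the logistic-regression decoder are
# illustrative assumptions, not the study's exact pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_units, n_train, n_trials = 200, 100, 500

# Hypothetical mean population responses for two objects, plus trial noise.
mu_a, mu_b = rng.normal(size=n_units), rng.normal(size=n_units)
noise = lambda n: rng.normal(scale=2.0, size=(n, n_units))

# Train a linear decoder to discriminate object A from object B.
X = np.vstack([mu_a + noise(n_train), mu_b + noise(n_train)])
y = np.r_[np.zeros(n_train), np.ones(n_train)]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Evaluate on held-out oddity trials: two samples of one object, one of the other.
correct = 0
for _ in range(n_trials):
    typical, odd = (mu_a, mu_b) if rng.random() < 0.5 else (mu_b, mu_a)
    trial = np.vstack([typical + noise(2), odd + noise(1)])  # odd item at index 2
    scores = clf.decision_function(trial)
    # Pick the item whose decoder score deviates most from the other two.
    pred = max(range(3), key=lambda i: abs(scores[i] - np.delete(scores, i).mean()))
    correct += pred == 2
print(f"simulated oddity accuracy: {correct / n_trials:.2f}")
```

Averaging this accuracy over many simulated trial sets, as done for each object pair, yields the per-pair performance score described above.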

For comparison, a convolutional neural network (CNN) pre-trained for object classification served as a computational model of VTC. Images were preprocessed for the CNN, and the same experimental setup was followed: a classifier was trained on the network’s features to detect the odd object across trials. The model’s accuracy was then compared to the neural-response-based predictions, offering insight into how closely its visual processing mirrored human-like inference.
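As a sketch of this comparison, the snippet below extracts image embeddings from a pretrained network; these would stand in for the neural responses when training the same odd-one-out decoder. The choice of a torchvision ResNet-50 and its penultimate layer is an assumption for illustration, not necessarily the study's model.

```python
# Sketch of a CNN-based stand-in for VTC population responses. ResNet-50 and
# the penultimate layer are assumptions; the study's model may differ.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
cnn = models.resnet50(weights=weights).eval()
backbone = torch.nn.Sequential(*list(cnn.children())[:-1])  # drop classifier head
preprocess = weights.transforms()  # resize/crop/normalize expected by the weights

@torch.no_grad()
def embed(paths):
    """One feature vector per image, analogous to a neural population response."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch).flatten(1).numpy()

# Hypothetical usage: these embeddings would replace the neural responses when
# training the same odd-one-out decoder sketched above.
# feats = embed(["object_a_view1.png", "object_a_view2.png", "object_b_view1.png"])
```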

The study compared human performance in two visual regimes: time-restricted (less than 200ms) and time-unrestricted (self-paced). In time-restricted tasks, participants must rely on immediate visual processing, since there is no opportunity for sequential sampling through eye movements. A three-way visual discrimination task and a match-to-sample paradigm were used to assess this regime. Results showed a strong correlation between time-restricted human performance and the performance predicted from macaque high-level VTC. With unlimited viewing time, however, human participants significantly outperformed both VTC-supported performance and computational models based on VTC, demonstrating that humans exceed VTC capabilities when given extended viewing time and suggesting reliance on additional neural mechanisms.
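The analysis pattern this implies can be sketched as follows; the per-pair accuracy values below are hypothetical placeholders, not numbers from the paper, and only the comparison logic follows the description above.

```python
# Sketch of the two-regime comparison across object pairs.
# All accuracy values are hypothetical placeholders, not results from the paper.
import numpy as np
from scipy.stats import pearsonr

vtc_pred   = np.array([0.72, 0.81, 0.64, 0.90, 0.77])  # decoder-predicted accuracy
human_fast = np.array([0.70, 0.83, 0.66, 0.88, 0.79])  # <200 ms viewing
human_slow = np.array([0.85, 0.95, 0.80, 0.97, 0.91])  # self-paced viewing

r_fast, _ = pearsonr(vtc_pred, human_fast)
gain = (human_slow - vtc_pred).mean()
print(f"time-restricted humans vs VTC decoder: r = {r_fast:.2f}")
print(f"mean gain with unrestricted viewing:   {gain:+.2f}")
```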

The study reveals complementary neural systems for visual object perception: the VTC enables rapid visual inferences within 100ms, while the MTC supports more complex inferences composed across sequential saccades. Time-restricted performance aligns with VTC capabilities, but given more time, humans surpass them, reflecting the MTC’s integration of visuospatial sequences. The findings emphasize the MTC’s role in compositional operations, extending its function beyond memory to perception. Models of human vision such as convolutional neural networks approximate VTC but fail to capture the MTC’s contributions, pointing to the need for biologically plausible models that integrate both systems.

