VISIONx @ NYU

university
https://www.sainingxie.com/

Recent Activity

ellisbrown  submitted a paper 1 day ago
PaintBench: Deterministic Evaluation of Precise Visual Editing
ellisbrown  authored a paper 1 day ago
Benchmarking Visual State Tracking in Multimodal Video Understanding
ellisbrown  authored a paper 1 day ago
PaintBench: Deterministic Evaluation of Precise Visual Editing

Papers

Benchmarking Visual State Tracking in Multimodal Video Understanding

PaintBench: Deterministic Evaluation of Precise Visual Editing


Members: Ellis Brown; Peter Tong; Manoj Middepogu; Sai Charitha Akula; Penghao Wu; Jihan Yang; Saining Xie; Bingda Tang; BoYang Zheng; Sayak Paul; Shusheng Yang; Chenyu, Li; Anjali W Gupta; Xichen Pan; Pinzhi Huang; Nanye Ma; Jaskirat Singh; Ziteng Wang; Junwan Kim; Georgy Savva; Daohan Lu; Sihyun Yu; Zifan Zhao
nyu-visionx's papers (7)
Submitted by
Pinzhi Huang
21

Benchmarking Visual State Tracking in Multimodal Video Understanding

nyu-visionx VISIONx @ NYU
23 1
Submitted by
Ellis Brown
2

PaintBench: Deterministic Evaluation of Precise Visual Editing

nyu-visionx VISIONx @ NYU
2 3
Submitted by
taesiri
31

Solaris: Building a Multiplayer Video World Model in Minecraft

nyu-visionx VISIONx @ NYU
206 3
Submitted by
BoYang Zheng
55

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

nyu-visionx VISIONx @ NYU
250 2
Submitted by
Ellis Brown
6

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

nyu-visionx VISIONx @ NYU
11 2
Submitted by
Jihan Yang
10

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

nyu-visionx VISIONx @ NYU
2
Submitted by
Peter Tong
171

Diffusion Transformers with Representation Autoencoders

nyu-visionx VISIONx @ NYU
1.92k 6