Welcome to the Berkeley view of Vision and Learning Research.

Autonomous perception research is at a tipping point, with rapid advances in machine learning algorithms combined with the recent availability of vast sources of labeled and unlabeled data leading to transformative new capabilities. We are finally realizing the long-standing goal of systems that can visually interact with people, objects and scenes. Mobile devices now observe and index the world based on visual cues, and in the social media sphere it is increasingly apparent that one must analyze image and video signals on an equal footing with text signals to fully understand the semantics of the medium.

BVLC research is driven by three core, overlapping principles:

Rich Representations

Representations are key to effective perception, and they must be informed by processes in the real world. Not only must they be learned from data and adapt to different domains, but they should exploit, rather than oppose, the physics of image formation. Berkeley researchers pioneered the standard approaches to autonomous and interactive image segmentation, core aspects of object representation and their adaptation across domains, and physically-based representations for vision and graphics alike. These researchers are now working together to develop the next generation of layered models, which not only have structures exploiting unsupervised “deep” or “mid-level” components, but also have layers which reflect intrinsic elements of the image formation process: a shadow looks different from a cloud and from a reflection, even if they have the same shape; conventional representations ignore this at their peril.

Semantics at Large Scale

Computational vision is a key challenge area for machine learning research and one of the original sources of “big data”. Machine learning has played an increasingly prominent role in information technology in recent years, a development driven by the increasing availability of massive data streams (e.g., video) and the need to analyze, merge and model these data streams to answer inferential questions of interest: it is increasingly clear that semantics emerge at large scale. Slowly the field has begun to face ever more complex inferential questions and to do so with ever more efficient algorithms. However, progress has arguably been incremental, and the field has never faced the grand challenge of developing a general scalable methodology whereby machine learning can be applied to data sets of arbitrary size. Berkeley plans to face this challenge, with perception and representation learning as a key goal. Doing so will require the development of new theoretical, algorithmic and systems perspectives that merge statistics and computation in a thoroughgoing manner, in which the tradeoffs between statistical risk and computational resources are well characterized. This must be done in a streaming setting, where decisions as to whether to include data points in a model or not, and whether to aggregate data or not, along with the computational costs of making such decisions, are part and parcel of the overall statistico-computational strategy.

3-D Physical Interaction

Interaction in 3-D is essential for situated perceptual systems, especially those that interact physically with humans. Physical movement provides critical grounding of the meaning in action, by explicit demonstration or implicit interaction. Currently robots exist that, when tele-operated, could perform many of our household chores (e.g., Willow Garage's PR2), and robots exist that, in tele-operation, perform 400,000 surgeries per year (Intuitive Surgical's da Vinci). However, there is a substantial gap between what can be done in tele-operation and what can be done autonomously. The fundamental challenges in advancing the autonomous capabilities of robots in unstructured environments are handling extreme variability and uncertainty. Berkeley is addressing these challenges by pioneering new methods for Apprenticeship Learning, where a robot first observes human demonstrations, extracts their essence, generalizes to new situations in which a similar task has to be executed, and then further refines its skills on its own. These researchers have pioneered methods for learning detailed and reliable 3-D models of cluttered scenes despite dynamic conditions, and developed 3-D-based systems that interact with humans in driving and mapping tasks, domestic chores, and interactive artistic environments.

The Berkeley view is that the distance between these three principal areas of investigation (rich and adaptive representations, large-scale semantics, and 3-D demonstration and interaction) is closing fast, and that a unified core of algorithms, data, and implementations will drive future advances in intelligent media and active interactive systems. Until now, there have been limited opportunities for industry to broadly engage with the range of vision and learning research underway at UC Berkeley.

First-year Student Fellows 2013/2014

BVLC sponsors provide support for first-year students admitted into vision and learning research at UC Berkeley.

A New Center

We are launching the Berkeley Vision and Learning Center (BVLC) to provide a forum for focused interaction with industry, and as a specific mechanism to support first-year students admitted to study vision and learning in UC Berkeley’s EECS department. The goals of the center are to:

  • Establish a venue that fosters direct contact between interested
    sponsors, students, and faculty pursuing vision and learning research
  • Offer the earliest possible student recruiting opportunities for industry partners
  • Facilitate opportunities for technology transfer and/or additional sponsored research
  • Provide support to first-year vision and learning students


BVLC Spring Retreat ’14

The BVLC Spring Retreat for Berkeley research groups and sponsors will take place at Northstar in Lake Tahoe from Monday, March 24 through Wednesday, March 26. See the draft schedule.

BVLC Fall Retreat ’13

Nov 8 ’13. Talk sessions and panel at Sutardja Dai Hall Auditorium followed by a reception and dinner at the Women's Faculty Club.
(Contact us for overnight accommodations if desired.)


9:00am Welcome and Overview

9:10am Morning Session I: Big Trends in Big Vision

Jitendra Malik, The Three R's of Computer Vision: Recognition, Reconstruction and Reorganization
Alyosha Efros, The Promise and Perils of Big Visual Data
Ross Girshick, Rich feature hierarchies for accurate object detection and semantic segmentation

10:20am Break

10:40am Morning Session II: Optimization and Learning

Michael Jordan, On the Computational and Statistical Interface and “Big Data”
Ben Recht, Getting the Most out of Your Data with Convex Optimization
Stefanie Jegelka, Efficient learning and inference with combinatorial structure

12:00pm Lunch with First-Year Fellows

1:00pm Afternoon Session I: From “Physical” Vision to Storytelling

Pieter Abbeel, Machine Learning and Optimization in Robotics
Ravi Ramamoorthi, Physics-Based Vision: From Natural Lighting to Volumetric Scattering
Maneesh Agrawala, Storytelling Tools

2:30pm Research “Snapshots”: Darrell Group, Bajcsy Group, Zakhor Group

3:00pm Break

3:15pm Afternoon Session II: Breakouts

Deep & Mid-level Visual Features (Room 254)
Discussants: Yangqing Jia, Jeff Donahue, Ross Girshick

Learning with Big Data (Auditorium)
Discussants: Ameet Talwalkar, Stefanie Jegelka

Active & Interactive 3D Perception (Room 240)
Discussants: Prof. Ruzena Bajcsy, Sachin Patil

Breakout Charge: summarize key current work in BVLC and by sponsors, identify promising directions and possible collaborations.

4:15pm Breakout report-back

4:30pm Closing Discussion: BVLC Best Practices and Plans for Winter/Spring Retreat

5:00pm Board Meeting

5:30pm Reception, Women's Faculty Club
6:00pm Dinner, Women's Faculty Club
Dinner Speaker: Prof. Jack Gallant, “Reverse Engineering the Human Brain”


Prof. Trevor Darrell, trevor [at] eecs, 415 690 0822
Angie Abbatecola, angie [at] eecs, 510-643-6413

Adobe Creative Technologies Lab Retreat

Nov 12 ’13. Talks to be held all-day at the Banatao Auditorium and discussions in the Kvamme Atrium, both in Sutardja Dai Hall at UC Berkeley.

9:00am Introduction: David Salesin

9:15 Session 1: A New Era of Machine Learning
David Salesin: Artificial Emotional Intelligence: The Next Frontier?
Sylvain Paris: Machine Learning on a Diet
John Canny: Visual Analytics

10:15 Discussion

11:00 Session 2: Constructions & Meditations
Dan Goldman & Ellen Wixted: Juxt
Carlo Séquin: Weak Links in the Chain from Concept to Construction
Alvy Ray Smith: Moore's Law Meditations

12:00 Discussion & Lunch

1:15 Session 3: Lifelong Learning
Bjoern Hartmann: MOOCs and Anti-MOOCs
Joel Brandt: How to Be a Great Engineer in 2023
Michael Rubin: The Quantified Student

2:15 Discussion

3:00 Session 4: Vision & Understanding
Vladlen Koltun: Three Goals for 3D Vision
James O’Brien: Geometric Image and Video Forensics
Alyosha Efros: Towards a Visual Memex

4:00 Discussion

4:45 Session 5: Difficult Data
Tapan Parikh: Introducing Youth to Data Science
Trevor Darrell: Deep Representations
Aseem Agarwala: 7 Important Problems I Don’t Know How to Solve

Microsoft Research Visit

img2txt by Simon Baker, MSR, and Immersive Telepresence by Zhengyou Zhang, MSR.
Oct 1 ’13 at 4pm, Sutardja Dai Hall Auditorium.

The inaugural event of the new Berkeley Vision and Learning Center will be held on Tuesday Oct 1st, featuring a visit from Microsoft Research. Rick Szeliski, Simon Baker, Zhengyou Zhang, and other colleagues from MSR will visit us, and Simon will present the following talk at 4pm in the SDH auditorium. The talk is open to the public and members of BVLC groups are encouraged to join the reception and dinner thereafter in the SDH lobby.

img2txt, the problem of generating a text description of an image, has received significant interest over the last few years. We focus on the question: why perform img2txt? We suggest two answers, in each case demonstrating a prototype application.

Immersive Telepresence aims at bringing an immersive experience into telecommunication so that people across geographically distributed sites can interact collaboratively as if they were face-to-face. This requires a deep understanding of multiple disciplines. In particular, computer vision, graphics and acoustics are indispensable in capturing and rendering 3D dynamic environments in order to create the illusion that the remote participants are in the same room. Existing videoconferencing systems leave a great deal to be desired: mutual gaze, 3D, motion parallax, spatial audio, to name a few. In this talk, I will describe various research activities that are being conducted at Microsoft Research. Broadly, we follow two approaches: virtual tele-immersion and physical tele-immersion.