Learning to Compose and Decompose

  • Full Conference Pass (FC)
  • Full Conference One-Day Pass (1D)

Date: Wednesday, December 5th
Time: 9:00am - 9:21am
Venue: Hall B5(1) (5F, B Block)


Summary: Object functionality is often expressed through part articulation, as when the two rigid parts of a pair of scissors pivot against each other to accomplish the cutting function. Such articulations are often consistent across objects within the same underlying functional category. In this paper we explore how the observation of different articulation states provides evidence for the part structure of 3D objects. We start from a pair of unsegmented 3D CAD models or scans, or a pair consisting of a 3D shape and an RGB image, representing two different articulation states of two functionally related objects, and aim to induce their common part structure. This is a challenging setting: we assume no prior shape structure, the articulation states may belong to objects of different geometry, and we allow noisy and partial scans as input. Our method learns a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation. As we demonstrate, when our architecture is used iteratively in an ICP-like fashion, alternating between correspondence, flow, and segmentation prediction, it significantly outperforms state-of-the-art techniques in the task of discovering articulated parts of objects. We note that our part induction is object-class agnostic and generalizes well to new and unseen objects.
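
As an illustrative sketch only (not the authors' implementation), the ICP-like alternation described above might look like the following Python, with simple geometric stand-ins replacing the three learned modules; all function names here are hypothetical:

import numpy as np

def propose_correspondences(src, dst):
    # Stand-in for the learned correspondence module: nearest neighbors.
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return d.argmin(axis=1)

def estimate_flow(src, dst, corr):
    # Stand-in for the flow module: displacement of each point to its match.
    return dst[corr] - src

def segment_by_motion(flow, n_parts=2, iters=10, seed=0):
    # Stand-in for the segmentation module: k-means on flow vectors,
    # since points on the same rigid part move coherently between states.
    rng = np.random.default_rng(seed)
    centers = flow[rng.choice(len(flow), size=n_parts, replace=False)]
    labels = np.zeros(len(flow), dtype=int)
    for _ in range(iters):
        labels = np.linalg.norm(flow[:, None] - centers[None], axis=-1).argmin(axis=1)
        for k in range(n_parts):
            if (labels == k).any():
                centers[k] = flow[labels == k].mean(axis=0)
    return labels

def induce_parts(src, dst, n_iters=5):
    # ICP-like loop: correspond, estimate flow, segment, warp, repeat.
    labels = np.zeros(len(src), dtype=int)
    for _ in range(n_iters):
        corr = propose_correspondences(src, dst)
        flow = estimate_flow(src, dst, corr)
        labels = segment_by_motion(flow)
        src = src + 0.5 * flow  # move partway toward the match, as ICP does
    return labels

# usage: labels = induce_parts(src_points, dst_points), where both inputs
# are (N, 3) point clouds of two articulation states.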


Date: Wednesday, December 5th
Time: 9:21am - 9:42am
Venue: Hall B5(1) (5F, B Block)


Summary: A majority of stock 3D models in modern shape repositories are assembled from many fine-grained components. The main cause of this data form is the component-wise modeling process widely practiced by human modelers. These modeling components thus inherently reflect some function-based shape decomposition the artist had in mind during modeling. On the other hand, modeling components represent an over-segmentation, since a functional part is usually modeled as a multi-component assembly. Based on these observations, we advocate that labeled segmentation of stock 3D models should not overlook the modeling components, and we propose a learning solution for grouping and labeling the fine-grained components. However, directly characterizing the shape of individual components for the purpose of labeling is unreliable, since they can be arbitrarily tiny and semantically meaningless. We instead propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and to perform labeling on those part groups rather than directly on the components. Part hypotheses are mid-level elements that are more likely to carry semantic information. A multiscale 3D convolutional neural network is trained to extract context-aware features for the hypotheses. To accomplish a labeled segmentation of the whole shape, we formulate higher-order conditional random fields (CRFs) to infer an optimal label assignment for all components. Extensive experiments demonstrate that our method achieves robust labeling results on raw 3D models from public shape repositories. Our work also contributes the first benchmark for component-wise labeling.
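
As a toy sketch of the part-hypothesis generation step only, the following Python groups component centroids agglomeratively and records every merge as a candidate part; the plain distance criterion stands in for the paper's learned multiscale CNN features, and the CRF labeling stage is omitted:

import numpy as np

def part_hypotheses(centroids, max_dist=0.2):
    # Greedy single-linkage grouping of component centroids; every
    # intermediate merge is recorded as a candidate part hypothesis.
    groups = [{i} for i in range(len(centroids))]
    hypotheses = [set(g) for g in groups]
    while len(groups) > 1:
        best, bi, bj = np.inf, -1, -1
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                ci = centroids[list(groups[i])].mean(axis=0)
                cj = centroids[list(groups[j])].mean(axis=0)
                d = np.linalg.norm(ci - cj)
                if d < best:
                    best, bi, bj = d, i, j
        if best > max_dist:
            break  # remaining groups are too far apart to merge
        merged = groups[bi] | groups[bj]
        groups = [g for k, g in enumerate(groups) if k not in (bi, bj)]
        groups.append(merged)
        hypotheses.append(merged)
    return hypotheses

# usage: hyps = part_hypotheses(np.random.rand(10, 3)); each hypothesis
# would then be scored and labeled, with a CRF resolving the final
# assignment over all components.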


Date: Wednesday, December 5th
Time: 9:42am - 10:03am
Venue: Hall B5(1) (5F, B Block)


Summary: We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss. We design SCORES as a recursive autoencoder network. During encoding, the input parts are recursively grouped to generate a root code. During synthesis, the root code is decoded, recursively, to produce a new, coherent part assembly. Assembled shape structures may be novel, with little global resemblance to training exemplars, yet have plausible substructures. SCORES therefore learns a hierarchical substructure shape prior based on per-node losses. It is trained on structured shapes from ShapeNet, and is applied iteratively to reduce the plausibility loss. We show results of shape composition from multiple sources over different categories of man-made shapes and compare with state-of-the-art alternatives, demonstrating that our network can significantly expand the range of composable shapes for assembly-based modeling.
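
A toy sketch of the recursive encode/decode scheme, assuming random matrices in place of SCORES's trained grouping and splitting networks (nothing below is the authors' code):

import numpy as np

DIM = 8  # size of a part code

def encode_pair(a, b, W):
    # Merge two child codes into one parent code (stand-in for a trained MLP).
    return np.tanh(W @ np.concatenate([a, b]))

def decode_pair(parent, W):
    # Split a parent code into two child codes (stand-in for a trained MLP).
    out = np.tanh(W @ parent)
    return out[:DIM], out[DIM:]

def encode_tree(codes, W_enc):
    # Recursively group a flat list of part codes up to a single root code.
    codes = list(codes)
    while len(codes) > 1:
        a, b = codes.pop(), codes.pop()
        codes.append(encode_pair(a, b, W_enc))
    return codes[0]

def decode_tree(root, n_parts, W_dec):
    # Recursively expand the root until n_parts leaf codes are produced;
    # the decoded assembly need not mirror the input grouping.
    frontier = [root]
    while len(frontier) < n_parts:
        node = frontier.pop(0)
        frontier.extend(decode_pair(node, W_dec))
    return frontier

rng = np.random.default_rng(0)
W_enc = 0.1 * rng.normal(size=(DIM, 2 * DIM))
W_dec = 0.1 * rng.normal(size=(2 * DIM, DIM))
parts = [rng.normal(size=DIM) for _ in range(4)]  # codes of the input parts
root = encode_tree(parts, W_enc)                  # encode up to the root
reassembled = decode_tree(root, 4, W_dec)         # decode back to part codes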


Date: Wednesday, December 5th
Time: 10:03am - 10:24am
Venue: Hall B5(1) (5F, B Block)


Summary: We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing existing 3D scenes. We perform edits by first parsing a natural language command from the user and transforming it into a semantic scene graph that is used to retrieve corresponding sub-scenes from the databases that match the command. We then augment this retrieved sub-scene by incorporating other objects that may be implied by the scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene with the user's current scene, where new objects are spliced into the environment, possibly triggering appropriate adjustments to the existing scene arrangement. A suggestive modeling interface with multiple interpretations of user commands is used to alleviate ambiguities in natural language. We conduct studies comparing our approach against both prior text-to-scene work and artist-made scenes, and we find that our method significantly outperforms prior work and is comparable to handmade scenes, even when complex and varied natural-language sentences are used.
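
A deliberately simplified sketch of the parse-then-retrieve pipeline described above; the keyword parser and the two-entry database below are invented stand-ins for the paper's learned text-scene grounding:

RELATIONS = {"on", "next to", "beside", "under"}

def parse_command(text):
    # Toy parser: pull a (subject, relation, object) triple out of commands
    # like "put a lamp on the desk"; a real system parses a full scene graph.
    for rel in sorted(RELATIONS, key=len, reverse=True):
        if f" {rel} " in text:
            left, right = text.split(f" {rel} ", 1)
            return (left.split()[-1], rel, right.split()[-1].rstrip("."))
    return None

# Toy "database" of sub-scenes indexed by their scene-graph triple; the
# chair in the first entry plays the role of a context-implied object.
SUBSCENE_DB = {
    ("lamp", "on", "desk"): ["desk", "lamp", "chair"],
    ("rug", "under", "table"): ["table", "rug"],
}

def retrieve_subscene(triple):
    # Exact-match retrieval; a real system would do (sub)graph matching.
    return SUBSCENE_DB.get(triple)

triple = parse_command("put a lamp on the desk")  # ('lamp', 'on', 'desk')
print(retrieve_subscene(triple))                  # ['desk', 'lamp', 'chair']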


Date: Wednesday, December 5th
Time: 10:24am - 10:45am
Venue: Hall B5(1) (5F, B Block)


Summary: While computer-aided design is a major part of many modern manufacturing pipelines, the design files it generates describe raw geometry. Lost in this representation is the procedure by which these designs were created. In this paper, we present a method for reverse-engineering the process by which 3D models may have been generated, in the language of constructive solid geometry (CSG). Observing that CSG is a formal grammar, we formulate this inverse CSG problem as a program synthesis problem. Our solution is an algorithm that couples geometric processing with state-of-the-art program synthesis techniques. In this scheme, geometric processing is used to convert the mixed discrete and continuous domain of CSG trees into a purely discrete domain in which modern program synthesizers excel. We prove that our algorithm provides a complete search and demonstrate its efficiency and scalability on several different examples, including some with over 100 primitive parts. We show that our algorithm is able to find simple programs that are close to the ground truth, and we demonstrate our method's applicability to mesh re-editing. Finally, we compare our method to the prior state of the art and demonstrate that our algorithm dominates previous methods in terms of resulting CSG compactness and runtime, and can handle far more complex input meshes than any previous method.
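
A minimal brute-force sketch of the inverse-CSG-as-program-synthesis idea: candidate primitives become boolean membership vectors at sample points, and boolean (CSG) expressions over them are enumerated until one reproduces the target. The exhaustive search below is only illustrative of what a real synthesizer would prune aggressively:

import itertools
import numpy as np

def synthesize_csg(primitives, target, max_depth=3):
    # primitives: dict of name -> boolean membership array at sample points;
    # target: boolean array for the goal solid at the same points.
    exprs = list(primitives.items())
    for _ in range(max_depth):
        new = []
        for (ea, va), (eb, vb) in itertools.product(exprs, repeat=2):
            for op, fn in (("|", np.logical_or),
                           ("&", np.logical_and),
                           ("-", lambda a, b: a & ~b)):
                v = fn(va, vb)
                if np.array_equal(v, target):
                    return f"({ea} {op} {eb})"
                new.append((f"({ea} {op} {eb})", v))
        exprs += new  # note: grows combinatorially; fine for a toy demo
    return None

# Toy 1D example: the target is (A union B) minus C.
pts = np.linspace(0, 10, 101)
prims = {"A": pts < 4, "B": pts > 6, "C": (pts > 1) & (pts < 2)}
target = (prims["A"] | prims["B"]) & ~prims["C"]
print(synthesize_csg(prims, target))  # e.g. ((A | B) - C)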

