INTERACTIVE PORTRAIT HARMONIZATION

Abstract

Current image harmonization methods use the entire background as guidance for harmonization. However, this limits the user's ability to choose a specific object or person in the background to guide the harmonization. To enable flexible interaction between the user and the harmonization process, we introduce interactive harmonization, a new setting in which harmonization is performed with respect to a selected region in the reference image instead of the entire background. We propose a new flexible framework that allows users to pick certain regions of the background image and use them to guide the harmonization. Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region. This framework gives users more control over the image harmonization pipeline and achieves visually pleasing portrait edits. Furthermore, we introduce a new dataset carefully curated for validating portrait harmonization. Extensive experiments on both synthetic and real-world datasets show that the proposed approach is efficient and robust compared to previous harmonization baselines, especially for portraits. The code can be found here:

1. INTRODUCTION

With the increasing demand for virtual social gatherings and conferencing, image harmonization techniques have become essential for making the virtual experience more engaging and pleasing. For example, if you cannot attend a wedding or birthday party in person but still want to appear in the photo, the first option is to edit yourself into the image. Directly compositing yourself into the photo will not look realistic unless the color/luminance conditions are matched. One way to make the composite image more realistic is to leverage existing image harmonization methods Cong et al. (2020; 2021); Cun & Pun (2020); Ling et al. (2021); Jiang et al. (2021); Guo et al. (2021b); Tsai et al. (2017). Most previous works focus on a general image harmonization setup, where the goal is to match a foreground object to a new background scene, with little attention to highly retouched portraits. However, when we surveyed professional Photoshop/Affinity¹ compositing users, we found that portrait harmonization is the most common image editing task in real-world and professional settings. This makes portrait harmonization the most important use case of image harmonization. We note that previous harmonization works have not addressed portrait harmonization on real-world data. In this work, we aim to find a better solution for realistic and visually pleasing harmonization of real-world, high-resolution portrait edits. A common question that came up when we demonstrated existing image harmonization workflows to these professional users was: 'How could we choose a certain person as reference when we do harmonization with the existing workflow?'. The workflow design of existing state-of-the-art harmonization methods Cong et al. (2020; 2021); Ling et al. (2021); Guo et al. (2021b) does not let users choose a specific person or region as the reference during harmonization.
These frameworks are designed to take only the composite image and the foreground mask as input, and therefore offer no way for the user to guide the harmonization. Certain frameworks such as 2021) are flexible enough to be adapted for interactive harmonization. However, our experiments show that they are not robust enough to perform realistic portrait harmonization. Furthermore, professional portraits are mostly shot in studios where screens form the background. These screens offer little to no information for harmonizing a new composite edited into the photo. As a result, current harmonization methods, which have not been trained with screens as the background, produce unstable results (see the first row of Figure 1). Portraits captured by everyday users, on the other hand, usually have backgrounds with spatially varying luminance. Using the entire background to guide harmonization in this case can produce undesirable outcomes (see the second row of Figure 1), because the result depends on where the foreground is composited. To this end, we introduce interactive harmonization, a more realistic and flexible setup in which the user guides the harmonization by choosing specific regions in the background as a reference for the composite foreground. From a professional standpoint, this is a much-needed feature, as it gives editors much more control. In this work, we propose a new interactive portrait harmonization framework that can take a reference region provided by the user. We use an encoder-decoder harmonization network that takes the foreground composite region as input, and a style encoder that takes the reference region as input. The reference region is selected by the user and can be a person, an object, or even just part of the background.
That said, in portrait editing it is very common to choose another person in the picture as the reference to obtain an effective harmonization. The style encoder extracts a style code from the reference region and injects it into the harmonized foreground. We carefully align the style information with the foreground while preserving its spatial content by using adaptive instance normalization Huang & Belongie (2017) layers in the decoder. To make the harmonization look realistic, it is important to optimally match luminance, color, and other appearance information between the reference and the foreground. To match these characteristics in manual photo editing, professional photographers usually match the statistics of the highest (highlight), lowest (shadow), and average (mid-tone) luminance points between the composite foreground and the reference region. Hence, we propose a new luminance matching loss inspired by these professional workflows (see the link below). In the proposed loss, we match the highlight, mid-tone, and shadow points between the reference region and the foreground composite region.

https://www.youtube.com/watch?v=SoWefQNcIyY&t=268s
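
As a rough illustration of how a user-selected reference region can steer the decoder, the following NumPy sketch applies adaptive instance normalization (Huang & Belongie, 2017) to foreground features using statistics gathered only inside a reference mask. It rests on a stated simplification: the framework described in the introduction uses a learned style encoder to produce the style code, whereas this sketch substitutes the per-channel mean and standard deviation of the masked reference features. The function names `masked_channel_stats` and `adain` are hypothetical.

```python
import numpy as np


def masked_channel_stats(feat, mask, eps=1e-5):
    """Per-channel mean/std of a (C, H, W) feature map over pixels where mask is True.

    Assumption: these masked statistics stand in for the learned style code.
    """
    region = feat[:, mask]            # (C, N): only the user-selected pixels
    mean = region.mean(axis=1)
    std = region.std(axis=1) + eps    # eps avoids division by zero on flat regions
    return mean, std


def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization on a (C, H, W) content feature map."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    normalized = (content - c_mean) / c_std   # removes content statistics, keeps spatial layout
    return style_std[:, None, None] * normalized + style_mean[:, None, None]


# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
foreground = rng.normal(size=(3, 8, 8))                       # composite foreground features
reference = rng.normal(loc=2.0, scale=0.5, size=(3, 16, 16))  # background features
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True                                       # hypothetical user-selected region
style_mean, style_std = masked_channel_stats(reference, mask)
harmonized = adain(foreground, style_mean, style_std)
```

Because the statistics are computed only inside `mask`, moving the mask to a different person or object changes the target appearance without altering the spatial structure of the foreground features, which is the interactive behavior the framework aims for.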

Figure 1: Testing with in-the-wild portrait composites. Top row: professional studio portrait; bottom row: non-studio portrait. The current state-of-the-art harmonization method Ling et al. (2021) fails to give realistic results because it tries to match the appearance of the foreground to the entire background. Our interactive harmonization framework produces a visually pleasing portrait harmonization because it matches the foreground to the appearance of the original portrait in the user-selected reference region instead of the entire background.
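
The luminance matching loss described in the introduction can be sketched as follows. The sketch makes three assumptions that the text does not specify: shadow, mid-tone, and highlight are taken literally as the minimum, mean, and maximum luminance inside each region; luminance uses ITU-R BT.601 luma weights; and the three tone points are compared with an L1 distance. All names are hypothetical, and a training implementation would express the same computation in a differentiable framework rather than NumPy.

```python
import numpy as np

# Assumption: ITU-R BT.601 luma weights; the luminance formula is not specified in the text.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def luminance(rgb):
    """Map an (H, W, 3) RGB image in [0, 1] to an (H, W) luminance map."""
    return rgb @ LUMA_WEIGHTS


def tone_points(lum, mask):
    """Return (shadow, mid-tone, highlight) = (min, mean, max) luminance inside mask."""
    values = lum[mask]
    return np.array([values.min(), values.mean(), values.max()])


def luminance_matching_loss(harmonized_fg, fg_mask, reference, ref_mask):
    """Hypothetical L1 loss between foreground and reference tone points."""
    fg_points = tone_points(luminance(harmonized_fg), fg_mask)
    ref_points = tone_points(luminance(reference), ref_mask)
    return np.abs(fg_points - ref_points).mean()


# Toy usage: a mid-gray image with a bright pixel in the reference region
# and a dark pixel in the foreground region.
image = np.full((4, 4, 3), 0.5)
image[0, 0] = 1.0
image[3, 3] = 0.0
ref_mask = np.zeros((4, 4), dtype=bool)
ref_mask[:2, :2] = True
fg_mask = np.zeros((4, 4), dtype=bool)
fg_mask[2:, 2:] = True
loss = luminance_matching_loss(image, fg_mask, image, ref_mask)
```

In this toy case the reference tone points are (0.5, 0.625, 1.0) and the foreground tone points are (0.0, 0.375, 0.5), so the loss is (0.5 + 0.25 + 0.5) / 3. Because the minimum and maximum are sensitive to single outlier pixels, a percentile-based variant may be more stable in practice; that is a design alternative, not part of the described method.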

Code availability: https://github.com/jeya

