1
00:00:02,770 --> 00:00:07,310
I am Vassillen Chizhov, currently in my final
year as a PhD student at the Mathematical

2
00:00:07,310 --> 00:00:09,900
Image Analysis Group at Saarland University,
Germany.

3
00:00:09,900 --> 00:00:15,529
I will be presenting our work "Perceptual
Error Optimization for Monte Carlo Rendering".

4
00:00:15,529 --> 00:00:19,790
This is joint work with Iliyan Georgiev from
Autodesk, United Kingdom, Karol Myszkowski

5
00:00:19,790 --> 00:00:23,970
and Gurprit Singh from the Max Planck Institute
for Informatics, Saarland, Germany.

6
00:00:23,970 --> 00:00:28,600
First, I will motivate our work using some
practical applications.

7
00:00:28,600 --> 00:00:32,900
Then I will go over relevant previous work
and its relation to our approach.

8
00:00:32,900 --> 00:00:38,050
We will also need some basic ideas from halftoning,
and I will describe how our work transfers

9
00:00:38,050 --> 00:00:39,800
those to rendering.

10
00:00:39,800 --> 00:00:46,030
I will conclude with a discussion of our algorithms,
results, and extensions.

11
00:00:46,030 --> 00:00:50,030
In practice we want to synthesize renderings
that have a lower perceptual error at the

12
00:00:50,030 --> 00:00:51,590
same number of samples.

13
00:00:51,590 --> 00:00:55,910
We achieve this by optimizing the screen-space
samples' distribution, resulting in the high-quality

14
00:00:55,910 --> 00:00:57,020
image on the right.

15
00:00:57,020 --> 00:01:03,920
In contrast, the noisy image on the left was
synthesized using a standard approach.

16
00:01:03,920 --> 00:01:07,880
Our optimization is performed with respect
to a specific blurring kernel.

17
00:01:07,880 --> 00:01:12,259
If this kernel is used not only for optimization,
but also for reconstruction, then we can get

18
00:01:12,259 --> 00:01:16,450
a very smooth image, as shown on the right.

19
00:01:16,450 --> 00:01:20,469
Georgiev and Fajardo's work introduces the
first practical algorithm for optimizing the

20
00:01:20,469 --> 00:01:23,479
screen-space samples' distribution in rendering.

21
00:01:23,479 --> 00:01:28,390
They perform an offline optimization with
the goal of achieving a blue noise error distribution.

22
00:01:28,390 --> 00:01:32,540
Heitz and colleagues also employ an offline
optimization, but it is adapted to a Sobol

23
00:01:32,540 --> 00:01:36,619
sampler, and has better scaling with the number
of samples.

24
00:01:36,619 --> 00:01:41,319
We mainly build upon the work of Heitz and
Belcour, where they use an online optimization

25
00:01:41,319 --> 00:01:44,689
and amortize its cost over several frames in
an animation setting.

26
00:01:44,689 --> 00:01:49,869
Finally, Ahmed and Wonka propose a Sobol
sampler with similar performance to the work

27
00:01:49,869 --> 00:01:53,020
of Heitz and colleagues, with lower storage
requirements.

28
00:01:53,020 --> 00:01:58,259
Our work provides a unifying framework that
can be used to analyze any of the above methods.

29
00:01:58,259 --> 00:02:04,119
It also provides insights into their advantages,
disadvantages, assumptions, limitations, and

30
00:02:04,119 --> 00:02:08,229
it lays out guidelines for future work.

31
00:02:08,229 --> 00:02:13,470
In order to understand our approach, we need
to introduce some basic ideas from halftoning.

32
00:02:13,470 --> 00:02:18,030
Given a reference image "I", and a set of
quantization levels, the main goal of halftoning

33
00:02:18,030 --> 00:02:21,030
is to produce a quantized image "Q" similar
to the reference.

34
00:02:21,030 --> 00:02:25,730
More precisely, we want the quantized image
to look similar to the reference from a specific

35
00:02:25,730 --> 00:02:27,200
distance.

36
00:02:27,200 --> 00:02:32,480
As an example, consider a quantization set
consisting of 3 gray values: black, gray,

37
00:02:32,480 --> 00:02:33,480
and white.

38
00:02:33,480 --> 00:02:36,880
We can generate a quantized image using these
3 colors.

39
00:02:36,880 --> 00:02:41,470
If the quantized image is the result of an
optimization taking into account the human

40
00:02:41,470 --> 00:02:42,960
visual system,

41
00:02:42,960 --> 00:02:46,940
then the kernel used to model the human visual
system can also be used for the reconstruction,

42
00:02:46,940 --> 00:02:50,410
as shown on the right.

43
00:02:50,410 --> 00:02:54,210
To understand how the optimization may be
derived we can look at the error image.

44
00:02:54,210 --> 00:02:58,540
In the top left we have the error for an "optimal"
quantization, while on the bottom right we

45
00:02:58,540 --> 00:03:01,300
have the error resulting from random dithering.

46
00:03:01,300 --> 00:03:05,460
The key is to study how those evolve under
convolution with kernels modeling the human

47
00:03:05,460 --> 00:03:06,460
visual system.

48
00:03:06,460 --> 00:03:10,710
In the example above, we consider a Gaussian
kernel (Pappas and Neuhoff 1999), and it becomes

49
00:03:10,710 --> 00:03:14,440
clear that the optimized error decays much
faster under convolution.

50
00:03:14,440 --> 00:03:19,700
This is even more apparent for a Gaussian
with a larger standard deviation, which corresponds

51
00:03:19,700 --> 00:03:21,630
to a larger viewing distance.
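To make the kernel concrete, a normalized 2D Gaussian whose standard deviation grows with viewing distance can be sketched as follows. This is a minimal illustrative helper with our own naming, not code from the paper:

```python
import math

def gaussian_kernel(sigma):
    """Normalized 2D Gaussian kernel with standard deviation `sigma`
    in pixels; a larger sigma models a larger viewing distance."""
    r = max(1, int(3 * sigma))  # truncate at ~3 standard deviations
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    s = sum(v for row in k for v in row)
    return [[v / s for v in row] for row in k]
```

Convolving the error image with such a kernel and measuring the magnitude of the result models the error perceived at the corresponding distance.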

52
00:03:21,630 --> 00:03:25,990
This means that we want to minimize the error
perceived from a certain distance, which we

53
00:03:25,990 --> 00:03:29,920
can model through the following minimization
problem:

54
00:03:29,920 --> 00:03:33,370
This minimization problem lies at the heart
of halftoning.
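In symbols, with g the kernel modeling the human visual system and * denoting convolution, the minimization can be sketched as follows (an illustrative form; the paper states the exact norm and exponent):

```latex
\hat{Q} = \operatorname*{arg\,min}_{Q} \; \bigl\| g \ast (Q - I) \bigr\|^{2}
```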

55
00:03:33,370 --> 00:03:36,840
Methods such as dispersed-dot dithering and
error diffusion aim to quickly approximate

56
00:03:36,840 --> 00:03:40,180
the optimum for a specific kernel.

57
00:03:40,180 --> 00:03:45,440
Our main contribution lies in transferring
those halftoning ideas to rendering.

58
00:03:45,440 --> 00:03:48,260
However, there are some non-trivial challenges.

59
00:03:48,260 --> 00:03:52,470
The first difficulty is that what used to
be the quantization image "Q" is now an image

60
00:03:52,470 --> 00:03:57,040
formed by Monte Carlo estimators for the measurement
equation (Veach 1997).

61
00:03:57,040 --> 00:04:00,090
The final image is thus a function of samples.

62
00:04:00,090 --> 00:04:04,220
A bigger issue is that we do not have the
reference "I" in a rendering context.

63
00:04:04,220 --> 00:04:08,450
However, we will see that it may be replaced
by a suitable smooth approximation, which

64
00:04:08,450 --> 00:04:11,710
we term a surrogate or guidance image.

65
00:04:11,710 --> 00:04:15,640
We first tackle the issue of the image being
a function of samples.

66
00:04:15,640 --> 00:04:19,530
Consider the setting where we have generated
a vertical stack of images at one sample per

67
00:04:19,530 --> 00:04:21,229
pixel.

68
00:04:21,229 --> 00:04:25,830
The vertical set of estimates within a single
pixel can then be interpreted as the already

69
00:04:25,830 --> 00:04:29,460
familiar set of quantization levels from halftoning.

70
00:04:29,460 --> 00:04:33,469
We term the above search space "vertical";
the optimization can then be done as in

71
00:04:33,469 --> 00:04:35,300
halftoning.
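A toy sketch of such a vertical optimization (our own illustrative code and naming, not the paper's implementation): per pixel, greedily pick the estimate from the stack that minimizes the convolved squared error against a reference, holding the other pixels fixed.

```python
def perceptual_energy(img, ref, kernel):
    """|| kernel * (img - ref) ||^2 with zero padding, where `kernel`
    is a 2D list modeling the human visual system."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    total = 0.0
    for y in range(h):
        for x in range(w):
            s = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    iy, ix = y + ky - oy, x + kx - ox
                    if 0 <= iy < h and 0 <= ix < w:
                        s += kernel[ky][kx] * (img[iy][ix] - ref[iy][ix])
            total += s * s
    return total

def vertical_optimize(stack, ref, kernel, sweeps=2):
    """Per pixel, greedily pick the estimate from the vertical stack
    of one-sample renderings that minimizes the energy, holding the
    other pixels fixed (coordinate descent over the vertical space)."""
    img = [row[:] for row in stack[0]]
    h, w = len(ref), len(ref[0])
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                def e_with(v):
                    img[y][x] = v
                    return perceptual_energy(img, ref, kernel)
                img[y][x] = min((layer[y][x] for layer in stack),
                                key=e_with)
    return img
```

In rendering, `ref` would be the surrogate image discussed earlier; a full-energy evaluation per candidate is shown only for clarity, since the change in energy is local to the kernel's footprint.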

72
00:04:35,300 --> 00:04:39,120
In fact the left image from the motivation
section was produced by random dithering on

73
00:04:39,120 --> 00:04:40,639
the stack of estimates,

74
00:04:40,639 --> 00:04:44,639
while the image on the right minimizes the
described halftoning energy.

75
00:04:44,639 --> 00:04:48,130
The "vertical" search space corresponds to
the search space in the "histogram" approach

76
00:04:48,130 --> 00:04:49,500
of Heitz and Belcour;

77
00:04:49,500 --> 00:04:53,139
however, they do not formulate an explicit
minimization problem.

78
00:04:53,139 --> 00:04:57,100
Additionally, their halftoning method is an
order statistics match, which can be seen

79
00:04:57,100 --> 00:04:59,550
as a special case of dispersed-dot dithering,

80
00:04:59,550 --> 00:05:05,509
and their algorithm also implicitly relies
on a sub-optimal guidance image.

81
00:05:05,509 --> 00:05:09,710
The horizontal search space consists of all
possible permutations of the per-pixel sample

82
00:05:09,710 --> 00:05:10,710
sets.

83
00:05:10,710 --> 00:05:15,409
That is, we allow the optimization to swap
the samples used to estimate two pixels.

84
00:05:15,409 --> 00:05:19,909
In practice this would incur a heavy cost,
as each swap requires a re-rendering.

85
00:05:19,909 --> 00:05:24,840
To alleviate this we choose to use pixel swaps
instead of sample swaps for the optimization.

86
00:05:24,840 --> 00:05:31,189
At the end, we use the resulting permutation
on the samples, and then re-render.
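A hypothetical sketch of this horizontal optimization (our own naming and simplifications, not the paper's implementation): propose swapping two nearby pixel values, as the cheap proxy for sample swaps described above, and keep a swap only if it lowers the energy. A 3x3 box kernel stands in for the perceptual kernel:

```python
import random

def blurred_sq_error(img, ref):
    """|| b * (img - ref) ||^2 for a 3x3 box kernel b, zero padding;
    a stand-in for the perceptual kernel in the energy."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    iy, ix = y + dy, x + dx
                    if 0 <= iy < h and 0 <= ix < w:
                        s += (img[iy][ix] - ref[iy][ix]) / 9.0
            total += s * s
    return total

def horizontal_optimize(img, ref, radius=2, trials=500, seed=0):
    """Greedy horizontal optimization: propose swapping two nearby
    pixel values and keep the swap only if the energy decreases.
    The output is a permutation of the input pixel values."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    e = blurred_sq_error(img, ref)
    for _ in range(trials):
        y, x = rng.randrange(h), rng.randrange(w)
        y2 = min(max(y + rng.randint(-radius, radius), 0), h - 1)
        x2 = min(max(x + rng.randint(-radius, radius), 0), w - 1)
        img[y][x], img[y2][x2] = img[y2][x2], img[y][x]
        e2 = blurred_sq_error(img, ref)
        if e2 < e:
            e = e2
        else:  # revert the swap
            img[y][x], img[y2][x2] = img[y2][x2], img[y][x]
    return img
```

The bounded proposal radius plays the role of the overlapping swap regions: only nearby pixels, whose sample sets are likely interchangeable, are allowed to trade places.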

87
00:05:31,189 --> 00:05:34,229
The procedure is illustrated in the following
images.

88
00:05:34,229 --> 00:05:37,999
The top row corresponds to the results where
the pixels have been swapped.

89
00:05:37,999 --> 00:05:41,660
The bottom row corresponds to the results
where the samples have been swapped and the

90
00:05:41,660 --> 00:05:42,660
image has been re-rendered.

91
00:05:42,660 --> 00:05:47,580
We can see that Heitz and Belcour's approach
results in a much noisier image, especially

92
00:05:47,580 --> 00:05:48,580
near edges.

93
00:05:48,580 --> 00:05:52,830
This is because they use disjoint tile regions
for the allowed swaps, and they use order

94
00:05:52,830 --> 00:05:55,180
statistics matching for halftoning.

95
00:05:55,180 --> 00:05:59,629
Our approach uses overlapping disk regions
for allowed swaps, and an iterative minimization,

96
00:05:59,629 --> 00:06:02,500
which results in a much better fit.

97
00:06:02,500 --> 00:06:06,499
We illustrate the effect of replacing the
true solution with a guidance image.

98
00:06:06,499 --> 00:06:11,189
On the left we have a piecewise-constant
averaged image, and while it is much smoother

99
00:06:11,189 --> 00:06:13,129
than the noisy estimate on the right,

100
00:06:13,129 --> 00:06:16,080
it is also biased and exhibits block artifacts.

101
00:06:16,080 --> 00:06:20,870
Our optimization essentially projects this
biased guidance image onto the space of unbiased

102
00:06:20,870 --> 00:06:23,009
estimates, reducing the bias,

103
00:06:23,009 --> 00:06:26,479
while still being much smoother than the noisy
image on the right.

104
00:06:26,479 --> 00:06:31,669
This can also be used to reduce hallucinated
artifacts, visible on the left, resulting

105
00:06:31,669 --> 00:06:33,120
from a denoiser.

106
00:06:33,120 --> 00:06:36,779
Applying the same optimization approach, the
artifacts are minimized, while the smoothness

107
00:06:36,779 --> 00:06:41,160
is preserved to a large extent.

108
00:06:41,160 --> 00:06:44,580
This raises the question of how to minimize
the energy in practice.

109
00:06:44,580 --> 00:06:47,449
Different methods can be categorized into three
main classes.

110
00:06:47,449 --> 00:06:51,460
The first option is to apply a greedy iterative
minimization procedure such as best-first

111
00:06:51,460 --> 00:06:52,460
search.

112
00:06:52,460 --> 00:06:56,300
It is the most flexible of all of our methods,
but it is also the slowest.

113
00:06:56,300 --> 00:07:00,839
Error diffusion approaches, such as Floyd-Steinberg,
are much faster but aren't as flexible.
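For reference, the classic Floyd-Steinberg scan (a standard textbook formulation, independent of our method) quantizes each pixel to the nearest level and diffuses the residual error onto unvisited neighbors:

```python
def floyd_steinberg(img, levels=(0.0, 1.0)):
    """Floyd-Steinberg error diffusion: quantize each pixel to the
    nearest level and push the quantization error onto unvisited
    neighbors with weights 7/16, 3/16, 5/16, 1/16."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = min(levels, key=lambda q: abs(q - old))
            out[y][x] = new
            err = old - new
            for dy, dx, wgt in ((0, 1, 7 / 16), (1, -1, 3 / 16),
                                (1, 0, 5 / 16), (1, 1, 1 / 16)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    img[ny][nx] += err * wgt
    return out
```

In the vertical rendering setting, the quantization step would instead snap each pixel to its nearest available estimate from the stack.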

114
00:07:00,839 --> 00:07:05,819
Finally, dispersed-dot dithering is the fastest,
but it also has the lowest fitting power.

115
00:07:05,819 --> 00:07:10,439
Heitz and Belcour's "histogram" and "permutation"
approaches fall under this last category.

116
00:07:10,439 --> 00:07:14,699
Both dispersed-dot dithering and error diffusion
can be seen as fast approximations to the

117
00:07:14,699 --> 00:07:15,699
solution

118
00:07:15,699 --> 00:07:19,020
produced by the iterative minimization approach.

119
00:07:19,020 --> 00:07:23,229
In the following, we illustrate the robustness
of iterative minimization methods to different

120
00:07:23,229 --> 00:07:25,400
kernels, using a few instructive experiments.

121
00:07:25,400 --> 00:07:29,870
We start with a white noise distribution and
a horizontal/permutation search space.

122
00:07:29,870 --> 00:07:35,159
If we optimize the energy with respect to
a low-pass filter, then the result is a blue

123
00:07:35,159 --> 00:07:36,159
noise distribution.

124
00:07:36,159 --> 00:07:40,270
The power spectrum of the kernel is shown
on the bottom left, while the power spectrum

125
00:07:40,270 --> 00:07:43,509
of the optimized image is shown on the bottom
right.

126
00:07:43,509 --> 00:07:48,400
Note that ideally the power spectra should
be inverses of each other, as is the case

127
00:07:48,400 --> 00:07:49,439
here.

128
00:07:49,439 --> 00:07:53,909
Using a band-stop filter results in a green
noise distribution as expected, and using

129
00:07:53,909 --> 00:07:57,020
a high-pass filter results in red noise.

130
00:07:57,020 --> 00:08:00,379
Similarly, using a band-pass filter results
in violet noise.

131
00:08:00,379 --> 00:08:04,900
The optimization can also handle anisotropic
kernels and spatially varying kernels.

132
00:08:04,900 --> 00:08:09,229
The latter ought to be helpful for future
work aiming to incorporate denoisers in the

133
00:08:09,229 --> 00:08:14,669
optimization, since denoisers can produce
both anisotropic and spatially varying kernels.

134
00:08:14,669 --> 00:08:18,770
Thus we have seen that the iterative minimization
algorithms we consider are extremely

135
00:08:18,770 --> 00:08:25,199
robust and flexible, which explains their
high fitting power and good performance.

136
00:08:25,199 --> 00:08:29,930
Next we provide visual comparisons of our
best methods to the "histogram" and "permutation"

137
00:08:29,930 --> 00:08:31,800
algorithms of Heitz and Belcour.

138
00:08:31,800 --> 00:08:36,190
In all cases the perceptual error, that is
the formulated halftoning energy, is

139
00:08:36,190 --> 00:08:38,790
always much lower for our approaches.

140
00:08:38,790 --> 00:08:44,030
Note the higher noise for horizontal methods
near edges, such as the ceiling and at the

141
00:08:44,030 --> 00:08:46,700
base of the windows.

142
00:08:46,700 --> 00:08:50,750
This is less apparent for the bathroom scene,
but it still occurs near the edges of the

143
00:08:50,750 --> 00:08:53,070
sink.

144
00:08:53,070 --> 00:08:56,780
In the living room scene, this effect is invisible
for our horizontal method;

145
00:08:56,780 --> 00:09:01,490
however, for Heitz and Belcour's permutation
it is still very prominent near the picture

146
00:09:01,490 --> 00:09:04,310
frame and on the wall near the floor.

147
00:09:04,310 --> 00:09:09,440
We emphasize that the above experiments are
chosen to be an ideal case for their approach,

148
00:09:09,440 --> 00:09:11,250
as the scene remains static and

149
00:09:11,250 --> 00:09:16,600
horizontal approaches are given 16 "animation"
frames to converge.

150
00:09:16,600 --> 00:09:21,660
We use a state-of-the-art visual quality metric,
HDR-VDP-2 by Mantiuk and colleagues, to evaluate

151
00:09:21,660 --> 00:09:23,370
our results.

152
00:09:23,370 --> 00:09:27,800
The error map from HDR-VDP-2 is visualized
on the right halves of the images.

153
00:09:27,800 --> 00:09:32,370
The ranking from the perceptual energy seems
to match this advanced metric well.

154
00:09:32,370 --> 00:09:36,460
That is, the methods with lowest fitting power
are Heitz and Belcour's approaches and our

155
00:09:36,460 --> 00:09:38,470
horizontal optimization,

156
00:09:38,470 --> 00:09:42,940
while the ones with the highest fitting power
are our vertical methods.

157
00:09:42,940 --> 00:09:46,590
We note that the worse the guidance image
is, the lower the fitting power used should

158
00:09:46,590 --> 00:09:47,590
be.

159
00:09:47,590 --> 00:09:51,780
This means that there are cases where any
of the methods can outperform the others, depending

160
00:09:51,780 --> 00:09:56,240
on the quality of the guidance image.

161
00:09:56,240 --> 00:10:00,090
The discussed iterative energy minimization
allows for straightforward extensions through

162
00:10:00,090 --> 00:10:02,160
small changes in the energy.

163
00:10:02,160 --> 00:10:05,760
For example, we can take into account viewing
distance by modifying the blurring kernel

164
00:10:05,760 --> 00:10:07,000
used in the optimization.

165
00:10:07,000 --> 00:10:11,470
On the left, the image is optimized for near
viewing, while on the right the optimization

166
00:10:11,470 --> 00:10:13,700
aims at far viewing distances.

167
00:10:13,700 --> 00:10:18,120
Note the enhancement of the edges in the right
image.

168
00:10:18,120 --> 00:10:21,530
Another important feature is the handling
of colors.

169
00:10:21,530 --> 00:10:26,320
Heitz and Belcour convert the image to grayscale
in order to match the order statistics.

170
00:10:26,320 --> 00:10:30,070
We have illustrated such a result using our
methods in the top right.

171
00:10:30,070 --> 00:10:32,690
It exhibits objectionable color noise.

172
00:10:32,690 --> 00:10:37,020
A straightforward improvement consists of
modifying the norm in the energy to work on

173
00:10:37,020 --> 00:10:39,830
RGB vectors; see the image in the top left.

174
00:10:39,830 --> 00:10:44,700
The best quality is achieved by applying the
optimization to each channel separately, as

175
00:10:44,700 --> 00:10:47,450
illustrated in the image in the bottom right
corner.

176
00:10:47,450 --> 00:10:51,380
However, note that the latter is applicable
only for vertical search spaces.

177
00:10:51,380 --> 00:10:56,270
If the rendered image is tonemapped, then
it also makes sense to include the tonemapping

178
00:10:56,270 --> 00:10:57,270
in the optimization,

179
00:10:57,270 --> 00:11:04,440
as otherwise sub-optimal results are achieved,
see, for example, the lower-left corner.

180
00:11:04,440 --> 00:11:08,750
We constructed a framework by transferring
key ideas from halftoning to the rendering

181
00:11:08,750 --> 00:11:09,890
setting.

182
00:11:09,890 --> 00:11:14,360
This naturally led to a collection of algorithms
mirroring their halftoning counterparts.

183
00:11:14,360 --> 00:11:20,170
Additionally, we proposed several extensions
to handle standard rendering features.

184
00:11:20,170 --> 00:11:24,610
A promising avenue for future research is
replacing the convolution in the energy with

185
00:11:24,610 --> 00:11:26,220
a nonlinear reconstruction.

186
00:11:26,220 --> 00:11:30,130
This can be used to couple a denoiser to the
optimization.

187
00:11:30,130 --> 00:11:35,350
Another important use of our theory is the
construction of new offline methods.

188
00:11:35,350 --> 00:11:40,280
Some preliminary experiments regarding this
are available in our supplemental document.

189
00:11:40,280 --> 00:11:41,490
Thank you for your attention.

