We propose a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of m users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of n items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: we derive for each structure regret lower bounds, and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the chances users like items, and finally that arising when learning the underlying structure.

This is joint work with Kaito Ariu, Narae Ryu, and Alexandre Proutière.

# Special Session @ 2019 KSIAM Annual Meeting

A special session on “Combinatorial and Discrete Optimization” at the 2019 KSIAM Annual Meeting is organized by Dabeen Lee. URL: https://www.ksiam.org/conference/84840fb6-87b0-4566-acc1-4802bde58fbd/welcome

## Date

**Nov 8, 2019** – **Nov 9, 2019** Address: 61-13 Odongdo-ro, Sujeong-dong, Yeosu-si, Jeollanam-do (전남 여수시 오동도로 61-13)

## Venue

Venezia Hotel & Resort Yeosu, Yeosu, Korea (여수 베네치아 호텔) Address: 61-13 Odongdo-ro, Sujeong-dong, Yeosu-si, Jeollanam-do (전남 여수시 오동도로 61-13)

## Speakers

## Schedules

- Combinatorial and Discrete Optimization I: November 8, 2019 Friday, 14:20 – 15:40.
- Kangbok Lee
- Kedong Yan
- Dabeen Lee
- Se-young Yun

- Combinatorial and Discrete Optimization II: November 9, 2019 Saturday, 10:00 – 11:20.
- Hyung Chan An
- Dong Yeap Kang
- Tony Huynh
- Sang-il Oum

## Abstracts

#### Kangbok Lee (이강복), Bi-criteria scheduling

The bi-criteria scheduling problems that minimize the two most popular scheduling objectives, namely the makespan and the total completion time, are considered. Given a schedule, makespan, denoted as $C_\max$, is the latest completion time of the jobs and the total completion time, denoted as $\sum C_j$, is the sum of the completion times of the jobs. These two objectives have received a lot of attention in the literature because of their practical implications. Scheduling problems are somehow difficult to solve even for single criterion. On the other hand, when it comes to a bi-criteria problem, a balanced solution coordinating both objectives is indeed essential. In this paper, we consider bi-criteria scheduling problems on $m$ identical parallel machines where $m$ is 2, 3 and an arbitrary number, denoted as $P2 || (C_\max,\sum C_j)$, $P3 || (C_\max,\sum C_j)$ and $P || (C_\max,C_j)$, respectively. For each problem, we explore its inapproximability and develop an approximation algorithm with analysis of its worst performance.

#### Kedong Yan, Cliques for multi-term linearization of 0-1 multilinear program for boolean logical pattern generation

Logical Analysis of Data (LAD) is a combinatorial optimization-based machine learning method. A key stage of LAD is pattern generation, where useful knowledge in a training dataset of two types of, say, + and − data under analysis is discovered. LAD pattern generation can be cast as a 0-1 multilinear program (MP) with a single 0-1 multilinear constraint: $$(PG): \max\limits_{x\in\{0,1\}^{2n}}f(x):=\sum_{i\in S^+}\Pi_{j\in J_i}(1-x_j)~~\text{subject to}~~g(x):=\sum_{i\in S^-}\Pi_{j\in J_i}(1-x_j)=0$$

The unconstrained maximization of $f$ (without $g$) is straightforward, thus the main difficulty of globally maximizing $(PG)$ arises primarily from the presence of g and the interaction between $f$ and $g$. We dealt with the task of linearizing $g$. Namely, we employed a graph theoretic analysis of data to discover sufficient conditions among neighboring data and also neighboring groups of data for ‘compactly linearizing’ $g$ in terms of a small number of stronger valid inequalities, as compared to those that can be obtained via 0-1 linearization techniques from the literature. In an earlier work, we analyzed + and − data (that is, terms of $f$ and $g$ together) on a graph to develop a polyhedral overestimation scheme for $f$. Extending this line of research, this paper proposes a new graph representation of monomials in f in conjunction with terms in $g$ to more aggressively aggregate a set of terms/data through each maximal clique in the graph into yielding a stronger valid inequality. This is achieved by means of a new notion of ‘neighbors’ that allows us to join two data that are more than 1-Hamming distance away from each other by an edge in the graph. We show that new inequalities generalize and subsume those from the earlier paper. Furthermore, with using six benchmark data mining datasets, we demonstrate that new inequalities are superior to their predecessors in terms of a more efficient global maximization of $(PG)$; that is, for a more efficient analysis and classification of real-life datasets.

#### Dabeen Lee (이다빈), **Joint Chance-constrained programs and the intersection of mixing sets through a submodularity lens**

The intersection of mixing sets with common binary variables arise when modeling joint linear chance-constrained programs with random right-hand sides and finite sample space. In this talk, we first establish a strong and previously unrecognized connection of mixing sets to submodularity. This viewpoint enables us to unify and extend existing results on polyhedral structures of mixing sets. Then we study the intersection of mixing sets with common binary variables and also linking constraint lower bounding a linear function of the continuous variables. We propose a new class of valid inequalities and characterize when this new class along with the mixing inequalities are sufficient to describe the convex hull.

#### Se-Young Yun (윤세영), Optimal sampling and clustering algorithms in the stochastic block model

This paper investigates the design of joint adaptive sampling and clustering algorithms in the Stochastic Block Model (SBM). To extract hidden clusters from the data, such algorithms sample edges sequentially in an adaptive manner, and after gathering edge samples, return cluster estimates. We derive information-theoretical upper bounds on the cluster recovery rate. These bounds reveal the optimal sequential edge sampling strategy, and interestingly, the latter does not depend on the sampling budget, but only the parameters of the SBM. We devise a joint sampling and clustering algorithm matching the recovery rate upper bounds. The algorithm initially uses a fraction of the sampling budget to estimate the SBM parameters, and to learn the optimal sampling strategy. This strategy then guides the remaining sampling process, which confers the optimality of the algorithm.

#### Hyung-Chan An (안형찬), Constant-factor approximation algorithms for parity-constrained facility location problems

*Facility location* is a prominent optimization problem that has inspired a large quantity of both theoretical and practical studies in combinatorial optimization. Although the problem has been investigated under various settings reflecting typical structures within the optimization problems of practical interest, little is known on how the problem behaves in conjunction with parity constraints. This shortfall of understanding was rather disturbing when we consider the central role of *parity* in the field of combinatorics. In this paper, we present the first constant-factor approximation algorithm for the facility location problem with parity constraints. We are given as the input a metric on a set of *facilities* and *clients*, the opening cost of each facility, and the *parity requirement*—$\mathsf{odd}$, $\mathsf{even}$, or $\mathsf{unconstrained}$—of every facility in this problem. The objective is to open a subset of facilities and assign every client to an open facility so as to minimize the sum of the total opening costs and the assignment distances, but subject to the condition that the number of clients assigned to each open facility must have the same parity as its requirement. Although the unconstrained facility location problem as a relaxation for this parity-constrained generalization has unbounded gap, we demonstrate that it yields a structured solution whose parity violation can be corrected at small cost. This correction is prescribed by a $T$-join on an auxiliary graph constructed by the algorithm. This graph does not satisfy the triangle inequality, but we show that a carefully chosen set of shortcutting operations leads to a cheap and *sparse* $T$-join. Finally, we bound the correction cost by exhibiting a combinatorial multi-step construction of an upper bound. At the end of this paper, we also present the first constant-factor approximation algorithm for the parity-constrained $k$-center problem, the bottleneck optimization variant.

#### Dong Yeap Kang (강동엽), On minimal highly connected spanning subgraphs in dense digraphs

In 1985, Mader showed that every $n(\geq4k+3)$-vertex strongly $k$-connected digraph contains a spanning strongly $k$-connected subgraph with at most $2kn-2k^2$ edges, and the only extremal digraph is a complete bipartite digraph $DK_{k,n−k}$. Nevertheless, since the extremal graph is sparse, Bang-Jensen asked whether there exists g(k) such that every strongly $k$-connected $n$-vertex tournament contains a spanning strongly $k$-connected subgraph with $kn + g(k)$ edges, which is an “almost $k$-regular” subgraph.

Recently, the question of Bang-Jensen was answered in the affirmative with $g(k) = O(k^2\log k)$, which is best possible up to logarithmic factor. In this talk, we discuss how to find minimal highly connected spanning subgraphs in dense digraphs as well as tournaments. In particular, we show that every highly connected dense digraph contains a spanning highly connected subgraph that is almost $k$-regular, which yields $g(k) = O(k^2)$ that is best possible for tournaments.

#### Tony Huynh, Stable sets in graphs with bounded odd cycle packing number

It is a classic result that the maximum weight stable set problem is efficiently solvable for bipartite graphs. The recent bimodular algorithm of Artmann, Weismantel and Zenklusen shows that it is also efficiently solvable for graphs without two disjoint odd cycles. The complexity of the stable set problem for graphs without $k$ disjoint odd cycles is a long-standing open problem for all other values of $k$. We prove that under the additional assumption that the input graph is embedded in a surface of bounded genus, there is a polynomial-time algorithm for each fixed $k$. Moreover, we obtain polynomial-size extended formulations for the respective stable set polytopes. To this end, we show that 2-sided odd cycles satisfy the Erdos-Posa property in graphs embedded in a fixed surface. This result may be of independent interest and extends a theorem of Kawarabayashi and Nakamoto asserting that odd cycles satisfy the Erdos-Posa property in graphs embedded in a fixed orientable surface Eventually, our findings allow us to reduce the original problem to the problem of finding a minimum-cost non-negative integer circulation of a certain homology class, which we prove to be efficiently solvable in our case.

#### Sang-il Oum, Rank-width: Algorithmic and structural results

Rank-width is a width parameter of graphs describing whether it is possible to decompose a graph into a tree-like structure by ‘simple’ cuts. This talk aims to survey known algorithmic and structural results on rank-width of graphs. This talk is based on a survey paper with further remarks on the recent developments.