%jmlr submission
\documentclass[11pt,theapa]{article}

\newcommand{\comment}[1]{}

%% JMLR specific 

\usepackage{jmlr2e}

% Heading arguments are {volume}{year}{pages}{submitted}{published}{authors}

\jmlrheading{2}{2002}{445-498}{7/01}{2/02}{David Maxwell Chickering}
\ShortHeadings{Learning Equivalence Classes of Bayesian-Network Structures}{Chickering}
\firstpageno{445}

\begin{document}
%\bibliographystyle{theapa}

\newcommand{\qed}{\mbox{$\Box$}}
\newcommand{\data}{D}
\newcommand{\gr}{\mbox{$\cal G$}}
\newcommand{\pdag}{\mbox{$\cal P$}}
\newcommand{\cpdag}{\mbox{${\cal P}^c$}}
\newcommand{\cpdagone}{\mbox{$\pdag^c_1$}}
\newcommand{\cpdagtwo}{\mbox{$\pdag^c_2$}}
\newcommand{\eclass}{\mbox{$Class$}}

\newcommand{\Par}[1]{\Pi_{#1}}
\newcommand{\Nei}[1]{N_{#1}}
\newcommand{\ParNei}[2]{\Omega_{#1,#2}}
\newcommand{\ComNei}[2]{N_{#1,#2}}

\newcommand{\vstruct}[3]{#1 \rightarrow #2 \leftarrow #3}

\title{Learning Equivalence Classes of Bayesian-Network Structures}

\author{\name David Maxwell Chickering \email dmax@microsoft.com\\
\addr Microsoft Research\\
One Microsoft Way\\
Redmond, WA 98052}

\editor{Craig Boutilier}

\maketitle
\begin{abstract}
\noindent Two Bayesian-network structures are said to be {\em
equivalent} if the set of distributions that can be represented with
one of those structures is identical to the set of distributions that
can be represented with the other. Many scoring criteria that are used
to learn Bayesian-network structures from data are {\em score
equivalent}; that is, these criteria do not distinguish among networks
that are equivalent. In this paper, we consider using a score
equivalent criterion in conjunction with a heuristic search algorithm
to perform model selection or model averaging. We argue that it is
often appropriate to search among equivalence classes of network
structures as opposed to the more common approach of searching among
individual Bayesian-network structures. We describe a convenient
graphical representation for an equivalence class of structures, and
introduce a set of operators that can be applied to that
representation by a search algorithm to move among equivalence
classes. We show that our equivalence-class operators can be scored
locally, and thus share the computational efficiency of traditional
operators defined for individual structures. We show experimentally
that a greedy model-selection algorithm using our representation
yields slightly higher-scoring structures than the traditional
approach without any additional time overhead, and we argue that more
sophisticated search algorithms are likely to benefit much more.
\end{abstract}

\section{Introduction} \label{sec:intro}

Throughout the last decade, there has been an enormous amount of work
in the area of learning Bayesian-network structures from data; Buntine
(1996) provides a good review of the literature, Heckerman (1995)
presents a tutorial on the topic, and Jordan (1998) contains some
introductory articles and more recent advances. Contributions to this
area of research can typically be classified as addressing one of two
general problems: the {\em evaluation} problem and the {\em
identification} problem.

\nocite{Buntine96} 
\nocite{Heckerman95tut}
\nocite{Jordan98}

Work on the evaluation problem concentrates on deriving scoring
criteria for network structures given data; that is, given a structure
and a set of data, how do we assign a score to the structure that
measures how well it fits with the data?  Work to develop or extend
evaluation criteria for Bayesian-network structures includes Buntine
(1991), Cooper and Herskovitz (1992), Suzuki (1993), Bouckaert (1993),
and Heckerman, Geiger, Chickering (1995), among many others.

\nocite{Buntine91}
\nocite{CH92}
\nocite{Suzuki93uai}
\nocite{Bouckaert93}
\nocite{HGC95}

The identification problem is to identify one or more network
structures that yield a high value for the scoring criterion. Research
contributions to this problem include heuristic search techniques as
well complexity results. Note that much of the work on the so-called
``independence approach'' to learning structure falls into this
category. A few of the many contributions to this area include Chow
and Liu (1968), Verma and Pearl (1990), Cooper and Herskovitz (1992),
and Chickering (1996a).

\nocite{Chow68}
\nocite{VP90}
\nocite{CH92}
\nocite{Chickering96uai}

The notion of {\em equivalence} plays an important role for both of
the learning problems. For any given network structure, there is a
corresponding set of probability distributions that can be represented
using a Bayesian network with that structure.  Two network structures
are {\em equivalent} if the set of distributions that can be
represented using one of the structures is identical to the set of
distributions that can be represented using the other. Because
equivalence is reflexive, symmetric, and transitive, the relation
defines a set of equivalence classes over network structures.  Many of
the evaluation criteria found in the literature are {\em score
equivalent}; that is, these scoring criteria assign the same score to
equivalent structures. As we shall argue later, score equivalence is a
desirable property of a scoring criterion unless ``extra'' semantics
(such as causality) can be attributed to the edges in a Bayesian
network.

The main contribution of this paper is to show how to take advantage
of a score-equivalent evaluation criterion when identifying
high-scoring network structures. In particular, we formulate a search
space that can be used in conjunction with a score-equivalent scoring
criterion to search over equivalence classes of Bayesian-network
structures, as opposed to the more common approach of searching over
network structures themselves. The search space is similar to the one
presented by Chickering (1996a); we extend this work by showing how
all of the operators can be scored {\em locally}, and thus we incur
overhead comparable to traditional structure search. By locally, we
mean that the change in score can be computed using local functions of
a subset of the nodes in the network. We provide experimental results
using a greedy search algorithm that demonstrate that our approach
yields slightly higher-scoring solutions while remaining as fast as
the traditional approach. We argue that when a more sophisticated
search algorithm is applied to the learning problem, our space should
prove to be significantly more advantageous.

Other researchers have designed search algorithms to be used with a
score-equivalent scoring criterion. To our knowledge, however, we are
the first to use an explicit representation of an equivalence class
and yet retain the advantage of local scoring.

This paper is organized as follows.  In Section \ref{sec:background}
we discuss previous relevant work and describe our notation.  In
Section \ref{sec:trans}, we describe two algorithms that are needed to
specify our search space. In Section \ref{sec:space}, we provide a
detailed specification of our search space in which the states are
defined to be equivalence classes of Bayesian-network structures.  In
Section \ref{sec:main}, we show how to evaluate each of the operators
locally, and provide necessary and sufficient conditions for each of
the operators to be valid. In Section \ref{sec:experiments}, we
provide experimental results from applying a greedy search algorithm
to our search space, using a particular score-equivalent evaluation
criterion. Finally, in Section \ref{sec:conclusion}, we conclude with
a discussion. Detailed proofs for our main results are given in the
appendix.

\section{Background} \label{sec:background}

In this section, we discuss previous work that is relevant to this
paper, and describe some of the notation that we use.

\subsection{Bayesian Networks}

A Bayesian network $B$ for a set of variables $U = \{x_1, \ldots,
x_n\}$ is a pair $(\gr, \Theta)$. $\gr = (V, E)$ is a directed acyclic
graph---or {\em DAG} for short---consisting of (1) nodes $V$ in
one-to-one correspondence with the variables $U$, and (2) directed
edges $E$ that connect the nodes. $\Theta$ is a set of conditional
probability distributions such that $\Theta_i \in \Theta$ defines the
conditional probability of node $x_i$ given its parents in $\gr$. A
Bayesian network represents a joint distribution over $U$ that factors
according to the structure $\gr$ as follows:
\begin{equation} \label{eq:factor}
p_B(x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i | \Par{x_i}, \Theta_i)
\end{equation}
where $\Par{x_i}$ is the set of parents of node $x_i$ in $\gr$. As
discussed below, there may be additional semantics to the structure of
a Bayesian network. 

The structure $\gr$ of a Bayesian network imposes a set of
independence constraints that must hold in any distribution that can
be represented by a network with that structure. In particular, it can
be shown that Equation \ref{eq:factor} imposes the constraint that
each node is independent of its non-descendants given its parents. See
(for example) Pearl (1988) for a detailed discussion these and other
independence constraints that can be derived from a DAG, as well as a
thorough introduction to Bayesian networks.

\nocite{Pearl88}

Throughout this paper, we make many comparisons between DAGs (and also
between PDAGs, which are defined below). To simplify the discussion,
we will assume that when any such comparison is made, the models are
defined over the same set of variables. Thus when we say, for example,
that two DAGs $\gr$ and $\gr'$ represent the same independence
constraints, we assume that $\gr$ and $\gr'$ have the same node sets.

Two DAGs $\gr$ and $\gr'$ are {\em equivalent} if for every Bayesian
network $B = (\gr, \Theta)$, there exists a Bayesian network $B' =
(\gr', \Theta')$ such that $B$ and $B'$ define the same probability
distribution, and vice versa.

Note that we have placed no restrictions on the functional form of the
conditional probability distributions given in Equation
\ref{eq:factor}. Often in practice, each of these distributions is
assumed to be a member of some specific family, where the choice of
family depends on the particular variable and parent set. For example,
the conditional distributions for continuous variables with continuous
parents might be restricted to linear regressions. If such
restrictions are included, our current definition of equivalence is
inadequate because it ignores any non-independence constraints that
the DAG imposes via the choice of distribution family. For the
remainder of this paper, we will assume that any distribution
restrictions are chosen such that the only (additional) constraints on
the joint distribution imposed by the DAG are independence
constraints. There are two common examples of such distribution
restrictions in practice: (1) when all variables are discrete, the
joint distribution is an unrestricted discrete distribution and the
conditional probability distributions are independent multinomials for
each variable and each parent configuration, and (2) when all
variables are continuous, the joint distribution is a multivariate
Gaussian and the conditional probability distributions are independent
linear regressions for each variable.

\subsection{Learning Bayesian Networks}

As described in the introduction, there has been an enormous amount
of work on learning Bayesian networks from data.  Methods to learn
Bayesian networks from data typically consist of two components. The
first component is a {\em scoring criterion} that takes as input a
Bayesian-network structure, a data set, and possibly some domain
knowledge; the scoring criterion returns a value indicating how well
the structure fits the data. The second component in a typical
learning method is a {\em search algorithm} that identifies one or
more structures with high score. Once the structure of a Bayesian
network is identified from data, it is typically straightforward to
estimate the parameters of that model.

An important property of any method for learning Bayesian networks
from data is the {\em interpretation} that is given to structures by
the scoring criterion. For most criteria, Bayesian-network structures
are interpreted as independence constraints in some distribution. We
call this the {\em independence interpretation} of structures. For
example, most Bayesian criteria interpret a DAG $\gr$ as an assertion
about the independence constraints in the distribution from which the
data was generated. Penalized-likelihood criteria, such as AIC
(Akaike, 1974) and BIC (Schwarz, 1978), interpret $\gr$ as a set of
independence constraints in the maximum-likelihood estimate, derived
from the observed data, of a joint distribution over the domain. The
MDL criterion (e.g., Rissanen, 1986 and Bouckaert, 1993) interprets
$\gr$ as a set of independence constraints in a joint distribution
that can be used to compress the observed data.

\nocite{Akaike74}
\nocite{Schwarz78}
\nocite{Rissanen86}

The most common exception to the independence interpretation of
structures is when the scoring criterion attributes additional causal
semantics to the edges in a DAG. See, for example, Pearl and Verma
(1991), Spirtes, Glymour and Scheines (1993), Druzdzel and Simon
(1993), and Heckerman and Shachter (1995) for formal causal semantics
to Bayesian networks.  

\nocite{PV91} 
\nocite{Spirtes93} 
\nocite{DS93}
\nocite{HS95}

When structures are interpreted as independence constraints in some
distribution, all of the DAGs within the same equivalence class are
redundant representations of the same constraints. It should therefore
not be surprising that many common scoring criteria assign the same
score to structures in the same equivalence class, a property we call
{\em score equivalence}. Chickering (1995), for example, shows that
the AIC criterion, the BIC criterion, the MDL criterion and the
(Bayesian) BDe criterion (Heckerman et al., 1995) are all score
equivalent.\footnote{Chickering (1995) shows that the BDe criterion is
score equivalent under the independence interpretation of DAGs; the
BDe scoring criterion can also be used when DAGs are interpreted
causally.}

\nocite{Chickering95a}
\nocite{Akaike74}
\nocite{Schwarz78}
\nocite{Rissanen86}
\nocite{Bouckaert93}

Given a scoring criterion, the goal of the search procedure is to
identify one or more high-scoring models. In a model-selection
scenario, for example, the goal of the search procedure is to identify
a single model with the highest possible score. Chickering (1996b)
shows that the problem of identifying the Bayesian-network structure
with the highest score is NP-hard when using the BDe scoring
criterion. Although not yet proven, it is thought by most researchers
that this model-selection problem is hard for other common criteria as
well. As a result, a large amount of research effort has concentrated
on applying various heuristic search algorithms. By far the most
common approach is to apply a heuristic search algorithm to the space
of DAGs; in particular, these algorithms traverse the space of DAG
models by applying local changes to the edges of DAGs.

\nocite{Chickering96lns}

\subsection{Learning Equivalence Classes vs. Learning DAGs}

The focus of this paper is to improve heuristic search algorithms in
the common situation when DAGs are interpreted as independence
constraints, and when the scoring criterion is score equivalent. As
pointed out by Andersson, Madigan and Perlman (1997), there are a
number of potential problems when heuristic search algorithms traverse
through the space of DAGs instead of the space of equivalence classes
of DAGs in this situation. We now elaborate on some of these problems.

\nocite{AMP97}

The first potential problem is one of efficiency. Many of the
operators used by these search algorithms are defined to move between
DAGs within the same equivalence class. Under the independence
interpretation of structure, applying such an operator will result in
(a new representation of) the same state. As a result, unless the
search algorithm explicitly tests for equivalence, the algorithm can
waste time re-scoring the same equivalence class to evaluate the merit
of each such operator. In many cases, in order for the algorithm to
move from one equivalence class to another, it will have to make
numerous moves within the same equivalence class. In other words, the
local connectivity of equivalence classes in the search space depends
on the particular DAG being used to represent the current equivalence
class. This can be particularly problematic when applying a greedy
search algorithm; in this case, the choice of the representation for
an equivalence class can be arbitrary due to ties in the scoring
criterion, and because a move within an equivalence class can never
increase the score, the connectivity of the search space becomes
arbitrary as well. Furthermore, the equivalence classes that are
reachable from a given (equivalence class) state will depend on how
the search algorithm reached that state.

Consider Figure \ref{fig:greed}, which shows some DAGs defined over a
three-variable domain. Assume that a greedy search algorithm is
applied to this space, starting from the graph containing no edges
shown in Figure \ref{fig:greed}(a), and assume that the operators used
by the algorithm are directed edge additions, deletions and
reversals. As we shall see below, the DAGs in Figure
\ref{fig:greed}(b) and Figure \ref{fig:greed}(c) are equivalent. Using
a score-equivalent criterion, a greedy search algorithm does not
prefer either DAG to the other, and assuming the score for these DAGs
are better than any other DAG reachable in one move from the initial
state, the algorithm will arbitrarily choose one of them. Unless the
algorithm recognizes that these two DAGs are equivalent, a greedy
search algorithm will waste time evaluating an edge reversal once
reaching either of these states because they are reachable from each
other by an edge reversal. If the DAG in Figure \ref{fig:greed}(b) is
chosen, the algorithm can move, in one step, to the DAG in Figure
\ref{fig:greed}(d). On the other hand, if the DAG in Figure
\ref{fig:greed}(c) is chosen then there is no single edge addition or
reversal that can move the search to the DAG in Figure
\ref{fig:greed}(d), which happens to be the only DAG representation
for its corresponding equivalence class. Unfortunately, the score of
the DAG identified by a greedy search can be sensitive to these
arbitrary choices.

\begin{figure}[th]
\begin{center}
\leavevmode
\epsfxsize=6.0in
\epsffile{Figures/greed.eps}
\end{center}
\caption{Example of a greedy algorithm applied to the space of DAG models.
\label{fig:greed}
}
\end{figure}

Another potential problem with searching over DAG models instead of
equivalence classes has to do with the structure prior provided for
Bayesian scoring criteria. If an algorithm interprets each DAG 
within a given equivalence class as a separate model, the effective
prior probability for an equivalence class will be the {\em sum} of
the prior probabilities for all DAGs within that class. This implicit
summing of priors can have unintended consequences. For example,
assume that we assign a uniform prior over all DAGs. In a domain with
$n$ variables, the equivalence class that places no independence
constraints on the distribution has $n!$ DAG representations, whereas
the equivalence class that imposes the constraint that all variables
are mutually independent has exactly one DAG representation. Thus the
``uniform'' prior specified for the criterion in fact favors the
no-independence model over the all-independence model by a factor of
$n!$ a priori.

When performing model {\em selection}, assigning priors to DAGs
instead of equivalence classes is not necessarily problematic. In
particular, if the algorithm identifies a single high-scoring DAG,
that DAG will correspond to a high-scoring equivalence class that has
the same prior as the DAG. The main problem with the structure priors
comes when performing Bayesian model averaging or selective Bayesian
model averaging; in this case, the structure prior affects how learned
models are combined together to compute various quantities of
interest.  Madigan, Andersson, Perlman and Volinsky (1996) recognized
this problem and, in the context of Monte-Carlo sampling of
structures, devised methods for sampling equivalence classes.

\nocite{MAPV96}

There has been some recent work by Gillispie and Perlman (2001)
studying the behavior of the ratio of the number of DAGs to the number
of equivalence classes as a function of the size of the domain. Their
results suggest that this ratio reaches an asymptote around $3.7$ with
as few as ten nodes. This result potentially mitigates our worries
about searching over DAG models because we expect the number of DAGs
for the average equivalence class to be reasonably small. We expect,
for example, that a naive exhaustive search algorithm traversing all
models will take only four times as long in DAG space than if it
enumerated the equivalence classes explicitly. More important,
however, is the number of DAG elements contained in the equivalence
classes over which our algorithm searches; as we shall see in Section
\ref{sec:experiments}, this number can be enormous.

\nocite{GP01}

There exist scoring criteria that use the independence interpretation
of structure, yet are not score equivalent. The most widely used
example of this is the K2 criterion derived by Cooper and Herskovitz
(1992). The K2 criterion applied to a DAG $\gr$ evaluates the relative
posterior probability that the generative distribution has the same
independence constraints as those entailed by $\gr$. The criterion is
not score equivalent because the prior distributions over parameters
corresponding to different structures are not always consistent;
nevertheless, the criterion is very efficient to evaluate and yields
good results in practice. The results of this paper can be applied
directly when using criteria of this type by allowing equivalence
classes to be evaluated by arbitrary members of that class; this will
often be reasonable because the distinction between DAG scores within
the same equivalence class under the independence interpretation
should have no significance other than computational convenience.

\nocite{CH92}

There also exist score-equivalent criteria that do not use the
independence interpretation of structure. An example of this is the
(Bayesian) BDe scoring criterion derived by Heckerman et al. (1995)
when applied to learning causal networks from data. This scoring
criterion has the property that the data does not distinguish between
equivalent structures. Thus, if all DAGs in an equivalence class are
equally likely a priori, they will all get the same score using the
criterion, and therefore searching over equivalence classes is
appropriate.

Other researchers have explored methods for searching through
equivalence classes of Bayesian-network structures. One approach,
commonly known as the {\em independence approach} to learning, uses an
independence oracle---implemented with some statistical test with a
given level of sensitivity---in conjunction with a set of rules to
identify an equivalence class from data. Verma and Pearl (1992),
Spirtes, Glymour and Scheines (1993), and Meek (1995) all describe
examples of this approach. In this paper, we concentrate instead on
model selection and model averaging when using scoring criteria to
identify models. See Chickering (1995) for a discussion of the
distinction between the independence approach to learning and the
so-called {\em metric approach} that we consider here.

As mentioned above, Madigan et al. (1996) consider searching through
the space of equivalence classes using Monte-Carlo sampling, and their
operators can be used for general heuristic search algorithms as
well. Spirtes and Meek (1995) define a greedy search algorithm that
explicitly searches through the space of equivalence
classes. Chickering (1996a) defines a set of operators that traverse
through the space of equivalence classes, and shows that a greedy
search algorithm using these operators generally results in better
models than when searching through DAG space. Dash and Druzdzel (1999)
apply multiple instances of the independence approach---using random
levels of significance for the statistical test that approximates the
independence oracle---and evaluate each result by extracting a
representative DAG model, retaining the best such DAG at each step.

In this paper, we define a search space that can be used by any
heuristic search algorithm, and we show how to score all of the
operators using local functions of the nodes in our graphical
representations of equivalence classes. The advantage of our approach
over previous work is efficiency: because the operators can be scored
locally, search algorithms applied to our space can traverse through
equivalence classes much faster than previous algorithms. Munteanu and
Cau (2000) derive three operators that are a subset of the operators
we present in this paper and present local scores for them, but our
results provide counter-examples to the validity conditions of all of
their operators and to the scores of two of the three. After the
original submission of the present paper, Munteanu and Bendou (2001)
extended the set of three operators to include two more of the
operators that we present here.

\nocite{MC00}
\nocite{Meek95uai}
\nocite{Spirtes93}
\nocite{DD99}
\nocite{SM95}
\nocite{MB01}

\subsection{General Notation and Results}

We now introduce notation and results that will be used to define our
search space.  The {\em skeleton} of any DAG is the undirected graph
resulting from ignoring the directionality of every edge.  A {\em
v-structure} in a DAG $\gr$ is an ordered triple of nodes $(x, y, z)$
such that $\gr$ contains the directed edges $x \rightarrow y$ and $z \rightarrow
y$, and $x$ and $z$ are not adjacent in $\gr$.  Verma and Pearl (1990)
derive the following characterization of equivalent structures:

\nocite{VP90}

\begin{theorem} {\rm [Verma and Pearl, 1990]} \label{th:vp}
Two DAGs are equivalent if and only if they have the same
skeletons and the same v-structures.
\end{theorem}

A directed edge $x \rightarrow y$ is {\em compelled} in $\gr$ if for
every DAG $\gr'$ equivalent to $\gr$, $x \rightarrow y$ exists in
$\gr'$. For any edge $e$ in $\gr$, if $e$ is not compelled in $\gr$,
then $e$ is {\em reversible} in $\gr$; that is, there exists some DAG
$\gr'$ equivalent to $\gr$ in which $e$ has opposite orientation.

A consequence of Theorem \ref{th:vp} is that for any edge $e$
participating in a v-structure in some DAG $\gr$, if that edge is
reversed in some other DAG $\gr'$, then $\gr$ and $\gr'$ are not
equivalent. Thus any edge participating in a v-structure is compelled.
Not every compelled edge, however, necessarily participates in a
v-structure. For example, the edge from $z$ to $w$ in the DAG shown in
Figure \ref{fig:cpdag}(a) is compelled.

An acyclic partially directed graph, or {\em PDAG} for short, is a
graph that contains both directed and undirected edges. It is acyclic
in the sense that it contains no {\em directed} cycles. We use PDAGs
to represent equivalence classes of Bayesian-network structures.  Let
$\pdag$ denote an arbitrary PDAG. We define the equivalence class of
DAGs corresponding to $\pdag$---denoted $\eclass(\pdag)$---as follows:
$\gr \in \eclass(\pdag)$ if and only if $\gr$ and $\pdag$ have the
same skeleton and the same set of v-structures.\footnote{The
definitions for the skeleton and set of v-structures for a PDAG are
the obvious extensions to these definitions for DAGs.}  From Theorem
\ref{th:vp}, it follows that a PDAG containing a directed edge for
every edge participating in a v-structure, and an undirected edge for
every other edge, uniquely identifies an equivalence class of
DAGs. There may be many other PDAGs, however, that correspond to the
same equivalence class. For example, any DAG interpreted as a PDAG can
be used with our definition of $\eclass$ to represent its own
equivalence class.

If a DAG $\gr$ has the same skeleton and the same set of v-structures
as a PDAG $\pdag$, and if every directed edge in $\pdag$ has the same
orientation in $\gr$, we say that $\gr$ is a {\em consistent
extension} of $\pdag$. Note that any DAG that is a consistent
extension of $\pdag$ must also be contained in $\eclass(\pdag)$, but
not every DAG in $\eclass(\pdag)$ is a consistent extension of
$\pdag$. If there is at least one consistent extension of a PDAG
$\pdag$, we say that $\pdag$ {\em admits a consistent
extension}. Figure \ref{fig:pdag}(a) shows a PDAG that admits a
consistent extension, and Figure \ref{fig:pdag}(b) shows one such
consistent extension. Figure \ref{fig:pdag}(c) shows a PDAG that does
not admit a consistent extension.

\begin{figure}[th]
\begin{center}
\leavevmode
\epsfxsize=4.5in
%%\epsfxsize=2.5in
\epsffile{Figures/pdag.eps}
\end{center}
\caption{(a) a PDAG that admits a consistent extension, (b) a
consistent extension of the PDAG in (a), and (c) a PDAG that does not
admit a consistent extension.
\label{fig:pdag}
}
\end{figure}

The {\em completed PDAG} corresponding to an equivalence class is the
PDAG consisting of a directed edge for every compelled edge in the
equivalence class, and an undirected edge for every reversible edge in
the equivalence class. Note that for a completed PDAG $\cpdag$, unlike
arbitrary PDAGs, every DAG contained in $\eclass(\cpdag)$ is a
consistent extension of $\cpdag$. Figure \ref{fig:cpdag}(a) shows a
DAG $\gr$, and Figure \ref{fig:cpdag}(b) shows the completed PDAG for
$\eclass(\gr)$. PDAGs are called {\em patterns} by Spirtes et
al. (1993) and completed PDAGs are called {\em essential graphs} by
Andersson et al. (1997) and {\em maximally oriented graphs} by Meek
(1995).

\begin{figure}[th]
\begin{center}
\leavevmode
\epsfxsize=4.25in
%%\epsfxsize=2.5in
\epsffile{Figures/cpdag.eps}
\end{center}
\caption{(a) a DAG $\gr$ and (b) the completed PDAG for
$\eclass(\gr)$.
\label{fig:cpdag}
}
\end{figure}

Given an equivalence class of Bayesian-network structures, the
completed PDAG for that class is unique. We emphasize this result with
a lemma.

\begin{lemma} \label{lem:unique}
Let $\cpdagone$ and $\cpdagtwo$ denote two completed PDAGs that both
admit a consistent extension. Then $\cpdagone  = \cpdagtwo$ if and 
only if $\eclass(\cpdagone) = \eclass(\cpdagtwo)$.
\end{lemma}

\noindent{\bf Proof:} Follows immediately by Theorem \ref{th:vp} and
by the definitions of compelled and reversible edges. \qed

A {\em clique} in a DAG or a PDAG is a set of nodes for which every
pair of nodes is adjacent. A {\em topological sort} of the nodes in a
DAG $\gr$ is any total ordering of the nodes such that, for any pair
of nodes $x_i$ and $x_j$ in $\gr$, if $x_i$ is an ancestor of $x_j$,
then $x_i$ must precede $x_j$ in the ordering.

As in Equation \ref{eq:factor} above, we use $\Par{x}$ to denote the
set of parents for node $x$ in both a DAG and a PDAG. We use $\Nei{x}$
to denote the set of neighbors of node $x$ in a PDAG. We use
$\ComNei{x}{y} = \Nei{x} \cap \Nei{y}$ to denote the set of common
neighbors of $x$ and $y$ in a PDAG. We use $\ParNei{x}{y} = \Par{x}
\cap \Nei{y}$ to denote the set of parents of node $x$ that are
neighbors of node $y$ in a PDAG. 

\section{Converting Between DAGs and PDAGs} \label{sec:trans}

In this section, we describe two algorithms that we need in order to
apply the search operators that we present in the next section.  The
first algorithm, which we refer to as {\sc DAG-to-CPDAG}, takes as
input a Bayesian-network structure, and outputs a completed PDAG
representation of the equivalence class to which that structure
belongs. The second algorithm, which we refer to as {\sc PDAG-to-DAG},
takes as input a PDAG representation for an equivalence class, and
outputs a DAG contained in that class.

Although we briefly discuss the complexity of implementing both of
these algorithms, the time most heuristic search algorithms will spend
converting between DAGs and PDAGs is insignificant. In particular,
most algorithms spend the majority of their time evaluating the score
of adjacent states rather than actually traversing states; because we
have derived local scores for all of the operators, the conversion
algorithms presented in this section need only be applied when the
search algorithm commits to moving to a given state. Furthermore, the
complexities of the conversion algorithms depend only on the
structural complexity (i.e., number of nodes and edges) of the graph,
and not on the size of the data.

Several researchers, including Verma and Pearl (1992) and Meek (1995),
present {\em rule-based} algorithms that can be used to implement {\sc
DAG-to-CPDAG}. The idea of these implementations is as follows. First,
we undirect every edge in a DAG, except for those edges that
participate in a v-structure. Then, we repeatedly apply one of a set
of rules that transform undirected edges into directed edges. The
rules ``fire'' by matching local patterns of undirected and directed
edges in the given graph. Meek (1995) proves that the transformation
rules are sound and complete. That is, once no rule matches on the
current graph, that graph must be a completed PDAG.  In the appendix,
we detail these rules for the purpose of proving some of our
results. Andersson et al. (1997)\nocite{AMP97}, provide a similar
rule-based algorithm, except that edges from a DAG are {\em
undirected} by matching patterns of edges.

\nocite{Meek95uai} 

We now provide the details of an alternative algorithm that can be
used to implement {\sc DAG-to-CPDAG} which is computationally
efficient. In particular, Chickering (1995) describes an algorithm
that, when given a DAG containing $|E|$ edges, runs in time $O(|E|)$
on average.  The algorithm labels all of the edges in a DAG as either
``compelled'' or ``reversible''; given such a labeling, it is trivial
to construct the corresponding completed PDAG.  The first step of the
algorithm is to define a (particular) total ordering over the edges in
the given DAG. For simplicity, we present this step as a separate
procedure listed in Figure \ref{fig:order-edges}.  \comment{To avoid
confusion between ordered nodes and ordered edges, we have capitalized
``node'' and ``edge'' in the figure.} In Figure \ref{fig:label-edges},
we show an algorithm of Chickering (1995) that labels the edges. For a
detailed study of the correctness of these algorithms, we refer the
reader to Chickering (1995).

\begin{figure}[t] 
{
\noindent {\sc \bf \bf Algorithm {\sc ORDER-EDGES}($\gr$)} \\
\noindent {\sc \bf \bf Input:} DAG $\gr$ \\
\noindent {\sc \bf \bf Output:} DAG $\gr$ with labeled total order on
edges \\
\noindent 1. \parbox[t]{5in}{Perform a topological sort on the NODES in $\gr$}\\
\noindent 2. \parbox[t]{5in}{Set $i = 0$} \\
\noindent 3. \parbox[t]{5in}{{\bf While} there are unordered EDGES in $\gr$}\\
\noindent 4. \hspace{.1in} \parbox[t]{5.5in}{Let $y$ be the lowest ordered
NODE that has an unordered EDGE incident into it} \\
\noindent 5. \hspace{.1in} \parbox[t]{5in}{Let $x$ be the highest ordered
NODE for which $x \rightarrow y$ is not ordered}\\
\noindent 6. \hspace{.1in} \parbox[t]{5in}{Label $x \rightarrow y$ with
order $i$} \\
\noindent 7. \hspace{.1in} \parbox[t]{5in}{$i = i + 1$}
}
\caption{Algorithm to produce a total ordering over the edges in a
DAG. The algorithm is used by Algorithm {\sc Label-Edges}.
\label{fig:order-edges}
}
\end{figure}

\begin{figure}[t] 
\noindent {\sc \bf Algorithm {\sc LABEL-EDGES}($\gr$)} \\
\noindent {\sc \bf Input:} DAG $\gr$ \\
\noindent {\sc \bf Output:} \parbox[t]{5in}{DAG $\gr$ with each edge labeled either
``compelled'' or ``reversible'' }
%1
\begin{tabbing}
\noindent 1. \= \parbox[t]{5in}{Order the edges in $\gr$ using {\sc \bf
Algorithm Order-Edges}} \\
%2
\noindent 2. \> \parbox[t]{5in}{Label every edge in $\gr$ as ``unknown''}\\
%3
\noindent 3. \> \parbox[t]{5in}{{\bf While} there are edges labeled
``unknown'' in $\gr$}\\
%4
\noindent 4. \> \hspace{.1in} \parbox[t]{5in}{Let $x \rightarrow y$ be the
lowest ordered edge that is labeled ``unknown'' } \\
%5
\noindent 5. \> \hspace{.1in} \parbox[t]{5in}{{\bf For} every 
edge $w \rightarrow x$ labeled ``compelled''} \\
%6
\noindent 6. \> \hspace{.3in} \parbox[t]{5in}{{\bf If} $w$ is not a parent of
 $y$}\\
\noindent 7. \> \hspace{.5in} \parbox[t]{4.8in}{Label $x \rightarrow y$ and every edge incident into $y$
with ``compelled''}\\
\noindent 8. \> \hspace{.5in} \parbox[t]{4.8in}{{\bf Goto} 3}\\
%7
\noindent 9. \> \hspace{.3in} \parbox[t]{5in}{{\bf Else}}\\
\noindent 10. \> \hspace{.5in} \parbox[t]{4.8in}{Label $w \rightarrow y$ with ``compelled''} \\
%8
\noindent 11. \> \hspace{.1in} \parbox[t]{5in}{{\bf If} there exists an 
edge $z \rightarrow y$ such that $z \not = x$ and $z$ is not a parent
of $x$} \\
\noindent 12. \> \hspace{.3in} \parbox[t]{5in}{Label $x \rightarrow y$ and all ``unknown'' edges
incident into $y$ with ``compelled''}\\
%9
\noindent 13. \> \hspace{.1in} \parbox[t]{5in}{{\bf Else}}\\
\noindent 14. \> \hspace{.3in} Label $x \rightarrow y$ and all ``unknown'' edges incident into $y$
with ``reversible'' \\
%10
\end{tabbing}
\caption{Algorithm to label each edge in a DAG with ``compelled'' or
``reversible'', which leads to an immediate implementation of {\sc
DAG-to-CPDAG}.
\label{fig:label-edges}
}
\end{figure}

We now consider a simple implementation of {\sc PDAG-to-DAG} due to
Dor and Tarsi (1992).  Recall from Section \ref{sec:background} that
we use $\Nei{x}$ and $\Par{x}$ to denote the set of neighbors and
parents, respectively, of node $x$. Given a PDAG $\pdag$, we create a
DAG $\gr$ that contains all of the directed edges from $\pdag$, and no
other edges. We then repeat the following procedure: First, select a
node $x$ in $\pdag$ such that (1) $x$ has no outgoing edges and (2) if
$\Nei{x}$ is non-empty, then $\Nei{x} \cup \Par{x}$ is a clique. If
$\pdag$ admits a consistent extension, a node $x$ with these
properties is guaranteed to exist. Next, for each undirected edge
$y-x$ incident to $x$ in $\pdag$, insert a directed edge $y
\rightarrow x$ in $\gr$. Finally, remove $x$ and all incident edges
from the $\pdag$ and continue with the next node. The algorithm
terminates when all nodes have been deleted from $\pdag$.

\nocite{DorTarsi92}

We now consider a very loose upper bound on the complexity of a simple
implementation of {\sc PDAG-to-DAG} for a PDAG consisting of $|V|$
nodes and $|E|$ edges. For each node $x$, we can determine if its
neighbors and parents form a clique in $O(|\Nei{x} \cup
\Par{x}|^2)$ time. For each edge that is removed, we need only check
the endpoints to see if their neighbors and parents now form a clique;
if $\Nei{z} \cup \Par{z}$ is a clique for some node $z$ in $\pdag$ at
any step of the algorithm, this set must remain a clique after each
edge removal in the algorithm. If $|\Nei{x} \cup \Par{x}|$ is
bounded above by some constant $k$ then we can implement the algorithm
in time $O(|V| k^2 + |E| k^2)$.

Our upper bound may not be very good if $k$ is on the order of the
number of variables in the domain. We note, however, that if we select
$x_i$ after discovering a clique among its neighbors and parents, then
the algorithm results in {\em every} member of $\Nei{x_i} \cup
\Par{x_i}$ being made a parent of $x_i$ in the DAG. Because the number
of parameters in the local distribution of a discrete node grows
exponentially with the number of parents, it is reasonable to expect
the number of {\em parents} of such a node to be bounded by some
reasonably small constant $k$. This can be enforced explicitly using
the scoring criterion, or implicitly by the lack of data needed to
support such a complex model. If we are given the upper bound $k$
ahead of time, we need never even {\em check} for a clique among the
neighbors and parents unless the number of these nodes is less than
$k$.

In practice, graphs encountered during search are reasonably sparse,
and as will be evident from the timing results in Section
\ref{sec:experiments}, the time spent executing either of the two
algorithms presented in this section is insignificant.

\section{Heuristic Search for Bayesian-network Structures} \label{sec:space}

Given a scoring criterion for evaluating Bayesian-network structures,
a typical learning algorithm attempts to identify one or more
structures that attain a high score by applying a heuristic search
algorithm. In this section, we formulate a {\em search space} that the
heuristic search algorithm can use---in conjunction with a
score-equivalent scoring criterion---to search over equivalence classes
of Bayesian-network structures. A search space has three components:

\begin{enumerate}
\item A set of states
\item A representation scheme for the states
\item A set of operators
\end{enumerate}

The set of states represents the logical set of solutions to the
search problem, the representation scheme defines an efficient way to
represent the states, and the set of operators is used by the search
algorithm to transform the representation of one state to another in
order to traverse the space in a systematic way. Once the search space
has been defined, any one of a number of well-known heuristic search
algorithms can easily be applied to that space.

In perhaps the simplest formulation of a search space for learning
Bayesian networks, the states of the search are defined to be
individual Bayesian-network structures, the representation of a state
is simply a DAG, and the operators are defined to be local changes to
those DAGs. For example, Chickering, Geiger and Heckerman (1995)
\nocite{CGH95aistats} compare various search procedures in the search
space of Bayesian-network structures, using the following operators:
for any pair of nodes $x$ and $y$, if $x$ and $y$ are adjacent, the
edge connecting them can be either deleted or reversed. If $x$ and $y$
are not adjacent, an edge can be added in either direction. All
operators are subject to the constraint that a cycle cannot be
formed. We shall use {\em Bayesian-network space}, or {\em B-space}
for short, to denote the search space defined in this way. Figure
\ref{fig:dagops} shows an example of each operator in B-space.

\begin{figure}[t]
\begin{center}
\leavevmode
\epsfxsize=5.5in
\epsffile{Figures/dagops.eps}
\end{center}
\caption{States resulting from the application of a single operator in
B-space.
\label{fig:dagops}
}
\end{figure}

To appreciate the reasons for the popularity of B-space among
practitioners, we need the following definition.

\begin{definition} A Bayesian-network-structure scoring criterion $S$
is {\em decomposable} if it can be written as a sum of measures, each
of which is a function only of one node and its parents.
\end{definition} 

In other words, a decomposable scoring criterion $S$ applied to a
DAG $\gr$ can always be expressed as:
\begin{equation} \label{eq:decompose}
S(\gr) = \sum_{i=1}^n s(x_i, \Par{x_i})
\end{equation}
where $n$ is the number of nodes in $\gr$ and $s(x_i, \Par{x_i})$ is a
function only of $x_i$ and its parents in $\gr$.  Note that the data
$\data$ is implicit in Equation \ref{eq:decompose}. When we say that
$s(x_i, \Par{x_i})$ is only a function of $x_i$ and its parents, we
intend this to also mean that the {\em data} on which this measure
depends is restricted to those columns corresponding to $x_i$ and its
parents.\footnote{To be explicit, we could re-write the terms in the
sum of Equation \ref{eq:decompose} as $s(x_i,
\Par{x_i},\data(\{x_i\}), \data(\Par{x_i}))$, where $\data(X)$ denotes
the data restricted to the columns corresponding to the variables in
set $X$. We find it convenient, however, to keep the notation simple.}

Given a decomposable scoring criterion, B-space is particularly
attractive because the change in score that results from the
application of an operator can be computed locally. In particular,
given the score for some DAG, only those terms in the sum above
corresponding to nodes whose parents have changed need be re-computed
to derive the resulting score.  As discussed at length in Section
\ref{sec:background}, however, there are some potential problems with
using B-space when learned DAGs are interpreted as independence
constraints in some distribution.

We now define a search space for which the states are equivalence
classes of Bayesian-network structures. We call this search space {\em
equivalence-class space}, or {\em E-space} for short.  We use
completed PDAGs to represent the states of the search in E-space. As
we saw from Lemma \ref{lem:unique} in Section \ref{sec:background},
using completed PDAGs instead of general PDAGs (or DAGs in the case of
B-space) eliminates the problem of having multiple representations for
the same equivalence class.  To complete the specification of E-space,
we define six simple operators that can be applied to completed
PDAGs. The operators are given in Table \ref{tab:operators}.

All operators are subject to the constraint that the resulting graph
is a PDAG (i.e., a graph containing no directed cycles) that admits a
consistent extension. If an operator does, in fact, result in a PDAG
that admits a consistent extension, we say that the operator is {\em
valid}; otherwise, we say the operator is {\em not valid}.

\begin{table}
\begin{enumerate}
\item {\bf (InsertU)} For any pair of nodes $x$ and $y$ that are not adjacent in
\cpdag, we can insert an undirected edge between $x$ and $y$
\item {\bf (DeleteU)} For any undirected edge $x - y$ in $\cpdag$, we can delete the
edge 
\item {\bf (InsertD)} For any pair of nodes $x$ and $y$ that are not
adjacent in $\cpdag$, we can insert a directed edge in either
direction
\item {\bf (DeleteD)} For any directed edge $x \rightarrow y$ in $\cpdag$,
we can delete the edge
\item {\bf (ReverseD)} For any directed edge $x \rightarrow y$ in $\cpdag$, we can
reverse the edge
\item {\bf (MakeV)} For any triple of nodes $x$, $y$ and $z$ in $\cpdag$, such that
(1) $x$ and $z$ are not adjacent, (2) $\cpdag$ contains the undirected
edge $x - y$, and (3) $\cpdag$ contains the undirected edge $y - z$,
we can insert the v-structure $x \rightarrow y \leftarrow z$.
\end{enumerate}
\caption{E-space operators to be applied to completed PDAGs}
\label{tab:operators}
\end{table}

\begin{figure}[t]
\begin{center}
\leavevmode
\epsfxsize=5.75in
%%\epsfxsize=2.5in
\epsffile{Figures/scheme.eps}
\end{center}
\caption{Diagram depicting how operators are applied. The PDAGs on the
bottom give an example for every stage of the process.
\label{fig:scheme}
}
\end{figure}

After applying an operator to a completed PDAG, the resulting PDAG is
not necessarily completed. To recover the completed PDAG resulting
from the operator, we use the transformation algorithms presented in
Section \ref{sec:trans}. For a given completed PDAG, let $\pdag$
denote the PDAG---not necessarily completed---that results after
directly applying one of the operators.  The completed PDAG that
results from the operator is obtained as follows. First, we call the
algorithm {\sc PDAG-to-DAG} with input $\pdag$ to extract a consistent
extension $\gr$. If $\pdag$ does not admit a consistent extension,
then the given operator is not valid. To complete the application, the
algorithm {\sc DAG-to-CPDAG} is called with input $\gr$ to build the
resulting completed PDAG representation. The process of applying an
operator is depicted schematically in Figure \ref{fig:scheme}, and the
application of each operator type is illustrated in Figure
\ref{fig:pdagops}.


\begin{figure}[t]
\begin{center}
\leavevmode
\epsfxsize=6in
%%\epsfxsize=2.5in
\epsffile{Figures/pdagops.eps}
\end{center}
\caption{Example of each type of operator.
\label{fig:pdagops}
}
\end{figure}

There is one other restriction that we place on the insertion
operators. Namely, we only allow the insertion of an edge if it has
the same ``directedness'' in the resulting completed PDAG. In other
words, we cannot insert a directed edge if that edge will be
undirected in the resulting completed PDAG, and we cannot insert an
undirected edge if the edge will be directed in the resulting
completed PDAG. In Figure \ref{fig:pdagops}, for example, we are not
allowed to insert the directed edge $v \rightarrow u$.

As we state in the following theorem, the operators are {\em complete}
for the search space. That is, given any pair of completed PDAGs
$\cpdagone$ and $\cpdagtwo$ that both admit a consistent extension,
there exists a sequence of valid operators that moves from $\cpdagone$
to $\cpdagtwo$. 

\begin{theorem} \label{th:complete}
The operators
InsertU, InsertD, DeleteU, DeleteD, and MakeV are complete.
\end{theorem}

Note that the ReverseD operator is not needed for the space to be
complete; we include it because it adds extra connectivity to the
space without adding too many extra operators. We provide a detailed
proof of Theorem \ref{th:complete} in the appendix, but the intuition
is as follows: we can transform $\cpdagone$ into a completed PDAG with
no edges by deleting edges, either directed or undirected, one at a
time until there are none remaining. We can then construct each of the
v-structures that exist in $\cpdagtwo$ with (one or) two edge
additions followed by (if necessary) a MakeV operator. The remaining
directed edges from $\cpdagtwo$ can then be inserted one at a time
such that they are also directed in $\cpdagone$. Finally, all of the
undirected edges from $\cpdagtwo$ can be added to $\cpdagone$.

Having defined the search space, we can now apply heuristic search
techniques to identify high-scoring equivalence classes using a
scoring criterion that evaluates Bayesian-network structures. In
particular, given a current state, an algorithm can evaluate each
adjacent state (i.e., the state resulting from applying each operator)
by applying the operator to the PDAG and evaluating the score of any
consistent extension of the resulting PDAG.  Chickering (1996a)
provides experimental results that compare E-space\footnote{Chickering
(1996a) does not require the presence of the two undirected edges in
the MakeV operator.} to B-space using this exact approach, in
conjunction with a greedy search algorithm and a Bayesian scoring
criterion. The results show that although the greedy algorithm found
slightly better equivalence classes when applied to E-space, the
algorithm was much faster when applied to B-space. 

\nocite{Chickering96uai}

The extra overhead Chickering (1996a) encountered for E-space stems
from the way operators in E-space are scored. Although these operators
are all simple and local changes to the edges in a completed PDAG, as
we see in the example of Figure \ref{fig:scheme}, an operator can have
``cascading'' effects on the PDAG---and consequently the consistent
extension that is used to score the equivalence class---that
results. To score an operator, it was sometimes necessary to first
apply that operator to the current state, then extract a consistent
extension using PDAG-to-DAG, and finally score the resulting DAG
without being able to exploit locality. The author mentions that most
operators can be scored by applying local changes to a corresponding
consistent extension; unfortunately the few operators that do need to
be applied to the PDAG slow the algorithm enough to raise doubt about
the usefulness of the space.

In the next section, we show how to score all of the operators in
E-space locally, and thus eliminate the drawback of using the state
space.

\section{Local Scoring in E-Space} \label{sec:main}

In this section, we present the main result of this paper; we show how
to score locally all of the E-space operators presented in the
previous section. In particular, we show that given a decomposable,
score-equivalent scoring criterion for DAG models, the increase in
score that results from applying each of the operators can be
calculated by evaluating at most four terms in the sum of Equation
\ref{eq:decompose}.

To describe the local score, we need the concept of a {\em
semi-directed} path. This is the same as a directed path except that
any of the edges may be undirected. More formally we have the
following definition.

\begin{definition} \label{def:semi-directed}
A {\em semi-directed path} from $x$ to $y$ in a PDAG is a path from
$x$ to $y$ such that each edge is either undirected or directed away
from $x$.
\end{definition}

Recall from Section \ref{sec:background} that $\Par{x}$ is the set of
parents of $x$, $\Nei{x}$ is the set of neighbors of $x$,
$\ComNei{x}{y}$ is the set of common neighbors of $x$ and $y$, and
$\ParNei{x}{y}$ is the set of parents of $x$ that are neighbors of
$y$. The following six theorems and corresponding corollaries
demonstrate, for each of the six types of operators, how to test
whether or not the operator is valid and how to score the operator. In
the statement of these results, we have adopted the following notation
to simplify presentation: for a set of nodes $M$ and a node $x$, we
use $M^{+x}$ as shorthand for $M \cup \{x\}$. Similarly, we use
$M^{-x}$ as shorthand for $M \setminus \{x\}$.

\begin{theorem} \label{th:insertu-valid}
Let $x$ and $y$ be two nodes that are not adjacent in $\cpdag$. The
insertion of the undirected edge $x - y$ is valid if and only if (1)
every undirected path between $x$ and $y$ contains a node in
$\ComNei{x}{y}$, and (2) $\Par{x} = \Par{y}$.
\end{theorem}

\begin{corollary} \label{cor:insertu-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid insertion of an undirected edge $x
- y$ to a completed PDAG $\cpdag$ is:
$$
s(y,\ComNei{x}{y}^{+x} \cup \Par{y}) - 
s(y,\ComNei{x}{y} \cup \Par{y})
$$
\end{corollary}

\begin{theorem} \label{th:deleteu-valid}
Let $x-y$ be an undirected edge in completed PDAG $\cpdag$. The
deletion of $x - y$ is valid if and only if $\ComNei{x}{y}$ is a
clique of undirected edges.
\end{theorem}

\begin{corollary} \label{cor:deleteu-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid deletion of an undirected edge
$x-y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\ComNei{x}{y} \cup \Par{y}) - 
s(y,\ComNei{x}{y}^{+x} \cup \Par{y})
$$
\end{corollary}

\begin{theorem} \label{th:insertd-valid}
Let $x$ and $y$ be any two nodes that are not adjacent in 
completed PDAG $\cpdag$. The insertion of $x \rightarrow y$ is valid
if and only if (1) every semi-directed path from $y$ to $x$ contains
at least one node in $\ParNei{x}{y}$, (2) $\ParNei{x}{y}$ is a clique
of undirected edges, and (3) $\Par{x} \not = \Par{y}$.
\end{theorem}

\begin{corollary} \label{cor:insertd-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid insertion of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\ParNei{x}{y} \cup \Par{y}^{+x}) - 
s(y,\ParNei{x}{y} \cup \Par{y})
$$
\end{corollary}

\begin{theorem} \label{th:deleted-valid}
Let $x \rightarrow y$ be a directed edge in completed PDAG
$\cpdag$. The deletion of $x \rightarrow y$ is valid if and only if
$\Nei{y}$ is a clique of undirected edges.
\end{theorem} 

\begin{corollary} \label{cor:deleted-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid deletion of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\Nei{y} \cup \Par{y}^{-x}) - 
s(y,\Nei{y} \cup \Par{y})
$$
\end{corollary}

\begin{theorem} \label{th:reversed-valid}
Let $x \rightarrow y$ be a directed edge in completed PDAG
$\cpdag$. The reversal of $x \rightarrow y$ is valid if and only if
(1) every semi-directed path from $x$ to $y$ that does not include the
edge $x \rightarrow y$ contains at least one node in $\ParNei{y}{x}
\cup \Nei{y}$, and (2) $\ParNei{y}{x}$ is a clique of undirected
edges.
\end{theorem}

\begin{corollary} \label{cor:reversed-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid reversal of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\Par{y}^{-x}) + s(x,\Par{x}^{+y} \cup \ParNei{y}{x})- 
s(y,\Par{y}) - s(x,\Par{x} \cup \ParNei{y}{x})
$$
\end{corollary}

\begin{theorem} \label{th:makev-valid}
Let $x-z-y$ be any length-two undirected path in completed PDAG
$\cpdag$ such that $x$ and $y$ are not adjacent. Replacing the
undirected edges with directed edges to create the v-structure
$\vstruct{x}{z}{y}$ is valid if and only if every undirected path
between $x$ and $y$ contains a node in $\ComNei{x}{y}$.
\end{theorem}

\begin{corollary} \label{cor:makev-score}
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid application of the MakeV operator
is:
$$ 
s(z,\Par{z}^{+y} \cup \ComNei{x}{y}^{-z+x}) + s(y,\Par{y} \cup
\ComNei{x}{y}^{-z})\\- s(z,\Par{z} \cup \ComNei{x}{y}^{-z+x}) - s(y,\Par{y} \cup
\ComNei{x}{y})
$$
\end{corollary}

We provide the proofs of these validity conditions and scores in the
appendix. The proofs for each of the operators follow the same basic
approach, depicted in Figure \ref{fig:approach}. Given the current
state $\cpdag$, we construct a consistent extension $\gr$ such that
the operator can be applied directly to $\gr$, resulting in a DAG
$\gr'$ that is shown to be a consistent extension of the PDAG $\pdag'$
that results from applying the operator to $\cpdag$.  Most of the work
involved in the proofs lies in showing that the validity conditions
for each operator are necessary and sufficient; the local score for
each operator is an immediate corollary of each of the corresponding
validity proofs.

\begin{figure}[th] 
\begin{center}
\leavevmode
\epsfxsize=4.25in
\epsffile{Figures/approach.eps}
\end{center}
\caption{Approach taken to prove each of the operator results.
\label{fig:approach}
}
\end{figure}

In Table \ref{tab:main} we summarize the results of the six theorems
and the corresponding six corollaries. In particular we provide, for
each of the E-space operators, both necessary and sufficient
conditions for that operator to be valid, and the increase in score
that results from applying that operator.  The table, in conjunction
with the two algorithms presented in Section \ref{sec:trans}, should
be sufficient for practitioners to implement an efficient search
algorithm in E-space.

\begin{table}[th] 
\begin{center}
\begin{tabular}{l|l|l}
Operator  &  Validity Tests & Change in Score \\
\hline 
\hline
& & \\[-0.3in]
\parbox{0.75in}{InsertU \\ $x-y$} &
\parbox{2.25in}{
\begin{tabbing}
(1) \= Every undirected path from $x$ \\
\> to $y$ contains a node in $\ComNei{x}{y}$ 
\end{tabbing}
(2) $\Par{x} = \Par{y}$
} &  
\parbox{2.25in}{
$s(y, \ComNei{x}{y}^{+x} \cup \Par{y}) - s(y, \ComNei{x}{y}
\cup \Par{y})$
} \\
& & \\[-0.1in]
\hline 
& & \\[-0.1in]
\parbox{0.75in}{DeleteU \\ $x-y$} & 
\parbox{2.25in}{
$\ComNei{x}{y}$ is a clique
}
&
\parbox{2.25in}{$s(y, \ComNei{x}{y} \cup \Par{y}) - s(y,
\ComNei{x}{y}^{+x} \cup \Par{y})$ } 
\\
& & \\[-0.1in]
 \hline
& & \\[-0.2in]
\parbox{0.75in}{InsertD \\ $x \rightarrow y$} & 
\parbox{2.25in}{
\begin{tabbing}
(1) \= Every semi-directed path from \\
\> $y$ to $x$ contains a node in
$\ParNei{x}{y}$
\end{tabbing}
(2) $\ParNei{x}{y}$ is a clique \\
(3) $\Par{x} \not = \Par{y}$
}
&
\parbox{2.25in}{
$s(y, \ParNei{x}{y} \cup \Par{y}^{+x}) - 
s(y, \ParNei{x}{y} \cup \Par{y})$}
\\
& & \\[-0.1in]
\hline
& & \\[-0.1in]
\parbox{0.75in}{DeleteD \\ $x \rightarrow y$} & 
\parbox{2.25in}{$\Nei{y}$ is a clique} 
&
\parbox{2.25in}{$s(y,  \Nei{y} \cup \Par{y}^{-x}) - 
s(y, \Nei{y} \cup \Par{y})$}
\\
& & \\[-0.1in]
\hline
& & \\[-0.2in]
\parbox{0.75in}{ReverseD \\ $x \rightarrow y$} & 
\parbox{2.25in}{
\begin{tabbing}
(1) \= Every semi-directed path from \\
\> $x$ to $y$ that does not include the \\
\> edge $x \rightarrow y$ contains a node in \\
\> $\ParNei{y}{x} \cup \Nei{y}$
\end{tabbing}
(2) $\ParNei{y}{x}$ is a clique
}
&
\parbox{2.25in}{
$s(y,\Par{y}^{-x}) + s(x,\Par{x}^{+y} \cup \ParNei{y}{x}) \\
- s(y,\Par{y}) - s(x,\Par{x} \cup \ParNei{y}{x})$} 
\\
& & \\[-0.1in]
\hline
& & \\[-0.1in]
\parbox{0.75in}{MakeV \\$x \rightarrow z \leftarrow y$} & 
\parbox{2.25in}{Every undirected path between
$x$ and $y$ contains a node in $\ComNei{x}{y}$} 
&
\parbox{2.25in}{
$s(z,\Par{z}^{+y} \cup \ComNei{x}{y}^{-z+x}) + s(y,\Par{y} \cup
\ComNei{x}{y}^{-z})\\- s(z,\Par{z} \cup \ComNei{x}{y}^{-z+x}) - s(y,\Par{y} \cup
\ComNei{x}{y})$
}
\\
\hline
\end{tabular}
\end{center}
\caption{Necessary and sufficient validity conditions and (local)
change in score for each operator in E-space}
\label{tab:main}
\end{table}

We now consider how to efficiently test the ``path'' validity
conditions. Consider a data structure for a PDAG that allows for
random access to all of the nodes, and direct access to all adjacent
nodes (i.e., parents, children and neighbors) of each node. We can test
for the presence of semi-directed or undirected paths that do not pass
through some set $S$ as follows. First, we mark each node in $S$, and
then we perform a depth-first search from the source node (following
the appropriate links, depending on the type of path) looking for the
destination node, where the search is not allowed to pass through any
marked node (or any previously-visited node). This algorithm takes
time $O(|S| + |E|)$ in the worst case, where $|E|$ is the number of
edges in the graph. If the algorithm ever becomes a bottleneck, a
search algorithm can postpone checking the path conditions until the
operator is tentatively chosen. There is a tradeoff with this trick
because typically search algorithms need the score of an operator to
determine whether or not it is chosen; if we postpone the validity
test, we may end up unnecessarily scoring invalid operators.

In the experiments presented in Section \ref{sec:experiments}, the
time taken to test the validity conditions for the operators was
insignificant.
 
\section{Experimental Results} \label{sec:experiments}

In this section, we evaluate how well a greedy search algorithm
performs, both in terms of solution quality and in terms of search
time, when searching through the space of equivalence classes of
structures. We compare these results to the same algorithm applied to
the space of individual DAGs, in the context of model selection. For
simplicity, we restrict our attention to domains in which all
variables are discrete, but note that there exist score-equivalent
scoring criteria for continuous variables.

Greedy search is a simple algorithm that, given the current state,
always moves to the adjacent state that increases the score the
most. If no adjacent state has a higher score, the algorithm
terminates. In all of our experiments, we used greedy search to
perform model selection; that is, we identify a single equivalence
class with a high score.

We used a Bayesian scoring criterion for discrete variables in all of
our experiments. The criterion, called the {\em BDeu} criterion, is
derived by Heckerman et al. (1995).\footnote{The BDeu criterion is a
version of the BDe criterion that is ``uninformed'' in the sense that
the parameter prior is completely specified by a single equivalent
sample size.} The criterion measures the relative posterior
probability that the distribution from which the data was generated
has the independence constraints given in the DAG. The BDeu criterion
uses a parameter prior that has uniform means, and requires both a
prior equivalence sample size and a structure prior. For all of our
experiments, we used a prior equivalent sample size of ten, and a
structure prior of $0.001^{f}$, where $f$ is the number of free
parameters in the DAG.  Let $q_i$ denote the number of configurations
of the parent set $\Par{x_i}$, and let $r_i$ denote the number of
states of variable $x_i$. Then the version of the BDeu criterion used
in our experiments is:
\begin{equation} \label{eq:bdeu}
S_{BDeu}(\gr, \data) = \log \prod_{i=1}^n 0.001^{(r_i - 1)q_i}
\prod_{j=1}^{q_i}
\frac{\Gamma(\frac{10}{q_i})}{\Gamma(\frac{10}{q_i}+N_{ij})} \cdot
\prod_{k=1}^{r_i} \frac{\Gamma(\frac{10}{r_i \cdot q_i} +
N_{ijk})}{\Gamma(\frac{10}{r_i \cdot q_i})}
\end{equation} 
where $N_{ijk}$ is the number of records in $\data$ for which $x_i =
k$ and $\Par{x_i}$ is in the $jth$ configuration, and $N_{ij} = \sum_k
N_{ijk}$. $\Gamma(\cdot)$ is the {\em Gamma} function, which satisfies
$\Gamma(y + 1) = y \Gamma(y)$ and $\Gamma(1) = 1$. Note from Equation
\ref{eq:bdeu} that our scoring criterion is decomposable. Heckerman et
al. (1995) show that it is score equivalent.

We performed our experiments using the following six real-world
datasets.

\begin{enumerate}
\item {\bf Microsoft Web Training Data (MSWeb)}

This dataset, which is available via anonymous ftp from the UCI
Machine Learning Repository, contains 32711 instances of users
visiting the www.microsoft.com web site on one day in 1996. For each
user, the data contains a variable indicating whether or not that user
visited each of the 292 areas (i.e., ``vroots'') of the site.

\item {\bf Nielsen}

The Nielsen dataset contains data about television watching behavior
during a two-week period in 1995. The data was made available courtesy
of Nielsen Media Research. The data records whether or not each user
watched five or more minutes of network TV shows aired during the
given time period. There were 3314 users in the study, and 402
television shows.

\item {\bf EachMovie}

The EachMovie dataset consists of viewer ratings on movies. The data
was collected during an 18-month period beginning in 1995. We used the
ratings of the 300 most popular movies by 61,265 viewers. The rating
is a discrete variable that is either missing, or is provided as an
integer from one to five.

\item {\bf Media Metrix}

This dataset contains demographic and internet-use data for 4808
individuals during the month of January 1997. We used only the
internet-use variables in our experiments; there are 13 such variables
that indicate the category of web site visited.

\item {\bf 1984 United States Congressional Voting Records (HouseVotes)}

This dataset contains the 1984 congressional voting records for 435
representatives voting on 17 issues, and is available via anonymous ftp from
the UCI Machine Learning Repository. Votes are all three-valued: yes,
no, or unknown. For each representative, the political party is given;
this dataset is typically used in a classification setting to predict
the political party of the representative based on the voting record.

\item {\bf Mushroom}

The Mushroom dataset, available via anonymous ftp from the UCI Machine
Learning Repository, contains physical characteristics of 8124
mushrooms, as well as whether each mushroom is poisonous or
edible. There are 22 physical characteristics for each mushroom, all
of which are discrete.

\end{enumerate}

For all datasets, we assume that values are {\em not} missing at
random. In particular, we treat ``missing'' as a distinct, discrete
state.

Given a particular scoring function, the goal of the search algorithm
in the context of model selection is to identify the single structure
with the highest score.  Consequently, the score of the best state
found by the search algorithm is the best indicator of the algorithm's
performance, and therefore we emphasize the score when we compare
algorithm performance in the two spaces. Many researchers have
compared scoring functions and search algorithms {\em together}. In
this case, learning accuracy is most often measured in terms of a
holdout score. To be consistent with previous work, we have included
such a score for all of our experiments; our holdout score is simply
the average log probability that the selected model assigns to each
case in the holdout set.

Our experiments can be described as follows. Because we include a
holdout score in our results, we first split each of the datasets into
a training set and a test set, consisting of approximately 70 percent
and 30 percent of the total data, respectively. Then, for each
dataset, we applied a greedy search algorithm in both B-space and
E-space, starting each search in the state for which all variables are
mutually independent (i.e., the DAG/PDAG containing no edges). At each
step in the search, we recorded the score achieved by the algorithm as
well as the time taken so far. 

\begin{figure}[th] 
\begin{center}
\leavevmode
\epsfxsize=4.75in
\epsffile{Figures/AllResults.eps}
\end{center}
\caption{Score as a function of time for the six datasets.
\label{fig:AllResults}
}
\end{figure}

In Figure \ref{fig:AllResults}, we give plots showing the score as a
function of elapsed time for each operation application in both
spaces.  For all datasets, the search algorithm moves more quickly to
higher-scoring equivalence classes in E-space. The most remarkable
instance of this is in the EachMovie dataset, where the algorithm
applied to E-space reaches a local maximum in about a third of the
time than it does in B-space.

Of particular interest is the ``gap'' in the curve of Figure
\ref{fig:AllResults}(a). The algorithm in E-space initially moves much
more quickly than in B-space, but then ``stalls'' after 153 operator
moves and the algorithm applied to B-space catches up. After further
investigation, we found that in E-space a MakeV operator was applied
on the 153rd operation, and the {\em majority} of the edges changed
from being undirected to directed; consequently, a large number of
operators that were previously not valid (i.e., directed edge
insertions and directed edge reversals) became valid and thus needed
to be scored. This is a problem with the fact that although all of the
operators can be scored locally, the result of {\em applying} an
operator is a non-local change to the equivalence class. As is evident
from the figure, such a radical change as the one in Figure
\ref{fig:AllResults}(a) is not common, but it is certainly a
concern. Despite the unlucky large change in the equivalence class,
the time to reach a local maximum remains competitive with the search
in B-space.

In Table \ref{tab:results} we summarize the results from the local
maximum reached by each of the algorithms. In the table we report, for
each dataset, (1) the score per training case of the local maximum for
each space, (2) the relative improvement of the per-case score in
E-space to the per-case score in B-space, (3) the holdout score of the
local maximum for each space, (4) the relative improvement of the
holdout score in E-space to the holdout score in B-space, (5) the
ratio of the time spent in B-space to the time spent in E-space. The
relative improvements for the per-case scores and the holdout scores
are computed as the score in E-space minus the score in B-space
divided by the score in B-space. Relative improvements are reported as
opposed to absolute differences so that results may be compared across
the different domains.

\begin{table}[t] 
\begin{center}
\begin{tabular}{l|l|l|l|l|l|l|l}
	& B-space & E-space & Rel. & B-space & E-space & Rel. & Time \\ 
Dataset & Score   & Score   & Imp. & Holdout & Holdout & Imp. & Ratio\\ \hline
MSWeb		& -10.2857 &	-10.2801&	0.0005&	-0.0339& -0.0339&0.0004&0.95\\
Nielsen		& -42.0734 &	-41.9104&	0.0039&	-0.0911& -0.0909&0.0017&0.96\\
EachMovie	& -116.6294&	-116.4836&	0.0013&	-0.3745& -0.3763&-0.0047&2.902\\
Media Metrix	& -9.7208  &	-9.7192	&       0.0002&	-0.7278& -0.7265&0.0017&1.49\\
HouseVotes	& -12.7228 &	-12.7228&	0.0000&	-0.6006& -0.6006&0.0000&1.27\\
Mushroom	&-13.3674 &	-13.8971&      -0.0382&	-0.4873& -0.5563&-0.1241&2.81\\
\end{tabular}
\end{center}
\caption{Results for the model selected by the greedy algorithm
applied to both spaces.}
\label{tab:results}
\end{table}

In all but the Mushroom and HouseVotes datasets, the search in E-space
found a better local maximum---that is, the search identified a model
with a higher value of the scoring criterion---than did the search in
B-space; the two methods identified the same model for the HouseVotes
datasets. As shown by the holdout scores, the models with higher
values of the particular scoring criterion we used in our experiments
tended to have superior out-of-sample prediction accuracy as well. We
should note, however, that the goal of the search algorithm is to
identify the model with the highest possible score; it is somewhat
misleading to compare {\em search algorithms} based on a predictive
score because differences in this score are dependent on the quality
of the scoring criterion.

For both the EachMovie dataset and the Mushroom dataset, the search in
E-space found a local maximum almost three times faster than the
search in B-space. The search in E-space was also significantly faster
in both Media Metrics and HouseVotes. For MSWeb and Nielsen, the
search in E-space was roughly five percent slower than the search in
B-space. The reason the search algorithm can be faster in E-space is
that it often has to score fewer operators. In particular, for a state
containing an undirected edge in E-space, the only operator that the
algorithm needs to consider for that edge is a deletion. When the same
state is considered in B-space, however, the algorithm considers both
a reversal and a deletion of the edge.

To investigate the number of DAG representations for the equivalence
classes that the greedy search visited in E-space, we counted, at each
step of the algorithm, the number of nodes in each undirected
component of the completed PDAG. We get a lower bound on the number of
DAGs in the equivalence class by multiplying these counts together;
every node in an undirected component can be selected as a root node
in that component to yield {\em at least} one unique configuration of
the corresponding directed edges in a consistent extension (see the
appendix). If the undirected component is not a tree structure, there
can be many more such configurations.  In Table \ref{tab:count} we
show, for each dataset, the average and maximum lower bound on the
number of DAGs per equivalence class out of all states visited by the
search algorithm applied to E-space.

As we see from the table, the number of DAGs per equivalence class in
those states that we encounter during our greedy search can be
enormous for some of the datasets, and is significant in all of
them. This suggests that despite the fact that the ratio of DAGs to
equivalence classes may be $3.7$ when considering all possible
equivalence classes defined for a set of nodes (see discussion in
Section \ref{sec:background}), it is important to consider the size of
the particular equivalence classes that the search algorithm is likely
to encounter.

\begin{table}[t] 
\begin{center}
\begin{tabular}{l|l|l}
Dataset 	& Average Lower Bound & Maximum Lower Bound \\ \hline
MSWeb		& $301,000$ & $10,300,000$ \\
Nielsen		& $9.7 \times 10^{19}$ & $7.1 \times 10^{21}$\\
EachMovie	& $22,300$ & $570,000$ \\
Media Metrix	& $8.5$ & $13$ \\
HouseVotes	& $8.6$ & $15$ \\
Mushroom	& $11.5$ & $22$ 
\end{tabular}
\end{center}
\caption{Average and maximum lower bound on the number of DAGs per
equivalence class}
\label{tab:count}
\end{table}

\section{Discussion} \label{sec:conclusion}

In this paper, we have argued that in many situations search
algorithms should search over the set of equivalence classes of
Bayesian-network structures instead of individual structures. We
specified a search space that can be used by any heuristic search
algorithm to explicitly search through the space of equivalence
classes. We showed how all of our operators can be scored locally, and
thus an algorithm searching our search space with a decomposable
scoring criterion is able to evaluate adjacent states efficiently. We
provided experimental evidence, using six real-world datasets, that
suggest that a greedy search in our space can yield slightly better
results in less time than the same search applied to the traditional
space of DAGs.

The results in Section \ref{sec:experiments} do not provide
overwhelming evidence that E-space should be favored over B-space when
using greedy search. This is not too surprising: because a greedy
search algorithm is being applied to B-space, the search is
necessarily moving to a new equivalence class for each operator. Thus
one of the major concerns with using B-space---namely, a loss of
efficiency because the search wastes time traversing within an
equivalence class---is not an issue. Furthermore, because our
experiments were done in the context of model selection, the only
real difference between the two spaces is connectivity.

Our experimental results do, however, demonstrate the efficiency of
traversing through the space of equivalence classes. This suggests
that we can get significant improvements in the running time of
non-greedy search algorithms that can waste time in B-space traversing
redundant representations of equivalence classes. Examples of such
algorithms include Monte-Carlo sampling algorithms (e.g., Madigan et
al., 1996), as well as any number of systematic search algorithms such
as best-first search or limited discrepancy search. We hope that our
contributions will lead to successful implementations of non-greedy
search algorithms.

We saw in Section \ref{sec:experiments} that the (lower bound of the)
number of redundant representations of equivalence classes visited by
greedy search was {\em significantly} higher than the hypothesized
$3.7$ average across {\em all} equivalence classes for a given set of
nodes. This result shows that the degree to which multiple
representations will be a problem is highly dependent on the states
visited by the search algorithm. We suspect that for most (systematic)
search algorithms that spend the majority of time in reasonably sparse
graphs, the average number of redundant representations will be on the
same order as that encountered by the greedy search. It would be
interesting to more rigorously pursue a characterization of those
equivalence classes in which the number of DAGs members is small, and
try to determine whether or not sequences of such classes are likely
to be encountered by heuristic search algorithms.

An interesting extension to this work would be to derive an efficient
set of operators for which the neighbor states of each equivalence
class correspond to the {\em inclusion boundary} defined by Kocka,
Bouckaert and Studeny (2001). The inclusion boundary of an equivalence
class can be explained as follows. Consider the set of all DAGs
contained within a particular equivalence class. To each such DAG, we
can associate a set of equivalence classes that correspond to the DAGs
that result from a single (directed) edge addition or deletion from
that DAG. By taking the union, over all DAGs in the equivalence class,
of these off-by-one equivalence classes, we get the inclusion
boundary.  The inclusion boundary was originally used by Meek (1997)
in a search algorithm that---under the assumption that the ``Meek
Conjecture'' is true---is optimal for infinite data.

Another interesting extension to this work would be to consider the
case when we know the orientation of some of the edges in the ``true''
DAG model; this scenario can occur when learning causal models for
which we know the direction of some of the causal influences. Meek
(1995) considers this ``prior knowledge'' problem and derives rules
for extracting consistent extensions of equivalence classes that are
consistent with the known edges. Our operators would need to be
modified so that only those equivalence classes that are consistent
with the prior knowledge are considered during search.

\nocite{KBS01}
\nocite{Meek97}

\section*{Acknowledgments}

I would like to thank Chris Meek, Dan Geiger, David Heckerman and
Michael Perlman for useful discussions and suggestions. I would also
like to thank the anonymous reviewers for their help with improving
this paper.

%\bibliography{c:/texmf/bibtex/bib/max-master} 

\bibliographystyle{theapa}
\input{staticbib.bbl}

\appendix

\section{Detailed Proofs}

In this section, we provide detailed proofs of the main results
presented in this paper. In Section \ref{sec:prelim-appendix}, we
provide some additional definitions and simple results that will be used
frequently throughout the remainder of the proofs to follow. As shown
in Figure \ref{fig:approach} of Section \ref{sec:main}, our approach
to proving that each operator can be scored locally is to construct a
consistent extension $\gr$ of the current completed PDAG such that the
operator can be applied directly to $\gr$. Proving that such a
consistent extension exists is non-trivial: in Section \ref{sec:mcs}
we prove various general results about consistent extensions that
allow us to guarantee that, for each operator, a consistent extension
with the desired properties exist.

Recall from Section \ref{sec:space} that we place an additional
requirement on the insert operators: we only allow a directed
(undirected) edge addition if the edge is compelled (reversible) in
the resulting equivalence class. This property is important because it
disambiguates the result of the operator for scoring
purposes. Detecting whether or not an inserted edge is compelled or
not turns out to be trivial: an addition between nodes $x$ and $y$
will be compelled if and only if $\Par{x} \not = \Par{y}$. Proving
that this test is correct, however, was not trivial. In Section
\ref{sec:pain}, we provide the proof that this simple test is
correct. 

In Section \ref{sec:op1} through Section \ref{sec:op6}, we use the
results from Section \ref{sec:mcs} and Section \ref{sec:pain} to
prove, for each operator, that the (necessary and sufficient) validity
conditions and increases in score given in Table \ref{tab:main} are
correct. Finally, in Section \ref{sec:complete}, we prove Theorem
\ref{th:complete}; that is, we prove that the operators are
complete.

\subsection{Definitions and Preliminary Results}
\label{sec:prelim-appendix} 

We use the following definitions throughout the proofs. We say a
directed edge $x \rightarrow y$ is {\em covered} if $\Par{x} = \Par{y}
\setminus x$. Similarly, we say an undirected edge is covered if
$\Par{x} = \Par{y}$. A node $x$ is {\em higher} than a node $y$ in a
DAG or PDAG if there is a directed path from $x$ to $y$. 

The following two results, which apply to PDAGs and therefore also
DAGs, provide simple tests for determining that an edge must be
compelled, based on other edges.

\begin{proposition} \label{prop:rule1}
Let $\pdag$ be any PDAG that admits a consistent extension and
contains a compelled edge $x \rightarrow y$. If there is an edge,
either directed or undirected, between $y$ and some node $z$ such that
$z$ and $x$ are not adjacent, then that edge is compelled.
\end{proposition}
{\bf \noindent Proof:} Any consistent extension of $\pdag$ containing
$y \leftarrow z$ contains a v-structure that cannot exist in any
consistent extension containing $y \rightarrow z$, and it follows
immediately from Theorem \ref{th:vp} that the edge therefore cannot be
reversible. \qed

\begin{proposition} \label{prop:rule2}
Let $\pdag$ be any PDAG that admits a consistent extension such that
there is a directed path from $x$ to $y$ consisting of compelled
edges. If there is an edge between $x$ and $y$, it is compelled as $x
\rightarrow y$.
\end{proposition}
{\noindent \bf Proof:} Follows because the directed path from $x$ to
$y$ must also exist in any consistent extension, and because
consistent extensions must be acyclic. \qed

The next lemma was proved by Chickering (1995). We use the result so
frequently that we find it convenient to refer to the lemma as the
{\em Triangle Lemma}.

\begin{lemma} {\bf (Chickering, 1995)} \label{lem:triangle}
Let $\{x,y,z\}$ be any three nodes that form a clique of size three in
PDAG $\pdag$. If any two of the edges in the clique are reversible,
then the third edge is reversible as well.
\end{lemma}

Some of the operators include the condition that either
$\ComNei{x}{y}$ or $\ParNei{x}{y}$ form a clique. A result of the
Triangle Lemma is that for any pair of nodes that both neighbors of
some node, if those nodes are adjacent then they must be connected by
an undirected edge. Consequently, the condition that these sets form a
clique is equivalent to the condition that these sets form a clique
consisting entirely of undirected edges.

\begin{lemma} \label{lem:covered}
If $x - y$ is an undirected edge in a completed PDAG, then $\Par{x}
= \Par{y}$.
\end{lemma}
{\noindent \bf Proof:} Suppose not. Without loss of generality, let
$z$ be any parent of $x$ that is not a parent of $y$. The nodes $z$
and $y$ must be adjacent, lest $x-y$ would be compelled (and hence
directed) by Proposition \ref{prop:rule1}.  If $y \rightarrow z$, then
$x-y$ would be compelled (and hence directed) by Proposition
\ref{prop:rule2}. By the Triangle Lemma (Lemma \ref{lem:triangle}), no
undirected edge can exist between $z$ and $y$. \qed

Lemma \ref{lem:covered} also follows from Condition (iii) in Theorem
4.1 of Andersson et al. (1997).

\begin{lemma} \label{lem:covered-component}
For any directed edge $x \rightarrow y$ in a completed PDAG, $x$ is a
parent of every node reachable by $y$ via undirected edges.
\end{lemma}
{\noindent \bf Proof:} Follows by a repeated application of Lemma
\ref{lem:covered} along the edges in any undirected path. \qed

\begin{lemma} \label{lem:components}
If there is a directed path of length one or more from $x$ to $y$ in a
completed PDAG $\cpdag$, then $x$ and $y$ are not in the same
undirected component.
\end{lemma}
{\noindent \bf Proof:} Suppose not. Consider the last edge $z
\rightarrow y$ in the directed path. By Lemma
\ref{lem:covered-component}, $z$ is a parent of every node in the
undirected component containing $y$, which means that $z$ is a parent
of $x$. Because there is a directed path from $x$ to $z$, $\cpdag$
contains a cycle, contradicting the fact that it is a completed
PDAG. \qed. 

Lemma \ref{lem:components} also follows from the fact that, as proved
by Andersson et al. (1997), a completed PDAG is a chain graph.

\begin{lemma} \label{lem:localInsert}
Let $\cpdag$ denote a completed PDAG, and let $\pdag'$ denote the PDAG
that results from adding a single edge (directed or undirected)
between $x$ and $y$ to $\cpdag$. Consider any consistent extension
$\gr$ of $\cpdag$, and the directed graph $\gr'$ that results by
inserting a directed edge between $x$ and $y$ such that the result is
a DAG with the same adjacencies as $\pdag'$. Any v-structure in $\gr'$
but not in $\pdag'$, or any v-structure in $\pdag'$ but not in $\gr'$,
must include the edge between $x$ and $y$.
\end{lemma}
{\noindent \bf Proof:}
Let $a \rightarrow b \leftarrow c$ be any v-structure in $\gr'$ but
not in $\pdag'$, and assume that neither edge is $x \rightarrow y$ or
$y \rightarrow x$. Because $a$ and $c$ are not adjacent in $\gr'$,
they must also not be adjacent in $\gr$ because $\gr$ contains a
strict subset of the adjacencies of $\gr'$. Because $\gr$ and $\cpdag$
have the same adjacencies, and $\gr'$ and $\pdag'$ have the same
adjacencies, $a$ and $c$ are not adjacent in any of the four
graphs. Because neither edge in the v-structure is between $x$ and
$y$, this v-structure must exist in $\gr$, and consequently the edges
are compelled and the v-structure exists in $\cpdag$. Because $a$ and
$c$ are not adjacent in $\pdag'$, this v-structure remains after
adding the edge between $x$ and $y$, yielding a contradiction.

Let $a \rightarrow b \leftarrow c$ be any v-structure in $\pdag'$ but
not in $\gr'$, and assume that neither edge is the one that was added
to $\cpdag$. This means that the v-structure existed in $\cpdag$, and
consequently the edges are compelled and the v-structure exists in
$\gr$. Because $a$ and $c$ are not adjacent in $\pdag'$, they cannot
be adjacent in $\gr'$, and thus $\gr'$ must contain the v-structure,
yielding a contradiction. \qed 

\begin{lemma} \label{lem:localDelete}
Let $\cpdag$ denote a completed PDAG, and let $\pdag'$ denote the PDAG
that results from deleting a single edge (directed or undirected)
between $x$ and $y$ from $\cpdag$. Consider any consistent extension
$\gr$ of $\cpdag$, and the directed graph $\gr'$ that results by
deleting the (directed) edge between $x$ and $y$ in $\gr$ such that
the result is a DAG with the same adjacencies as $\pdag'$. Any
v-structure in $\gr'$ but not in $\pdag'$, or any v-structure in
$\pdag'$ but not in $\gr'$, must be of the form $x \rightarrow z
\leftarrow y$.
\end{lemma}
{\noindent \bf Proof:} Let $a \rightarrow b \leftarrow c$ be any
v-structure in $\gr'$ but not in $\pdag'$, and assume that $\{a,c\}$
is not $\{x,y\}$. By definition of a v-structure, $a$ and $c$ are not
adjacent in $\gr'$, and because the edges in $\gr$ are the same as in
$\gr'$ except for the extra edge between $x$ and $y$, $a \rightarrow b
\leftarrow c$ must exist in $\gr$. This implies the edges are
compelled and exist in $\cpdag$. Because $\pdag'$ has the same
adjacencies as $\gr'$, these edges were unaffected by the deletion of
the edge between $x$ and $y$ from $\cpdag$, and consequently $a
\rightarrow b \leftarrow c$ must exist in $\pdag'$, yielding a
contradiction.

Let $a \rightarrow b \leftarrow c$ be any v-structure in $\pdag'$ but
not in $\gr'$, and assume that $\{a,c\}$ is not $\{x,y\}$. Because
only the edge between $x$ and $y$ was deleted, $a$ and $c$ are not
adjacent in $\cpdag$. Furthermore, because the edges in $\pdag'$ are
the same as in $\cpdag$ except for the absence of the edge between $x$
and $y$, $a \rightarrow b \leftarrow c$ must exist in $\cpdag$. This
implies that the v-structure must also exist in $\gr$. Because the
v-structure exists in $\pdag'$, neither edge participating in the
v-structure was changed by the deletion in either $\cpdag$ or $\gr$,
and thus the v-structure exists in $\gr'$, yielding a
contradiction. \qed 

\subsection{Extracting Consistent Extensions from Completed PDAGs}
\label{sec:mcs}

In this section, we first describe an algorithm that is a simple
modification to the well-known {\em maximum cardinality search} of
Tarjan and Yannakakis (1984) for determining whether or not a graph is
chordal, and if so, identifying a consistent extension of that graph.
We then show how we can apply this algorithm to extract consistent
extensions from completed PDAGs, such that these consistent extensions
have useful properties for the purpose of proving our main results.

\nocite{TY84}

We first need some definitions.  A graph is {\em chordal} if every
undirected cycle of length at least four has a chord; that is, the
cycle contains a ``short cut'' undirected edge that connects two
non-consecutive nodes in the cycle. A {\em maximal undirected
component} $K$ of a completed PDAG $\cpdag$ is a connected subgraph of
$\cpdag$, where each connecting edge is undirected, and for every node
$x \in K$, if there is an undirected edge $y - x$ in $\cpdag$, then
$y$ is in $K$. For the remainder of this paper, we use {\em
undirected component} to mean maximal undirected component.  We say
that $y$ is a {\em reversible parent} of $x$ in a PDAG or DAG if the
edge $y \rightarrow x$ is reversible. Similarly, we say that $y$ is a
{\em compelled parent} of $x$ if $y \rightarrow x$ is compelled.

The following result was proved by both Meek (1995) and Andersson,
Madigan and Perlman (1997):

\begin{theorem} \label{th:chordal}
The undirected components of a completed PDAG are chordal.
\end{theorem}

In the following lemma, we show that we can extract a consistent
extension from a completed PDAG by independently extracting consistent
extensions from the chordal components of that PDAG. 

\begin{lemma} \label{lem:indep-ce}
Let $\cpdag$ be any completed PDAG and let ${\bf K} = \{K_1, ...,
K_m\}$ be the set of undirected components of $\cpdag$. For any set
$\{\gr_1, ..., \gr_m\}$, where $\gr_i$ is any consistent extension of
the undirected component $K_i$, the graph $\gr$ that results from
replacing each reversible edge in $\cpdag$ with the directed edge from
the consistent extension of the corresponding undirected component is
a consistent extension of $\cpdag$.
\end{lemma}
{\noindent \bf Proof:}
The graph $\gr$ clearly has the same skeleton as $\cpdag$. Any
v-structure in $\gr$ that is not in $\cpdag$ must contain one
reversible edge $x \rightarrow y$ and one compelled edge $y \leftarrow
z$ from $\cpdag$ because none of the chordal components contain
compelled edges. But this is impossible by Proposition
\ref{prop:rule1}. Any v-structure in $\cpdag$ also exists in $\gr$
because $\gr$ contains the same compelled edges and adjacencies as
$\cpdag$. 

It remains to be shown that $\gr$ is acyclic. Suppose there exists a
cycle. Any cycle must include both reversible and compelled edges from
$\cpdag$ because (1) each $\gr_i$ is individually acyclic, (2) any
path that contains edges from two separate $\gr_i$ and $\gr_j$ must
pass through at least one directed edge, and (3) there cannot be a
directed cycle among the compelled edges in $\cpdag$. Consider the
{\em shortest} such cycle, and let $y \rightarrow z$ be any edge in
the cycle that is reversible in $\cpdag$ such that the previous edge
$x \rightarrow y$ is compelled in $\cpdag$. We know that $x$ and $z$
must be adjacent in both $\cpdag$ and $\gr$, else by Proposition
\ref{prop:rule1}, $y \rightarrow z$ could not be reversible. We know
by the Triangle Lemma (Lemma \ref{lem:triangle}) that the edge between
$z$ and $x$ must be compelled. If $x \rightarrow z$, then there is a
shorter cycle, contradicting the fact that we have considered the
shortest. If $x \leftarrow z$, then there is a directed path from $z$
to $y$ in $\cpdag$, which implies by Proposition \ref{prop:rule2} that
the edge between $y$ and $z$ must be compelled, yielding a
contradiction. \qed

Tarjan and Yannakakis (1984) provide an algorithm that can be used to
extract a consistent extension from any (chordal) undirected component
of a completed PDAG. The algorithm proceeds as follows: number the
nodes in the component from $n$ to $1$ in decreasing order, always
selecting next a node adjacent to the largest number of previously
numbered vertices, breaking ties arbitrarily. Tarjan and Yannakakis
(1984) show that if a component is chordal, which every undirected
component in $\cpdag$ must be, then for any pair of non-adjacent nodes
$x$ and $y$, every path between $x$ and $y$ contains a node that has a
higher label than either $x$ or $y$ (or both).

Given a total ordering $\tau$, we say an undirected edge $x-y$ is {\em
directed to be consistent with $\tau$} if it is directed $x
\rightarrow y$ if $\tau(x) > \tau(y)$ and is directed $y \rightarrow
x$ otherwise.

A consequence of the Tarjan and Yannakakis (1984) result is that by
directing all edges in an undirected component to be consistent with
the total ordering from a maximum cardinality search, where higher
ordered nodes always proceed lower ordered nodes, it is impossible to
create a v-structure. Furthermore, because the edges are directed to
be consistent with a total ordering, there can be no cycle in the
resulting component. We emphasize this result with the following
lemma.  We use {\em MCS-ordering} to denote a total ordering that can
be obtained by a maximum cardinality search.

\begin{lemma} \label{lem:mcs}
Let $\{\tau_1, ..., \tau_m\}$ denote a set of MCS-orderings over the
nodes in the $m$ undirected components $K_1, ..., K_m$ of a completed
PDAG $\cpdag$. The DAG $\gr$ that results from directing each
undirected edge within $K_i$ in $\cpdag$ to be consistent with the
corresponding ordering $\tau_i$ is a consistent extension of $\cpdag$.
\end{lemma}
{\noindent \bf Proof:} Follows immediately from Lemma
\ref{lem:indep-ce} and from the fact that each $\tau_i$ yields a
consistent extension of the undirected component $K_i$. \qed

Lemma \ref{lem:mcs} leads to an efficient algorithm---shown in Figure
\ref{fig:algmcs}---for extracting a consistent extension of a
completed PDAG. Note that this algorithm takes as input a {\em
completed} PDAG; it does not work for PDAGs in general, and thus
cannot be used in place of algorithm {\sc PDAG-to-DAG} from Section
\ref{sec:trans}. 

\begin{figure}[t]
\noindent {\sc \bf Algorithm DAG-MCS($\cpdag$)} \\
\noindent {\sc \bf Input:} completed PDAG $\cpdag$ \\
\noindent {\sc \bf Output:} DAG $\gr$ that is a consistent extension
of $\cpdag$
\begin{tabbing}
1. \= Set $\gr = \cpdag$\\
2. \> \parbox[t]{5in}{For each undirected component $K$ in $\cpdag$} \\
3. \> \hspace{0.1in} \parbox[t]{5in}{Mark every node in $K$ as
``unprocessed''}\\
4. \> \hspace{0.1in} \parbox[t]{5in}{{\bf While} there are unprocessed nodes
in $K$}\\
5. \> \hspace{.2in} \parbox[t]{4.8in}{Select the unprocessed node $x$
with the most parents that are in $K$, breaking ties arbitrarily}\\
6. \> \hspace{.2in} \parbox[t]{4.8in}{Direct all undirected edges
incident to $x$ away from $x$ in $\gr$.}\\
7. \> \hspace{.2in} \parbox[t]{4.8in}{Mark $x$ as processed}\\
\end{tabbing}
\caption{Algorithm DAG-MCS which extracts a consistent extension from
a completed PDAG.}
\label{fig:algmcs}
\end{figure}

\begin{lemma} \label{th:mcsalg}
Algorithm DAG-MCS returns a valid consistent extension of the
completed PDAG.
\end{lemma}
{\bf \noindent Proof:} Clearly, the algorithm identifies an
MCS-ordering within each of the undirected components: the number of
parents from $K$ for each node at step 4 is precisely the number of
previously ordered adjacent nodes from the maximum cardinality search
algorithm. By directing the incident undirected edges away from a node
whenever that node is selected in step 4, the resulting component
edges are consistent with the MCS-ordering. \qed 

As we see below, the ``breaking ties arbitrarily'' step in the maximum
cardinality search algorithm allows us some flexibility in choosing
the edge directions in a consistent extension. We use {\em undirected
clique} to denote a clique where every edge is undirected.

\begin{lemma} \label{Xlem:mcs-cliqueX}
Let $X = \{x_1, ..., x_n\}$ be the nodes from any undirected clique of
size $n$ within some undirected component $K$ of a completed PDAG
$\cpdag$, and let $\tau$ denote any total ordering of the nodes in
$X$. There exists a consistent extension of $\cpdag$ for which (1) the
edge orientations among the nodes in $X$ are consistent with $\tau$,
and (2) any edge between $x_i$ and a node $y \in K$ that is not in $X$
is oriented as $x_i \rightarrow y$.
\end{lemma}
{\noindent \bf Proof:} Consider the application of the DAG-MCS
algorithm to $K$. When all nodes from $K$ are unprocessed, we select
$\tau(1)$ in step 4. That is, we break the initial
zero-parents-from-$K$ tie among all nodes in $K$ by choosing
$\tau(1)$. Because the nodes in $X$ form a clique, as long as we
continue to process nodes within $X$ at step 4, all unprocessed nodes
in $X$ will have the same maximum number of parents from within $K$;
consequently when selecting the $ith$ node from $K$ at step 4, we can
always break the tie in favor of $\tau(i)$, and thus condition (1)
holds. After processing all of the nodes in $X$, all undirected edges
incident to every node $x \in X$ will be directed away from $x$, and
thus condition (2) holds.

\begin{lemma} \label{lem:cliqueNeighbors}
Let $\cpdag$ be a completed PDAG that admits a consistent extension,
and let $x$ and $y$ be any pair of nodes that are NOT adjacent. Then
$\ComNei{x}{y}$ is a clique of undirected edges.
\end{lemma}
{\noindent \bf Proof:} Let $w$ and $z$ be any pair of common neighbors
in $\ComNei{x}{y}$ that are not connected by an undirected edge. From
the Triangle Lemma (Lemma \ref{lem:triangle}), we know that $w$ and
$z$ cannot be adjacent. But this implies that $x-z-y-w-x$ is a
four-cycle in an undirected component of $\cpdag$ without a chord,
which contradicts the fact that the undirected components of $\cpdag$
are chordal. \qed


The following lemma uses the results developed in this section to show
how to construct a particular consistent extension that will be used
later to prove the validity conditions and local score for an edge
insertion. Recall that a reversible parent $y$ of a node $x$ is a
parent for which the edge $y \rightarrow x$ is reversible.

\begin{lemma} \label{lem:dpath}
Let $\cpdag$ be any completed PDAG, and let $x$ and $y$ be any pair of
nodes that are NOT adjacent. Every undirected path between $x$ and $y$
passes through a node in $\ComNei{x}{y}$ if and only if there exists a
consistent extension in which (1) $x$ has no reversible parents, (2)
all nodes in $\ComNei{x}{y}$ are parents of $y$, and (3) $y$ has no
other reversible parents.
\end{lemma} 

{\noindent \bf Proof (only if):} We first show that all three
conditions hold if every path between $x$ and $y$ passes through a
node in $\ComNei{x}{y}$. Conditions (1) and (2) hold for any completed
PDAG: from Lemma \ref{lem:cliqueNeighbors}, the set $\ComNei{x}{y}$
forms a clique of undirected edges in $\cpdag$, and because every node
in $\ComNei{x}{y}$ is a neighbor of $x$, the set $\{x\} \cup
\ComNei{x}{y}$ is a clique as well. Thus by Lemma
\ref{Xlem:mcs-cliqueX}, we can construct a consistent extension where
within the component $K$ containing $\ComNei{x}{y}$, $x$ is first (a
root node in $K$), all nodes in $\ComNei{x}{y}$ are next, and all
nodes in $\ComNei{x}{y}$ are parents of $y$.  Note that if
$\ComNei{x}{y}$ is empty, (2) holds trivially. It remains to be shown
that there exists such a consistent extension where $y$ has no other
reversible parents (condition 3).

Suppose that no such consistent extension exists, and let $\gr$ be any
consistent extension constructed as described above, using Algorithm
DAG-MCS.  Let $A \subseteq K$ denote the set of ancestors of $y$ in
$\gr$ that are contained in the undirected component $K$ but are not
contained in $\ComNei{x}{y} \cup x$. From Lemma \ref{lem:components},
it follows that for every edge $e$ in a directed path from a node in
$A$ to $y$ in $\gr$, $e$ is reversible in $\cpdag$. This means that
when component $K$ was being processed by the algorithm, every node in
$A$ was chosen in step 4 before $y$. Consider the step when the {\em
first} node $a_f$ from $A$ was chosen by the algorithm when processing
component $K$. The only processed parents of $a_f$ from $K$ at this
point could be (1) the members of $\ComNei{x}{y}$ or (2) the node $x$;
otherwise, such a parent belongs to $A$, contradicting the fact that
$a_f$ is the first one chosen. If $a_f$ has $x$ as a processed parent,
this means there's an undirected path from $x$ to $y$ that does not
pass through a node in $\ComNei{x}{y}$, which is a contradiction. Thus
the only processed parents of $a_f$ can belong to $\ComNei{x}{y}$, and
therefore $a_f$ had less than or equal to $|\ComNei{x}{y}|$ processed
parents from $K$. If $a_f$ had less than $|\ComNei{x}{y}|$ processed
parents from $K$, $y$ would have been chosen instead of $a_f$ by the
algorithm. Otherwise, we could have broken the tie between $a_f$ and
$y$ in favor of $y$ in step 4 of the algorithm, in which case all
incident undirected edges would have been directed away from $y$.

{\noindent \bf (if):} We now show that if there exists an undirected
path between $x$ and $y$ that does not pass through $\ComNei{x}{y}$,
then at least one of the three conditions does not hold.

Let $x - a_1 - \ldots - a_k - y$ denote the {\em shortest} undirected
path between $x$ and $y$ such that for each $a_i$, $a_i \not \in
\ComNei{x}{y}$. We now show that in any consistent extension of
$\cpdag$ for which $x$ has no reversible parents (condition 1) and all
members of $\ComNei{x}{y}$ are parents of $y$ (condition 2), the last
edge $a_k - y$ in the path is directed into $y$, thus violating
condition 3.

Suppose there exists a consistent extension where the last edge is
oriented as $a_k \leftarrow y$. Let $a \rightarrow b$ be the edge in
the path {\em closest} to $y$---that is, fewest edges away from $y$
along the path---that is oriented in the direction of $y$. We know
this edge must exist because $x$ has no reversible parents. Because $a
\rightarrow b$ is the closest edge to $y$ in the path with this
orientation, we know the next edge is oriented as $b \leftarrow c$. We
know $a$ and $c$ must be adjacent, lest there would be the v-structure
$\vstruct{a}{b}{c}$ containing reversible edges. By the Triangle
Lemma, the edge connecting $a$ and $c$ must be reversible in
$\cpdag$. Because $x$ an $y$ are not adjacent, at least one of $a$ or
$c$ must be an intermediate node (i.e., not equal to $x$ or $y$) along
the path. But this means there is a shorter path between $x$ and $y$,
yielding a contradiction. \qed

\begin{corollary} \label{cor:dpath}
If there exists a consistent extension satisfying the three conditions
of Lemma \ref{lem:dpath}, then there exists such a consistent
extension for which (1) the reversible parents of each node in
$\ComNei{x}{y}$ are contained in $\{x \cup \ComNei{x}{y}\}$, and (2)
the edges among the nodes in $\ComNei{x}{y}$ are oriented to be
consistent with any total ordering over those nodes.
\end{corollary}
{\noindent \bf Proof:} Both conditions follows from the construction
of $\gr$ in the ``only if'' part of Lemma \ref{lem:dpath}. Condition
(1) follows immediately. Condition (2) follows from the fact that
$\gr$ is constructed without specifying the order of the nodes in
$\ComNei{x}{y}$, and by Lemma \ref{Xlem:mcs-cliqueX} which guarantees
that the edges can be oriented to be consistent with any total
ordering. \qed

\subsection{Compelled and Reversible Edge Insertions} \label{sec:pain}

In this section, we prove the following theorem.

\begin{theorem} \label{th:pain}
Let $\cpdag$ be any completed PDAG for which nodes $x$ and $y$ are not
adjacent. If the result of adding an edge between $x$ and $y$ is a
PDAG that admits a consistent extension, then that edge is compelled
if and only if $\Par{x} \not = \Par{y}$.
\end{theorem}

This result is important for determining which of the two insertion
operators (InsertU or InsertD) is valid for a given completed
PDAG. Unfortunately, although the result is simple enough to state and
test, we have not found a {\em simple} proof for the result. Our
approach to proving Theorem \ref{th:pain} relies on the results of a
particular rule-based method (see Section \ref{sec:trans}) for
extracting completed PDAGs from DAGs.

Meek (1995) derives three rules that can be used to identify the
completed PDAG corresponding to a DAG. In particular, all edges of a
DAG are first made to be undirected, except for those edges that
participate in a v-structure. Then, the algorithm repeatedly chooses
one of the three rules, and applies that rule to direct one of the
undirected edges. The algorithm terminates when none of the three
rules can be applied.

The three rules are as follows:
\begin{enumerate}
\item If $w \rightarrow x - y$ with $w$, $y$ not adjacent, then direct
the edge between $x$ and $y$ as $x \rightarrow y$
\item If $x \rightarrow z \rightarrow y$ and $x-y$, then direct the
edge between $x$ and $y$ as $x \rightarrow y$. 
\item If there is a v-structure $\vstruct{w}{y}{z}$ and a node $x$
such that $x$ is a neighbor of $w$, $y$, and $z$, then direct the edge
between $x$ and $y$ as $x \rightarrow y$.
\end{enumerate}

In the proofs to follow, we refer to the rules above as Rule 1, Rule
2, and Rule 3, respectively.  By our Proposition \ref{prop:rule1}, we
see that Rule 1 correctly orients undirected edges, as long as all
previously oriented edges are compelled. Similarly, Rule 2 follows by
an argument similar to our Proposition \ref{prop:rule2}. Rule 3 can be
understood by considering all possible orientations of the undirected
edges in the configuration; only those containing $x \rightarrow y$
are a consistent extension of the configuration subgraph.

Meek (1995) proves that (1) the three rules are sound, and (2) they
are complete given that the edges participating in v-structures are
directed. The result is that not only are we guaranteed that the
procedure described above will identify a completed PDAG, but that it
will do so without regard to the order in which compelled edges are
identified. This will prove important below; we will consider applying
the rules to PDAGs in which edges that are known to be compelled are
directed before the rules are applied to create the completed PDAG.

One reason that Theorem \ref{th:pain} is difficult to prove is that
after inserting edges into a completed PDAG, edges that were
reversible before can become compelled, and edges that were compelled
before can become reversible (as an example, see the operator
shown in Figure \ref{fig:scheme}). In the following lemmas and
corollaries, we demonstrate a method for determining which edges in a
completed PDAG will necessarily remain compelled and reversible after
an edge addition, based on the topology of the completed PDAG.

\begin{proposition} \label{prop:no-creverse}
Let $\cpdag$ be a completed PDAG, and let $\pdag'$ denote the PDAG
that results from an edge addition or deletion. If $\pdag'$ admits a
consistent extension, then any compelled edge $x \rightarrow y$ in
$\cpdag$ cannot be compelled in the opposite direction in $\pdag'$.
\end{proposition}
{\bf \noindent Proof:} Follows because any consistent extension of
$\pdag'$ that contains an edge between $x$ and $y$ has that edge
oriented as $x \rightarrow y$. \qed

\begin{lemma} \label{lem:stay-compelled}
Let $\cpdag$ be any completed PDAG, and let $\pdag'$ denote the PDAG
that results from adding a new adjacency between $a$ and $b$. For any
edge $x \rightarrow y$ in $\cpdag$ that is compelled in $\cpdag$ but
is reversible in $\pdag'$, there is a directed path of length
zero or more from both $a$ and $b$ to $y$ in $\cpdag$.
\end{lemma}
{\noindent \bf Proof:} Suppose this is not the case. Consider any
ordered set of rules applied by the Meek (1995) procedure to orient
the edges in $\cpdag$, staring with all edges undirected except for
those participating in a v-structure. Let $x \rightarrow y$ be the
{\em first} edge whose corresponding rule does not apply after the
addition, such that $y$ is not a descendant of $a$ and $b$ in
$\cpdag$.

We now consider each of the three rules that could have been used to
direct $x \rightarrow y$ in $\cpdag$. Suppose it was Rule 1. Because
the rule does not match after the addition, either the edge $w
\rightarrow x$ from the rule is no longer directed, or $w$ and $y$ are
now adjacent (or both). If $w \rightarrow x$ is no longer directed, we
know that $x$ is a descendant of both $a$ and $b$ because $x
\rightarrow y$ is the first (now undirected) edge for which this
property does not hold. But because $y$ is a descendant of $x$ in
$\cpdag$, it must also be a descendant of both $a$ and $b$, and we
conclude that $w \rightarrow x$ must still be directed. The only
new adjacency is the edge between $a$ and $b$, and therefore (without
loss of generality) $a = w$ and $b = y$. Because there is a directed
path from $w$ to $y$ in $\cpdag$, there is a directed path of length
zero or more from both $a$ and $b$ to $y$, yielding a contradiction. 

Suppose $x \rightarrow y$ was directed by Rule 2. Because the rule no
longer applies, either $x \rightarrow z$ or $z \rightarrow y$ (or
both) is no longer directed. In either case, because both $z$ and $y$
are ancestors of $y$, it follows that $y$ must be a descendant of both
$a$ and $b$, yielding a contradiction. Thus $x \rightarrow y$ could
not have been directed by Rule 3.

Suppose $x \rightarrow y$ was directed by Rule 3. The only way in
which this rule can no longer apply to (undirected) $x-y$ is if $w$
and $z$ are adjacent after the addition; it is easy to show that if
either of the other two undirected edges in the configuration are
previously directed, the remaining edges will be directed by a series
of applications of Rule 1 and Rule 2. This implies that (without loss
of generality) $a=w$ and $b=z$. But in $\cpdag$, both $w$ and $z$ are
parents of $y$, and therefore there is a directed path from both $a$
and $b$ to $y$. We conclude that $x \rightarrow y$ could not have been
directed by Rule 3.

Because $x \rightarrow y$ was not directed by any of the three rules,
it must participate in a v-structure in $\cpdag$ that no longer exists
as a result of the addition of the edge between $a$ and $b$. But
clearly this implies that $a$ and $b$ are both parents of $y$. Thus no
edge $x \rightarrow y$ can exist, and the lemma follows. \qed

The following corollary is essentially a re-statement of Lemma
\ref{lem:stay-compelled}, except that we use the topology of the {\em
result} of the edge addition.

\begin{corollary} \label{cor:stay-compelled-pdag}
Let $\cpdag$ be any completed PDAG, and let $\pdag'$ denote the PDAG
that results from adding a new adjacency between $a$ and $b$. For any
edge $x \rightarrow y$ in $\cpdag$ that is compelled in $\cpdag$ but
is reversible in $\pdag'$, there is a directed path of length
zero or more from both $a$ and $b$ to $y$ in $\pdag'$ that does not
pass through the edge between $a$ and $b$.
\end{corollary}
{\noindent \bf Proof:} Follows immediately from Lemma
\ref{lem:stay-compelled} and the fact that every directed edge in
$\cpdag$ is also directed in $\pdag'$. \qed

\begin{corollary} \label{cor:stay-compelled}
Let $\cpdag$ be any completed PDAG, and let $\pdag'$ denote the PDAG
that results from adding the directed edge $a \rightarrow b$. For any
compelled edge $x \rightarrow y$ in $\cpdag$ such that there a
directed path of length zero or more from both $x$ and $y$ to $b$ in
$\pdag'$, either $y = b$ or that edge remains compelled as $x
\rightarrow y$ in $\pdag'$.
\end{corollary}
{\noindent \bf Proof:} Suppose not. By Lemma \ref{lem:stay-compelled},
there is a directed path from both $a$ and $b$ to $y$ in $\cpdag$. If
$b \not = y$, this implies that $\pdag'$ is cyclic because there is a
directed path from $y$ to $b$. \qed.

\begin{lemma} \label{lem:stay-compelled-head}
Let $\cpdag$ be any completed PDAG, and let $\pdag'$ denote the PDAG
that results from adding the directed edge $a \rightarrow b$. If there
exists a compelled edge $x \rightarrow b$ in $\cpdag$ that is not
compelled in $\pdag'$, then either (1) there exists a v-structure
$\vstruct{a}{b}{z}$ in $\pdag'$, or (2) there exists a length two
directed path $a \rightarrow z \rightarrow b$ in $\cpdag$ such that $a
\rightarrow z$ remains compelled in $\pdag'$.
\end{lemma}
{\noindent \bf Proof:} From Corollary \ref{cor:stay-compelled-pdag}, there
exists a directed path in $\pdag'$ from $a$ to $b$ that does not pass
through the edge $a \rightarrow b$. Let $z \rightarrow b$ be the last
edge in this path. If $z$ is not adjacent to $a$, then the v-structure
$\vstruct{a}{b}{z}$ exists in $\pdag'$.  If $z$ and $a$ are adjacent,
then the edge must be directed as $a \rightarrow z$ in both $\cpdag$
and $\pdag'$ because there is a directed path from $a$ to $z$ in
$\cpdag$. This edge and the edge $z \rightarrow b$ together constitute
the length two directed path. Because $z \not = b$, it follows from
Corollary \ref{cor:stay-compelled} that $a \rightarrow z$ is compelled
in $\pdag'$. \qed

The following lemma complements Lemma \ref{lem:covered} presented in
the previous section.

\begin{lemma} \label{lem:not-covered}
If $x \rightarrow y$ is a directed edge in a completed PDAG, then the
edge is not covered.
\end{lemma}
{\noindent \bf Proof:} If the edge participates in a v-structure,
$\vstruct{x}{y}{z}$, then $z$ is a parent of $y$ that is not a parent
of $x$. If the edge does not participate in a v-structure, then it can
be directed by applying one of the three rules at some point in the
Meek (1995) procedure for constructing a completed PDAG.

If Rule 1 applied, then there is a node $z$ that is a parent of $x$
that is not adjacent to $y$. If Rule 2 applied, then there exists a
parent $z$ of $y$ that is a child of (and hence not a parent of) $x$.
Assume that Rule 3 applied. Then there is a v-structure
$\vstruct{w}{y}{z}$ such that $x$ is a neighbor of $w$, $y$, and
$z$. Because both edges $x - w$ and $x - z$ are undirected at the time
$x \rightarrow y$ is directed via Rule 3, the v-structure
$\vstruct{w}{x}{z}$ cannot exist in the resulting completed PDAG. Thus
at least one of these edges is {\em not} directed into $x$. Without
loss of generality, assume that $w$ is not a parent of $x$ in the
completed PDAG. Then $w$ is a parent of $y$ that is not a parent of
$x$, and consequently $x \rightarrow y$ is not covered. \qed

Lemma \ref{lem:not-covered} also follows from Condition (iv) in
Theorem 4.1 of Andersson et al. (1997).

\begin{lemma} \label{lem:stay-reversible}
Let $x - y$ be any undirected edge in a completed PDAG $\cpdag$, and
let $\pdag'$ denote the PDAG that results from inserting $a
\rightarrow b$ into $\cpdag$. If there is a directed path from both
$x$ and $y$ to $b$ in $\pdag'$, then $x - y$ is reversible in
$\pdag'$.
\end{lemma}
{\noindent \bf Proof:} Suppose this is not the case. Let $\cpdag'$ be
the completed PDAG corresponding to $\pdag'$, and (without loss of
generality) assume that the edge is directed as $x \rightarrow y$ in
$\cpdag'$. 

We now argue that in $\cpdag$, there cannot be a directed path of
length zero or more from $b$ to any of the nodes in the undirected
component $K_{x,y}$ that contains $x$ and $y$.  Suppose there exists
such a path. Because there is a directed path from both $x$ and $y$ to
$b$ in $\cpdag$, we know from Lemma \ref{lem:components} that $b$
cannot itself be contained in $K_{x,y}$, so any such path must be of
length at least one. Consider any directed path of length one or more
from $b$ to a node in $K_{x,y}$, and let $c \rightarrow w$ be the last
edge in this path. From Lemma \ref{lem:covered-component}, $c$ is a
parent of every node in $K_{x,y}$, including both $x$ and $y$, which
yields the contradiction that $\cpdag$ is cyclic.

Because there is no directed path from $b$ to any node in $K_{x,y}$ in
$\cpdag$, it follows from Corollary \ref{cor:stay-compelled} that
every edge $c \rightarrow d$ in $\cpdag$, where $d \in K_{x,y}$,
remains compelled in $\cpdag'$. Because the three rules of Meek (1995)
are sound, we can extract $\cpdag'$ by applying them to the PDAG
$\pdag''$ where all edges are initially undirected, except for the
edges participating in v-structures and those edges that connect the
parents of the nodes in $K_{x,y}$ to the nodes in $K_{x,y}$. For all
three rules, an undirected edge is only directed if it is not covered;
because all edges in $K_{x,y}$ are initially covered in $\pdag''$,
this means that in order for the {\em first} edge contained in
$K_{x,y}$ to be directed by one of the rules, there must an edge $v
\rightarrow w$ in $\cpdag'$ such that $w \in K_{x,y}$ and $v
\leftarrow w$ is in $\cpdag$. But this is impossible by Proposition
\ref{prop:no-creverse}, yielding a contradiction. \qed

\begin{lemma} \label{lem:undirected-comp}
Let $\cpdag$ be a completed PDAG for which $\Par{x} = \Par{y}$, where
$x$ and $y$ are not adjacent. Let $\pdag'$ denote the PDAG that
results from adding $x-y$ to $\cpdag$. Any undirected edge $a-b$ such
that $a$ and $b$ are reachable by an undirected path from either $x$
or $y$ in $\cpdag$ remains reversible in $\pdag'$.
\end{lemma}
{\noindent \bf Proof:} Suppose this is not the case, and let $a-b$ be
the {\em first} undirected edge with these properties that is directed
by an application of the rule-based orientation algorithm of Meek
(1995).

Suppose $a-b$ is directed by Rule 1. That means that there is a
compelled edge $c \rightarrow a$ in $\pdag'$, and that $c$ and $b$ are
not adjacent (our argument is symmetric in $a$ and $b$). If $c
\rightarrow a$ is compelled in $\cpdag$, it follows that $a-b$ would
be compelled in $\cpdag$ by Proposition \ref{prop:rule1}. By
Proposition \ref{prop:no-creverse}, we know that the edge cannot be
compelled as $c \leftarrow a$ in $\cpdag$, and thus it must be
undirected in $\cpdag$. But this implies that there is an edge $c-a$
in $\cpdag$, where both $c$ and $a$ are reachable by an undirected
path from either $x$ or $y$ in $\cpdag$, that was directed {\em
before} $a-b$, contradicting the fact that $a-b$ was the first
undirected edge with these properties to be directed.

Suppose $a-b$ is directed by Rule 2. That means there is a node $c$
such that $a \rightarrow c \rightarrow b$. Because $a-b$ is the first
edge to be chosen, we know that both $a \rightarrow c$ and $c
\rightarrow b$ are compelled in $\cpdag$. Furthermore, from
Proposition \ref{prop:no-creverse}, both of these edges have the same
orientation in $\cpdag$. Thus $a-b$ is not covered, and therefore
cannot be the same edge as $x-y$. We conclude by Proposition
\ref{prop:rule2} that the edge between $a$ and $b$ is compelled,
yielding a contradiction.

Suppose $a-b$ is directed by Rule 3. This means there is a v-structure
$\vstruct{c}{b}{d}$, and undirected edges $a-c$ and $a-d$ at the point
the rule is applied. Clearly, the v-structure must also exist in
$\cpdag$. $a$ and $c$ must be adjacent in $\cpdag$, else Rule 1 would
apply to the undirected edge $a-b$ because $c \rightarrow b$ is in
$\cpdag$. By the same argument, $a$ and $d$ must be adjacent. Because
$a-b$ is reversible in $\cpdag$, we deduce by the Triangle Lemma that
both the edge between $a$ and $c$ and the edge between $a$ and $d$
must be compelled. At least one of these edges must be directed away
from $a$, else $\cpdag$ would contain the extra v-structure
$\vstruct{c}{a}{d}$ that does not exist in $\pdag'$. But this implies
that $a-b$ would be directed by Proposition \ref{prop:rule2} in
$\cpdag$. \qed

We are now able to prove the main result of this section. 

\begin{lemma} \label{lem:consistent-insertd}
Let $\cpdag$ be a completed PDAG where nodes $x$ and $y$ are not
adjacent. Let $\pdag'$ denote the PDAG that results from adding the
edge $x \rightarrow y$ to $\cpdag$. If $\Par{x} \not = \Par{y}$ in
$\cpdag$, then $x \rightarrow y$ is compelled in $\pdag'$. 
\end{lemma}
{\noindent \bf Proof:} Given that $\Par{x} \not = \Par{y}$, either
there is a parent of $x$ that is not a parent of $y$, or a parent of
$y$ that is not a parent of $x$. 

Suppose $z$ is a parent of $x$ but not a parent of $y$ in
$\cpdag$. Because there is a directed path from both $z$ and $x$ to
$y$ in $\pdag'$, and because $x \not = y$, it follows from Corollary
\ref{cor:stay-compelled} that $z \rightarrow x$ remains compelled in
$\pdag'$.  If $z$ and $y$ are not adjacent in $\pdag'$, then $x
\rightarrow y$ is compelled by Proposition \ref{prop:rule1}. If there
is a reversible edge between $z$ and $y$ in $\cpdag$, then because
there is a directed path from both $z$ and $y$ to $y$ in $\pdag'$, the
edge remains reversible in $\pdag'$ by Lemma
\ref{lem:stay-reversible}. But this implies that $x \rightarrow y$ is
compelled in $\pdag'$ by the Triangle Lemma. If there is a compelled
edge between $z$ and $y$ in $\cpdag$, then by our supposition the edge
must be directed as $y \rightarrow z$, which implies that the
insertion of the edge from $x$ to $y$ created a cycle, contradicting
the fact that $\pdag'$ is a PDAG.

Suppose $z$ is a parent of $y$ but not a parent of $x$ in $\cpdag$. If
$x$ and $z$ are not adjacent, then $x \rightarrow y$ is compelled
because it participates in a v-structure. If $z$ and $x$ are adjacent
via an undirected edge, then because there is a directed path from
both $x$ and $z$ to $y$ in $\pdag'$, it follows by Lemma
\ref{lem:stay-reversible} that $x - z$ remains undirected in $\pdag'$.
If $z \rightarrow y$ remains compelled in $\pdag'$, it follows from
the Triangle Lemma that $x \rightarrow y$ is compelled. Otherwise, we
know from Lemma \ref{lem:stay-compelled-head} that either $x
\rightarrow y$ participates in a v-structure in $\pdag'$, in which
case it is compelled in $\pdag'$, or there is a length two directed
path $x \rightarrow w \rightarrow y$ in $\cpdag$ for which $x
\rightarrow w$ remains compelled in $\pdag'$. If $w \rightarrow y$
also remains compelled in $\pdag'$, then $x \rightarrow y$ is
compelled in $\pdag'$ by Proposition \ref{prop:rule2}. Otherwise $x
\rightarrow y$ is compelled in $\pdag'$ by the Triangle Lemma. \qed

\begin{lemma} \label{lem:consistent-insertu}
Let $\cpdag$ be a completed PDAG where nodes $x$ and $y$ are not
adjacent. Let $\pdag'$ denote the PDAG that results from adding the
edge $x - y$ to $\cpdag$. If $\Par{x} = \Par{y}$ in $\cpdag$, then $x
- y$ is reversible in $\pdag'$.
\end{lemma}
{\noindent \bf Proof:} Suppose not, and let $\cpdag'$ be the completed
PDAG corresponding to $\pdag'$. From Lemma \ref{lem:not-covered}, we
know $\Par{x} \not = \Par{y}$ in $\cpdag'$. This means that as a
result of the addition, the parent set of at least one of the nodes
changed. Without loss of generality, assume that the parent set of $y$
changed.

Suppose that there is a parent $z$ of $y$ in $\cpdag$ such that $z
\rightarrow y$ is not compelled in $\pdag'$. From Lemma
\ref{lem:stay-compelled}, there must be a directed path from $x$ to
$y$ in $\cpdag$. Let $w$ be the last node in such a path. Because
$\Par{x} = \Par{y}$ in $\cpdag$, $w$ is a parent of $x$, which implies
that $\cpdag$ is cyclic, yielding a contradiction.

Suppose there is a parent $z$ of $y$ in $\cpdag'$ that is not a parent
of $y$ in $\cpdag$. From Proposition \ref{prop:no-creverse}, $\cpdag$
must contain the undirected edge $z-y$. But it follows immediately
from Lemma \ref{lem:undirected-comp} that $z-y$ remains reversible in
$\pdag'$, yielding a contradiction. \qed \\

{\noindent \bf Theorem \ref{th:pain}} {\it Let $\cpdag$ be any
completed PDAG for which nodes $x$ and $y$ are not adjacent. If the
result of adding an edge between $x$ and $y$ is a PDAG that admits a
consistent extension, then that edge is compelled if and only if
$\Par{x} \not = \Par{y}$. \\ }

{\noindent \bf Proof:} If $\Par{x} \not = \Par{y}$, it follows from
Lemma \ref{lem:consistent-insertd} that if a {\em directed} edge is
inserted between $x$ and $y$, the resulting edge is compelled in the
resulting equivalence class. Suppose an {\em undirected} edge is
inserted between $x$ and $y$. Without loss of generality, assume the
edge is directed as $x \rightarrow y$ in the consistent
extension. Clearly, the consistent extension is also a consistent
extension of the PDAG that results from adding $x \rightarrow y$ to
$\cpdag$ instead of the undirected edge, and thus the edge is
compelled.

If $\Par{x} = \Par{y}$, it follows from Lemma
\ref{lem:consistent-insertu} that if an {\em undirected} edge is
inserted between $x$ and $y$, the resulting edge is reversible in the
resulting equivalence class. Suppose a {\em directed} edge is inserted
between $x$ and $y$. Without loss of generality, assume the edge is
directed as $x \rightarrow y$. We know that if we insert an undirected
edge between $x$ and $y$, the resulting edge is reversible, and thus
there exists a consistent extension that contains the edge $x
\rightarrow y$. Clearly, this consistent extension is also a
consistent extension of the PDAG that results from adding $x
\rightarrow y$ to $\cpdag$. \qed


\subsection{The InsertU Operator} \label{sec:op1}

{\noindent \bf Theorem \ref{th:insertu-valid}}
{\it
Let $x$ and $y$ be two nodes that are not adjacent in $\cpdag$. The
insertion of the undirected edge $x - y$ is valid if and only if (1)
every undirected path between $x$ and $y$ contains a node in
$\ComNei{x}{y}$, and (2) $\Par{x} = \Par{y}$. \\}

{\noindent \bf Proof (if):} Assume that every path between $x$ and $y$
contains a node in $\ComNei{x}{y}$, and that $\Par{x} = \Par{y}$. From
Lemma \ref{lem:dpath}, there exists a consistent extension $\gr$ for
which $x$ has no reversible parents, and the reversible parents of $y$
are precisely those nodes in $\ComNei{x}{y}$. We show that by
inserting the edge $x \rightarrow y$ into $\gr$, the DAG $\gr'$ that
results is a consistent extension of the PDAG $\pdag'$ that results
from applying the operator to $\cpdag$.

By construction of $\gr$, $\gr'$ cannot contain a cycle given that
$\gr$ is acyclic. Clearly, $\gr'$ has the same skeleton as
$\pdag'$. It remains to be shown that $\gr'$ and $\pdag'$ contain the
same v-structures.

Suppose there's a v-structure in $\gr'$ that is not in $\pdag'$. From
Lemma \ref{lem:localInsert}, this v-structure must include the edge $x
\rightarrow y$. By construction, all reversible parents of $y$ are in
$\ComNei{x}{y}$ and hence adjacent to $x$, so the other edge in the
v-structure must be compelled. But by assumption any compelled parent
of $y$ is also a compelled parent of $x$, so no such v-structure can
exist. Suppose there's a v-structure in $\pdag'$ that is not in
$\gr'$. From Lemma \ref{lem:localInsert}, this v-structure must
include the new edge $x-y$, which is impossible because it is
undirected.

To complete the proof, we note that the construction of $\gr$ is
completely symmetric in $x$ and $y$, and thus the resulting edge is
reversible in $\pdag'$. 

{\noindent \bf (only if):} Suppose there is an undirected path from
$x$ to $y$ in $\cpdag$ that does not include any node in
$\ComNei{x}{y}$, and consider the shortest such path. The length of
the path must be at least three, lest the path would pass through a
common neighbor of $x$ and $y$. The path has no chord because it is
shortest. This implies that after adding $x-y$ to $\cpdag$, there is
an undirected cycle of length four or more without a chord, and
consequently there is no consistent extension to $\pdag'$ and thus the
operator is not valid.

Suppose that $x$ and $y$ do not share compelled parents. By Theorem
\ref{th:pain} the added undirected edge is compelled in the resulting
equivalence class, and therefore the insertion is equivalent to
inserting a directed edge, and we conclude that the operator is not
valid. \qed \\

{\noindent \bf Corollary \ref{cor:insertu-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid insertion of an undirected edge $x
- y$ to a completed PDAG $\cpdag$ is:
$$
s(y,\ComNei{x}{y}^{+x} \cup \Par{y}) - 
s(y,\ComNei{x}{y} \cup \Par{y})
$$} 

{\bf \noindent Proof:} Follows immediately by subtracting the score
of $\gr$ from the score of $\gr'$, where these DAGs are defined in
the ``if'' proof of Theorem \ref{th:insertu-valid}.  \qed 

\subsection{The DeleteU Operator} \label{sec:op2}

{\noindent \bf Theorem \ref{th:deleteu-valid}}
{\it
Let $x-y$ be an undirected edge in completed PDAG $\cpdag$. The
deletion of $x - y$ is valid if and only if $\ComNei{x}{y}$ is a
clique of undirected edges.
}\\

{\noindent \bf Proof (if):} Assume that $\ComNei{x}{y} = \{n_1, ...,
n_k\}$ is a clique of undirected edges. By definition of
$\ComNei{x}{y}$, and the fact that $x$ and $y$ are adjacent in
$\cpdag$, the set $\{x \cup y \cup \ComNei{x}{y}\}$ is also a clique
of undirected edges in $\cpdag$. From Lemma \ref{Xlem:mcs-cliqueX},
there exists a consistent extension where (1) the edges in this clique
correspond to the total ordering $x,n_1, ..., n_k, y$, (2) $x$ has no
reversible parents and (3) the reversible parents of $y$ are precisely
those nodes in $\{x \cup \ComNei{x}{y}\}$. Let $\gr$ denote any such
consistent extension, and let $\gr'$ denote the DAG that results from
deleting the edge $x \rightarrow y$ from $\gr$. We now show that
$\gr'$ is a consistent extension of the PDAG $\pdag'$ that results
from deleting $x-y$ from $\cpdag$.

Clearly, $\gr'$ has the same adjacencies as $\pdag'$. We now show that
$\gr'$ and $\pdag'$ have the same set of v-structures. From Lemma
\ref{lem:localDelete}, any v-structure that is in one but not the
other must have the form $\vstruct{x}{z}{y}$. 

Suppose $x \rightarrow z \leftarrow y$ is a v-structure in $\gr'$ that
is not in $\pdag'$. At least one of the two edges must be compelled
because by construction of $\gr$, all common neighbors of $x$ and $y$
are parents of $y$ in $\gr'$. If exactly one of the edges were
compelled, the 3-clique $\{x,y,z\}$ in $\gr$ would contain exactly two
reversible edges, which by the Triangle Lemma is impossible. If both
edges were compelled, then the v-structure exists in $\pdag'$.

Suppose $x \rightarrow z \leftarrow y$ is a v-structure in $\pdag'$
that is not in $\gr'$. This v-structure must consist of previously-compelled
edges from $\cpdag$ because only compelled edges are directed in a
completed PDAG. This implies that
$\gr$ contains both the edge $x \rightarrow z$ and the edge $y
\rightarrow z$, and consequently the v-structure must exist in
$\gr'$ once the edge between $x$ and $y$ is removed, yielding a
contradiction. 

{\noindent \bf (only if):} Assume $\ComNei{x}{y}$ does not form a
clique. This means that there exists common neighbors $n$ and $m$ such
that when $x-y$ is deleted, the resulting PDAG contains the four
cycle $x-n-y-m-x$ without a chord, and thus it does not admit a
consistent extension. \qed  \\

{\noindent \bf Corollary \ref{cor:deleteu-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid deletion of an undirected edge
$x-y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\ComNei{x}{y} \cup \Par{y}) - 
s(y,\ComNei{x}{y}^{+x} \cup \Par{y})
$$
}

{\noindent \bf Proof:} Follows immediately by subtracting the score of
$\gr$ from the score of $\gr'$, where these DAGs are defined in the
``if'' proof of Theorem \ref{th:deleteu-valid}.  \qed 

\subsection{The InsertD Operator} \label{sec:op3}

The main proof for the InsertD operator requires the following results
about semi-directed paths.

\begin{lemma} \label{lem:firstu}
Let $\cpdag$ be a completed PDAG that contains a semi-directed path
from $x$ to $y$. If there exists a directed edge $z \rightarrow w$ in
this path, then there exists a directed path from $z$ to $y$ in
$\cpdag$.
\end{lemma}
{\noindent \bf Proof:} We prove this by induction on the length of the
path segment from $z$ to $y$. For basis, we consider a length one
semi-directed path segment where $w=y$, and the lemma holds
trivially. Assume the lemma is true for path segments of length $k$,
and consider a length $k+1$ path segment from $z$ to $y$. If there are
no undirected edges in this segment, the segment itself constitutes a
directed path from $z$ to $y$. Otherwise, let $a-b$ be the first
undirected edge in the segment that follows $z \rightarrow w$, and let
$c \rightarrow a$ be the directed edge that precedes it. By
Proposition \ref{prop:rule1}, $c$ and $b$ must be adjacent. By the
Triangle Lemma, the edge must be compelled. By Proposition
\ref{prop:rule2} the edge must be directed as $c \rightarrow b$. By
replacing the two edges $c \rightarrow a$ and $a - b$ in the
semi-directed path with the single edge $c \rightarrow b$, we have a
length $k$ semi-directed path segment and the lemma holds by
induction. \qed

Lemma \ref{lem:firstu} can also be proven using two of the properties
of completed PDAGs given by Andersson et al. (1997).

\begin{corollary}
\label{cor:firstu}
Let $\cpdag$ be a completed PDAG. If $\cpdag$ contains a semi-directed
path from $x$ to $y$ consisting of intermediate nodes contained within
some set $N$, then the {\em shortest} semi-directed path whose
intermediate nodes are contained in $N$ consists of exactly two
consecutive segments, where the first segment consists entirely of
undirected edges and the second segment consists entirely of directed
edges.
\end{corollary}
{\noindent \bf Proof:} Follows by considering the argument in the
induction step of the proof of Lemma \ref{lem:firstu} applied to the
shortest path: because the semi-directed path is made shorter by
eliminating the first undirected edge that follows the first directed
edge by eliminating a single intermediate node, it follows that the
shortest semi-directed path can contain no such undirected edge. \qed

\begin{corollary}
\label{cor:twosegnoedge}
Let $\cpdag$ be a completed PDAG. If $\cpdag$ contains a semi-directed
path from $x$ to $y$ consisting of intermediate nodes contained within
some set $N$, then for any {\em shortest} semi-directed path whose
intermediate nodes are contained in $N$, there is no edge in $\cpdag$
that connects a pair of non-consecutive nodes along the path.
\end{corollary}
{\noindent \bf Proof:} Suppose this is not the case, and let $z$ and
$w$ be any pair of non-consecutive nodes from $N \cup \{x\} \cup
\{y\}$ that are connected by an edge in $\cpdag$. Without loss of
generality, assume that $z$ precedes $w$ along the path from $x$ to
$y$. Because we are considering the shortest path, the connecting edge
must be directed as $z \leftarrow w$. From Corollary \ref{cor:firstu},
we know the path consists of an undirected segment followed by a
directed segment. If $z$ follows the last undirected edge along this
path, then the edge $z \leftarrow w$ in conjunction with the directed
path from $z$ to $w$ constitutes a cycle. Otherwise, there is a path
of one or more undirected edges $z - n_1 - ... - n_k$ such that there
is a directed path from $n_k$ to $z$, which is impossible by Lemma
\ref{lem:components}. \qed
 
Recall that $\ParNei{x}{y} = \Par{x} \cap \Nei{y}$ is the set of
parents of node $x$ that are also neighbors of node $y$. \\

{\noindent \bf Theorem \ref{th:insertd-valid}} {\it Let $x$ and $y$ be
any two nodes that are not adjacent in completed PDAG $\cpdag$. The
insertion of $x \rightarrow y$ is valid if and only if (1) every
semi-directed path from $y$ to $x$ contains at least one node in
$\ParNei{x}{y}$, (2) $\ParNei{x}{y}$ is a clique of undirected edges,
and (3) $\Par{x} \not = \Par{y}$.}\\

{\noindent \bf Proof (if):} Suppose that the three conditions
hold. From condition (2) and the fact that $y$ is a neighbor of all
members in $\ParNei{x}{y}$, it follows that $\{y \cup \ParNei{x}{y}\}$
is a clique of undirected edges. Thus by Lemma \ref{Xlem:mcs-cliqueX},
there exists a consistent extension where the reversible parents of
$y$ are precisely those members of $\ParNei{x}{y}$. Let $\gr$ be any
such consistent extension, and let $\gr'$ denote the DAG that results
when we insert the edge $x \rightarrow y$ into $\gr$. We now show that
$\gr'$ is a consistent extension of the PDAG $\pdag'$ that results
from inserting $x \rightarrow y$ into $\cpdag$.

Suppose $\gr'$ contains a cycle. This means there's a directed path
from $y$ to $x$ in $\gr$, and a corresponding semi-directed path from
$y$ to $x$ in $\cpdag$ that passes through the same nodes. By
condition (1), this path must contain some compelled parent $z$ of $x$
that is a neighbor of $y$, which means there is a directed path from
$y$ to $z$ in $\gr$. By construction of $\gr$, however, $z$ is a
parent of $y$, which implies that $\gr$ is cyclic, contradicting the
fact that $\gr$ is a consistent extension of $\cpdag$.

Suppose there is a v-structure in $\pdag'$ that is not in $\gr'$. By
Lemma \ref{lem:localInsert}, this v-structure is of the form
$\vstruct{x}{y}{z}$. Because $x \rightarrow y$ is the only edge
changed by the operation, we know that both (1) $z \rightarrow y$ is
in $\gr'$, and (2) $x$ and $z$ are not adjacent in $\gr'$. But because
$x \rightarrow y$ was added to $\gr$ to create $\gr'$, the v-structure
$\vstruct{x}{y}{z}$ exists in $\gr'$.

Suppose there is a v-structure in $\gr'$ that is not in $\pdag'$. By
Lemma \ref{lem:localInsert}, this v-structure is of the form
$\vstruct{x}{y}{z}$. By construction, we know that $z \rightarrow y$
must be compelled in $\gr$ because the only reversible parents of $y$
are parents of $x$. But this means that $z \rightarrow y$ is directed
in $\cpdag$, and because the edge did not change as a result of the
operation, it must be directed in $\pdag'$ as well. Because the only
adjacency that changed from the operation is the one between $x$ and
$y$, it follows that $z$ and $x$ cannot be adjacent in $\pdag'$, and
consequently the v-structure $\vstruct{x}{y}{z}$ must exit in $\pdag$.
    
To complete the proof of the ``if'' part, we need to show that $x
\rightarrow y$ is compelled in $\pdag'$. This follows immediately from
Theorem \ref{th:pain} given that $\Par{x} \not = \Par{y}$ from
condition (3).

{\noindent \bf (only if):} We now show that if any of the three
conditions are violated, the insertion is not valid.

(1) Suppose there exists a semi-directed path in $\cpdag$ from $y$ to
$x$ that does not contain at least one node in $\ParNei{x}{y}$, and
consider the {\em shortest} such path. 

We first show that this path contains at least three edges. At least
one of the edges in the path must be directed, else $x$ and $y$ are in
the same undirected component in $\cpdag$ and by Lemma
\ref{lem:covered-component} we know that $\Par{x} = \Par{y}$, which in
turn implies by Theorem \ref{th:pain} that the edge would not be
compelled in the resulting equivalence class, and consequently the
insertion is not valid. From Corollary \ref{cor:firstu}, we know the
shortest semi-directed path from $y$ to $x$ (that does not contain a node
from $\ParNei{x}{y}$) consists of an undirected segment followed by a
directed segment. We know that there is at least one edge in the
undirected segment because otherwise there would be a directed path
from $y$ to $x$ and the result of the insertion is a cycle and thus
the insertion is invalid. We know that there must be at least one
additional edge in the path, lest the intermediate node of the path is
in $\ParNei{x}{y}$.

We now consider two additional properties of this shortest path. First, by
Corollary \ref{cor:twosegnoedge}, no two nodes can be adjacent along
the path. Second, because all directed edges are directed away
from $y$, there can be no pair of edges in the path that constitute a
v-structure. 

Combining these two properties with the fact that the path must
contain at least three edges, we see that after adding the edge $x
\rightarrow y$, {\em all} of the undirected edges in the path must be
directed away from $y$ in any directed graph that has the same set of
v-structures. But such an orientation is cyclic, so we conclude the
insertion is not valid.

(2) Suppose $\ParNei{x}{y}$ is {\em not} a clique of undirected
edges. Because $\ParNei{x}{y} \subseteq \Nei{y}$, we know by the
Triangle Lemma that any pair of nodes that are not connected by an
undirected edge are not adjacent. Consequently, there must exist a
v-structure $\vstruct{z}{x}{w}$ where both $z$ and $w$ are neighbors
of $y$. Consider the orientation of the edges $z-y$ and $w-y$ in any
consistent extension of the PDAG $\pdag'$ that results after inserting
$x \rightarrow y$ in $\cpdag$. Because $\Par{x}$ is non-empty, it
follows that $\Par{x} \not = \Par{y}$, and thus from Theorem
\ref{th:pain} we conclude that $x \rightarrow y$ must be compelled in
$\pdag'$. Furthermore, $z \rightarrow x$ and $w \rightarrow x$ remain
compelled in $\pdag'$ because they participate in a v-structure. But
this means that there is a compelled path both from $z$ to $y$ and
from $w$ to $y$ in $\pdag'$, and consequently both the edges $z-y$ and
$w-y$ must be compelled and directed into $y$ in any acyclic
consistent extension. Because $w$ and $z$ are not adjacent, however,
such a consistent extension would include the v-structure
$\vstruct{w}{y}{z}$ that is not in $\pdag'$, which is impossible by
definition of a consistent extension.

(3) If $\Par{x} = \Par{y}$, we know by Theorem \ref{th:pain} that the
edge would not be compelled in the resulting equivalence class, and
therefore the addition is not valid. \qed \\

{\noindent \bf Corollary \ref{cor:insertd-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid insertion of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\ParNei{x}{y} \cup \Par{y}^{+x}) - 
s(y,\ParNei{x}{y} \cup \Par{y})
$$
}

{\noindent \bf Proof:} Follows immediately by subtracting the score of
$\gr$ from the score of $\gr'$, where these DAGs are defined in the
``if'' proof of Theorem \ref{th:insertd-valid}.  \qed

\subsection{The DeleteD Operator} \label{sec:op4}

{\noindent \bf Theorem \ref{th:deleted-valid}}
{\it
Let $x \rightarrow y$ be a directed edge in completed PDAG
$\cpdag$. The deletion of $x \rightarrow y$ is valid if and only if
$\Nei{y}$ is a clique of undirected edges.
}\\

{\noindent \bf Proof (if):} Assume $\Nei{y}$ is a clique of undirected
edges. Because all of these nodes are neighbors of $y$, $\Nei{y} \cup
y$ is also a clique of undirected edges, and by Lemma
\ref{Xlem:mcs-cliqueX}, there exists a consistent extension $\gr$ of
$\cpdag$ where the reversible parents of $y$ are precisely the
elements in $\Nei{y}$. We now show that the DAG $\gr'$ that results
from deleting $x \rightarrow y$ from $\gr$ is a consistent extension
of the PDAG $\pdag'$ that results from deleting $x \rightarrow y$ from
$\cpdag$.

Clearly, $\gr'$ and $\pdag'$ contain the same adjacencies. We now show
that $\gr'$ and $\pdag'$ have the same set of v-structures. From Lemma
\ref{lem:localDelete}, any v-structure that is in one but not the
other must have the form $\vstruct{x}{z}{y}$. 

Suppose the v-structure $\vstruct{x}{z}{y}$ exists in $\gr'$. We know
that $y \rightarrow z$ is a compelled edge in $\cpdag$ because by
construction of $\gr$, any undirected edge incident to $y$ in $\cpdag$
is directed into $y$ in $\gr$. Because $x \rightarrow y$ and $y
\rightarrow z$ are compelled in $\gr$, we know from Proposition
\ref{prop:rule2} that $x \rightarrow z$ is also compelled in
$\gr$. Consequently the v-structure must exist in $\pdag'$.

Suppose the v-structure $\vstruct{x}{z}{y}$ exists in $\pdag'$. Both
edges $x \rightarrow z$ and $y \rightarrow z$ are directed in $\cpdag$
because the only difference between $\cpdag$ and $\pdag'$ is the
presence of $x \rightarrow y$. Consequently, these edges have the same
orientation in both $\gr$ and $\gr'$, and thus the v-structure exists
in $\gr'$.

{\noindent \bf (only if):} Suppose that $\Nei{y}$ does not form a
clique in $\cpdag$. This means there are two nodes $z$ and $w$ such
that $z-y-w$ and $z$ and $w$ are not adjacent (we know by the Triangle
Lemma that $z$ and $w$ cannot be connected by a {\em directed}
edge). From Lemma \ref{lem:covered-component}, $x$ is a parent of both
$z$ and $w$ in $\cpdag$. We now show that $\pdag'$ does not admit a
consistent extension. Because the v-structure $\vstruct{z}{y}{w}$ does
not exist in $\pdag'$, any consistent extension must contain either
the edge $y \rightarrow w$ or the edge $y \rightarrow z$ (or
both). But these edges imply that the consistent extension contains
the v-structures $\vstruct{y}{w}{x}$ and $\vstruct{y}{z}{x}$,
respectively, neither of which are in $\pdag'$. \qed \\

{\noindent \bf Corollary \ref{cor:deleted-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid deletion of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\Nei{y} \cup \Par{y}^{-x}) - 
s(y,\Nei{y} \cup \Par{y})
$$
}

{\noindent \bf Proof:} Follows immediately by subtracting the score of
$\gr$ from the score of $\gr'$, where these DAGs are defined in the
``if'' proof of Theorem \ref{th:deleted-valid}.  \qed 

\subsection{The ReverseD Operator} \label{sec:op5}

For the reversal operator, we need an additional result about
semi-directed paths.

\begin{lemma} \label{lem:semi-neighbor}
Let $x \rightarrow y$ be any edge in a completed PDAG $\cpdag$. If
there exists a semi-directed path from $x$ to $y$ in $\cpdag$ that
does not pass through $x \rightarrow y$, such that the path consists
of intermediate nodes contained within some set $N$, then the {\em
shortest} such path either (1) consists of two segments, the first of
which consists entirely of undirected edges and the second of which
consists entirely of directed edges, or (2) passes through a member of
$\Nei{y}$.
\end{lemma}
{\noindent \bf Proof:} Let $z$ be the next-to-last node (i.e.,
adjacent to $y$) in the shortest such semi-directed path from $x$ to
$y$. The sub-path that ends at $z$ clearly constitutes the shortest
semi-directed path from $x$ to $z$ that consists of nodes within $N$;
that is, this path is the shortest without the restriction that it
cannot pass through $x \rightarrow y$. From Corollary
\ref{cor:firstu}, this path consists of two segments, the first of
which consists entirely of undirected edges and the second of which
consists entirely of directed edges. Therefore if the edge between $z$
and $y$ is directed as $z \rightarrow y$, then (1) holds. If the edge
between $z$ and $y$ is undirected, then $z$ is a neighbor of $y$ and
(2) holds. The edge cannot be directed from $y$ to $z$ by definition
of semi-directed path. \qed. \\

{\noindent \bf Theorem \ref{th:reversed-valid}}
{\it
Let $x \rightarrow y$ be a directed edge in completed PDAG
$\cpdag$. The reversal of $x \rightarrow y$ is valid if and only if
(1) every semi-directed path from $x$ to $y$ that does not include the
edge $x \rightarrow y$ contains at least one node in $\ParNei{y}{x}
\cup \Nei{y}$, and (2) $\ParNei{y}{x}$ is a clique of undirected
edges.}\\

{\noindent \bf Proof (if):} Suppose that both conditions hold. From
condition (2) and the fact that $x$ is a neighbor of all members in
$\ParNei{y}{x}$, it follows that $\{x \cup \ParNei{y}{x}\}$ is a
clique of undirected edges. Thus by Lemma \ref{Xlem:mcs-cliqueX},
there exists a consistent extension where the reversible parents of
$x$ are precisely the members of $\ParNei{y}{x}$. Furthermore, because
there is a directed edge from $x$ to $y$, we know by Lemma
\ref{lem:components} that $x$ and $y$ are not in the same undirected
component; thus by Lemma \ref{lem:indep-ce} there exists such a
consistent extension where $y$ has no reversible parents. Let $\gr$ be
any such consistent extension, and let $\gr'$ be the graph that
results from reversing $x \rightarrow y$ in $\gr$. We now show that
$\gr'$ is a consistent extension of the PDAG $\pdag'$ that results
from reversing $x \rightarrow y$ in $\cpdag$.

Suppose $\gr'$ contains a cycle. This means there is a directed path
from $x$ to $y$ in $\gr$ that does not include the edge $x \rightarrow
y$, and a corresponding semi-directed path from $x$ to $y$ in $\cpdag$
that passes through the same nodes. By construction of $\gr$, $y$ is a
parent of every element in $\Nei{y}$, and consequently the path cannot
pass through any of these elements, lest $\gr$ would be cyclic. Thus
we conclude by condition (1) that the path must contain some element
$z \in \ParNei{y}{x}$, which means there is a directed path from $x$
to $z$ in $\gr$. By construction of $\gr$, however, $z$ is a parent of
$x$, which implies that $\gr$ is cyclic, contradicting the fact that
$\gr$ is a consistent extension of $\cpdag$.

Suppose there is a v-structure in $\pdag'$ that is not in
$\gr'$. Because no adjacencies have changed as a result of the
reversal, this v-structure is of the form $\vstruct{z}{x}{y}$. Because
the edge between $x$ and $y$ is the only edge changed by the
operation, we know that both (1) $z \rightarrow x$ is compelled in
$\cpdag$, and consequently exists in $\gr$, and (2) $y$ and $z$ are
not adjacent in $\gr$. But because $x \rightarrow y$ was reversed in
$\gr$ to create $\gr'$, the v-structure $\vstruct{z}{x}{y}$ exists in
$\gr'$.

Suppose there is a v-structure in $\gr'$ that is not in
$\pdag'$. Because no adjacencies have changed as a result of the
reversal, this v-structure is of the form $\vstruct{z}{x}{y}$. By
construction of $\gr$, we know that $z \rightarrow x$ must be
compelled in $\gr$ because the only reversible parents of $x$ are
parents of $y$. But this means that $z \rightarrow x$ is directed in
$\cpdag$, and because the edge did not change as a result of the
operation, it must be directed in $\pdag'$ as well. Because no
adjacencies changed from the operation, $z$ and $y$ cannot be adjacent
in $\pdag'$, and consequently the v-structure $\vstruct{z}{x}{y}$ must
exit in $\pdag'$.
    
{\noindent \bf (only if):} We now show that if either one of the two
conditions is violated, the reversal is not valid.

(1) Suppose there exists a semi-directed path from $x$ to $y$ that
does not include the edge $x \rightarrow y$ and does not contain at
least one node in $\ParNei{y}{x} \cup \Nei{y}$, and consider the {\em
shortest} such path. It follows from Lemma \ref{lem:semi-neighbor}
that the path must consist of two segments, the first of which
consists entirely of undirected edges and the second of which consists
entirely of directed edges.

We first show that this path contains at least three edges. There is
at least one edge in the directed segment of the path, else by Lemma
\ref{lem:covered-component}, $x$ would be a parent of itself. There is
at least one edge in the undirected segment because otherwise there
would be a directed path from $x$ to $y$ and the result of the
reversal is a cycle and thus the reversal is invalid. We know that
there must be at least one additional edge in the path, lest the
middle node of the path is in $\ParNei{y}{x}$.

We now consider two additional properties of this shortest
path. First, by Corollary \ref{cor:twosegnoedge}, no two nodes can be
adjacent along the path except for $x$ and $y$. Second, because all
directed edges are directed away from $x$, there can be no pair of
edges in the path that constitute a v-structure.

Combining these two properties with the fact that the path must
contain at least three edges, we see that after reversing the edge $x
\rightarrow y$, {\em all} of the undirected edges in the path must be
directed away from $x$ in any directed graph that has the same set of
v-structures. But such an orientation is cyclic, so we conclude the
reversal is not valid.

(2) Suppose $\ParNei{y}{x}$ is {\em not} a clique of undirected
edges. Because $\ParNei{y}{x} \subseteq \Nei{x}$, we know by the
Triangle Lemma that any pair of nodes that are not connected by an
undirected edge must not be adjacent. Consequently, there must exist a
v-structure $\vstruct{z}{y}{w}$ where both $z$ and $w$ are neighbors
of $x$. Consider the orientation of the edges $z-x$ and $w-x$ in any
consistent extension $\gr'$ of the PDAG $\pdag'$ that results after
reversing $x \rightarrow y$ in $\cpdag$.  At least one of these edges
must be directed away from $x$, lest $\gr'$ would contain the
v-structure $\vstruct{z}{x}{w}$ that is not in $\pdag'$. But if either
of these edges is directed into $x$, the resulting graph would contain
a cycle because there is a directed path from both $z$ and $w$ to $x$
in $\gr'$. Thus we conclude that no consistent extension exists. \qed
\\ 

{\noindent \bf Corollary \ref{cor:reversed-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid reversal of a directed edge $x
\rightarrow y$ to a completed PDAG $\cpdag$ is:
$$ 
s(y,\Par{y}^{-x}) + s(x,\Par{x}^{+y} \cup \ParNei{y}{x})- 
s(y,\Par{y}) - s(x,\Par{x} \cup \ParNei{y}{x})
$$
}

{\noindent \bf Proof:} Follows immediately by subtracting the score of
$\gr$ from the score of $\gr'$, where these DAGs are defined in the
``if'' proof of Theorem \ref{th:reversed-valid}.  \qed 

\subsection{The MakeV Operator} \label{sec:op6}

{\noindent \bf Theorem \ref{th:makev-valid}}
{\it
Let $x-z-y$ be any length-two undirected path in completed PDAG
$\cpdag$ such that $x$ and $y$ are not adjacent. Replacing the
undirected edges with directed edges to create the v-structure
$\vstruct{x}{z}{y}$ is valid if and only if every undirected path
between $x$ and $y$ contains a node in $\ComNei{x}{y}$.
} \\

{\noindent \bf Proof (if):} From Lemma \ref{lem:dpath} and Corollary
\ref{cor:dpath}, there exists a consistent extension $\gr$ where $x$
has no reversible parents, the reversible parents of $z$ are precisely
the nodes in $\{x\} \cup \ComNei{x}{y} \setminus \{z\}$ (corresponding
to the total ordering of $\ComNei{x}{y}$ where $z$ is {\em last}), and
the reversible parents of $y$ are precisely the nodes in
$\ComNei{x}{y}$. We now show that the graph $\gr'$ that results from
reversing $z \rightarrow y$ in $\gr$ is a consistent extension of the
PDAG $\pdag'$ that results from applying the MakeV operator to
$\cpdag$.

Clearly the adjacencies in $\gr'$ and $\pdag'$ are the same. Any
v-structure that exists in one of these graphs but not the other must
include the edge $y \rightarrow z$, and thus is of the form
$\vstruct{w}{z}{y}$. Because any compelled edge from $\cpdag$ exists
as a directed edge (with the same orientation) in both $\gr'$ and
$\pdag'$, it follows that $w \rightarrow y$ must be reversible in
$\cpdag$. Because all reversible edges from $\cpdag$ except for $x-z$
and $z-y$ are undirected in $\pdag'$, no such extra v-structure can
exist in $\pdag'$. By construction of $\gr$, every reversible parent
of $z$ is a member of $\{x\} \cup \ComNei{x}{y} \setminus \{z\}$. All
of these nodes, except for $x$, is adjacent to $y$ in $\gr$ and thus
in $\gr'$, and we conclude that the extra v-structure cannot exist in
$\gr'$ either. It remains to be shown that $\gr'$ is acyclic.

Suppose $\gr'$ contains a cycle. This cycle must include the edge $y
\rightarrow z$ because $\gr$ is acyclic. This implies there is a
directed path from $z$ to $y$ in $\gr$. By construction, the only
reversible parents of $y$ in $\gr$ are nodes in $\ComNei{x}{y}$, all
of which are parents of $z$ in $\gr$; thus, this directed path must
enter $y$ via some compelled edge $w \rightarrow y$. Because $z
\rightarrow y$ is reversible in $\gr$, we know by Lemma
\ref{lem:covered-component} that there must also exist the compelled
edge $w \rightarrow z$ in $\gr$. But because there is a directed path
from $z$ to $w$, this implies $\gr$ contains a cycle, which is
impossible.

{\noindent \bf (only if)} We now show that if there is an undirected
path from $x$ to $y$ in $\cpdag$ that does not pass through a node in
$\ComNei{x}{y}$, then there is no consistent extension of $\pdag'$.

Suppose there does exist some consistent extension $\gr'$. Let $x-a_1-
\ldots -a_k-y$ denote the {\em shortest} undirected path between $x$
and $y$ such that for each $a_i$, $a_i \not \in
\ComNei{x}{y}$. Because each $a_i$ is not a common neighbor of $x$ and
$y$, there must be at least two such elements in the path. There can
be no edge connecting two non-consecutive nodes in the path because by
Lemma \ref{lem:components}, any such edge must be undirected and hence
the path would not be shortest. Because there is a {\em disjoint} path
between $x$ and $y$ through the common neighbor $z$ in $\gr$, the
shortest path participates in an undirected cycle of length four or
more. This implies that $z$ must be adjacent to $a_1$, else the
undirected path $z-x-a_1$ in this cycle would not have a chord and
thus $\gr$ would not be chordal.

First we show that the edge between $x$ and $a_1$ is directed as $x
\rightarrow a_1$ in $\gr'$. Suppose this is not the case, and we have
$a_1 \rightarrow x$. Because $\gr'$ contains the edge $x \rightarrow
z$, and $z$ is adjacent to $a_1$, it follows that the edge between
$a_1$ and $z$ must be directed as $a_1 \rightarrow z$, lest $\gr'$
contains a cycle. Because $a_1$ and $y$ are not adjacent and the edge
$y \rightarrow z$ is in $\gr'$, this implies there is a v-structure
$\vstruct{a_1}{z}{y}$ in $\gr'$ that does not exist in $\pdag'$.

Following the same argument as above, we conclude that the edge
between $y$ and $a_k$ must be directed as $y \rightarrow a_k$ in
$\gr'$. But this implies there must be some $a_i$ along the path that
has converging directed edges $a_{i-1} \rightarrow a_i \leftarrow
a_{i+1}$. Because $a_{i-1}$ and $a_{i+1}$ are not adjacent in $\gr'$,
this constitutes a v-structure that is not in $\pdag'$. \qed \\

{\noindent \bf Corollary \ref{cor:makev-score}}
{\it
For any score-equivalent decomposable scoring criterion, the increase
in score that results from a valid application of the MakeV operator
is:
$$ 
s(z,\Par{z}^{+y} \cup \ComNei{x}{y}^{-z+x}) + s(y,\Par{y} \cup
\ComNei{x}{y}^{-z})\\- s(z,\Par{z} \cup \ComNei{x}{y}^{-z+x}) - s(y,\Par{y} \cup
\ComNei{x}{y})
$$
}

{\noindent \bf Proof:} Follows immediately by subtracting the score of
$\gr$ from the score of $\gr'$, where these DAGs are defined in the
``if'' proof of Theorem \ref{th:makev-valid}.  \qed
 
\subsection{Completeness of the Operators} \label{sec:complete}

In this section, we prove Theorem \ref{th:complete}. That is, we prove
that given any pair of completed PDAGs $\cpdagone$ and $\cpdagtwo$,
there exists a sequence of valid operators that can be applied to
$\cpdagone$ that transforms it into $\cpdagtwo$. The proof of this
result is non-trivial, but the sequence of operators can be described
easily as follows: we first transform $\cpdagone$ into a completed
PDAG with no edges by deleting edges, either directed or undirected,
one at a time until there are none remaining. Next, we construct each
of the v-structures that exist in $\cpdagtwo$ with (one or) two edge
additions followed by (if necessary) a MakeV operator. Next, the
remaining directed edges from $\cpdagtwo$ are then inserted one at a
time into $\cpdagone$, such that after each addition, $\cpdagone$
contains only directed edges that are also in $\cpdagtwo$. Finally,
all of the undirected edges from $\cpdagtwo$ are added to $\cpdagone$.

This section is organized as follows. In Section
\ref{sec:phase-delete} through Section \ref{sec:phase-undirected}, we
prove the existence of each ``phase'' of operators as described
above. Finally, in Section \ref{sec:phase-done} we state the proof of
Theorem \ref{th:complete}.

In the sections to follow, we use DeleteD$(x,y)$ to denote a DeleteD
operator that deletes the (directed) edge $x \rightarrow y$. We use
analogous notation for the other edge operators. We use MakeV$(x,y,z)$
to denote the MakeV operator that directs the undirected edges $x-y$
and $z-y$ as $x \rightarrow y$ and $z \rightarrow y$, respectively.

\subsubsection{Deleting All Edges} \label{sec:phase-delete}

From Theorem \ref{th:chordal}, we know that the undirected components
of a completed PDAG are chordal. Following are some results related to
chordal components that will prove to be useful in this section.

\begin{lemma} {\bf (Blair and Peyton, 1993)} \label{lem:chordal-clique}
Within any chordal component, there exists a node $x$ for which $\Nei{x}$
is a clique.
\end{lemma}
\nocite{BlairPeyton93}

Given this result, the following corollaries follow easily.

\begin{corollary} \label{cor:chordal-clique}
For any completed PDAG $\cpdag$ containing at least one undirected
edge, there exists an undirected edge $x - y$ for which
$\ComNei{x}{y}$ is a clique.
\end{corollary}
\noindent{\bf Proof:} From Theorem \ref{th:chordal}, any undirected
edge in $\cpdag$ participates in a chordal component of $\cpdag$. This
component contains at least one edge, and thus from Lemma
\ref{lem:chordal-clique} there is a node $x$ for which $\Nei{x}$ is a
clique and where $|\Nei{x}| > 0$. Let $y$ be any member of $\Nei{x}$;
because $\ComNei{x}{y} \subset \Nei{x}$, the corollary follows because
every subgraph of a clique is itself a clique. \qed

\begin{corollary} \label{cor:delete-deleteu}
Let $\cpdag$ be any completed PDAG containing at least one undirected
edge. Then there exists a valid DeleteU operator that can be applied
to $\cpdag$.
\end{corollary}
{\noindent \bf Proof:} Follows immediately from Corollary
\ref{cor:chordal-clique} and Theorem \ref{th:deleteu-valid}. \qed

Now the main result of this section follows easily.

\begin{lemma} \label{lem:phase-delete}
Let $\cpdag$ be any completed PDAG containing at least one edge. There
exists a sequence of valid DeleteU and DeleteD operators that can
be applied to $\cpdag$ so that the result of applying all operators in
the sequence is a completed PDAG containing no edges.
\end{lemma}
{\noindent \bf Proof:} The proof follows by showing that there must
exist at least one valid DeleteU or DeleteD operator that can be
applied to $\cpdag$; because edges are never added as a result of one
of these operators, the desired sequence can then be constructed
iteratively until the resulting completed PDAG contains no
edges. If $\cpdag$ contains any undirected edges, we know by Corollary
\ref{cor:delete-deleteu} that there is a valid DeleteU operator that
we can apply to $\cpdag$. If $\cpdag$ contains no undirected edges, it
follows immediately from Theorem \ref{th:deleted-valid} that {\em any}
DeleteD operator is valid. \qed

\subsubsection{Inserting V-Structures} \label{sec:phase-v}

In this section, we show that we can convert a completed PDAG that
contains no edges into a completed PDAG that contains precisely the
edges that participate in v-structures in some other completed PDAG by
a sequence of valid InsertD, InsertU, and MakeV operators. To do so,
we need some preliminary results.

\begin{proposition} \label{prop:v2}
Let $\cpdag$ be any completed PDAG for which every edge is a directed
edge that participates in a v-structure. If there is an edge that
participates at least two v-structures, then there exists an edge in
$\cpdag$ that can be deleted such that the resulting PDAG is itself a
completed PDAG for which every edge participates in a v-structure.
\end{proposition}
{\noindent \bf Proof:} Let $x \rightarrow y$ be any edge that
participates in at least two v-structures, and let $Z = \{z_1, ...,
z_k\}$, $k \geq 2$ denote the tail nodes of the other corresponding
edges. That is, $x \rightarrow y \leftarrow z_i$ is a v-structure in
$\cpdag$ for each $z_i \in Z$. If there exists a $z_j \in Z$ for which
the only v-structure in which $z_j \rightarrow y$ participates is the
one with $x \rightarrow y$, then we can delete $z_j \rightarrow y$
from $\cpdag$ and the desired property holds. Otherwise, for every
$z_j \in Z$, $z_j \rightarrow y$ participates in at least one other
v-structure, and thus we can delete $x \rightarrow y$ from $\cpdag$
and the desired property holds. \qed

\begin{proposition} \label{prop:v1}
Let $\cpdag$ be any completed PDAG for which every edge is a directed
edge that participates in a v-structure. If every edge participates in
exactly one v-structure, then the PDAG that results from deleting the
two edges in any v-structure is itself a completed PDAG for which
every edge participates in a v-structure.
\end{proposition}
{\noindent \bf Proof:}
Suppose that after deleting the two edges $x \rightarrow y$ and $z
\rightarrow y$ from a v-structure, there exists some edge that no
longer participates in a v-structure. This implies that the edge
participated in a v-structure with either $x \rightarrow y$ or $z
\rightarrow y$, which contradicts the fact that both of these edges
participate in only one v-structure. \qed

Given these two propositions, we can now prove the main result of this
section. 

\begin{lemma} \label{lem:phase-v}
Let $\cpdag'$ be any completed PDAG for which every edge is a directed
edge that participates in a v-structure. Let $\cpdag$ be any the
completed PDAG that contains no edges. Then there exists a sequence of
valid InsertU, InsertD, and MakeV operators that can be applied
in $\cpdag$ to convert it into $\cpdag'$.
\end{lemma}
{\noindent \bf Proof:} Induction on the number of (directed) edges in
$\cpdag'$. For the base case, if $\cpdag'$ contains no edges then
$\cpdag = \cpdag'$ and thus the lemma holds with a sequence of zero
operators.  For the induction step, we assume the lemma is true
whenever the number of edges is less than $k$, and consider the case
when $\cpdag'$ contains $k$ edges. We note that the number of directed
edges in a completed PDAG can never be one.

If there exists an edge in $\cpdag'$ that participates in at least two
v-structures, then we know from Proposition \ref{prop:v2} that there
exists an edge $x \rightarrow y$ in $\cpdag'$ that we can delete such
that the resulting PDAG $\cpdag''$ is completed and for which every
edge participates in a v-structure. Because $\cpdag''$ contains $k-1$
edges, we know by the induction hypothesis that we can transform
$\cpdag$ into $\cpdag''$ using a sequence of valid operators. The
result of inserting $x \rightarrow y$ into $\cpdag''$ is a PDAG that
is identical to $\cpdag'$, and consequently the operator
InsertD$(x,y)$ is valid and constitutes the last operator in the
desired sequence for the lemma.

To complete the proof, assume that every edge in $\cpdag'$
participates in exactly one v-structure, and let $x \rightarrow y
\leftarrow z$ be any v-structure in $\cpdag'$. From Proposition
\ref{prop:v1}, the PDAG $\cpdag''$ that is identical to $\cpdag'$
except that does not contain either $x \rightarrow y$ or $y \leftarrow
z$ is itself a completed PDAG for which every edge participates in a
v-structure. Because $\cpdag''$ contains $k-2$ edges, we know by the
induction hypothesis that we can transform $\cpdag$ into $\cpdag''$
using a sequence of valid operators. We now show how to convert
$\cpdag''$ into $\cpdag'$ by either two InsertD operators or by two
InsertU operators followed by a MakeV operator.

If $\Par{x} \not = \Par{y}$ or $\Par{z} \not = \Par{y}$ in $\cpdag''$
then assume, without loss of generality, that $\Par{x} \not =
\Par{y}$. Then InsertD$(x,y)$ is a valid operator in $\cpdag''$ and
from Theorem \ref{th:pain} $x \rightarrow y$ is compelled in the
resulting completed PDAG $\cpdag'''$. Furthermore, it is easy to see
that every {\em other} edge in $\cpdag'''$ is compelled because these
edges participate in v-structures that are also in $\cpdag'$ (which
has the adjacency between $x$ and $y$). The result of adding the
directed edge $z \rightarrow y$ to $\cpdag'''$ is a PDAG that is
identical to $\cpdag'$, and thus InsertD$(z,y)$ is a valid operator
that completes the desired sequence for the lemma.

The final possibility is that in $\cpdag''$ we have $\Par{x} = \Par{y}
= \Par{z}$. In this case, we can apply InsertU$(x,y)$ to $\cpdag''$,
which is clearly valid because $\cpdag''$ contains no undirected
edges. Because all other edges in $\cpdag''$ participate in
v-structures that also exist in $\cpdag'$, and because $x$ and $y$ are
adjacent in $\cpdag'$, we know that all of these edges participate in
v-structures after adding $x-y$ to $\cpdag''$, and thus we know from
Theorem \ref{th:pain} that the completed PDAG $\cpdag'''$ that results
from InsertU$(x,y)$ is identical to $\cpdag''$ except that it contains
the undirected edge $x-y$. Following a similar set of arguments, we
see that we can next apply InsertU$(z,y)$ to $\cpdag'''$ and the
resulting PDAG is a completed PDAG $\cpdag''''$ that is identical to
$\cpdag'''$ except that is contains the undirected edge
$z-y$. Finally, because the result of simultaneously directing the two
undirected edges $x-y$ and $z-y$ in $\cpdag''''$ as $x \rightarrow y$
and $z \rightarrow y$ results in a PDAG that is identical to
$\cpdag'$, we conclude that MakeV$(x,y,z)$ is a valid operator that
completes the desired sequence for the lemma. \qed
 
\subsubsection{Inserting Compelled Edges} \label{sec:phase-compelled}

In this section, we consider the phase that transforms the PDAG
containing only edges that participate in v-structures into a PDAG
that only contains compelled edges, using valid InsertD operators.  We
first prove the following proposition about the ``rules approach'' to
identifying compelled edges in an equivalence class; this approach was
detailed in Section \ref{sec:pain}.

\begin{proposition} \label{prop:2rules}
Let $\cpdag$ be any completed PDAG containing only compelled
edges. Then all of the edges not participating in a v-structure can be
identified by undirecting them and then repeatedly applying only Rule
1 and Rule 2 from Section \ref{sec:pain}.
\end{proposition}
\noindent{\bf Proof:} Suppose this is not the case, and consider the
{\em last} edge $x \rightarrow y$ in $\cpdag$ to be oriented using
Rule 3, where there is a v-structure $w \rightarrow y \leftarrow z$
and the edges $x-w$ and $x-z$ have yet to be oriented. It is easy to
see that the orientation of an edge $a \rightarrow b$ depends only on
those edges $c \rightarrow d$ for which there is a directed path (of
length 0 or more) from $d$ to $b$; this property holds for the edges
in the rules themselves, and by the transitivity of the directed-path
relation, it holds in general. Because there is a directed edge from
both $x$ and $w$ to $y$, there cannot be a directed path from $y$ to
either of these nodes, and thus $x-w$ can be oriented without first
orienting $x \rightarrow y$. Thus if we do not perform the last
application of Rule 3, $x-w$ will still be oriented by either Rule 2
or Rule 1. If the edge gets oriented as $x \rightarrow w$, then we can
orient the edge between $x$ and $y$ using Rule 2. Otherwise, ($w
\rightarrow x$), the edge between $x$ and $z$, if not already oriented
as $x \rightarrow z$, will be oriented by Rule 1, in which case we can
again orient the edge between $x$ and $y$ using Rule 2. Because the
last application of Rule 3 is never necessary, none of them are
necessary and the proposition follows. \qed

We can now prove the main result of this section.

\begin{lemma} \label{lem:phase-compelled}
Let $\cpdag'$ be any completed PDAG containing only compelled edges,
and let $\cpdag$ be the completed PDAG containing precisely those
compelled edges that participate in v-structures in $\cpdag'$. Let $c$
denote the number of edges in $\cpdag'$ that do not participate in a
v-structure. Then there exists a sequence of $c+1$ completed PDAGs
$\cpdag = \cpdag_1, ..., \cpdag_{c+1} = \cpdag'$ for which
$\cpdag_{i+1}$ results from a valid InsertD operator applied to
$\cpdag_i$.
\end{lemma}
{\noindent \bf Proof:} Induction on the number of edges $c$ that do
not participate in v-structures in $\cpdag'$. For the basis, if $c=0$
then clearly $\cpdag = \cpdag'$ and the lemma follows trivially. For
the induction step, we assume that the lemma is true for $c < k$ and
consider the case when $c = k$.

From Proposition \ref{prop:2rules}, we can orient all of the $c$
compelled edges not participating in v-structures in $\cpdag'$ by
first undirecting them and then repeatedly applying either Rule 1 or
Rule 2 from Section \ref{sec:pain}. Consider the {\em last} edge $x-y$
that is oriented in $\cpdag'$ using this procedure. Clearly, because
only Rule 1 and Rule 2 are applied, none of the previous edge
orientations depend on the presence of $x-y$. Thus if we delete the
edge between $x$ and $y$ from $\cpdag'$, resulting in $\cpdag''$, all
of the edges remain compelled in the same direction; by the induction
hypothesis there is the desired sequence of valid InsertD operators
that transform $\cpdag$ into $\cpdag''$. Without loss of generality,
assume the edge between $x$ and $y$ is oriented as $x \rightarrow y$
in $\cpdag'$. Because the PDAG that results from adding the edge $x
\rightarrow y$ to $\cpdag''$ is identical to $\cpdag'$, we conclude
that InsertD$(x,y)$ is valid in $\cpdag$, and thus we can append this
operator to the sequence that produced $\cpdag''$ and the lemma
follows. \qed
 
\subsubsection{Inserting Reversible Edges} \label{sec:phase-undirected}

In this section, we show how to transform, using a sequence of valid
InsertU operators, a completed PDAG that contains only directed
edges into a completed PDAG with the same set of directed edges and
additional undirected edges. We first need the following result.

\begin{proposition} \label{prop:delu}
Let $\cpdag$ be any completed PDAG containing the undirected edge
$x-y$. The result of deleting $x-y$ from $\cpdag$ is a completed PDAG
that is identical to $\cpdag$ except that it does not contain the edge
$x-y$. 
\end{proposition}
{\noindent \bf Proof:} Let $\cpdag'$ be the completed PDAG that
results from deleting $x - y$.  It is easy to see that any v-structure
contained in $\cpdag$ remains in $\cpdag'$. Consider the sequence of
rules from Section \ref{sec:pain} applied to $\cpdag$ to determine the
orientation of the compelled edges that do not participate in
v-structures. The only rule that can potentially involve the edge
$x-y$ is Rule 3. Thus every {\em other} rule also applies to
$\cpdag'$. But for any Rule 3 application that involves $x-y$ in
$\cpdag$, the edge that gets oriented in $\cpdag$ must participate in
a v-structure in $\cpdag'$ and is thus already directed. Thus every
directed edge in $\cpdag$ is also directed in $\cpdag'$. The proof
follows by noting that no {\em extra} rule can apply after deleting an
undirected (i.e., reversible) edge. \qed

We can now prove the main result of this section.

\begin{lemma} \label{lem:phase-reversible}
Let $\cpdag'$ be any completed PDAG, and let $\cpdag$ be any completed
PDAG that contains identical directed edges and no undirected
edges. Let $c$ denote the number of undirected edges in
$\cpdag'$. Then there exists a sequence of $c+1$ completed PDAGs
$\cpdag = \cpdag_1, ..., \cpdag_{c+1} = \cpdag'$ for which
$\cpdag_{i+1}$ results from a valid InsertU operator applied to
$\cpdag_i$.
\end{lemma}
\noindent{\bf Proof:} We first show that there exists a sequence of
valid DeleteU operators that can transform $\cpdag'$ into $\cpdag$,
and then show that each such deletion has a corresponding inverse
InsertU operator.

From Corollary \ref{cor:delete-deleteu} in Section
\ref{sec:phase-delete}, as long as there is at least one undirected
edge in $\cpdag'$, there exists a valid DeleteU operator that we can
apply to $\cpdag'$. Furthermore, from Proposition \ref{prop:delu},
after each such deletion, the resulting completed PDAG only differs
from $\cpdag'$ by the presence of the deleted edge. Thus there exists
a sequence $\cpdag' = \cpdag'_1, \ldots, \cpdag'_{c+1} = \cpdag$ for
which $\cpdag'_{i+1}$ results from a valid DeleteU operator applied
to $\cpdag'_i$.

We complete the proof by demonstrating that for any pair $\cpdag'_i,
\cpdag'_{i+1}$, we can transform $\cpdag'_{i+1}$ into $\cpdag'_i$ by a
valid InsertU operator. Let $x-y$ be the edge that was deleted in
the transition from $\cpdag'_i$ to $\cpdag'_{i+1}$, and consider the
PDAG $\pdag$ that results from inserting $x-y$ into
$\cpdag'_{i+1}$. Because $\cpdag'_i$ differs from $\cpdag'_{i+1}$ only
by the presence of $x-y$, it follows that $\pdag = \cpdag'_i$. \qed

\subsubsection{Proof of Theorem \ref{th:complete}} \label{sec:phase-done}

We now provide the proof of Theorem \ref{th:complete}; given the
results from the previous sections, the proof follows easily.\\

{\noindent \bf Theorem \ref{th:complete}} {\it The operators
InsertU, InsertD, DeleteU, DeleteD, and MakeV are complete.}\\

{\noindent \bf Proof:} Consider any pair of completed PDAGs
$\cpdagone$ and $\cpdagtwo$. We can transform $\cpdagone$ into
$\cpdagtwo$ by a sequence of the given operators as follows. First,
from Lemma \ref{lem:phase-delete}, we know we can transform
$\cpdagone$ into the completed PDAG containing no edges by a sequence
of valid DeleteU and DeleteD operators. Next, from Lemma
\ref{lem:phase-v}, we can transform $\cpdagone$ into the completed
PDAG that contains precisely those edges in $\cpdagtwo$ that
participate in v-structures using a sequence of valid InsertU, InsertD
and MakeV operators. Next, from Lemma \ref{lem:phase-compelled}, we
can transform $\cpdagone$ into the completed PDAG that contains
precisely those edges that are compelled in $\cpdagtwo$ using a
sequence of InsertD operators. Finally, from Lemma
\ref{lem:phase-reversible}, we can transform $\cpdagone$ into
$\cpdagtwo$ by a sequence of valid InsertU operators. \qed

\end{document}