Math & Engineering

Experiments in Graph-based Semi-Supervised Learning for Class-Instance Acquisition

Description
Experiments in Graph-based Semi-Supervised Learning for Class-Instance Acquisition Partha Pratim Talukdar * (Search Labs, MSR) Fernando Pereira (Google) ACL, 2010 (Uppsala, Sweden) * Work done while at
Published
of 75
25
Published
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Share
Transcript
Experiments in Graph-based Semi-Supervised Learning for Class-Instance Acquisition Partha Pratim Talukdar * (Search Labs, MSR) Fernando Pereira (Google) ACL, 2010 (Uppsala, Sweden) * Work done while at the Univ. of Pennsylvania 2 Class-Instance Acquisition Class-Instance Acquisition Given an entity, assign human readable descriptors to it e.g., Toyota is a car manufacturer, japanese company, multinational company,... 2 Class-Instance Acquisition Given an entity, assign human readable descriptors to it e.g., Toyota is a car manufacturer, japanese company, multinational company,... Large scale, open domain 2 Class-Instance Acquisition Given an entity, assign human readable descriptors to it e.g., Toyota is a car manufacturer, japanese company, multinational company,... Large scale, open domain Applications Web search, Advertising, etc. 2 Graph-based Class-Instance Acquisition 3 Graph-based Class-Instance Acquisition Table Mining Set 1 Billy Joel (0.75) Johnny Cash (0.73) Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern 3 Graph-based Class-Instance Acquisition Table Mining Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern 3 Graph-based Class-Instance Acquisition Cluster ID Table Mining Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern 3 Graph-based Class-Instance Acquisition Cluster ID Table Mining Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern Extraction Confidence 3 Graph-based Class-Instance Acquisition Cluster ID Table Mining Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern 0.95 Bob Dylan Extraction Confidence Set Johnny Cash Set Billy Joel Graph-based Class-Instance Acquisition Cluster ID Table Mining Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Pattern 0.95 Bob Dylan Extraction Confidence Set Musician 1.0 Johnny Cash 3 Set Billy Joel Musician 1.0 Seed Classes Graph-based Class-Instance Table Mining Acquisition Musician Set 1 Billy Joel (0.75) Johnny Cash (0.73) Singer Set 2 Bob Dylan (0.95) Johnny Cash (0.87) Billy Joel (0.82) Cluster ID Pattern Can we infer that Bob Dylan is also a Musician, as that is missing in current extractions? 0.95 Bob Dylan Extraction Confidence Set Musician 1.0 Johnny Cash 3 Set Billy Joel Musician 1.0 Seed Classes Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008] 0.95 Bob Dylan Set Johnny Cash Set Billy Joel 4 Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008] Set Bob Dylan Initialization 0.87 Set Musician 1.0 Johnny Cash Musician 1.0 Seed Labels Musician Billy Joel Musician 1.0 Predicted Labels Musician 1.0 4 Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008] Set Bob Dylan Iteration Musician 0.8 Set Musician 1.0 Johnny Cash Musician 1.0 Seed Labels Musician Billy Joel Musician 1.0 Predicted Labels Musician 1.0 4 Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008] Set Bob Dylan Musician 0.6 Iteration Musician 0.8 Set Musician 1.0 Johnny Cash Musician 1.0 Seed Labels Musician Billy Joel Musician 1.0 Predicted Labels Musician 1.0 4 Graph-based Class-Instance Acquisition Overview & previous work Review & comparison of three SSL algorithms LP-ZGL (Zhu+, 2003) Adsorption (Baluja+, 2008) Modified Adsorption (MAD) (Talukdar and Crammer, 2009) Additional Semantic Constraints Summary Outline 5 Graph-based Class-Instance Acquisition Overview & previous work Review & comparison of three SSL algorithms LP-ZGL (Zhu+, 2003) Adsorption (Baluja+, 2008) Modified Adsorption (MAD) (Talukdar and Crammer, 2009) Additional Semantic Constraints Summary Outline 5 Comments on the Constructed Graph Set 2 Set Bob Dylan Johnny Cash Billy Joel Musician 1.0 Musician 1.0 6 Comments on the Constructed Graph Set 2 Set Bob Dylan Musician 1.0 Johnny Cash Smoothness: Nodes connected by an edge should be assigned similar classes, as enforced by edge weight Billy Joel Musician 1.0 6 Comments on the Constructed Graph Initial Clusters Set 2 Set Bob Dylan Musician 1.0 Johnny Cash Smoothness: Nodes connected by an edge should be assigned similar classes, as enforced by edge weight Billy Joel Musician 1.0 6 Comments on the Constructed Graph Initial Clusters Set 2 Set Bob Dylan Musician 1.0 Johnny Cash Smoothness: Nodes connected by an edge should be assigned similar classes, as enforced by edge weight. 6 Coupling Node: Force (softly) all instance nodes connected to it to have similar class labels, exploiting the Smoothness requirement Billy Joel Musician 1.0 Comments on the Constructed Graph Initial Clusters Set 2 Set Bob Dylan Musician 1.0 Johnny Cash Smoothness: Nodes connected by an edge should be assigned similar classes, as enforced by edge weight. 6 Coupling Node: Force (softly) all instance nodes connected to it to have similar class labels, exploiting the Smoothness requirement Billy Joel Musician 1.0 Seed classes can be different from the cluster IDs LP-ZGL [Zhu et al., ICML 2003] LP-ZGL [Zhu et al., ICML 2003] m labels W : edge weight matrix Ŷ ul : weight of label l on node u Y ul : seed weight of label l on node u S: diagonal matrix, non-zero for seed nodes LP-ZGL [Zhu et al., ICML 2003] arg min Ŷ m l=1 u,v W uv (Ŷ ul Ŷ vl ) 2 ; s.t. SŶ l = SY l m labels W : edge weight matrix Ŷ ul : weight of label l on node u Y ul : seed weight of label l on node u S: diagonal matrix, non-zero for seed nodes LP-ZGL [Zhu et al., ICML 2003] arg min Ŷ m l=1 u,v W uv (Ŷ ul Ŷ vl ) 2 ; Smooth s.t. SŶ l = SY l m labels W : edge weight matrix Ŷ ul : weight of label l on node u Y ul : seed weight of label l on node u S: diagonal matrix, non-zero for seed nodes LP-ZGL [Zhu et al., ICML 2003] arg min Ŷ m l=1 u,v W uv (Ŷ ul Ŷ vl ) 2 ; Smooth s.t. SŶ l = SY l Match Seeds (hard) m labels W : edge weight matrix Ŷ ul : weight of label l on node u Y ul : seed weight of label l on node u S: diagonal matrix, non-zero for seed nodes 8 Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009] Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009] arg min Ŷ m+1 l=1 SŶ l SY l 2 + µ 1 M uv (Ŷ ul Ŷ vl) 2 + µ 2 Ŷ l R l 2 m labels, +1 dummy label M = W + W is the symmetrized weight matrix Ŷ vl: weight of label l on node v u,v Y vl : seed weight for label l on node v S: diagonal matrix, nonzero for seed nodes R vl : regularization target for label l on node v 8 Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009] arg min Ŷ m+1 l=1 Match Seeds (soft) SŶ l SY l 2 + µ 1 M uv (Ŷ ul Ŷ vl) 2 + µ 2 Ŷ l R l 2 m labels, +1 dummy label M = W + W is the symmetrized weight matrix Ŷ vl: weight of label l on node v u,v Y vl : seed weight for label l on node v S: diagonal matrix, nonzero for seed nodes R vl : regularization target for label l on node v 8 Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009] arg min Ŷ m+1 l=1 Match Seeds (soft) Smooth SŶ l SY l 2 + µ 1 M uv (Ŷ ul Ŷ vl) 2 + µ 2 Ŷ l R l 2 m labels, +1 dummy label M = W + W is the symmetrized weight matrix Ŷ vl: weight of label l on node v u,v Y vl : seed weight for label l on node v S: diagonal matrix, nonzero for seed nodes R vl : regularization target for label l on node v 8 Modified Adsorption (MAD) arg min Ŷ m+1 l=1 [Talukdar and Crammer, ECML 2009] Match Seeds (soft) Smooth SŶ l SY l 2 + µ 1 M uv (Ŷ ul Ŷ vl) 2 + µ 2 Ŷ l R l 2 m labels, +1 dummy label M = W + W is the symmetrized weight matrix Ŷ vl: weight of label l on node v u,v Y vl : seed weight for label l on node v S: diagonal matrix, nonzero for seed nodes R vl : regularization target for label l on node v Match Priors (Regularizer) 8 Modified Adsorption (MAD) arg min Ŷ m+1 l=1 [Talukdar and Crammer, ECML 2009] Match Seeds (soft) Smooth SŶ l SY l 2 + µ 1 M uv (Ŷ ul Ŷ vl) 2 + µ 2 Ŷ l R l 2 m labels, +1 dummy label M = W + W is the symmetrized weight matrix u,v Match Priors (Regularizer) Ŷ vl: weight of label l on node v Y vl : seed weight for label l on node v S: diagonal matrix, nonzero for seed nodes MAD has extra regularization compared to LP-ZGL R vl : regularization target for label l on node v 8 MAD (contd.) Inputs Y, R : V ( L + 1), W : V V, S : V V diagonal Ŷ Y M = W + W Z v S vv + µ 1 u=v M vu + µ 2 v V repeat for all v V do Ŷ v 1 (SY ) v + µ 1 M v Ŷ + µ 2 R v Z v end for until convergence 9 MAD (contd.) Inputs Y, R : V ( L + 1), W : V V, S : V V diagonal Ŷ Y M = W + W Z v S vv + µ 1 u=v M vu + µ 2 v V repeat for all v V do Ŷ v 1 (SY ) v + µ 1 M v Ŷ + µ 2 R v Z v end for until convergence Easily Parallelizable: Scalable 9 MAD (contd.) Inputs Y, R : V ( L + 1), W : V V, S : V V diagonal Ŷ Y M = W + W Z v S vv + µ 1 u=v M vu + µ 2 v V repeat for all v V do Ŷ v 1 (SY ) v + µ 1 M v Ŷ + µ 2 R v Z v end for until convergence Easily Parallelizable: Scalable Importance of a node can be discounted 9 MAD (contd.) Inputs Y, R : V ( L + 1), W : V V, S : V V diagonal Ŷ Y M = W + W Z v S vv + µ 1 u=v M vu + µ 2 v V repeat for all v V do Ŷ v 1 (SY ) v + µ 1 M v Ŷ + µ 2 R v Z v end for until convergence Easily Parallelizable: Scalable Importance of a node can be discounted Convergence guarantee under mild conditions 9 Experiments with Public Datasets 10 Experiments with Public Datasets Previous research used proprietary datasets difficult to reproduce and extend 10 Experiments with Public Datasets Previous research used proprietary datasets difficult to reproduce and extend Public datasets are used in the paper Freebase: relational tables from multiple sources TextRunner (Ritter+, 2009): Open-domain IE System YAGO (Suchanek+, 2007): KB curated from Wikipedia and WordNet Gold standard hypernyms: [Pantel+, 2009], WordNet 10 Experiments with Public Datasets Previous research used proprietary datasets difficult to reproduce and extend Public datasets are used in the paper Freebase: relational tables from multiple sources TextRunner (Ritter+, 2009): Open-domain IE System YAGO (Suchanek+, 2007): KB curated from Wikipedia and WordNet Gold standard hypernyms: [Pantel+, 2009], WordNet 10 Available at: Graph Stats Statistics of Graphs used in Experiments 11 Evaluation Metric: Mean Reciprocal Rank (MRR) MRR = 1 test-set v test-set 1 rank v (class(v)) 12 Evaluation Metric: Mean Reciprocal Rank (MRR) MRR = 1 test-set v test-set 1 rank v (class(v)) Billy Joel Linguist 0.4 Musician Gold Label: Musician MRR: 0.5 (1/2) 12 Graph-based SSL Comparisons (1) 13 Graph-based SSL Comparisons (1) 0.35 TextRunner Graph, 170 WordNet Classes LP-ZGL Adsorption MAD Graph with 175k nodes, 529k edges. Mean Reciprocal Rank (MRR) x x 10 Amount of Supervision 13 Graph-based SSL Comparisons (2) 0.39 Freebase-2 Graph, 192 WordNet Classes LP-ZGL Adsorption MAD Mean Reciprocal Rank (MRR) Graph with 303k nodes, 2.3m edges x x 10 Amount of Supervision 14 15 When is MAD most effective? When is MAD most effective? 0.4 Relative Increase in MRR by MAD over LP-ZGL Average Degree 15 When is MAD most effective? 15 Relative Increase in MRR by MAD over LP-ZGL MAD is most effective in dense graphs, where there is greater need for 0.2 regularization Average Degree Graph-based Class-Instance Acquisition Overview & previous work Review & comparison of three SSL algorithms LP-ZGL (Zhu+, 2003) Adsorption (Baluja+, 2008) Modified Adsorption (MAD) (Talukdar and Crammer, 2009) Additional Semantic Constraints Summary Outline 16 Graph-based Class-Instance Acquisition Overview & previous work Review & comparison of three SSL algorithms LP-ZGL (Zhu+, 2003) Adsorption (Baluja+, 2008) Modified Adsorption (MAD) (Talukdar and Crammer, 2009) Additional Semantic Constraints Summary Outline 16 17 Can Additional Semantic Constraints Help? Can Additional Semantic Constraints Help? Set 2 Isaac Newton Set 1 Johnny Cash Bob Dylan 17 Can Additional Semantic Constraints Help? Set 2 Isaac Newton Instances with shared attributes are likely to be from the same class. Set 1 Johnny Cash Bob Dylan 17 Can Additional Semantic Constraints Help? Set 2 Isaac Newton Instances with shared attributes are likely to be from the same class. 17 Set 1 has_attributealbums Johnny Cash Bob Dylan Can Additional Semantic Constraints Help? Isaac Newton Graph-based representation makes it easy to incorporate Set 2 such constraints! Instances with shared attributes are likely to be from the same class. 17 Set 1 has_attributealbums Johnny Cash Bob Dylan Better Classes with YAGO Attributes 18 Better Classes with YAGO Attributes 170 WordNet Classes, 10 Seeds per Class, using Adsorption Mean Reciprocal Rank (MRR) TextRunner Graph YAGO Graph TextRunner + YAGO Graph Better Classes with YAGO Attributes Mean Reciprocal Rank (MRR) WordNet Classes, 10 Seeds per Class, using Adsorption TextRunner Graph YAGO Graph TextRunner + YAGO Graph Graph constructed from TextRunner (UWash) output, 175k nodes, 529k edges Better Classes with YAGO Attributes 170 WordNet Classes, 10 Seeds per Class, using Adsorption Mean Reciprocal Rank (MRR) TextRunner Graph YAGO Graph TextRunner + YAGO Graph Graph constructed from output of YAGO Knowledge Base, 142k nodes, 777k edges Better Classes with YAGO Attributes 170 WordNet Classes, 10 Seeds per Class, using Adsorption Mean Reciprocal Rank (MRR) TextRunner Graph YAGO Graph TextRunner + YAGO Graph Combined graph, with 237k nodes, 1.3m edges Better Classes with YAGO Attributes 170 WordNet Classes, 10 Seeds per Class, using Adsorption Mean Reciprocal Rank (MRR) TextRunner Graph YAGO Graph TextRunner + YAGO Graph Better Classes with YAGO Attributes Additional semantic constraints help Mean Reciprocal Rank (MRR) WordNet Classes, 10 Seeds per Class, using Adsorption TextRunner Graph YAGO Graph TextRunner + YAGO Graph improve performance significantly Classes for Attributes Classes assigned to attribute nodes has_isbn wordnet_book wordnet_maganize 19 Classes for Attributes Classes assigned to attribute nodes has_isbn wordnet_book wordnet_maganize 19 Classes for Attributes Classes assigned to attribute nodes has_isbn wordnet_book wordnet_maganize Evidence that attributes propagate right classes. 19 20 Summary Summary Compared three graph-based SSL algorithms for class-instance acquisition MAD is consistently most effective, particularly in dense graphs 20 Summary Compared three graph-based SSL algorithms for class-instance acquisition MAD is consistently most effective, particularly in dense graphs Better class-instance acquisition with additional semantic constraints easy to add such constraints in graph-based representation 20 Summary Compared three graph-based SSL algorithms for class-instance acquisition MAD is consistently most effective, particularly in dense graphs Better class-instance acquisition with additional semantic constraints easy to add such constraints in graph-based representation Datasets available: for reproducibility and extension 20 Thank You!
Search
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks
SAVE OUR EARTH

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...

Sign Now!

We are very appreciated for your Prompt Action!

x