Make-Information-Extraction-Great-Again
An (incomplete) overview of information extraction
Named Entity Recognition
Relation Extraction
Supervised Sentence-level Relation Extraction
What is it?
What are the challenges?
- Semantic understanding: the model has to genuinely understand sentence semantics to recognize a relation
- SEO (Single Entity Overlap): the same entity participates in multiple triples
- EPO (Entity Pair Overlap): the same entity pair holds multiple relations at once
- Relations within a sentence depend on one another, so triples cannot always be extracted in isolation
Mainstream methods?
- Sequence Labelling
- Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme (ACL2017): sequence labelling with composite tags of three parts: the token's position in an entity (begin/inside/end), the relation type, and whether the token belongs to the head or the tail entity (see the decoding sketch after this list)
- A Novel Cascade Binary Tagging Framework for Relational Triple Extraction (ACL2020): first tags the head entities, then, conditioned on a head entity and a specific relation, tags the corresponding tail entities
- PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction (ACL2021): likewise tags head entities first and then relation-specific tail entities; the improvement is to first predict which relations may be present and only tag tail entities for those
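To make the composite tags concrete, here is a minimal decoding sketch. The tag format `B/I/E/S-<relation>-<1|2>` (with `1` = head entity, `2` = tail entity, `O` = outside) and the nearest-neighbour pairing are assumptions in the spirit of the ACL2017 scheme, not its exact implementation:

```python
def decode_tags(tokens, tags):
    # Group consecutive tokens that share the same (relation, role) into spans.
    spans, cur = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            cur = None
            continue
        pos, rel, role = tag.split("-")
        if pos in ("B", "S") or cur is None or (cur[1], cur[2]) != (rel, role):
            cur = ([i], rel, role)
            spans.append(cur)
        else:
            cur[0].append(i)
    # Pair each head span (role "1") with the nearest following tail span
    # (role "2") of the same relation -- a simplified pairing heuristic.
    triples = []
    for k, (ids, rel, role) in enumerate(spans):
        if role != "1":
            continue
        for ids2, rel2, role2 in spans[k + 1:]:
            if rel2 == rel and role2 == "2":
                triples.append((" ".join(tokens[j] for j in ids), rel,
                                " ".join(tokens[j] for j in ids2)))
                break
    return triples

tokens = ["United", "States", "president", "Trump"]
tags = ["B-CP-1", "E-CP-1", "O", "S-CP-2"]
print(decode_tags(tokens, tags))  # [('United States', 'CP', 'Trump')]
```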
- Sequence to Sequence
- Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism (ACL2018): the input is a sentence and the output is a flat sequence of extraction results in the format r1, h1, t1, r2, h2, t2, ... (see the linearization sketch after this list)
- CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning (AAAI2020): same formulation as above, improved with multi-task learning (among other things, recovering multi-token entities that the original copy mechanism could not produce)
- Learning the Extraction Order of Multiple Relational Facts in a Sentence with Reinforcement Learning (EMNLP2019): the set of facts to extract is inherently unordered while a generated sequence is inherently ordered; reinforcement learning is used to resolve this mismatch
- Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction (EMNLP2020 findings): the long decoding horizon of seq2seq methods causes exposure bias, so the sequential decoder is replaced with a tree decoder
- Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction (AAAI2020): proposes encoder-decoder variants for the same formulation, including a pointer-network decoder that generates one complete triple per decoding step instead of one token at a time
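A tiny sketch of the triple linearization these seq2seq methods decode (treating whole entity strings as single target items is a simplification; the actual models copy tokens from the source sentence):

```python
def linearize(triples):
    # Flatten triples into the target sequence r1, h1, t1, r2, h2, t2, ...
    seq = []
    for rel, head, tail in triples:
        seq += [rel, head, tail]
    return seq

def delinearize(seq):
    # Read the decoder output back three items at a time.
    return [(seq[i], seq[i + 1], seq[i + 2]) for i in range(0, len(seq) - 2, 3)]

target = linearize([("CEO_of", "Tim Cook", "Apple"),
                    ("based_in", "Apple", "Cupertino")])
print(delinearize(target))  # recovers both triples
```

Note that `linearize` has to impose some order on an inherently unordered set, so a model can be penalized for emitting correct triples in a different order; that mismatch is exactly what the reinforcement-learning and tree-decoder papers above address.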
- Question Answering
- Entity-Relation Extraction as Multi-turn Question Answering (ACL2019): casts extraction as multiple turns of question answering: entity and relation types are mapped to question templates and answers are extracted as spans (see the toy templates below)
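A toy illustration of the QA formulation; the templates below are hypothetical, not the paper's exact ones:

```python
# Hypothetical question templates, one per relation; a first turn would find
# candidate head entities, then a reading-comprehension model extracts the
# answer span for each relation-specific question in later turns.
TEMPLATES = {
    "CEO_of": "Who is the CEO of {entity}?",
    "based_in": "Where is {entity} based?",
}

for rel, template in TEMPLATES.items():
    print(rel, "->", template.format(entity="Apple"))
```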
- Table
- Table Filling Multi-Task Recurrent Neural Network for Joint Entity and Relation Extraction (COLING2016): an RNN traverses the cells of the table in a predefined order to capture dependencies between cells
- TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking (COLING2020): table cells no longer directly encode the relation between two tokens; instead they encode finer-grained links, e.g. whether two tokens are the first and last token of the same entity, or whether they are the first tokens of the head and tail entities of a given relation (see the sketch after this list)
- Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders (EMNLP2020): proposes a table encoder and a sequence encoder; cells interact within the table encoder, and the two encoders also interact with each other
- UniRE: A Unified Label Space for Entity Relation Extraction (ACL2021): puts entity types and relation types into a single unified label space and proposes three different pre-training tasks
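A simplified sketch of TPLinker-style token-pair tables; the real model keeps richer per-relation matrices (head-to-head and tail-to-tail links), so the names here are illustrative:

```python
import numpy as np

tokens = ["United", "States", "president", "Trump"]
n = len(tokens)

# Entity table: cell (i, j) = 1 iff tokens i and j are the first and last
# token of the same entity.
entity = np.zeros((n, n), dtype=int)
entity[0, 1] = 1   # "United States"
entity[3, 3] = 1   # "Trump"

# Per-relation link table: cell (i, j) = 1 iff token i starts the head entity
# and token j starts the tail entity of that relation.
links = {"president_of": np.zeros((n, n), dtype=int)}
links["president_of"][3, 0] = 1  # head "Trump" -> tail "United States"

# Single-stage decoding combines the tables: each head-start/tail-start link
# is expanded to full entity spans via the entity table.
```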
- Others
- Relation Classification via Convolutional Deep Neural Network (COLING2014): early CNN-based work; classifies the relation of a given entity pair
- Relation Classification via Recurrent Neural Network (arXiv): early RNN-based work; classifies the relation of a given entity pair
- A Frustratingly Easy Approach for Entity and Relation Extraction (NAACL2021): a pipeline that first finds entities and then inserts special marker tokens around each entity pair for relation extraction; adding context helps
- Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction (EMNLP2021): reformulates RE as an NLI task, so large NLI datasets can be used for pre-training and their knowledge transferred (see the sketch below)
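A toy sketch of the NLI reformulation; the verbalization templates are assumptions, and the NLI scorer is left abstract:

```python
# Each candidate relation is verbalized into a hypothesis; an off-the-shelf
# NLI model then scores entailment(premise, hypothesis), and the relation
# with the most entailed hypothesis wins.
VERBALIZERS = {
    "founded_by": "{tail} founded {head}.",
    "ceo_of": "{tail} is the chief executive officer of {head}.",
}

premise = "Apple was founded by Steve Jobs in 1976."
for rel, template in VERBALIZERS.items():
    hypothesis = template.format(head="Apple", tail="Steve Jobs")
    print(rel, "->", hypothesis)  # pairs to feed an NLI model
```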
Datasets?
Distant Supervised Relation Extraction
What is it?
What are the challenges?
Mainstream methods?
Datasets?
Few-shot Relation Extraction
What is it?
What are the challenges?
Mainstream methods?
Datasets?
Document-level Relation Extraction
What is it?
What are the challenges?
- Each entity can have multiple different mentions
- The two entities may appear in different sentences, requiring cross-sentence relations
- Some relations can only be obtained through reasoning, e.g. multi-hop inference over other relations
Mainstream methods?
- Graph
- Word-level Graph (each graph node is a token)
- Cross-Sentence N-ary Relation Extraction with Graph LSTMs (EMNLP2017): edges from the dependency tree, between adjacent words, and between the roots of adjacent sentences, processed with a Graph LSTM
- Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network (ACL2019): builds a graph per sentence from its dependency tree, then adds adjacent-sentence, adjacent-word, coreference, and self-loop edges
- Coarse-to-Fine Entity Representations for Document-level Relation Extraction (arXiv): graph built from dependency trees with the same adjacent-sentence, adjacent-word, coreference, and self-loop edges; for each entity pair, the paths between their mentions on the graph are collected and aggregated for prediction
- Non-word-level Graph (graph nodes are mentions, entities, sentences, etc., rather than tokens)
- Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs (EMNLP2019): graph nodes include mentions, entities, and sentences, connected heuristically; edge representations are then aggregated from other edges, similarly to "A Walk-based Model on Entity Graphs for Relation Extraction"
- Reasoning with Latent Structure Refinement for Document-Level Relation Extraction (ACL2020): first extracts key nodes from the dependency trees plus mention and entity nodes to build a graph, then iteratively refines that graph
- Global-to-Local Neural Networks for Document-Level Relation Extraction (EMNLP2020): likewise uses mention, entity, and sentence nodes, with the graph capturing cross-sentence interactions; adds a better way of aggregating mention representations into entity representations for each entity pair, and considers the relation representations of other entity pairs when predicting a pair's relation
- The Dots Have Their Values: Exploiting the Node-Edge Connections in Graph-based Neural Models for Document-level Relation Extraction (EMNLP2020 findings): improves on "Connecting the Dots" above; where the original only used edge representations, node representations are now taken into account as well
- Document-Level Relation Extraction with Reconstruction (AAAI2021): graph with entity, mention, and sentence nodes; the core idea is that an entity pair holding some relation should be connected by a meta path on the graph
- Both
- Document-level Relation Extraction with Dual-tier Heterogeneous Graph (COLING2020): first a syntactic graph based on dependency trees, followed by a semantic graph whose nodes are mentions and entities (a toy graph-construction sketch follows this list)
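To illustrate the non-word-level graphs above, here is a minimal construction with mention, entity, and sentence nodes (the edge heuristics are illustrative assumptions shared in spirit by several of these papers; assumes `networkx` is installed):

```python
import networkx as nx

doc = {
    "sentences": ["Apple was founded by Steve Jobs.", "He led Apple for years."],
    # mention: (entity_id, sentence_id, surface form)
    "mentions": [("E1", 0, "Apple"), ("E2", 0, "Steve Jobs"),
                 ("E2", 1, "He"), ("E1", 1, "Apple")],
}

g = nx.Graph()
for mid, (eid, sid, _) in enumerate(doc["mentions"]):
    g.add_edge(("M", mid), ("S", sid))  # mention-sentence edge
    g.add_edge(("M", mid), ("E", eid))  # mention-entity (coreference) edge
for sid in range(len(doc["sentences"]) - 1):
    g.add_edge(("S", sid), ("S", sid + 1))  # adjacent-sentence edge

# A GNN would propagate representations over g; note that the entity pair
# (E1, E2) is connected through mention and sentence nodes even though its
# mentions are spread across sentence boundaries.
print(nx.shortest_path(g, ("E", "E1"), ("E", "E2")))
```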
- Non-graph
- Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction (NAACL2018): a Transformer encoder combined with convolutions and a biaffine pairwise scorer
- Entity and Evidence Guided Document-Level Relation Extraction (RepL4NLP@ACL2021): introduces evidence prediction as an auxiliary task, and concatenates the head entity to the input during encoding
- Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling (AAAI2021): introduces an auxiliary threshold class for adaptive thresholding of the multi-label problem, plus localized context pooling that extracts a more targeted context for each entity pair (see the thresholding sketch after this list)
- Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction (AAAI2021): heavily modifies the Transformer self-attention module; token pairs are divided into six categories (same sentence or not, same entity or not, one entity token and one non-entity token, etc.), each with its own attention computation
- Multi-view Inference for Relation Extraction with Uncertain Knowledge (AAAI2021): brings in an external knowledge base to assist encoding
- Document-level Relation Extraction as Semantic Segmentation (IJCAI2021): borrows the U-Net architecture from computer vision
- Learning Logic Rules for Document-level Relation Extraction (EMNLP2021): treats inference rules over relations as latent variables
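A minimal sketch of the adaptive-thresholding idea from the AAAI2021 paper above (relation names and logits are made up): a dedicated threshold class TH is trained alongside the real relation classes, and at inference an entity pair is assigned every relation whose logit beats the TH logit, so no global threshold needs tuning:

```python
RELATIONS = ["founded_by", "ceo_of", "based_in"]
TH = len(RELATIONS)  # index of the extra threshold class

# One logit per relation plus one for the TH class, for a single entity pair.
logits = [2.3, -0.7, 1.1, 0.9]

predicted = [rel for i, rel in enumerate(RELATIONS) if logits[i] > logits[TH]]
print(predicted)  # ['founded_by', 'based_in']
```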
- Maybe We Do NOT Need All Sentences
- Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction (ACL2021): for any entity pair, at most three sentences already suffice to extract the relation
- SIRE: Separate Intra- and Inter-sentential Reasoning for Document-level Relation Extraction (ACL2021 findings): if an entity pair co-occurs in a sentence, only the intra-sentence representation is used, otherwise the inter-sentence representation; also models logical reasoning over multiple relations
- Discriminative Reasoning for Document-level Relation Extraction (ACL2021 findings): defines three kinds of reasoning paths; for each entity pair these paths are extracted and their hidden states used for prediction
- Eider: Evidence-enhanced Document-level Relation Extraction (arXiv): predicts with the full document while a second branch predicts evidence sentences; a pseudo document built from those sentences alone is used for another prediction, and the two results are fused (see the sketch at the end of this list)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction (arXiv): builds on Eider by adding more intermediate auxiliary tasks earlier in the pipeline, such as coreference resolution
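Finally, a toy sketch of the Eider-style evidence branch; the models are left as stand-in callables, and fusing by score averaging is an assumption (the paper's fusion rule differs in detail):

```python
def eider_style_predict(re_model, evidence_model, sentences, threshold=0.5):
    # Branch 1: score candidate relations on the full document.
    full_scores = re_model(" ".join(sentences))            # {relation: score}
    # Branch 2: select evidence sentences and build a pseudo document.
    evidence_ids = evidence_model(sentences)               # e.g. [0, 2]
    pseudo_doc = " ".join(sentences[i] for i in evidence_ids)
    pseudo_scores = re_model(pseudo_doc)
    # Fuse the two views by averaging scores per candidate relation.
    rels = set(full_scores) | set(pseudo_scores)
    fused = {r: (full_scores.get(r, 0.0) + pseudo_scores.get(r, 0.0)) / 2
             for r in rels}
    return [r for r, s in fused.items() if s > threshold]
```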