The INT n instruction allows an interrupt to be generated from within software by supplying an interrupt vector number as an operand. For example, the INT 35 instruction forces an implicit call to the interrupt handler for interrupt 35. Any of the interrupt vectors from 0 to 255 can be used as a parameter in this instruction. If the processor's predefined NMI vector is used, however, the response of the processor will not be the same as it would be for an NMI interrupt generated in the normal manner. If vector number 2 (the NMI vector) is used in this instruction, the NMI interrupt handler is called, but the processor's NMI-handling hardware is not activated. Interrupts generated in software with the INT n instruction cannot be masked by the IF flag in the EFLAGS register.
The processor handles calls to exception and interrupt handlers similarly to the way it handles calls with a CALL instruction to a procedure or a task. When responding to an exception or interrupt, the processor uses the exception or interrupt vector as an index to a descriptor in the IDT. If the index points to an interrupt gate or trap gate, the processor calls the exception or interrupt handler in a manner similar to a CALL to a call gate. If the index points to a task gate, the processor executes a task switch to the exception- or interrupt-handler task in a manner similar to a CALL to a task gate.
If the handler is going to be executed at a numerically lower privilege level, a stack switch occurs. When the stack switch occurs: a. The segment selector and stack pointer for the stack to be used by the handler are obtained from the TSS of the currently executing task. On this new stack, the processor pushes the stack segment selector and stack pointer of the interrupted procedure. b. The processor then saves the current state of the EFLAGS, CS, and EIP registers on the new stack (see Figure 6-4). c. If an exception causes an error code to be saved, it is pushed onto the new stack after the EIP value.
If the handler is going to be executed at the same privilege level as the interrupted procedure: a. The processor saves the current state of the EFLAGS, CS, and EIP registers on the current stack (see Figure 6-4). b. If an exception causes an error code to be saved, it is pushed onto the current stack after the EIP value.
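A minimal Python sketch of the push order described in the two cases above; it is purely illustrative (the strings are register mnemonics, not executable processor state):

```python
def pushed_frame(privilege_change: bool, has_error_code: bool) -> list[str]:
    """Order in which the processor pushes state when entering a handler."""
    frame = []
    if privilege_change:
        # Stack switch: the interrupted procedure's SS:ESP go onto the new stack first.
        frame += ["SS", "ESP"]
    frame += ["EFLAGS", "CS", "EIP"]
    if has_error_code:
        frame.append("Error Code")
    return frame

print(pushed_frame(privilege_change=True, has_error_code=True))
# ['SS', 'ESP', 'EFLAGS', 'CS', 'EIP', 'Error Code']
```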
The processor checks the DPL of an interrupt or trap gate only if an exception or interrupt is generated with an INT n, INT 3, or INTO instruction. In that case, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, provided that those handlers are placed in more privileged code segments. For hardware-generated interrupts and processor-detected exceptions, the processor ignores the DPL of interrupt and trap gates.
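A small sketch of this privilege check; the function and parameter names are mine, and `software_generated` stands for the INT n / INT 3 / INTO case:

```python
def gate_dpl_check(cpl: int, gate_dpl: int, software_generated: bool) -> bool:
    """Return True if the reference through the gate is allowed."""
    if not software_generated:
        return True          # hardware interrupts / processor-detected exceptions ignore the gate DPL
    return cpl <= gate_dpl   # software interrupt: CPL must be <= gate DPL, otherwise #GP

assert gate_dpl_check(cpl=3, gate_dpl=0, software_generated=True) is False  # blocked
assert gate_dpl_check(cpl=3, gate_dpl=0, software_generated=False) is True  # allowed
```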
Segment Loading Instructions in IA-32e Mode
Because the ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in the segment descriptor registers are ignored. Some forms of the segment-load instructions are also invalid (for example, LDS, POP ES). Address calculations that reference the ES, DS, or SS segments are treated as if the segment base is zero. Rather than performing limit checks, the processor checks that all linear-address references are in canonical form. Mode switching does not change the contents of the segment registers or the associated descriptor registers. These registers are also not changed during 64-bit mode execution, unless explicit segment loads are performed.

In order to set up compatibility mode for an application, segment-load instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the system descriptor table (GDT or LDT) and is loaded into the hidden portion of the segment descriptor register. The descriptor-register base, limit, and attribute fields are all loaded. However, the contents of the data and stack segment selectors and descriptor registers are ignored.

When FS and GS segment overrides are used in 64-bit mode, their respective base addresses are used in the linear-address calculation: (FS or GS).base + index + displacement. FS.base and GS.base are then expanded to the full linear-address size supported by the implementation. The resulting effective-address calculation can wrap across positive and negative addresses; the resulting linear address must be canonical.

In 64-bit mode, memory accesses using FS-segment and GS-segment overrides are not checked for a runtime limit, nor are they subjected to attribute checking. Normal segment loads (MOV to Sreg and POP Sreg) into FS and GS load a standard 32-bit base value into the hidden portion of the segment descriptor register. The base address bits above the standard 32 bits are cleared to 0 to allow consistency for implementations that use fewer than 64 bits.

The hidden descriptor register fields for FS.base and GS.base are physically mapped to MSRs so that all address bits supported by a 64-bit implementation can be loaded. Software with CPL = 0 (privileged software) can load all supported linear-address bits into FS.base or GS.base using WRMSR. Addresses written into the 64-bit FS.base and GS.base registers must be in canonical form. A WRMSR instruction that attempts to write a non-canonical address to these registers causes a #GP fault.

When in compatibility mode, FS and GS overrides operate as defined by 32-bit-mode behavior, regardless of the value loaded into the upper 32 linear-address bits of the hidden descriptor register base field. Compatibility mode ignores the upper 32 bits when calculating an effective address.

A new 64-bit mode instruction, SWAPGS, can be used to load the GS base. SWAPGS exchanges the kernel data structure pointer in the IA32_KernelGSbase MSR with the GS base register. The kernel can then use the GS prefix on normal memory references to access the kernel data structures. An attempt to write a non-canonical value (using WRMSR) to the IA32_KernelGSbase MSR causes a #GP fault.
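A quick sketch of the canonical-form requirement mentioned above, assuming a 48-bit linear-address implementation (bits 63:47 must all equal bit 47); the function name is mine, not an architectural term:

```python
def is_canonical(addr: int, linear_bits: int = 48) -> bool:
    """True if the 64-bit address is canonical for the given linear-address width."""
    upper = addr >> (linear_bits - 1)                  # bits 63 .. (linear_bits - 1)
    sign_bit = upper & 1
    all_ones = (1 << (64 - linear_bits + 1)) - 1
    return upper == (all_ones if sign_bit else 0)

assert is_canonical(0x0000_7FFF_FFFF_FFFF)       # top of the lower canonical half
assert is_canonical(0xFFFF_8000_0000_0000)       # bottom of the upper canonical half
assert not is_canonical(0x0000_8000_0000_0000)   # non-canonical; WRMSR to FS.base would #GP
```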
Stack segment (the data segment pointed to by the SS register). This flag is called the B (big) flag, and it specifies the size of the stack pointer used for implicit stack operations (such as pushes, pops, and calls). If the flag is set, a 32-bit stack pointer is used, which is stored in the 32-bit ESP register; if the flag is clear, a 16-bit stack pointer is used, which is stored in the 16-bit SP register. If the stack segment is set up as an expand-down data segment (described in the next paragraph), the B flag also specifies the upper bound of the stack segment.
English
The prediction of click-through rate (CTR) is crucial in online advertising [McMahan et al., 2013[1]; Juan et al., 2016[2]; Wen et al., 2019[3]], where the mission is to estimate the probability that users click on a recommended ad or item. In online advertising, advertisers pay publishers to display their ads on publishers’ sites. One popular payment model is the cost-per-click (CPC) model [Zhou et al., 2018[4]; Zhou et al., 2019[5]], where advertisers are charged only when a click occurs. As a consequence, a publisher’s revenue relies heavily on the ability to predict CTR accurately [Wang et al., 2017[6]].
Nowadays, various CTR models have emerged, from linear and tree-based models to embedding-and-MLP architectures. With the advancement of deep learning, CTR models have also developed rapidly. Each model has its merits. For instance, the Adaptive Factorization Network (AFN) can adaptively learn cross features of arbitrary order from data, and the Dual Input-aware Factorization Machine (DIFM) can effectively learn input-aware factors at the vector-wise level (used to reweight the original feature representations). Nevertheless, CTR prediction arises in many different situations: sometimes we face a large amount of user data that must be processed quickly, and at other times a cold-start scenario caused by the lack of user history. No single CTR model fits well in all situations.
Inspired by automated machine learning, we will build a CTR library containing a variety of CTR models that currently perform well. Based on the input parameters, the library will adaptively select an appropriate CTR model for the situation at hand, in order to improve prediction accuracy, shorten prediction time, and maximize business efficiency.
[1] [McMahan et al., 2013] H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et al. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1222–1230. ACM, 2013.
[2] [Juan et al., 2016] Yuchin Juan, Yong Zhuang, Wei-Sheng Chin, and Chih-Jen Lin. Field-aware factorization machines for CTR prediction. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 43–50. ACM, 2016.
[3] [Wen et al., 2019] Hong Wen, Jing Zhang, Quan Lin, Keping Yang, and Pipei Huang. Multi-level deep cascade trees for conversion rate prediction in recommendation system. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 338–345, 2019.
[4] [Zhou et al., 2018] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1059–1068. ACM, 2018.
[5] [Zhou et al., 2019] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5941–5948, 2019.
[6] [Wang et al., 2017] Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, page 12. ACM, 2017.
Original wording: CCPM can extract local-global key features from an input instance with varied elements, which can be implemented for not only single ad impression but also sequential ad impression.
Original wording: Specifically, FNN with a supervised-learning embedding layer using factorisation machines is proposed to efficiently reduce the dimension from sparse features to dense continuous features.
Original wording: To utilize the learning ability of neural networks and mine the latent patterns of data in a more effective way than MLPs, in this paper we propose Product-based Neural Network. PNN is promising to learn high-order latent patterns on multi-field categorical data.
Original wording: WDL's deep part concatenates sparse feature embeddings as the input of the MLP, while the wide part uses handcrafted features as input. The logits of the deep part and the wide part are added to get the prediction probability.
1) it does not need any pre-training; 2) it learns both high- and low-order feature interactions; 3) it introduces a sharing strategy of feature embedding to avoid feature engineering
Original wording: DeepFM can be seen as an improvement over WDL and FNN. Compared with WDL, DeepFM uses an FM instead of LR in the wide part and uses the concatenation of embedding vectors as the input of the MLP in the deep part. Compared with FNN, the embedding vectors of the FM and the input to the MLP are the same, and no pretrained FM vectors are needed for initialization; everything is learned end-to-end.
Original wording: can handle a large set of sparse and dense features, and learns explicit cross features of bounded degree jointly with traditional deep representations.
Original wording: AFM is a variant of FM; traditional FM sums the inner products of the embedding vectors uniformly, whereas AFM can be seen as a weighted sum of feature interactions, with the weights learned by a small MLP.
Original wording: NFM uses a bi-interaction pooling layer to learn feature interactions between embedding vectors and compresses the result into a single vector of the same size as a single embedding vector, which is then fed into an MLP. The output logit of the MLP and the output logit of the linear part are added to get the prediction probability.
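A small numpy sketch of bi-interaction pooling (field count and embedding size are arbitrary), using the identity $\sum_{i<j} v_i \odot v_j = \frac{1}{2}\big((\sum_i v_i)^2 - \sum_i v_i^2\big)$ applied element-wise:

```python
import numpy as np

def bi_interaction_pooling(embeddings: np.ndarray) -> np.ndarray:
    """Sum of element-wise products of all field-embedding pairs, as one D-dim vector."""
    # embeddings: (num_fields, D)
    sum_then_square = np.square(embeddings.sum(axis=0))
    square_then_sum = np.square(embeddings).sum(axis=0)
    return 0.5 * (sum_then_square - square_then_sum)   # shape (D,)

v = np.random.randn(10, 8)          # 10 fields, embedding size 8
pooled = bi_interaction_pooling(v)  # this vector is then fed into the MLP in NFM
print(pooled.shape)                 # (8,)
```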
Thus xDeepFM can automatically learn high-order feature interactions in both explicit and implicit fashions, which is of great significance to reducing manual feature engineering work.
xDeepFM uses a Compressed Interaction Network (CIN) to learn both low- and high-order feature interactions explicitly, and uses an MLP to learn feature interactions implicitly. In each layer of the CIN, it first computes outer products between $x^k$ and $x^0$ to get a tensor $Z^{k+1}$, then uses a 1-D convolution to learn feature maps $H^{k+1}$ on this tensor. Finally, sum pooling is applied to all the feature maps $H^k$ to get one vector, which is used to compute the logit that the CIN contributes.
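A rough numpy sketch of one CIN layer as described above; the field count, embedding size, number of feature maps, and the random weights standing in for the learned 1-D convolution are all illustrative:

```python
import numpy as np

def cin_layer(x0: np.ndarray, xk: np.ndarray, num_maps: int, rng) -> np.ndarray:
    """One CIN layer: outer products of x^k and x^0 along the embedding axis,
    then a '1-D convolution' (one (h_k, m) weight matrix per output feature map)."""
    # x0: (m, D) field embeddings, xk: (h_k, D) feature maps of the previous layer
    z = np.einsum('hd,md->hmd', xk, x0)                   # Z^{k+1}: (h_k, m, D)
    w = rng.standard_normal((num_maps, xk.shape[0], x0.shape[0]))
    return np.einsum('nhm,hmd->nd', w, z)                 # H^{k+1}: (num_maps, D)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((10, 8))                         # m = 10 fields, D = 8
h1 = cin_layer(x0, x0, num_maps=6, rng=rng)               # the first layer uses x^0 as x^k
h2 = cin_layer(x0, h1, num_maps=6, rng=rng)
# sum pooling over the embedding axis of every H^k, concatenated for the CIN logit
cin_vector = np.concatenate([h1.sum(axis=1), h2.sum(axis=1)])
print(cin_vector.shape)                                   # (12,)
```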
Deep Interest Network (DIN), which adaptively calculates the representation vector of user interests by taking into consideration the relevance of historical behaviors given a candidate ad.
DIN introduces an attention method to learn from sequence (multi-valued) features. Traditional methods usually apply sum/mean pooling to sequence features. DIN uses a local activation unit to get an activation score between the candidate item and each history item. The user's interest is represented by a weighted sum of the user's behaviors. The user's interest vector and the other embedding vectors are concatenated and fed into an MLP to get the prediction.
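A rough numpy sketch of this weighted-sum pooling; the paper's local activation unit is a small MLP and its weights are not necessarily softmax-normalized, so the linear scorer and the softmax below are simplifications for illustration:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def din_interest(history: np.ndarray, candidate: np.ndarray, score_w: np.ndarray) -> np.ndarray:
    """Weighted-sum pooling of behavior embeddings: each weight is an activation
    score between the candidate item and one history item."""
    feats = np.concatenate(
        [history, history * candidate, np.tile(candidate, (len(history), 1))], axis=1
    )                                    # (T, 3*D): [h, h*c, c] per history item
    scores = feats @ score_w             # (T,) activation scores (linear scorer here)
    return softmax(scores) @ history     # user interest vector: (D,)

rng = np.random.default_rng(0)
T, D = 5, 8                              # 5 history items, embedding size 8
hist, cand = rng.standard_normal((T, D)), rng.standard_normal(D)
interest = din_interest(hist, cand, score_w=rng.standard_normal(3 * D))
print(interest.shape)                    # (8,)
```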
Deep Interest Evolution Network (DIEN) uses interest extractor layer to capture temporal interests from history behavior sequence. At this layer, an auxiliary loss is proposed to supervise interest extracting at each step. As user interests are diverse, especially in the e-commerce system, interest evolving layer is proposed to capture interest evolving process that is relative to the target item. At interest evolving layer, attention mechanism is embedded into the sequential structure novelly, and the effects of relative interests are strengthened during interest evolution.
The key to our method is the newly-introduced interacting layer, which allows each feature to interact with the others and to determine the relevance through learning.
AutoInt uses an interacting layer to model the interactions between different features. Within each interacting layer, each feature is allowed to interact with all the other features and is able to automatically identify relevant features to form meaningful higher-order features via the multi-head attention mechanism. By stacking multiple interacting layers, AutoInt is able to model different orders of feature interactions.
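A simplified, single-head numpy sketch of one interacting layer; the paper uses multi-head attention and a learned residual projection, whereas here the residual is the raw embedding matrix and all sizes are illustrative:

```python
import numpy as np

def interacting_layer(E: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Each field embedding attends to every field embedding; a residual keeps the original signal."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv                        # (m, D) each
    scores = Q @ K.T / np.sqrt(E.shape[1])                  # (m, m) field-to-field attention logits
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                 # row-wise softmax
    return np.maximum(attn @ V + E, 0.0)                    # ReLU(attention output + residual)

rng = np.random.default_rng(0)
m, D = 10, 8                                                # 10 fields, embedding size 8
E = rng.standard_normal((m, D))
W = [rng.standard_normal((D, D)) * 0.1 for _ in range(3)]
out = interacting_layer(E, *W)                              # stack layers for higher-order interactions
print(out.shape)                                            # (10, 8)
```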
ONN
Motivation
Few works focus on improving the feature representations learned by the embedding layer.
Compared with the traditional feature embedding method, which learns one representation for all operations, operation-aware embedding can learn different representations for different operations.
Original wording
Compared with the traditional feature embedding method which learns one representation for all operations, operation-aware embedding can learn various representations for different operations.
ONN models second-order feature interactions like FFM and preserves second-order interaction information as much as possible. Furthermore, a deep neural network is used to learn higher-order feature interactions.
FiBiNET (Feature Importance and Bilinear feature Interaction NETwork)
Feature Importance and Bilinear feature Interaction NETwork is proposed to dynamically learn the feature importance and fine-grained feature interactions. On the one hand, the FiBiNET can dynamically learn the importance of features via the Squeeze-Excitation network (SENET) mechanism; on the other hand, it is able to effectively learn the feature interactions via bilinear function.
Purpose
To dynamically learn feature importance and fine-grained feature interactions.
proposed to dynamically learn the feature importance and fine-grained feature interactions.
1) For the CTR task, the SENET module can learn the importance of features dynamically. It boosts the weight of the important feature and suppresses the weight of unimportant features. 2) We introduce three types of Bilinear-Interaction layers to learn feature interaction rather than calculating the feature interactions with Hadamard product or inner product. 3) Combining the SENET mechanism with bilinear feature interaction in our shallow model outperforms other shallow models such as FM and FFM. 4) In order to improve performance further, we combine a classical deep neural network (DNN) component with the shallow model to be a deep model. The deep FiBiNET consistently outperforms the other state-of-the-art deep models such as DeepFM and xDeepFM.
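Below is a rough numpy sketch of the two FiBiNET building blocks named above: SENET-style field reweighting (squeeze, excitation, re-weight) and the "Field-All" variant of the bilinear interaction with one shared matrix $W$. All sizes and the choice of ReLU activations are illustrative.

```python
import numpy as np

def senet_reweight(E: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """SENET block: squeeze each field embedding to a scalar (mean pooling),
    excite through two small FC layers, then rescale each field embedding."""
    z = E.mean(axis=1)                            # squeeze: (m,)
    a = np.maximum(z @ W1, 0.0)                   # excitation, hidden layer
    s = np.maximum(a @ W2, 0.0)                   # per-field importance weights: (m,)
    return E * s[:, None]                         # re-weighted embeddings: (m, D)

def bilinear_interaction_all(E: np.ndarray, W: np.ndarray) -> np.ndarray:
    """'Field-All' bilinear interaction: p_ij = (v_i W) * v_j for every pair i < j,
    with a single shared (D, D) matrix W."""
    m = E.shape[0]
    return np.stack([(E[i] @ W) * E[j] for i in range(m) for j in range(i + 1, m)])

rng = np.random.default_rng(0)
m, D, r = 10, 8, 3                                # 10 fields, embedding size 8, reduction size 3
E = rng.standard_normal((m, D))
E_se = senet_reweight(E, rng.standard_normal((m, r)), rng.standard_normal((r, m)))
p = bilinear_interaction_all(E_se, rng.standard_normal((D, D)))
print(E_se.shape, p.shape)                        # (10, 8) (45, 8)
```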
IFM
The Input-aware Factorization Machine (IFM) learns a unique input-aware factor for the same feature in different instances via a neural network.
Input-aware Factorization Machine (IFM) learns a unique input-aware factor for the same feature in different instances via a neural network.
It is suitable for sparse datasets. Its aim is to enhance traditional FMs by purposefully learning more flexible and accurate representations of features.
It aims to enhance traditional FMs by purposefully learning more flexible and accurate representation of features for different instances with the help of a factor estimating network.
Two main advantages of IFM
1. Compared with existing techniques, it produces better prediction results. 2. It provides deeper insight into the role each feature plays in the prediction task.
i). it produces better prediction results compared to existing techniques ii). it provides deeper insights into the role that each feature plays in the prediction task.
DCN V2
It models explicit feature crosses in an expressive yet simple way, observing the low-rank nature of the weight matrices in the cross network.
Observing the low-rank nature of the weight matrix in the cross network
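A numpy sketch of one DCN-V2 cross layer with that low-rank factorization, $x_{l+1} = x_0 \odot (U V^{\top} x_l + b) + x_l$; the dimensions and the weights shared across layers are illustrative choices only:

```python
import numpy as np

def low_rank_cross_layer(x0: np.ndarray, xl: np.ndarray,
                         U: np.ndarray, V: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One cross layer with W approximated by U V^T: element-wise product with x0 plus residual."""
    return x0 * (U @ (V.T @ xl) + b) + xl

rng = np.random.default_rng(0)
d, r = 64, 8                                     # input dimension 64, rank 8
x0 = rng.standard_normal(d)
U, V, b = rng.standard_normal((d, r)), rng.standard_normal((d, r)), np.zeros(d)
x1 = low_rank_cross_layer(x0, x0, U, V, b)       # first layer: x_l = x_0
x2 = low_rank_cross_layer(x0, x1, U, V, b)       # in practice each layer has its own U, V, b
print(x2.shape)                                  # (64,)
```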
Dual Input-aware Factorization Machines (DIFM) can adaptively reweight the original feature representations at the bit-wise and vector-wise levels simultaneously. Furthermore, DIFMs strategically integrate various components including Multi-Head Self-Attention, Residual Networks and DNNs into a unified end-to-end model.
The goal is to adaptively learn flexible representations of a given feature according to different input instances, with the help of the Dual-Factor Estimating Network (Dual-FEN).
It aims to adaptively learn flexible representations of a given feature according to different input instances with the help of the Dual-Factor Estimating Network (Dual-FEN).
Its major advantage is that it can effectively learn input-aware factors (used to reweight the original feature representations) not only at the bit-wise level but also at the vector-wise level, simultaneously.
The major advantage of DIFM is that it can effectively learn the input-aware factors (used to reweight the original feature representations) not only at the bit-wise level but also at the vector-wise level simultaneously.
Adaptive Factorization Network (AFN) can learn arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer to convert the power of each feature in a feature combination into the coefficient to be learned.
learns arbitrary-order feature interactions adaptively from data. Instead of explicitly modeling all the cross features within a fixed maximum order, AFN is able to generate discriminative cross features and the weights of the corresponding features automatically.
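A numpy sketch of the logarithmic transformation layer described above; sizes are illustrative, and the small-constant clipping before the log stands in for AFN's requirement that embedding values stay positive. In log space the learned weights become the exponents of each field, so output $j$ equals $\prod_i e_i^{W_{ij}}$ element-wise, i.e. a cross feature of learned (possibly fractional) order:

```python
import numpy as np

def logarithmic_transformation_layer(E: np.ndarray, W: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Each logarithmic neuron j produces exp(sum_i W[i, j] * ln(e_i)), element-wise over D."""
    E_pos = np.clip(np.abs(E), eps, None)        # keep values strictly positive before the log
    return np.exp(W.T @ np.log(E_pos))           # (num_log_neurons, D)

rng = np.random.default_rng(0)
m, D, n = 10, 8, 4                               # 10 fields, embedding size 8, 4 logarithmic neurons
E = rng.standard_normal((m, D))
crosses = logarithmic_transformation_layer(E, rng.standard_normal((m, n)))
print(crosses.shape)                             # (4, 8)
```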