A New Dct-Based Perturbation Scheme for High Capacity Data Hiding in H.264/Avc Intra Frame
In:
Submitted By iboss39 Words 4068 Pages 17
Fast Training-Based Redundant Predictor Elimination Scheme for Intra Coding in H.264/AVC
Tseng-Jung Lin, Kuo-Liang Chung, Po-Chun Chang, and Wei-Jen Yang
National Taiwan University of Science and Technology, Taipei, Taiwan 10672, R. O. C. E-mail: {D9715002, k.l.chung, D9915014, wjyang}@mail.ntust.edu.tw
Abstract—Currently, Laroche et al. presented an efficient method to eliminate redundant predictors for intra coding in H.264/AVC. Their proposed method has a bitrate advantage reduction under a low bitrate environment. In this paper, we present a fast training-based redundant predictor elimination scheme to enhance the execution-time performance while preserving similar bitrates. We first develop a new statistic training approach to construct a set of most similar predictor-pairs and determine the priority of each predictor. Based on the constructed predictor-pair set and the determined predictor priorities, we thus can efficiently eliminate the redundant predictors and preserve more frequently used ones, leading to the advantages of bitrate reduction and computation-saving. The results of experiments on the sixteen standard Video Coding Experts Group (VCEG) test sequences turn out that under similar bitrates, the average execution-time improvement ratio of the proposed scheme over Laroche et al.’s method can be more than 16.77%.
I. I NTRODUCTION H.264/advanced video coding (AVC) [1], [6], [8], [10], established by the Joint Video Team (JVT) of ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG), has become the state-of-the-art video coding standard to deal with a large number of video applications. It can produce more than 50% bitrate improvement ratio over the MPEG-2 video coding standard when having similar video quality [12]. For some applications, such as the surveillance and TV broadcast systems, the bitrate requirement of intra frames dominates the entire compression performance in H.264/AVC. For example, in [5], the simulation results on the Foreman CIF 30 Hz sequence indicated that if an intra frame was transmitted every twelve images, the intra frames’ bitrate approached 50% of the total bitrate when the quantization parameter (QP) was 37, i.e., under a low birate environment. Therefore, researchers have emphasized enhancing the compression performance by reducing the bitrate requirement in intra coding [3], [4], [9], [11], [13]. Most of these methods made more effort to increase the number of possible predictors as well as reduce the residual between the original intra block and the reconstructed intra block. Currently, Laroche et al. [5] proposed an efficient method to eliminate redundant predictors, that may reach more than 22% of intra frames’ bitrate in the high QP environment. The intra coding which only uses the remaining non-redundant predictors can result in bitrate reduction effect, especially for a low bitrate environment. In Laroche et al.’s method, the similarity of each distinct predictor-pair is examined first to determine whether the two predictors are redundant in order to
eliminate the redundant predictors. Hence, the time complexity is Θ(��2 ), where �� is the number of predictors. More detailed explanation of Laroche et al.’s method will be described in Section II. To reduce the time complexity and enhance the execution-time performance, in this paper, we present a fast training-based redundant predictor elimination scheme. The proposed scheme first develops a new statistic training approach to construct a set of the most similar predictor-pairs and determine the priority of each predictor. In the training approach, a complete undirected weighted graph, based on the difference between the predictors of each predictor-pair, is built up, and then the most similar predictor-pair set is constructed by finding the shortest tour of the graph. Simultaneously, the priority of each predictor is determined by its rank probability. Further, we exploit the most similar predictorpair set to eliminate the redundant predictors efficiently. Since the preservation of more frequently used predictors is also a critical issue, the trained predictor priorities are utilized to tackle this problem. The proposed scheme can significantly reduce the time complexity of Laroche et al.’s method from Θ(��2 ) to Θ(��). Sixteen standard VCEG test sequences [15], [16] are used to evaluate the related performance and the results demonstrate that under similar bitrates, the proposed scheme yields more than 16.77% execution-time improvement ratio when compared with Laroche et al.’s method. The rest of this paper is organized as follows. In Section II, a brief survey of Laroche et al.’s method. I is given. In Section III, the proposed fast training-based redundant predictor elimination scheme is presented. In Section IV, the experimental results is demonstrated to show the computationsaving effect of our proposed scheme. Finally, concluding remarks are drawn in Section V. II. S URVEY OF L AROCHE et al.’ S M ETHOD In this section, we introduce Laroche et al.’s method [5]. In the intra coding of H.264/AVC, three macroblock partition modes, namely intra 16 × 16 mode, intra 8 × 8 mode, and intra 4 × 4 mode, are used. Since intra 16 × 16 mode only involves four intra predictors, it is infeasible to eliminate redundant intra predictors for the sake of the bitrate reduction. Therefore, Laroche et al.’s method is primarily used for intra 8 × 8 mode and intra 4 × 4 mode, each with nine intra predictors. Since their method for intra 4 × 4 mode is the same as that for intra 8 × 8 mode, we only discuss the case of intra 4 × 4 mode. For ease of explanation, the partitioned sub-macroblocks based on
P0 (Vertical)
M I J K L A B C D E F G H
P1 (Horizontal)
M I J K L A B C D E F G H
P2 (DC)
M I J Mean(A,D,I,L) K L A B C D E F G H
P3 (Diagonal Down-Left)
M I J K L A B C D E F G H
P4 (Diagonal Down-Right) P5 (Vertical-Right)
M I J K L A B C D E F G H M I J K L A B C D E F G H
��(�� − 1)/2, i.e., Θ(��2 ), where �� is the number of predictors and in this case, �� = 9. Having obtained the reduce predictor set, the optimal predictor, ���� , can be determined by the following rule: ���� = arg min {������ }, ���� = ���� ∪ {���� },
���� ∈����
(2)
P6 (Horizontal-Down)
M I J K L A B C D E F G H
P7 (Vertical-Left)
M I J K L A B C D E F G H
P8 (Horizontal-Up)
M I J K L A B C D E F G H
where ���� is the most probable predictor and �� ���� is the Rate-Distortion Cost (RDC) based on the predictor �� �� . ������ is defined by ������ = ������ + �������� , where ���� �� denotes the distortion measured by the sum of square difference between the current intra block and the reconstructed intra block associated with the predictor �� �� ; �� is a QP-dependent Lagrangian parameter; and �� ���� denotes the bitrate for coding the current intra block. III. T HE P ROPOSED FAST T RAINING -BASED R EDUNDANT P REDICTOR E LIMINATION S CHEME We now present the proposed training-based scheme to signifricantly improve the execution-time performance of Laroche et al.’s method under similar bitrates. In Section III-A, we first develop a new statistic training approach to construct a set of most similar predictor-pairs and determine the priority of each predictor. In Section III-B, we propose the redundant predictor elimination process based on the most similar predictor-pair set and the predictor priorities. A. Statistic training approach
�� { Given an intra block, B , in the video sequence V = } �� B ∣1 ≤ �� ≤ �� , for each predictor-pair, [�� �� , ���� ], in the predictor-pair set, �� [��,�� ] = {[���� , ���� ]∣0 ≤ ��, �� ≤ 8}, we calculate the sum of the absolute differences (SAD) between the corresponding reconstructed intra blocks by ∑ �� ∣B�� �� (��, ��) − B�� �� (��, ��)∣; ������[���� ,���� ] (B�� ) = �� �� 0≤��,��≤3
Fig. 1.
Nine predictors of intra 4 × 4 mode.
intra macroblock partition modes are called by intra blocks. Fig. 1 shows the nine predictors of intra 4 × 4 mode. From the figure, it is obvious that the nine predictors include eight directional predictors and one DC predictor; and they are denoted as ��0 , ��1 , . . . , ��8 , respectively. In addition, the intra coding of H.264/AVC also supports a most probable predictor which is dependent on the predictors of the upper and left neighboring intra blocks. If the most probable predictor is selected to be the predictor of the current intra block, the predictor information is encoded by only one bit; otherwise, it would be encoded by four bits. To reduce the bitrate, Laroche et al.’method eliminate redundant predictors. Let a video sequence with �� intra blocks be { } denoted as V = B�� ∣1 ≤ �� ≤ �� where B�� is the ��-th intra block in V. For B �� , the reconstructed intra block associated with the predictor �� �� is denoted as B�� �� . In Laroche et al.’s �� method, two predictors of B �� , ���� and ���� , can be reduced into one predictor when the following condition is held: ∣��ℳ�� �� ,���� ] (��, ��)∣ ≤ ��(��, ��), ∀��, �� ∈ {0, 1, 2, 3}, [�� (1)
where ��ℳ �� �� ,���� ] = ������ (B�� �� − B�� �� ); ������ (B) denotes [�� �� �� executing DCT [2] on the block B; and ��ℳ �� �� ,���� ] (��, ��) [�� and ��(��, ��) denote the coefficients at position (��, ��) in the DCT coefficient matrix ��ℳ �� �� ,���� ] and the quantization matrix [�� ��, respectively. More detailed explanation for Eq. (1) can be found in [5]. The redundant predictor elimination process is comprised of six steps: Step 1:Set the initial predictor set �� �� = {��0 , ��1 , ��2 , ..., ��8 } and �� ← 0. Step 2:Set �� ← �� + 1. Step 3:If the two conditions �� �� ∈ ���� and Eq. (1) hold, perform the operation �� �� ← ���� ∖ {���� }. Step 4:If the condition �� ≤ 8 holds, set �� ← �� + 1 and go to Step 3; otherwise, go to Step 5. Step 5:If the condition �� ≤ 7 holds, set �� ← �� + 1 and go to Step 2; otherwise, go to Step 6. Step 6:Output ���� as the reduced predictor set and stop; Suppose each statement needs Θ(1) operation. The time complexity of the above process is ranged from (�� − 1) to
∀��, �� ∈ {0, 1, 2, . . . , 8}, where B�� �� (��, ��) and B�� �� (��, ��) denote the pixel values at �� �� position (��, ��) in the reconstructed intra blocks, B �� �� and �� B�� �� , respectively. Then, for the video sequence V, the mean �� of SADs (MSAD) associated with distinct predictor-pairs can be obtained by �� ������[���� ,���� ] =
�� 1 ∑ �� ������[���� ,���� ] (B�� ); �� ��=1
∀��, �� ∈ {0, 1, 2, . . . , 8}. �� ������[���� ,���� ] is a useful measure to calculate the difference between the predictors of the predictor-pair, [�� �� , ���� ]. We use the VCEG test sequences with intra 4×4 mode as an example. Table I shows the MSAD between the predictors of each predictor-pair. After obtaining the MSAD between the predictors of each predictor-pair, we can construct the most similar predictor-pair set.
TABLE I B ASED ON VCEG TEST SEQUENCES , THE MSAD BETWEEN THE
TABLE II T HE RANK PROBABILITY AND PRIORITY OF EACH PREDICTOR BASED ON VCEG TEST SEQUENCES WITH INTRA 4 × 4 MODE .
in Table I, the most similar predictor-pair set is ⟨[��0 , ��5 ], [��5 , ��4 ], [��4 , ��6 ], [��6 , ��1 ], [��1 , ��8 ], [��8 , ��2 ], [��2 , ��3 ], [��3 , ��7 ], [��7 , ��0 ]⟩. Different from Laroche et al.’s method, the proposed scheme prefers to eliminating less frequently used predictors and preserving more frequently used ones. Thus, we assign an appropriate priority in terms of the rank probability to each predictor. Table II shows the rank probability and priority of each predictor based on VCEG test sequences with intra 4 × 4 mode. From the table, it is clear that the higher the rank probability, the higher will be the priority of a predictor. As we want to reduce two predictors into one predictor, the one with lower priority will be eliminated. Note that the construction of the most similar predictor-pair set and the determination of predictor priorities are preformed off-line; and they are just executed once. In the next subsection, we explain how the most similar predictor-pair set and the predictor priorities are used to assist the design of the redundant predictor elimination process. B. Redundant predictor elimination process Using the most similar predictor-pair set, �� [��,�� ] , and the predictor priorities, we can eliminate the redundant predictors efficiently. Because the elements in �� [��,�� ] is based on a cycle, for convenience of explanation, we represent �� [��,�� ] ˆ as a new directed set ��[��,�� ] = ⟨[����(��) , ����((��+1) mod 9) ]∣0 ≤ ˆ ˆ �� ≤ 8⟩, where ����(0) is the predictor with the highest priority. ˆ ˆ From ��[��,�� ] , we can set the initial predictor set �� �� = ⟨���� (0) , ����(1) , ����(2) , ..., ����(8) ⟩. The function ��(�� ) denotes ˆ ˆ ˆ ˆ extraction operation for the priority of the predictor �� . For an input intra block, B �� , the proposed redundant predictor elimination process is comprised of six steps: Step 1:Set ���� ; �� ← 0; and ���� ������ ← ����(0) . ˆ Step 2:If the condition ∣��ℳ �� ��(��) ,����((��+1) mod 9) ] (��, ��)∣ ≤ [�� ˆ ˆ ��(��, ��) holds, go to Step 3; otherwise, go to Step 4. Step 3:If the condition ��(�� �� ������ ) ≥ ��(����((��+1) mod 9) ) ˆ holds, perform the operation �� �� ← ���� ∖ {����((��+1) mod 9) }; otherwise, perform the operaˆ tion ���� ← ���� ∖ {���� ������ } and set ���� ������ ← ���� ((��+1) mod 9) . Then, go to Step 5. ˆ
Definition 1: The problem of constructing the most similar predictor-pair set is that finding a subset �� [��,�� ] = ⟨[����(��) , ����((��+1) mod 9) ]∣0 ≤ �� ≤ 8⟩ ⊆ ��[��,�� ] such that the sum of the MSADs (SMSADs) between distinct predictor-pairs in �� [��,�� ] , i.e., ���� ��������[��,�� ] = ∑ [���� ,���� ]∈��[��,�� ] �� ������[���� ,���� ] , is minimum. �� is a permutation of the elements in the set, {0, 1, . . . , 8}, and ��(��) is the ��-th permuted element. To construct the most similar predictor-pair set, we first build up a complete undirected weighted graph �� = (��, ��, ��) where �� = {���� ∣0 ≤ �� ≤ 8} is the set of vertices; �� = {��[���� ,���� ] = (���� , ���� )∣0 ≤ ��, �� ≤ 8; ∀�� < ��} is the edge set; and �� = {��(�� [���� ,���� ] ) = �� ������[���� ,���� ] ∣0 ≤ ��, �� ≤ 8; ∀�� ∕= ��} denotes the weight of each edge. Then, we find the shortest tour of ��, namely the Hamilton cycle �� = ⟨��[����(��) ,����((��+1) mod 9) ] ∣0 ≤ �� ≤ 8⟩ such that the sum of ∑8 the related weights, �� (��) = ��=0 ��(��[����(��) ,����((��+1) mod 9) ] ), is as small as possible [7]. Although the problem of finding the shortest tour from a graph is NP-complete, for this case, we can solve this problem by an exhaustive search approach since there are only nine vertices in ��. Based on the obtained Hamilton cycle ��, we can realize the construction of the most similar predictor-pair set, ��[��,�� ] = ⟨[����(��) , ����((��+1) mod 9) ]∣��[����(��) ,����((��+1) mod 9) ] ∈ ��, 0 ≤ ��, �� ≤ 8⟩. For example, based on MSADs illustrated
Step 4:Set ���� ������ ← ����((��+1) mod 9) , and then go to Step 5. ˆ Step 5:If the condition �� ≤ 8 holds, set �� ← �� + 1 and go to Step 2; otherwise, go to Step 6. Step 6:Output ���� as the reduced predictor set and stop. In the above process, the definitions of ��ℳ�� ��(��) ,����((��+1) mod 9) ] (��, ��) amd ��(��, ��) are the [�� ˆ ˆ same as those in Eq. (1). Suppose each statement needs Θ(1) operation and the time complexity of the above process is Θ(��), where �� (= 9) is the number of predictors. According to the above analysis, the proposed scheme can seriously reduce the time complexity of Laroche et al.’s method. Finally, the optimal predictor can be determined by Eq. (2). In the next section, we report the results of experiments to demonstrate the applicability and execution-time saving advantage of the proposed algorithm.
V. CONCLUSIONS We have proposed a fast training-based redundant predictor elimination scheme to enhance the execution-time performance of Laroche et al.’s method [5] while preserving similar bitrates. The proposed scheme differs from Laroche et al.’s method in that it uses a set of most similar predictor-pairs and the priority of each predictor to guide the redundant predictor elimination process. In the proposed scheme, a new statistic training approach is first developed to construct a set of most similar predictor-pairs and determine the priority of each predictor, that can used to assist in eliminating the redundant predictors efficiently and preserving more frequently used ones. The proposed scheme can significantly reduce the time complexity of Laroche et al.’s method form Θ(�� 2 ) to Θ(��). By experimenting on sixteen standard VCEG test sequences, the results demonstrate that under similar bitrates, the proposed scheme has more than 16.77% execution-time improvement ratio over Laroche et al.’s method. ACKNOWLEDGMENT This work was supported by the National Science Council of R. O. C. under contract NSC98-2221-E-011-084-MY3. R EFERENCES
[1] Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264/ISO/IEC 14 496-10 AVC,) Joint Video Team of ISO/IEC and ITU-T, 2003. [2] X. Ji, S. Kwong, D. Zhao, H. Wang, C. C. J. Kou, and Q. Dai, “Early Determination of Zero-Quantized 8 × 8 DCT Coefficients,” IEEE Trans. Circuits and Systems for Video Technology, vol. 19, no. 12, pp. 1755– 1765, 2009. [3] M. Karczewicz, “Improvement Intra Coding,” ITU-T VCEG, San Jose, California, USA, Proposition VCEG-AF15, 2007. [4] G. Laroche, J. Jung, and B. Pesquet, “Intra Prediction with 1D Macroblock Partitioning for Image and Video Coding,” in Proc. SPIE VCIP, San jose, California, USA, 2009. [5] G. Laroche, J. Jung, and B. Pesquet, “Intra Coding with Prediction Mode Information Inference,” IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1786-1796, 2011. [6] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: tools, performance, and complexity,” IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7–28, 2004. [7] C. H. Papadimitriou, Computational Complexity, Addison Wesley, Boston, U.S., 1994. [8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, John Willy & Sons, Ltd., 2003. [9] T. Shiodera, A. Tanizawa, and T. Chujoh, “Block Based Extra/InterPolating Prediction for Intra Coding,” in Proc. of 2007 IEEE International Conference on Image Processing, vol. 6, San Antonio, Texas, USA, 2007, pp. 445–448. [10] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol 15, no. 6 pp. 74– 90, 1998. [11] T. Tsukuba, T. Yamamoto, Y. Tokumo, and T. Aono, “Adaptive Multidirectional Intra Prediction,” ITU-T VCEG, Shenzhen, China, Proposition VCEG-AG05, 2007. [12] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, ”Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 17, pp. 560–567, 2003. [13] M. Wien, “Intra coding using Variable Block Sizes,” ITU-T VCEG, Pattaya, Thailand, Proposition VCEG-O31, 2001. [14] [Online]. Available: http://iphome.hhi.de/suehring/tml/. [15] [Online]. Available: http://media.xiph.org/video/derf/. [16] [Online]. Available: http://www.lvideoservices.net/hd links.htm.
IV. EXPERIMENTAL RESULTS To test the effectiveness of the proposed algorithm, we conducted experiments on sixteen standard VCEG test sequences [15], [16]. We compared the performance among the method used in JM 14.2, Laroche et al.’s method, and our proposed training-based scheme. For our scheme, we used eleven VCEG test sequences to be the training set. In the experiments, all the blocks of the test sequences were intra coded; and six different QPs, manely 17, 22, 27, 32, 37, and 42, were considered for encoding the test sequences. All the concerned methods were implemented on the IBM compatible computer with Intel Core 2 Duo E7400 CPU 2.8 GHz and 2GB RAM. The operating system used was Microsoft Windows 7; and the implementation platform was JM14.2 [14], that is realized by Visual C++ 2005. Based on the sixteen test sequences and six different QPs, we ran the three concerned methods, respectively. Table III shows the performance comparison in terms of the average signal-to noise ratio (PSNR), the average bitrate, and the average execution-time. From the table, it is obvious that on average, the three methods yielded almost the same PSNR quality performance; and both Laroche et al.’s method and our proposed scheme produced better compression performance, i.e., less bitrate requirement, than the one used in JM 14.2. Besides, under similar bitrates, the execution-time performance of the proposed scheme is superior to that of Laroche et al.’s method. Finally, Table IV illustrates the average bitrate degradation ratio (ABDR) and average execution-time improvement ratio (AETIR) of the proposed scheme when compared with Laroche et al.’s method. From the table, it is clear that the average bitrate degradation ratio is only 0.122% (= 16810.91−16790.35 × 100%); however, the average execution16810.91 time improvement ratio of the proposed scheme can reach 16.77% (= 1182.77−984.46 × 100%). Since the average bitrate 1182.77 degradation ratio is negligible but the average executiontime improvement ratio is significant, the proposed scheme is applicable.
T HE PERFORMANCE COMPARISON IN TERMS OF THE
AVERAGE
TABLE III PSNR, THE
AVERAGE BITRATE , AND THE EXECUTION - TIME .
QP:17-42 Sequences (QCIF) Foreman Container Silent (CIF) Foreman Carphone Mobile Crew Ice (4CIF) City Ice Soccer (720P) In to tree Parkrun ter (1080P) Blue sky Park joy Tractor Average PSNR (dB) 36.80 36.67 36.26 37.58 37.90 34.96 37.78 39.98 36.47 40.37 37.22 36.70 35.00 39.95 36.57 39.00 37.45