{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T16:18:03Z","timestamp":1754151483560,"version":"3.41.2"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:00:00Z","timestamp":1739750400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Chinese National Natural Science Foundation","doi-asserted-by":"crossref","award":["61272078","62032010","62172201"],"award-info":[{"award-number":["61272078","62032010","62172201"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,16]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Variable-type information is fundamental, and it greatly helps in understanding the program semantics. Previous work applies rule-based and machine learning-based methods to recover variable types from commercial off-the-shelf binaries, heavily relying on the data flow or control flow. However, according to our study, about half of the variables lacked or even had no data flow; this problem has not received much attention from previous work. We empirically explore the severity of this problem to the type inference task and analyze its root causes. Based on compilation properties, we find that the instructions surrounding the instructions that operate on variables provide good contextual information that can be used for co-encoding to overcome the above problem. 
In this paper, we present an effective machine learning-based method to infer variable types and overcome the challenge of limited data dependency via adjacent instructions co-encoding. Therefore, we implement a system called CATI++, which locates variables from stripped binaries and infers 19 types of variables. We evaluate CATI++ on different compilation options, all of which outperforms state-of-the-art methods. The ablation experiments verify that our scheme is not sensitive to compilation conditions, while our designed method effectively alleviates the problems caused by missing data dependency.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf004","type":"journal-article","created":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T11:44:15Z","timestamp":1739792655000},"page":"788-803","source":"Crossref","is-referenced-by-count":0,"title":["CATI++: empirical study and evaluation for adjacent instruction enhanced type inference"],"prefix":"10.1093","volume":"68","author":[{"given":"Ligeng","family":"Chen","sequence":"first","affiliation":[{"name":"State Key Laboratory for Novel Software Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]},{"name":"Department of Computer Science and Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]}]},{"given":"Zhongling","family":"He","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]},{"name":"Department of Computer Science and Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]}]},{"given":"Yi","family":"Qian","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]},{"name":"Department of Computer Science and Technology , Nanjing 
University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]}]},{"given":"Bing","family":"Mao","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]},{"name":"Department of Computer Science and Technology , Nanjing University, Nanjing, Xianlin Road 163, Jiangsu Province,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,17]]},"reference":[{"key":"2025071900315910100_ref1","first-page":"1667","article-title":"Debin: Predicting debug information in stripped binaries","volume-title":"Proceedings of the 2018 ACM SIGSAC conference on computer and communications security","author":"He","year":"2018"},{"key":"2025071900315910100_ref2","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1145\/3468264.3468607","article-title":"Stateformer: Fine-grained type recovery from binaries using generative state modeling","volume-title":"Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Pei","year":"2021"},{"key":"2025071900315910100_ref3","first-page":"4327","article-title":"Augmenting decompiler output with learned variable names and types","volume-title":"31st USENIX Security Symposium (USENIX Security 22)","author":"Chen","year":"2022"},{"key":"2025071900315910100_ref4","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1007\/978-3-030-22038-9_14","article-title":"Typeminer: Recovering types in binary programs using machine learning","volume-title":"International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment","author":"Maier","year":"2019"},{"key":"2025071900315910100_ref5","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2015.23297","article-title":"Vfguard: Strict protection for virtual function calls in cots c++ 
binaries","volume-title":"NDSS","author":"Prakash","year":"2015"},{"key":"2025071900315910100_ref6","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1109\/SP.2013.44","article-title":"Practical control flow integrity and randomization for binary executables","volume-title":"2013 IEEE Symposium on Security and Privacy","author":"Zhang","year":"2013"},{"key":"2025071900315910100_ref7","first-page":"337","article-title":"Control flow integrity for COTS binaries","volume-title":"Presented as part of the 22nd USENIX Security Symposium (USENIX Security 13)","author":"Zhang","year":"2013"},{"key":"2025071900315910100_ref8","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2016.23185","article-title":"Discovre: Efficient cross-architecture identification of bugs in binary code","volume-title":"NDSS","author":"Eschweiler","year":"2016"},{"key":"2025071900315910100_ref9","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1145\/2664243.2664269","article-title":"Leveraging semantic signatures for bug search in binary programs","volume-title":"Proceedings of the 30th Annual Computer Security Applications Conference","author":"Pewny","year":"2014"},{"article-title":"code2seq: Generating sequences from structured representations of code","year":"2018","author":"Alon","key":"2025071900315910100_ref10"},{"key":"2025071900315910100_ref11","article-title":"Cross-language learning for program classification using bilateral tree-based convolutional neural networks","volume-title":"Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence","author":"Bui","year":"2018"},{"key":"2025071900315910100_ref12","first-page":"1","article-title":"Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization","volume-title":"Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler 
Optimization","author":"Ding","year":"2019"},{"key":"2025071900315910100_ref13","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1145\/3238147.3238199","article-title":"$\\alpha $diff: Cross-version binary code similarity detection with dnn","volume-title":"Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering","author":"Liu","year":"2018"},{"volume-title":"IDA pro","year":"2020","author":"IDA Pro","key":"2025071900315910100_ref14"},{"key":"2025071900315910100_ref15","article-title":"Tie: Principled reverse engineering of types in binary programs","volume-title":"NDSS","author":"Lee","year":"2011"},{"key":"2025071900315910100_ref16","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1145\/2499370.2462165","article-title":"Scalable variable and data type detection in a binary rewriter","volume":"48","author":"ElWazeer","year":"2013","journal-title":"ACM SIGPLAN Notices"},{"key":"2025071900315910100_ref17","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2015.23099","article-title":"Vtint: Protecting virtual function tables\u2019 integrity","volume-title":"NDSS","author":"Zhang","year":"2015"},{"key":"2025071900315910100_ref18","first-page":"5","article-title":"Automatic reverse engineering of data structures from binary execution","volume-title":"Proceedings of the 11th Annual Information Security Symposium","author":"Lin","year":"2010"},{"key":"2025071900315910100_ref19","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/WCRE.2013.6671278","article-title":"Mempick: High-level data structure detection in c\/c++ binaries","volume-title":"2013 20th Working Conference on Reverse Engineering (WCRE)","author":"Haller","year":"2013"},{"key":"2025071900315910100_ref20","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1007\/978-3-319-68690-5_26","article-title":"Learning types for binaries","volume-title":"International Conference on Formal Engineering 
Methods","author":"Zhiwu","year":"2017"},{"key":"2025071900315910100_ref21","first-page":"845","article-title":"BYTEWEIGHT: Learning to recognize functions in binary code","volume-title":"23rd USENIX Security Symposium (USENIX Security 14)","author":"Bao","year":"2014"},{"key":"2025071900315910100_ref22","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1007\/978-3-642-22110-1_37","article-title":"Bap: A binary analysis platform","volume-title":"International Conference on Computer Aided Verification","author":"Brumley","year":"2011"},{"volume-title":"Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data","year":"2001","author":"Lafferty","key":"2025071900315910100_ref23"},{"key":"2025071900315910100_ref24","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1109\/DSN48063.2020.00028","article-title":"Cati: Context-assisted type inference from stripped binaries","volume-title":"2020 50th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN)","author":"Chen","year":"2020"},{"key":"2025071900315910100_ref25","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1109\/SANER53432.2022.00025","article-title":"Dicomp: Lightweight data-driven inference of binary compiler provenance with high accuracy","volume-title":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Chen","year":"2022"},{"key":"2025071900315910100_ref26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-540-69738-1_1","article-title":"Divine: Discovering variables in executables","volume-title":"International Workshop on Verification, Model Checking, and Abstract Interpretation","author":"Balakrishnan","year":"2007"},{"volume-title":"The DWARF Debugging 
Standard","year":"2020","author":"dwarf","key":"2025071900315910100_ref27"},{"key":"2025071900315910100_ref28","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"Glove: Global vectors for word representation","volume-title":"Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)","author":"Pennington","year":"2014"},{"volume-title":"Keras","year":"2020","author":"keras","key":"2025071900315910100_ref29"},{"volume-title":"Scikit-Learn","year":"2020","author":"scikit learn","key":"2025071900315910100_ref30"},{"volume-title":"Arm Vs x86: Instruction Sets, Architecture, and all Key Differences Explained","year":"2022","author":"Triggs","key":"2025071900315910100_ref31"},{"key":"2025071900315910100_ref32","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.3115\/v1\/D14-1181","article-title":"Convolutional neural networks for sentence classification","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Kim","year":"2014"},{"article-title":"A c-lstm neural network for text classification","year":"2015","author":"Zhou","key":"2025071900315910100_ref33"},{"key":"2025071900315910100_ref34","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1109\/SP.2016.60","article-title":"A tough call: Mitigating advanced code-reuse attacks at the binary level","volume-title":"2016 IEEE Symposium on Security and Privacy (SP)","author":"Van Der Veen","year":"2016"},{"key":"2025071900315910100_ref35","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1007\/978-3-030-00470-5_20","article-title":"$\\tau $ cfi: Type-assisted control flow integrity for x86\u201364 binaries","volume-title":"International Symposium on Research in Attacks, Intrusions, and Defenses","author":"Muntean","year":"2018"},{"key":"2025071900315910100_ref36","first-page":"99","article-title":"Neural nets can learn function type signatures from 
binaries","volume-title":"26th USENIX Security Symposium","author":"Chua","year":"2017"},{"key":"2025071900315910100_ref37","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/3508398.3511502","article-title":"Resil: Revivifying function signature inference using deep learning with domain-specific knowledge","volume-title":"Proceedings of the Twelveth ACM Conference on Data and Application Security and Privacy","author":"Lin","year":"2022"},{"key":"2025071900315910100_ref38","doi-asserted-by":"crossref","DOI":"10.1109\/QRS57517.2022.00053","article-title":"Nimbus: Toward speed up function signature recovery via input resizing and multi-task learning","author":"Qian","year":"2022"},{"key":"2025071900315910100_ref39","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1109\/SP40001.2021.00006","article-title":"When function signature recovery meets compiler optimization","volume-title":"2021 IEEE Symposium on Security and Privacy (SP)","author":"Lin","year":"2021"},{"key":"2025071900315910100_ref40","first-page":"611","article-title":"Recognizing functions in binaries with neural networks","volume-title":"24th USENIX Security Symposium (USENIX Security 15)","author":"Shin","year":"2015"},{"key":"2025071900315910100_ref41","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2021.23112","article-title":"Xda: Accurate, robust disassembly with transfer learning","volume-title":"Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS)","author":"Pei","year":"2021"},{"key":"2025071900315910100_ref42","article-title":"Deepdi: Learning a relational graph convolutional network model on instructions for fast and accurate disassembly","volume-title":"Proc. 
of the USENIX Security Symposium","author":"Sheng, Yu","year":"2022"},{"key":"2025071900315910100_ref43","article-title":"Static detection of c++ vtable escape vulnerabilities in binary code","volume-title":"NDSS","author":"Dewey","year":"2012"},{"key":"2025071900315910100_ref44","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1145\/2980983.2908119","article-title":"Polymorphic type inference for machine code","volume-title":"ACM SIGPLAN Notices","author":"Noonan","year":"2016"},{"key":"2025071900315910100_ref45","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2017.23096","article-title":"Marx: Uncovering class hierarchies in c++ programs","volume-title":"NDSS","author":"Pawlowski","year":"2017"},{"key":"2025071900315910100_ref46","first-page":"331","article-title":"Dsibin: Identifying dynamic data structures in c\/c++ binaries","volume-title":"Proceedings of the 32nd IEEE\/ACM International Conference on Automated Software Engineering","author":"Rupprecht","year":"2017"},{"key":"2025071900315910100_ref47","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1145\/1014052.1014105","article-title":"Learning to detect malicious executables in the wild","volume-title":"Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining","author":"Kolter","year":"2004"},{"key":"2025071900315910100_ref48","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1007\/978-3-540-89900-6_21","article-title":"Unknown malcode detection using opcode representation","volume-title":"European conference on intelligence and security informatics","author":"Moskovitch","year":"2008"},{"key":"2025071900315910100_ref49","first-page":"38","article-title":"Data mining methods for detection of new malicious executables","volume-title":"Proceedings 2001 IEEE Symposium on Security and Privacy. 
S&P 2001","author":"Schultz","year":"2000"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/7\/788\/61932331\/bxaf004.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/7\/788\/61932331\/bxaf004.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,19]],"date-time":"2025-07-19T04:32:11Z","timestamp":1752899531000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/7\/788\/8019595"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,17]]},"references-count":49,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,2,17]]},"published-print":{"date-parts":[[2025,7,16]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf004","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,2,17]]}}}