Project Database
Project Reference: ITP/049/23LP
Project Title: Application System Prototype for China HS Code Recommendation Automation with Large Language Model
Hosting Institution: LSCM R&D Centre (LSCM)
Abstract: This project aims to research how pretrained large language models (LLMs) can be used
to build an automated system for China Harmonized System code (HS code)
recommendation. Accurate, automated HS code assignment can significantly reduce
customs revenue loss, compliance errors and trade delays. The application can benefit
customs authorities, shippers, and brokers involved in cross-border trade. In particular,
the project can enhance the competitiveness of local SMEs acting as trade
intermediaries between Hong Kong and Mainland China. The project is relevant to
Government initiatives to enhance local logistics and SME competitiveness in the import and
export business, specifically Hong Kong’s role as an “International Trade Center”, an
“International Shipping Center” and a regional player in smart logistics, per the Chief
Executive’s 2022 Policy Address (sections 44, 47 and 49).
In the past, the unstructured nature of HS code reference descriptions made automation
with traditional methods tedious and painful. When testing pretrained LLMs' enhanced
abilities to process nuanced queries, unstructured text, context and other information
during retrieval augmentation, we found that certain pipelines allow LLMs to produce
accurate HS codes from short user inputs. Based on pilot experimental testing, the best
approach is a step-by-step (two digits at a time) retrieval workflow with in-context text
block augmentation, which shows much higher accuracy for the first four digits (>92%) than
standard machine learning classification methods. Querying the pretrained LLM also shows
that the decision rules from such in-context text blocks match >90% of the 2022
HS Code Interpretation (China Customs, Volume 1, Chapters 1-3).
The project proposes to test various prompting techniques and retrieval augmentation
approaches, and to study multiple LLMs, including China-LLMs, to find the best performer(s)
and optimize the pipeline accuracy for HS code derivation.
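As a rough illustration of the two-digits-at-a-time workflow described above, the sketch below narrows a product query one level per step and shows the chooser only the candidate text block for the current level. It is not the project's implementation: the `REFERENCE` table holds abbreviated toy descriptions rather than authoritative tariff text, the function names and prompt wording are hypothetical, and `keyword_chooser` is a simple word-overlap stand-in for the LLM call.

```python
from typing import Callable, Dict, List

# Toy, abbreviated descriptions for illustration only (not authoritative tariff text).
REFERENCE: Dict[str, str] = {
    "01": "live animals",
    "02": "meat and edible meat offal",
    "0101": "live horses",
    "0102": "live bovine animals",
    "0201": "meat of bovine animals fresh or chilled",
    "0202": "meat of bovine animals frozen",
}

# A chooser receives the prompt text and the candidate codes and returns one code.
Chooser = Callable[[str, List[str]], str]


def children(prefix: str) -> List[str]:
    """Return the codes that extend `prefix` by exactly two digits."""
    return sorted(
        code for code in REFERENCE
        if len(code) == len(prefix) + 2 and code.startswith(prefix)
    )


def recommend(query: str, choose: Chooser, digits: int = 4) -> str:
    """Narrow the code two digits per step until `digits` digits are fixed."""
    prefix = ""
    while len(prefix) < digits:
        candidates = children(prefix)
        if not candidates:
            break  # no deeper level available in the reference data
        # In-context text block: only the candidates for the current level.
        block = "\n".join(f"{code}: {REFERENCE[code]}" for code in candidates)
        prompt = f"Product: {query}\nCandidates:\n{block}\nAnswer with one code."
        picked = choose(prompt, candidates)
        if picked not in candidates:
            raise ValueError(f"chooser returned an unknown code: {picked!r}")
        prefix = picked
    return prefix


def keyword_chooser(prompt: str, candidates: List[str]) -> str:
    """Stand-in for an LLM call: pick the candidate sharing the most words with the query."""
    query_words = set(prompt.splitlines()[0].lower().split())
    return max(
        candidates,
        key=lambda code: len(query_words & set(REFERENCE[code].split())),
    )


result = recommend("frozen bovine meat", keyword_chooser)
```

In a real pipeline the `choose` argument would wrap a model request; keeping it as a plain callable makes it easy to swap in different LLMs for comparison.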
To allow the multistage HS code recommendation pipeline to be automated and
operate as a service, we propose the following engineering development for the
prototype: [1] Incorporate a prefix trie as a more efficient data structure to encode
hierarchical HS codes and reduce multistage retrieval response time. [2] Use a vector
database to reduce response time through more efficient storage and retrieval of
data, considering the practically usable context window size of around 2000
tokens. Finally, a chat service API will be built to facilitate convenient user interactions.
Besides meeting the accuracy, response time and rapid updatability objectives, we will
submit a study report summarizing the optimized processing architecture as well as a
comparison of three LLMs' performance and limitations for HS code recommendation.
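To make the prefix trie idea in item [1] concrete, here is a minimal sketch, assuming each trie edge carries one two-digit chunk of the code so that fetching the candidates for the next retrieval stage is a single walk down the tree. The class and method names (`HSTrie`, `insert`, `next_level`) are hypothetical, and the descriptions are abbreviated toy data, not authoritative tariff text.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node per code prefix; edges are two-digit chunks."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.description: Optional[str] = None


class HSTrie:
    def __init__(self, step: int = 2) -> None:
        self.root = TrieNode()
        self.step = step  # number of digits consumed per level

    def _chunks(self, code: str) -> List[str]:
        """Split a code into fixed-size chunks, e.g. '0202' -> ['02', '02']."""
        return [code[i:i + self.step] for i in range(0, len(code), self.step)]

    def insert(self, code: str, description: str) -> None:
        """Add or update a code, creating intermediate nodes as needed."""
        node = self.root
        for chunk in self._chunks(code):
            node = node.children.setdefault(chunk, TrieNode())
        node.description = description

    def next_level(self, prefix: str) -> Dict[str, Optional[str]]:
        """Return {code: description} for codes one level below `prefix`."""
        node: Optional[TrieNode] = self.root
        for chunk in self._chunks(prefix):
            node = node.children.get(chunk) if node else None
            if node is None:
                return {}  # unknown prefix
        return {
            prefix + chunk: child.description
            for chunk, child in sorted(node.children.items())
        }


trie = HSTrie()
for code, text in [
    ("01", "live animals"),
    ("02", "meat and edible meat offal"),
    ("0201", "meat of bovine animals fresh or chilled"),
    ("0202", "meat of bovine animals frozen"),
]:
    trie.insert(code, text)

candidates = trie.next_level("02")
```

Because `insert` only touches the nodes along one path, individual codes can be added or revised without rebuilding the structure, which fits the rapid updatability objective mentioned above.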
Project Coordinator: Dr Frank C H TONG
Approved Funding Amount: HK$ 2.71 M
Project Period: 1 Dec 2023 - 31 May 2025