
Text classification : Target 4294967295 is out of bounds (CPU) #2369

Closed
AlbelTec opened this issue Nov 12, 2022 · 18 comments
Labels: Priority:0 Work that we can't release without · Reported by: Customer

Comments

AlbelTec commented Nov 12, 2022

To be linked to: #2369 (comment)

System Information (please complete the following information):

  • Model Builder Version (available in Manage Extensions dialog): 16.14.0.2255902
  • Visual Studio Version: 2022

Describe the bug

  • On which step of the process did you run into an issue: when starting the train step
  • Clear description of the problem: when starting the trainer, I'm getting this error

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Train step'
  2. Click on 'Start training'
  3. See error in pop up window

Expected behavior
Training should start and complete without this error; an explanation of what is causing it would also help.

Screenshots

Target 4294967295 is out of bounds.
Exception raised from nll_loss_out_frame at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\LossNLL.cpp:230 (most recent call first):
00007FFF91B7A4C200007FFF91B7A460 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFF91B53ED500007FFF91B53E60 c10.dll!c10::IndexError::IndexError [<unknown file> @ <unknown line number>]
00007FFEC4151FB400007FFEC414CE60 torch_cpu.dll!at::native::multi_margin_loss_cpu_out [<unknown file> @ <unknown line number>]
00007FFEC41558F300007FFEC414CE60 torch_cpu.dll!at::native::multi_margin_loss_cpu_out [<unknown file> @ <unknown line number>]
00007FFEC415773400007FFEC41576D0 torch_cpu.dll!at::native::structured_nll_loss_forward_out_cpu::impl [<unknown file> @ <unknown line number>]
00007FFEC499C3DE00007FFEC498B710 torch_cpu.dll!at::cpu::zero_ [<unknown file> @ <unknown line number>]
00007FFEC49619AE00007FFEC4936730 torch_cpu.dll!at::cpu::bucketize_outf [<unknown file> @ <unknown line number>]
00007FFEC459B89000007FFEC45474F0 torch_cpu.dll!at::_ops::zeros_out::redispatch [<unknown file> @ <unknown line number>]
00007FFEC472308300007FFEC4722FE0 torch_cpu.dll!at::_ops::nll_loss_forward::redispatch [<unknown file> @ <unknown line number>]
00007FFEC54BAFA300007FFEC533A050 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFEC54860F200007FFEC533A050 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFEC46D698C00007FFEC46D6800 torch_cpu.dll!at::_ops::nll_loss_forward::call [<unknown file> @ <unknown line number>]
00007FFEC4157F0F00007FFEC4157E90 torch_cpu.dll!at::native::nll_loss [<unknown file> @ <unknown line number>]
00007FFEC4B1B6B200007FFEC4B17680 torch_cpu.dll!at::compositeimplicitautograd::where [<unknown file> @ <unknown line number>]
00007FFEC4AFAA5D00007FFEC4ACFD00 torch_cpu.dll!at::compositeimplicitautograd::broadcast_to [<unknown file> @ <unknown line number>]
00007FFEC47A0C6F00007FFEC47A0AE0 torch_cpu.dll!at::_ops::nll_loss::call [<unknown file> @ <unknown line number>]
00007FFEC415888F00007FFEC4157F80 torch_cpu.dll!at::native::nll_loss_nd [<unknown file> @ <unknown line number>]
00007FFEC4B1B6E200007FFEC4B17680 torch_cpu.dll!at::compositeimplicitautograd::where [<unknown file> @ <unknown line number>]
00007FFEC4AFAACD00007FFEC4ACFD00 torch_cpu.dll!at::compositeimplicitautograd::broadcast_to [<unknown file> @ <unknown line number>]
00007FFEC45E142F00007FFEC45E12A0 torch_cpu.dll!at::_ops::nll_loss_nd::call [<unknown file> @ <unknown line number>]
00007FFEC415653F00007FFEC4156250 torch_cpu.dll!at::native::cross_entropy_loss [<unknown file> @ <unknown line number>]
00007FFEC4B1968100007FFEC4B17680 torch_cpu.dll!at::compositeimplicitautograd::where [<unknown file> @ <unknown line number>]
00007FFEC4AFAB5200007FFEC4ACFD00 torch_cpu.dll!at::compositeimplicitautograd::broadcast_to [<unknown file> @ <unknown line number>]
00007FFEC4786B2300007FFEC4786980 torch_cpu.dll!at::_ops::cross_entropy_loss::call [<unknown file> @ <unknown line number>]
00007FFEC3E7FC7100007FFEC3E7FC40 torch_cpu.dll!at::cross_entropy_loss [<unknown file> @ <unknown line number>]
00007FFF30895E0500007FFF30895C60 LibTorchSharp.DLL!THSNN_cross_entropy [<unknown file> @ <unknown line number>]
00007FFF39B3F754 <unknown symbol address> !<unknown symbol> [<unknown file> @ <unknown line number>]


AlbelTec changed the title from "Text classification : error when starting training" to "Text classification : error when starting CPU training" on Nov 12, 2022
beccamc (Contributor) commented Nov 14, 2022

It looks like we've gotten two of these - #2368. @LittleLittleCloud any ideas about possible root cause?

beccamc (Contributor) commented Nov 14, 2022

We seem to be hitting the max uint value. @AlbelTec Can you give more details about your dataset? What is its size?

AlbelTec (Author) commented Nov 16, 2022

Hi @beccamc, sorry for the delay, I was off. The dataset is very simple: financial emails (text & label) for a multiclass classification purpose. It contains 2588 rows of text and label, and each text can contain more than 2000 characters.

beccamc (Contributor) commented Nov 16, 2022

@JakeRadMSFT Thoughts? This doesn't sound like a dataset large enough to be causing the problem.

@AlbelTec (Author)

@beccamc I ran a small test: I tried again with a very small dataset (16 rows) and I'm not getting the error. I'm wondering if long texts (more than 2000 characters) could be raising the issue.

beccamc (Contributor) commented Nov 16, 2022

@AlbelTec Are you able to share your dataset?

@AlbelTec (Author)

@beccamc Unfortunately not, as it contains sensitive data. The texts are raw emails that haven't been cleaned up on purpose (signatures are included, as well as all replies / forwards).

beccamc changed the title from "Text classification : error when starting CPU training" to "Text classification : Target 4294967295 is out of bounds (CPU)" on Nov 30, 2022
Soarc commented Dec 25, 2022

@beccamc Same error. I can share mine :)
test-data.csv

beccamc (Contributor) commented Jan 3, 2023

@v-Hailishi Can you try to repro with Soarc's dataset?

@v-Hailishi

@beccamc By using Soarc's dataset test-data.csv, I can repro this issue on the latest main build 16.14.1.2262701

[screenshot of the error dialog]

beccamc (Contributor) commented Jan 4, 2023

@LittleLittleCloud Can you take a look at this?

beccamc added this to the February 2023 milestone on Jan 4, 2023
LittleLittleCloud (Contributor) commented Jan 5, 2023

4294967295 = 2^32 - 1, so this is probably caused by a wrong type cast.

After examining the code base, this place looks suspicious:

https://github.com/dotnet/machinelearning/blob/9d798f1bb3fb17fe97eba77a694c35e2cb46a4b7/src/Microsoft.ML.TorchSharp/NasBert/TextClassificationTrainer.cs#L110

When target is 0, target - 1 wraps around to 4294967295 after the cast from uint to long.
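
For illustration, a minimal standalone C# sketch of the suspected wraparound (the variable names are made up; this is not the actual trainer code):

    using System;

    uint target = 0;                // key value produced for a missing/unknown label
    uint shifted = target - 1;      // uint arithmetic wraps around: 0 - 1 == 4294967295
    long asLong = shifted;          // the wrapped value survives the widening cast to long
    Console.WriteLine(asLong);      // prints 4294967295, the value in the error message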

@michaelgsharp Would you take a closer look at this issue, and in particular verify whether TextClassification still works when one of the targets/labels is 0? Or should the target of text classification never be smaller than 1?

Update

The root cause is that MapValueToKey produces a key whose value is 0 when the value is NaN or does not exist in the term map, and TextClassification currently breaks on any dataset that contains a 0-valued key as a label.

I created an issue in the ML.NET repo; in the meantime, a temporary fix in Model Builder could be to filter out rows where the label is NaN/empty (see the sketch below).
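
For example, a rough ML.NET sketch of that kind of filtering, assuming a hypothetical EmailRow input class with Text and Label columns (illustrative only, not the actual Model Builder code path):

    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    var mlContext = new MLContext();
    IDataView data = mlContext.Data.LoadFromTextFile<EmailRow>(
        "test-data.csv", hasHeader: true, separatorChar: ',');

    // Drop rows with a missing/empty label so MapValueToKey never emits a 0-valued key.
    var cleaned = mlContext.Data
        .CreateEnumerable<EmailRow>(data, reuseRowObject: false)
        .Where(row => !string.IsNullOrWhiteSpace(row.Label))
        .ToList();

    IDataView trainData = mlContext.Data.LoadFromEnumerable(cleaned);
    // trainData can then be fed to the text-classification pipeline as usual.

    // Hypothetical input schema for a "text,label" CSV like the ones in this thread.
    public class EmailRow
    {
        [LoadColumn(0)] public string Text { get; set; }
        [LoadColumn(1)] public string Label { get; set; }
    }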

LittleLittleCloud added the "Priority:0 Work that we can't release without" label on Jan 5, 2023
@LittleLittleCloud (Contributor)

Scaling this issue up to Priority:0, as it might affect all text-classification scenarios.

beccamc (Contributor) commented Jan 5, 2023

Can we get a repro with just ML.NET?

@luisquintanilla (Contributor)

Closing this issue since it should be resolved in the framework. Tracking issue dotnet/machinelearning#6534

@scottyboiler

@luisquintanilla - I'm confused by the comment stating that this "should be resolved in the framework". I am encountering this issue today. Can you clarify the fix? Thank you.

@luisquintanilla (Contributor)

@scottyboiler Model Builder is tooling built on top of the ML.NET framework (the Microsoft.ML set of NuGet packages). The issue is at the framework level, not in the tooling, so fixing it there will also fix it for Model Builder. I hope that clarifies it.

@scottyboiler

@luisquintanilla clarifying the ML.NET framework vs. the .NET framework is a helpful distinction. Thanks for the quick response.
